Matryoshka to the Extreme: Agents Automatically Designing Agentic Systems!
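Agentic systems are being widely studied and applied as general-purpose tools. Solving complex problems usually requires compound agentic systems built from multiple components, and hand-designed solutions tend to be replaced eventually by more efficient learned ones.
To that end, the paper proposes a new research area, Automated Design of Agentic Systems (ADAS, open-sourced), whose goal is to automatically create powerful agentic system designs.
The entire agentic system is defined in code, and a "meta agent" automatically discovers new agents by programming them. Because programming languages are Turing complete, this in theory allows an ADAS algorithm to discover any possible building block and agentic system.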
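Overview of Meta Agent Search and examples of discovered agents. The meta agent is instructed to iteratively program new agents, test their performance on tasks, add them to an archive of discovered agents, and use this archive to inform the meta agent in subsequent iterations. Three example agents from three runs are shown, with all names generated by the meta agent.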
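The loop described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: `AgentRecord`, `meta_agent_search`, and the `propose` and `evaluate` callables are hypothetical names, with `propose` standing in for the foundation-model call that writes a new agent and `evaluate` for running that agent on the task.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AgentRecord:
    name: str       # name chosen by the meta agent
    code: str       # source of the agent's forward() function
    fitness: float  # score returned by the evaluation function


def meta_agent_search(
    propose: Callable[[List[AgentRecord]], dict],  # hypothetical: wraps the meta agent
    evaluate: Callable[[str], float],              # hypothetical: runs an agent on the task
    baselines: List[AgentRecord],
    n_iterations: int,
) -> List[AgentRecord]:
    """Iteratively program, test, and archive new agents."""
    archive = list(baselines)  # copy, so the caller's baseline list is not mutated
    for _ in range(n_iterations):
        candidate = propose(archive)           # meta agent conditions on the archive
        fitness = evaluate(candidate["code"])  # test the new agent on the task
        archive.append(
            AgentRecord(candidate["name"], candidate["code"], fitness)
        )
    return archive
```

Passing the whole archive back into `propose` on every iteration is what lets later proposals build on earlier discoveries.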
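Automated Design of Agentic Systems (ADAS):
Definition and goals of ADAS
- ADAS aims to automatically invent new building blocks and design powerful agentic systems. An agentic system uses Foundation Models (FMs) as modules to complete tasks through planning, tool use, and multi-step iterative processing.
The three key components of ADAS
The three key components of Automated Design of Agentic Systems (ADAS). The search space determines which agentic systems can be represented in ADAS. The search algorithm specifies how the ADAS method explores the search space. The evaluation function defines how to evaluate a candidate agent on target objectives such as performance.
- Search Space: defines which agentic systems can be represented in ADAS. For example, some work mutates only the agent's text prompts while other components, such as control flow, stay fixed.
- Search Algorithm: specifies how the ADAS method explores the search space. Because the search space is usually very large or even unbounded, the exploration-exploitation trade-off has to be considered.
- Evaluation Function: depending on the application of the ADAS algorithm, different objectives may be optimized, such as performance, cost, latency, or agent safety. The evaluation function defines how candidate agents are evaluated on those objectives.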
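To make the three components concrete, here is a toy sketch under stated assumptions. The search space is restricted to prompt strings (the control flow stays fixed, as in the prompt-only example above), the search algorithm is a plain epsilon-greedy rule chosen only to illustrate the exploration-exploitation trade-off, and the evaluation function is any callable returning a score. None of this is the method used by Meta Agent Search; `epsilon_greedy_prompt_search`, `mutate`, and `evaluate` are hypothetical names.

```python
import random
from typing import Callable, Dict, List


def epsilon_greedy_prompt_search(
    prompts: List[str],              # search space: prompt strings only
    mutate: Callable[[str], str],    # hypothetical: rewrites a prompt
    evaluate: Callable[[str], float],  # evaluation function: prompt -> score
    n_steps: int,
    epsilon: float = 0.2,
    seed: int = 0,
) -> Dict[str, float]:
    """Search algorithm: epsilon-greedy over an archive of scored prompts."""
    rng = random.Random(seed)
    scores = {p: evaluate(p) for p in prompts}
    for _ in range(n_steps):
        if rng.random() < epsilon:
            parent = rng.choice(list(scores))     # explore: random archived prompt
        else:
            parent = max(scores, key=scores.get)  # exploit: best prompt so far
        child = mutate(parent)
        if child not in scores:                   # skip prompts already evaluated
            scores[child] = evaluate(child)
    return scores
```

Swapping any one of the three arguments (the initial prompts, the selection rule, or `evaluate`) changes one component while leaving the other two untouched, which is the separation the list above describes.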
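Extensive experiments across coding, science, and math show that the algorithm (Meta Agent Search) progressively invents agents with novel designs, and that these agents substantially outperform state-of-the-art hand-designed agents.
Results of Meta Agent Search on the ARC challenge. (a) Meta Agent Search progressively discovers high-performing agents based on an ever-growing archive of previous discoveries. Median accuracy and 95% bootstrap confidence intervals are reported on a held-out test set, with each agent evaluated five times. (b) Visualization of the best agent discovered by Meta Agent Search on the ARC challenge.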
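For readers unfamiliar with bootstrap confidence intervals, the following is a minimal percentile-bootstrap sketch using only the standard library. It assumes the results are available as a flat list of per-example 0/1 outcomes; how the five evaluation runs are pooled is not stated here, and whether the authors compute the interval exactly this way is an assumption. The function name `bootstrap_accuracy_ci` is hypothetical.

```python
import random
import statistics
from typing import Sequence, Tuple


def bootstrap_accuracy_ci(
    correct: Sequence[int],    # 1 if an example was solved, else 0
    n_resamples: int = 2000,
    confidence: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (lower, median, upper) of the bootstrapped accuracy."""
    rng = random.Random(seed)
    n = len(correct)
    # Resample the outcomes with replacement and record each resample's accuracy.
    means = sorted(
        sum(rng.choices(correct, k=n)) / n for _ in range(n_resamples)
    )
    tail = (1.0 - confidence) / 2.0
    lower = means[round(tail * (n_resamples - 1))]
    upper = means[round((1.0 - tail) * (n_resamples - 1))]
    return lower, statistics.median(means), upper
```

Calling `bootstrap_accuracy_ci([1] * 70 + [0] * 30)` yields an interval centered near 0.70; the exact endpoints depend on the seed and the number of resamples.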
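An example task from the ARC challenge. Given example input-output grids, the AI system is asked to learn the transformation rule and then apply it to the test grid to predict the final answer.
Performance comparison between Meta Agent Search and state-of-the-art hand-designed agents across multiple domains. The agents discovered by Meta Agent Search outperform the baselines in every domain. Test accuracy and 95% bootstrap confidence intervals are reported on held-out test sets. The search is run independently for each domain.
Performance when transferring the top agents from MGSM to other math domains. Agents discovered by Meta Agent Search consistently outperform the baselines across different math domains. Test accuracy and 95% bootstrap confidence intervals are reported. The names of the top agents are generated by Meta Agent Search.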
Appendix
Meta Agent system prompt
You are a helpful assistant. Make sure to return in a WELL-FORMED JSON object.
The following prompt instructs the meta agent to design new agents based on the archive of previously discovered agents.
Meta Agent core prompt
You are an expert machine learning researcher testing various agentic systems. Your objective is to design
building blocks such as prompts and control flows within these systems to solve complex tasks. Your aim
is to design an optimal agent performing well on [Brief Description of the Domain].
[Framework Code]
[Output Instructions and Examples]
[Discovered Agent Archive] (initialized with baselines, updated at every iteration)
# Your task
You are deeply familiar with prompting techniques and the agent works from the literature. Your goal is
to maximize the specified performance metrics by proposing interestingly new agents.
Observe the discovered agents carefully and think about what insights, lessons, or stepping stones can be
learned from them.
Be creative when thinking about the next interesting agent to try. You are encouraged to draw inspiration
from related agent papers or academic papers from other research areas.
Use the knowledge from the archive and inspiration from academic literature to propose the next
interesting agentic system design.
THINK OUTSIDE THE BOX.
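The following prompt instructs the meta agent on what to output and how to format it. It also collects and presents some common mistakes the meta agent may make, which is effective at improving the quality of the generated code.
Output instructions and examples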
# Output Instruction and Example:
The first key should be (“thought”), and it should capture your thought process for designing the
next function. In the “thought” section, first reason about what the next interesting agent to try
should be, then describe your reasoning and the overall concept behind the agent design, and
finally detail the implementation steps. The second key (“name”) corresponds to the name of
your next agent architecture. Finally, the last key (“code”) corresponds to the exact “forward()”
function in Python code that you would like to try. You must write COMPLETE CODE in “code”:
Your code will be part of the entire project, so please implement complete, reliable, reusable code snippets.
Here is an example of the output format for the next agent:
{“thought”: “**Insights:** Your insights on what should be the next interesting agent. **Overall Idea:**
your reasoning and the overall concept behind the agent design. **Implementation:** describe the
implementation step by step.”,
“name”: “Name of your proposed agent”,
“code”: “def forward(self, taskInfo): # Your code here”}
## WRONG Implementation examples:
[Examples of potential mistakes the meta agent may make in implementation]
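A harness that consumes this output has to check that the response really is a JSON object with the three keys and that "code" contains a `forward` function. The sketch below shows one way to do that with the standard `json` and `ast` modules. It is an assumption about how such a check could look, not code from the ADAS repository, and `parse_agent_proposal` is a hypothetical name.

```python
import ast
import json

REQUIRED_KEYS = ("thought", "name", "code")


def parse_agent_proposal(raw: str) -> dict:
    """Validate the meta agent's JSON response before running its code."""
    proposal = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(proposal, dict):
        raise ValueError("response must be a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in proposal]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    tree = ast.parse(proposal["code"])  # raises SyntaxError on invalid Python
    has_forward = any(
        isinstance(node, ast.FunctionDef) and node.name == "forward"
        for node in tree.body
    )
    if not has_forward:
        raise ValueError("code must define a top-level forward() function")
    return proposal
```

Parsing the code with `ast` only checks syntax and the presence of `forward`; it does not execute anything, so runtime errors still have to be caught when the agent is evaluated.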
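After the meta agent's first response, two rounds of self-reflection are performed to make the generated agent novel and error-free.
Prompt for the first round of self-reflection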
[Generated Agent from Previous Iteration]
Carefully review the proposed new architecture and reflect on the following points:
1. **Interestingness**: Assess whether your proposed architecture is interesting or innovative compared
to existing methods in the archive. If you determine that the proposed architecture is not interesting,
suggest a new architecture that addresses these shortcomings.
- Make sure to check the difference between the proposed architecture and previous attempts.
- Compare the proposal and the architectures in the archive CAREFULLY, including their actual differences
in the implementation.
- Decide whether the current architecture is innovative.
- USE CRITICAL THINKING!
2. **Implementation Mistakes**: Identify any mistakes you may have made in the implementation.
Review the code carefully, debug any issues you find, and provide a corrected version. REMEMBER
checking "## WRONG Implementation examples" in the prompt.
3. **Improvement**: Based on the proposed architecture, suggest improvements in the detailed
implementation that could increase its performance or effectiveness. In this step, focus on refining and
optimizing the existing implementation without altering the overall design framework, except if you
want to propose a different architecture if the current is not interesting.
- Observe carefully about whether the implementation is actually doing what it is supposed to do.
- Check if there is redundant code or unnecessary steps in the implementation. Replace them with
effective implementation.
- Try to avoid the implementation being too similar to the previous agent.
And then, you need to improve or revise the implementation, or implement the new proposed architecture
based on the reflection.
Your response should be organized as follows:
"reflection": Provide your thoughts on the interestingness of the architecture, identify any mistakes in the
implementation, and suggest improvements.
"thought": Revise your previous proposal or propose a new architecture if necessary, using the same
format as the example response.
"name": Provide a name for the revised or new architecture. (Don’t put words like "new" or "improved"
in the name.)
"code": Provide the corrected code or an improved implementation. Make sure you actually implement
your fix and improvement in this code.
Prompt for the second round of self-reflection
Using the tips in “## WRONG Implementation examples” section, further revise the code.
Your response should be organized as follows:
Include your updated reflections in the “reflection”. Repeat the previous “thought” and “name”. Update
the corrected version of the code in the “code” section.
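When an error occurs while executing the generated code, the meta agent reflects on it and the code is rerun. If the error persists, this process is repeated, up to five times. The following prompt is used for self-reflection on runtime errors:
Self-reflection prompt when a runtime error occurs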
Error during evaluation:
[Runtime errors]
Carefully consider where you went wrong in your latest implementation. Using insights from previous
attempts, try to debug the current code to implement the same thought. Repeat your previous thought in
“thought”, and put your thinking for debugging in “debug_thought”.
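The retry behavior around this prompt can be sketched as a bounded loop. This is an illustration under assumptions: the text says the process repeats up to five times, and the sketch reads that as one initial run followed by at most five reflect-and-rerun cycles. `run_with_reflection`, `evaluate`, and `reflect_on_error` are hypothetical names, with `reflect_on_error` standing in for the meta agent call that receives the prompt above.

```python
from typing import Callable, Optional, Tuple


def run_with_reflection(
    code: str,
    evaluate: Callable[[str], float],               # hypothetical: runs the agent code
    reflect_on_error: Callable[[str, str], str],    # hypothetical: meta agent debugging call
    max_reflections: int = 5,                       # assumption: five reflect-and-rerun cycles
) -> Tuple[str, Optional[float]]:
    """Run agent code, asking the meta agent to debug it after each failure."""
    error: Optional[str] = None
    for _ in range(max_reflections + 1):
        if error is not None:
            # Feed the runtime error back so the meta agent can revise the code.
            code = reflect_on_error(code, error)
        try:
            return code, evaluate(code)
        except Exception as err:  # any runtime error triggers another reflection
            error = f"Error during evaluation:\n{err!r}"
    return code, None  # still failing after the last attempt
```

Returning `None` as the score lets the caller decide what to do with an agent that never ran successfully, for example discarding it instead of adding it to the archive.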
Automated Design of Agentic Systems
Paper: https://arxiv.org/pdf/2408.08435
Code: https://github.com/ShengranHu/ADAS
This article is reposted from PaperAgent.
