jedanny

Agent and Prompt Practice

What is an AI Agent System#

  • An Agent system is one that can complete tasks on its own. Unlike traditional software, it executes workflows with a high degree of autonomy under user authorization.
  • The core of an Agent system lies in using an LLM for workflow management and decision-making; it can dynamically select tools to interact with external systems as needed.

When to Build an AI Agent System#

  • Agent systems are suitable for complex, multi-step tasks that traditional automation methods struggle to handle, such as scenarios requiring complex decision-making, difficult-to-maintain rules, or reliance on unstructured data.
  • For example, in payment fraud analysis, an Agent system can assess context and identify suspicious activities like an experienced investigator.

Basics of AI Agent Design#

  • An Agent system consists of three core components: a model (the LLM), tools (external APIs), and instructions (clear behavioral guidelines); a minimal sketch of how they fit together follows this list.
  • Model selection should be based on a trade-off between task complexity, latency, and cost.
  • Tools should be standardized to allow flexible use across agent systems.
  • Instructions need to be clear and unambiguous, which reduces misinterpretation and improves decision quality.
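
As a minimal sketch of how the three components fit together, the snippet below wires a model, one tool definition, and an instruction string through the OpenAI Python SDK. The tool name get_order_status, its schema, and the chosen model are illustrative assumptions, not part of the original guide.

```python
# Minimal composition of the three agent components: model, tools, instructions.
# The tool and model below are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

INSTRUCTIONS = (
    "You are a customer-support agent. "
    "Use the available tools to look up facts instead of guessing."
)

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_order_status",  # clear, descriptive name
        "description": "Look up the shipping status of an order by its ID.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4.1",  # chosen per the complexity / latency / cost trade-off
    messages=[
        {"role": "system", "content": INSTRUCTIONS},
        {"role": "user", "content": "Where is order 1234?"},
    ],
    tools=TOOLS,
)
print(response.choices[0].message)
```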

Orchestration Patterns#

  • Single Agent System: a single model, equipped with appropriate tools and instructions, executes the workflow in a loop (sketched after this list).
  • Multi-Agent System: distributes the workflow among multiple coordinated Agents, either in a "manager" pattern (a central Agent coordinates several specialized Agents) or a "decentralized" pattern (peer Agents hand tasks off to one another).
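
The single-Agent "loop" can be sketched as below: call the model, execute whatever tool calls it requests, feed the results back, and repeat until the model answers in plain text. Here client, tools, and execute_tool are assumed to exist (for example, as in the previous sketch); in the manager pattern, specialized Agents would themselves be exposed as tools of a central Agent.

```python
# A minimal single-agent loop: stop when the model stops requesting tools.
# `execute_tool` and the turn limit are illustrative assumptions.
import json

def run_agent(client, messages, tools, execute_tool, model="gpt-4.1", max_turns=10):
    for _ in range(max_turns):
        response = client.chat.completions.create(
            model=model, messages=messages, tools=tools
        )
        message = response.choices[0].message
        messages.append(message)
        if not message.tool_calls:  # no tool call means a final answer
            return message.content
        for call in message.tool_calls:  # run each requested tool
            result = execute_tool(call.function.name,
                                  json.loads(call.function.arguments))
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": str(result),
            })
    return None  # give up after max_turns to avoid an infinite loop
```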

Safeguards#

  • Safeguards are key to ensuring the safe operation of Agent systems, managing data privacy risks and reputational risks.
  • Safeguards can include relevance classifiers, safety classifiers, PII filters, content moderation, and tool security assessments (a minimal input-screening example follows this list).
  • The setup of safeguards should be based on actual risks and continuously adjusted as new vulnerabilities are discovered.
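
A minimal sketch of input-side safeguards is shown below: a regex-based PII filter plus a keyword relevance check run before the message ever reaches the agent. The patterns and allowed topics are illustrative assumptions, not a production-grade screen.

```python
# Two input safeguards run before the agent sees a message: a regex-based PII
# filter and a keyword relevance check. Patterns and topics are illustrative.
import re

PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # US-SSN-like number
    re.compile(r"\b\d{13,16}\b"),          # bare card-number-like digit run
]
ALLOWED_TOPICS = ("order", "refund", "shipping", "invoice")

def screen_input(user_message: str) -> tuple[bool, str]:
    """Return (allowed, reason); reject risky or off-topic input before the LLM."""
    for pattern in PII_PATTERNS:
        if pattern.search(user_message):
            return False, "message contains possible PII"
    if not any(topic in user_message.lower() for topic in ALLOWED_TOPICS):
        return False, "message is off-topic for this agent"
    return True, "ok"

print(screen_input("Please refund order 1234"))
```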

Implementation Recommendations#

  • Start with a single Agent system and gradually expand to a multi-Agent system.
  • Use flexible prompt templates to simplify maintenance and evaluation.
  • Implement human intervention mechanisms in the Agent system to handle high-risk or failure situations (see the sketch after this list).
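
A minimal sketch of such an intervention trigger, assuming a retry budget and a hypothetical set of high-risk tool names:

```python
# Escalate to a human when the agent is stuck or about to take a risky action.
# HIGH_RISK_TOOLS and MAX_RETRIES are illustrative assumptions.
HIGH_RISK_TOOLS = {"issue_refund", "cancel_account"}
MAX_RETRIES = 3

def needs_human(action_name: str, failure_count: int) -> bool:
    if failure_count >= MAX_RETRIES:
        return True  # the agent keeps failing; hand off to a person
    if action_name in HIGH_RISK_TOOLS:
        return True  # irreversible or high-stakes action needs approval
    return False
```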

Agent Prompting Practices#

OpenAI Agent Prompting Best Practices

Persistence#

Tell the model this is a multi-turn interaction that needs to continue until the issue is resolved, not just answer once.

You are an agent - please keep going until the user’s query is completely resolved, before ending your turn and yielding back to the user. Only terminate your turn when you are sure that the problem is solved.

-> 

You are an intelligent assistant - please continue working until the user's issue is completely resolved, then end your turn. Only stop when you are sure the problem is solved.

Tool-calling#

Encourage the model to proactively use tools for queries when uncertain, rather than guessing.

If you are not sure about file content or codebase structure pertaining to the user’s request, use your tools to read files and gather the relevant information: do NOT guess or make up an answer.

-> 

If you are unsure about file content or code structure, please use tools to query relevant information: absolutely do not guess or fabricate answers.

Planning#

Guide the model to plan before each tool call and reflect afterward. This allows the model to "articulate the thought process," enhancing problem-solving ability (without relying on reasoning models).

You MUST plan extensively before each function call, and reflect extensively on the outcomes of the previous function calls. DO NOT do this entire process by making function calls only, as this can impair your ability to solve the problem and think insightfully.

-> 

You must plan in detail before each function call and reflect on the results afterward. Do not rely solely on function calls to solve the entire problem.

Correct Approach to Tool Calling#

  • It is strongly recommended to use the API's tools field rather than manually pasting tool descriptions into the prompt. OpenAI's tests show that passing tool definitions through the API reduces errors and improves model performance (about a 2% improvement on SWE-bench). A combined sketch follows this list.
  • Choose a good name and clearly state the purpose: tool names and descriptions should be clear, as should parameters, to help the model use tools correctly.
  • Place complex tool usage in examples: if the tool is complex, it’s best to create a dedicated "# Example" section in the system prompt to keep descriptions concise.
  • Start with overall requirements: list basic requirements under the title "Instructions" or "Response Rules."
  • Detail specifics point by point: use subheadings to explain specific behaviors in detail.
  • Clarify the order of steps: if a specific process is required, clearly mark it with an ordered list.
  • Debugging and optimization:
    • Check for contradictions between instructions.
    • Provide clear examples demonstrating the desired outcome.
    • Use caution with all caps, exclamation marks, and other emphasis techniques, as they may lead the model to over-focus on these points.

Common Pitfalls and Solutions#

  • The "must always X" trap: for example, forcing "a tool must be called before every response" may lead to erratic tool calls. Solution: add the clarification "if information is insufficient, ask the user first."
  • Example-copying issue: the model may copy the examples you provide verbatim. Solution: state clearly that the examples are references rather than limits, and should be adapted to the situation.
  • Verbosity issue: sometimes the model outputs too much explanation or unnecessary formatting. Solution: explicitly request conciseness and a specific format in the instructions.

Context Handling#

  • Capability boundaries: basic long-context capability is strong, but performance may decline when many pieces of information must be retrieved from a very large context or when complex reasoning requires a global view of the input.
  • Best practice: repeat key instructions at both the beginning and the end of the context (sketched after this list).
  • You can explicitly tell the model whether it may only use the information you provide or may also combine it with its own knowledge.
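
A minimal sketch of the "repeat at both ends" practice, assuming a hypothetical document-grounded Q&A prompt builder:

```python
# Key rules are stated before and after the long retrieved context.
# The rule text and section names are illustrative assumptions.
KEY_RULES = "Answer only from the documents below. If they are insufficient, say so."

def build_prompt(documents: list[str], question: str) -> str:
    docs_block = "\n\n".join(documents)
    return (
        f"{KEY_RULES}\n\n"
        f"# Documents\n{docs_block}\n\n"
        f"# Reminder\n{KEY_RULES}\n\n"
        f"# Question\n{question}"
    )
```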

Chain of Thought#

Guide the model to "think" like a human, breaking a complex problem into smaller steps with simple chain-of-thought instructions (without relying on reasoning models):

...First, think carefully step by step about what documents are needed to answer the query. Then, print out the TITLE and ID of each document. Then, format the IDs into a list.

-> 

...First, carefully think step by step about what documents are needed to answer the query. Then, list the TITLE and ID of each document. Finally, format the IDs into a list.

Advanced CoT: if the model's thought process goes off track, you can standardize its thinking strategy with more specific instructions. For example, the guide gives an example that requires the model to first perform query analysis, then context analysis, and finally synthesis.

# Role and Objective
# Instructions
## Sub-categories for more detailed instructions
# Reasoning Steps (e.g., Chain of Thought instructions)
# Output Format
# Examples
## Example 1
# Context (if any)
# Final instructions and prompt to think step by step (e.g., the CoT starter)

Separator Selection#

  • Prefer Markdown: headings, lists, code blocks, etc., are clear and intuitive.
  • XML is also good: suitable for precisely wrapping content, facilitating nesting.
  • JSON is relatively cumbersome: strong structure but may require escaping in prompts.
  • Long document scenarios: XML (<doc id=1 title="...">...</doc>) and tabular-like formats (ID: 1 | TITLE: ... | CONTENT: ...) work well, while JSON performs poorly; a small helper is sketched after this list.
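
A small helper illustrating the two long-document formats above; the field names follow the bullet's examples, while the helper functions themselves are assumptions for illustration:

```python
# Wrap retrieved documents in XML or a pipe-delimited line for long contexts.
def to_xml(doc_id: int, title: str, content: str) -> str:
    return f'<doc id={doc_id} title="{title}">{content}</doc>'

def to_pipe(doc_id: int, title: str, content: str) -> str:
    return f"ID: {doc_id} | TITLE: {title} | CONTENT: {content}"

print(to_xml(1, "Refund policy", "Refunds are issued within 14 days."))
print(to_pipe(1, "Refund policy", "Refunds are issued within 14 days."))
```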