ai-agent-development
AI agent architecture and development patterns including tool use, memory systems, planning loops, and multi-agent orchestration. Use when building AI agents, designing tool interfaces, or implementing agent evaluation.
| Field | Value |
|---|---|
| name | ai-agent-development |
| description | AI agent architecture and development patterns including tool use, memory systems, planning loops, and multi-agent orchestration. Use when building AI agents, designing tool interfaces, or implementing agent evaluation. |
| summary_l0 | Build AI agents with tool use, memory, planning loops, and multi-agent orchestration |
| overview_l1 | This skill provides comprehensive architecture and implementation guidance for building AI agents that reason, plan, use tools, and collaborate. Use it when designing agent architectures (ReAct, plan-and-execute, reflection), implementing tool-use patterns with function calling or MCP, building memory systems (short-term, long-term, episodic, semantic), creating planning and reasoning loops, orchestrating multi-agent systems, or adding guardrails and evaluation. Key capabilities include architecture pattern selection, tool integration schemas, memory system design, planning loop implementation, multi-agent topologies (supervisor, peer-to-peer, pipeline), guardrail enforcement, trajectory-based evaluation, and observability instrumentation. The expected output is production-grade agent code with structured logging, error handling, and cost tracking. Trigger phrases: build an agent, agent architecture, tool use, function calling, agent memory, planning loop, multi-agent, agent orchestration, agent evaluation, agent guardrails, ReAct pattern, MCP server. |
Comprehensive patterns and implementation guidance for building AI agents that reason, plan, use tools, and collaborate. Covers the full spectrum from single-agent architectures to multi-agent orchestration, with production-grade error handling, observability, and evaluation.
Effort-level note: when agents run inside loops or parallel fan-out, default `effortLevel` to `high`, not `max`. Iterative and concurrent runs compound effort-level cost without matching quality gains. See the Effort-Level Strategy section of `prompt-engineering/SKILL.md` for the decision table. A full Opus 4.7 Anti-Patterns table for agent workloads is maintained alongside this skill.
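Use this skill for:

- Designing agent architectures (ReAct, plan-and-execute, reflection)
- Implementing tool use with function calling or MCP
- Building memory systems (short-term, long-term, episodic, semantic)
- Creating planning and reasoning loops
- Orchestrating multi-agent systems
- Adding guardrails and evaluation

Trigger phrases: "build an agent", "agent architecture", "tool use", "function calling", "agent memory", "planning loop", "multi-agent", "agent orchestration", "agent evaluation", "agent guardrails", "ReAct pattern", "MCP server"

Provides agent development expertise including:

- Architecture pattern selection
- Tool integration schemas
- Memory system design
- Planning loop implementation
- Multi-agent topologies (supervisor, peer-to-peer, pipeline)
- Guardrail enforcement
- Trajectory-based evaluation
- Observability instrumentation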
Select an architecture based on task complexity, latency requirements, and reliability needs.
Architecture Comparison:
| Architecture | Best For | Latency | Reliability | Complexity |
|---|---|---|---|---|
| ReAct | Simple tool-use tasks | Low | Medium | Low |
| Plan-and-Execute | Multi-step workflows | Medium | High | Medium |
| Reflection | Quality-critical outputs | High | High | Medium |
| Multi-Agent | Complex domain tasks | High | Variable | High |
ReAct (Reason + Act) Pattern:
The agent interleaves reasoning and action in a loop: think about what to do, execute a tool, observe the result, repeat.
```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

tools = [
    {
        "name": "search_codebase",
        "description": (
            "Search the codebase for files matching a regex pattern. "
            "Returns file paths and matching line contents. "
            "Use when looking for function definitions, imports, or string patterns."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "pattern": {
                    "type": "string",
                    "description": "Regex pattern to search for."
                },
                "file_glob": {
                    "type": "string",
                    "description": "Optional glob to filter files (e.g., '*.py').",
                    "default": "*"
                }
            },
            "required": ["pattern"]
        }
    },
    {
        "name": "read_file",
        "description": (
            "Read the full contents of a file by path. "
            "Use when you need to examine a specific file found via search."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {
                    "type": "string",
                    "description": "Absolute or workspace-relative file path."
                }
            },
            "required": ["path"]
        }
    }
]


def extract_text(content) -> str:
    """Join the text blocks of a response, skipping tool_use and other block types."""
    return "".join(block.text for block in content if block.type == "text")


def run_react_agent(user_query: str, max_turns: int = 10) -> str:
    """Run a ReAct agent loop with Claude."""
    messages = [{"role": "user", "content": user_query}]

    for _ in range(max_turns):
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            system=(
                "You are a code analysis agent. Think step-by-step about what "
                "information you need, use tools to gather it, then provide "
                "your analysis. Always explain your reasoning before acting."
            ),
            tools=tools,
            messages=messages,
        )

        # If the model wants to use tools, execute them
        if response.stop_reason == "tool_use":
            # Append the assistant's response (contains tool_use blocks)
            messages.append({"role": "assistant", "content": response.content})

            # Execute each tool call and collect results. Every tool_use block must
            # be answered by a tool_result with the same ID. execute_tool is defined
            # under "Tool Execution with Error Handling".
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    result = execute_tool(block.name, block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result,
                    })
            messages.append({"role": "user", "content": tool_results})
        else:
            # Model produced a final text response
            return extract_text(response.content)

    return "Agent reached maximum turn limit without completing."
```
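The loop's message protocol can be exercised offline. The sketch below swaps the Anthropic client for a scripted stand-in model; `ToolUse`, `ModelTurn`, `react_loop`, `scripted_model`, and `fake_tools` are hypothetical names for illustration that mimic, but are not, the SDK's response objects.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToolUse:
    """Stand-in for a tool_use content block."""
    id: str
    name: str
    input: dict


@dataclass
class ModelTurn:
    """Stand-in for one model response: text plus zero or more tool calls."""
    text: str
    tool_uses: list[ToolUse] = field(default_factory=list)

    @property
    def stop_reason(self) -> str:
        return "tool_use" if self.tool_uses else "end_turn"


def react_loop(model: Callable[[list[dict]], ModelTurn],
               tools: dict[str, Callable[..., str]],
               query: str, max_turns: int = 5) -> tuple[str, list[dict]]:
    """Same control flow as run_react_agent, with the model injected."""
    messages: list[dict] = [{"role": "user", "content": query}]
    for _ in range(max_turns):
        turn = model(messages)
        if turn.stop_reason != "tool_use":
            return turn.text, messages  # final answer
        messages.append({"role": "assistant", "content": turn})
        # Each tool_use is answered by a tool_result carrying the same ID.
        messages.append({"role": "user", "content": [
            {"type": "tool_result", "tool_use_id": call.id,
             "content": tools[call.name](**call.input)}
            for call in turn.tool_uses
        ]})
    return "Agent reached maximum turn limit without completing.", messages


def scripted_model(messages: list[dict]) -> ModelTurn:
    """First call requests a search; the next call answers from the tool result."""
    if len(messages) == 1:
        return ModelTurn("I should search first.",
                         [ToolUse("t1", "search_codebase", {"pattern": "def main"})])
    last_result = messages[-1]["content"][0]["content"]
    return ModelTurn(f"Found: {last_result}")


fake_tools = {"search_codebase": lambda pattern, file_glob="*": f"app.py: {pattern}(...)"}
answer, transcript = react_loop(scripted_model, fake_tools, "Where is main defined?")
print(answer)  # Found: app.py: def main(...)
```

Injecting the model as a callable is one way to unit-test the control flow (tool_result pairing, the turn limit) without network calls or API cost.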
Plan-and-Execute Pattern:
Separate planning from execution. The planner decomposes the goal into steps; the executor handles each step independently.
```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    goal: str
    steps: list[str] = field(default_factory=list)
    completed: list[int] = field(default_factory=list)
    results: dict[int, str] = field(default_factory=dict)


def create_plan(goal: str) -> Plan:
    """Use the LLM to decompose a goal into executable steps."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        system=(
            "You are a planning agent. Decompose the user's goal into "
            "a numbered list of concrete, independent steps. Each step "
            "should be actionable with available tools. Output ONLY the "
            "numbered list, nothing else."
        ),
        messages=[{"role": "user", "content": goal}],
    )
    steps = parse_numbered_list(extract_text(response.content))
    return Plan(goal=goal, steps=steps)


# parse_numbered_list, format_results, is_failure, replan, and synthesize_results
# are application-specific helpers that are not shown here.
def execute_plan(plan: Plan, max_replans: int = 3) -> str:
    """Execute each step in order, replanning from a failed step onward."""
    i = 0
    replans = 0
    while i < len(plan.steps):
        if i in plan.completed:
            i += 1
            continue
        result = run_react_agent(
            f"Execute this step: {plan.steps[i]}\n\n"
            f"Context from previous steps:\n{format_results(plan.results)}"
        )
        if is_failure(result):
            if replans >= max_replans:
                raise RuntimeError(f"Step {i} still failing after {replans} replans: {result}")
            # Replace the failed step and everything after it, then retry index i.
            # (A for/enumerate loop would keep iterating the old list and skip
            # the revised steps.)
            plan.steps = plan.steps[:i] + replan(plan, i, result)
            replans += 1
            continue
        plan.results[i] = result
        plan.completed.append(i)
        i += 1
    return synthesize_results(plan)
```
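`create_plan` depends on a `parse_numbered_list` helper that this section does not define. A minimal sketch, assuming the planner replies with lines such as `1. ...` or `2) ...` and that an indented, non-numbered line continues the previous step:

```python
import re

# "1. Step" or "2) Step", optionally indented; group 2 is the step text.
_NUMBERED_LINE = re.compile(r"^\s*(\d+)[.)]\s+(.*\S)\s*$")


def parse_numbered_list(text: str) -> list[str]:
    """Extract step texts from a numbered list in a model reply."""
    steps: list[str] = []
    for line in text.splitlines():
        match = _NUMBERED_LINE.match(line)
        if match:
            steps.append(match.group(2))
        elif steps and line[:1].isspace() and line.strip():
            # An indented, non-numbered line continues the previous step.
            steps[-1] += " " + line.strip()
        # Anything else (preamble, blank lines, unindented prose) is ignored.
    return steps
```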
Tools are the agent's interface to the external world. Design them for LLM consumption (see the tool-design skill for detailed guidance).
Function Calling Schema Best Practices:
```python
# Good: Specific, documented, constrained
file_edit_tool = {
    "name": "edit_file",
    "description": (
        "Replace a specific string in a file with new content. "
        "The old_string must appear exactly once in the file. "
        "Use when you need to modify existing code. "
        "Do NOT use for creating new files (use write_file instead)."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "file_path": {
                "type": "string",
                "description": "Absolute path to the file to edit."
            },
            "old_string": {
                "type": "string",
                "description": "The exact text to find and replace. Must be unique in the file."
            },
            "new_string": {
                "type": "string",
                "description": "The replacement text. Must differ from old_string."
            }
        },
        "required": ["file_path", "old_string", "new_string"]
    }
}
```
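The schema's contract (old_string must occur exactly once, new_string must differ) only helps if the tool enforces it. A minimal local implementation, assuming plain UTF-8 text files; this is one possible `edit_file`, not a reference implementation:

```python
from pathlib import Path


def edit_file(file_path: str, old_string: str, new_string: str) -> str:
    """Replace the single occurrence of old_string in a file with new_string."""
    if not old_string:
        raise ValueError("old_string must not be empty.")
    if old_string == new_string:
        raise ValueError("new_string must differ from old_string.")
    path = Path(file_path)
    text = path.read_text(encoding="utf-8")  # FileNotFoundError propagates to the caller
    count = text.count(old_string)
    if count != 1:
        # Refuse ambiguous or absent matches rather than guessing which one was meant.
        raise ValueError(f"old_string must appear exactly once; found {count}.")
    path.write_text(text.replace(old_string, new_string, 1), encoding="utf-8")
    return f"Edited {file_path}: 1 replacement."
```

Raising instead of returning an error string lets the structured handlers in `execute_tool` turn the failure into a message the model can act on.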
MCP Server Integration:
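```python
from contextlib import asynccontextmanager

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


@asynccontextmanager
async def connect_mcp_server(command: str, args: list[str]):
    """Connect to an MCP server over stdio and yield an initialized session.

    The session is only usable inside the `async with` block: returning it from
    inside the transport's context managers would hand back a closed session.
    """
    server_params = StdioServerParameters(command=command, args=args)
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # List available tools
            tools_response = await session.list_tools()
            for tool in tools_response.tools:
                print(f"  {tool.name}: {tool.description}")

            yield session


async def call_mcp_tool(session: ClientSession, name: str, arguments: dict):
    """Call a tool on the MCP server and return the result."""
    result = await session.call_tool(name, arguments=arguments)
    return result.content


# Usage, inside an async function:
#     async with connect_mcp_server("python", ["server.py"]) as session:
#         content = await call_mcp_tool(session, "read_file", {"path": "README.md"})
```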
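Tools discovered over MCP still have to be handed to the Messages API in its own tool format. A sketch of that conversion, assuming each listed tool exposes `name`, `description`, and `inputSchema` (as the tools in the MCP Python SDK's `list_tools()` result do); `mcp_tools_to_anthropic` is a hypothetical helper, and the demo uses a `SimpleNamespace` stand-in rather than a live server:

```python
from types import SimpleNamespace
from typing import Any


def mcp_tools_to_anthropic(mcp_tools: list[Any]) -> list[dict]:
    """Map MCP tool listings to Messages API tool definitions."""
    return [
        {
            "name": tool.name,
            "description": tool.description or "",
            # MCP names the JSON Schema `inputSchema`; the Messages API expects `input_schema`.
            "input_schema": tool.inputSchema or {"type": "object", "properties": {}},
        }
        for tool in mcp_tools
    ]


# Offline demo with a stand-in object instead of a live MCP session.
demo_tool = SimpleNamespace(
    name="read_file",
    description="Read a file by path.",
    inputSchema={"type": "object", "properties": {"path": {"type": "string"}}, "required": ["path"]},
)
print(mcp_tools_to_anthropic([demo_tool])[0]["input_schema"]["required"])  # ['path']
```

In a live session the argument would be the `tools` list from `session.list_tools()`, and the converted list can be passed as `tools=` to `client.messages.create`.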
Tool Execution with Error Handling:
```python
import json
import traceback


def execute_tool(name: str, arguments: dict) -> str:
    """Execute a tool call with structured error handling.

    search_codebase, read_file, and edit_file are your tool implementations
    (not shown); each returns a string result.
    """
    try:
        if name == "search_codebase":
            return search_codebase(**arguments)
        elif name == "read_file":
            return read_file(**arguments)
        elif name == "edit_file":
            return edit_file(**arguments)
        else:
            return json.dumps({
                "error": f"Unknown tool: {name}",
                "available_tools": ["search_codebase", "read_file", "edit_file"],
                "suggestion": "Check the tool name and try again."
            })
    except FileNotFoundError as e:
        return json.dumps({
            "error": f"File not found: {e}",
            "suggestion": "Use search_codebase to find the correct file path."
        })
    except PermissionError as e:
        return json.dumps({
            "error": f"Permission denied: {e}",
            "recoverable": False,
            "suggestion": "This file cannot be modified. Ask the user for guidance."
        })
    except Exception as e:
        return json.dumps({
            "error": f"{type(e).__name__}: {e}",
            "traceback": traceback.format_exc(),
            "suggestion": "An unexpected error occurred. Try a different approach."
        })
```
Agents need memory to maintain context across turns, learn from past interactions, and recall relevant information.
Memory Type Overview:
| Type | Scope | Storage | Use Case |
|---|---|---|---|
| Working | Current conversation | Message history | Immediate context |
| Episodic | Past interactions | Database | Recall similar tasks |
| Semantic | Domain knowledge | Vector store | Fact retrieval |
| Procedural | Learned workflows | Key-value store | Skill reuse |
Working Memory (Conversation Buffer with Summarization):
```python
from dataclasses import dataclass, field


@dataclass
class WorkingMemory:
    messages: list[dict] = field(default_factory=list)
    max_tokens: int = 100_000
    summary: str = ""

    def add(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})
        # Compact at 80% of the budget to leave headroom for the next reply.
        if self.estimate_tokens() > self.max_tokens * 0.8:
            self.compact()

    def compact(self, keep_recent: int = 6):
        """Summarize older messages to free context space."""
        # Keep the most recent messages intact; if there is nothing older,
        # there is nothing to summarize.
        if len(self.messages) <= keep_recent:
            return
        old_messages = self.messages[:-keep_recent]
        recent_messages = self.messages[-keep_recent:]

        summary_prompt = (
            f"Previous summary: {self.summary}\n\n"
            f"New messages to summarize:\n"
            f"{format_messages(old_messages)}\n\n"
            "Produce a concise summary preserving key decisions, "
            "findings, and action items."
        )
        # format_messages and call_llm_for_summary are application helpers (not shown).
        self.summary = call_llm_for_summary(summary_prompt)
        self.messages = recent_messages

    def get_context(self) -> list[dict]:
        """Return messages with summary prepended if available."""
        if self.summary:
            summary_msg = {
                "role": "user",
                "content": f"[Context from earlier in this conversation]\n{self.summary}"
            }
            return [summary_msg] + self.messages
        return self.messages

    def estimate_tokens(self) -> int:
        # Rough heuristic: about 4 characters per token for English text.
        return sum(len(m["content"]) // 4 for m in self.messages)
```
Long-Term Episodic Memory:
```python
import hashlib
from datetime import datetime, timezone


class EpisodicMemory:
    """Store and recall past agent interactions by similarity.

    Assumes vector_store provides upsert(id=, vector=, metadata=) and
    query(vector=, top_k=) returning objects with .metadata and .score,
    and that embedding_model provides embed(text).
    """

    def __init__(self, vector_store, embedding_model):
        self.store = vector_store
        self.embedder = embedding_model

    def record_episode(self, task: str, trajectory: list[dict], outcome: str):
        """Save a completed task episode for future recall."""
        # One timestamp for both fields; datetime.utcnow() is deprecated.
        now = datetime.now(timezone.utc).isoformat()
        episode = {
            "task": task,
            "trajectory_summary": summarize_trajectory(trajectory),  # helper not shown
            "outcome": outcome,
            "timestamp": now,
            "id": hashlib.sha256(f"{task}{now}".encode()).hexdigest()[:16],
        }
        embedding = self.embedder.embed(f"{task} {outcome}")
        self.store.upsert(id=episode["id"], vector=embedding, metadata=episode)

    def recall(self, current_task: str, top_k: int = 3) -> list[dict]:
        """Recall past episodes similar to the current task."""
        query_embedding = self.embedder.embed(current_task)
        results = self.store.query(vector=query_embedding, top_k=top_k)
        return [
            {
                "task": r.metadata["task"],
                "approach": r.metadata["trajectory_summary"],
                "outcome": r.metadata["outcome"],
                "similarity": r.score,
            }
            for r in results
        ]
```
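`EpisodicMemory` assumes a vector store with `upsert(id=, vector=, metadata=)` and `query(vector=, top_k=)` plus an embedder with `embed(text)`. The toy stand-ins below (`Match`, `BagOfWordsEmbedder`, `InMemoryVectorStore`, all hypothetical) satisfy that assumed interface with word-count vectors and cosine similarity. That is enough to test recall logic, but it is not a substitute for a real embedding model:

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Match:
    """Query hit exposing the attributes EpisodicMemory.recall reads."""
    id: str
    score: float
    metadata: dict


class BagOfWordsEmbedder:
    """Toy stand-in for an embedding model: sparse word-count vectors."""

    def embed(self, text: str) -> Counter:
        return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Brute-force store: scores every row on each query (fine for small demos)."""

    def __init__(self) -> None:
        self._rows: dict[str, tuple[Counter, dict]] = {}

    def upsert(self, id: str, vector: Counter, metadata: dict) -> None:
        self._rows[id] = (vector, metadata)  # insert or overwrite by ID

    def query(self, vector: Counter, top_k: int = 3) -> list[Match]:
        hits = [Match(row_id, cosine(vector, vec), meta)
                for row_id, (vec, meta) in self._rows.items()]
        return sorted(hits, key=lambda hit: hit.score, reverse=True)[:top_k]
```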
Reflection Pattern (Self-Critique Loop):
```python
def reflect_and_improve(task: str, max_iterations: int = 3) -> str:
    """Generate output, critique it, and iterate until quality threshold is met."""
    draft = generate_initial_response(task)  # application helper (not shown)

    for _ in range(max_iterations):
        critique = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=2048,
            messages=[
                {
                    "role": "user",
                    "content": (
                        f"Task: {task}\n\n"
                        f"Current output:\n{draft}\n\n"
                        "Critique this output. Identify specific problems with:\n"
                        "1. Correctness: Are there factual or logical errors?\n"
                        "2. Completeness: Is anything missing?\n"
                        "3. Quality: Could the structure or clarity improve?\n\n"
                        "If the output is satisfactory, respond with ONLY 'APPROVED'.\n"
                        "Otherwise, list specific improvements needed."
                    ),
                }
            ],
        )
        critique_text = extract_text(critique.content)

        # Exact match: a substring test would also accept "NOT APPROVED".
        if critique_text.strip() == "APPROVED":
            return draft

        # Revise based on critique
        revision = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            messages=[
                {
                    "role": "user",
                    "content": (
                        f"Task: {task}\n\n"
                        f"Previous output:\n{draft}\n\n"
                        f"Critique:\n{critique_text}\n\n"
                        "Revise the output to address every critique point. "
                        "Produce the complete revised version."
                    ),
                }
            ],
        )
        draft = extract_text(revision.content)

    return draft
```
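The approval check is the loop's only early exit, so it deserves care: a substring test such as `"APPROVED" in critique_text` also matches `NOT APPROVED` or a critique that merely quotes the word. A small, hypothetical helper that accepts only the bare sentinel while tolerating stray quotes or a trailing period:

```python
def is_approved(critique_text: str) -> bool:
    """True only when the critique is the bare APPROVED sentinel."""
    # Strip whitespace, then quote/backtick/period wrappers the model may add.
    normalized = critique_text.strip().strip("'\"`.").strip()
    return normalized.upper() == "APPROVED"
```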
Goal Decomposition with Dependency Tracking:
```python
import json
from dataclasses import dataclass, field


@dataclass
class TaskNode:
    id: str
    description: str
    dependencies: list[str] = field(default_factory=list)
    status: str = "pending"  # pending | running | completed | failed
    result: str | None = None


def build_task_graph(goal: str) -> list[TaskNode]:
    """Decompose a goal into a dependency-aware task graph."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        system=(
            "Decompose the goal into tasks. Output ONLY a JSON array of objects "
            'with keys "id" (string), "description" (string), and '
            '"dependencies" (list of IDs of tasks that must complete first).'
        ),
        messages=[{"role": "user", "content": goal}],
    )
    raw_tasks = json.loads(extract_text(response.content))
    return [TaskNode(**t) for t in raw_tasks]


def execute_task_graph(tasks: list[TaskNode]) -> dict[str, str]:
    """Execute tasks respecting dependency order."""
    results: dict[str, str] = {}
    while any(t.status == "pending" for t in tasks):
        completed = {t.id for t in tasks if t.status == "completed"}
        ready = [
            t for t in tasks
            if t.status == "pending" and all(d in completed for d in t.dependencies)
        ]
        if not ready:
            # A cycle or an unknown dependency ID would otherwise loop forever.
            blocked = [t.id for t in tasks if t.status == "pending"]
            raise RuntimeError(f"No runnable tasks; blocked: {blocked}")
        for task in ready:
            task.status = "running"
            context = {d: results[d] for d in task.dependencies}
            try:
                task.result = run_react_agent(
                    f"Execute: {task.description}\nContext: {json.dumps(context)}"
                )
            except Exception:
                task.status = "failed"
                raise
            task.status = "completed"
            results[task.id] = task.result
    return results
```
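Python's standard library can validate and schedule an LLM-generated dependency graph before any agent runs. A sketch using `graphlib.TopologicalSorter` (Python 3.9+) that rejects cycles up front and groups tasks into batches whose members could run concurrently; `execution_batches` is a hypothetical name:

```python
from graphlib import CycleError, TopologicalSorter


def execution_batches(dependencies: dict[str, list[str]]) -> list[list[str]]:
    """Group task IDs into batches; each batch depends only on earlier batches.

    `dependencies` maps each task ID to the IDs it must wait for.
    Raises graphlib.CycleError before anything runs if the graph has a cycle.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # cycle detection happens here
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for deterministic output
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

For the `TaskNode` list above, the input would be `{t.id: t.dependencies for t in tasks}`.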
Supervisor Pattern (Hub-and-Spoke):
```python
import json
from dataclasses import dataclass


@dataclass
class AgentConfig:
    name: str
    system_prompt: str
    tools: list[dict]
    model: str = "claude-sonnet-4-20250514"


class SupervisorOrchestrator:
    """A supervisor agent delegates tasks to specialist agents."""

    def __init__(self, specialists: list[AgentConfig]):
        self.specialists = {s.name: s for s in specialists}

    def run(self, goal: str) -> str:
        """Supervisor decomposes goal and delegates to specialists."""
        specialist_descriptions = "\n".join(
            f"- {s.name}: {s.system_prompt[:100]}..."
            for s in self.specialists.values()
        )

        # Supervisor decides which specialists to invoke and in what order
        plan_response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=2048,
            system=(
                "You are a supervisor agent. You delegate tasks to specialists.\n"
                f"Available specialists:\n{specialist_descriptions}\n\n"
                "For the given goal, output ONLY a JSON array of delegation steps "
                "in execution order, where depends_on lists earlier step IDs:\n"
                '[{"id": "step1", "specialist": "name", "task": "what to do", "depends_on": []}]'
            ),
            messages=[{"role": "user", "content": goal}],
        )
        delegations = json.loads(extract_text(plan_response.content))

        # Key results by step ID so the same specialist can be used more than once.
        results = {}
        for index, step in enumerate(delegations):
            step_id = step.get("id", str(index))
            specialist = self.specialists.get(step["specialist"])
            if specialist is None:
                raise ValueError(f"Supervisor chose unknown specialist: {step['specialist']!r}")
            context = {d: results[d] for d in step.get("depends_on", []) if d in results}
            results[step_id] = self._run_specialist(specialist, step["task"], context)

        return self._synthesize(goal, results)

    def _run_specialist(
        self, config: AgentConfig, task: str, context: dict, max_turns: int = 10
    ) -> str:
        """Run a single specialist agent, executing its tool calls until it answers."""
        messages = [
            {
                "role": "user",
                "content": f"Task: {task}\nContext: {json.dumps(context)}",
            }
        ]
        for _ in range(max_turns):
            response = client.messages.create(
                model=config.model,
                max_tokens=4096,
                system=config.system_prompt,
                tools=config.tools,
                messages=messages,
            )
            if response.stop_reason != "tool_use":
                return extract_text(response.content)
            # Same tool loop as run_react_agent; a single call would return only
            # the preamble text and silently drop the requested tool calls.
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": [
                {"type": "tool_result", "tool_use_id": b.id,
                 "content": execute_tool(b.name, b.input)}
                for b in response.content if b.type == "tool_use"
            ]})
        return f"{config.name} reached the turn limit without finishing."

    def _synthesize(self, goal: str, results: dict) -> str:
        """Combine specialist results into a final answer."""
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            messages=[
                {
                    "role": "user",
                    "content": (
                        f"Original goal: {goal}\n\n"
                        f"Specialist results:\n{json.dumps(results, indent=2)}\n\n"
                        "Synthesize these results into a coherent final answer."
                    ),
                }
            ],
        )
        return extract_text(response.content)
```
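Both `build_task_graph` and the supervisor call `json.loads` directly on model output, which fails when the model wraps the JSON in a Markdown code fence or adds a sentence around it. A defensive parser, offered as a sketch; `parse_json_reply` is a hypothetical name, and its fallback heuristic simply takes the span from the first opening bracket or brace to the last closing one:

```python
import json
import re

_TICKS = "`" * 3  # built indirectly so this snippet can sit inside a Markdown fence
_FENCED = re.compile(
    re.escape(_TICKS) + r"(?:json)?\s*(.*?)\s*" + re.escape(_TICKS), re.DOTALL
)


def parse_json_reply(text: str):
    """Parse JSON from a model reply that may include a fence or surrounding prose."""
    match = _FENCED.search(text)
    candidate = match.group(1) if match else text.strip()
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        # Fallback heuristic: first "[" or "{" through the last "]" or "}".
        starts = [i for i in (candidate.find("["), candidate.find("{")) if i != -1]
        end = max(candidate.rfind("]"), candidate.rfind("}"))
        if not starts or end < min(starts):
            raise
        return json.loads(candidate[min(starts):end + 1])
```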
Input Validation Layer:
```python
import re
from dataclasses import dataclass
from pathlib import Path


@dataclass
class GuardrailResult:
    allowed: bool
    reason: str = ""
    modified_input: str | None = None


class AgentGuardrails:
    """Safety layer wrapping agent execution."""

    def __init__(
        self,
        max_tool_calls: int = 50,
        max_cost_usd: float = 1.0,
        workspace_root: str = "/workspace",
    ):
        self.max_tool_calls = max_tool_calls
        self.max_cost_usd = max_cost_usd
        self.workspace_root = Path(workspace_root).resolve()
        self.tool_call_count = 0
        self.estimated_cost = 0.0  # caller updates this, e.g. from UsageTracker
        self.blocked_patterns = [
            r"rm\s+-rf\s+/",     # Destructive filesystem commands
            r"DROP\s+TABLE",     # SQL injection attempts
            r"curl.*\|\s*bash",  # Remote code execution
        ]

    def check_input(self, user_input: str) -> GuardrailResult:
        """Validate user input before agent processing."""
        for pattern in self.blocked_patterns:
            if re.search(pattern, user_input, re.IGNORECASE):
                return GuardrailResult(
                    allowed=False,
                    reason=f"Input contains blocked pattern: {pattern}"
                )
        return GuardrailResult(allowed=True)

    def check_tool_call(self, tool_name: str, arguments: dict) -> GuardrailResult:
        """Validate a tool call before execution."""
        self.tool_call_count += 1
        if self.tool_call_count > self.max_tool_calls:
            return GuardrailResult(
                allowed=False,
                reason=f"Tool call limit reached ({self.max_tool_calls})"
            )
        if self.estimated_cost > self.max_cost_usd:
            return GuardrailResult(
                allowed=False,
                reason=f"Cost budget exceeded (${self.max_cost_usd:.2f})"
            )

        # Block destructive file operations outside the workspace. Relative paths
        # are resolved against workspace_root; resolving defeats "../" escapes
        # that a plain startswith() prefix check would let through.
        if tool_name in ("edit_file", "write_file", "delete_file"):
            path = arguments.get("file_path", "")
            target = (self.workspace_root / path).resolve()
            if not target.is_relative_to(self.workspace_root):
                return GuardrailResult(
                    allowed=False,
                    reason=f"File operation outside workspace: {path}"
                )

        return GuardrailResult(allowed=True)

    def check_output(self, output: str) -> GuardrailResult:
        """Validate agent output before returning to user."""
        # Check for leaked secrets or sensitive data patterns
        sensitive_patterns = [
            r"(?:api[_-]?key|secret|password|token)\s*[:=]\s*\S+",
            r"sk-[a-zA-Z0-9]{20,}",
            r"-----BEGIN (?:RSA )?PRIVATE KEY-----",
        ]
        for pattern in sensitive_patterns:
            if re.search(pattern, output, re.IGNORECASE):
                return GuardrailResult(
                    allowed=False,
                    reason="Output may contain sensitive data; blocked."
                )
        return GuardrailResult(allowed=True)
```
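`check_output` blocks any output that matches a sensitive pattern. An alternative, sketched below under the assumption that masking is acceptable for your use case, is to redact the matches and still return the rest. Note that the PEM pattern only masks the `BEGIN` header line, so a production version would redact through the matching `END` line; `redact` is a hypothetical helper:

```python
import re

# Same patterns as AgentGuardrails.check_output.
SENSITIVE_PATTERNS = [
    r"(?:api[_-]?key|secret|password|token)\s*[:=]\s*\S+",
    r"sk-[a-zA-Z0-9]{20,}",
    r"-----BEGIN (?:RSA )?PRIVATE KEY-----",
]


def redact(output: str, placeholder: str = "[REDACTED]") -> tuple[str, int]:
    """Mask every sensitive match; return the masked text and the match count."""
    total = 0
    for pattern in SENSITIVE_PATTERNS:
        output, count = re.subn(pattern, placeholder, output, flags=re.IGNORECASE)
        total += count
    return output, total
```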
Evaluation Framework:
```python
import time
from dataclasses import dataclass, field


@dataclass
class EvalCase:
    """A single evaluation case for an agent."""
    name: str
    input: str
    expected_output: str | None = None  # For exact match
    expected_contains: list[str] | None = None  # For partial match
    max_tool_calls: int = 20
    max_seconds: float = 60.0
    tags: list[str] = field(default_factory=list)


@dataclass
class EvalResult:
    case_name: str
    passed: bool
    output: str
    tool_calls: int
    duration_seconds: float
    failure_reason: str = ""


def run_evaluation(agent_fn, cases: list[EvalCase]) -> list[EvalResult]:
    """Run an evaluation suite against an agent function."""
    results = []
    for case in cases:
        start = time.perf_counter()  # monotonic; unaffected by wall-clock changes
        try:
            output = agent_fn(case.input)
            duration = time.perf_counter() - start
            reasons = []  # collect every failed check, not just the last one

            if case.expected_output is not None and output.strip() != case.expected_output.strip():
                reasons.append(f"Expected '{case.expected_output}', got '{output[:200]}'")
            if case.expected_contains:
                missing = [s for s in case.expected_contains if s not in output]
                if missing:
                    reasons.append(f"Output missing expected strings: {missing}")
            # Checked after the fact: a slow run is flagged, not interrupted.
            if duration > case.max_seconds:
                reasons.append(f"Too slow: {duration:.1f}s > {case.max_seconds}s")

            results.append(EvalResult(
                case_name=case.name,
                passed=not reasons,
                output=output[:500],
                tool_calls=0,  # Instrumented via guardrails
                duration_seconds=duration,
                failure_reason="; ".join(reasons),
            ))
        except Exception as e:
            results.append(EvalResult(
                case_name=case.name,
                passed=False,
                output="",
                tool_calls=0,
                duration_seconds=time.perf_counter() - start,
                failure_reason=f"{type(e).__name__}: {e}",
            ))
    return results


def print_eval_report(results: list[EvalResult]):
    """Print a summary report of evaluation results."""
    passed = sum(1 for r in results if r.passed)
    total = len(results)
    if total == 0:
        print("\nAgent Evaluation: no cases run\n")
        return
    print(f"\nAgent Evaluation: {passed}/{total} passed ({100 * passed / total:.0f}%)\n")
    for r in results:
        status = "PASS" if r.passed else "FAIL"
        print(f"  [{status}] {r.case_name} ({r.duration_seconds:.1f}s)")
        if not r.passed:
            print(f"      Reason: {r.failure_reason}")
```
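The framework above checks only final outputs. A trajectory check compares the sequence of tool calls the agent made against an expected path, assuming you record tool names in call order (for example inside `execute_tool`); `trajectory_matches` is a hypothetical helper:

```python
def trajectory_matches(expected: list[str], actual: list[str], strict: bool = False) -> bool:
    """Compare the tool names an agent called against an expected trajectory.

    strict=False: expected tools must appear in order; extra calls may be interleaved.
    strict=True:  the trajectory must equal the expected sequence exactly.
    """
    if strict:
        return actual == expected
    remaining = iter(actual)
    # `in` on an iterator consumes it up to the match, which enforces ordering.
    return all(tool in remaining for tool in expected)
```

The lenient mode suits agents that legitimately re-read files or search twice; strict mode suits short, fully scripted workflows.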
Structured Logging and Tracing:
```python
import logging
import uuid
from contextvars import ContextVar
from functools import wraps

trace_id_var: ContextVar[str] = ContextVar("trace_id", default="")
logger = logging.getLogger("agent")


def new_trace() -> str:
    """Start a new trace and return its ID."""
    tid = uuid.uuid4().hex[:12]
    trace_id_var.set(tid)
    return tid


def traced(func):
    """Decorator that adds trace context to log messages."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        trace_id = trace_id_var.get()
        logger.info(
            "step_start",
            extra={
                "trace_id": trace_id,
                "step": func.__name__,
                "args_summary": str(args)[:200],
            },
        )
        try:
            result = func(*args, **kwargs)
            logger.info(
                "step_end",
                extra={
                    "trace_id": trace_id,
                    "step": func.__name__,
                    "result_summary": str(result)[:200],
                },
            )
            return result
        except Exception as e:
            logger.error(
                "step_error",
                extra={
                    "trace_id": trace_id,
                    "step": func.__name__,
                    "error": str(e),
                },
            )
            raise
    return wrapper


@traced
def agent_step(task: str) -> str:
    """Example instrumented agent step."""
    return run_react_agent(task)
```
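With the default `logging` formatter, the `trace_id` and other fields passed via `extra=` are attached to each record but never printed. A minimal JSON formatter, sketched with the standard library only; `JsonFormatter` is a hypothetical name:

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, including fields passed via extra=."""

    # Attribute names every LogRecord has; anything else was added through extra=.
    _STANDARD = set(vars(logging.LogRecord("", 0, "", 0, "", (), None))) | {"message", "asctime"}

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "logger": record.name, "event": record.getMessage()}
        payload.update({k: v for k, v in vars(record).items() if k not in self._STANDARD})
        return json.dumps(payload, default=str)


# Wiring it to the "agent" logger:
#     handler = logging.StreamHandler()
#     handler.setFormatter(JsonFormatter())
#     logging.getLogger("agent").addHandler(handler)
```

One JSON object per line is easy to ship to a log pipeline and to filter by `trace_id` when reconstructing a single agent run.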
Cost Tracking:
```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Track token usage and estimated cost across agent runs."""
    input_tokens: int = 0
    output_tokens: int = 0
    tool_calls: int = 0

    # Approximate pricing per million tokens (adjust to current rates).
    # Unannotated, so these are class constants rather than dataclass fields.
    INPUT_COST_PER_M = 3.0
    OUTPUT_COST_PER_M = 15.0

    def record(self, response):
        """Record usage from an API response."""
        self.input_tokens += response.usage.input_tokens
        self.output_tokens += response.usage.output_tokens
        self.tool_calls += sum(
            1 for b in response.content if getattr(b, "type", "") == "tool_use"
        )

    @property
    def estimated_cost(self) -> float:
        return (
            (self.input_tokens / 1_000_000) * self.INPUT_COST_PER_M
            + (self.output_tokens / 1_000_000) * self.OUTPUT_COST_PER_M
        )

    def summary(self) -> str:
        return (
            f"Tokens: {self.input_tokens:,} in / {self.output_tokens:,} out | "
            f"Tool calls: {self.tool_calls} | "
            f"Est. cost: ${self.estimated_cost:.4f}"
        )
```
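A worked example of the per-million-token arithmetic `UsageTracker` uses, with the same placeholder rates ($3 per million input tokens, $15 per million output tokens). The 10-turn run and its token counts are illustrative, not measured, and `estimate_cost_usd` is a hypothetical helper:

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_per_m: float = 3.0, output_per_m: float = 15.0) -> float:
    """Same formula as UsageTracker.estimated_cost."""
    return (input_tokens / 1_000_000) * input_per_m + (output_tokens / 1_000_000) * output_per_m


# Illustrative run: 10 turns averaging 12,000 input and 800 output tokens each.
turns = 10
input_cost = estimate_cost_usd(turns * 12_000, 0)  # 120,000 input tokens
output_cost = estimate_cost_usd(0, turns * 800)    # 8,000 output tokens
print(f"input ${input_cost:.2f} + output ${output_cost:.2f} = ${input_cost + output_cost:.2f}")
# input $0.36 + output $0.12 = $0.48
```

Input dominates in this example because each call in an agent loop resends the conversation so far, so per-turn input grows with the history.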
When a tool call fails due to a transient error, retry with exponential backoff rather than immediately asking the user (see also the tool-design skill).
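```python
import json
import logging
import time

logger = logging.getLogger("agent")


def retry_tool_call(name: str, arguments: dict, max_retries: int = 3) -> str:
    """Retry a tool call with exponential backoff (waits of 1s, 2s, ...)."""
    for attempt in range(max_retries):
        result = execute_tool(name, arguments)
        try:
            parsed = json.loads(result) if result.startswith("{") else {"output": result}
        except json.JSONDecodeError:
            parsed = {"output": result}  # plain text that happens to start with "{"
        # Return on success, or on errors marked non-recoverable (e.g., PermissionError).
        if "error" not in parsed or not parsed.get("recoverable", True):
            return result
        if attempt < max_retries - 1:  # no point sleeping after the final attempt
            wait = 2 ** attempt
            logger.warning(f"Tool {name} failed (attempt {attempt + 1}), retrying in {wait}s")
            time.sleep(wait)
    return result  # Return last result even if failed
```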
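A variant of the retry helper with two changes, both assumptions rather than part of the design above: the call and the transient-error test are injected so the policy can be unit-tested without real tools, and each delay is drawn at random up to the exponential cap ("full jitter") so that parallel agents do not retry in lockstep. `retry_with_backoff` is a hypothetical name:

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    is_transient: Callable[[T], bool],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call` up to max_attempts times while its result looks transient."""
    result = call()
    for attempt in range(1, max_attempts):
        if not is_transient(result):
            return result
        # Full jitter: wait a random time in [0, cap], where cap doubles each attempt.
        cap = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(random.uniform(0, cap))
        result = call()
    return result  # last result, even if it is still an error
```

With the section's helpers in scope, `call` could be `lambda: execute_tool(name, arguments)`, with `is_transient` parsing the JSON error payload as `retry_tool_call` does.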
For high-stakes actions, pause and request confirmation before executing.
```python
def human_in_the_loop(action_description: str, risk_level: str) -> bool:
    """Request human confirmation for risky actions."""
    if risk_level == "low":
        return True  # Auto-approve low-risk actions

    print("\n--- Agent requests approval ---")
    print(f"Action: {action_description}")
    print(f"Risk level: {risk_level}")
    response = input("Approve? [y/N]: ").strip().lower()
    return response == "y"
```
Use one agent as a tool for another, creating hierarchical agent systems.
```python
research_tool = {
    "name": "deep_research",
    "description": (
        "Perform in-depth research on a technical topic. "
        "Returns a detailed analysis with sources. "
        "Use for questions requiring multiple search steps."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "question": {
                "type": "string",
                "description": "The research question to investigate."
            }
        },
        "required": ["question"]
    }
}


def execute_research_agent(question: str) -> str:
    """Research sub-agent exposed to the parent agent as the deep_research tool.

    This sketch reuses run_react_agent and therefore its code-search tools; a real
    research agent would get its own toolset (for example, web search). Route
    "deep_research" to this function in execute_tool so the parent can call it.
    """
    return run_react_agent(
        f"Research this thoroughly and provide a detailed answer: {question}"
    )
```
| Rationalization | Reality |
|---|---|
| "We don't need turn limits; the agent will stop when it's done" | Agents stuck in loops or calling misconfigured tools can run for hours and run up large API bills before operators notice; hard turn and cost limits are non-negotiable safety mechanisms, not optional guardrails. |
| "Guardrails slow down the agent and hurt task performance" | Guardrails that reject invalid tool calls prevent irreversible side effects (deleting production data, sending emails to real users) that no subsequent action can undo; validation costs far less than remediation. |
| "We'll add observability after we validate the agent works" | Agent failures in production are non-deterministic and context-dependent; without structured logs and trace IDs from the first deployment, diagnosing why the agent chose an unexpected action path is practically impossible. |
| "A single monolithic agent is simpler than multi-agent orchestration" | A single agent with 50+ tools fills the context window with tool descriptions and tends to pick the wrong tool more often; on complex tasks, splitting the work across agents with 5-10 focused tools each usually routes more reliably. |
| "Memory is optional for agents that run on short tasks" | Short-task agents that serve the same user across sessions without memory force users to re-explain context every time; that compounding friction is a common reason users abandon an agent. |
| "We'll handle planning failures by restarting the agent" | Restart without replanning repeats the same failure; agents need explicit failure detection and a replanning step that incorporates the failure as new context, not a blind retry of the same plan. |
Three anti-patterns that specifically hurt Opus 4.7 agent workloads. These are distinct from the Common Rationalizations above: they are patterns that feel correct and used to be correct for Opus 4.6, but compound cost without matching quality gains on 4.7.
| Anti-pattern | Why it feels right | Why it's wrong in Opus 4.7 | What to do instead |
|---|---|---|---|
| Fixed thinking budgets (`max_thinking_tokens=20000`) | Predictable cost per turn; easy to reason about. | Opus 4.7 scales thinking adaptively per turn. A fixed budget truncates reasoning on hard turns and wastes budget on easy ones. | Omit `max_thinking_tokens`; set `effortLevel` instead. Drop one tier (e.g., `xhigh` -> `high`) for cost control. |
| Excessive tool-calling as "thorough investigation" | "The agent looked at everything, so the answer is well-grounded." | Opus 4.7 prefers reasoning; too many tool calls burn tokens without adding signal and can crowd out the actual problem solving. | Reason first; invoke tools only when external state is needed. Explicit tool-invocation prompts beat implicit "check everything." |
| `max` effort level on extended runs (loop-operator, temporal workflows, multi-iteration agents) | "Best results always." | `max` compounds per iteration: 20 loop turns at `max` can cost 2-3x as much as `xhigh` with no observable quality win on routine subtasks. | Default to `xhigh` for runs; reserve `max` for single hard one-shot analyses. For parallel fan-out, de-escalate to `high`. |
Cross-links:
Version: 1.0.0 Last Updated: March 2026
This skill is optimized for an iterative approach: