agent-orchestration
Design multi-agent systems with robust tool interfaces, state management, and failure handling
| name | agent-orchestration |
| description | Design multi-agent systems with robust tool interfaces, state management, and failure handling |
| difficulty | staff |
| domains | ["ai-ml"] |
Multi-agent systems fail loudly or silently. Loudly: an agent calls a tool that doesn't exist. Silently: an agent completes with a subtly wrong result and the orchestrator never notices. This skill designs agent systems that are auditable, recoverable, and deterministic about what succeeded and what failed.
Agents work best on well-scoped tasks with clear completion criteria. Avoid: "make the application better." Use: "fix all TypeScript type errors in src/components/." Ambiguous tasks produce ambiguous results.
Tools are the agent's API to the world. Each tool must have:
Every tool call must be logged: which tool, what inputs, what outputs, how long it took, and whether it succeeded. This is non-negotiable: you cannot debug an agent you cannot observe.
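One way to capture those fields is a thin wrapper around each tool function. The sketch below is only illustrative: `logged_tool`, the `agent.tools` logger name, and the `word_count` tool are assumptions made for the example, not part of any particular agent framework.

```python
import json
import logging
import time
from typing import Any, Callable

logger = logging.getLogger("agent.tools")  # assumed logger name


def logged_tool(name: str, fn: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a tool so every call emits one structured log record."""

    def wrapper(**inputs: Any) -> Any:
        start = time.perf_counter()
        record = {"tool": name, "inputs": inputs}
        try:
            output = fn(**inputs)
            record.update(success=True, output=output)
            return output
        except Exception as exc:
            # Failures are logged too, then re-raised for the caller to handle.
            record.update(success=False, error=repr(exc))
            raise
        finally:
            record["duration_ms"] = round((time.perf_counter() - start) * 1000, 3)
            logger.info(json.dumps(record, default=str))

    return wrapper


# Hypothetical tool, used only to show the wrapper in action.
def word_count(text: str) -> int:
    return len(text.split())


count_words = logged_tool("word_count", word_count)
```

Calling `count_words(text="a b c")` returns the tool's result and writes a single JSON record with the tool name, inputs, output, duration, and success flag.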
Tools that create or modify state must be idempotent where possible. If an agent retries a tool call (due to failure), the second call must not create duplicate state.
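A common way to get this property is an idempotency key supplied by the caller. The following is a minimal sketch, assuming an in-memory store; `TicketStore`, `create_ticket`, and the key format are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TicketStore:
    """In-memory stand-in for a real backend (hypothetical)."""

    tickets: Dict[str, dict] = field(default_factory=dict)
    _by_key: Dict[str, str] = field(default_factory=dict)

    def create_ticket(self, idempotency_key: str, title: str) -> str:
        # A retried call with the same key returns the original ticket id
        # instead of creating a second ticket.
        if idempotency_key in self._by_key:
            return self._by_key[idempotency_key]
        ticket_id = f"T-{len(self.tickets) + 1}"
        self.tickets[ticket_id] = {"title": title}
        self._by_key[idempotency_key] = ticket_id
        return ticket_id
```

If the agent derives the key from something stable, such as the task id plus the step number, a retry after a timeout maps back to the same ticket rather than producing a duplicate.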
Define: what does the agent do on each step? How does it decide it's done? What is the maximum number of steps? (Always set a maximum — unbounded loops are production incidents.)
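Those three questions map directly onto the parameters of a bounded loop. The sketch below assumes the step function and the completion check are plain callables; `run_agent` and `LoopResult` are illustrative names, and a real agent would carry richer state than a list of strings.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LoopResult:
    status: str  # "done" or "step_limit_exceeded"
    steps_taken: int
    history: List[str]


def run_agent(
    step: Callable[[List[str]], str],
    is_done: Callable[[List[str]], bool],
    max_steps: int = 10,
) -> LoopResult:
    """Run step() until is_done() is true or the step budget is spent."""
    history: List[str] = []
    for _ in range(max_steps):
        history.append(step(history))
        if is_done(history):
            return LoopResult("done", len(history), history)
    # Hitting the limit is reported explicitly, never treated as success.
    return LoopResult("step_limit_exceeded", len(history), history)
```

Returning a distinct status for the exhausted budget lets the orchestrator tell "finished" apart from "gave up", which is the distinction the silent-failure case depends on.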
If multiple agents coordinate: define exactly what one agent passes to the next. Use structured data, not natural language, for inter-agent communication. Natural language is lossy.
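As an illustration of a structured handoff, the sketch below defines a small typed message that one agent serializes and the next one validates. The `Handoff` class and its field names are assumptions for the example; the actual fields depend on what your agents exchange.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List

ALLOWED_STATUSES = {"succeeded", "failed", "partial"}


@dataclass(frozen=True)
class Handoff:
    task_id: str
    status: str  # one of ALLOWED_STATUSES
    files_changed: List[str] = field(default_factory=list)
    open_issues: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_json(raw: str) -> "Handoff":
        data = json.loads(raw)
        # Reject anything the receiving agent would otherwise have to guess at.
        if data.get("status") not in ALLOWED_STATUSES:
            raise ValueError(f"unknown status: {data.get('status')!r}")
        return Handoff(**data)  # missing or unexpected keys raise TypeError
```

A free-text summary such as "mostly fixed, a couple of errors left" cannot be validated this way; a `status` of `"partial"` plus an explicit `open_issues` list can.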
For each tool:
Define: retry strategy, escalation path, graceful degradation. "The agent will figure it out" is not a failure handling strategy.
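The three pieces can be made explicit in a single call policy. This is a sketch under stated assumptions: `call_with_policy`, `Escalate`, and the exponential backoff schedule are illustrative choices, and a production version would likely retry only on errors known to be transient.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class Escalate(Exception):
    """Raised when retries are exhausted and no fallback exists."""


def call_with_policy(
    fn: Callable[[], T],
    retries: int = 3,
    base_delay: float = 0.5,
    fallback: Optional[Callable[[], T]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    last_error: Optional[Exception] = None
    for attempt in range(retries):
        try:
            return fn()
        except Exception as exc:
            last_error = exc
            if attempt < retries - 1:
                sleep(base_delay * (2 ** attempt))  # exponential backoff
    if fallback is not None:
        return fallback()  # graceful degradation
    # Escalation path: surface the failure to the orchestrator or a human.
    raise Escalate(f"gave up after {retries} attempts") from last_error
```

The `sleep` parameter exists so tests can inject a no-op and run without real delays.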
For consequential actions (deleting data, sending emails, making payments): require human approval. Implement a confirmation step before execution.
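A confirmation step can be sketched as a gate in front of tool execution. In the example below, the tool names in `CONSEQUENTIAL` and the `approve` callback are hypothetical; in practice `approve` might post to a review queue or prompt an operator and block until a decision arrives.

```python
from typing import Any, Callable, Dict

# Hypothetical names for tools whose effects cannot be undone.
CONSEQUENTIAL = {"delete_data", "send_email", "make_payment"}


class ApprovalDenied(Exception):
    """Raised when a human declines a consequential action."""


def execute(
    tool_name: str,
    tool: Callable[..., Any],
    inputs: Dict[str, Any],
    approve: Callable[[str, Dict[str, Any]], bool],
) -> Any:
    # Consequential tools run only after the approver returns True;
    # all other tools run without asking.
    if tool_name in CONSEQUENTIAL and not approve(tool_name, inputs):
        raise ApprovalDenied(f"{tool_name} was not approved")
    return tool(**inputs)
```

Passing the exact inputs to the approver matters: the human should confirm the specific recipient or amount, not just the category of action.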
Build a test harness that: replays tasks, verifies outputs, measures success rate, tracks which tools were called and in what order. Run it in CI.
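A minimal version of such a harness might look like the following. It assumes an agent can be called as a function that returns its output together with the ordered list of tools it used; `Case`, `Agent`, and `run_suite` are illustrative names, not an existing library.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Case:
    task: str
    expected_output: str
    expected_tools: List[str]


# An agent here is any callable returning (output, tools called in order).
Agent = Callable[[str], Tuple[str, List[str]]]


def run_suite(agent: Agent, cases: List[Case]) -> dict:
    """Replay each case and report the success rate plus failure details."""
    failures = []
    for case in cases:
        output, tools = agent(case.task)
        if output != case.expected_output or tools != case.expected_tools:
            failures.append({"task": case.task, "output": output, "tools": tools})
    passed = len(cases) - len(failures)
    return {
        "success_rate": passed / len(cases) if cases else 0.0,
        "failures": failures,
    }
```

In CI, the job would fail when `success_rate` drops below an agreed threshold. Exact string matching on the output is the simplest check and is usually replaced with a task-specific verifier.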
| "The agent is smart enough to handle edge cases" | The agent has never seen your edge cases. Write tests for them. |
| "We don't need a maximum step limit" | You always need a maximum step limit. Agents enter failure loops. Budget limits save you. |
| "Human-in-the-loop slows things down" | For irreversible actions, slowing down is a feature, not a bug. |