with one click
model-routing-patterns
// Multi-model pipelines (Haiku/Sonnet/Opus): cost routing, escalation, fallback chains. Triggers: model routing, Haiku, Sonnet, Opus, escalation, fallback chain.
// Multi-model pipelines (Haiku/Sonnet/Opus): cost routing, escalation, fallback chains. Triggers: model routing, Haiku, Sonnet, Opus, escalation, fallback chain.
[HINT] Download the complete skill directory including SKILL.md and all related files
| name | model-routing-patterns |
| description | Multi-model pipelines (Haiku/Sonnet/Opus): cost routing, escalation, fallback chains. Triggers: model routing, Haiku, Sonnet, Opus, escalation, fallback chain. |
| effort | medium |
| user-invocable | false |
| allowed-tools | Read |
Three Claude tiers. Using Opus for everything is 10-40x more expensive than it needs to be. Using Haiku for everything loses accuracy on hard tasks. The craft is routing.
| Model | Latency | Cost (rel.) | Strengths | When |
|---|---|---|---|---|
| Haiku 4.5 | Fastest | 1x | Classification, extraction, simple tools, moderation | Bulk processing, triage, labels |
| Sonnet 4.6 | Medium | 3-5x | General coding, reasoning, most agent tasks | Default workhorse |
| Opus 4.7 | Slowest | 15-30x | Complex reasoning, orchestration, architecture, large context | Hard, rare, high-stakes |
Ratios are approximate and shift between releases. Re-check pricing before committing a production path.
Cheap model classifies the request, then routes to the right tier:
def route(user_message: str) -> str:
complexity = classify_with_haiku(user_message) # returns: simple | medium | hard
return {"simple": "haiku", "medium": "sonnet", "hard": "opus"}[complexity]
Good when ~60% of traffic is simple. Overhead: one Haiku call per request (~100 tokens).
Try the cheap model first, escalate only when it hesitates:
def solve(problem: str):
haiku = call_haiku(problem)
if haiku.confidence > 0.85:
return haiku.answer
sonnet = call_sonnet(problem + haiku.reasoning)
if sonnet.confidence > 0.8:
return sonnet.answer
return call_opus(problem)
Haiku must be prompted to output confidence (e.g. via tool-use structured output — see json-mode-patterns). Pure self-reported confidence is noisy; combine with a heuristic (output length, tool calls, hedging words).
Orchestrator reasons about the plan, workers execute atomic steps:
Opus (planner)
├── Haiku (extract_dates_from_doc_1)
├── Haiku (extract_dates_from_doc_2)
├── Haiku (extract_dates_from_doc_3)
└── Opus (synthesize all extractions into timeline)
Real example: /orchestrate in ai-toolkit runs Opus as planner, subagents (model per agent's frontmatter) as workers. See app/agents/*.md — each agent sets model: explicitly.
When primary is rate-limited or errors, degrade gracefully:
def call_with_fallback(messages):
for model in ["claude-opus-4-7", "claude-sonnet-4-6", "claude-haiku-4-5"]:
try:
return client.messages.create(model=model, messages=messages, ...)
except (RateLimitError, OverloadedError):
continue
raise AllModelsExhausted()
Useful in production, not for cost optimization — you lose quality on fallback.
Skip generic complexity scoring when you know the task type:
| Task | Route |
|---|---|
| Commit message from diff | Haiku |
| Summarize 5-10 lines | Haiku |
| Classify intent | Haiku |
| Fix a failing test | Sonnet |
| Write new feature | Sonnet |
| Code review, architecture decision | Opus |
| Multi-agent orchestration | Opus |
| Complex debugging across systems | Opus |
Encode this as a map in code, not a prompt.
| Anti-pattern | Consequence | Fix |
|---|---|---|
| Opus for everything | 10-40x bill | Start with Sonnet, measure, demote |
| Haiku for code review | Misses subtle bugs | Sonnet minimum for code quality |
| Router overhead > savings | Haiku classifier eats the margin | Skip router if >80% of traffic is one tier |
| Different prompts per tier | Maintenance nightmare | Same prompt, just swap model |
| No telemetry | Can't optimize | Log model + tokens + cost per request |
Track per-route:
Target: move the Pareto curve — cheaper at equal quality OR better at equal cost.
llm-ops-engineer agent — production routing strategyprompt-caching-patterns — stack caching on top of routingjson-mode-patterns — structured confidence from Haiku