name	model-routing-patterns
description	Multi-model pipelines (Haiku/Sonnet/Opus): cost routing, escalation, fallback chains. Triggers: model routing, Haiku, Sonnet, Opus, escalation, fallback chain.
effort	medium
user-invocable	false
allowed-tools	Read

Model Routing Patterns

Three Claude tiers. Using Opus for everything is 10-40x more expensive than it needs to be. Using Haiku for everything loses accuracy on hard tasks. The craft is routing.

Model Characteristics (2026)

Model	Latency	Cost (rel.)	Strengths	When
Haiku 4.5	Fastest	1x	Classification, extraction, simple tools, moderation	Bulk processing, triage, labels
Sonnet 4.6	Medium	3-5x	General coding, reasoning, most agent tasks	Default workhorse
Opus 4.7	Slowest	15-30x	Complex reasoning, orchestration, architecture, large context	Hard, rare, high-stakes

Ratios are approximate and shift between releases. Re-check pricing before committing a production path.

Pattern 1 — Complexity Router (pre-classify)

Cheap model classifies the request, then routes to the right tier:

def route(user_message: str) -> str:
    complexity = classify_with_haiku(user_message)  # returns: simple | medium | hard
    return {"simple": "haiku", "medium": "sonnet", "hard": "opus"}[complexity]

Good when ~60% of traffic is simple. Overhead: one Haiku call per request (~100 tokens).

Pattern 2 — Confidence-Based Escalation

Try the cheap model first, escalate only when it hesitates:

def solve(problem: str):
    haiku = call_haiku(problem)
    if haiku.confidence > 0.85:
        return haiku.answer
    sonnet = call_sonnet(problem + haiku.reasoning)
    if sonnet.confidence > 0.8:
        return sonnet.answer
    return call_opus(problem)

Haiku must be prompted to output confidence (e.g. via tool-use structured output — see json-mode-patterns). Pure self-reported confidence is noisy; combine with a heuristic (output length, tool calls, hedging words).

Pattern 3 — Sub-agent Delegation (Opus orchestrates, Haiku workers)

Orchestrator reasons about the plan, workers execute atomic steps:

Opus (planner)
  ├── Haiku (extract_dates_from_doc_1)
  ├── Haiku (extract_dates_from_doc_2)
  ├── Haiku (extract_dates_from_doc_3)
  └── Opus (synthesize all extractions into timeline)

Real example: /orchestrate in ai-toolkit runs Opus as planner, subagents (model per agent's frontmatter) as workers. See app/agents/*.md — each agent sets model: explicitly.

Pattern 4 — Fallback Chain (resilience, not cost)

When primary is rate-limited or errors, degrade gracefully:

def call_with_fallback(messages):
    for model in ["claude-opus-4-7", "claude-sonnet-4-6", "claude-haiku-4-5"]:
        try:
            return client.messages.create(model=model, messages=messages, ...)
        except (RateLimitError, OverloadedError):
            continue
    raise AllModelsExhausted()

Useful in production, not for cost optimization — you lose quality on fallback.

Pattern 5 — Task-Specific Routing

Skip generic complexity scoring when you know the task type:

Task	Route
Commit message from diff	Haiku
Summarize 5-10 lines	Haiku
Classify intent	Haiku
Fix a failing test	Sonnet
Write new feature	Sonnet
Code review, architecture decision	Opus
Multi-agent orchestration	Opus
Complex debugging across systems	Opus

Encode this as a map in code, not a prompt.

Anti-patterns

Anti-pattern	Consequence	Fix
Opus for everything	10-40x bill	Start with Sonnet, measure, demote
Haiku for code review	Misses subtle bugs	Sonnet minimum for code quality
Router overhead > savings	Haiku classifier eats the margin	Skip router if >80% of traffic is one tier
Different prompts per tier	Maintenance nightmare	Same prompt, just swap model
No telemetry	Can't optimize	Log model + tokens + cost per request

Measuring

Track per-route:

Cost per request
Latency p50/p95
Quality score (human-labeled or auto-evaluated)
Escalation rate (how often you fell back to a bigger model)

Target: move the Pareto curve — cheaper at equal quality OR better at equal cost.

llm-ops-engineer agent — production routing strategy
prompt-caching-patterns — stack caching on top of routing
json-mode-patterns — structured confidence from Haiku
Anthropic cookbook: https://github.com/anthropics/claude-cookbooks — see "sub-agents" notebook

name	model-routing-patterns
description	Multi-model pipelines (Haiku/Sonnet/Opus): cost routing, escalation, fallback chains. Triggers: model routing, Haiku, Sonnet, Opus, escalation, fallback chain.
effort	medium
user-invocable	false
allowed-tools	Read