with one click
claude-code-effort-models
Use when deciding how to work on a task - choosing effort level, model, or fast mode to balance reasoning depth, speed, and token cost.
Menu
Use when deciding how to work on a task - choosing effort level, model, or fast mode to balance reasoning depth, speed, and token cost.
Based on SOC occupation classification
Use when unsure which Claude Code capability fits a task, when about to single-thread large/cross-cutting/repetitive work, or when you or the user ask what Claude Code can do — loads the full capability map (signal → capability) and points to the deep-dive skills.
Use when building custom AI agents headlessly, embedding Claude Code tools in an app, or running scripted/cron agents—need SDK package names, setup, core API entry points, supported languages, and relation to Claude Code CLI and Messages...
Use when needing to understand or manage background agents, long-running tasks, polling, scheduling, or async work in Claude Code without blocking the session.
Use when you need to undo code edits, explore alternatives without losing a starting point, recover from a bad edit path, or restore previous conversation state — press Esc Esc or run /rewind to open checkpoints menu.
Use when a task is too big for one context — broad audits, multi-file migrations, multi-source research, or parallel multi-dimension review — and you want to fan out and orchestrate many subagents deterministically with the Workflow tool (requires ultracode / explicit opt-in).
Use when setting up or debugging Claude Code GitHub Actions automation, or deciding whether to route CI/CD tasks to GitHub Actions vs local session.
| name | claude-code-effort-models |
| description | Use when deciding how to work on a task - choosing effort level, model, or fast mode to balance reasoning depth, speed, and token cost. |
| user-invocable | true |
Quick reference for tuning how Claude Code works: model selection, effort levels, and speed mode. Read this mid-session when deciding how to approach a task.
Switch with /model (opens picker) or /model <name> to set directly. Persists to next session.
Available models:
opus → Claude Opus 4.8 (latest, strongest reasoning)sonnet → Claude Sonnet 4.6 (daily coding, balanced)haiku → Claude Haiku 4.5 (fast, simple tasks)best → currently opusopusplan → opus during planning, auto-switches to sonnet for executionopus[1m] / sonnet[1m] → same model with 1M token context window (long sessions only)Model IDs (full names):
claude-opus-4-8claude-sonnet-4-6claude-haiku-4-5-20251001When to pick:
Controls adaptive reasoning depth per message. Raise it for complex problems; lower it for routine tasks. Persistent across sessions unless overridden by env var.
Available levels (varies by model):
low — minimal thinking, fastest, cheapest. Use: latency-sensitive, low-complexity tasks.medium — lighter reasoning, cost-conscious work that trades some intelligence.high — default on Opus 4.8, Opus 4.6, Sonnet 4.6. Balances tokens and capability.xhigh — deeper reasoning, higher token spend. Default on Opus 4.7. Use: tricky architecture, intricate bugs.max — deepest reasoning, unbounded tokens, session-only. Can overthink; test first.Special: /effort ultracode (Opus only, session-only) sends xhigh to model AND orchestrates dynamic workflows for substantive tasks. Reserved for ambitious multi-phase work.
Usage:
/effort — open slider picker/effort high — set directly/effort auto — reset to model defaultCLAUDE_CODE_EFFORT_LEVEL=xhigheffort: xhighToken tradeoff: low < medium < high < xhigh < max. Each step costs more tokens but enables deeper reasoning for complex tasks.
Opus only. Same model quality, ~2.5x faster output, higher cost per token. Toggle with /fast or "fastMode": true in settings.json.
Pricing (per MTok):
How it works:
When to use:
Cost gotcha: enabling fast mode mid-conversation re-caches full history at fast-mode price. Enable at session start for best cost.
Requirements:
Model choice drives cost:
Effort and fast mode interact:
Prompt caching: a stable warm prefix re-reads at ~10% of input price. Keep the prefix stable (don't rewrite early messages mid-session). Switching /model mid-session invalidates the cache for the next turn.
Task is routine (format, search, simple edit)? → haiku, low effort, standard mode.
Task is typical coding (features, tests, refactors)? → sonnet, high effort (or medium to cut cost), standard mode.
Task is hard (architecture, complex bug, design)? → opus, xhigh effort, standard mode.
You need output in seconds, not minutes? → Use fast mode on Opus (higher cost, lower latency). Not a model change.
You need the deepest reasoning on an ambitious task? → opus, max effort, standard mode. Session-only, unbounded tokens.
ANTHROPIC_MODEL=<name> — set model for this session only.CLAUDE_CODE_EFFORT_LEVEL=<level> — effort level; overrides session choice.CLAUDE_CODE_DISABLE_FAST_MODE=1 — disable fast mode entirely.ANTHROPIC_DEFAULT_OPUS_MODEL / ANTHROPIC_DEFAULT_SONNET_MODEL / ANTHROPIC_DEFAULT_HAIKU_MODEL — pin specific model versions (useful on Bedrock, Vertex, Foundry).