Use when encountering any bug, test failure, or unexpected behavior. Enforces a strict debugging workflow: reproduce first, identify the root cause, and write a failing test before attempting any fix.
Orchestrates multi-day execution of complex tasks through milestones. Each milestone goes through plan-crafting, run-plan (worker-validator), and review-work phases with checkpoint/recovery. Triggers when the user says "long run", "start long run", "execute milestones", or "run all milestones".
Use when you have a written implementation plan to execute. Loads the plan, reviews it critically, executes tasks in dependency order, and reports completion. Triggers when the user says "run the plan", "execute the plan", or "let's start implementing".
Corrective cleanup of AI-generated code — removes LLM-specific patterns while preserving behavior. Use when the user says "clean up", "deslop", "slop", "clean AI code", or when you spot LLM-generated code smells after any generation session.
Behavioral guardrails that prevent common LLM coding mistakes by enforcing surgical changes, assumption verification, and scope discipline before and during implementation. Use when implementing features, modifying code, or when you notice yourself about to make changes without reading the existing code first.
Decomposes complex, multi-day tasks into optimized milestones using parallel reviewer agents (ultraplan). Spawns 5 independent reviewers that analyze the problem from different angles, then synthesizes their findings into a milestone dependency DAG. Triggers when the user says "plan milestones", "break this into milestones", "ultraplan", or when the long-run harness needs milestone generation.
Use when a task's scope is clear and multi-step implementation is needed, before touching code. Triggers after clarification is complete, or when the user explicitly requests plan creation with a clearly scoped prompt.
Use when a user's request is vague, ambiguous, or underspecified. Launches an iterative Q&A loop to resolve ambiguity while a subagent explores the codebase in parallel. Outputs a clear, well-scoped context brief that planning can build on directly. Triggers on "I want to...", "I need...", "let's build...", "can you help me...", "we should...", or any request where the full scope isn't immediately clear.