| name | build |
| description | ALWAYS invoke when the user wants to build, implement, or create a new feature, module, or component from scratch. Triggers: "build X", "implement Y", "add feature". When NOT to use: bug fixes (use debug), polishing existing code (use polish). |
Autonomous Development Pipeline (TDD)
Overview
Orchestrates a full test-driven development lifecycle — research, plan, write failing tests (RED), implement to pass them (GREEN), review, refactor, and verify — using specialized sub-agents. Takes a task description and produces reviewed, tested, green code with explicit user approval at the plan gate.
The pipeline enforces red-green-refactor: tests are written and verified failing before any implementation code is allowed. Coders make tests pass without modifying them; refactor happens only after the green bar is restored.
When to Use
- User asks to build, implement, or create a new feature/module/component.
- Triggers: "build X", "implement Y", "add feature Z", "develop".
- Work spans multiple files and benefits from a plan + review + test loop.
When NOT to use
- Fixing a known bug — use debug.
- Making existing code meet quality criteria — use polish.
- Read-only analysis or impact mapping — use blast-radius or audit.
- One-line edits or trivial tweaks — just edit directly.
Input
$ARGUMENTS — The task description. If empty, ask the user what they want built.
Scope Parameter
See skills/shared/scope-parameter.md for the canonical scope modes (branch, global, working) and their git commands. Overrides for this skill:
- Default: global.
- Scope constrains edits only; researchers and verification commands may read the full project.
Constants
- MAX_REVIEW_ITERATIONS: 3
- MAX_RED_ITERATIONS: 2
- MAX_GREEN_ITERATIONS: 3
- TASK_DIR: .mz/task/
Core Process
Phase Overview
| # | Phase | Reference | Loop? |
|---|---|---|---|
| 0 | Setup | inline below | — |
| 1 | Research | phases/research_and_planning.md | — |
| 2 | Planning + User Approval | phases/research_and_planning.md | plan review (max 3) |
| 3 | Test Writing (RED) | phases/testing.md | — |
| 4 | Test Review (3 parallel reviewers) | phases/testing.md | max 3 |
| 5 | Verify RED | phases/testing.md | max 2 |
| 6 | Implementation (GREEN, parallel waves) | phases/implementation_and_review.md | — |
| 7 | Code Review | phases/implementation_and_review.md | max 3 |
| 8 | Lint, Format, and Verify GREEN | phases/implementation_and_review.md | — |
| 9 | Final Code Review | phases/finalization.md | max 2 |
| 10 | Refactor (Optimization) | phases/finalization.md | max 2 |
| 11 | Completeness Check | phases/finalization.md | restart-from-phase |
Phase 0: Setup
Derive task name as <YYYY_MM_DD>_build_<slug> where <YYYY_MM_DD> is today's date (underscores) and slug is a snake_case summary (max 20 chars) of the description; on same-day collision append _v2, _v3. Create .mz/task/<task_name>/. Write state.md with Status, Phase, Started, Iterations. Use TaskCreate for per-phase tracking.
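The naming rule, sketched as a hypothetical Python helper (the slug regex and truncation details are illustrative, not prescribed by this skill):

```python
import re
from datetime import date
from pathlib import Path

def derive_task_dir(description: str, base: Path = Path(".mz/task")) -> Path:
    """Return <base>/<YYYY_MM_DD>_build_<slug>, suffixing _v2, _v3 on collision."""
    # Snake-case the description and cap the slug at 20 characters.
    slug = re.sub(r"[^a-z0-9]+", "_", description.lower()).strip("_")[:20].rstrip("_")
    name = f"{date.today():%Y_%m_%d}_build_{slug}"
    candidate, version = base / name, 2
    while candidate.exists():  # same-day collision: append _v2, _v3, ...
        candidate = base / f"{name}_v{version}"
        version += 1
    return candidate
```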
Then dispatch pipeline-tooling-detector to detect the project's test command, lint command, and formatter. Write to .mz/task/<task_name>/tooling.md. If pipeline-tooling-detector returns BLOCKED (no recognizable tooling), note it in state.md as tooling: not_detected and proceed — tooling failure is non-fatal at setup time.
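tooling.md has no fixed schema in this skill; for a typical Python project the detector might record something like (commands are examples only):

```
test: pytest -q
lint: ruff check .
format: ruff format .
```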
Phase 1: Research
Gather codebase context, assess feasibility, compare 2-3 approaches in parallel. See phases/research_and_planning.md → Phase 1. Update state to research_complete.
Phase 2: Planning
Generate detailed plan, run plan-review loop, get user approval. See phases/research_and_planning.md → Phase 2.
2.3 User approval gate
The orchestrator itself (not a subagent) must present the plan to the user via AskUserQuestion. This step is interactive and must not be delegated.
Mandatory pre-read: Read .mz/task/<task_name>/plan.md with the Read tool. Capture the full plan body (work units, test strategy, risks, verification criteria) into context. The plan must already have passed automated review before this gate fires.
Mandatory inline-verbatim presentation: The AskUserQuestion question body must contain the verbatim contents of plan.md. Never substitute a path, work-unit count, or <contents of plan.md> placeholder — the user must review the actual plan bytes in the question itself, not have to open the file separately.
Before invoking AskUserQuestion, emit a text block to the user:
**Plan ready for review**
The implementation plan passed automated review and is ready for your approval. It covers all phases, files affected, and estimated scope.
- **Approve** → proceed to Phase 3 (Test Writing — TDD/RED)
- **Reject** → task marked aborted, no files written
- **Feedback** → re-run planning with your input, loop back here
Use AskUserQuestion with:
The implementation plan is ready and passed review. Please review and approve:
<verbatim plan.md contents>
Type **Approve** to proceed, **Reject** to cancel, or type your feedback.
Response handling:
- "approve" → proceed to Phase 3.
- "reject" → update state to
aborted_by_user and stop. Do not proceed.
- Feedback → spawn pipeline-planner with the feedback, overwrite plan.md, and re-present via AskUserQuestion. Do NOT re-run plan review — the user's word is final. Repeat this loop until the user explicitly approves; never proceed to Phase 3 without explicit approval.
Phase 3: Test Writing (RED)
Create tests before any implementation with pipeline-test-writer (model: opus). Tests assert the correct behavior the plan requires; since no production code exists yet, they will fail. See phases/testing.md → Phase 3.
Phase 4: Test Review
Spawn THREE review agents in parallel (model: sonnet): coverage, quality, code. The reviewers verify the tests cover every work unit from the plan and that they would catch real regressions. See phases/testing.md → Phase 4. Update state to test_review_passed.
Phase 5: Verify RED
Run the new tests and confirm they FAIL (or error with not implemented / missing-symbol errors). A test that passes against zero implementation is broken — it would not catch a regression. Re-dispatch pipeline-test-writer with the unexpectedly-passing test names if any pass. See phases/testing.md → Phase 5. Update state to red_verified.
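A sketch of the check, assuming pytest as the detected test command (in practice the command comes from tooling.md); it runs only the new tests in verbose mode and flags any that pass:

```python
import subprocess

def find_unexpected_passes(new_test_paths: list[str]) -> list[str]:
    """Run only the Phase 3 tests; return IDs of any that PASS (broken RED)."""
    result = subprocess.run(
        ["pytest", "-v", *new_test_paths],  # substitute the command from tooling.md
        capture_output=True, text=True,
    )
    # Verbose pytest prints one "<nodeid> PASSED" line per passing test.
    return [line.split()[0] for line in result.stdout.splitlines()
            if " PASSED" in line]
```

An empty result confirms the red bar; anything returned goes back to pipeline-test-writer.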
Phase 6: Implementation (GREEN)
Parse work units into execution waves and dispatch parallel pipeline-coder agents (model: opus). Each coder is told the relevant failing tests and instructed to make them pass without modifying any test file. After each wave, update state with current_wave: N, per-coder STATUS, and cumulative files modified — required for safe resumption if context compacts. See phases/implementation_and_review.md → Phase 6. Update state to implementation_complete.
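Wave construction is essentially dependency layering; a sketch, assuming each work unit in the plan names the units it depends on (the WorkUnit shape is an assumption, not a fixed format):

```python
from dataclasses import dataclass, field

@dataclass
class WorkUnit:
    name: str
    depends_on: set[str] = field(default_factory=set)

def plan_waves(units: list[WorkUnit]) -> list[list[str]]:
    """Group units into waves; each wave depends only on earlier waves."""
    done: set[str] = set()
    remaining = {u.name: u for u in units}
    waves: list[list[str]] = []
    while remaining:
        ready = [n for n, u in remaining.items() if u.depends_on <= done]
        if not ready:  # cycle in the plan: escalate rather than guess
            raise ValueError(f"dependency cycle among {sorted(remaining)}")
        waves.append(sorted(ready))
        done.update(ready)
        for n in ready:
            del remaining[n]
    return waves
```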
Phase 7: Code Review
Review with pipeline-code-reviewer (model: opus), iterate fixes up to 3 times. See phases/implementation_and_review.md → Phase 7. Update state to code_review_passed.
Phase 8: Lint, Format, and Verify GREEN
Detect tooling, run linters/formatters, then run the full test suite including the Phase 3 tests. All target tests must now pass; no pre-existing tests may regress. See phases/implementation_and_review.md → Phase 8. Update state to tests_passing.
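The pass criteria reduce to two set checks, assuming a runner that returns the set of passing test IDs and a baseline captured before Phase 6 (both inputs are hypothetical plumbing):

```python
def verify_green(passing_now: set[str], new_tests: set[str],
                 baseline_passing: set[str]) -> list[str]:
    """Return blockers: Phase 3 tests still failing, or pre-existing regressions."""
    blockers = []
    if missing := new_tests - passing_now:
        blockers.append(f"target tests still failing: {sorted(missing)}")
    if regressed := baseline_passing - passing_now:
        blockers.append(f"pre-existing tests regressed: {sorted(regressed)}")
    return blockers  # empty list means GREEN is verified
```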
Phase 9: Final Code Review
Last validation pass over ALL code with pipeline-code-reviewer. See phases/finalization.md → Phase 9. Update state to final_review_passed.
Phase 10: Refactor (Optimization)
Clean up dead code, debug artifacts, unused imports — the refactor leg of red-green-refactor. Tests must remain green; no behavior change. Re-verify then review. See phases/finalization.md → Phase 10. Update state to optimized.
Phase 11: Completeness Check
Final gate: pipeline-completeness-checker (model: opus) decides if the task is done. See phases/finalization.md → Phase 11. Max 2 iterations.
Techniques
Delegated to the phase files; see the Phase Overview table above.
Common Rationalizations
| Rationalization | Rebuttal |
|---|---|
| "plan is fine without review" | "plan review catches integration gaps that become 3 review cycles downstream" |
| "tests can wait until after first ship" | "missing tests on Day 1 become 'why is this flaky?' in Week 2" |
| "one big commit is easier" | "atomic commits are the only way to bisect a regression cheaply" |
| "I'll write code first, tests are easier after" | "tests written after the code mirror the implementation; tests written first describe the behavior" |
| "skip the RED check, of course they fail" | "tests that pass against missing code are silently broken — Phase 5 catches mock-only or import-only tests" |
Red Flags
- You dispatched coders without user approval of the plan.
- Plan review was skipped or truncated to save time.
- A coder agent was dispatched before tests were written and verified RED.
- A coder modified or deleted a test to make it pass.
- Phase 5 was skipped on the assumption that tests "must" fail without implementation.
Verification
Output the final state block: task dir path, all phases marked complete, review iteration counts, file list, and tests-passing status. If any phase is incomplete, print the blocker explicitly.
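The exact layout is up to the orchestrator; a hypothetical final block might look like:

```
task_dir: .mz/task/2025_06_01_build_rate_limiter/
phases: 0-11 complete
iterations: plan_review 1/3, test_review 2/3, code_review 1/3, final_review 1/2
files: src/rate_limiter.py, tests/test_rate_limiter.py
tests: passing (all target tests green, no regressions)
```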
Error Handling
- Agent failure → retry once, then escalate.
- No test framework → ask the user how to proceed.
- No linter → note in summary; don't block.
- Always save state before spawning agents.
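A sketch of the retry policy; spawn and save_state stand in for whatever the agent runner actually provides:

```python
def dispatch_with_retry(spawn, save_state, agent: str, payload: dict):
    """Save state, attempt the agent twice, then escalate with the last error."""
    save_state()  # always persist state before spawning
    last_error = None
    for _attempt in range(2):  # original attempt plus one retry
        try:
            return spawn(agent, payload)
        except Exception as err:  # whatever failure type the runner raises
            last_error = err
    raise RuntimeError(f"{agent} failed twice; escalating: {last_error}")
```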
State Management
After each phase, update state.md with current phase, iteration counts, files modified, and escalation notes. Allows resumption if interrupted.
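A hypothetical state.md snapshot mid-Phase 6, with the fields Phases 0 and 6 call for:

```
Status: in_progress
Phase: 6_implementation
Started: 2025-06-01
Iterations: plan_review=1, test_review=2, code_review=0
current_wave: 2/3
files_modified: src/rate_limiter.py, src/middleware.py
escalations: none
```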