| name | polish |
| description | ALWAYS invoke when the user wants to polish code against criteria — fix failing tests, meet quality standards. Triggers: "polish X", "make tests pass", "fix failing tests". When NOT to use: new feature (use build), single bug (use debug). |
# Code Polishing Pipeline
## Overview
Orchestrates iterative polish of existing code against specific completion criteria. Unlike build, which starts from scratch, polish works with what's already there — running tests, diagnosing failures, fixing issues with review loops, and optimizing.
When a behavioral criterion is not yet covered by any test, polish writes the test first (TDD-style), verifies it fails against the current code, then fixes. This keeps every fix anchored to a regression-catching assertion.
## When to Use
- User has existing code that needs to meet specific criteria or quality standards.
- Triggers: "polish X", "make tests pass", "fix failing tests", "clean up the code", "finish this implementation".
- Code exists but is failing tests, lint, or quality gates.
## When NOT to Use
- Starting a new feature from scratch — use build.
- A single isolated bug with a reproducer — use debug.
- If the root cause of failures is known (e.g., a specific bug was identified) — use debug to fix the root cause first, then return to polish for the quality criteria.
- Read-only verification with no fix intent — use verify.
- Map-reduce dead-code cleanup — use optimize.
## Input
$ARGUMENTS — The completion criteria. This can be:
- "All tests in test_foo.py must pass"
- "Pre-commit hooks must pass on all changed files"
- "The WebSocket reconnection must handle timeout correctly"
- "Fix all failing tests and clean up the implementation"
- Any combination of pass/fail criteria and behavioral requirements
If empty, ask the user what needs to be polished.
## Scope Parameter
See skills/shared/scope-parameter.md for the canonical scope modes (branch, global, working) and their git commands. Document any skill-specific overrides or restrictions below this line.
- Default (no scope:): all project files are eligible for edits.

scope: controls which files agents may edit. Tests and linters always run on the full project. Criteria determine what to verify; scope determines where fixes may be applied.
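The scope resolution above can be sketched with ordinary git commands. This is a minimal illustration, assuming a git repository and `main` as the base branch (adjust to your default branch); the canonical commands live in skills/shared/scope-parameter.md:

```shell
# Sketch only: resolve a scope mode to the file list agents may edit.
# Assumes "main" is the base branch -- substitute your project's default.
resolve_scope() {
  case "$1" in
    global)  git ls-files ;;                               # every tracked file
    branch)  git diff --name-only main...HEAD ;;           # changed on this branch
    working) git status --porcelain | awk '{print $NF}' ;; # uncommitted changes
    *)       echo "unknown scope mode: $1" >&2; return 1 ;;
  esac
}

# Example: resolve_scope branch > .mz/task/<task_name>/scope_files.txt
```

Note that the awk field extraction is a simplification; renamed files in porcelain output would need extra handling in a real implementation.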
## Constants
- MAX_FIX_ITERATIONS: 5 — max code-test-review cycles before escalating
- MAX_REVIEW_RETRIES: 3 — max times a review can fail before escalating
- TASK_DIR: .mz/task/ in the project root
## Core Process
### Phase Overview
| Phase | Goal | Details |
|---|---|---|
| 0 | Setup | Inline below |
| 1 | Initial Assessment | phases/assess_and_fix.md |
| 1.5 | User Approval Gate | Inline below |
| 2 | Quick Fixes | phases/assess_and_fix.md |
| 3 | Research (if needed) | phases/assess_and_fix.md |
| 4 | Fix-Test-Review Loop | phases/fix_review_and_finalize.md |
| 5 | Optimization | phases/fix_review_and_finalize.md |
| 6 | Final Verification | phases/fix_review_and_finalize.md |
Read the relevant phase file when you reach that phase. Do not read both phase files upfront.
### Phase 0: Setup
- Resolve scope — if a scope: parameter was extracted, resolve it to a concrete file list and save it to .mz/task/<task_name>/scope_files.txt; otherwise all project files are eligible.
- Parse criteria — break the input into a checklist of discrete, verifiable criteria (e.g. "all tests pass", "pre-commit clean", "no debug prints in src/").
- Task name — <YYYY_MM_DD>_polish_<slug>, where <YYYY_MM_DD> is today's date (with underscores) and <slug> is the snake_case form of the criteria (max 20 chars); on a same-day collision append _v2, _v3.
- Task dir & state — create .mz/task/<task_name>/ and write state.md with Status, Phase, Started, Iteration (0), and the criteria checklist.
- Task tracking — TaskCreate per pipeline phase. Then read phases/assess_and_fix.md and proceed to Phase 1.
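The task-name step above can be sketched in shell. The slugging details (lowercase, non-alphanumerics to underscores, 20-char cap) and the collision loop are one illustrative reading of the spec, not prescribed commands:

```shell
# Sketch: build <YYYY_MM_DD>_polish_<slug> per Phase 0. The tr/cut/sed
# pipeline is one possible reading of "snake_case of criteria, max 20 chars".
task_name() {
  day=$(date +%Y_%m_%d)
  slug=$(printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -cs 'a-z0-9' '_' \
    | cut -c1-20 \
    | sed 's/^_*//; s/_*$//')
  name="${day}_polish_${slug}"
  candidate="$name"
  v=2
  while [ -d ".mz/task/$candidate" ]; do  # same-day collision: _v2, _v3, ...
    candidate="${name}_v${v}"
    v=$((v + 1))
  done
  printf '%s\n' "$candidate"
}
```

For example, "All tests pass" on 2025-06-01 would yield 2025_06_01_polish_all_tests_pass, or ..._v2 if that task directory already exists.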
### Phase 1.5: User Approval Gate
This orchestrator (not a subagent) must present to the user via AskUserQuestion. This step is interactive and must not be delegated.
Mandatory pre-read: Read .mz/task/<task_name>/assessment.md with the Read tool. Capture the full file contents (criteria checklist showing which items are failing, proposed quick-fix plan with one line per fix target, estimated file count in scope) into context.
Mandatory inline-verbatim presentation: The AskUserQuestion question body must contain the verbatim contents of assessment.md. Never substitute a path, status summary, line count, or <failing criteria list> / <proposed quick-fix plan> / <estimated file count> placeholders — the user must review the actual assessment in the question itself, not have to open the file separately.
Before invoking AskUserQuestion, emit a text block to the user:
**Assessment Ready for Review**
Initial assessment complete. The checklist below shows which criteria are passing and which are failing, along with the proposed quick-fix strategy.
- **Approve** → proceed to Phase 2 (Quick Fixes)
- **Reject** → mark task aborted, stop here
- **Feedback** → incorporate changes, re-run Phase 1, loop back to this gate
Invoke AskUserQuestion with this body (where <verbatim assessment.md contents> is replaced by the bytes you just read):
Phase 1 assessment complete. Please review:
<verbatim assessment.md contents>
Type **Approve** to proceed, **Reject** to cancel, or type your feedback.
Response handling:
- "approve" → update state, proceed to Phase 2.
- "reject" → update state to aborted_by_user and stop. Do not proceed.
- Feedback → incorporate it, re-run Phase 1 if needed, overwrite assessment.md, return to this gate, re-read assessment.md, and re-present via AskUserQuestion with the full new contents — never diff-only, never summary-only, since context compaction may have destroyed the user's memory of earlier iterations. This is a loop — repeat until the user explicitly approves. Never proceed to Phase 2 without explicit approval.
## Techniques
Delegated to the phase files — see the Phase Overview table above.
## Common Rationalizations
| Rationalization | Rebuttal |
|---|---|
| "good enough, ship" | "polish is the last line of defense before users see it" |
| "edge cases are rare" | "every bug report you've ever gotten is an edge case" |
| "tests are green, refactor can wait" | "green-test refactor debt compounds" |
## Red Flags
- Edge cases were deferred to "next sprint" instead of handled now.
- Code was declared "good enough" without a final criteria sweep.
- Polish was equated with refactor — criteria drifted mid-loop.
## Verification
Output the final criteria checklist with every item checked, along with the test run status, lint status, and iteration count. Any unchecked item blocks completion.
## Error Handling
- If a test framework isn't detected, ask the user how to run tests.
- If a criterion can't be verified programmatically, ask the user for a verification command.
- If research fails to identify root cause after 2 attempts, ask the user for context.
- Always save state before spawning agents.
- If a fix makes things worse (more criteria fail than before), revert the change immediately and try a different approach.
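The last rule above (revert when a fix makes more criteria fail) can be sketched as a checkpoint-and-reset step. Here count_failing and apply_fix are hypothetical placeholders for the real criteria check and fix attempt, and the git checkpoint workflow is an assumption, not something the skill mandates:

```shell
# Sketch: checkpoint before each fix, revert if more criteria fail afterward.
# count_failing and apply_fix are hypothetical stand-ins for the real steps.
count_failing() { awk '/^FAIL/ {n++} END {print n+0}' criteria.log 2>/dev/null || echo 0; }
apply_fix()     { :; }  # placeholder for the actual fix attempt

attempt_fix() {
  git add -A
  git -c user.email=polish@local -c user.name=polish \
      commit -qm "polish: checkpoint before fix" --allow-empty
  before=$(count_failing)
  apply_fix
  after=$(count_failing)
  if [ "$after" -gt "$before" ]; then
    git reset --hard -q HEAD   # fix made things worse: revert it immediately
  fi
}
```

The checkpoint commit keeps the revert cheap and local; a real pipeline might prefer git stash or a scratch branch.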
## State Management
After each phase/iteration, update .mz/task/<task_name>/state.md with:
- Current phase and iteration count
- Criteria checklist (checked/unchecked)
- Files modified so far
- Any escalation notes
Track cumulative file changes across iterations so the optimizer knows the full scope.
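One illustrative shape for state.md, using the fields listed in Phase 0 and above; the layout, dates, and file names shown are hypothetical, not prescribed:

```markdown
# State: 2025_06_01_polish_all_tests_pass
Status: in_progress
Phase: 4 (Fix-Test-Review Loop)
Started: 2025-06-01
Iteration: 2

## Criteria
- [x] All tests in test_foo.py pass
- [ ] Pre-commit hooks pass on all changed files

## Files Modified
- src/foo.py (iterations 1, 2)
- tests/test_foo.py (iteration 1)

## Escalation Notes
(none)
```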