| name | polish |
| description | ALWAYS invoke when the user wants to polish code against criteria — fix failing tests, meet quality standards. Triggers: "polish X", "make tests pass", "fix failing tests". When NOT to use: new feature (use build), single bug (use debug). |
# Code Polishing Pipeline
## Overview
Orchestrates iterative polish of existing code against specific completion criteria. Unlike build, which starts from scratch, polish works with what's already there — running tests, diagnosing failures, fixing issues with review loops, and optimizing.
When a behavioral criterion is not yet covered by any test, polish writes the test first (TDD-style), verifies it fails against the current code, then fixes. This keeps every fix anchored to a regression-catching assertion.
## When to Use
- User has existing code that needs to meet specific criteria or quality standards.
- Triggers: "polish X", "make tests pass", "fix failing tests", "clean up the code", "finish this implementation".
- Code exists but is failing tests, lint, or quality gates.
## When NOT to Use
- Starting a new feature from scratch — use build.
- A single isolated bug with a reproducer — use debug.
- If the root cause of failures is known (e.g., a specific bug was identified) — use debug to fix the root cause first, then return to polish for the quality criteria.
- Read-only verification with no fix intent — use verify.
- Map-reduce dead-code cleanup — use optimize.
## Input
$ARGUMENTS — The completion criteria. This can be:
- "All tests in test_foo.py must pass"
- "Pre-commit hooks must pass on all changed files"
- "The WebSocket reconnection must handle timeout correctly"
- "Fix all failing tests and clean up the implementation"
- Any combination of pass/fail criteria and behavioral requirements
If empty, ask the user what needs to be polished.
## Scope Parameter
See skills/shared/scope-parameter.md for the canonical scope modes (branch, global, working) and their git commands. Document any skill-specific overrides or restrictions below this line.
- Default (no scope:): all project files are eligible for edits.

scope: controls which files agents may edit. Tests and linters always run on the full project. Criteria determine what to verify; scope determines where fixes may be applied.
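The scope resolution above can be sketched with ordinary git commands. This is a minimal illustration, assuming a git repository and `main` as the base branch (adjust to your default branch); the canonical commands live in skills/shared/scope-parameter.md:

```shell
# Sketch only: resolve a scope mode to the file list agents may edit.
# Assumes "main" is the base branch -- substitute your project's default.
resolve_scope() {
  case "$1" in
    global)  git ls-files ;;                               # every tracked file
    branch)  git diff --name-only main...HEAD ;;           # changed on this branch
    working) git status --porcelain | awk '{print $NF}' ;; # uncommitted changes
    *)       echo "unknown scope mode: $1" >&2; return 1 ;;
  esac
}

# Example: resolve_scope branch > .mz/task/<task_name>/scope_files.txt
```

Note that the awk field extraction is a simplification; renamed files in porcelain output would need extra handling in a real implementation.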
## Constants
- MAX_FIX_ITERATIONS: 5 — max code-test-review cycles before escalating
- MAX_REVIEW_RETRIES: 3 — max times a review can fail before escalating
- TASK_DIR: .mz/task/ in the project root
## Core Process
### Phase Overview
| Phase | Goal | Details |
|---|---|---|
| 0 | Setup | Inline below |
| 1 | Initial Assessment | phases/assess_and_fix.md |
| 1.5 | User Approval Gate | Inline below |
| 2 | Quick Fixes | phases/assess_and_fix.md |
| 3 | Research (if needed) | phases/assess_and_fix.md |
| 4 | Fix-Test-Review Loop | phases/fix_review_and_finalize.md |
| 5 | Optimization | phases/fix_review_and_finalize.md |
| 6 | Final Verification | phases/fix_review_and_finalize.md |
Read the relevant phase file when you reach that phase. Do not read both phase files upfront.
### Phase 0: Setup
- Resolve scope — if a scope: parameter was extracted, resolve it to a concrete file list and save it to .mz/task/<task_name>/scope_files.txt; otherwise all project files are eligible.
- Parse criteria — break the input into a checklist of discrete, verifiable criteria (e.g. "all tests pass", "pre-commit clean", "no debug prints in src/").
- Task name — <YYYY_MM_DD>_polish_<slug>, where <YYYY_MM_DD> is today's date (with underscores) and <slug> is the snake_case form of the criteria (max 20 chars); on a same-day collision append _v2, _v3.
- Task dir & state — create .mz/task/<task_name>/ and write state.md with Status, Phase, Started, Iteration (0), and the criteria checklist.
- Task tracking — TaskCreate per pipeline phase. Then read phases/assess_and_fix.md and proceed to Phase 1.
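The task-name step above can be sketched in shell. The slugging details (lowercase, non-alphanumerics to underscores, 20-char cap) and the collision loop are one illustrative reading of the spec, not prescribed commands:

```shell
# Sketch: build <YYYY_MM_DD>_polish_<slug> per Phase 0. The tr/cut/sed
# pipeline is one possible reading of "snake_case of criteria, max 20 chars".
task_name() {
  day=$(date +%Y_%m_%d)
  slug=$(printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -cs 'a-z0-9' '_' \
    | cut -c1-20 \
    | sed 's/^_*//; s/_*$//')
  name="${day}_polish_${slug}"
  candidate="$name"
  v=2
  while [ -d ".mz/task/$candidate" ]; do  # same-day collision: _v2, _v3, ...
    candidate="${name}_v${v}"
    v=$((v + 1))
  done
  printf '%s\n' "$candidate"
}
```

For example, "All tests pass" on 2025-06-01 would yield 2025_06_01_polish_all_tests_pass, or ..._v2 if that task directory already exists.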
### Phase 1.5: User Approval Gate
This orchestrator (not a subagent) must present to the user via AskUserQuestion. This step is interactive and must not be delegated.
Mandatory pre-read: Read .mz/task/<task_name>/assessment.md with the Read tool. Capture the full file contents (criteria checklist showing which items are failing, proposed quick-fix plan with one line per fix target, estimated file count in scope) into context.
Mandatory inline-verbatim presentation: The AskUserQuestion question body must contain the verbatim contents of assessment.md. Never substitute a path, status summary, line count, or <failing criteria list> / <proposed quick-fix plan> / <estimated file count> placeholders — the user must review the actual assessment in the question itself, not have to open the file separately.
Before invoking AskUserQuestion, emit a text block to the user:
**Assessment Ready for Review**
Initial assessment complete. The checklist below shows which criteria are passing and which are failing, along with the proposed quick-fix strategy.
- **Approve** → proceed to Phase 2 (Quick Fixes)
- **Reject** → mark task aborted, stop here
- **Feedback** → incorporate changes, re-run Phase 1, loop back to this gate
Invoke AskUserQuestion with this body (where <verbatim assessment.md contents> is replaced by the bytes you just read):
Phase 1 assessment complete. Please review:
<verbatim assessment.md contents>
Type **Approve** to proceed, **Reject** to cancel, or type your feedback.
Response handling:
- "approve" → update state, proceed to Phase 2.
- "reject" → update state to aborted_by_user and stop. Do not proceed.
- Feedback → incorporate it, re-run Phase 1 if needed, overwrite assessment.md, return to this gate, re-read assessment.md, and re-present via AskUserQuestion with the full new contents — never diff-only, never summary-only, since context compaction may have destroyed the user's memory of earlier iterations. This is a loop — repeat until the user explicitly approves. Never proceed to Phase 2 without explicit approval.
## Techniques
Delegated to the phase files — see the Phase Overview table above.
## Common Rationalizations
| Rationalization | Rebuttal |
|---|---|
| "good enough, ship" | "polish is the last line of defense before users see it" |
| "edge cases are rare" | "every bug report you've ever gotten is an edge case" |
| "tests are green, refactor can wait" | "green-test refactor debt compounds" |
## Red Flags
- Edge cases were deferred to "next sprint" instead of handled now.
- Code was declared "good enough" without a final criteria sweep.
- Polish was equated with refactor — criteria drifted mid-loop.
## Verification
Output the final criteria checklist with every item checked, along with the test run status, lint status, and iteration count. Any unchecked item blocks completion.
## Error Handling
- If a test framework isn't detected, ask the user how to run tests.
- If a criterion can't be verified programmatically, ask the user for a verification command.
- If research fails to identify root cause after 2 attempts, ask the user for context.
- Always save state before spawning agents.
- If a fix makes things worse (more criteria fail than before), revert the change immediately and try a different approach.
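The last rule above (revert when a fix makes more criteria fail) can be sketched as a checkpoint-and-reset step. Here count_failing and apply_fix are hypothetical placeholders for the real criteria check and fix attempt, and the git checkpoint workflow is an assumption, not something the skill mandates:

```shell
# Sketch: checkpoint before each fix, revert if more criteria fail afterward.
# count_failing and apply_fix are hypothetical stand-ins for the real steps.
count_failing() { awk '/^FAIL/ {n++} END {print n+0}' criteria.log 2>/dev/null || echo 0; }
apply_fix()     { :; }  # placeholder for the actual fix attempt

attempt_fix() {
  git add -A
  git -c user.email=polish@local -c user.name=polish \
      commit -qm "polish: checkpoint before fix" --allow-empty
  before=$(count_failing)
  apply_fix
  after=$(count_failing)
  if [ "$after" -gt "$before" ]; then
    git reset --hard -q HEAD   # fix made things worse: revert it immediately
  fi
}
```

The checkpoint commit keeps the revert cheap and local; a real pipeline might prefer git stash or a scratch branch.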
## State Management
After each phase/iteration, update .mz/task/<task_name>/state.md with:
- Current phase and iteration count
- Criteria checklist (checked/unchecked)
- Files modified so far
- Any escalation notes
Track cumulative file changes across iterations so the optimizer knows the full scope.
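One illustrative shape for state.md, using the fields listed in Phase 0 and above; the layout, dates, and file names shown are hypothetical, not prescribed:

```markdown
# State: 2025_06_01_polish_all_tests_pass
Status: in_progress
Phase: 4 (Fix-Test-Review Loop)
Started: 2025-06-01
Iteration: 2

## Criteria
- [x] All tests in test_foo.py pass
- [ ] Pre-commit hooks pass on all changed files

## Files Modified
- src/foo.py (iterations 1, 2)
- tests/test_foo.py (iteration 1)

## Escalation Notes
(none)
```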