[rollback] → Phase 1:plan → [approval] → Phase 2:structure
  → Phase 3:implementation [loop if partial]
  → [quality-gate]
  → Phase 4:refactoring → Phase 5:deduplication
  → Phase 6:review (parallel scans → dedupe → fix)
  → Phase 7:testing → Phase 8:evaluation
  → [quality-gate]
  → [lessons written]

When to Use

Refactoring a module, quality pass on a component, technical debt cleanup

Don't use for: New features → /build | Simple changes → /quick-change

Flags

Flag	Purpose
`--dry-run`	Show the phase table and stop
`--rollback`	Restore from last improve stash

Phase Table

#	Skill	Model	Gate Marker	Notes
1	plan	sonnet	PLAN_COMPLETE	Pause for user approval. Loads rubrics.
2	structure	sonnet	STRUCTURE_COMPLETE	Map existing, design changes
3	implementation	opus	IMPLEMENTATION_COMPLETE	Loop if partial (max 5). Quality gate runs after.
4	refactoring	sonnet	REFACTORING_COMPLETE
5	deduplication	haiku	DEDUPLICATION_COMPLETE
6	review	sonnet	REVIEW_COMPLETE	Parallel scans → dedupe → fix
7	testing	sonnet	TESTING_COMPLETE
8	evaluation	Bash+sonnet	EVALUATION_COMPLETE	Codex scores 1-100, fix→rescore loop (max 3), lessons.

Orchestrator Rules

NEVER do phase work yourself — you are a sequencer, not an implementer
NEVER skip a phase — every phase runs in order
NEVER proceed without gate marker — the subagent result must contain the marker string
ALWAYS present Phase 1 plan to user for approval before continuing
ALWAYS create the rollback point before any phase runs
ALWAYS record metrics after each phase completes

Rollback

If --rollback flag is set:

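# Find the most recent improve stash (git lists the newest first) and pop it
ref=$(git stash list | grep -m1 "improve:" | cut -d: -f1)
if [ -n "$ref" ]; then git stash pop "$ref"; else echo "No improve stash found"; fi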

Then stop.

Dry Run

If --dry-run, print the phase table and stop.

Execution

Step 0: Rollback Point

git stash push -m "improve:$(basename {TARGET}):$(date +%s)"

Report the stash ref to the user.

Step 1: Phases 1-3 (Plan + Structure + Implementation)

For each phase, spawn a single Task subagent (subagent_type: "general-purpose") with the model from the Phase Table.

Phase 1-2 prompt:

Read the skill file at .claude/phases/{SKILL_NAME}/SKILL.md
and execute ALL of its instructions against: {TARGET}

This is an IMPROVEMENT workflow on existing code. Focus on analysis,
refactoring, and enhancement rather than greenfield creation.

Follow every step in the skill. Do not skip any steps.
When complete, end your final message with the marker: {GATE_MARKER}

After Phase 1: Read the plan file. Present the summary to the user. Ask for approval (Approve / Reject / Revise). Do not proceed until approved.

Phase 3 prompt (implementation):

Read the skill file at .claude/phases/implementation/SKILL.md
and execute ALL of its instructions against: {TARGET}

This is an IMPROVEMENT workflow on existing code. Focus on analysis,
refactoring, and enhancement rather than greenfield creation.

IMPORTANT: Follow the compile loop. For each unit: refresh the relevant
canon principle, write the code, then compile-check before starting the
next unit. Do not write all code first and check later.

QUALITY GATE RULES — the gate runs immediately after this phase.
Every violation below causes a pipeline failure and a retry. Write code
that passes on the first attempt:

  SECURITY (instant fail):
  - No hardcoded secrets (API keys, passwords, tokens, private keys)
  - No exec()/execSync() with template literals — use spawn() with args
  - No path.join/resolve with user input without traversal validation
  - No eval(), innerHTML assignment, or document.write()

  NAMING:
  - No parameters named: data, info, result, item, obj, val, tmp, temp, ret, res
  - No single-letter parameters (except _, i, j, k, e)
  - No exported function names shorter than 4 characters
  - No files named: utils.ts, helpers.ts, misc.ts, common.ts, shared.ts
  - No abbreviations in exports: mgr, impl, proc, svc, repo

  SIZE LIMITS:
  - Functions: max 30 significant lines
  - Files: max 300 lines
  - Parameters per function: max 4
  - Exports per file: max 10 (index.ts exempt)
  - Project imports per file: max 8
  - Class methods: max 10
  - Inheritance depth: max 2

  CODE QUALITY:
  - No magic numbers (except -1, 0, 1, 2) — extract to named constants
  - No magic strings in conditionals — extract to constants
  - No circular imports
  - No console.error(err) — use err.message
  - No existsSync() then readFileSync() on same path — use try/catch
  - No readFileSync() right after writeFileSync() on same path
  - No truthy check on optional numbers (0 is falsy) — use !== undefined
  - No JSDoc that restates the function name
  - Types/interfaces must appear before functions in each file
  - No empty catch/except/rescue blocks
  - Max 3 TODO/FIXME/HACK markers per file
  - No http:// URLs (except localhost) — use https://
  - No hardcoded IP:port values — use config/env vars

Follow every step in the skill. Do not skip any steps.
When complete, end your final message with the marker: IMPLEMENTATION_COMPLETE

Phase 3 completion loop: If output contains IMPLEMENTATION_PARTIAL, re-run targeting only remaining items. Max 5 iterations. If items remain after 5, report to user and ask whether to continue or halt.
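
The completion loop above can be sketched as a bounded retry. This is a minimal sketch, assuming that "max 5 iterations" counts the first run and that the subagent output is plain text; `runImplementationLoop`, `PhaseRun`, and `LoopOutcome` are hypothetical names, not part of the pipeline.

```typescript
// Hypothetical callback: spawns the Phase 3 subagent and returns its final text.
// remainingOnly is true on re-runs, which should target only the leftover items.
type PhaseRun = (remainingOnly: boolean) => string;

interface LoopOutcome {
  status: "complete" | "missing-marker" | "needs-user-decision";
  iterations: number;
}

const MAX_IMPLEMENTATION_ITERATIONS = 5;
const PARTIAL_MARKER = "IMPLEMENTATION_PARTIAL";
const COMPLETE_MARKER = "IMPLEMENTATION_COMPLETE";

function runImplementationLoop(runPhase: PhaseRun): LoopOutcome {
  for (let iteration = 1; iteration <= MAX_IMPLEMENTATION_ITERATIONS; iteration++) {
    const output = runPhase(iteration > 1);
    if (output.includes(PARTIAL_MARKER)) continue; // work remains, run again
    const status = output.includes(COMPLETE_MARKER) ? "complete" : "missing-marker";
    return { status, iterations: iteration };
  }
  // Items still remain after the last allowed run: the user decides what happens next.
  return { status: "needs-user-decision", iterations: MAX_IMPLEMENTATION_ITERATIONS };
}
```

A `missing-marker` outcome would be handled by the Gate Check retry rather than by this loop.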

Step 2: Quality Gate (after Phase 3)

Run via Bash (no subagent):

tsx scripts/quality-gate.ts {TARGET}

If the exit code is non-zero, pass the error output back to Phase 3 for correction (max 2 retries).

Step 3: Phases 4-5 (Refine)

Phase 4-5 prompt:

Read the skill file at .claude/phases/{SKILL_NAME}/SKILL.md
and execute ALL of its instructions against: {TARGET}

This is an IMPROVEMENT workflow on existing code.

Follow every step in the skill. Do not skip any steps.
When complete, end your final message with the marker: {GATE_MARKER}

Step 4: Phase 6 (Review)

All reviewers see the same code state. One fix pass at the end.

4a. Parallel scans — spawn 4 Task agents simultaneously:

Agent A (gemini-scan): model: sonnet

Read the skill at workflow-skills/utils/gemini-scan/SKILL.md.
Execute against: {TARGET}

Run Gemini TWICE:
1. focus: "general" — code quality, architecture, AI smells
2. focus: "security" — think like an attacker, find vulnerabilities

Combine findings from both passes. Output all findings as:
[file:line] — description (severity)

End with: GEMINI_SCAN_DONE

Agent B (codex-scan): model: sonnet

Read the skill at workflow-skills/utils/codex-scan/SKILL.md.
Execute against: {TARGET}

Output all findings as:
[file:line] — description (category)

End with: CODEX_SCAN_DONE

Agent C (qodana-scan): model: haiku

Read the skill at workflow-skills/utils/qodana-scan/SKILL.md.
Execute against: {TARGET}

Output all findings as:
[file:line] — description (severity)

End with: QODANA_SCAN_DONE

Agent D (ai-smell-scan): model: haiku

Read the skill at workflow-skills/utils/ai-smell-scan/SKILL.md.
Execute against: {TARGET}

Output all findings as:
[file:line] [smell type]: description

End with: AI_SMELL_SCAN_DONE

4b. Deduplicate findings — the orchestrator (not an agent) parses all 4 scan outputs:

Extract [file:line] description from each
Same file + line within 5 lines + similar description = one finding
Keep the most specific description
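
The deduplication rules above leave "similar description" and "most specific" open. The sketch below is one possible reading, assuming word-overlap (Jaccard) similarity with a 0.5 cutoff and treating the longest description as the most specific; `parseFindings`, `dedupeFindings`, and both thresholds' names are hypothetical.

```typescript
interface Finding {
  file: string;
  line: number;
  description: string;
}

const LINE_WINDOW = 5;
const SIMILARITY_THRESHOLD = 0.5; // assumed cutoff, not specified by the pipeline

// Matches "[file:line] rest-of-line"; other lines (e.g. *_SCAN_DONE markers) are ignored.
const FINDING_PATTERN = /^\[([^\]]+):(\d+)\]\s*(.*)$/;

function parseFindings(scanOutput: string): Finding[] {
  const findings: Finding[] = [];
  for (const rawLine of scanOutput.split("\n")) {
    const match = FINDING_PATTERN.exec(rawLine.trim());
    if (match === null) continue;
    // Drop a leading dash separator (em dash, en dash, or hyphen) if present.
    const description = match[3].replace(/^[\u2014\u2013-]\s*/, "");
    findings.push({ file: match[1], line: Number(match[2]), description });
  }
  return findings;
}

function uniqueWords(text: string): string[] {
  const words = text.toLowerCase().split(/\W+/).filter((word) => word.length > 0);
  return words.filter((word, index) => words.indexOf(word) === index);
}

// Jaccard similarity over lowercase word lists: shared words / all distinct words.
function similarity(first: string, second: string): number {
  const firstWords = uniqueWords(first);
  const secondWords = uniqueWords(second);
  const shared = firstWords.filter((word) => secondWords.indexOf(word) !== -1).length;
  const unionSize = firstWords.length + secondWords.length - shared;
  return unionSize === 0 ? 1 : shared / unionSize;
}

function dedupeFindings(findings: Finding[]): Finding[] {
  const kept: Finding[] = [];
  for (const candidate of findings) {
    const duplicateIndex = kept.findIndex(
      (existing) =>
        existing.file === candidate.file &&
        Math.abs(existing.line - candidate.line) <= LINE_WINDOW &&
        similarity(existing.description, candidate.description) >= SIMILARITY_THRESHOLD,
    );
    if (duplicateIndex === -1) {
      kept.push(candidate);
    } else if (candidate.description.length > kept[duplicateIndex].description.length) {
      kept[duplicateIndex] = candidate; // keep the longer (assumed more specific) wording
    }
  }
  return kept;
}
```

Concatenating the four scan outputs and calling `dedupeFindings(parseFindings(allOutput))` would produce the list passed to the fix agent as {DEDUPED_FINDINGS_LIST}.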

4c. Fix — if findings exist, spawn 1 fix agent (model: sonnet):

Fix these review findings in {TARGET}:

{DEDUPED_FINDINGS_LIST}

SCOPE CONSTRAINT: Only modify code directly related to findings.
Do not refactor, rename, or restructure code that was not flagged.

COMPLEXITY BUDGET: Do not increase overall complexity. Net-zero or
net-negative lines/functions/types.
EXCEPTION: Security fixes are exempt.

NO SILENT FAILURES: Do not change a throw/crash to a log-and-continue.

Apply each fix. Run tests after.
When complete, end with: REVIEW_COMPLETE

If no findings from any scan, skip the fix agent and emit REVIEW_COMPLETE.

Step 5: Phases 7-8 (Verify)

Phase 7 (testing) prompt:

Read the skill file at .claude/phases/testing/SKILL.md
and execute ALL of its instructions against: {TARGET}

This is an IMPROVEMENT workflow on existing code.

Follow every step in the skill. Do not skip any steps.
When complete, end your final message with the marker: TESTING_COMPLETE

Phase 8 (evaluation) — orchestrator-owned:

The orchestrator owns scoring and fix coordination. Scoring runs via Bash — never delegated to an agent.

Prepare: Read .claude/phases/evaluation/SKILL.md for the scorecard prompt, rescore prompt, classification tree, and report template. Load rubrics per its Rubric Loading section. Build {SCORECARD_PROMPT} by inserting {RUBRIC_CRITERIA} into the scorecard template.

8a. Score via Bash (no agent):

command -v codex >/dev/null 2>&1 && echo "CODEX_AVAILABLE" || echo "CODEX_UNAVAILABLE"

If CODEX_UNAVAILABLE: skip evaluation, note in report, continue to Step 6. Otherwise:

rm -f /tmp/lens-eval-scores.md
{SCORECARD_PROMPT}
cat /tmp/lens-eval-scores.md

8b. Orchestrator parses output — extract ISSUE: lines and the SCORE: NN/100 line. Save as {INITIAL_SCORE} and {ISSUES_LIST}.
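
The parsing step in 8b could look like the sketch below. It assumes one `ISSUE:` entry per line and a `SCORE: NN/100` line on its own; the text after `ISSUE:` is kept verbatim because its exact layout is defined by the evaluation skill, not here. `parseScorecard` and `Scorecard` are hypothetical names.

```typescript
interface Scorecard {
  score: number | undefined; // undefined when no SCORE line was found
  issues: string[];
}

const SCORE_PATTERN = /^SCORE:\s*(\d{1,3})\/100\s*$/;
const ISSUE_PREFIX = "ISSUE:";

function parseScorecard(output: string): Scorecard {
  const scorecard: Scorecard = { score: undefined, issues: [] };
  for (const rawLine of output.split("\n")) {
    const line = rawLine.trim();
    const scoreMatch = SCORE_PATTERN.exec(line);
    if (scoreMatch !== null) {
      scorecard.score = Number(scoreMatch[1]);
    } else if (line.startsWith(ISSUE_PREFIX)) {
      scorecard.issues.push(line.slice(ISSUE_PREFIX.length).trim());
    }
  }
  return scorecard;
}
```

Here `score` maps to {INITIAL_SCORE} and `issues` to {ISSUES_LIST}. An `undefined` score signals that the scorer output did not follow the expected format.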

8c–8d. Fix→Rescore loop (max 3 iterations):

Set {CURRENT_SCORE} = {INITIAL_SCORE}, {CURRENT_ISSUES} = {ISSUES_LIST}, {ALL_FIX_APPLIED} = empty.

For each iteration (while {CURRENT_ISSUES} is non-empty AND iteration <= 3):

8c. Fix all issues — spawn FIX agent (subagent_type: "general-purpose", model: sonnet):

Fix ALL of these issues in {TARGET}:

{CURRENT_ISSUES}

For each fix, print: FIX_APPLIED: {file:line} | {what changed}

After all fixes: npm test

PROHIBITED: committing, re-scoring, modifying code not cited in the issues
End with: FIX_COMPLETE

8d. Rescore via Bash — collect FIX_APPLIED lines from fix agent, append to {ALL_FIX_APPLIED}. Build {RESCORE_PROMPT} using the Rescore Prompt template, injecting {CURRENT_SCORE} and the new FIX_APPLIED lines:

rm -f /tmp/lens-eval-scores.md
{RESCORE_PROMPT}
cat /tmp/lens-eval-scores.md
rm -f /tmp/lens-eval-scores.md

Parse SCORE: NN/100 → update {CURRENT_SCORE}. Parse remaining ISSUE: lines → update {CURRENT_ISSUES}.

Exit loop if: no remaining issues, OR score did not improve from previous iteration, OR iteration limit reached.

End of loop. Set {FINAL_SCORE} = {CURRENT_SCORE}, {REMAINING_ISSUES} = {CURRENT_ISSUES}.
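
The control flow of the fix→rescore loop can be sketched with the fix agent and the Bash scorer injected as callbacks. This is an illustration under those assumptions, not the orchestrator's actual code; `runFixRescoreLoop`, `ScoreState`, and the callback signatures are hypothetical.

```typescript
interface ScoreState {
  score: number;
  issues: string[];
}

interface LoopResult {
  finalScore: number;
  remainingIssues: string[];
  allFixApplied: string[];
}

const MAX_FIX_ITERATIONS = 3;

function runFixRescoreLoop(
  initial: ScoreState,
  // Hypothetical: spawns the fix agent and returns its FIX_APPLIED lines.
  runFixAgent: (issues: string[]) => string[],
  // Hypothetical: runs the rescore prompt via Bash and parses the output.
  rescore: (previousScore: number, newFixes: string[]) => ScoreState,
): LoopResult {
  let current = initial;
  const allFixApplied: string[] = [];
  for (
    let iteration = 1;
    iteration <= MAX_FIX_ITERATIONS && current.issues.length > 0;
    iteration++
  ) {
    const newFixes = runFixAgent(current.issues);
    allFixApplied.push(...newFixes);
    const rescored = rescore(current.score, newFixes);
    const improved = rescored.score > current.score;
    current = rescored; // the latest score and issues are always recorded
    if (!improved) break; // score stalled or dropped, so stop fixing
  }
  return {
    finalScore: current.score,
    remainingIssues: current.issues,
    allFixApplied,
  };
}
```

All three exit conditions appear here: the issue list empties, the score fails to improve, or the third iteration completes.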

8e. Lessons — spawn LESSON agent (subagent_type: "general-purpose", model: sonnet):

Classify fixes and write evaluation outputs. Do NOT modify source code.

Initial score: {INITIAL_SCORE}/100
Final score: {FINAL_SCORE}/100
Issues found: {ISSUES_LIST}
Fixes applied: {ALL_FIX_APPLIED}
Remaining issues: {REMAINING_ISSUES}

Classify each fix using this tree:
- Code pattern to avoid? YES + general → LESSON in both .claude/lessons.md and .claude/universal-lessons.md
- Code pattern to avoid? YES + project-specific → LESSON in .claude/lessons.md only
- Suggests pipeline/tool change? → PROPOSAL in .claude/eval-proposals.md
- Neither → eval-report.md only

Category: LOGIC | DESIGN | CODE_QUALITY | DUPLICATION | AI_SMELL

Read .claude/lessons.md and .claude/universal-lessons.md — skip duplicates.
Write .claude/eval-report.md (replace file using template from
.claude/phases/evaluation/SKILL.md Report Template section).
Append to lessons + proposals.
Verify writes by reading each file.

End with: LESSONS_COMPLETE

The orchestrator checks for FIX_COMPLETE and LESSONS_COMPLETE markers. After the lesson agent completes, emit EVALUATION_COMPLETE.
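
The classification tree in the lesson-agent prompt can be read as a small decision function. The sketch below assumes the branches are evaluated in the order listed and that every fix also appears in eval-report.md; `FixTraits`, `Classification`, and `classifyFix` are hypothetical names, and the actual judgement is made by the lesson agent, not by code.

```typescript
// Hypothetical traits the lesson agent would judge for each FIX_APPLIED line.
interface FixTraits {
  isPatternToAvoid: boolean;
  isGeneral: boolean; // only meaningful when isPatternToAvoid is true
  suggestsPipelineChange: boolean;
}

interface Classification {
  kind: "LESSON" | "PROPOSAL" | "REPORT_ONLY";
  appendTo: string[]; // files appended to, in addition to the report
}

const LESSONS_FILE = ".claude/lessons.md";
const UNIVERSAL_LESSONS_FILE = ".claude/universal-lessons.md";
const PROPOSALS_FILE = ".claude/eval-proposals.md";

function classifyFix(traits: FixTraits): Classification {
  if (traits.isPatternToAvoid) {
    const appendTo = traits.isGeneral
      ? [LESSONS_FILE, UNIVERSAL_LESSONS_FILE]
      : [LESSONS_FILE];
    return { kind: "LESSON", appendTo };
  }
  if (traits.suggestsPipelineChange) {
    return { kind: "PROPOSAL", appendTo: [PROPOSALS_FILE] };
  }
  return { kind: "REPORT_ONLY", appendTo: [] };
}
```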

Step 6: Quality Gate (final)

npm test && tsx scripts/quality-gate.ts {TARGET}

If the exit code is non-zero, pass the error output to Phase 7 (testing) for correction (max 2 retries). Once Phase 7 has fixed the failures and the gate passes, do NOT re-run Phase 8.

Step 7: Report

Improve: {TARGET}
  Rollback: stash@{N}

  ✓ Design    plan approved
  ✓ Build     implemented, gate passed
  ✓ Refine    refactored + deduped
  ✓ Review    4 scans, {N} findings fixed
  ✓ Verify    {N} tests, 0 failures
  ✓ Evaluate  {initial}/100 → {final}/100, lessons written

Rollback: /improve --rollback

Gate Check

After each subagent completes, check that its result contains the gate marker string.

Passes: Report phase completion, proceed.
Fails: Retry (same prompt) up to 3 times. If still failing, halt and report.
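
The gate check amounts to a substring test plus a bounded retry. This is a minimal sketch, assuming "up to 3 times" means three retries after the first attempt; `runWithGateCheck` and the `runSubagent` callback are hypothetical names.

```typescript
const MAX_GATE_RETRIES = 3;

interface GateOutcome {
  passed: boolean;
  attempts: number;
}

// runSubagent is a hypothetical callback that re-sends the same prompt
// and returns the subagent's final message.
function runWithGateCheck(runSubagent: () => string, gateMarker: string): GateOutcome {
  const maxAttempts = 1 + MAX_GATE_RETRIES; // first run plus retries
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (runSubagent().includes(gateMarker)) {
      return { passed: true, attempts: attempt };
    }
  }
  return { passed: false, attempts: maxAttempts }; // halt and report to the user
}
```

For Phase 4, for example, the call would pass `"REFACTORING_COMPLETE"` as `gateMarker`, matching the Gate Marker column of the Phase Table.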

6 Mechanisms

#	Mechanism	Where	What
1	plan-approval	after Phase 1	User approves plan before building
2	quality-gate	after Phase 3, after Phase 8	Phase 3: lint + code pattern checks. Phase 8: lint + tests + code pattern checks.
3	implementation-loop	Phase 3	Re-run for remaining work items. Max 5.
4	gate-retry	all phases	Check for marker string. Retry 3x.
5	rollback	before pipeline	Git stash.
6	learning	Phases 1-5 read, Phase 8 writes	Lessons files + rubrics.