| name | dev-review |
| description | This skill should be used when the user asks to 'review the code', 'check implementation quality', or 'run code review'. |
| user-invocable | false |
| disable-model-invocation | true |
| hooks | {"PreToolUse":[{"matcher":"Write|Edit","hooks":[{"type":"command","command":"uv run python3 ${CLAUDE_PLUGIN_ROOT}/hooks/dev-delegation-guard.py"}]},{"matcher":"Agent","hooks":[{"type":"command","command":"GATE_ARTIFACT=.planning/VALIDATION.md GATE_STATUS=validated GATE_BLOCKED_TOOLS=Agent GATE_DESCRIPTION=\"Test gap validation\" GATE_REMEDY=\"Return to dev-implement and run dev-test-gaps (Phase 5.5). VALIDATION.md must have status: validated (not gaps_found) before review starts.\" uv run python3 ${CLAUDE_PLUGIN_ROOT}/hooks/phase-gate-guard.py"}]}]} |
Iteration topology: parallel multi-reviewer fan-out (fresh read-only subagents; main chat reconciles)
Context Check
Before starting this phase, check remaining context:
| Level | Remaining | Action |
|---|
| Normal | >35% | Proceed |
| Warning | 25-35% | Finish the current step, then invoke dev-handoff |
| Critical | ≤25% | Invoke dev-handoff immediately — resume fresh |
At Warning/Critical: Read ${CLAUDE_SKILL_DIR}/../../skills/dev-handoff/SKILL.md and follow its instructions.
The Iron Law of Topic Changes
If the user sends a message NOT about the current review, announce the loop pause before responding — then resume. dev-review runs a REVIEW_STATE.md fix-and-re-review loop; silently abandoning it (as dev-debug:121-139 documents) drops the structure the user invoked.
Protocol:
- Announce: "Pausing the dev-review loop to address your request."
- Handle the off-topic request (normal tools allowed — you're outside the loop).
- Announce: "Resuming dev-review. Re-reading .planning/REVIEW_STATE.md for current state."
- Re-read
.planning/REVIEW_STATE.md and continue the review/fix iteration.
If the message could be EITHER a new topic OR part of the review, ask before assuming — do NOT silently abandon the loop.
Contents
Code Review
Load shared enforcement:
Auto-load all constraints matching applies-to: dev-review:
!uv run python3 ${CLAUDE_SKILL_DIR}/../../scripts/load-constraints.py dev-review
You MUST have these constraints loaded before proceeding. No claiming you "remember" them.
Dynamic plan re-read: Before starting review, re-read .planning/SPEC.md and .planning/PLAN.md to catch any requirements or tasks added during implementation. Do not rely on cached state from prior phases.
Single-pass code review combining spec compliance and quality checks. Uses confidence-based filtering to report only high-priority issues.
## Prerequisites - Test Output Gate
Do NOT start review without test evidence.
Before reviewing, verify these preconditions:
.planning/LEARNINGS.md contains actual test output
- Tests were run (not just written)
- Test output shows PASS (not SKIP, not assumed)
What Counts as Test Evidence
| Valid Evidence | NOT Valid |
|---|
meson test output with results | "Tests should pass" |
pytest output showing PASS | "I wrote tests" |
| Screenshot of working UI | "It looks correct" |
| Playwright snapshot showing expected state | "User can verify" |
| D-Bus command output | "The feature works" |
| E2E test output with user flow verified | "Unit tests pass" (for UI changes) |
### The E2E Evidence Requirement
FOR USER-FACING CHANGES: Unit test evidence is INSUFFICIENT.
Before approving user-facing changes, verify:
- Unit tests pass (necessary but not sufficient)
- E2E tests pass (required for approval)
- Visual evidence exists (screenshots/snapshots for UI)
| Change Type | Unit Evidence | E2E Evidence | Approval? |
|---|
| Internal refactor | Yes | N/A | APPROVE |
| API change | Yes | Missing | BLOCKED |
| UI change | Yes | Missing | BLOCKED |
| User workflow | Yes | Missing | BLOCKED |
Return BLOCKED if E2E evidence is missing for user-facing changes.
"Unit tests pass" without E2E for UI changes is NOT approvable.
Gate Check
Check LEARNINGS.md for test output:
rg -E "(PASS|OK|SUCCESS|\d+ passed)" .planning/LEARNINGS.md
If no test output is found, STOP and return to /dev-implement.
"It should work" is NOT evidence. Test output IS evidence.
Review Strategy Choice
After verifying test output in LEARNINGS.md, choose review strategy.
Skip this choice when:
- Trivial changes (< 50 LOC, single file)
- Purely cosmetic changes (formatting, comments)
- Automated refactoring (rename, extract)
- Internal utility functions (not user-facing or security-sensitive)
Step 1: Probe Codex availability (silent)
Codex provides an out-of-process adversarial reviewer that uses a different
model family than Claude — the diversity catches issues a Claude-reviewing-Claude
loop would miss. When installed and authenticated, it is the default
adversarial path. When unavailable, fall back to the existing Claude-based
flow without prompting the user about installation.
Read ${CLAUDE_SKILL_DIR}/../../references/codex-availability.md for the full
probe and invocation contract. Execute the probe before asking the user:
CODEX_SCRIPT=$(find "$HOME/.claude/plugins/cache/openai-codex/codex" -maxdepth 3 -name codex-companion.mjs -type f 2>/dev/null | sort -rV | head -1)
if [ -n "$CODEX_SCRIPT" ]; then
node "$CODEX_SCRIPT" setup --json 2>/dev/null | jq -r '.ready // false'
else
echo "false"
fi
Set CODEX_READY=true only when the probe prints true. Otherwise
CODEX_READY=false and skip Codex entirely — do not announce its absence.
Step 2: Ask the user
If CODEX_READY=true:
AskUserQuestion(questions=[{
"question": "How should we review this implementation?",
"header": "Review Strategy",
"options": [
{"label": "Codex adversarial review (Recommended)", "description": "Out-of-process adversarial review via Codex. Different model family from Claude — catches issues a Claude-on-Claude loop misses. Default for adversarial review."},
{"label": "Single Claude reviewer", "description": "Combined Claude review covering spec compliance and code quality. Faster, lower overhead. Use when Codex is overkill."},
{"label": "Parallel Claude review (Thorough)", "description": "Spawn 3 specialized Claude reviewers (Security, Performance, Tests). Use when Codex is unavailable or you want multi-perspective Claude review."}
],
"multiSelect": false
}])
If CODEX_READY=false:
AskUserQuestion(questions=[{
"question": "How should we review this implementation?",
"header": "Review Strategy",
"options": [
{"label": "Single reviewer (Default)", "description": "Combined review covering spec compliance and code quality. Faster, lower overhead."},
{"label": "Parallel review (Thorough)", "description": "Spawn 3 specialized reviewers (Security, Performance, Tests). Use for security-sensitive, performance-critical, or test-heavy PRs. Requires reconciliation."}
],
"multiSelect": false
}])
Routing:
Codex Adversarial Review
Use this section when the user chose Codex adversarial review.
Reference: See references/codex-availability.md for the full invocation
contract, JSON schema, and verdict mapping table.
1. Prerequisites Check
Before invoking Codex, verify (same as the other review paths):
- Test evidence exists — LEARNINGS.md contains actual test output
- E2E evidence for UI changes — user-facing changes have E2E test output
- SPEC.md exists — for REQ-ID tagging of findings post-hoc
- Git repo present — Codex adversarial review is git-diff scoped
If any prerequisite fails, STOP and return BLOCKED to /dev-implement.
2. Estimate Scope and Choose Wait vs Background
git status --short --untracked-files=all
git diff --shortstat --cached
git diff --shortstat
Wait when the diff is clearly tiny (1-2 files, no untracked dir-sized changes).
Otherwise launch in background.
3. Invoke Codex
Foreground (small diff):
CODEX_SCRIPT=$(find "$HOME/.claude/plugins/cache/openai-codex/codex" -maxdepth 3 -name codex-companion.mjs -type f 2>/dev/null | sort -rV | head -1)
node "$CODEX_SCRIPT" adversarial-review --wait
Background (anything bigger):
Launch with Bash(..., run_in_background: true) and tell the user:
"Codex adversarial review started in the background. Check /codex:status for progress."
Then await completion notification before proceeding to step 4.
Optional focus text — append SPEC.md context to weight the review:
node "$CODEX_SCRIPT" adversarial-review --wait "focus: REQ-AUTH-01 token rotation under retry"
4. Parse Verdict
Codex returns JSON validated against its review-output schema. Top-level fields:
verdict (approve | needs-attention), summary, findings[], next_steps[].
Each finding has severity, title, body, file, line_start, line_end,
confidence (0-1 float), recommendation.
Apply the iron law: only confidence >= 0.8 findings block. Multiply by 100
when displaying alongside Claude-style scores.
| Codex result | dev-review verdict |
|---|
verdict: approve | APPROVED |
needs-attention + any finding ≥ 0.8 confidence | CHANGES_REQUIRED |
needs-attention + all findings < 0.8 confidence | APPROVED (log advisory findings to LEARNINGS.md) |
5. Tag Findings to Requirements
Codex doesn't know SPEC.md REQ-IDs. For each blocking finding:
- Read
.planning/SPEC.md
- Tag the finding with the most likely REQ-ID (or
OUT-OF-SPEC)
OUT-OF-SPEC findings are advisory unless user opts in
6. Report
Use the same output structure as ## Required Output Structure below, with
Reviewer: Codex (adversarial) in the header. Each issue includes the
Codex confidence (×100) and the REQ-ID you tagged in step 5.
7. Iteration & Re-Review
Codex adversarial review participates in the same REVIEW_STATE.md loop as
Claude reviewers — increment iteration on CHANGES_REQUIRED, escalate at
iteration 3. Re-runs stay on Codex unless it becomes unavailable between
runs (re-probe each iteration).
The "Iron Law of Re-Review" still applies: implementer claims "fixed" → main
chat re-invokes Codex via the same command — no spot-checks.
Phase Complete (Codex Adversarial)
After Codex review completes:
If APPROVED: Immediately invoke the dev-verify skill:
Read ${CLAUDE_SKILL_DIR}/../../skills/dev-verify/SKILL.md and follow its instructions.
If CHANGES_REQUIRED: Return to /dev-implement with the parsed findings.
If BLOCKED (test evidence missing): Return to /dev-implement to collect test evidence.
Parallel Review (Thorough)
Use this section when user chose "Parallel review (Thorough)" above.
Prerequisite: Requires CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS enabled. If unavailable, fall back to single reviewer.
1. Prerequisites Check
Before spawning reviewers, verify:
- Test evidence exists - LEARNINGS.md contains actual test output (check first!)
- E2E evidence for UI changes - User-facing changes have E2E test output (not just unit tests)
- Changed files identified -
git diff --name-only to scope review
- SPEC.md exists - reviewers verify against spec, not assumptions
If any prerequisite fails, STOP and return BLOCKED to /dev-implement.
2. When to Use Parallel Review
Use parallel review when:
- Security-sensitive changes (auth, permissions, data access, crypto, input validation)
- Performance-critical paths (tight loops, database queries, API endpoints)
- Test-heavy PRs (new test infrastructure, testing frameworks, E2E flows)
- Complex PRs (4+ files changed, multiple subsystems affected)
- High-stakes deployments (production hotfixes, customer-facing releases)
Do NOT use when:
- Simple bug fixes (< 50 LOC, single file)
- Documentation or config changes
- Automated refactoring (no logic changes)
- Internal utilities (not security-sensitive or performance-critical)
- Overhead exceeds benefit (< 4 files changed)
3. Create Team and Spawn Reviewers
Team Creation
TeamCreate(name="Code Review", task_description="Parallel code review with 3 specialized reviewers")
Press Shift+Tab to enter delegate mode. The lead coordinates reviews, does NOT review code directly.
Spawn 3 Reviewers
Each reviewer receives a self-contained prompt from a reference file. Reviewers start with a blank conversation and do NOT auto-load skills. Read the prompt, substitute variables, and paste it in full.
Tool Restrictions: All reviewers are READ-ONLY. Dispatch each with allowed_tools=["Read", "Glob", "Grep", "Bash(read-only)"]. Reviewers MUST NOT use Write or Edit tools. They read code, analyze it, and report findings — the main chat handles all fixes.
Generate the review package ONCE (shared across all lenses):
The diff is the largest artifact in review. Write it to a file ONCE and point every
reviewer at the path — do not paste the diff into three prompts (it parks the most
expensive bytes in context three times over).
bash ${CLAUDE_SKILL_DIR}/../../scripts/dev/review-package.sh <BASE_SHA> HEAD
Before spawning, substitute these variables in each prompt:
REVIEW_PACKAGE_PATH -> the path review-package.sh printed (each reviewer reads it ONCE; do NOT paste the diff)
SPEC_CONTEXT -> relevant sections of .planning/SPEC.md (paste inline, do NOT reference file)
LEARNINGS_TEST_OUTPUT -> test output from .planning/LEARNINGS.md (paste actual output)
PLUGIN_ROOT -> resolved base directory for skill paths (relative to this skill's base directory)
Reviewer prompts (read, substitute variables, send as message):
| Reviewer | Focus | Prompt Source |
|---|
| 1. Security | Vulnerabilities, auth, data exposure, crypto | references/security-reviewer.md |
| 2. Performance | Complexity, queries, memory, hot paths | references/performance-reviewer.md |
| 3. Tests | Coverage, correctness, reliability, E2E | references/tests-reviewer.md |
4. Lead Monitoring
While reviewers work, the lead:
- Watches for completion messages from all 3 reviewers
- Does NOT review code directly - your job is coordination and reconciliation
- If a reviewer asks a question: Answer it, then broadcast to other reviewers if relevant
- If a reviewer is taking significantly longer than others: Message them for status
- When all 3 reviewers complete: Proceed to reconciliation
5. Reconciliation Protocol (3 Passes)
Post-subagent boundary (the highest-risk moment). During reconciliation the lead consolidates findings — it does NOT re-review the code or fix anything:
| Lead CAN (verification/coordination) | Lead CANNOT (investigation/fixing) |
|---|
| Read reviewer findings, dedup, prioritize | Re-read source to second-guess a finding |
| Record verdict in REVIEW_STATE.md | Edit code to fix an issue a reviewer raised |
| Route CHANGES_REQUIRED back to dev-implement | Grep the codebase to build a new finding the reviewers missed |
Fixes are dev-implement's job, never the review lead's. (Full rule: auto-loaded verification-vs-investigation / delegation-law constraints.)
After ALL reviewers message completion, the lead performs three passes:
**Pass 1 -- Deduplication:**
Multiple reviewers may find the same issue (e.g., input validation gap found by both Security and Tests reviewers).
- Read all reviewer findings
- Group by file and line number
- Identify duplicates:
- Same file:line
- Same root cause (even if described differently)
- Merge duplicates:
- Keep the highest confidence score
- Combine descriptions if both add value
- Attribute to both reviewers
Example:
Security found: "file.py:42 - Input not validated (Confidence: 85)"
Tests found: "file.py:42 - Missing test for invalid input (Confidence: 80)"
-> Merge: "file.py:42 - Input validation missing + no test coverage (Confidence: 85, found by Security + Tests)"
Pass 2 -- Prioritization:
Not all issues are equally important. Rank by:
- Severity x Confidence:
- Critical (90-100 confidence) > Important (80-89)
- Security > Performance > Tests (when confidence is equal)
- Impact on users:
- User-facing > Internal
- Data loss risk > Slowness > Test gaps
- Fix effort:
- Quick wins (< 30 min) should be fixed now
- Large refactors (> 2 hours) should be filed as tech debt
Create final prioritized list:
1. [CRITICAL] Security: XSS in user input (Confidence: 95)
2. [CRITICAL] Tests: User workflow untested (Confidence: 90)
3. [IMPORTANT] Performance: N+1 query in hot path (Confidence: 85)
4. [IMPORTANT] Tests: Error path missing coverage (Confidence: 80)
Pass 3 -- Integration Check:
Proposed fixes may conflict with each other.
- Read each reviewer's suggested fixes
- Check for conflicts:
- Do two fixes modify the same code?
- Does one fix introduce a problem the other reviewer would flag?
- Do fixes require contradictory approaches?
- If conflicts exist:
- Design a unified fix addressing both concerns
- OR: Flag the conflict and ask reviewers for input
Example conflict:
Security: "Add input validation on every field"
Performance: "Batch validate to reduce overhead"
-> Unified: "Batch validate with early exit on first invalid field (security + performance)"
If ANY pass finds conflicts -> resolve before reporting final verdict.
6. Final Verdict
After reconciliation, the lead reports:
## Parallel Code Review: [Feature Name]
Reviewed by: Security, Performance, Tests
### Reconciliation Summary
**Issues found:** X total (Y critical, Z important)
**Duplicates merged:** N
**Conflicts resolved:** M
### Critical Issues (Must Fix)
[Deduplicated, prioritized list from Pass 1 + 2]
### Important Issues (Should Fix)
[Deduplicated, prioritized list from Pass 1 + 2]
### Verdict: APPROVED | CHANGES REQUIRED
[If APPROVED]
All 3 reviewers approved with no issues >= 80 confidence.
[If CHANGES REQUIRED]
X critical and Y important issues must be addressed. Return to /dev-implement.
Phase Complete (Parallel Review)
After parallel review completes:
If APPROVED: Immediately invoke the dev-verify skill:
Read ${CLAUDE_SKILL_DIR}/../../skills/dev-verify/SKILL.md and follow its instructions.
If CHANGES REQUIRED: Return to /dev-implement to fix reported issues.
If BLOCKED (test evidence missing): Return to /dev-implement to collect test evidence.
## The Iron Law of Review
You MUST report only issues with >= 80% confidence. This is not negotiable.
Before reporting ANY issue, complete these verification steps:
- Verify it's not a false positive
- Verify it's not a pre-existing issue
- Assign a confidence score
- Report only if score >= 80
You MUST apply this rule even when encountering:
- "This looks suspicious"
- "I think this might be wrong"
- "The style seems inconsistent"
- "I would have done it differently"
You MUST discard any low-confidence issue found during review.
## The Iron Law of Re-Review
NO "FIXED" CLAIMS WITHOUT FRESH RE-REVIEW. This is not negotiable.
When review returns CHANGES REQUIRED and the implementer applies fixes, you MUST:
- Re-run the SAME review criteria (not lighter, not spot-check)
- Verify issues are actually resolved (not assumed)
- Check for new issues introduced by fixes (regression)
- Only THEN return APPROVED
"I fixed it" without re-reviewing is NOT HELPFUL — unverified fixes ship bugs to the user.
The Audit-Fix Loop (Max 3 Iterations)
Iteration 1: Review → CHANGES REQUIRED → Fix → Re-Review
↓
Iteration 2: Re-Review → CHANGES REQUIRED → Fix → Re-Review
↓
Iteration 3: Re-Review → CHANGES REQUIRED → Fix → Re-Review
↓
Still issues? → ESCALATE to user
All clean? → APPROVED
Track iterations in .planning/REVIEW_STATE.md:
---
iteration: 1
max_iterations: 3
last_review_date: 2026-03-09
issues_found_count: 5
---
Exit criteria:
- APPROVED: Zero issues >= 80 confidence
- ESCALATE: iteration >= 3 AND issues remain
- CONTINUE: iteration < 3 AND issues remain → loop back
Before returning any verdict, check iteration count:
- READ
.planning/REVIEW_STATE.md (create if missing with iteration: 1)
- If iteration >= 3 and issues remain: ESCALATE (don't return CHANGES REQUIRED)
- If iteration < 3 and issues remain: INCREMENT iteration, return CHANGES REQUIRED
- If no issues: APPROVED
Claiming APPROVED without re-review after fixes is NOT HELPFUL — you're rubber-stamping unverified work that ships bugs to the user.
Re-Review Facts
- A re-review after fixes runs the FULL review with the same criteria — spot-checking only the fixed lines misses the regressions a fix introduces elsewhere, which is the failure mode re-review exists to catch.
- At iteration 3 with issues remaining the verdict is ESCALATE, never APPROVED — an approval issued to end the loop is a fabricated verdict, not a judgment.
Review Focus Areas
Test Evidence (Check First!)
Spec Compliance
Code Quality
Confidence Scoring
Rate each potential issue from 0-100:
| Score | Meaning |
|---|
| 0 | False positive or pre-existing issue |
| 25 | Might be real, might not. Stylistic without guideline backing |
| 50 | Real issue but nitpick or rare in practice |
| 75 | Verified real issue, impacts functionality |
| 100 | Absolutely certain, confirmed with direct evidence |
CRITICAL: Only report issues with confidence >= 80.
Required Output Structure
## Code Review: [Feature/Change Name]
Reviewing: [files/scope being reviewed]
### Test Evidence Verified
- Unit tests: [PASS/FAIL/MISSING] - [paste key output line]
- Integration: [PASS/FAIL/N/A]
- UI/Visual: [Screenshot taken / Snapshot verified / N/A]
### Critical Issues (Confidence >= 90)
#### [Issue Title] (Confidence: XX)
**Location:** `file/path.ext:line_number`
**Requirement:** [REQ-ID from SPEC.md — every issue MUST trace to a requirement ID]
**Problem:** Clear description of the issue
**Fix:**
```[language]
// Specific code fix
Important Issues (Confidence 80-89)
[Same format as Critical Issues]
Summary
Verdict: APPROVED | CHANGES REQUIRED | BLOCKED (no test evidence)
[If APPROVED]
The reviewed code meets project standards. Tests pass. No issues with confidence >= 80 detected.
[If CHANGES REQUIRED]
X critical issues and Y important issues must be addressed before proceeding.
[If BLOCKED]
Cannot approve without test evidence. Return to /dev-implement and run tests.
**If review finds the implementation fundamentally violates the spec (not just minor issues), DELETE the contaminated implementation and return to dev-implement for a fresh attempt. Do not patch a structurally wrong approach.**
### Delete & Restart Protocol
**When implementation deviates fundamentally from spec, DELETE and restart entirely.**
| Situation | Action |
|-----------|--------|
| Code uses wrong protocol/architecture than spec | DELETE. Rewrite from scratch with correct approach. |
| Code implements different approach than PLAN.md | DELETE. User approved specific approach for a reason. |
| Fundamental misunderstanding of requirements | DELETE. Don't patch. Fresh subagent with correct understanding. |
| Patch would require 30%+ of implementation to change | DELETE. Rewrite is cleaner than patching wrong foundation. |
**Why delete instead of patch:** Patching a structurally wrong approach creates technical debt. Fresh implementation from correct architecture is faster than fixing wrong foundation.
**When to patch instead:** Bug in otherwise-correct implementation, missing edge case, performance tweak, minor deviation that doesn't affect core behavior.
**The test:** If the subagent says "oh, I misunderstood the whole approach" → DELETE and restart.
## Agent Invocation
Spawn Task agent for review execution:
Task(subagent_type="general-purpose",
allowed_tools=["Read", "Glob", "Grep", "Bash(read-only)"]):
"Review implementation against .planning/SPEC.md.
Tool Restrictions: You are READ-ONLY. You MUST NOT use Write or Edit tools. You read code, check test evidence, and report issues — you do NOT fix them. The main chat handles all fixes.
FIRST: Check .planning/LEARNINGS.md for test output.
Return BLOCKED immediately if no test output is found.
Complete single-pass review covering:
- Test evidence - tests actually run and pass?
- Spec compliance - all requirements met?
- Code quality - simple, correct, follows conventions?
Confidence score each issue (0-100).
Report only issues with >= 80 confidence.
Return structured output per /dev-review format."
## Review Facts
- An APPROVED verdict asserts three things: tests actually ran, the output shows PASS (not SKIP, not assumed), and the evidence was verified by the reviewer rather than trusted from a report. An APPROVED with any leg missing is a fabricated verdict — review is the last gate before bugs ship, and BLOCKED is the honest answer when evidence is absent.
- During reconciliation the controller dedups and prioritizes; it does not get to drop a reviewer's qualifying finding or pre-rate its severity down. A finding suppressed at reconciliation ships the bug with the review's sign-off attached — the one outcome review exists to prevent. (Conflicts between lenses get a *unified* fix in Pass 3, never a silent deletion.)
- A rationale a reviewer or implementer offers ("it's intentional", "covered elsewhere") is a claim to verify against the diff and SPEC, never a reason to downgrade the finding. Lowering severity on the strength of an unverified rationale is trusting the report — the exact failure the read-only-reviewer design rules out.
- A finding the diff cannot settle is labeled **"Cannot verify from diff"** with the missing context named — not silently dropped and not guessed into Critical. Dropping it understates risk; inventing a verdict fabricates one; naming the gap is the honest reviewer move and routes the right follow-up.
- Every lens reads the SAME review package file (one `review-package.sh` run), so re-deriving the diff per reviewer (or pasting it per prompt) is wasted turns and wasted context for an identical artifact — generate once, share the path.
## Quality Standards
- **Test evidence is mandatory** - do not approve without test output
- Do not report style preferences lacking project guideline backing
- Do not report pre-existing issues (confidence = 0)
- Make each reported issue immediately actionable
- Use absolute file paths with line numbers in reports
- Treat uncertainty as below 80 confidence
## Gate: Exit Review Loop
**Checkpoint type:** human-verify (test evidence and confidence scores are machine-verifiable)
Before claiming review is complete (APPROVED or ESCALATE):
-
IDENTIFY → What proves the review verdict is valid?
- APPROVED: Zero issues >= 80 confidence
- ESCALATE: iteration >= 3 AND issues remain
-
RUN → Check .planning/REVIEW_STATE.md for iteration count
Read review output for issue count
-
READ → Examine both:
- Review output (issues list)
- REVIEW_STATE.md (iteration number)
-
VERIFY → Verdict matches state:
- APPROVED only if 0 issues
- ESCALATE only if iteration >= 3
- CHANGES REQUIRED only if iteration < 3
-
CLAIM → Only after steps 1-4 pass, return verdict
**If iteration >= 3 and you're returning CHANGES REQUIRED instead of ESCALATE, you're ignoring the iteration limit — escalate to the user instead of looping forever.**
## Phase Complete
**Phase summary (append to LEARNINGS.md):**
```yaml
## Phase: Review
---
phase: review
status: completed
implements: [] # review verifies; implements no new requirement IDs
requires: [VALIDATION.md, LEARNINGS.md]
provides: [REVIEW_STATE.md, review-verdict]
affects: [] # read-only review; fixes happen in dev-implement
verdict: APPROVED | CHANGES_REQUIRED | ESCALATE | BLOCKED
iterations: N
issues-found: X (Y critical, Z important)
---
After review completes, handle verdict-specific transitions:
If APPROVED (no issues >= 80 confidence)
Mark review complete in .planning/REVIEW_STATE.md:
---
status: APPROVED
iteration: [N]
max_iterations: 3
last_review_date: [date]
issues_found_count: 0
verdict: APPROVED
---
The status: APPROVED line is the structural gate dev-verify checks — only an APPROVED review admits verification. On non-approved paths (CHANGES_REQUIRED / ESCALATE / BLOCKED) set status: to that verdict so the gate correctly blocks.
Immediately invoke dev-verify:
Read ${CLAUDE_SKILL_DIR}/../../skills/dev-verify/SKILL.md and follow its instructions.
If CHANGES REQUIRED (issues >= 80 confidence found, iteration < 3)
Update .planning/REVIEW_STATE.md:
---
status: CHANGES_REQUIRED
iteration: [N+1]
max_iterations: 3
last_review_date: [date]
issues_found_count: [count]
verdict: CHANGES_REQUIRED
---
Return to /dev-implement with specific issues. Implementer MUST re-invoke /dev-review after fixes.
Critical: When implementer returns claiming "fixed", you MUST re-run the FULL review. No shortcuts.
If ESCALATE (iteration >= 3, issues remain)
Update .planning/REVIEW_STATE.md:
---
status: ESCALATE
iteration: 3
max_iterations: 3
last_review_date: [date]
issues_found_count: [count]
verdict: ESCALATE
---
Report to user:
Review Loop Escalation (3 iterations completed)
After 3 fix-review cycles, [N] issues remain:
[List issues]
Options:
1. Accept current state and proceed (issues become tech debt)
2. Extend review (manual approval for iteration 4+)
3. Rethink approach (return to /dev-design)
Which option do you prefer?
If BLOCKED (no test evidence)
Return immediately to /dev-implement to collect test evidence. Do NOT increment iteration counter - no review occurred.
Workflow Continuity After Review
| Verdict | Next Action | Iteration Counter |
|---|
| APPROVED | Invoke /dev-verify immediately, mark task [x] in PLAN.md | Reset to 1 for next task |
| CHANGES REQUIRED | Return to /dev-implement, implementer fixes then re-invokes /dev-review | Increment |
| ESCALATE | Ask user for direction | Keep at max |
| BLOCKED | Return to /dev-implement for test evidence | No change (no review ran) |
Do NOT pause between review completion and next action. The workflow is sequential.