ワンクリックで
code-review
[Code Quality] Use when evaluating review feedback, requesting targeted code-quality review, or verifying completion claims.
Codex または Claude でインストール この Prompt をコピーして Codex、Claude、または他のアシスタントに貼り付けると、Skill ページを確認してインストールできます。
メニュー
[Code Quality] Use when evaluating review feedback, requesting targeted code-quality review, or verifying completion claims.
Codex または Claude でインストール この Prompt をコピーして Codex、Claude、または他のアシスタントに貼り付けると、Skill ページを確認してインストールできます。
[Architecture] Use when designing solution architecture across backend, frontend, deployment, monitoring, testing, and code quality.
[Utilities] Use when you need to answer technical and architectural questions.
[Content] Use when you need to brainstorm as a PO/BA — structured ideation for problem-solving, new product creation, or feature enhancement.
[Git] Use when the user asks to compare branches, analyze git diffs, review changes between branches, update specifications based on code changes, or analyze what changed.
[Project Management] Use when creating user stories, writing acceptance criteria, analyzing requirements, or mapping business processes.
[Content] Use when you need to evaluate business idea viability: Business Model Canvas, financial projections, risk matrix, go-to-market, execution plan.
| name | code-review |
| description | [Code Quality] Use when evaluating review feedback, requesting targeted code-quality review, or verifying completion claims. |
Codex compatibility note:
- Invoke repository skills with
$skill-namein Codex; this mirrored copy rewrites legacy Claude/skill-namereferences.- Task tracker mandate: BEFORE executing any workflow or skill step, create/update task tracking for all steps and keep it synchronized as progress changes.
- User-question prompts mean to ask the user directly in Codex.
- Ignore Claude-specific mode-switch instructions when they appear.
- Strict execution contract: when a user explicitly invokes a skill, execute that skill protocol as written.
- Subagent authorization: when a skill is user-invoked or AI-detected and its protocol requires subagents, that skill activation authorizes use of the required
spawn_agentsubagent(s) for that task.- Do not skip, reorder, or merge protocol steps unless the user explicitly approves the deviation first.
- For workflow skills, execute each listed child-skill step explicitly and report step-by-step evidence.
- If a required step/tool cannot run in this environment, stop and ask the user before adapting.
Codex uses static project-reference loading instead of runtime-injected project docs. When coding, planning, debugging, testing, or reviewing, open project docs explicitly using this routing.
Always read:
docs/project-config.json (project-specific paths, commands, modules, and workflow/test settings)docs/project-reference/docs-index-reference.md (routes to the full docs/project-reference/* catalog)docs/project-reference/lessons.md (always-on guardrails and anti-patterns)Missing/stale context route: If docs/project-config.json, the docs index, lessons.md, CLAUDE.md, AGENTS.md, or any task-required reference doc is missing or stale, auto-run $project-init or the narrow setup route ($project-config, $docs-init, $scan-all, $scan --target=<key>, $claude-md-init) before ordinary project-specific work. If Codex mirrors or AGENTS.md are missing/stale, ask the user to run $sync-codex; do not auto-run it.
Situation-based docs:
backend-patterns-reference.md, domain-entities-reference.md, project-structure-reference.mdfrontend-patterns-reference.md, scss-styling-guide.md, design-system/README.mddocs/specs/ pathing, or TC format: feature-spec-reference.md, spec-system-reference.md, spec-principles.mdworkflow-spec-test-code-cycle-reference.md plus the spec docs abovespec-system-reference.md and source Feature Specs under docs/specs/integration-test-reference.mde2e-test-reference.mdcode-review-rules.md plus domain docs above based on changed filesDo not read all docs blindly. Start from docs-index-reference.md, then open only relevant files for the task.
[BLOCKING] Execute skill steps in declared order. NEVER skip, reorder, or merge steps without explicit user approval. [BLOCKING] Before each step or sub-skill call, update task tracking: set
in_progresswhen step starts, setcompletedwhen step ends. [BLOCKING] Every completed/skipped step MUST include brief evidence or explicit skip reason. [BLOCKING] If Task tools are unavailable, create and maintain an equivalent step-by-step plan tracker with the same status transitions.
Goal: Ensure reviewed code is correct, easy to change, convention-aligned, and verification-backed before acceptance or handoff — via receiving feedback with verification (not performative agreement), requesting targeted systematic reviews through the code-reviewer subagent, and enforcing verification gates before completion claims.
Routing boundary: If the user asks to review current changes, uncommitted work, staged/unstaged diffs, or a branch-to-branch diff, use
review-changesinstead.
Shared engine (keep in sync):
code-reviewandreview-changesshare the same review-protocolSYNC:blocks. Canonical source:.claude/skills/shared/sync-inline-versions.md; policy:SYNC:shared-protocol-duplication-policy. When you change a shared block in one skill, update the canonical file AND the sibling skill so the two never drift. The skills differ only in entry intent (explicit scope / feedback / completion-gate vs git diff) — not in review quality.
MANDATORY Before reviewing, search for project-specific reference docs:
Coding standards — search:
code-review-rules,coding-standards,style-guide,contributingArchitecture — search:patterns-reference,architecture,adrTest conventions — search:integration-test-reference,test-guide,test-conventionsDesign system — search:design-system,design-tokens,component-libraryRead found docs before reviewing. None found → rely on tech stack knowledge from file extensions/directory structure.
Workflow:
plans/reports/code-review-{date}-{slug}.md.code-graph/graph.db exists$review-ui when frontend/UI files are presentKey Rules:
review-changesThree practices: receiving feedback with technical rigor, requesting systematic reviews via code-reviewer subagent, enforcing verification gates before completion claims.
Run
python .claude/scripts/code_graph query tests_for <function> --jsonon changed functions to flag coverage gaps.
Skeptical. Every claim needs traced proof file:line. Confidence >80% to act.
file:line evidence (grep results, read confirmations)The success metric of every coding decision is future change cost. DRY, SRP, abstraction, design patterns, naming, layering, tests — every technique exists to serve one goal: making the next change cheaper.
When evaluating code, a refactor, a test, or an abstraction, ask: does this make the next change cheaper or more expensive?
Apply this lens before invoking any specific rule, pattern, or checklist below — if a downstream rule would raise change cost, this principle wins.
| Principle | Rule |
|---|---|
| YAGNI | Flag code solving hypothetical problems (unused params, speculative interfaces) |
| KISS | Flag unnecessary complexity. "Is there a simpler way?" |
| DRY | Grep for similar/duplicate code. 3+ similar patterns → flag for extraction |
| Clean Code | Readable > clever. Names reveal intent. Functions do ONE thing. Nesting <=3. Methods <30 lines |
| Convention | MUST ATTENTION grep 3+ existing examples before flagging violations. Codebase convention wins over textbook |
| No Bugs | Trace logic paths. Verify edge cases (null, empty, boundary). Check error handling |
| Proof Required | Every claim backed by file:line evidence. Speculation is forbidden |
| Doc Staleness | Cross-ref changed files against related docs. Flag stale/missing updates |
Technical correctness over social comfort. Verify before implementing. Evidence before claims.
python .claude/scripts/code_graph graph-blast-radius --json — prioritize files by impact (most dependents first)python .claude/scripts/code_graph query tests_for <function_name> --json — flag untested changed functionspython .claude/scripts/code_graph trace <file> --direction downstream --json — downstream impact (events, bus, cross-service)python .claude/scripts/code_graph trace <file> --direction both --json — full flow context for controllers/commands/handlersMANDATORY FIRST: Create Todo Tasks
| Task | Status |
|---|---|
[Review] Create report file | in_progress |
[Review Phase 0] Run graph blast-radius if available | pending |
[Review Phase 0.3] Detect high-risk change types | pending |
[Review Phase 0.5] Plan compliance check (skip if no active plan) | pending |
[Review Phase 0.7] Detect categories + route sub-agents | pending |
[Review Phase 0.7b] $review-ui sub-review — skip if no frontend/UI files in changeset | pending |
[Review Phase 1] File-by-file review + update report | pending |
[Review Phase 2] Holistic assessment | pending |
[Review Phase 3] Final findings, docs triage, and test sync findings | pending |
[Review Fix Loop] Validate findings, fix validated issues, and full re-review if fixes are applied | pending |
[Review Final] Consolidate all rounds | pending |
Step 0: Create Report File
Create plans/reports/code-review-{date}-{slug}.md with Scope, Files to Review sections.
Phase 0: Graph Blast Radius (FIRST WHEN AVAILABLE)
If .code-graph/graph.db exists, run graph impact analysis before reviewing:
python .claude/scripts/code_graph graph-blast-radius --json or the project equivalentIf graph data is unavailable, record "Graph not available — skipping blast radius" and continue.
Phase 0.3: Detect High-Risk Change Types
Before file review, inspect the target diff or explicit file set for:
Debugger Trace: End -> Start, all feeder paths, hypothesis matrix, owning fix layer, and forward convergence proof; missing trace evidence is a High/Critical review findingCreate focused review tasks for every true signal and complete them before dimensional review.
Phase 0.5: Plan Compliance Check (CONDITIONAL)
If active plan context exists, verify scope, test evidence, and success criteria against the plan before file review; otherwise record the skip reason.
Goal Contract mapping (CONDITIONAL — when an active goal exists): Resolve the active Goal Contract per the goal-contract-satisfaction-loop protocol (active plan goal.md → plans/goals/{YYMMDD-HHmm}-{slug}/goal.md). When found, map the reviewed changes to the saved success criteria in the report — which criteria this changeset advances (with file:line evidence), which it leaves untouched, and any change serving NO saved criterion (flag as scope drift unless justified). Record No active goal — mapping skipped. when none exists; do NOT create a goal file from inside a review.
Phase 0.7: Detect Review Categories
Before any review — classify the changeset and route sub-agents:
| Signal in changed files | Route to |
|---|---|
| Auth/permission/token/encryption files | security-auditor |
| Query files, caching, batch processing | performance-optimizer |
| Source code (logic, handlers, services) | code-reviewer |
Frontend/UI files (components, templates, .html/.scss/.css, design-system) | $review-ui skill (see Phase 0.7b) |
| Docs, plans, specs, markdown | general-purpose |
| Mixed changeset with security/perf files | Spawn specialized sub-agent first, then code-reviewer |
Phase 0.7b: Frontend/UI Sub-Review (CONDITIONAL — $review-ui)
If the changeset contains any frontend/UI files matching the project's configured UI patterns (components, templates, .html/.scss/.css, design-system tokens), invoke the $review-ui skill as a sub-review so UI-specific concerns are covered — long-content overflow (wrap vs ellipsis+tooltip), responsive multi-screen flex, flex-grow with min/max over fixed px, semantic z-index discipline (no raw numbers, no !important), and BEM classes on all template elements. Fold its findings into this report's Phase 3 results.
Skip (record reason) when no frontend/UI files are present in the changeset — log "Skipped Phase 0.7b — no frontend/UI files in changeset".
Phase 0.8: Derive Review Categories
Group changed files by: file language (extension), directory semantics (path), change nature (new entity, schema, config, UI, test).
For each category: name it, create sub-task, derive concerns using SYNC:category-review-thinking (first principles — NOT a fixed checklist).
Category list = Phase 1 work breakdown. Each category → own section in report.
Phase 1: File-by-File Review (Build Report)
For EACH file, immediately update report:
Phase 2: Holistic Review (Re-read Report)
After all files reviewed, re-read accumulated report:
## Plan Context: impl matches requirements, TCs have code evidence (not "TBD"), no requirement unaddressedMUST ATTENTION CHECK — Spec-Loop Test Discipline (changed core logic): Beyond the happy/error path traces above, hold changed core logic to a hard-to-fake bar. Apply a MUTATION-SCORE bar — a surviving mutant means a missing invariant, so demand the killing test — and do NOT accept a line-coverage % as proof of test strength. Flag any [HARD]/§5 invariant whose only coverage is example tests with no universally-quantified property TC (plus boundary counter-case) as a HIGH finding. Every behavior-changing finding requires a Dual-Feedback row (does it feed the spec? does it feed the tests? a blank axis = INCOMPLETE) — record it in the report. Adjudicate any spec divergence per SYNC:spec-drift-adjudication (CODE-WRONG / SPEC-STALE / AMBIGUOUS / SPEC-SILENT); a SPEC-SILENT finding — the code correctly enforces an invariant NO spec artifact states — has BOTH axes non-N/A: Spec feedback = add the missing §4 BR / §3 AC (+ §5 invariant if applicable) and a §8 TC via $spec [update] + $spec [mode=tests]; Test feedback = the new property/regression test guarding the now-written invariant — never leave a discovered invariant only in code or only in tests. Review the whole package (spec + tests + code), not just the diff, so the spec is enriched, not just patched.
MUST ATTENTION CHECK — Clean Code: YAGNI (unused params, speculative interfaces)? KISS (simpler exists)? Methods >30 lines or nesting >3?
MUST ATTENTION CHECK — Correctness: Null/empty/boundary handled? Error paths caught? Async race conditions? Trace happy + error paths.
Documentation Staleness Check:
For each changed file — grep file name/module across docs/ and AI tooling dirs. Changed behavior → flag stale doc (specific section + what changed). Flag the staleness only — never auto-fix docs here.
Common staleness patterns: count/limit changed → docs embedding that number | API/contract changed → API usage docs | hook/skill added/removed → catalogs/README | schema changed → entity reference docs.
Phase 3: Final Review Result
Update report: Overall Assessment, Critical Issues, High Priority, Architecture Recommendations, Documentation Staleness, Positive Observations.
If documentation staleness is detected, recommend docs-update and list exact stale sections; do not silently pass stale docs.
After Phase 3, do not spawn a fresh reviewer just to re-review the same finding set. First validate findings, then fix only validated findings. Because fixes change the review target, restart the full review after the fix cycle. If that restarted protocol uses sub-agents, construct each Agent call with the canonical template from SYNC:review-protocol-injection:
SYNC:review-protocol-injection verbatimSYNC:spec-tests-code-triangulation, SYNC:evidence-based-reasoning, SYNC:bug-detection, SYNC:design-patterns-quality, SYNC:complexity-prevention, SYNC:logic-and-intention-review, SYNC:test-spec-verification, SYNC:fix-layer-accountability, SYNC:rationalization-prevention, SYNC:graph-assisted-investigation, SYNC:understand-code-first"Run a full fresh code-review pass over the current assigned scope after validated fixes were applied. Focus: cross-cutting concerns, interaction bugs, convention drift, missing pieces, subtle edge cases, logic errors, test spec gaps, and regressions introduced by the fixes.""use the explicit files, plan scope, or reviewer-provided target range"plans/reports/code-review-rerun{N}-{date}.mdAfter sub-agent returns:
plans/reports/code-review-rerun{N}-{date}.md## Re-Review {N} Findings — DO NOT filter or override| # | Rule | Details |
|---|---|---|
| 1 | No Magic Values | All literals → named constants |
| 2 | Type Annotations | Explicit parameter and return types on all functions |
| 3 | Single Responsibility | One concern per method/class. Event handlers/consumers: one handler = one concern. NEVER bundle — a framework event dispatcher can swallow handler exceptions silently |
| 4 | DRY | No duplication; extract shared logic |
| 5 | Naming | Specific (orderRecords not data), Verb+Noun methods, is/has/can/should booleans, no abbreviations |
| 6 | Performance | No O(n²) (use dictionary). Project in query (not load-all). ALWAYS paginate. Batch-by-IDs (not N+1) |
| 7 | Entity Indexes | Collections: index management methods. EF Core: composite indexes. Expression fields match index order. Text search → text indexes |
Decision test: "Delete the DB and start fresh — does this data still need to exist?" Yes → Seeder/fixture. No → Migration.
| Type | Contains | NEVER contains |
|---|---|---|
| Seeder / Fixture | Default records, system config, reference data (idempotent — safe to run every startup) | Schema changes |
| Migration | Schema changes, column adds/removes, data transforms, index changes | Default records, permission seeds, system config |
Apply project's language/framework conventions. Principle universal — implementation project-specific.
When reviewing files with legacy and modern patterns:
project-config.json, package.json, or equivalent for "legacy", version flags, feature annotationsNEVER assume any specific framework's lifecycle. Derive from codebase evidence.
| Practice | Triggers | MUST ATTENTION READ |
|---|---|---|
| Receiving Feedback | Review comments received, feedback unclear/questionable, conflicts with existing decisions | references/code-review-reception.md |
| Requesting Review | After each subagent task, major feature done, targeted review scope, after complex bug fix | references/requesting-code-review.md |
| Verification Gates | Before any completion claim, commit, push, or PR. ANY success/satisfaction statement | references/verification-before-completion.md |
SITUATION?
│
├─ Received feedback
│ ├─ Unclear items? → STOP, ask for clarification first
│ ├─ From human partner? → Understand, then implement
│ └─ From external reviewer? → Verify technically before implementing
│
├─ Completed work
│ ├─ Major feature/task? → Request code-reviewer subagent review
│ └─ Before merge? → Request code-reviewer subagent review
│
└─ About to claim status
├─ Have fresh verification? → State claim WITH evidence
└─ No fresh verification? → RUN verification command first
Pattern: READ → UNDERSTAND → VERIFY → EVALUATE → RESPOND → IMPLEMENT
Source handling: Human partner → implement after understanding. External reviewer → verify technically, push back if wrong.
Full protocol: references/code-review-reception.md
BASE_SHA=$(git rev-parse HEAD~1) and HEAD_SHA=$(git rev-parse HEAD)Full protocol: references/requesting-code-review.md
Iron Law: NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE
Gate: IDENTIFY command → RUN it → READ output → VERIFY it confirms claim → THEN claim. Skip any step = lying.
| Claim | Required Evidence |
|---|---|
| Tests pass | Test output shows 0 failures |
| Build succeeds | Build command exit 0 |
| Bug fixed | Original symptom test passes |
| Requirements met | Line-by-line checklist verified |
Red Flags — STOP: "should"/"probably"/"seems to", satisfaction before verification, committing without verification, trusting agent reports.
Full protocol: references/verification-before-completion.md
code-simplifierdebug-investigaterefactoringWhen Phase 1 finds 10+ changed files, apply the Systematic Review Batching protocol (map-reduce: size-capped batches + hierarchical synthesis) defined below.
MANDATORY — NO EXCEPTIONS: If NOT already in a workflow, use a direct user question to ask user:
- Activate
workflow-review-changesworkflow (Recommended) — full review → validated fix cycle → re-review until clean- Execute
$code-reviewdirectly — run standalone
For each changed file, verify no forbidden layer imports:
docs/project-config.json → architectureRules.layerBoundariespaths glob patternsarchitectureRules.excludePatterns"BLOCKED: {layer} layer file {filePath} imports from {forbiddenLayer} ({importStatement})"If architectureRules absent in project-config.json → skip silently.
Purpose: Adversarial validation of own findings BEFORE handoff. Catches over-flagged Highs, false positives, and severity inflation at the source rather than letting them propagate downstream.
Trigger: Any finding produced (Critical, High, Medium, OR Low). Skip ONLY when the report's verdict is unconditional PASS with literally zero findings.
Protocol:
plans/reports/{skill}-{date}-{slug}.md$why-review skill with arg: validate findings in plans/reports/{skill}-{date}-{slug}.md — verify each finding has file:line proof, steel-man each rejected interpretation, and stress-test severity classificationsplans/reports/why-review-validate-{date}.md## Why-Review Validation Notes section citing what changed and why## Why-Review Validation line to own report stating "All N findings re-validated against actual code; no severity changes."Skip conditions (record explicit reason if skipping):
Why this exists: AI sub-agent reports inherit confirmation bias — the orchestrator absorbs severity claims as ground truth. The 2026-05-09 review incident produced 5 Highs; adversarial validation demoted 3 of them. Codify this as standard practice.
MANDATORY — NO EXCEPTIONS after completing, use a direct user question:
Completion ≠ Correctness. Before reporting ANY work done:
[IMPORTANT] Use task tracking to break ALL work into small tasks BEFORE starting — including tasks for each file read. This prevents context loss from long files. For simple tasks, AI MUST ATTENTION ask user whether to skip.
Critical Purpose: Ensure quality — no flaws, bugs, missing updates, stale content. Verify code AND documentation.
External Memory: Complex work → write findings incrementally to
plans/reports/— prevents context loss, serves as deliverable.
Evidence Gate: MANDATORY — every claim, finding, recommendation requires
file:lineproof + confidence % (>80% act, <80% verify first).
OOP & DRY: MANDATORY — flag patterns extractable to base class/generic/helper. Same-suffix/lifecycle/responsibility classes share common base. Apply idiomatic abstraction (base class, mixin, trait, protocol) for project's language. Verify linting/analyzer configured.
Systematic Review Batching (map-reduce) — When a changeset is large, do NOT review files one-by-one. Partition into size-capped batches, fire one specialized sub-agent per batch in parallel, then reduce. This bounds EVERY context — each batch agent AND the orchestrator — so coverage stays complete as file count grows.
Trigger ladder (one ordered escalation — not competing thresholds):
- < 10 changed files → sequential per-file review (default; no batching).
- ≥ 10 changed files → switch to systematic parallel mode. Announce:
"Detected {N} changed files. Switching to systematic parallel review protocol."Then: categorize → size-capped batches → flat consolidation.- categories > 6 OR files > 40 → additionally insert the hierarchical synthesis tier (below). Everything from rung 2 still applies.
Step 1 — Categorize. Group changed files into logical categories derived from the project's actual structure (not forced). Category is the concern axis; orient with these examples, derive what fits the repository:
Category Type Example Groupings Agent/Tooling AI scripts, hooks, skill definitions, workflow configs, linting rules Root config/docs Root README, project config, CI/CD pipeline configs Reference docs Architecture docs, patterns references, setup guides Feature/domain docs Business feature documentation, spec files, ADRs Backend logic Service/handler/controller source (infer from project structure) Frontend logic UI component/state/API source (infer from project structure) Data/Schema Migrations, schema files, seed data Tests Unit, integration, E2E test files Infrastructure Docker, k8s, CI/CD, cloud manifests Step 2 — Size-capped batches. One sub-agent per batch of ≤8 files OR ≤2000 diff-lines, whichever hits first. Category stays the concern axis, but any category exceeding a cap splits into multiple size-capped batches (30 backend files → 4 batches). Size caps — not category caps — make "many files" safe: a category cap alone lets one giant category blow a single agent's context.
Step 2a — Sub-agent type per batch (match the batch's dominant concern):
- Code logic (any stack) →
code-reviewer- Security-sensitive changes →
security-auditor- Performance-critical paths →
performance-optimizer- Docs, plans, specs, configs, infra →
general-purposeEach batch sub-agent receives: its full file list;
SYNC:category-review-thinkingas its primary thinking model — derive each category's concerns from first principles, NOT a fixed checklist (if the consuming skill does not carry that block, apply category-first thinking directly); project reference docs relevant to its concern (discover via*patterns*,*conventions*,*style-guide*); cross-reference verification instructions (counts, tables, links). All batch agents run in parallel and write findings toplans/reports/(perSYNC:task-tracking-external-report); reducers read from disk, never from memory.Step 3 — Reduce.
- Flat reduction (rung 2, ≤6 categories AND ≤40 files): the orchestrator collects each batch report, cross-references counts/tables/contracts ACROSS batches, detects gaps visible only across categories (feature in code but missing from docs; new API endpoint with no client call), and consolidates into one categorized holistic report.
- Hierarchical reduction (rung 3, > 6 categories OR > 40 files): insert a mid-tier — each concern gets ONE synthesizer agent that reads only its own batch reports and emits a single concern-synthesis. The orchestrator reads the concern-syntheses (~5), never the raw batch reports — keeping the reducer's context O(#concerns), not O(#files).
- Cross-concern interaction pass (mandatory at rung 3 — closes the synthesis-tier blind spot): concern-siloed synthesis can drop an interaction spanning two concerns AND two batches (tainted source in data-layer/batch 7 → sink in api/batch 3). So: (a) each concern-synthesizer MUST emit an explicit "cross-concern interaction candidates" list — entities/symbols/contracts it touched that plausibly bind to another concern (shared DTOs, event names, table/collection names, exported symbols); (b) the orchestrator MUST run the Step-3 cross-reference/gap step over those candidate lists across all concern-syntheses, not only within a batch, before concluding. Without this pass the tier trades completeness for context-bounding on exactly the large diffs it targets.
Step 4 — Holistic assessment. With all findings combined, judge: overall coherence as a unified intent; cross-category sync (docs match code? contracts match callers?); risk areas where categories interact; missing doc/spec updates for changed artifacts.
No silent truncation. If any cap forces sampling or a batch is dropped for budget, ANNOUNCE the dropped/sampled scope explicitly — bounded coverage must never read as complete coverage.
End-to-Start Debugger Trace — For non-trivial bugs, failed verification, regression fixes, behavior-changing code, or unclear code flow, start from the observed final state and walk backward before proposing a fix.
- Frame 0: observed end state — Name the exact user-visible output, failing assertion, log line, persisted value, API response, rendered UI, or aggregate bucket. Record the reader/query/renderer that produced it with
file:lineevidence.- Walk backward one hop at a time — Trace final reader -> projection/cache/storage -> writer -> consumer/handler/job -> producer/caller -> original trigger. At every hop record: input, transformation, output, owner, and evidence.
- Enumerate all feeder paths — Find every upstream producer/caller/event/job that can write into the final path, including retry, async, cache, background, and alternate UI/API paths. Mark each path verified, ruled out, or still unknown.
- Build the hypothesis matrix — For each plausible cause, list evidence for, evidence against, how to reproduce/verify, blast radius, and status (
primary,contributing,ruled out,latent). Do not fix until competing causes are explicitly resolved or bounded.- Choose the owning fix layer — Identify the invariant owner and the lowest shared point that protects all downstream consumers. A fix at the symptom site is rejected unless the symptom site owns the invariant.
- Prove convergence forward — After choosing the fix, walk start -> end again and show how the corrected state reaches the observed final output. Map each root cause to a fix part and each fix part to a test/proof.
BLOCKED until: final state named · backward trace written · all feeder paths enumerated · hypothesis matrix completed · owning fix layer justified · forward convergence proof mapped to tests.
NEVER: Start at the first suspicious code path. Collapse multiple producers into one "flow". Treat duplicate symptoms as duplicate records without proving the read model. Skip ruled-out hypotheses.
Graph-Assisted Investigation — MANDATORY when
.code-graph/graph.dbexists.HARD-GATE: MUST ATTENTION run at least ONE graph command on key files before concluding any investigation.
Pattern: Grep finds files →
trace --direction bothreveals full system flow → Grep verifies details
Task Minimum Graph Action Investigation/Scout trace --direction bothon 2-3 entry filesFix/Debug callers_ofon buggy function +tests_forFeature/Enhancement connectionson files to be modifiedCode Review tests_foron changed functionsBlast Radius trace --direction downstreamCLI:
python .claude/scripts/code_graph {command} --json. Use--node-mode filefirst (10-30x less noise), then--node-mode functionfor detail.
Category Review Thinking — A thinking framework for reviewing any category of changed files. NOT a fixed checklist — derive concerns from domain knowledge; the examples are starting points only. Your knowledge of the category exceeds any list here — trust it.
Step 1 — Understand the category's role. What is this category responsible for in the overall system? What invariants must it uphold? What are its consumer contracts (who depends on it, what do they expect)?
Step 2 — Read project conventions for this category. Search for reference docs, style guides, ADRs, or READMEs specific to this area. Grep 3+ existing similar files — extract naming conventions, structural patterns, shared base classes. If no docs exist, derive conventions empirically from existing code.
Step 3 — Derive concerns from first principles. Apply all that are relevant; expand beyond this list based on the actual category:
- Correctness: Does the logic match the intent? Trace happy path AND error path.
- Boundary contracts: Are interfaces/APIs/events/protocols honored? No implicit coupling introduced?
- Project conventions: Does new code follow the patterns found in Step 2? Evidence-confirmed, not assumed.
- Security: Auth enforced at every entry point? Input validated at boundaries? No secrets in the diff?
- Performance: Unbounded operations? N+1 patterns? Blocking calls in async context? Unindexed queries?
- Maintainability: DRY? Single responsibility? Complexity within reason? Names reveal intent?
- Test coverage: Are the changed paths covered by tests? Are existing tests still valid after the change?
- Documentation: Do related docs, specs, or READMEs reflect the changes?
Step 4 — Create sub-tasks and execute. For each identified concern: create a task tracking sub-task, work through it with
file:lineevidence, mark done. No findings without proof.Illustrative concern examples by category type (not exhaustive — trust your knowledge beyond this):
- Server-side logic: handler/service structure conventions, validation layer placement, side-effect isolation, cross-service boundary enforcement, data-access layer separation, error propagation strategy
- Client-side logic: component lifecycle management, resource cleanup (subscriptions, listeners, timers), state management patterns, API integration layer separation, reactive stream composition
- Data/Schema: migration reversibility (rollback script), lock impact on table volume, backfill idempotency, index coverage for query patterns, deployment ordering
- Configuration: present in ALL environments? No secrets in diff? App fails fast if config missing (not silently null)? Documented in setup guide?
- Infrastructure: dev/prod parity? No hardcoded dev values (localhost, debug flags)? Pinned image/dependency versions? CI/CD secret requirements documented?
- Styles/Assets: follows project naming conventions? Uses design variables/tokens (no hardcoded magic values)? Correct scope (no global side effects from component styles)?
- Documentation: accurate? Links valid? Examples still match current code/behavior? Covers new scenarios?
- Tests: assertions verify specific outcomes (not just "no exception")? Idempotent (repeatable N times)? Covers edge cases, not just happy path?
- Security artifacts: all code paths reach the gate? Negative tests exist (unauthorized denied)? Both enforcement AND display control updated?
- Build/Tooling: rule changes apply consistently? No exceptions that silently swallow violations? Impact on CI runtime documented?
Sub-Agent Return Contract — When this skill spawns a sub-agent, the sub-agent MUST return ONLY this structure. Main agent reads only this summary — NEVER requests full sub-agent output inline.
## Sub-Agent Result: [skill-name] Status: ✅ PASS | ⚠️ PARTIAL | ❌ FAIL Confidence: [0-100]% ### Findings (Critical/High only — max 10 bullets) - [severity] [file:line] [finding] ### Actions Taken - [file changed] [what changed] ### Blockers (if any) - [blocker description] Full report: plans/reports/[skill-name]-[date]-[slug].mdMain agent reads
Full reportfile ONLY when: (a) resolving a specific blocker, or (b) building a fix plan. Sub-agent writes full report incrementally (per SYNC:incremental-persistence) — not held in memory.Context budget — the return payload is a SUMMARY, not a transcript: ≤10 finding bullets, no raw file contents / full diffs / verbatim logs inline, no re-pasted source. Everything beyond the summary lives in the
Full reporton disk. A sub-agent that would exceed the summary shape MUST write the detail to its report and return only the pointer — the orchestrator's context is the scarce resource the whole map-reduce protects.
Nested Task Expansion Contract — For workflow-step invocation, the
[Workflow] ...row is only a parent container; the child skill still creates visible phase tasks.
- Call the current task list first. If a matching active parent workflow row exists, set
nested=trueand recordparentTaskId; otherwise run standalone.- Create one task per declared phase before phase work. When nested, prefix subjects
[N.M] $skill-name — phase.- When nested, link the parent with
TaskUpdate(parentTaskId, addBlockedBy: [childIds]).- Orchestrators must pre-expand a child skill's phase list and link the workflow row before invoking that child skill or sub-agent.
- Mark exactly one child
in_progressbefore work andcompletedimmediately after evidence is written.- Complete the parent only after all child tasks are completed or explicitly cancelled with reason.
Blocked until: the current task list done, child phases created, parent linked when nested, first child marked
in_progress.
Project Reference Docs Gate — Run after task-tracking bootstrap and before target/source file reads, grep, edits, or analysis. Project docs override generic framework assumptions.
- Identify scope: file types, domain area, and operation.
- Required docs by trigger: always
docs/project-reference/lessons.md; doc lookupdocs-index-reference.md; reviewcode-review-rules.md; backend/CQRS/APIbackend-patterns-reference.md; domain/entitydomain-entities-reference.md; frontend/UIfrontend-patterns-reference.md; styles/designscss-styling-guide.md+design-system/design-system-canonical.md; integration testsintegration-test-reference.md; E2Ee2e-test-reference.md; feature docs/specsfeature-spec-reference.md+spec-system-reference.md+spec-principles.md; behavior/public-contract/spec-test-code syncworkflow-spec-test-code-cycle-reference.md; derived spec index/ERD/reimplementation guidesspec-system-reference.md+ source Feature Specs underdocs/specs/; architecture/new areaproject-structure-reference.md.- Read every required doc. If
docs/project-config.json, the docs index,lessons.md,CLAUDE.md,AGENTS.md, or any task-required reference doc is missing or stale, auto-run$project-initor the narrow lower-level route ($project-config,$docs-init,$scan-all,$scan --target=<key>,$claude-md-init) before ordinary project-specific work. If Codex mirrors orAGENTS.mdare missing/stale, ask the user to run$sync-codex; do not auto-run it.- Before target work, state:
Reference docs read: ... | Not applicable: ....Ready when: scope evaluated, required docs checked/read or setup route completed,
lessons.mdconfirmed, citation emitted.
Task Tracking & External Report Persistence — Bootstrap this before execution; then run project-reference doc prefetch before target/source work.
- Create a small task breakdown before target file reads, grep, edits, or analysis. On context loss, inspect the current task list first.
- Mark one task
in_progressbefore work andcompletedimmediately after evidence; never batch transitions.- For plan/review work, create
plans/reports/{skill}-{YYMMDD}-{HHmm}-{slug}.mdbefore first finding.- Append findings after each file/section/decision and synthesize from the report file at the end.
- Final output cites
Full report: plans/reports/{filename}.Blocked until: task breakdown exists, report path declared for plan/review work, first finding persisted before the next finding.
Critical Thinking Mindset — Apply critical thinking, sequential thinking. Every claim needs traced proof, confidence >80% to act. Anti-hallucination: Never present guess as fact — cite sources for every claim, admit uncertainty freely, self-check output for errors, cross-reference independently, stay skeptical of own confidence — certainty without evidence root of all hallucination.
Sequential Thinking Protocol — Structured multi-step reasoning for complex/ambiguous work. Use when planning, reviewing, debugging, or refining ideas where one-shot reasoning is unsafe.
Trigger when: complex problem decomposition · adaptive plans needing revision · analysis with course correction · unclear/emerging scope · multi-step solutions · hypothesis-driven debugging · cross-cutting trade-off evaluation.
Format (explicit mode — visible thought trail):
Thought N/M: [aspect]— one aspect per thought, state assumptions/uncertaintyThought N/M [REVISION of Thought K]: ...— when prior reasoning invalidated; state Original / Why revised / ImpactThought N/M [BRANCH A from Thought K]: ...— explore alternative; converge with decision rationaleThought N/M [HYPOTHESIS]: ...then[VERIFICATION]: ...— test before actingThought N/N [FINAL]— only when verified, all critical aspects addressed, confidence >80%Mandatory closers: Confidence % stated · Assumptions listed · Open questions surfaced · Next action concrete.
Stop conditions: confidence <80% on any critical decision → escalate via ask the user directly · ≥3 revisions on same thought → re-frame the problem · branch count >3 → split into sub-task.
Implicit mode: apply methodology internally without visible markers when adding markers would clutter the response (routine work where reasoning aids accuracy).
Deep-dive: see
$sequential-thinkingskill (.claude/skills/sequential-thinking/SKILL.md) for worked examples (API design, debugging, architecture), advanced techniques (spiral refinement, hypothesis testing, convergence), and meta-strategies (uncertainty handling, revision cascades).
Evidence-Based Reasoning — Speculation is FORBIDDEN. Every claim needs proof.
- Cite
file:line, grep results, or framework docs for EVERY claim- Declare confidence: >80% act freely, 60-80% verify first, <60% DO NOT recommend
- Cross-service validation required for architectural changes
- "I don't have enough evidence" is valid and expected output
BLOCKED until:
- [ ]Evidence file path (file:line)- [ ]Grep search performed- [ ]3+ similar patterns found- [ ]Confidence level statedForbidden without proof: "obviously", "I think", "should be", "probably", "this is because" If incomplete → output:
"Insufficient evidence. Verified: [...]. Not verified: [...]."
Design Patterns Quality — Priority checks for every code change:
- DRY via OOP: Identify classes/modules with the same purpose, naming pattern, or lifecycle. Apply your knowledge of the project's language/framework to determine the idiomatic abstraction (base class, mixin, trait, protocol, decorator). 3+ similar patterns → extract to shared abstraction.
- Right Responsibility: Logic in LOWEST layer (Entity > Domain Service > Application Service > Controller). Never business logic in controllers.
- SOLID: Single responsibility (one reason to change). Open-closed (extend, don't modify). Liskov (subtypes substitutable). Interface segregation (small interfaces). Dependency inversion (depend on abstractions).
- After extraction/move/rename: Grep ENTIRE scope for dangling references. Zero tolerance.
- YAGNI gate: NEVER recommend patterns unless 3+ occurrences exist. Don't extract for hypothetical future use.
Anti-patterns to flag: God Object, Copy-Paste inheritance, Circular Dependency, Leaky Abstraction.
Serial Attention for Design Quality — Scan one quality dimension at a time (serial passes), not all concerns at once. — why: split attention misses violations that single-focus passes catch.
- Identify applicable dimensions — Based on the code's language, domain, and patterns, determine which quality dimensions apply: DRY, SOLID principles (SRP/OCP/LSP/ISP/DIP), OOP idioms, cohesion/coupling, GRASP, Law of Demeter, CQRS invariants, etc. Your list is NOT fixed — derive from what the code actually does.
- One focused pass per dimension — Dedicate single-focus attention to EACH dimension in sequence. Do NOT mix concerns across passes.
- Threshold: 3+ similar patterns = MANDATORY extraction — Not optional suggestion. Flag as mandatory structural fix requiring action.
- 2+ violations of same kind = structural finding — Report as "pattern problem" needing architectural resolution, not a list of individual instances.
Complexity Prevention (Ousterhout) — MANDATORY. Measure code by cost of change: one business change should map to one code change. Flag ALL of the following in review:
- Change amplification — small business change forces edits in >3 places → structural flaw. Count edit sites for a plausible future change (add variant, add field, add authorization). >3 = reject.
- Cognitive load — reader must hold too much context to safely modify. Flag deep inheritance, long parameter lists, boolean traps, implicit ordering dependencies.
- Cross-cutting duplication at entry points — logging, error handling, validation, auth, transactions reimplemented per controller/handler/route. Lift to middleware / interceptor / filter / decorator / aspect.
- Leaked implementation technology — repos returning
IQueryable/QuerySet/Criteria/raw cursors/ORM entities to callers. Return finished results + intent-revealing methods (GetActiveVipUsers()notQuery()).- Type-switch scattering —
switch/if-chains on enum/discriminator in >1 place. New variant = new file, not N edits. One factory/registry switch at the boundary OK; scattered switches = reject.- Anemic models — domain objects with only getters/setters, logic floats in services. Move invariants/behavior onto the object (
order.Checkout(), notorder.Status = ...).- Primitive obsession — raw
string/int/decimalfor account numbers, emails, money, percentages, date ranges, with re-validation at every entry. Wrap in value objects / records / structs that validate once at construction.- Inline cross-cutting concerns — authorization/tenant isolation/audit/sanitization hand-written at top of every handler. Flag intent with declarative markers (
@RequirePermission("Order.Delete")), enforce once centrally.- Shallow modules — tiny class, big interface (many public methods, many flags, many ctor params) wrapping little logic. A module is deep when a small interface hides a lot of implementation. If interface ≈ implementation cost to learn → inline.
- Missing base class for repeated component/handler lifecycle — 3+ forms/CRUD handlers/list views reimplementing loading/dirty/submit/pagination → extract to base class / hook / composable / mixin / trait.
- Premature vs delayed abstraction — rule-of-three. First occurrence: write it. Second: notice duplication. Third: extract. Don't build generic frameworks before real variation; don't copy-paste for the 4th time.
- Embedded utility logic not extracted to helpers — inline paging loops (
while (hasMore) { skip += take; ... }), ad-hoc datetime math, string parsing/formatting, collection partitioning, retry/backoff loops, URL/query-string building. If the algorithm is non-trivial AND stack-generic (not business-specific), extract toutil/helper/extensionsand let consumers call one line. Inline duplicates → duplicated bug surface.- Logic in wrong (higher) layer — downshift to callee — business/derivation logic written in the caller when the callee owns the data. Defaults: Controller code that should be App Service. App Service code that should be Domain Service or Entity. Component code that should be ViewModel/Store/Service. Caller reaching into callee's data shape to compute something → move the computation behind an intent-revealing method on the callee. Lowest responsible layer wins (Entity > Domain Service > App Service > Controller · Model/VM > Store > Component). Higher-layer placement = duplicated logic when a sibling caller needs the same thing.
- Owner owns the rule — extract on first write — if a caller inlines logic that derives, normalizes, validates, or computes from another type's data, MOVE it to the owning type. Single use is sufficient — the trigger is wrong responsibility, not duplication. Sibling callers always arrive; inline copies drift silently with no compile error and no name to grep. Common offenders: Backend — inlined rules in application-layer handlers / commands / queries / services / controllers that belong on the domain entity / value object / domain service. Frontend — inlined derivations / formatting / validation in components that belong on the model / store / view-model / API service. Fix: name the rule once as a method (static or instance) on the owning type; callers invoke by name. Future variant → SECOND named method on the owner, never an inline near-duplicate. Right responsibility first; reuse is the consequence.
Extraction target — where the named rule lives:
Shape of the rule Goes to Pure function over an entity's own data static method on the entity Behavior that mutates / guards entity state instance method on the entity Always-true invariant on a primitive value value object constructor Needs DI (repo / settings / clock) helper class registered in DI Domain-agnostic algorithm reused across types util / extension method Pure shape / projection conversion DTO mapping Pre-commit edit-site test (reject if answer is "many"):
Change Scenario Should touch Add new variant (customer type, payment method) 1 new file Change HTTP error response format 1 middleware/filter Add timestamp field to every persisted entity 1 base entity/interceptor Add authorization to a new endpoint 1 declarative marker Swap database/ORM Data layer only Change business calculation rule 1 method on owning entity Add loading indicator pattern to forms 1 base component/hook Add validation rule to a domain primitive 1 value-object ctor Change paging/retry/datetime algorithm 1 helper/util function Change a derivation of entity data 1 method on the entity Operating heuristics:
- Write the call site first.
- Count edit sites for plausible future change.
- Prefer removing code over adding it.
- Surface assumptions at boundaries, hide details inside.
- Pre-reuse scan — before writing a non-trivial block, grep for similar algorithms (
while.*skip,DateTime.*Add,split/joinchains, paging loops, retry loops). Match existing helper → call it. None exists but pattern is stack-generic → extract to util before second caller appears.- Layer placement test — ask "if a sibling caller needed this tomorrow, would they re-derive it?" If yes, the logic is in the wrong layer. Move it down.
- Open-case-for-future-reuse — if reviewer spots a block that is likely to appear in another feature (domain-agnostic algorithm, shared lifecycle, recurring derivation), do NOT rationalize with pure YAGNI. Either extract now (if cheap) or create a tracked TODO with the exact extraction target so the second caller does not duplicate silently. Silent duplication is the default failure mode.
- When in doubt ask: "What would need to change if the requirement shifts?"
The measure of good code is the cost of change. Not shortest. Not cleverest. Not most abstracted. Cheapest to safely modify having read a small local portion.
Validated-Finding Fix + Full Re-Review Loop — Re-review is triggered by a validated finding fix cycle, not by a round number. Review purpose:
review → validate findings → fix validated findings → full re-reviewuntil a complete review pass finds no issues. A clean review ENDS the loop — no further rounds required.Round 1: Main-session review. Read target files, build understanding, note issues. Output findings + verdict (PASS / FAIL).
Decision after Round 1:
- No issues found (PASS, zero findings) → review ENDS. Do NOT spawn a fresh sub-agent for confirmation.
- Issues found (FAIL, or any non-zero findings) → run the active review skill's findings-validation gate first; for review skills the default gate is
$why-review --validate-findings <report-path>. Fix only validated findings, then restart the full review protocol from the beginning with a fresh task breakdown.Fresh full re-review after every fix cycle: Re-run the whole review protocol over the current full target. When sub-agents are part of that protocol, spawn NEW
spawn_agentcalls — never reuse prior agents. Reviewers re-read ALL files from scratch with ZERO memory of prior rounds. SeeSYNC:fresh-context-reviewfor the spawn mechanism andSYNC:review-protocol-injectionfor the canonical Agent prompt template. Each fresh full review must catch:
- Cross-cutting concerns missed in the prior round
- Interaction bugs between changed files
- Convention drift (new code vs existing patterns)
- Missing pieces that should exist but don't
- Subtle edge cases the prior round rationalized away
- Regressions introduced by the fixes themselves
Loop termination: After each full re-review, repeat the same decision: clean → END; issues → validate findings → fix → restart from the first review phase. Continue until a complete review pass finds zero issues. If the same validated finding repeats for 3 full invocations with no progress, or a fix requires product/owner input, escalate via a direct user question.
Rules:
- A clean Round 1 ENDS the review — no mandatory Round 2
- NEVER fix unvalidated findings; validate first using the caller's validation gate
- NEVER skip the full re-review after a fix cycle (every fix invalidates the prior verdict)
- NEVER reuse a sub-agent across rounds — every iteration that uses sub-agents spawns NEW Agent calls
- Main agent READS sub-agent reports but MUST NOT filter, reinterpret, or override findings
- No arbitrary sub-agent-round cap replaces the clean-review requirement; use the 3 repeated-no-progress blocker rule only to avoid infinite spinning
- Track recursive invocation count and repeated blockers in conversation context (session-scoped)
- Final verdict must incorporate ALL rounds executed
Report must include
## Round N Findings (Fresh Sub-Agent)for every round N≥2 that was executed.
Fresh Context Re-Review — Eliminate orchestrator confirmation bias after fixes by restarting the full review with isolated sub-agents where applicable.
Why: The main agent knows what it (or
$feature-implement) just fixed and rationalizes findings accordingly. A fresh sub-agent has ZERO memory, re-reads from scratch, and catches what the main agent dismissed. Sub-agent bias is mitigated by (1) fresh context, (2) verbatim protocol injection, (3) main agent not filtering the report.When: ONLY after a validated-finding fix cycle. A review round that finds zero issues ENDS the loop — do NOT spawn a confirmation sub-agent. A review round that finds issues triggers: validate findings → fix → full review restart from the first phase.
How:
- Start a NEW full review invocation/task breakdown; when that protocol calls for agents, spawn NEW
spawn_agenttool calls — usecode-revieweragent_type for code reviews,general-purposefor plan/doc/artifact reviews- Inject ALL required review protocols VERBATIM into the prompt — see
SYNC:review-protocol-injectionfor the full list and template. Never reference protocols by file path; AI compliance drops behind file-read indirection (seeSYNC:shared-protocol-duplication-policy)- Sub-agent re-reads ALL target files from scratch via its own tool calls — never pass file contents inline in the prompt
- Sub-agent writes structured report to
plans/reports/{review-type}-round{N}-{date}.md- Main agent reads the report, integrates findings into its own report, DOES NOT override or filter
Rules:
- SKIP fresh sub-agent when the prior full review found zero issues (no fixes = nothing new to verify)
- NEVER skip the full review restart after a fix cycle — every fix invalidates the prior verdict
- NEVER reuse a sub-agent across rounds — every fresh round spawns a NEW
spawn_agentcall- Continue until a complete full review pass has zero findings; if the same blocker repeats 3 times with no progress, escalate via a direct user question
- Track iteration count and repeated blockers in conversation context (session-scoped, no persistent files)
Review Protocol Injection — Every fresh sub-agent review prompt MUST embed 11 protocol blocks VERBATIM. The template below has ALL 11 bodies already expanded inline. Copy the template wholesale into the Agent call's
promptfield at runtime, replacing only the{placeholders}in Task / Round / Reference Docs / Target Files / Output sections with context-specific values. Do NOT touch the embedded protocol sections.Why inline expansion: Placeholder markers would force file-read indirection at runtime. AI compliance drops significantly behind indirection (see
SYNC:shared-protocol-duplication-policy). Therefore the template carries all 11 protocol bodies pre-embedded.
code-reviewer — for code reviews (reviewing source files, git diffs, implementation)general-purpose — for plan / doc / artifact reviews (reviewing markdown plans, docs, specs)spawn_agent({
description: "Fresh Round {N} review",
agent_type: "code-reviewer",
prompt: `
## Task
{review-specific task — e.g., "Review all uncommitted changes for code quality" | "Review plan files under {plan-dir}" | "Review integration tests in {path}"}
## Round
Round {N}. You have ZERO memory of prior rounds. Re-read all target files from scratch via your own tool calls. Do NOT trust anything from the main agent beyond this prompt.
## Protocols (follow VERBATIM — these are non-negotiable)
### Spec ↔ Tests ↔ Code Triangulation
DO THIS FIRST — before any per-protocol check below. The review target is the WHOLE PACKAGE, not the diff alone: load the behavior's spec (§3 ACs / §4 BRs / §8 TCs), its tests, and the changed code TOGETHER, and reason about their mutual consistency BEFORE judging any one in isolation.
1. Locate all three faces: the Feature Spec section(s) governing the changed behavior, the tests that guard it, and the production code that implements it. A missing face is itself a finding (SPEC-GAP / TEST-GAP / DEAD-SPEC).
2. Triangulate pairwise — every disagreement is a finding; classify which face is wrong:
- code vs spec: behavior the code does that no §3/§4/§8 rule describes → CODE-EXTRA or SPEC-STALE; a [HARD] §4 rule or §5 invariant with no enforcing code path → CODE-WRONG.
- tests vs spec: a §8 TC with no test, or a test asserting behavior no TC/rule names → TEST-GAP or SPEC-SILENT.
- tests vs code: a changed code path with no covering test → TEST-GAP; a test that still passes against a deliberately broken invariant → WEAK-TEST (apply the mutation thinking in Bug Detection).
3. Hidden-rule capture: any invariant the code enforces but the spec never states (SPEC-SILENT) MUST be surfaced as a finding to add into §3/§4/§8 AND guarded with a test — the enrichment loop, never a silent pass.
4. Only after the three faces agree — or every disagreement is logged as a finding — proceed to the per-protocol checks below; when enrichment adds spec/test content, re-review the package against the enriched spec.
NEVER mark review PASS while any spec/test/code face disagrees without a logged finding. The diff is the entry point; the package is the unit of judgment.
### Evidence-Based Reasoning
Speculation is FORBIDDEN. Every claim needs proof.
1. Cite file:line, grep results, or framework docs for EVERY claim
2. Declare confidence: >80% act freely, 60-80% verify first, <60% DO NOT recommend
3. Cross-service validation required for architectural changes
4. "I don't have enough evidence" is valid and expected output
BLOCKED until: Evidence file path (file:line) provided; Grep search performed; 3+ similar patterns found; Confidence level stated.
Forbidden without proof: "obviously", "I think", "should be", "probably", "this is because".
If incomplete → output: "Insufficient evidence. Verified: [...]. Not verified: [...]."
### Bug Detection
MUST check categories 1-4 for EVERY review. Never skip.
1. Null Safety: Can params/returns be null? Are they guarded? Optional chaining gaps? .find() returns checked?
2. Boundary Conditions: Off-by-one (< vs <=)? Empty collections handled? Zero/negative values? Max limits?
3. Error Handling: Try-catch scope correct? Silent swallowed exceptions? Error types specific? Cleanup in finally?
4. Resource Management: Connections/streams closed? Subscriptions unsubscribed on destroy? Timers cleared? Memory bounded?
5. Concurrency (if async): Missing await? Race conditions on shared state? Stale closures? Retry storms?
6. Stack-Specific: Check the configured language/runtime pitfalls and framework-specific failure modes discovered from local code.
Classify: CRITICAL (crash/corrupt) → FAIL | HIGH (incorrect behavior) → FAIL | MEDIUM (edge case) → WARN | LOW (defensive) → INFO.
### Design Patterns Quality
Priority checks for every code change:
1. DRY via OOP: Same-suffix classes (*Entity, *Dto, *Service) MUST share base class. 3+ similar patterns → extract to shared abstraction.
2. Right Responsibility: Logic in LOWEST layer (Entity > Domain Service > Application Service > Controller). Never business logic in controllers.
3. SOLID: Single responsibility (one reason to change). Open-closed (extend, don't modify). Liskov (subtypes substitutable). Interface segregation (small interfaces). Dependency inversion (depend on abstractions).
4. After extraction/move/rename: Grep ENTIRE scope for dangling references. Zero tolerance.
5. YAGNI gate: NEVER recommend patterns unless 3+ occurrences exist. Don't extract for hypothetical future use.
Anti-patterns to flag: God Object, Copy-Paste inheritance, Circular Dependency, Leaky Abstraction.
### Logic & Intention Review
Verify WHAT code does matches WHY it was changed.
1. Change Intention Check: Every changed file MUST serve the stated purpose. Flag unrelated changes as scope creep.
2. Happy Path Trace: Walk through one complete success scenario through changed code.
3. Error Path Trace: Walk through one failure/edge case scenario through changed code.
4. Acceptance Mapping: If plan context available, map every acceptance criterion to a code change.
5. Tests Verify Intent: For test/spec changes, verify tests name the protected business rule or invariant and would fail if that intent breaks.
6. Migration Test Exclusion: Do not write tests for migration code. Schema/data migrations are one-time execution paths, not core application logic.
NEVER mark review PASS without completing both traces (happy + error path).
### Test Spec Verification
Map changed code to test specifications.
1. Identify the project's test/spec format from existing docs, test-case files, BDD feature files, or spec folders.
2. Every changed code path MUST map to a corresponding test case/spec (or flag as "needs test case").
3. New functions/endpoints/handlers → flag for test spec creation.
4. Migration files are excluded from test/spec creation; schema/data migrations are one-time execution paths, not core application logic.
5. If spec evidence fields exist, verify they point to actual code (file:line, not stale references).
6. Verify each meaningful test case names the business intent/invariant; flag behavior-only cases that only mirror implementation details.
7. Auth/data changes → verify corresponding authorization and data-state test cases exist.
8. If no specs exist for a changed path → log the gap and recommend the project's test-spec workflow.
NEVER skip test mapping. Untested code paths are the #1 source of production bugs.
### Behavioral Delta Matrix
MANDATORY for any bugfix review. Produce input-state × pre-fix × post-fix × delta table BEFORE writing verdict.
- Minimum 3 rows; include at least one row OUTSIDE the original bug report.
- Any "REGRESSION" delta → review returns FAIL until a preservation test is added.
- Narrative descriptions do NOT substitute for the matrix.
Example rows (external-record sync fix):
| Input | Pre-fix | Post-fix | Delta |
| --------------------- | ------- | ------------------------- | ---------- |
| Record exists (valid) | Reused | Always recreated → orphan | REGRESSION |
| Record missing (404) | Error | Recreated | Fixed |
### Fix-Layer Accountability
NEVER fix at the crash site. Trace the full flow, fix at the owning layer. The crash site is a SYMPTOM, not the cause.
MANDATORY before ANY fix: