| name | code-review |
| description | Multi-domain code review (agents + Conventional Comments → findings + verdict). Triggers: "code review" | "review changes" | "review PR #42" | "check my code" | "review my changes" | "review this PR" | "do a code review" | "review the diff" | "look at my code". |
Code Review
Success
I := F collected ∧ verdict posted (PR ∃) ∧ Phase 8 decision made
V := gh pr view {N} --comments | grep "## Code Review" ∧ verdict ∈ {Approve, Request changes}
Review branch/PR via fresh domain-specific agents → Conventional Comments → findings + verdict.
⚠ Flow: single continuous pipeline (Phases 1→4 + 6 + 8). ¬stop between phases. Decision response → immediately execute next phase. Stop only on: |Δ|=0, explicit Cancel, or Phase 8 completion.
/code-review → diff ${BASE}...HEAD (BASE = staging if exists, else main)
/code-review #42 → gh pr diff 42
Let:
F := set of all findings | f ∈ F := single finding
C(f) ∈ [0,100] ∩ ℤ — confidence | cat(f) ∈ {issue, suggestion, todo, nitpick, thought, question, praise}
Δ := changed files | BASE := staging ∨ main
Q := present decision via protocol: read ${CLAUDE_PLUGIN_ROOT}/../shared/references/decision-presentation.md (Pattern A)
Pipeline
| Phase | ID | Required | Verifies via | Notes |
|---|
| 1 | gather-changes | ✓ | Δ listed | — |
| 1.5 | secret-scan | ✓ | ∅ matches (or ACK) | — |
| 2 | spec-compliance | — | criteria checked | spec ∃ |
| 3 | multi-domain-review | ✓ | agents return | parallel |
| 4 | merge-and-present | ✓ | F + verdict | — |
| 6 | post-to-pr | — | comment posted | PR ∃ |
| 8 | next-step | ✓ | decision made | — |
Pre-flight
Success: F collected ∧ verdict posted ∧ Phase 8 decision made
Evidence: gh pr view {N} --comments | grep "## Code Review"
Steps: gather-changes → secret-scan → multi-domain-review → merge-and-present → post-to-pr → next-step
¬clear → STOP + ask: "Which branch/PR to review?"
Phase 1 — Gather Changes
BASE=$(git branch -r | grep -q 'origin/staging' && echo staging || echo main)
- PR# →
gh pr diff <#> | else → git diff ${BASE}...HEAD
- Δ =
git diff --name-only ${BASE}...HEAD (or gh pr diff <#> --name-only)
- ∀ f ∈ Δ: read full (skip binaries, note)
- |Δ| = 0 → halt
- |Δ| > 50 → warn, suggest split
Phase 1.5 — Secret Scan
git diff ${BASE}...HEAD | grep -iE '(password|passwd|secret|api[_-]?key|auth[_-]?token|access[_-]?token|private[_-]?key)\s*[:=]\s*["\x27`][^"\x27`]{8,}' | head -20
∃ matches → WARN (redact to first 2 + last 2 chars):
⚠️ Potential secrets found in diff — review before proceeding:
<file>: <matched line with secret value redacted to first 2 + last 2 chars>
→ DP(A) Review and proceed | Abort
∅ → continue silently.
Phase 2 — Spec Compliance
- issue_num ←
git branch --show-current | grep -oP '\d+' | head -1
- spec ←
ls artifacts/specs/<issue_num>-*.mdx 2>/dev/null
- spec ∃ → ∀ criterion: met → ∅ | ¬met →
issue(blocking): | ∀ met → praise:
- spec ∄ → skip
Phase 3 — Multi-Domain Review (Fresh Agents)
Spawn fresh agents via Task (¬implementation context → ¬bias).
Chunking (Slice 2 — O2)
Before dispatching agents, partition Δ into chunks using the Python chunker
(${CLAUDE_SKILL_DIR}/chunker.py).
from chunker import parse_diff, chunk, compute_budget
from digest import emit_all_digests, format_digest_for_agent
raw_diff = <diff text from Phase 1>
ctx_window = <active model context window, e.g. 200_000>
files = parse_diff(raw_diff)
budget = compute_budget(ctx_window)
chunks = chunk(files, budget)
digests = emit_all_digests(chunks)
- If
len(chunks) == 1 → single-chunk path (identical to pre-Slice-2 behaviour; all agents receive the full diff as before).
- If
len(chunks) > 1 → per-chunk Lane A dispatch (see below).
Agent dispatch
| Agent | Condition | Focus |
|---|
| security-auditor | always | OWASP, secrets, injection, auth |
| architect | |Δ| > 5 ∨ src ⊇ {arch, pattern, structure, service, module} | patterns, structure, circular deps |
| product-lead | spec ∃ | spec compliance, product fit |
| tester | Δ ∩ {src/, test/, *.test.*, *.spec.*} ≠ ∅ | coverage, AAA, edge cases |
| frontend-dev | Δ ∩ {{frontend.path}, {shared.ui}} ≠ ∅ | FE patterns, components, hooks |
| backend-dev | Δ ∩ {{backend.path}, {shared.types}} ≠ ∅ | BE patterns, API, errors |
| devops | Δ ∩ {configs, CI} ≠ ∅ | config, deploy, infra |
Skip rules: architect → |Δ| ≤ 5 ∧ ¬arch keywords | product-lead → spec ∄ | tester → Δ ⊂ {config, docs, infra}
Subdomain split (multi-chunk): For each chunk c_i, apply the dispatch table against c_i.files only (not full Δ). Default: 1 agent per domain per chunk.
Security-auditor scoping
-
∀ f ∈ Δ: imports(f) = static from '...' ∪ dynamic import('...')
-
Resolve aliases:
| Pattern | Resolution |
|---|
./, ../ | relative, try .ts, /index.ts |
@repo/<pkg> | → packages/<pkg>/src/index.ts (skip vitest/playwright config) |
@/* | → {frontend.path}/src/ + rest, try .ts, .tsx, /index.{ts,tsx} |
| External | skip |
-
scope = Δ ∪ ⋃{resolve(imports(f)) | f ∈ Δ} ∪ {backend.path}/src/auth/** — deduplicate
SYNC REQUIRED: inline class list must match review-classes.yml slugs — see #149
Spawn template
Note (orchestrator): The {format_digest_for_agent(d) for d in digests if d.chunk_index != i} placeholder is a Python expression evaluated by the orchestrator (Claude main context) BEFORE the Task call — substitute its rendered value into the prompt string. It is NOT a runtime-resolved placeholder. All other {...} placeholders are simple value substitutions.
Single-chunk (|chunks| = 1): identical to pre-Slice-2 — agents receive full diff.
Multi-chunk (|chunks| > 1) — Lane A per-chunk:
For each chunk c_i, spawn the applicable domain agents in parallel:
Task(
subagent_type: "dev-core:{agent}",
description: "{agent} review — chunk {i}/{N} — {PR#|branch}",
prompt: "Code review task. Focus: {focus}. Output Conventional Comments findings only. ¬TaskCreate.\n\nYou are reviewing chunk {i} of {N}. Review ONLY the files in this chunk.\n\nFormat per finding:\n<label>: <description>\n <file>:<line>\n -- {agent}\n Root cause: <why>\n Class: [<canonical-class>, ...] [candidate/<slug>?] ← 0–N canonical from review-classes.yml + 0–1 candidate; omit field if no class applies\n Raw callsites: [{file: <path>, line: <n>}, ...] ← all locations of this anti-pattern; required when Class is set; never empty\n Solutions:\n 1. <primary> (recommended)\n 2. <alternative>\n Confidence: N%\n\nCanonical classes (use slug only): test-tautology, generator-drift, parallel-path-drift, bash-arithmetic-trap, bash-error-suppression, shell-injection, sql-injection, missing-error-handling, missing-input-validation, secret-leak, bare-except, path-traversal, unbounded-loop. Free-text labels not in this list or candidate/* namespace are invalid. Candidate slugs must match ^candidate/[a-z][a-z0-9-]{1,48}$. Subsumption: bare-except subsumes missing-error-handling — when both apply, tag bare-except only.\n\n---CHUNK DIFF (chunk {i})---\n{c_i.hunk_text for all files in chunk}\n\n---CHUNK FILES---\n{contents of files in c_i}\n\n---BOUNDARY DIGESTS (other chunks)---\n{format_digest_for_agent(d) for d in digests if d.chunk_index != i}\n\n---SPEC---\n{spec contents if ∃, else omit section}"
)
Agent name map: security-auditor → dev-core:security-auditor | architect → dev-core:architect | product-lead → dev-core:product-lead | tester → dev-core:tester | frontend-dev → dev-core:frontend-dev | backend-dev → dev-core:backend-dev | devops → dev-core:devops | recall → dev-core:recall
Agent payload
Single-chunk: each agent receives full diff + Δ + spec (if ∃) + "output Conventional Comments".
Multi-chunk (Lane A): each agent receives chunk diff + chunk file contents + boundary digests of all other chunks + spec (if ∃).
Phase 3b — Cross-chunk class join + recall trigger (multi-chunk only)
After all Lane A agents complete, the orchestrator builds a cross-chunk index and triggers recall agents where warranted.
Step 1 — Build index:
class_index = {} # class_slug → {chunks: set[int], callsites: [{file, line}]}
∀ chunk c_i, ∀ finding f with class[] ≠ []:
∀ cls in f.class[] where ¬cls.startswith("candidate/"):
class_index[cls].chunks.add(i)
class_index[cls].callsites.extend(f.raw_callsites)
candidate/* classes → ¬join (advisory only, never trigger recall).
Step 2 — Trigger condition (per class):
|class_index[cls].chunks| ≥ 2 → trigger (cross-chunk spread)
|class_index[cls].callsites| ≥ 3 within single chunk → trigger (density)
Step 3 — Spawn recall agent per triggered class:
Task(
subagent_type: "dev-core:recall",
description: "recall — {cls} — {PR#|branch}",
prompt: "Targeted recall task for class '{cls}'.
Input:
class: {cls}
callsites: {class_index[cls].callsites}
context_lines: 10
cross_chunk_index: {chunks: {class_index[cls].chunks}, agents: {agents_that_flagged}}
Follow agents/recall.md procedure. Output Conventional Comments findings only.
All recall findings MUST use label `issue(blocking):`. ¬TaskCreate."
)
Recall agents run ∥. Collect findings → Phase 4 merge.
Single-chunk path: skip Phase 3b entirely (no cross-chunk index needed).
Review dimensions
correctness | security | performance | architecture | tests | readability | observability
Finding format (ALL fields mandatory except Class/Raw callsites)
<label>: <description>
<file>:<line>
-- <agent>
Root cause: <why, not what>
Class: [<canonical-class>, ...] [candidate/<slug>?]
Raw callsites: [{file: <path>, line: <n>}, ...]
Solutions:
1. <primary> (recommended)
2. <alternative>
3. <alternative> [optional]
Confidence: <0-100>%
Class field rules:
- 0–N canonical tags from
${CLAUDE_SKILL_DIR}/review-classes.yml + 0–1 candidate/<slug> tag
- Omit the
Class: field entirely when no class applies (¬write Class: [])
- Free-text labels not in the canonical list and not prefixed
candidate/ → invalid; treat as C(f) := 0
candidate/<slug> must match ^candidate/[a-z][a-z0-9-]{1,48}$; slug violating format → invalid, C(f) := 0
Raw callsites required when Class is set; list ALL locations of the anti-pattern in the diff + resolved imports, never just the cited line; format: [{file: <path>, line: <n>}, ...]
- Subsumption:
bare-except subsumes missing-error-handling — when both could apply, tag bare-except only
C(f) = min(diagnostic_certainty, fix_certainty)
| Band | C | Criteria |
|---|
| Certain | 90-100 | Unambiguous diagnosis + fix |
| High | 70-89 | Clear diagnosis, 1-2 approaches |
| Moderate | 40-69 | Probable, context-dependent |
| Low | 0-39 | Speculative, competing explanations |
Validation: missing mandatory fields ∨ C ∉ ℤ ∩ [0,100] ∨ free-text class label → C(f) := 0 (noted; /fix routes to 1b1).
Finding categories
| Category | Label | Blocks? |
|---|
| Bug / Security / Spec gap | issue: / todo: | ✓ |
| Standard violation | suggestion(blocking): | ✓ |
| Style | suggestion(non-blocking): / nitpick: | ✗ |
| Architecture | thought: / question: | ✗ |
| Good work | praise: | ✗ |
Phase 4 — Merge & Present
- Collect F from all agents (Lane A + recall agents + Lane B)
- Dedup: same file:line + issue → keep max C; ∀ pair sharing file:line with class[] sets that intersect after subsumption → merge: max C, union class[] (apply subsumption strip), union raw_callsites[]
- Sort: C desc within category
- Group: Blockers → Warnings → Suggestions → Praise
Source classification before verdict:
- Lane A findings: standard blocking/advisory per category label
- Recall findings (
source: recall): always blocking regardless of label — override to issue(blocking): if not already
- Lane B findings (
pattern-class tag): advisory only — cap at Approve with comments; ¬Request changes from Lane B alone
Verdict:
| Condition | Verdict |
|---|
| ∃f: recall finding (source: recall) | Request changes |
| ∃f: blocks(f) ∧ ¬recall | Request changes |
| Lane B advisory ∨ warns(f) only ∧ ¬blocks | Approve with comments |
| suggestions/praise only | Approve |
| F = ∅ | Approve (clean) |
Phase 6 — Post to PR
- PR# = provided ∨
gh pr list --head "$(git branch --show-current)" --json number --jq '.[0].number'; ¬∃ → skip
- Tempfile per
${CLAUDE_PLUGIN_ROOT}/../shared/references/tempfile-convention.md:
[[ "$PR" =~ ^[0-9]+$ ]] || { echo "Invalid PR number: $PR" >&2; exit 1; }
TMPDIR=$(mktemp -d -t "dev-core-review-comment-PR${PR}-XXXXXX")
trap 'rm -rf "$TMPDIR"' EXIT
BODY="$TMPDIR/body.md"
Write grouped findings to "$BODY" → gh pr comment "$PR" --body-file "$BODY"
## Code Review header; grouped findings + summary + verdict; ∀C included
→ immediately continue to Phase 8.
Phase 8 — Next Step
Q:
- Fix now (
/fix) — invoke /fix (auto-apply + 1b1 + spawn fixers; /fix Phase 8 offers rebase + label + merge)
- Merge as-is — rebase + label + squash merge (below)
- Stop — exit
If Merge as-is:
git fetch origin ${BASE} && git rev-list HEAD..origin/${BASE} --count
- count > 0 →
git rebase origin/${BASE} + git push --force-with-lease
- conflict → halt (¬label)
- Q: "Add
reviewed label?" → Yes / No
- Yes →
gh api repos/:owner/:repo/issues/<#>/labels -f "labels[]=reviewed" → squash merge on green CI
- No → inform manual
/code-review ¬fixes code. Fixing = /fix skill.
Edge Cases
| Scenario | Behavior |
|---|
| Δ |
| Binary ∈ Δ | Skip, note |
| Δ |
| F = ∅ | Clean approve, post, Phase 8 |
| Critical security | Escalate in findings, flag in verdict |
| Agents disagree | Present both with respective C |
| ¬∃ PR | Skip Phase 6, Phase 8 local only |
| Missing root cause/solutions | C(f) := 0 |
| architect skipped | ¬arch review → faster |
| product-lead skipped | Phase 2 skipped |
| tester skipped | ¬test coverage review |
Safety Rules
- Fresh agents only — ¬implementation context
- ¬auto-merge, ¬approve PRs on GitHub
- ¬fix code — findings only. Fixing =
/fix skill
- ∃ PR → must post comment (Phase 6)
- Human decides at Phase 8 — ¬proceed without Q
Chain Position
- Phase: Verify
- Predecessor:
/validate
- Successor: conditional — APPROVED → merge →
/cleanup | CHANGES_REQUESTED → /fix
- Class: verdict (branching based on findings)
Task Integration
/dev owns the dev-pipeline task lifecycle externally
- Sub-tasks created: review findings (
kind: "review-finding") if applicable
- Follow-up tasks: on CHANGES_REQUESTED (user picks
/fix at Phase 8) → TaskCreate fix task with metadata: { kind: "dev-pipeline", follow_up: true, iteration: N, blockedBy: [this.id] }
Exit
- APPROVED via
/dev (user picks Merge as-is at Phase 8): rebase + label + merge → return. /dev advances to /cleanup.
- CHANGES_REQUESTED via
/dev (user picks /fix at Phase 8): TaskCreate follow-up fix task → return silently. /dev picks up the new task and invokes /fix.
- Stop (user): return →
/dev presents Abort | Resume.
- Loop cap: max 2 fix→review iterations (tracked via
metadata.iteration). 3rd review iteration → Phase 8 must recommend Merge as-is or Stop, not Fix. /dev presents Abort if 3rd fix attempted.
$ARGUMENTS