review-changes
// [Code Quality] Use when reviewing current changes, staged or unstaged diffs, or branch-to-branch diffs.
| name | review-changes |
| description | [Code Quality] Use when reviewing current changes, staged or unstaged diffs, or branch-to-branch diffs. |
Codex compatibility note:
- Invoke repository skills with `$skill-name` in Codex; this mirrored copy rewrites legacy `Claude/skill-name` references.
- Prefer the `plan-hard` skill for planning guidance in this Codex mirror.
- Task tracker mandate: BEFORE executing any workflow or skill step, create/update task tracking for all steps and keep it synchronized as progress changes.
- User-question prompts mean to ask the user directly in Codex.
- Ignore Claude-specific mode-switch instructions when they appear.
- Strict execution contract: when a user explicitly invokes a skill, execute that skill protocol as written.
- Subagent authorization: when a skill is user-invoked or AI-detected and its protocol requires subagents, that skill activation authorizes use of the required `spawn_agent` subagent(s) for that task.
- Do not skip, reorder, or merge protocol steps unless the user explicitly approves the deviation first.
- For workflow skills, execute each listed child-skill step explicitly and report step-by-step evidence.
- If a required step/tool cannot run in this environment, stop and ask the user before adapting.
Codex does not receive Claude hook-based doc injection. When coding, planning, debugging, testing, or reviewing, open project docs explicitly using this routing.
Always read:
- docs/project-config.json (project-specific paths, commands, modules, and workflow/test settings)
- docs/project-reference/docs-index-reference.md (routes to the full docs/project-reference/* catalog)
- docs/project-reference/lessons.md (always-on guardrails and anti-patterns)

Situation-based docs:
- Backend: backend-patterns-reference.md, domain-entities-reference.md, project-structure-reference.md
- Frontend: frontend-patterns-reference.md, scss-styling-guide.md, design-system/README.md
- Features: feature-docs-reference.md
- Integration tests: integration-test-reference.md
- E2E tests: e2e-test-reference.md
- Review: code-review-rules.md plus the domain docs above based on changed files

Do not read all docs blindly. Start from docs-index-reference.md, then open only the files relevant to the task.
[FINAL PURPOSE REMINDER – CRITICAL]
Ensure the changes are reasonable, with no potential bugs or flaws. Apply critical thinking rigorously.
[BLOCKING] Execute skill steps in declared order. NEVER skip, reorder, or merge steps without explicit user approval.
[BLOCKING] Before each step or sub-skill call, update task tracking: set `in_progress` when the step starts, set `completed` when the step ends.
[BLOCKING] Every completed/skipped step MUST include brief evidence or an explicit skip reason.
[BLOCKING] If Task tools are unavailable, create and maintain an equivalent step-by-step plan tracker with the same status transitions.
Goal: Comprehensive review of current diffs following project standards. No flaws, no bugs, no missing updates, no stale content. Applies to uncommitted work, staged changes, branch-to-branch diffs, and any project type: code, docs, config, infrastructure, or non-coding artifacts.
Workflow:
- `$graph-blast-radius` skill FIRST (if .code-graph/graph.db exists)
- `$docs-update` if staleness detected

Key Rules:
- Write the report to plans/reports/code-review-{date}-{slug}.md
- Back every finding with file:line proof

MANDATORY: Plan a ToDo task to discover and READ project-specific reference docs:
- Search for code standards docs: `*code-review*`, `*patterns*`, `*conventions*`, `*style-guide*` → read any found
- Search for architecture docs: `*architecture*`, `*adr-*`, `README.md` at service/module roots
- Look for docs referencing changed technology areas (backend, frontend, infra, etc.)
- Read docs most relevant to the categories of files changed
Prerequisites: MUST READ before executing:
Critical Purpose: Ensure quality: no flaws, no bugs, no missing updates, no stale content. Verify both artifacts AND documentation.
External Memory: For complex or lengthy work (research, analysis, scan, review), write intermediate findings and final results to a report file in `plans/reports/`; this prevents context loss and serves as the deliverable.
Evidence Gate: MANDATORY – every claim, finding, and recommendation requires `file:line` proof or traced evidence with a confidence percentage (>80% to act; <80% must verify first).
OOP & DRY Enforcement: MANDATORY – flag duplicated patterns that should be extracted to a base class, generic, or helper. Classes in the same group or with the same suffix MUST inherit a common base (even if empty now; this enables future shared logic and child overrides). Verify the project has linting/analyzers configured for the stack.
Comprehensive review of current changes or explicit branch/commit diffs following project standards.
Target: Current working-tree changes by default, or an explicit branch/tag/commit diff when the user asks to review a branch comparison.
Use these sources:
- Working tree: git status, git diff, and git diff --cached
- Branch comparison: git diff <base>...<head> plus git diff --name-only <base>...<head>
- Commit range: git diff <base>..<head> plus git diff --name-only <base>..<head>

Be skeptical. Apply critical thinking and sequential thinking. Every claim needs traced proof with confidence >80%.
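The three diff sources above can be sketched as a small helper that builds the right git invocation per review mode (a minimal sketch; it only constructs the commands and leaves execution to the caller, and the mode names are illustrative):

```python
def diff_commands(mode: str, base: str = "", head: str = "") -> list[list[str]]:
    """Return the git commands for the selected diff source.

    mode: "working-tree" (uncommitted work), "branch" (three-dot diff
    against the merge base), or "range" (two-dot commit-range diff).
    The caller runs each command list via subprocess.
    """
    if mode == "working-tree":
        return [["git", "status"], ["git", "diff"], ["git", "diff", "--cached"]]
    if mode == "branch":
        spec = f"{base}...{head}"   # three-dot: changes since merge base
    elif mode == "range":
        spec = f"{base}..{head}"    # two-dot: direct endpoint comparison
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [["git", "diff", spec], ["git", "diff", "--name-only", spec]]
```

The three-dot form is usually what a branch review wants, since it ignores commits that only landed on the base branch.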
- Proof: file:line evidence (grep results, read confirmations)

YAGNI: Flag code solving hypothetical future problems (unused parameters, speculative interfaces, premature abstractions).
KISS: Flag unnecessarily complex solutions. "Is there a simpler way meeting the same requirement?"
DRY: Actively grep for similar/duplicate code before accepting new code. 3+ similar patterns → flag for extraction.
Clean Code: Readable > clever. Names reveal intent. Functions do one thing. No deep nesting.
Follow Convention: Before flagging ANY pattern violation, grep for 3+ existing examples. Codebase convention wins over textbook rules.
No Flaws/No Bugs: Trace logic paths. Verify edge cases (null, empty, boundary values). Check that error handling covers failure modes.
Proof Required: Every claim backed by file:line evidence or grep results. Speculation FORBIDDEN.
Doc Staleness: Cross-reference changed files against related docs (feature docs, test specs, READMEs). Flag stale or missing updates.
Run `python .claude/scripts/code_graph batch-query <f1> <f2> --json` on changed files for test coverage and caller impact.
MANDATORY: FIRST action in every review. Call `$graph-blast-radius` BEFORE any other review work.
If .code-graph/graph.db exists, run graph-blast-radius analysis before reviewing changes:
- `$graph-blast-radius` skill (runs python .claude/scripts/code_graph blast-radius --json)

For each changed file, trace full impact:
- python .claude/scripts/code_graph trace <changed-file> --direction downstream --json → all files affected by the changes

MANDATORY FIRST: Create Todo Tasks for Review Phases. Before starting, call task tracking with:
- [Review Phase 0] Run $graph-blast-radius to analyze change impact - in_progress (MUST BE FIRST)
- [Review Phase 0.3] Detect high-risk change types, create risk tasks - pending
- [Review Phase 0.7] Categorize changed files, create dimension review tasks - pending
- [Review Phase 0.5] Plan compliance check (skip if no active plan) - pending
- [Review Phase 1] Get changes and create report file - pending
- [Review Phase 2] Review file-by-file and update report - pending
- [Review Phase 3] Spawn fresh-context sub-agent for holistic assessment - pending
- [Review Phase 4] Generate final review findings - pending
- [Review Phase 5] Run $docs-update if staleness detected - pending

Update todo status as each phase completes.
Note: If Phase 1 reveals 10+ changed files, replace Phase 2-4 tasks with Systematic Review Protocol tasks:
- [Review Phase 2] Categorize and fire parallel sub-agents
- [Review Phase 3] Synchronize and cross-reference
- [Review Phase 4] Generate consolidated report
Phase 0: Run Graph Blast Radius Analysis (MANDATORY FIRST STEP)
MANDATORY: FIRST action before ANY other review work.
- Run the `$graph-blast-radius` skill
- If .code-graph/graph.db does not exist, note "Graph not available – skipping blast radius" and proceed to Phase 0.3

Phase 0.3: Change Type Detection + Risk Tasks (MANDATORY)
Purpose: Identify HIGH-RISK change types in this diff before dimensional review. Each detected type creates a focused risk task. Change types are ORTHOGONAL to file category: the same file can be both a migration AND a security change; detect all independently.
Step 1: Detect change types
git diff --name-only HEAD # unstaged
git diff --cached --name-only # staged
# For branch or commit-range review, use the user-provided diff source:
git diff --name-only <base>...<head>
Evaluate each change type for this diff:
| Change Type | Detection Signal (adapt to project's actual conventions) | TRUE if... |
|---|---|---|
| DepUpgrade | Dependency manifest changed (package.json, *.csproj, Gemfile, go.mod, requirements.txt, Cargo.toml, pom.xml, etc.) | A version number changed in any dependency manifest |
| Migration | File path or name suggests schema change (contains migration, schema, alter_table, or matches project's migration convention) | Any migration-convention file appears in the diff |
| BusEvent | New or modified event/message definition or consumer (infer from project conventions: consumer naming, message type directories) | A consumer or event class is new or its contract changed |
| ApiContract | API definition file changed (controller, route handler, OpenAPI/GraphQL schema) with route or field differences | Diff shows route/action/field additions or removals |
| SecurityChange | Auth/permission definition changed โ infer from project conventions (auth middleware, permission constants, policy definitions) | Any auth or permission gate is added, removed, or changed |
| ConfigChange | Configuration files changed (e.g., *.json, *.yaml, *.env*, *Config*, *Options*, *Settings*, *.toml) | Any config-convention file appears |
| InfraChange | Infrastructure definition changed (Dockerfile, docker-compose*.yml, CI/CD pipelines, k8s manifests, IaC files) | Any infra-convention file appears |
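The detection step above can be sketched as a pattern match over the changed-file list. A sketch only: the glob signals below are the table's illustrative starting points and must be adapted to the project's real conventions, and BusEvent/SecurityChange detection usually needs project knowledge beyond filename globs, so they are omitted here:

```python
from fnmatch import fnmatch

# Illustrative detection signals; adapt to the project's actual conventions.
CHANGE_TYPE_SIGNALS = {
    "DepUpgrade": ["package.json", "*.csproj", "Gemfile", "go.mod",
                   "requirements.txt", "Cargo.toml", "pom.xml"],
    "Migration": ["*migration*", "*schema*", "*alter_table*"],
    "ApiContract": ["*Controller*", "*routes*", "*openapi*", "*.graphql"],
    "ConfigChange": ["*.json", "*.yaml", "*.env*", "*Config*", "*Options*",
                     "*Settings*", "*.toml"],
    "InfraChange": ["Dockerfile", "docker-compose*.yml", "*.tf", "*k8s*"],
}

def detect_change_types(changed_files: list[str]) -> dict[str, bool]:
    """Mark each change type TRUE if any changed file matches its signals.

    Types are orthogonal: one file can set several flags at once.
    """
    flags = {name: False for name in CHANGE_TYPE_SIGNALS}
    for path in changed_files:
        filename = path.rsplit("/", 1)[-1]
        for name, patterns in CHANGE_TYPE_SIGNALS.items():
            if any(fnmatch(filename, p) or fnmatch(path, p) for p in patterns):
                flags[name] = True
    return flags
```

Feed it the output of `git diff --name-only`; every TRUE flag then maps to one risk task in Step 2.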
Record in report:
## Change Type Analysis
DepUpgrade: [YES/NO] | Migration: [YES/NO] | BusEvent: [YES/NO]
ApiContract: [YES/NO] | SecurityChange: [YES/NO] | ConfigChange: [YES/NO] | InfraChange: [YES/NO]
Step 2: Create change-type risk tasks (ALWAYS before any review work)
MANDATORY: Call task tracking for each TRUE signal. Do NOT create tasks for FALSE signals. The concerns listed are starting points; apply domain knowledge beyond them.
| Condition | task tracking subject | Key concerns to investigate (starting points โ expand with domain knowledge) |
|---|---|---|
| DepUpgrade TRUE | [Review-DepUpgrade] Dependency upgrade: semver, breaking changes, security advisories | Major/minor/patch? Read the upstream CHANGELOG for breaking API changes. Grep deprecated API usage. Check transitive dependency changes. Known security advisories for the new version? Peer dependency compatibility? Tests still passing? |
| Migration TRUE | [Review-Migration] DB migration: rollback path, volume impact, zero-downtime | Rollback/Down script exists? Table size estimate: large tables need lock analysis. NOT NULL column without default on a non-empty table? Indexes created with a no-lock option? Deployment ordering (before/after service deploy)? Backfill idempotent if run twice? |
| BusEvent TRUE | [Review-BusEvent] Cross-service event/message: consumer, idempotency, retry, poison pill | Consumer exists for the new event? Retry strategy: prerequisite data not synced → wait-retry vs silent skip? Handler safe to run twice (idempotency)? Malformed message handling / dead-letter configured? Ordering assumptions vs broker guarantees? |
| ApiContract TRUE | [Review-ApiContract] API contract change: backward compat, client alignment, auth | Additive or breaking? Breaking → versioning or coordinated deploy required. All callers (UI, other services, tests) still compatible? New endpoint protected appropriately? No required response fields added without client update? |
| SecurityChange TRUE | [Review-SecurityChange] Security/permission change: all paths covered, no privilege escalation | All code paths reaching the gate covered? Negative test verifying unauthorized access is DENIED? Privilege escalation possible? BOTH enforcement AND display control updated? Permission definition in a single authoritative place (no duplicated strings risking drift)? |
| ConfigChange TRUE | [Review-ConfigChange] Config/env change: all environments, no secrets committed | New config key present in ALL environment configs? Hardcoded default masking missing production config? Any secret value in the diff? → CRITICAL if yes. Documented in the setup guide? App fails fast if config is missing? |
| InfraChange TRUE | [Review-InfraChange] Infrastructure change: env parity, no dev values in prod, reproducible build | Change affects all environments consistently? Hardcoded dev values (localhost, debug flags, dev credentials)? Pinned image/dependency versions? Local dev impact documented? CI/CD secret/permission requirements documented? |
Step 3: Work through change-type tasks before dimensional review
For each created change-type task:
- Mark the task `in_progress`
- Record `file:line` for PASS, or describe the finding for FAIL/WARN
- Write a `## {Task Subject} Findings` section in the report
- Mark the task `completed`

IMPORTANT: Complete ALL change-type tasks FIRST, then proceed to Phase 0.7. If no change-type signals are detected, log "No high-risk change types detected" and proceed.
Phase 0.7: Change Surface Detection + Dynamic Review Tasks (MANDATORY)
Purpose: Let AI categorize the changes by nature and create review tasks accordingly. Do NOT assume fixed categories โ derive them from what the project's actual changed files are. Think, don't classify into a preset grid. The AI owns this step entirely.
Step 1: Derive categories from the diff
git diff --name-only HEAD # unstaged
git diff --cached --name-only # staged
# For branch or commit-range review, use the user-provided diff source:
git diff --name-only <base>...<head>
For each changed file, infer its category by examining:
Do NOT map to fixed buckets. Derive categories that fit THIS project's actual structure and vocabulary.
Common category types to consider as starting points (not exhaustive; derive what fits):
Record in report:
## Change Surface
{Category name} ({category type}): {N} files
{Category name} ({category type}): {M} files
...
Step 2: For each category, enumerate concerns and create a task
This is where you THINK, not fill in blanks. Apply `SYNC:category-review-thinking` for each category.
For EACH identified category:
- Create a task `[Review-{Category}] {brief concern summary}` listing the derived concerns

ALWAYS create:
- `[Review-General]`: universal quality (correctness, YAGNI/KISS/DRY, doc staleness, test coverage). Runs across ALL changed files regardless of other categories.
Sub-Agent Type Selection:
| Category Nature | subagent_type |
|---|---|
| Code logic (any stack) | code-reviewer |
| Security, auth, permissions | security-auditor |
| Performance, query efficiency, latency | performance-optimizer |
| Documentation, plans, specs, ADRs | general-purpose |
| Infrastructure, CI/CD, config | general-purpose |
| Mixed or default | code-reviewer |
Step 3: Work through tasks in order
For each created task:
- Mark `in_progress` before starting
- Apply SYNC:category-review-thinking; trust your domain knowledge beyond the examples there
- Write a `## {Task Subject} Findings` section
- Mark `completed` before starting the next task

NEVER mark a dimension task completed by scanning. Work through each relevant file explicitly. For large categories (10+ files): escalate to a parallel sub-agent using the Systematic Review Protocol.
Phase 0.5: Plan Compliance Check (CONDITIONAL โ only when active plan exists)
Check ## Plan Context in injected context:
- If a plan exists, read {plan-path} (the `$plan-hard` plan .md) → get the phase list and scope

Phase 1: Get Changes and Create Report File
- Run git status for current changes, or git diff --name-only <base>...<head> for branch comparisons
- Run git diff or git diff <base>...<head> to see the actual changes
- Create the report file plans/reports/code-review-{date}-{slug}.md

Phase 2: File-by-File Review (Build Report Incrementally)
For EACH changed file, read and immediately update report with:
Phase 3: Second-Round Review (Conditional Protocol โ branch on Phase 0.7 surface)
Protocol:
SYNC:double-round-trip-review + SYNC:fresh-context-review + SYNC:review-protocol-injection (all inlined above). INVARIANT: Phase 3 fires a fresh sub-agent ONLY after a fix cycle. If Phase 2 finds zero issues, the review ENDS; no Phase 3 needed. If Phase 2 finds issues, fix them, then a Phase 3 fresh sub-agent re-review is mandatory.
Check categories from Phase 0.7 โ if multiple distinct domains changed (e.g., server-side + client-side), run Synthesis Mode. Otherwise run Holistic Mode.
[SYNTHESIS MODE – when multiple distinct domains changed]
Spawn a Synthesis Agent as Round 2. Purpose: catch cross-boundary issues individual dimensional tasks cannot see.
When constructing Agent call prompt:
Copy Agent call shape from SYNC:review-protocol-injection template verbatim, agent_type: "code-reviewer"
Embed all 10 universal SYNC blocks verbatim
Set Task as:
Synthesis review โ cross-boundary concerns ONLY across the changed domains in this diff.
You have these dimensional findings as context: {summary from each dimensional task}.
Re-read ALL changed files from scratch via your own tool calls.
Focus ONLY on cross-boundary concerns; do NOT re-review each domain's internals:
1. Contract Alignment: Do callers match what callees expose? (routes, parameters, field names, types)
2. Data Consistency: Are field names/types consistent across layer boundaries?
3. Security Boundary: Is auth enforced on BOTH sides (enforcement AND display control)?
4. Cross-Layer Naming: Same concept named differently across layers?
5. Missing Wiring: New producer with no consumer? New consumer with no producer? New feature with no doc?
6. Documentation: Docs reflect changes in BOTH domains together?
Set Target Files as "use the selected diff source from Phase 1"
Set report path as plans/reports/synthesis-review-{date}.md
After sub-agent returns:
- Append `## Synthesis Round Findings` to the main report; DO NOT filter or override

[HOLISTIC MODE – when a single domain changed]
No cross-boundary synthesis needed. Spawn standard holistic Round 2.
When constructing Agent call prompt:
- Copy the SYNC:review-protocol-injection template verbatim
- Choose subagent_type based on the domain's dominant concern (see Sub-Agent Type Selection)
- Set Task as: "Review the selected diff holistically. Focus on the big picture: overall technical approach coherence, architecture layers, logic placement (lowest layer), DRY violations, YAGNI/KISS, function complexity. Domain: {category from Phase 0.7}; apply domain knowledge for this category accordingly."
- Set Target Files as "use the selected diff source from Phase 1"
- Set the report path as plans/reports/code-review-changes-round{N}-{date}.md

After the sub-agent returns:
- Append `## Round {N} Findings (Fresh Sub-Agent)` to the main report; DO NOT filter or override

The following checks are handled by the sub-agent but can be verified in Phase 4:
Clean Code & Over-engineering Checks:
Documentation Staleness Check (REQUIRED):
For each changed file, identify related documentation:
Correctness & Bug Detection: Apply SYNC:bug-detection (null safety, boundaries, error handling, resource cleanup, concurrency).
Test Spec Verification: Apply SYNC:test-spec-verification (locate specs, verify coverage, flag gaps).
Integration Test Sync: Apply SYNC:integration-test-sync-check (surface missing tests via a direct user question).
Translation Sync: Apply SYNC:translation-sync-check (for multilingual UI text changes, require translation updates or explicit user risk acceptance).
Phase 4: Generate Final Review Result
Update report with final sections:
If Documentation Staleness Check in Phase 4 identified stale docs:
- Call the `$docs-update` skill to update the impacted documentation
- If `$docs-update` produces changes, include them in the review summary

Before approving, verify artifacts are easy to read, maintain, and understand:
- Project conventions followed (docs/project-config.json or equivalent)
- Names reveal intent (employeeRecords, not data)
- Boolean names read as predicates (isActive, hasPermission, canEdit)
- Any deviation from an existing pattern is justified (file:line showing the existing pattern)

Provide feedback in this format:
Summary: Brief overall assessment
Critical Issues: (Must fix before commit)
High Priority: (Should fix)
Suggestions: (Nice to have)
Documentation Staleness: (Docs that may need updating)
- Write "No doc updates needed" if no changed file maps to a doc

Positive Notes:
Suggested Commit Message:
type(scope): description
- Detail 1
- Detail 2
NON-NEGOTIABLE: When the changeset is large (10+ files), MUST use this systematic protocol instead of reviewing files one-by-one sequentially.
Principle: Review carefully and systematically: break into groups, fire multiple specialized agents to review in parallel. Ensure no flaws, no bugs, no stale info, and best practices in every aspect.
In Phase 0, after running git status, count changed files. If 10 or more files changed:
- Announce: "Detected {N} changed files. Switching to systematic parallel review protocol."

Group all changed files into logical categories derived from the project's actual structure (see Phase 0.7). Example groupings to orient thinking (derive what fits the project):
| Category Type | Example Groupings |
|---|---|
| Agent/Tooling | AI scripts, hooks, skill definitions, workflow configs, linting rules |
| Root config/docs | Root README, project config, CI/CD pipeline configs |
| Reference docs | Architecture docs, patterns references, setup guides |
| Feature/domain docs | Business feature documentation, spec files, ADRs |
| Backend logic | Service/handler/controller source (infer from project structure) |
| Frontend logic | UI component/state/API source (infer from project structure) |
| Data/Schema | Migrations, schema files, seed data |
| Tests | Unit, integration, E2E test files |
| Infrastructure | Docker, k8s, CI/CD, cloud manifests |
Derive the actual groupings from what THIS project contains โ do not force files into categories that don't fit.
Launch one sub-agent per category via spawn_agent tool with run_in_background: true.
Sub-agent type selection per category:
Use the Sub-Agent Type Selection table above: code-reviewer, security-auditor, performance-optimizer, or general-purpose.

Each sub-agent receives:
- The SYNC:category-review-thinking framework as its primary thinking model
- Relevant project standards docs (*patterns*, *conventions*, *style-guide*)

All sub-agents run in parallel to maximize speed and coverage.
After all sub-agents complete:
With all category findings combined, assess:
MANDATORY – NO EXCEPTIONS: If NOT already in a workflow, MUST ask the user via a direct user question. Do NOT judge task complexity or decide it is "simple enough to skip"; the user decides, not you:
- Activate the `review-changes` workflow (Recommended) → review-changes → review-architecture → code-simplifier → code-review → performance → plan → plan-validate → cook → watzup
- Execute `$review-changes` directly → run this skill standalone
For each changed file, verify no import from forbidden layer:
- Read docs/project-config.json → architectureRules.layerBoundaries
- Match each file against each layer's `paths` glob patterns
- If the file imports from a layer listed in its layer's `cannotImportFrom`, it is a violation
- Skip files matching `architectureRules.excludePatterns`
- Report each violation as: "BLOCKED: {layer} layer file {filePath} imports from {forbiddenLayer} layer ({importStatement})"
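The boundary check can be sketched as follows. This assumes a config shape implied above but not fully specified: `layerBoundaries` maps each layer name to `paths` globs and a `cannotImportFrom` list; treat the exact schema as an assumption to verify against the project's project-config.json:

```python
from fnmatch import fnmatch

def layer_of(path, layer_boundaries):
    """Return the first layer whose `paths` globs match the file, else None."""
    for layer, spec in layer_boundaries.items():
        if any(fnmatch(path, glob) for glob in spec.get("paths", [])):
            return layer
    return None

def boundary_violations(path, imported_paths, rules):
    """Return BLOCKED messages for imports crossing a forbidden layer boundary."""
    boundaries = rules.get("layerBoundaries", {})
    # Files matching excludePatterns are skipped silently.
    if any(fnmatch(path, g) for g in rules.get("excludePatterns", [])):
        return []
    layer = layer_of(path, boundaries)
    if layer is None:
        return []
    forbidden = set(boundaries[layer].get("cannotImportFrom", []))
    messages = []
    for imp in imported_paths:
        target = layer_of(imp, boundaries)
        if target in forbidden:
            messages.append(
                f"BLOCKED: {layer} layer file {path} imports from "
                f"{target} layer ({imp})"
            )
    return messages
```

Resolving import statements to file paths is project-specific and left to the caller.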
MANDATORY – NO EXCEPTIONS: after completing this skill, MUST use a direct user question to present options. Do NOT skip because the task seems "simple" or "obvious"; the user decides:
Completion ≠ Correctness. Before reporting ANY work done, prove it:
- Grep every removed name. Extraction/rename/delete touched N files? Grep confirms 0 dangling refs across ALL file types.
- Ask WHY before changing. Existing values are intentional until proven otherwise. No "fix" without traced rationale.
- Verify ALL outputs. One build passing ≠ all builds passing. Check every affected stack.
- Evaluate pattern fit. Copying nearby code? Verify the preconditions match: same scope, lifetime, base class, constraints.
- New artifact = wired artifact. Created something? Prove it's registered, imported, reachable by all consumers.
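The dangling-reference gate above can be sketched as a scan over file contents. In practice you would run grep/rg across ALL file types in the repo; this in-memory version just shows the required zero-hits proof:

```python
def dangling_references(removed_names, files):
    """Return {name: [paths]} for removed identifiers still referenced somewhere.

    `files` maps path -> file text. An empty result is the proof that an
    extraction/rename/delete left zero dangling references; any non-empty
    entry is a file that still mentions a name that no longer exists.
    """
    hits = {}
    for name in removed_names:
        paths = sorted(p for p, text in files.items() if name in text)
        if paths:
            hits[name] = paths
    return hits
```

A plain substring match is deliberately conservative: it over-reports (comments, similar names) rather than missing a real reference.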
| Skill | Relationship | When to Call |
|---|---|---|
| $docs-update | Primary downstream: called when staleness detected | Triggered by Documentation Staleness findings |
| $spec-discovery [update] | Spec updater: called when artifact behavior differs from the spec bundle | Call BEFORE docs-update if a spec-was-wrong scenario is detected |
| $feature-docs [update] | Feature doc updater: called for feature doc section changes | Called internally by docs-update; call directly for a targeted update |
| $tdd-spec [update] | Test spec updater: called when test cases may be stale | Called internally by docs-update; call directly for a targeted test case update |
| $integration-test-review | Test quality gate: detects test/spec mismatches | Call when changes touch areas covered by integration tests |
| $code-review | Code quality: deeper review of changed code | Always follows the review-changes quality pass |
When called outside a workflow (i.e., user ran $review-changes directly):
review-changes (you are here)
│
├─ Code quality checks (code-simplifier → review-architecture → code-review → performance)
│
├─ Phase 5: Documentation Staleness Triage
│    └─ If stale docs detected: [REQUIRED] → $docs-update
│
├─ Integration test check (SYNC:integration-test-sync-check):
│    └─ If logic changes touch tested areas: [REQUIRED] → $integration-test [from-changes]
│       Then: $integration-test-review → $integration-test-verify
│
├─ Translation sync check (SYNC:translation-sync-check):
│    └─ If multilingual UI text changes lack locale updates: [REQUIRED] ask the user directly + explicit decision
│
├─ Bugfix-specific: "Was spec wrong?" check:
│    If this review is post-bugfix AND the spec describes the bug as expected behavior:
│    → [REQUIRED] Flag to user: "The spec may document the bug as correct behavior."
│    → If the spec bug is confirmed → [REQUIRED]: $spec-discovery [update] FIRST → $feature-docs [update relevant sections]
│    → Do NOT let $docs-update update test cases to document broken behavior.
│
└─ [RECOMMENDED] → $watzup
Summary of all review findings, doc changes, and test coverage status.
[CRITICAL – TOP 3 RULES]
- Phase 0 graph blast-radius runs FIRST: NEVER skip; it informs the entire review order
- Clean Round 1 ENDS the review. When issues found, fresh sub-agent re-review mandatory after fixing.
- Task-track ALL phases before starting; missing tests MUST surface via a direct user question, NOT be silently logged
[IMPORTANT] Use task tracking to break ALL work into small tasks BEFORE starting, including tasks for each file read. This prevents context loss from long files. For simple tasks, the AI MUST ask the user whether to skip.
Critical Thinking Mindset: Apply critical thinking and sequential thinking. Every claim needs traced proof; confidence must exceed 80% to act. Anti-hallucination: never present a guess as fact. Cite sources for every claim, admit uncertainty freely, self-check output for errors, cross-reference independently, and stay skeptical of your own confidence; certainty without evidence is the root of all hallucination.
Sequential Thinking Protocol: structured multi-step reasoning for complex/ambiguous work. Use when planning, reviewing, debugging, or refining ideas where one-shot reasoning is unsafe.
Trigger when: complex problem decomposition · adaptive plans needing revision · analysis with course correction · unclear/emerging scope · multi-step solutions · hypothesis-driven debugging · cross-cutting trade-off evaluation.
Format (explicit mode, visible thought trail):
- `Thought N/M: [aspect]` → one aspect per thought; state assumptions/uncertainty
- `Thought N/M [REVISION of Thought K]: ...` → when prior reasoning is invalidated; state Original / Why revised / Impact
- `Thought N/M [BRANCH A from Thought K]: ...` → explore an alternative; converge with decision rationale
- `Thought N/M [HYPOTHESIS]: ...` then `[VERIFICATION]: ...` → test before acting
- `Thought N/N [FINAL]` → only when verified, all critical aspects addressed, confidence >80%

Mandatory closers: Confidence % stated · Assumptions listed · Open questions surfaced · Next action concrete.
Stop conditions: confidence <80% on any critical decision → escalate by asking the user directly · ≥3 revisions on the same thought → re-frame the problem · branch count >3 → split into a sub-task.
Implicit mode: apply methodology internally without visible markers when adding markers would clutter the response (routine work where reasoning aids accuracy).
Deep-dive: see the `$sequential-thinking` skill (.claude/skills/sequential-thinking/SKILL.md) for worked examples (api-design, debug, architecture), advanced techniques (spiral refinement, hypothesis testing, convergence), and meta-strategies (uncertainty handling, revision cascades).
Understand Code First – HARD-GATE: Do NOT write, plan, or fix until you READ the existing code.
- Search 3+ similar patterns (grep/glob) → cite file:line evidence
- Read existing files in the target area → understand structure, base classes, conventions
- Run `python .claude/scripts/code_graph trace <file> --direction both --json` when .code-graph/graph.db exists
- Map dependencies via `connections` or `callers_of` → know what depends on your target
- Write the investigation to .ai/workspace/analysis/ for non-trivial tasks (3+ files)
- Re-read the analysis file before implementing; never work from memory alone
- NEVER invent new patterns when existing ones work; match exactly or document the deviation
BLOCKED until:
- [ ] Read target files
- [ ] Grep 3+ patterns
- [ ] Graph trace (if graph.db exists)
- [ ] Assumptions verified with evidence
Design Patterns Quality: priority checks for every code change:
- DRY via OOP: Identify classes/modules with the same purpose, naming pattern, or lifecycle. Apply your knowledge of the project's language/framework to determine the idiomatic abstraction (base class, mixin, trait, protocol, decorator). 3+ similar patterns → extract to a shared abstraction.
- Right Responsibility: Logic in LOWEST layer (Entity > Domain Service > Application Service > Controller). Never business logic in controllers.
- SOLID: Single responsibility (one reason to change). Open-closed (extend, don't modify). Liskov (subtypes substitutable). Interface segregation (small interfaces). Dependency inversion (depend on abstractions).
- After extraction/move/rename: Grep ENTIRE scope for dangling references. Zero tolerance.
- YAGNI gate: NEVER recommend patterns unless 3+ occurrences exist. Don't extract for hypothetical future use.
Anti-patterns to flag: God Object, Copy-Paste inheritance, Circular Dependency, Leaky Abstraction.
Serial Attention for Design Quality: DO NOT scan all quality concerns simultaneously. Split attention misses violations that focused passes catch.
- Identify applicable dimensions: based on the code's language, domain, and patterns, determine which quality dimensions apply: DRY, SOLID principles (SRP/OCP/LSP/ISP/DIP), OOP idioms, cohesion/coupling, GRASP, Law of Demeter, CQRS invariants, etc. Your list is NOT fixed; derive it from what the code actually does.
- One focused pass per dimension: dedicate single-focus attention to EACH dimension in sequence. Do NOT mix concerns across passes.
- Threshold: 3+ similar patterns = MANDATORY extraction. Not an optional suggestion; flag it as a mandatory structural fix requiring action.
- 2+ violations of the same kind = structural finding. Report it as a "pattern problem" needing architectural resolution, not a list of individual instances.
Complexity Prevention (Ousterhout): MANDATORY. Measure code by cost of change: one business change should map to one code change. Flag ALL of the following in review:
- Change amplification: a small business change forces edits in >3 places → structural flaw. Count edit sites for a plausible future change (add variant, add field, add authorization). >3 = reject.
- Cognitive load: the reader must hold too much context to safely modify. Flag deep inheritance, long parameter lists, boolean traps, implicit ordering dependencies.
- Cross-cutting duplication at entry points: logging, error handling, validation, auth, transactions reimplemented per controller/handler/route. Lift to middleware / interceptor / filter / decorator / aspect.
- Leaked implementation technology โ repos returning
IQueryable/QuerySet/Criteria/raw cursors/ORM entities to callers. Return finished results + intent-revealing methods (GetActiveVipUsers()notQuery()).- Type-switch scattering โ
switch/if-chains on enum/discriminator in >1 place. New variant = new file, not N edits. One factory/registry switch at the boundary OK; scattered switches = reject.- Anemic models โ domain objects with only getters/setters, logic floats in services. Move invariants/behavior onto the object (
order.Checkout(), notorder.Status = ...).- Primitive obsession โ raw
string/int/decimalfor account numbers, emails, money, percentages, date ranges, with re-validation at every entry. Wrap in value objects / records / structs that validate once at construction.- Inline cross-cutting concerns โ authorization/tenant isolation/audit/sanitization hand-written at top of every handler. Flag intent with declarative markers (
@RequirePermission("Order.Delete")), enforce once centrally.- Shallow modules โ tiny class, big interface (many public methods, many flags, many ctor params) wrapping little logic. A module is deep when a small interface hides a lot of implementation. If interface โ implementation cost to learn โ inline.
- Missing base class for repeated component/handler lifecycle โ 3+ forms/CRUD handlers/list views reimplementing loading/dirty/submit/pagination โ extract to base class / hook / composable / mixin / trait.
- Premature vs delayed abstraction โ rule-of-three. First occurrence: write it. Second: notice duplication. Third: extract. Don't build generic frameworks before real variation; don't copy-paste for the 4th time.
- Embedded utility logic not extracted to helpers โ inline paging loops (
while (hasMore) { skip += take; ... }), ad-hoc datetime math, string parsing/formatting, collection partitioning, retry/backoff loops, URL/query-string building. If the algorithm is non-trivial AND stack-generic (not business-specific), extract toutil/helper/extensionsand let consumers call one line. Inline duplicates โ duplicated bug surface.- Logic in wrong (higher) layer โ downshift to callee โ business/derivation logic written in the caller when the callee owns the data. Defaults: Controller code that should be App Service. App Service code that should be Domain Service or Entity. Component code that should be ViewModel/Store/Service. Caller reaching into callee's data shape to compute something โ move the computation behind an intent-revealing method on the callee. Lowest responsible layer wins (Entity > Domain Service > App Service > Controller ยท Model/VM > Store > Component). Higher-layer placement = duplicated logic when a sibling caller needs the same thing.
- Owner owns the rule โ extract on first write โ if a caller inlines logic that derives, normalizes, validates, or computes from another type's data, MOVE it to the owning type. Single use is sufficient โ the trigger is wrong responsibility, not duplication. Sibling callers always arrive; inline copies drift silently with no compile error and no name to grep. Common offenders: Backend โ inlined rules in application-layer handlers / commands / queries / services / controllers that belong on the domain entity / value object / domain service. Frontend โ inlined derivations / formatting / validation in components that belong on the model / store / view-model / API service. Fix: name the rule once as a method (static or instance) on the owning type; callers invoke by name. Future variant โ SECOND named method on the owner, never an inline near-duplicate. Right responsibility first; reuse is the consequence.
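A minimal TypeScript sketch of the owner-owns-the-rule item, with hypothetical names (`Order`, `discountedTotal`): the derivation moves from an inline caller expression onto the type that owns the data, so sibling callers invoke the rule by name instead of re-deriving it.

```typescript
// Hypothetical Order type; the discount rule is named once on the owner.
class Order {
  constructor(
    public readonly subtotal: number,
    public readonly vip: boolean,
  ) {}

  // BEFORE, callers inlined: order.vip ? order.subtotal * 0.9 : order.subtotal
  // AFTER, the owning type names the rule; inline copies cannot drift.
  discountedTotal(): number {
    return this.vip ? this.subtotal * 0.9 : this.subtotal;
  }
}

console.log(new Order(100, true).discountedTotal());
console.log(new Order(100, false).discountedTotal());
```

A future variant (say, a seasonal discount) becomes a second named method on `Order`, never a near-duplicate expression at a call site.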
Extraction target: where the named rule lives:

| Shape of the rule | Goes to |
|---|---|
| Pure function over an entity's own data | static method on the entity |
| Behavior that mutates / guards entity state | instance method on the entity |
| Always-true invariant on a primitive value | value object constructor |
| Needs DI (repo / settings / clock) | helper class registered in DI |
| Domain-agnostic algorithm reused across types | util / extension method |
| Pure shape / projection conversion | DTO mapping |

Pre-commit edit-site test (reject if the answer is "many"):
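The "always-true invariant on a primitive value" row can be sketched as a TypeScript value object. `Email` and `parse` are illustrative names, not project APIs: the invariant is checked once at construction, so downstream code never re-validates the raw string.

```typescript
// Hypothetical Email value object: validation happens exactly once.
class Email {
  private constructor(public readonly value: string) {}

  static parse(raw: string): Email {
    // Deliberately simple format check for illustration only.
    if (!/^[^@\s]+@[^@\s]+$/.test(raw)) {
      throw new Error(`invalid email: ${raw}`);
    }
    return new Email(raw.toLowerCase());
  }
}

const ok = Email.parse("User@Example.com");
console.log(ok.value); // normalized at the single construction point
```

Because the constructor is private, every `Email` in the system is valid by construction; consumers accept `Email`, not `string`.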
| Change Scenario | Should touch |
|---|---|
| Add new variant (customer type, payment method) | 1 new file |
| Change HTTP error response format | 1 middleware/filter |
| Add timestamp field to every persisted entity | 1 base entity/interceptor |
| Add authorization to a new endpoint | 1 declarative marker |
| Swap database/ORM | Data layer only |
| Change business calculation rule | 1 method on owning entity |
| Add loading indicator pattern to forms | 1 base component/hook |
| Add validation rule to a domain primitive | 1 value-object ctor |
| Change paging/retry/datetime algorithm | 1 helper/util function |
| Change a derivation of entity data | 1 method on the entity |

Operating heuristics:
- Write the call site first.
- Count edit sites for plausible future change.
- Prefer removing code over adding it.
- Surface assumptions at boundaries, hide details inside.
- Pre-reuse scan: before writing a non-trivial block, grep for similar algorithms (`while.*skip`, `DateTime.*Add`, `split`/`join` chains, paging loops, retry loops). An existing helper matches → call it. None exists but the pattern is stack-generic → extract to util before the second caller appears.
- Layer placement test: ask "if a sibling caller needed this tomorrow, would they re-derive it?" If yes, the logic is in the wrong layer. Move it down.
- Open-case-for-future-reuse: if the reviewer spots a block that is likely to appear in another feature (domain-agnostic algorithm, shared lifecycle, recurring derivation), do NOT rationalize with pure YAGNI. Either extract now (if cheap) or create a tracked TODO with the exact extraction target so the second caller does not duplicate silently. Silent duplication is the default failure mode.
- When in doubt ask: "What would need to change if the requirement shifts?"
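The edit-site counting heuristic above can be made concrete with a TypeScript sketch of lifting a cross-cutting concern. The names (`withErrorShaping`, `getUser`) are hypothetical: error-response shaping moves out of every handler into one wrapper, so "change the error response format" touches exactly one place.

```typescript
// Hypothetical sketch: one wrapper owns the response shape for all handlers.
type Handler = (input: unknown) => unknown;

function withErrorShaping(handler: Handler): Handler {
  return (input) => {
    try {
      return { ok: true, data: handler(input) };
    } catch (e) {
      // The single edit site for the error response format.
      return { ok: false, error: (e as Error).message };
    }
  };
}

// Handlers contain only business logic; no per-handler try/catch shaping.
const getUser = withErrorShaping((id) => {
  if (typeof id !== "string") throw new Error("id must be a string");
  return { id };
});

console.log(getUser("u1")); // success shape
console.log(getUser(42));   // shaped error, produced by the wrapper
```

Counting edit sites for "change the error format" now yields 1 (the wrapper) instead of N (every handler), which is exactly what the pre-commit table demands.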
The measure of good code is the cost of change. Not shortest. Not cleverest. Not most abstracted. Cheapest to safely modify after reading a small local portion.
Fix-Triggered Re-Review Loop: re-review is triggered by a FIX CYCLE, not by a round number. Review purpose: `review → if issues → fix → re-review` until a round finds no issues. A clean review ENDS the loop; no further rounds required.
Round 1: Main-session review. Read target files, build understanding, note issues. Output findings + verdict (PASS / FAIL).
Decision after Round 1:
- No issues found (PASS, zero findings) → review ENDS. Do NOT spawn a fresh sub-agent for confirmation.
- Issues found (FAIL, or any non-zero findings) → fix the issues, then spawn a fresh sub-agent for Round 2 re-review.
Fresh sub-agent re-review (after every fix cycle): spawn a NEW `spawn_agent` tool call; never reuse a prior agent. The sub-agent re-reads ALL files from scratch with ZERO memory of prior rounds. See `SYNC:fresh-context-review` for the spawn mechanism and `SYNC:review-protocol-injection` for the canonical Agent prompt template. Each fresh round must catch:
- Cross-cutting concerns missed in the prior round
- Interaction bugs between changed files
- Convention drift (new code vs existing patterns)
- Missing pieces that should exist but don't
- Subtle edge cases the prior round rationalized away
- Regressions introduced by the fixes themselves
Loop termination: after each fresh round, repeat the same decision: clean → END; issues → fix → next fresh round. Continue until a round finds zero issues, or 3 fresh-subagent rounds max, then escalate to the user via a direct user question.
Rules:
- A clean Round 1 ENDS the review; no mandatory Round 2
- NEVER skip the fresh sub-agent re-review after a fix cycle (every fix invalidates the prior verdict)
- NEVER reuse a sub-agent across rounds; every iteration spawns a NEW Agent call
- Main agent READS sub-agent reports but MUST NOT filter, reinterpret, or override findings
- Max 3 fresh-subagent rounds per review; if still FAIL, escalate via a direct user question (do NOT silently loop)
- Track round count in conversation context (session-scoped)
- Final verdict must incorporate ALL rounds executed
Report must include `## Round N Findings (Fresh Sub-Agent)` for every round N≥2 that was executed.
Fresh Sub-Agent Review: eliminate orchestrator confirmation bias via isolated sub-agents.
Why: the main agent knows what it (or `$cook`) just fixed and rationalizes findings accordingly. A fresh sub-agent has ZERO memory, re-reads from scratch, and catches what the main agent dismissed. Sub-agent bias is mitigated by (1) fresh context, (2) verbatim protocol injection, (3) the main agent not filtering the report.
When: ONLY after a fix cycle. A review round that finds zero issues ENDS the loop; do NOT spawn a confirmation sub-agent. A review round that finds issues triggers: fix → fresh sub-agent re-review.
How:
- Spawn a NEW `spawn_agent` tool call; use the `code-reviewer` subagent_type for code reviews, `general-purpose` for plan/doc/artifact reviews
- Inject ALL required review protocols VERBATIM into the prompt; see `SYNC:review-protocol-injection` for the full list and template. Never reference protocols by file path; AI compliance drops behind file-read indirection (see `SYNC:shared-protocol-duplication-policy`)
- Sub-agent re-reads ALL target files from scratch via its own tool calls; never pass file contents inline in the prompt
- Sub-agent writes a structured report to `plans/reports/{review-type}-round{N}-{date}.md`
- Main agent reads the report, integrates findings into its own report, DOES NOT override or filter
Rules:
- SKIP fresh sub-agent when the prior round found zero issues (no fixes = nothing new to verify)
- NEVER skip the fresh sub-agent after a fix cycle; every fix invalidates the prior verdict
- NEVER reuse a sub-agent across rounds; every fresh round spawns a NEW `spawn_agent` call
- Max 3 fresh-subagent rounds per review; escalate via a direct user question if still failing; do NOT silently loop or fall back to any prior protocol
- Track iteration count in conversation context (session-scoped, no persistent files)
Review Protocol Injection: every fresh sub-agent review prompt MUST embed 10 protocol blocks VERBATIM. The template below has ALL 10 bodies already expanded inline. Copy the template wholesale into the Agent call's `prompt` field at runtime, replacing only the `{placeholders}` in the Task / Round / Reference Docs / Target Files / Output sections with context-specific values. Do NOT touch the embedded protocol sections.
Why inline expansion: placeholder markers would force file-read indirection at runtime. AI compliance drops significantly behind indirection (see `SYNC:shared-protocol-duplication-policy`). Therefore the template carries all 10 protocol bodies pre-embedded.
Choose subagent_type based on the dominant concern of the review:
| Dominant Concern | subagent_type |
|---|---|
| Code logic, architecture, correctness | code-reviewer |
| Security, auth, permissions, vulnerabilities | security-auditor |
| Performance, latency, query efficiency, memory | performance-optimizer |
| Documentation, plans, specs, ADRs, configs | general-purpose |
| Infrastructure, CI/CD, build tooling | general-purpose |
| Mixed concerns (default fallback) | code-reviewer |
For large changesets with multiple distinct dominant concerns → spawn ONE sub-agent per concern type in parallel.
spawn_agent({
description: "Fresh Round {N} review",
agent_type: "{code-reviewer | security-auditor | performance-optimizer | general-purpose}",
prompt: `
## Task
{review-specific task โ e.g., "Review all uncommitted changes for code quality" | "Security review of auth changes" | "Review plan files under {plan-dir}" | "Performance review of data access layer changes"}
## Round
Round {N}. You have ZERO memory of prior rounds. Re-read all target files from scratch via your own tool calls. Do NOT trust anything from the main agent beyond this prompt.
## Protocols (follow VERBATIM; these are non-negotiable)
### Evidence-Based Reasoning
Speculation is FORBIDDEN. Every claim needs proof.
1. Cite file:line, grep results, or framework docs for EVERY claim
2. Declare confidence: >80% act freely, 60-80% verify first, <60% DO NOT recommend
3. Cross-boundary validation required for architectural changes
4. "I don't have enough evidence" is valid and expected output
BLOCKED until: Evidence file path (file:line) provided; Grep search performed; 3+ similar patterns found; Confidence level stated.
Forbidden without proof: "obviously", "I think", "should be", "probably", "this is because".
If incomplete → output: "Insufficient evidence. Verified: [...]. Not verified: [...]."
### Bug Detection
MUST check categories 1-4 for EVERY review. Never skip.
1. Null Safety: Can params/returns be null/undefined? Are they guarded? .find()/.get() returns checked before use?
2. Boundary Conditions: Off-by-one (< vs <=)? Empty collections handled? Zero/negative values? Max limits?
3. Error Handling: Try-catch scope correct? Silent swallowed exceptions? Error types specific? Cleanup in finally/defer?
4. Resource Management: Connections/streams closed? Long-lived resources released? Memory bounded?
5. Concurrency (if async): Missing await/promise handling? Race conditions on shared state? Retry storms?
6. Language/Stack-Specific: Apply known failure modes for the language/runtime in this project โ use your domain knowledge of the stack.
Classify: CRITICAL (crash/corrupt) → FAIL | HIGH (incorrect behavior) → FAIL | MEDIUM (edge case) → WARN | LOW (defensive) → INFO.
### Design Patterns Quality
Priority checks for every code change:
1. DRY via OOP: Same-suffix classes MUST share a base class. 3+ similar patterns → extract to shared abstraction.
2. Right Responsibility: Logic in LOWEST layer. Never business logic in top-layer orchestrators.
3. SOLID: Single responsibility (one reason to change). Open-closed (extend, don't modify). Liskov (subtypes substitutable). Interface segregation (small interfaces). Dependency inversion (depend on abstractions).
4. After extraction/move/rename: Grep ENTIRE scope for dangling references. Zero tolerance.
5. YAGNI gate: NEVER recommend patterns unless 3+ occurrences exist. Don't extract for hypothetical future use.
Anti-patterns to flag: God Object, Copy-Paste inheritance, Circular Dependency, Leaky Abstraction.
### Logic & Intention Review
Verify WHAT code does matches WHY it was changed.
1. Change Intention Check: Every changed file MUST serve the stated purpose. Flag unrelated changes as scope creep.
2. Happy Path Trace: Walk through one complete success scenario through changed code.
3. Error Path Trace: Walk through one failure/edge case scenario through changed code.
4. Acceptance Mapping: If plan context available, map every acceptance criterion to a code change.
NEVER mark review PASS without completing both traces (happy + error path).
### Test Spec Verification
Map changed code to test specifications.
1. Identify the project's test spec format: grep for test case files (e.g., docs/**/test-*, docs/specs/**, *.feature, *.spec.md, test-cases/).
2. For each changed code path, locate the corresponding test case, or flag it as "needs test case".
3. New functions/endpoints/handlers → flag for test spec creation.
4. If test spec evidence fields exist in the project, verify they point to actual code (file:line, not stale).
5. If no specs exist for a changed path → log the gap and recommend $tdd-spec.
NEVER skip test mapping. Untested code paths are the #1 source of production bugs.
### Fix-Layer Accountability
NEVER fix at the crash site. Trace the full flow, fix at the owning layer. The crash site is a SYMPTOM, not the cause.
MANDATORY before ANY fix:
1. Trace full data flow: map the complete path from data origin to crash site across ALL layers. Identify where bad state ENTERS, not where it CRASHES.
2. Identify the invariant owner: which layer's contract guarantees this value is valid? Fix at the LOWEST layer that owns the invariant, not the highest layer that consumes it.
3. One fix, maximum protection: if the fix requires touching 3+ files with defensive checks, you are at the wrong layer → go lower.
4. Verify no bypass paths: confirm all data flows through the fix point.
BLOCKED until: full data flow traced (origin → crash); invariant owner identified with file:line evidence; all access sites audited (grep count); fix layer justified (lowest layer that protects most consumers).
Anti-patterns (REJECT): "Fix it where it crashes" (crash site ≠ cause site; trace upstream); "Add defensive checks at every consumer" (scattered defense = wrong layer); "Both fix is safer" (pick ONE authoritative layer).
### Rationalization Prevention
AI skips steps via these evasions. Recognize and reject:
- "Too simple for a plan" → Simple + wrong assumptions = wasted time. Plan anyway.
- "I'll test after" → RED before GREEN. Write/verify the test first.
- "Already searched" → Show grep evidence with file:line. No proof = no search.
- "Just do it" → Still need task tracking. Skip depth, never skip tracking.
- "Just a small fix" → A small fix in the wrong location cascades. Verify file:line first.
- "Code is self-explanatory" → Future readers need an evidence trail. Document anyway.
- "Combine steps to save time" → Combined steps dilute focus. Each step has a distinct purpose.
### Graph-Assisted Investigation
MANDATORY when .code-graph/graph.db exists.
HARD-GATE: MUST run at least ONE graph command on key files before concluding any investigation.
Pattern: Grep finds files → trace --direction both reveals full system flow → Grep verifies details.
- Investigation/Scout: trace --direction both on 2-3 entry files
- Fix/Debug: callers_of on buggy function + tests_for
- Feature/Enhancement: connections on files to be modified
- Code Review: tests_for on changed functions
- Blast Radius: trace --direction downstream
CLI: python .claude/scripts/code_graph {command} --json. Use --node-mode file first (10-30x less noise), then --node-mode function for detail.
### Understand Code First
HARD-GATE: Do NOT write, plan, or fix until you READ existing code.
1. Search 3+ similar patterns (grep/glob); cite file:line evidence.
2. Read existing files in the target area; understand structure, base classes, conventions.
3. Run python .claude/scripts/code_graph trace <file> --direction both --json when .code-graph/graph.db exists.
4. Map dependencies via connections or callers_of; know what depends on your target.
5. Write the investigation to .ai/workspace/analysis/ for non-trivial tasks (3+ files).
6. Re-read the analysis file before implementing; never work from memory alone.
7. NEVER invent new patterns when existing ones work; match exactly or document the deviation.
BLOCKED until: Read target files; Grep 3+ patterns; Graph trace (if graph.db exists); Assumptions verified with evidence.
### Category Review Thinking
For EACH category of changed files: THINK, do not fill in a checklist. DO NOT limit yourself to the examples below.
Step 1: Understand the category's role. What is its purpose? What invariants govern it? Who consumes it and what do they expect?
Step 2: Read project conventions. Grep for reference docs, style guides, READMEs for this area. Examine 3+ existing similar files to surface established patterns.
Step 3: Derive concerns from first principles. Apply ALL that are relevant; expand based on domain knowledge:
- Correctness: logic matches intent? happy path AND error path traced?
- Contracts: interfaces/APIs/events/protocols honored? no implicit coupling introduced?
- Project conventions: follows patterns found in Step 2? evidence-confirmed, not assumed?
- Security: auth enforced? input validated at boundaries? no secrets in diff?
- Performance: unbounded operations? N+1? blocking in async context? unindexed queries?
- Maintainability: DRY? single responsibility? complexity reasonable? names reveal intent?
- Test coverage: changed paths covered? existing tests still valid after the change?
- Documentation: related docs/specs reflect the changes?
Step 4 โ For each concern identified: verify with file:line evidence or flag as finding.
Examples only; your knowledge exceeds this list:
- Logic files (any stack): handler/service structure, validation placement, side effect isolation, cross-boundary coupling, data access layer separation
- Data/Schema: rollback path, lock impact on table volume, backfill idempotency, index coverage for query patterns, deployment ordering
- Config files: all environments covered? no secrets committed? app fails fast if missing?
- Infrastructure: dev/prod parity? no hardcoded dev values? pinned versions? CI impact documented?
- Styles/Assets: naming conventions? design variables/tokens used (no magic values)? scope correct?
- Documentation: accurate? links valid? examples match current code/behavior?
- Tests: assertions verify specific outcomes (not just no-exception)? idempotent (repeatable N times)? edge cases covered?
- Security artifacts: all code paths reach the gate? negative tests exist? both enforcement AND display control updated?
- Build/Tooling: rule changes apply consistently? violations not silently swallowed? CI runtime impact?
## Reference Docs (READ before reviewing)
{Discover by searching *patterns*, *conventions*, *style-guide*, *architecture*, README at service/module roots; list what you find}
## Target Files
{explicit file list OR "run git diff to see uncommitted changes" OR "read all files under {plan-dir}"}
## Output
Write a structured report to plans/reports/{review-type}-round{N}-{date}.md with sections:
- Status: PASS | FAIL
- Issue Count: {number}
- Critical Issues (with file:line evidence)
- High Priority Issues (with file:line evidence)
- Medium / Low Issues
- Cross-cutting findings
Return the report path and status to the main agent.
Every finding MUST have file:line evidence. Speculation is forbidden.
`
})
When using the template, replace only the {placeholders} in the Task / Round / Reference Docs / Target Files / Output sections with context-specific content, and choose subagent_type based on the dominant concern (see Sub-Agent Type Selection above).
Logic & Intention Review: verify that WHAT the code does matches WHY it was changed.
- Change Intention Check: Every changed file MUST serve the stated purpose. Flag unrelated changes as scope creep.
- Happy Path Trace: Walk through one complete success scenario through changed code
- Error Path Trace: Walk through one failure/edge case scenario through changed code
- Acceptance Mapping: If plan context available, map every acceptance criterion to a code change
NEVER mark review PASS without completing both traces (happy + error path).
Bug Detection: MUST check categories 1-4 for EVERY review. Never skip.
- Null Safety: Can params/returns be null/undefined? Are they guarded? `.find()`/`.get()` returns checked before use?
- Boundary Conditions: Off-by-one (`<` vs `<=`)? Empty collections handled? Zero/negative values? Max limits?
- Error Handling: Try-catch scope correct? Silent swallowed exceptions? Error types specific? Cleanup in finally/defer?
- Resource Management: Connections/streams closed? Long-lived resources released? Memory bounded?
- Concurrency (if async): Missing await/promise handling? Race conditions on shared state? Retry storms?
- Language/Stack-Specific: Apply known failure modes for the language/runtime in this project โ use your domain knowledge of the stack.
Classify: CRITICAL (crash/corrupt) → FAIL | HIGH (incorrect behavior) → FAIL | MEDIUM (edge case) → WARN | LOW (defensive) → INFO
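The null-safety category above can be illustrated with a short TypeScript sketch (names like `nameOf` are hypothetical): `Array.prototype.find` returns `undefined` when nothing matches, so the result must be guarded before property access, which is exactly what this check looks for.

```typescript
// Hypothetical lookup: the .find() result is guarded before use.
const users = [{ id: "a", name: "Ada" }];

function nameOf(id: string): string {
  const u = users.find((x) => x.id === id); // may be undefined
  if (!u) return "(unknown user)";          // guard before property access
  return u.name;
}

console.log(nameOf("a"));
console.log(nameOf("b")); // falls through the guard, no crash
```

An unguarded `users.find(...).name` would be the MEDIUM-to-HIGH finding this category exists to catch.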
Test Spec Verification: map changed code to test specifications.
- Identify the project's test spec format: grep for test case files (e.g., `docs/**/test-*`, `docs/specs/**`, `*.feature`, `*.spec.md`, `test-cases/`)
- For each changed code path, locate the corresponding test case, or flag it as "needs test case"
- New functions/endpoints/handlers → flag for test spec creation
- If test spec evidence fields exist in the project, verify they point to actual code (`file:line`, not stale references)
- If no specs exist for a changed path → log the gap and recommend `$tdd-spec`
NEVER skip test mapping. Untested code paths are the #1 source of production bugs.
Integration Test Sync Check: verify changed business logic files have corresponding tests.
- From the changed files, identify business logic files: handlers, commands, queries, services, controllers, resolvers, event processors. Naming varies by stack; infer from project conventions (e.g., `*Service.*`, `*Handler.*`, `*Controller.*`, `*Command.*`, `*Query.*`, `*Resolver.*`).
- For each identified file, search for a corresponding test file. Infer test naming from existing tests in the project (e.g., `*.test.ts`, `*Tests.java`, `*_test.py`, `*.spec.js`, `*Tests.cs`). Check standard test directories (`tests/`, `spec/`, `__tests__/`, or adjacent test projects/packages).
- If the test EXISTS → check whether the test methods cover the changed behavior (new methods/parameters/logic paths)
- If the test is MISSING → MANDATORY: use a direct user question: "Business logic file `{file}` has no integration tests. Run `$integration-test` before proceeding, or confirm tests are already written?" Options: "Run `$integration-test` first" (Recommended) | "Tests already written/updated, proceed"
- Severity: HIGH; missing tests for changed business logic MUST be surfaced to the user; do NOT silently flag and continue
Do NOT silently skip. Business logic changes without test coverage require an explicit user decision via a direct user question.
Translation Sync Check: verify multilingual UI changes include translation updates.
- Determine multilingual mode from project config: `localization.enabled === true` and `supportedLocales.length > 1`
- Detect UI-facing file changes via extensions/path patterns (`.ts`, `.tsx`, `.html`, `.css`, `.scss` plus `localization.uiPathPatterns` when configured)
- For multilingual UI changes, verify translation resource diffs exist (`localization.translationFilePatterns` when configured)
- If translation updates are missing → MANDATORY: use a direct user question: "UI text changed in a multilingual project, but translation updates were not detected. Run translation sync now or proceed with explicit risk acceptance?" Options: "Run translation sync first" (Recommended) | "Proceed with explicit risk acceptance"
- Severity: HIGH; no silent pass for multilingual UI text changes without an explicit translation-sync decision
Do NOT silently skip. Multilingual UI text changes require explicit translation-sync confirmation.
Category Review Thinking: a thinking framework for reviewing any category of changed files. This is NOT a fixed checklist. Derive concerns from domain knowledge; the examples are starting points only. Your knowledge of the category exceeds any list here. Trust it.
Step 1: Understand the category's role
Step 2: Read project conventions for this category
Step 3: Derive concerns from first principles
Apply all that are relevant; expand beyond this list based on the actual category:
Step 4: Create sub-tasks and execute
For each identified concern: create a task tracking sub-task, work through it with file:line evidence, mark done.
Illustrative concern examples by category type (not exhaustive; trust your knowledge beyond this):
- Server-side logic: Handler/service structure conventions, validation layer placement, side effect isolation, cross-service boundary enforcement, data access layer separation, error propagation strategy
- Client-side logic: Component lifecycle management, resource cleanup (subscriptions, listeners, timers), state management patterns, API integration layer separation, reactive stream composition
- Data/Schema: Migration reversibility (rollback script), lock impact on table volume, backfill idempotency, index coverage for query patterns, deployment ordering
- Configuration: Present in ALL environments? No secrets in diff? App fails fast if config missing (not silently null)? Documented in setup guide?
- Infrastructure: Dev/prod parity? No hardcoded dev values (localhost, debug flags)? Pinned image/dependency versions? CI/CD secret requirements documented?
- Styles/Assets: Follows project naming conventions? Uses design variables/tokens (no hardcoded magic values)? Correct scope (no global side effects from component styles)?
- Documentation: Accurate? Links valid? Examples still match current code/behavior? Covers new scenarios?
- Tests: Assertions verify specific outcomes (not just "no exception")? Idempotent (repeatable N times)? Covers edge cases, not just happy path?
- Security artifacts: All code paths reach the gate? Negative tests exist (unauthorized denied)? Both enforcement AND display control updated?
- Build/Tooling: Rule changes apply consistently? No exceptions that silently swallow violations? Impact on CI runtime documented?
Graph-Assisted Investigation: MANDATORY when `.code-graph/graph.db` exists.
HARD-GATE: MUST run at least ONE graph command on key files before concluding any investigation.
Pattern: Grep finds files → `trace --direction both` reveals full system flow → Grep verifies details.

| Task | Minimum Graph Action |
|---|---|
| Investigation/Scout | `trace --direction both` on 2-3 entry files |
| Fix/Debug | `callers_of` on buggy function + `tests_for` |
| Feature/Enhancement | `connections` on files to be modified |
| Code Review | `tests_for` on changed functions |
| Blast Radius | `trace --direction downstream` |

CLI: `python .claude/scripts/code_graph {command} --json`. Use `--node-mode file` first (10-30x less noise), then `--node-mode function` for detail.
Nested Task Expansion Contract: for workflow-step invocation, the `[Workflow] ...` row is only a parent container; the child skill still creates visible phase tasks.
- Call the current task list first. If a matching active parent workflow row exists, set `nested=true` and record `parentTaskId`; otherwise run standalone.
- Create one task per declared phase before phase work. When nested, prefix subjects `[N.M] $skill-name → phase`.
- When nested, link the parent with `TaskUpdate(parentTaskId, addBlockedBy: [childIds])`.
- Orchestrators must pre-expand a child skill's phase list and link the workflow row before invoking that child skill or sub-agent.
- Mark exactly one child `in_progress` before work and `completed` immediately after evidence is written.
- Complete the parent only after all child tasks are completed or explicitly cancelled with a reason.
Blocked until: the current task list is done, child phases created, parent linked when nested, first child marked `in_progress`.
Project Reference Docs Gate: run after task-tracking bootstrap and before target/source file reads, grep, edits, or analysis. Project docs override generic framework assumptions.
- Identify scope: file types, domain area, and operation.
- Required docs by trigger: always `docs/project-reference/lessons.md`; doc lookup `docs-index-reference.md`; review `code-review-rules.md`; backend/CQRS/API `backend-patterns-reference.md`; domain/entity `domain-entities-reference.md`; frontend/UI `frontend-patterns-reference.md`; styles/design `scss-styling-guide.md` + `design-system/README.md`; integration tests `integration-test-reference.md`; E2E `e2e-test-reference.md`; feature docs/specs `feature-docs-reference.md`; architecture/new area `project-structure-reference.md`.
- Read every required doc that exists; skip absent docs as not applicable. Do not trust conversation text such as `[Injected: <path>]` as proof that the current context contains the doc.
- Before target work, state: `Reference docs read: ... | Missing/not applicable: ...`.
Blocked until: scope evaluated, required docs checked/read, `lessons.md` confirmed, citation emitted.
Task Tracking & External Report Persistence: bootstrap this before execution; then run project-reference doc prefetch before target/source work.
- Create a small task breakdown before target file reads, grep, edits, or analysis. On context loss, inspect the current task list first.
- Mark one task `in_progress` before work and `completed` immediately after evidence; never batch transitions.
- For plan/review work, create `plans/reports/{skill}-{YYMMDD}-{HHmm}-{slug}.md` before the first finding.
- Append findings after each file/section/decision and synthesize from the report file at the end.
- Final output cites `Full report: plans/reports/(unknown)`.

Blocked until: task breakdown exists, report path declared for plan/review work, first finding persisted before the next finding.
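The report-path convention `plans/reports/{skill}-{YYMMDD}-{HHmm}-{slug}.md` can be sketched as a small helper. The slugging rules (lowercase, hyphen-separated) are an assumption; only the path template comes from this section.

```python
# Minimal sketch of the report-path convention above; slug rules are assumed.
import re
from datetime import datetime

def report_path(skill, slug_text, now=None):
    """Build plans/reports/{skill}-{YYMMDD}-{HHmm}-{slug}.md."""
    now = now or datetime.now()
    slug = re.sub(r"[^a-z0-9]+", "-", slug_text.lower()).strip("-")
    return f"plans/reports/{skill}-{now:%y%m%d}-{now:%H%M}-{slug}.md"
```

Creating this file before the first finding, then appending to it as findings arrive, is what makes the final synthesis survive context loss.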
AI Mistake Prevention: failure modes to avoid on every task.
- Check downstream references before deleting. Deleting components causes documentation and code staleness cascades; map all referencing files before removal.
- Verify AI-generated content against actual code. AI hallucinates APIs, class names, and method signatures; always grep to confirm existence before documenting or referencing.
- Trace the full dependency chain after edits. Changing a definition misses downstream variables and consumers derived from it.
- Trace ALL code paths when verifying correctness. Confirming code exists is not confirming it executes; always trace early exits, error branches, and conditional skips, not just the happy path.
- When debugging, ask "whose responsibility?" before fixing. Trace whether the bug is in the caller (wrong data) or the callee (wrong handling), and fix at the responsible layer; never patch the symptom site.
- Assume existing values are intentional; ask WHY before changing. Before changing any constant, limit, flag, or pattern: read comments, check git blame, examine surrounding code.
- Verify ALL affected outputs, not just the first. Changes touching multiple stacks require verifying EVERY output; one green check is not all green checks.
- Holistic-first debugging: resist the nearest-attention trap. When investigating any failure, list EVERY precondition first (config, env vars, DB names, endpoints, DI registrations, data preconditions), then verify each against evidence before forming any code-layer hypothesis.
- Surgical changes: apply the diff test. For a bug fix, every changed line must trace directly to the bug; don't restyle or improve adjacent code. For an enhancement task, implement improvements AND announce them explicitly.
- Surface ambiguity before coding; don't pick silently. If a request has multiple interpretations, present each with an effort estimate and ask. Never assume the all-records, file-based, or more complex path.
- Business terminology in Application/Domain layers: comments and naming must stay business-oriented and technology-agnostic; avoid implementation terms (say `background job`, not `Hangfire background job`).
IMPORTANT MUST ATTENTION search 3+ existing patterns and read code BEFORE any modification. Run graph trace when graph.db exists.
IMPORTANT MUST ATTENTION check DRY via OOP, right responsibility layer, SOLID. Grep for dangling refs after moves.
IMPORTANT MUST ATTENTION apply complexity prevention: one business change = one code change. Flag change amplification (>3 edit sites for a future change), scattered type-switches, anemic models, primitive obsession, leaked technology through abstractions, shallow modules, un-extracted utility logic (paging/datetime/string/retry → helpers), and logic in the wrong higher layer (downshift to callee/entity/VM). Don't rationalize silent duplication with pure YAGNI.
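One of the smells flagged above, un-extracted utility logic, is easiest to see with a concrete case. This is a hypothetical before/after for paging arithmetic: the helper name and signature are illustrative, not from the project.

```python
# Hypothetical example of extracting repeated paging arithmetic into one helper.
# Before: every call site computes `items[(page - 1) * size : (page - 1) * size + size]`
# inline, so a rule change (e.g. clamping page_size) means >3 edit sites.

def paginate(items, page, page_size):
    """Extracted paging helper: one business change = one code change."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    start = (page - 1) * page_size
    return items[start:start + page_size]
```

After extraction, a future rule change touches exactly one site, which is the "change amplification" test the rule above describes.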
IMPORTANT MUST ATTENTION run at least ONE graph command on key files when graph.db exists. Pattern: grep → trace → verify.
IMPORTANT MUST ATTENTION verify WHAT code does matches WHY it changed. Trace happy + error paths.
IMPORTANT MUST ATTENTION check null safety, boundaries, error handling, resource management for every review.
IMPORTANT MUST ATTENTION map changed code paths to test cases. Flag untested paths.
IMPORTANT MUST ATTENTION check changed logic files for matching tests. Surface missing tests via a direct user question โ mandatory, not advisory.
IMPORTANT MUST ATTENTION for multilingual UI text changes, verify translation updates. If missing, require explicit user decision via a direct user question.
MUST ATTENTION apply critical thinking: every claim needs traced proof, and confidence must exceed 80% to act. Anti-hallucination: never present a guess as fact.
MUST ATTENTION apply sequential-thinking: multi-step Thought N/M, REVISION/BRANCH/HYPOTHESIS markers, and a closing confidence %; see the $sequential-thinking skill.
MUST ATTENTION apply AI mistake prevention: holistic-first debugging, fix at the responsible layer, surface ambiguity before coding, re-read files after compaction.
- Persist to `plans/reports/` incrementally and synthesize from disk.
- Emit the `Reference docs read: ...` citation.
- Read `lessons.md`; project conventions override generic defaults.
- Use `[N.M] $skill-name` phase prefixes and one-`in_progress` discipline.
IMPORTANT MUST ATTENTION follow the declared step order for this skill; NEVER skip, reorder, or merge steps without explicit user approval.
IMPORTANT MUST ATTENTION for every step/sub-skill call: set in_progress before execution, set completed after execution
IMPORTANT MUST ATTENTION every skipped step MUST include explicit reason; every completed step MUST include concise evidence
IMPORTANT MUST ATTENTION if Task tools unavailable, maintain an equivalent step-by-step plan tracker with synchronized statuses
[CRITICAL: TOP 3 RULES REPEATED]
- MUST ATTENTION run the Phase 0 graph blast-radius FIRST; NEVER skip it, since it informs the entire review priority order.
- A clean Round 1 ENDS the review. When issues are found, a fresh sub-agent re-review is mandatory after fixing.
- MUST ATTENTION task-track ALL phases before starting; missing tests MUST surface via a direct user question.
- Use `[N.M] $review-changes` phase prefixes and `TaskUpdate(parentTaskId, addBlockedBy: [childIds])` linkage. The workflow row is a container, not a substitute.
- Run `$why-review` after completing this review to validate design rationale, alternatives considered, and risk assessment.
- [TASK-PLANNING] Before acting, analyze task scope and systematically break it into small todo tasks and sub-tasks using task tracking.
- [IMPORTANT] Analyze task size and break it into many small todo tasks systematically before starting; this is critical for context preservation.
[FINAL PURPOSE REMINDER: MUST ATTENTION CRITICAL]
Ensure the changes are reasonable, with no potential bugs or flaws; think critically and hard.
Source: .claude/hooks/lib/prompt-injections.cjs + .claude/.ck.json
Use `$workflow-start <workflowId>` for standard workflows; sequence custom steps manually otherwise.
[CRITICAL] Hard-won project debugging/architecture rules. MUST ATTENTION apply them BEFORE forming a hypothesis or writing code.
Goal: prevent recurrence of known failure patterns across debugging, architecture, naming, AI orchestration, and environment.
Top Rules (apply always):
- Use `ExecuteInjectScopedAsync` for parallel async + repo/UoW; NEVER `ExecuteUowTask`. `ExecuteUowTask` creates a new UoW but reuses the outer DI scope (same DbContext), so parallel iterations sharing a non-thread-safe DbContext silently corrupt data. `ExecuteInjectScopedAsync` creates a new UoW + a new DI scope (fresh repo per iteration).
- Message ownership: core services (Accounts, Communication) are leaders (e.g., `AccountUserEntityEventBusMessage` = Accounts owns it). Feature services (Growth, Talents) sending to core MUST use `{CoreServiceName}...RequestBusMessage`; never define your own event for core to consume.
- Policy naming: names like `HrManagerOrHrOrPayroll` list members, not what the policy guards, so adding a role forces a rename = broken abstraction. Rule: names express DOES/GUARDS, not CONTAINS. Test: does adding or removing a member force a rename? YES = content-driven = bad; rename to purpose (e.g., `HrOperationsAccessPolicy`). Nuance: "Or" is fine in behavioral idioms (`FirstOrDefault`, `SuccessOrThrow`), which express what HAPPENS, not membership.
- Never assume `python`/`python3` resolves; verify the alias first (`where python` / `where py`). Python may not be in the bash PATH under those names. Prefer `py` (the Windows Python Launcher) for one-liners, or `node` if a JS alternative exists.
- Test-specific lessons → `docs/project-reference/integration-test-reference.md` (Lessons Learned section). Production-code anti-patterns → `docs/project-reference/backend-patterns-reference.md` (Anti-Patterns section). Generic debugging/refactoring reminders → System Lessons in `.claude/hooks/lib/prompt-injections.cjs`.
Quick recap:
- `ExecuteInjectScopedAsync`, NEVER `ExecuteUowTask` (shared DbContext = silent data corruption).
- Feature-to-core messaging uses `{CoreServiceName}...RequestBusMessage`.
- Never assume `python`/`python3` resolves; run `where python`/`where py` first and use the `py` launcher or `node`.
Break work into small tasks (task tracking) before starting. Add a final task: "Analyze AI mistakes & lessons learned".
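The `ExecuteUowTask` lesson is .NET-specific, but the failure shape is general: parallel iterations reusing one non-thread-safe context versus each iteration creating its own. Here is a hypothetical Python analogue; `FakeDbContext` and both function names are illustrative stand-ins, not real project APIs.

```python
# Hypothetical analogue of ExecuteUowTask vs ExecuteInjectScopedAsync:
# the bug shape is parallel workers sharing one non-thread-safe context.
import threading

class FakeDbContext:
    """Stand-in for a non-thread-safe unit-of-work context (like a DbContext)."""
    def __init__(self):
        self.rows = []
    def add(self, row):
        self.rows.append(row)

def shared_scope(items):
    # ExecuteUowTask-style: every parallel iteration mutates the SAME outer
    # context concurrently. With a real DbContext this silently corrupts data.
    ctx = FakeDbContext()
    threads = [threading.Thread(target=ctx.add, args=(i,)) for i in items]
    for t in threads: t.start()
    for t in threads: t.join()
    return [ctx.rows]

def fresh_scope(items):
    # ExecuteInjectScopedAsync-style: a fresh context ("new DI scope") per
    # iteration, so no parallel worker ever touches another's state.
    results = []
    lock = threading.Lock()
    def work(i):
        ctx = FakeDbContext()  # fresh scope per iteration
        ctx.add(i)
        with lock:
            results.append(ctx.rows)
    threads = [threading.Thread(target=work, args=(i,)) for i in items]
    for t in threads: t.start()
    for t in threads: t.join()
    return results
```

In the fresh-scope version each result list holds exactly one row, which is the isolation guarantee the .NET rule is protecting.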
Extract lessons (ROOT CAUSE ONLY, not symptom fixes):
- Record lessons via `$learn`.
- Ask "Would `$code-review`/`$code-simplifier`/`$security`/`$lint` catch this?"; if yes, improve the review skill instead of recording it via `$learn`.
[TASK-PLANNING] [MANDATORY] BEFORE executing any workflow or skill step, create/update task tracking for all planned steps, then keep it synchronized as each step starts/completes.