| name | debug-investigate |
| description | [Fix & Debug] Use when investigating a bug's root cause — reproduce the symptom, trace it end-to-start through the code, form and test hypotheses, and pinpoint the defect before any fix. |
Codex compatibility note:
- Invoke repository skills with
$skill-name in Codex; this mirrored copy rewrites legacy Claude /skill-name references.
- Task tracker mandate: BEFORE executing any workflow or skill step, create/update task tracking for all steps and keep it synchronized as progress changes.
- User-question prompts mean to ask the user directly in Codex.
- Ignore Claude-specific mode-switch instructions when they appear.
- Strict execution contract: when a user explicitly invokes a skill, execute that skill protocol as written.
- Subagent authorization: when a skill is user-invoked or AI-detected and its protocol requires subagents, that skill activation authorizes use of the required
spawn_agent subagent(s) for that task.
- Do not skip, reorder, or merge protocol steps unless the user explicitly approves the deviation first.
- For workflow skills, execute each listed child-skill step explicitly and report step-by-step evidence.
- If a required step/tool cannot run in this environment, stop and ask the user before adapting.
Codex Project-Reference Loading (No Hooks)
Codex uses static project-reference loading instead of runtime-injected project docs.
When coding, planning, debugging, testing, or reviewing, open project docs explicitly using this routing.
Always read:
docs/project-config.json (project-specific paths, commands, modules, and workflow/test settings)
docs/project-reference/docs-index-reference.md (routes to the full docs/project-reference/* catalog)
docs/project-reference/lessons.md (always-on guardrails and anti-patterns)
Missing/stale context route: If docs/project-config.json, the docs index, lessons.md, CLAUDE.md, AGENTS.md, or any task-required reference doc is missing or stale, auto-run $project-init or the narrow setup route ($project-config, $docs-init, $scan-all, $scan --target=<key>, $claude-md-init) before ordinary project-specific work. If Codex mirrors or AGENTS.md are missing/stale, ask the user to run $sync-codex; do not auto-run it.
Situation-based docs:
- Backend/CQRS/API/domain/entity changes:
backend-patterns-reference.md, domain-entities-reference.md, project-structure-reference.md
- Frontend/UI/styling/design-system:
frontend-patterns-reference.md, scss-styling-guide.md, design-system/README.md
- Spec authoring,
docs/specs/ pathing, or TC format: feature-spec-reference.md, spec-system-reference.md, spec-principles.md
- Behavior/public-contract changes or spec-test-code sync:
workflow-spec-test-code-cycle-reference.md plus the spec docs above
- Derived spec indexes/ERDs/reimplementation guides:
spec-system-reference.md and source Feature Specs under docs/specs/
- Integration test implementation/review:
integration-test-reference.md
- E2E test implementation/review:
e2e-test-reference.md
- Code review/audit work:
code-review-rules.md plus domain docs above based on changed files
Do not read all docs blindly. Start from docs-index-reference.md, then open only relevant files for the task.
[BLOCKING] Execute skill steps in declared order. NEVER skip, reorder, or merge steps without explicit user approval.
[BLOCKING] Before each step or sub-skill call, update task tracking: set in_progress when step starts, set completed when step ends.
[BLOCKING] Every completed/skipped step MUST include brief evidence or explicit skip reason.
[BLOCKING] If Task tools are unavailable, create and maintain an equivalent step-by-step plan tracker with the same status transitions.
Quick Summary
Goal: Deliver a $why-review-validated root cause pinned to file:line at the invariant-owning layer — investigation-only, so $fix corrects the cause, not the symptom — or an honest "hypothesis, not confirmed" naming the evidence gaps.
Summary:
- This is investigation-ONLY — never patch here; classify the bug type FIRST (Phase 0, BLOCKING) to route to the right agent and decide which evidence matters before tracing anything.
- Trace end-to-start: name Frame 0 (observed final state), walk backward reader → storage/projection → writer → consumer/job → producer, and enumerate ALL feeder paths — the bug enters where bad state is WRITTEN, not where it crashes.
- Every root-cause claim carries
Confidence: X% + file:line proof; below 60% you report "hypothesis, not confirmed" with named evidence gaps, never a guess.
- The
$why-review gate is non-negotiable: run it in the SAME session/main agent before declaring confirmed; 2 rounds without passing → STOP and escalate via a direct user question. Run a graph trace when graph.db exists — it surfaces bus/event consumers grep cannot see.
Workflow:
- Classify — Detect bug scenario type (Phase 0) → route to specialized agent
- Reproduce — Confirm expected vs actual with evidence
- Hypothesize — Form 2-3 ranked theories
- Trace — Follow code paths; collect
file:line proof per hypothesis
- Confirm — Single root cause explains ALL symptoms
- Validate — Trigger
$why-review on findings/root cause before declaring confirmed
- Report — Confidence-tagged finding + hand off to
$fix
Key Rules:
- NEVER patch symptoms — trace full call chain, fix at owning layer
- NEVER report root cause without
file:line evidence
- NEVER declare confirmed root cause without passing the
$why-review validation gate
- Output: confirmed root cause OR "hypothesis, not confirmed" + evidence gaps
Phase 0: Classify Bug Scenario (BLOCKING — Do Before ANY Investigation)
Think: What type of failure is this? Classification routes to the right agent and determines which evidence matters most.
| Bug Type | Signals | Specialized Agent |
|---|
| Frontend UI / rendering | Console errors, visual regression, component state | debugger |
| Backend logic / data | Wrong API response, data corruption, validation failure | debugger |
| Cross-service / message bus | Events not propagating, consumer failures, sync lag | debugger + graph trace MANDATORY |
| Performance / memory | Slow queries, OOM, N+1, unbounded result sets | performance-optimizer |
| Security / auth | Access denied, token issues, permission bypass | security-auditor |
Cross-service bugs: Run graph trace FIRST — grep alone misses implicit bus connections.
OOM / memory exhaustion: Check row COUNT before row SIZE. Unbounded query loading thousands of records is more common cause. Triage: (1) missing DB-level filter? (2) excessive row size?
Debug Mindset (NON-NEGOTIABLE)
Skeptical. Sequential. Every claim needs traced proof, confidence >80%.
- NEVER assume first hypothesis correct — verify with actual code traces
- Every root cause claim MUST include
file:line evidence
- Cannot prove root cause → state "hypothesis, not confirmed"
- Challenge assumptions: "Is this really the cause?" → trace actual execution path
- Challenge completeness: "Other contributing factors?" → check related code paths
Confidence & Evidence Gate
MUST ATTENTION declare Confidence: X% + evidence list + file:line proof for EVERY claim.
| Confidence | Meaning | Action |
|---|
| 95-100% | Full trace verified | Report as confirmed root cause |
| 80-94% | Main path verified, edge cases uncertain | Report with caveats |
| 60-79% | Partial trace | Report as hypothesis |
| <60% | Insufficient evidence | DO NOT report — gather more evidence |
Investigation Dimensions
Reason through each dimension — state what fails if weak, then apply with evidence.
Dim 1: Reproduce
Think: What exact conditions trigger this? Data state? User action? Timing? Environment delta?
- Confirm issue exists with evidence (error message, stack trace, screenshot)
- Identify trigger: user action, data state, timing, env difference
Dim 2: Hypothesize
Think: Given symptoms, what are the most plausible failure modes? What would confirm vs contradict each?
- Form 2-3 theories ranked by likelihood
- Note evidence needed to confirm/contradict each theory before investigating
Dim 3: End-to-Start Trace
Think: What exact final output proves the bug? Which reader produced it? Which storage/projection/write path fed that reader? Where does bad state ENTER the system — not where it CRASHES? Which layer owns this invariant?
- Name Frame 0: observed final state (UI, API response, log, persisted value, assertion, aggregate)
- Identify the final reader/query/renderer/assertion and the state it consumes
- Walk backward: reader -> storage/projection/cache -> writer -> consumer/handler/job -> producer/origin
- Enumerate every feeder path that can write the same final state
- Check error handling paths
- Collect
file:line evidence per hypothesis
- Use graph trace for implicit connections (event handlers, bus consumers)
Dim 4: Confirm
Think: Does this root cause explain ALL symptoms? Are there bypass paths that skip the fix point?
- Match evidence to single root cause
- Verify root cause explains ALL observed symptoms
- Check secondary contributing factors
- Build hypothesis matrix: primary, contributing, ruled out, latent, unknown
- Resolve or disclose competing causes before proposing a fix
- Verify no bypass paths (direct construction, clone/spread without re-validation, mutations outside model layer)
Dim 5: Report
- Output: confirmed root cause + evidence chain
- Include: affected files, Debugger Trace: End -> Start, feeder paths, hypothesis matrix, data flow summary, owning fix layer, fix recommendation, forward convergence proof
- Hand off to
$fix for implementation
Dependency Tracing (MANDATORY when graph.db exists)
MUST ATTENTION use structural queries — graph reveals ALL callers/consumers grep misses.
python .claude/scripts/code_graph query callers_of <function> --json
python .claude/scripts/code_graph query importers_of <file> --json
python .claude/scripts/code_graph query tests_for <function> --json
python .claude/scripts/code_graph trace <suspect-file> --direction both --json
python .claude/scripts/code_graph trace <suspect-file> --direction upstream --json
Graph reveals implicit connections (MESSAGE_BUS, event handlers) that propagate issues across services — invisible to grep.
Root Cause Validation ($why-review Gate)
NEVER declare a confirmed root cause straight from investigation. Run $why-review as a quality validation gate on the findings and root cause — in the SAME session, SAME main agent (do NOT spawn a sub-agent) — before handing off to $fix.
Step 1 — Investigate (main agent): Identify root cause + full evidence chain. Write findings to report file.
Step 2 — Validate ($why-review, same main agent): Trigger $why-review on the findings/root cause. The gate must confirm:
- Root cause is correct and reasonable, with
file:line evidence that conclusively supports it
- Evidence has no gaps and explains ALL symptoms
- The proposed fix direction would NOT introduce other bugs or regressions (check downstream consumers, bypass paths, owning layer)
Decision:
$why-review PASSES → declare confirmed, proceed to $fix
$why-review finds GAPS/risks → collect additional evidence, repeat
- 2 validation rounds without passing → STOP, escalate to user via a direct user question
⚠️ MANDATORY: Post-Fix Verification
After $fix applies changes, $prove-fix MUST be run — builds code proof traces per change with confidence scores. Non-negotiable in all fix workflows.
Anti-Rationalization (Red Flags)
| Evasion | Rebuttal |
|---|
| "I see the problem, let me fix it" | Symptoms ≠ root cause. Investigate first. |
| "Quick fix for now, investigate later" | Quick fixes mask bugs. Find root cause. |
| "Just try changing X and see" | One hypothesis at a time. Scientific method, not trial and error. |
| "Already tried 2+ fixes, one more" | 3+ failed fixes = STOP. Question the architecture, not the fix. |
| "The error message is misleading" | Read it again carefully. Error messages are usually right. |
| "It works on my machine" | Reproduce in the failing environment. Your environment hides bugs. |
| "This can't be the cause" | Verify with evidence, not intuition. Unlikely causes are still causes. |
| "It's OOM, must be a large object" | Check row COUNT before row SIZE. Unbounded query > large single row. |
"Skip $why-review, findings look solid" | Self-confirmed findings rationalize their own gaps. The $why-review gate is non-negotiable. |
| "Graph.db not needed for this bug" | Cross-service bugs are invisible to grep. Run trace first. |
Workflow Recommendation
MUST ATTENTION — NO EXCEPTIONS: Not in workflow? Use a direct user question:
- Activate
workflow-bugfix workflow (Recommended) — scout → investigate → debug → plan → fix → prove-fix → review → test
- Execute
$debug-investigate directly — standalone
Next Steps (Standalone only — skip if inside workflow)
MUST ATTENTION use a direct user question after completing. NEVER auto-decide next step:
- "Proceed with full workflow (Recommended)" — detect best workflow to continue from here
- "$fix" — apply fix based on debug findings
- "$plan" — if fix requires planning first
- "Skip, continue manually" — user decides
Standalone Review Gate: Outside workflow? MUST create $review-changes task as LAST task.
[IMPORTANT] Use task tracking to break ALL work into small tasks BEFORE starting — including tasks for each file read. This prevents context loss from long files.
docs/project-reference/domain-entities-reference.md — Domain entity catalog, relationships, cross-service sync (read when task involves business entities/models)
End-to-Start Debugger Trace — For non-trivial bugs, failed verification, regression fixes, behavior-changing code, or unclear code flow, start from the observed final state and walk backward before proposing a fix.
- Frame 0: observed end state — Name the exact user-visible output, failing assertion, log line, persisted value, API response, rendered UI, or aggregate bucket. Record the reader/query/renderer that produced it with
file:line evidence.
- Walk backward one hop at a time — Trace final reader -> projection/cache/storage -> writer -> consumer/handler/job -> producer/caller -> original trigger. At every hop record: input, transformation, output, owner, and evidence.
- Enumerate all feeder paths — Find every upstream producer/caller/event/job that can write into the final path, including retry, async, cache, background, and alternate UI/API paths. Mark each path verified, ruled out, or still unknown.
- Build the hypothesis matrix — For each plausible cause, list evidence for, evidence against, how to reproduce/verify, blast radius, and status (
primary, contributing, ruled out, latent). Do not fix until competing causes are explicitly resolved or bounded.
- Choose the owning fix layer — Identify the invariant owner and the lowest shared point that protects all downstream consumers. A fix at the symptom site is rejected unless the symptom site owns the invariant.
- Prove convergence forward — After choosing the fix, walk start -> end again and show how the corrected state reaches the observed final output. Map each root cause to a fix part and each fix part to a test/proof.
BLOCKED until: final state named · backward trace written · all feeder paths enumerated · hypothesis matrix completed · owning fix layer justified · forward convergence proof mapped to tests.
NEVER: Start at the first suspicious code path. Collapse multiple producers into one "flow". Treat duplicate symptoms as duplicate records without proving the read model. Skip ruled-out hypotheses.
Root Cause Debugging — Systematic approach, never guess-and-check.
- Reproduce — Confirm the issue exists with evidence (error message, stack trace, screenshot)
- Isolate — Narrow to specific file/function/line using binary search + graph trace
- Trace — Follow data flow from input to failure point. Read actual code, don't infer.
- Hypothesize — Form theory with confidence %. State what evidence supports/contradicts it
- Verify — Test hypothesis with targeted grep/read. One variable at a time.
- Fix — Address root cause, not symptoms. Verify fix doesn't break callers via graph
connections
NEVER: Guess without evidence. Fix symptoms instead of cause. Skip reproduction step.
Incremental Result Persistence — MANDATORY for all sub-agents or heavy inline steps processing >3 files.
- Before starting: Create report file
plans/reports/{skill}-{date}-{slug}.md
- After each file/section reviewed: Append findings to report immediately — never hold in memory
- Return to main agent: Summary only (per SYNC:subagent-return-contract) with
Full report: path
- Main agent: Reads report file only when resolving specific blockers
Why: Context cutoff mid-execution loses ALL in-memory findings. Each disk write survives compaction. Partial results are better than no results.
Report naming: plans/reports/{skill-name}-{YYMMDD}-{HHmm}-{slug}.md
Sub-Agent Return Contract — When this skill spawns a sub-agent, the sub-agent MUST return ONLY this structure. Main agent reads only this summary — NEVER requests full sub-agent output inline.
## Sub-Agent Result: [skill-name]
Status: ✅ PASS | ⚠️ PARTIAL | ❌ FAIL
Confidence: [0-100]%
### Findings (Critical/High only — max 10 bullets)
- [severity] [file:line] [finding]
### Actions Taken
- [file changed] [what changed]
### Blockers (if any)
- [blocker description]
Full report: plans/reports/[skill-name]-[date]-[slug].md
Main agent reads Full report file ONLY when: (a) resolving a specific blocker, or (b) building a fix plan.
Sub-agent writes full report incrementally (per SYNC:incremental-persistence) — not held in memory.
Context budget — the return payload is a SUMMARY, not a transcript: ≤10 finding bullets, no raw file contents / full diffs / verbatim logs inline, no re-pasted source. Everything beyond the summary lives in the Full report on disk. A sub-agent that would exceed the summary shape MUST write the detail to its report and return only the pointer — the orchestrator's context is the scarce resource the whole map-reduce protects.
Source/test drift check. For coding, fix, debug, investigation, test, or review work: when source behavior changes, inspect affected unit/integration/E2E tests and decide from evidence whether tests should change to match intended behavior or the source change is an unintended bug to fix. Do not write tests for migration code; schema/data migrations are one-time execution paths, not core application logic.
AI Mistake Prevention — Failure modes to avoid on every task:
Re-read files after context changes. Context compaction, resume, or long-running work can make memory stale; verify current files before acting.
Verify generated content against source evidence. AI hallucinates APIs, names, claims, and document facts. Check the relevant source before documenting or referencing.
Check downstream references before deleting or renaming. Removing an artifact can stale docs, generated mirrors, configs, and callers; map references first.
Trace the full impact chain after edits. Changing a definition can miss derived outputs and consumers. Follow the affected chain before declaring done.
Verify ALL affected outputs, not just the first. One green check is not all green checks; validate every output surface the change can affect.
Assume existing values are intentional — ask WHY before changing. Before changing a constant, limit, flag, wording, or pattern, read nearby context and history.
Surface ambiguity before acting — don't pick silently. Multiple valid interpretations require an explicit question or stated assumption with risk.
Keep shared guidance role-relevant. Universal guidance must help every receiving skill or agent; code-specific obligations belong only in code-specific protocols.
Nested Task Expansion Contract — For workflow-step invocation, the [Workflow] ... row is only a parent container; the child skill still creates visible phase tasks.
- Call the current task list first. If a matching active parent workflow row exists, set
nested=true and record parentTaskId; otherwise run standalone.
- Create one task per declared phase before phase work. When nested, prefix subjects
[N.M] $skill-name — phase.
- When nested, link the parent with
TaskUpdate(parentTaskId, addBlockedBy: [childIds]).
- Orchestrators must pre-expand a child skill's phase list and link the workflow row before invoking that child skill or sub-agent.
- Mark exactly one child
in_progress before work and completed immediately after evidence is written.
- Complete the parent only after all child tasks are completed or explicitly cancelled with reason.
Blocked until: the current task list done, child phases created, parent linked when nested, first child marked in_progress.
Project Reference Docs Gate — Run after task-tracking bootstrap and before target/source file reads, grep, edits, or analysis. Project docs override generic framework assumptions.
- Identify scope: file types, domain area, and operation.
- Required docs by trigger: always
docs/project-reference/lessons.md; doc lookup docs-index-reference.md; review code-review-rules.md; backend/CQRS/API backend-patterns-reference.md; domain/entity domain-entities-reference.md; frontend/UI frontend-patterns-reference.md; styles/design scss-styling-guide.md + design-system/design-system-canonical.md; integration tests integration-test-reference.md; E2E e2e-test-reference.md; feature docs/specs feature-spec-reference.md + spec-system-reference.md + spec-principles.md; behavior/public-contract/spec-test-code sync workflow-spec-test-code-cycle-reference.md; derived spec index/ERD/reimplementation guides spec-system-reference.md + source Feature Specs under docs/specs/; architecture/new area project-structure-reference.md.
- Read every required doc. If
docs/project-config.json, the docs index, lessons.md, CLAUDE.md, AGENTS.md, or any task-required reference doc is missing or stale, auto-run $project-init or the narrow lower-level route ($project-config, $docs-init, $scan-all, $scan --target=<key>, $claude-md-init) before ordinary project-specific work. If Codex mirrors or AGENTS.md are missing/stale, ask the user to run $sync-codex; do not auto-run it.
- Before target work, state:
Reference docs read: ... | Not applicable: ....
Ready when: scope evaluated, required docs checked/read or setup route completed, lessons.md confirmed, citation emitted.
Task Tracking & External Report Persistence — Bootstrap this before execution; then run project-reference doc prefetch before target/source work.
- Create a small task breakdown before target file reads, grep, edits, or analysis. On context loss, inspect the current task list first.
- Mark one task
in_progress before work and completed immediately after evidence; never batch transitions.
- For plan/review work, create
plans/reports/{skill}-{YYMMDD}-{HHmm}-{slug}.md before first finding.
- Append findings after each file/section/decision and synthesize from the report file at the end.
- Final output cites
Full report: plans/reports/{filename}.
Blocked until: task breakdown exists, report path declared for plan/review work, first finding persisted before the next finding.
Critical Thinking Mindset — Apply critical thinking, sequential thinking. Every claim needs traced proof, confidence >80% to act.
Anti-hallucination: Never present guess as fact — cite sources for every claim, admit uncertainty freely, self-check output for errors, cross-reference independently, stay skeptical of own confidence — certainty without evidence root of all hallucination.
Sequential Thinking Protocol — Structured multi-step reasoning for complex/ambiguous work. Use when planning, reviewing, debugging, or refining ideas where one-shot reasoning is unsafe.
Trigger when: complex problem decomposition · adaptive plans needing revision · analysis with course correction · unclear/emerging scope · multi-step solutions · hypothesis-driven debugging · cross-cutting trade-off evaluation.
Format (explicit mode — visible thought trail):
Thought N/M: [aspect] — one aspect per thought, state assumptions/uncertainty
Thought N/M [REVISION of Thought K]: ... — when prior reasoning invalidated; state Original / Why revised / Impact
Thought N/M [BRANCH A from Thought K]: ... — explore alternative; converge with decision rationale
Thought N/M [HYPOTHESIS]: ... then [VERIFICATION]: ... — test before acting
Thought N/N [FINAL] — only when verified, all critical aspects addressed, confidence >80%
Mandatory closers: Confidence % stated · Assumptions listed · Open questions surfaced · Next action concrete.
Stop conditions: confidence <80% on any critical decision → escalate via ask the user directly · ≥3 revisions on same thought → re-frame the problem · branch count >3 → split into sub-task.
Implicit mode: apply methodology internally without visible markers when adding markers would clutter the response (routine work where reasoning aids accuracy).
Deep-dive: see $sequential-thinking skill (.claude/skills/sequential-thinking/SKILL.md) for worked examples (API design, debugging, architecture), advanced techniques (spiral refinement, hypothesis testing, convergence), and meta-strategies (uncertainty handling, revision cascades).
Understand Code First — HARD-GATE: Do NOT write, plan, or fix until you READ existing code.
- Search 3+ similar patterns (
grep/glob) — cite file:line evidence
- Read existing files in target area — understand structure, base classes, conventions
- Run
python .claude/scripts/code_graph trace <file> --direction both --json when .code-graph/graph.db exists
- Map dependencies via
connections or callers_of — know what depends on your target
- Write investigation to
.ai/workspace/analysis/ for non-trivial tasks (3+ files)
- Re-read analysis file before implementing — never work from memory alone. — why: long context drifts from the file; the file is ground truth
- NEVER invent new patterns when existing ones work — match exactly or document deviation. — why: divergent patterns fragment the codebase and slow every future reader
BLOCKED until: - [ ] Read target files - [ ] Grep 3+ patterns - [ ] Graph trace (if graph.db exists) - [ ] Assumptions verified with evidence
Evidence-Based Reasoning — Speculation is FORBIDDEN. Every claim needs proof.
- Cite
file:line, grep results, or framework docs for EVERY claim
- Declare confidence: >80% act freely, 60-80% verify first, <60% DO NOT recommend
- Cross-service validation required for architectural changes
- "I don't have enough evidence" is valid and expected output
BLOCKED until: - [ ] Evidence file path (file:line) - [ ] Grep search performed - [ ] 3+ similar patterns found - [ ] Confidence level stated
Forbidden without proof: "obviously", "I think", "should be", "probably", "this is because"
If incomplete → output: "Insufficient evidence. Verified: [...]. Not verified: [...]."
Cross-Service Check — Microservices/event-driven: MANDATORY before concluding investigation, plan, spec, or feature doc. Missing downstream consumer = silent regression.
| Boundary | Grep terms |
|---|
| Event producers | Publish, Dispatch, Send, emit, EventBus, outbox, IntegrationEvent |
| Event consumers | Consumer, EventHandler, Subscribe, @EventListener, inbox |
| Sagas/orchestration | Saga, ProcessManager, Choreography, Workflow, Orchestrator |
| Sync service calls | HTTP/gRPC calls to/from other services |
| Shared contracts | OpenAPI spec, proto, shared DTO — flag breaking changes |
| Data ownership | Other service reads/writes same table/collection → Shared-DB anti-pattern |
Per touchpoint: owner service · message name · consumers · risk (NONE / ADDITIVE / BREAKING).
BLOCKED until: Producers scanned · Consumers scanned · Sagas checked · Contracts reviewed · Breaking-change risk flagged
Estimation Framework — Bottom-up first; SP DERIVED; output min-max range when likely ≥3d. Stack-agnostic. Baseline: 3-5yr dev, 6 productive hrs/day. AI estimate assumes Claude Code + project context.
Method:
- Blast Radius pass (below) — drives code AND test cost
- Decompose phases → hours/phase →
bottom_up_hours = Σ phase_hours
likely_days = ceil(bottom_up_hours / 6) × productivity_factor
- Sum Risk Margin (base + add-ons) →
max_days = likely_days × (1 + margin)
min_days = likely_days × 0.9
- Output as range when
likely_days ≥3; single point allowed <3 (still record margin)
man_days_ai = same range × AI speedup
story_points DERIVED from likely_days via SP-Days — NEVER driver. Disagreement >50% → trust bottom-up
Productivity factor: 0.8 strong scaffolding+codegen+AI hooks · 1.0 mature default · 1.2 weak patterns · 1.5 greenfield
Cost Driver Heuristic (apply BEFORE work-type row):
- UI dominates in CRUD/business apps — 1.5-3x backend (states, validation, responsive, a11y, polish)
- Backend dominates ONLY: multi-aggregate invariants, cross-service contracts, schema migrations, heavy query/perf, new event flows
Reuse-vs-Create axis (PRIMARY lever, per layer):
| UI tier | Cost |
|---|
| Reuse component on existing screen | 0.1-0.3d |
| Add control/column to existing screen | 0.3-0.8d |
| Compose components into NEW screen | 1-2d |
| NEW screen, custom layout/states/validation | 2-4d |
| NEW shared/common component (themed, tested) | 3-6d+ |
| Backend tier | Cost |
|---|
| Reuse query/handler from new place | 0.1-0.3d |
| Small update existing handler/entity | 0.3-0.8d |
| NEW query on existing repo/model | 0.5-1d |
| NEW command/handler on existing aggregate (additive) | 1-2d |
| NEW aggregate/entity (repo, validation, events) | 2-4d |
| NEW cross-service contract OR schema migration | 2-4d each |
| Multi-aggregate invariant / heavy domain rule | 3-5d |
Rule: Sum tiers across UI+backend+tests, apply productivity factor. Reuse short-circuits tiers — call out.
Test-Scope drivers (compute test_count EXPLICITLY — "+tests" hand-wave is #1 failure):
| Driver | Count |
|---|
| Happy-path journeys | 1 per story / AC main flow |
| State-machine transitions | reachable transitions × allowed actors |
| Multi-entity state combos | state(A) × state(B) — REACHABLE only, not Cartesian |
| Authorization matrix | (owner, non-owner, elevated, unauth) × each mutation |
| Validation rules | 1 per required field / boundary / format / cross-field |
| UI states (per new screen/dialog) | happy, loading, empty, error, partial — present only |
| Negative paths / invariants | 1 per violatable business rule |
| Test tier (Trad, incl. setup+assert+flake) | Cost |
|---|
| 1-5 cases, fixtures reused | 0.3-0.5d |
| 6-12 cases, 1 new fixture | 0.5-1d |
| 13-25 cases, multi-entity setup | 1-2d |
| 26-50 cases OR new state-machine coverage | 2-3d |
| >50 cases OR full E2E journey | 3-5d |
Test multipliers: new fixture/seed harness +0.5d · cross-service/bus assertion +0.3d each · UI E2E ×1.5 · each new role +1-2 cases
Blast Radius (mandatory pre-pass — affects code AND test):
- Files/components directly modified — count
- Of those, "complex" (>500 LOC, multi-handler, central, frequently-modified) — count
- Downstream consumers (callers, event subscribers, cross-service) — list
- Shared/common code touched (multi-app blast) — yes/no
- Regression scope — areas needing re-test
Rule: Complex touch → add risk_factors. Each downstream consumer → +1-3 regression cases. Blast >5 areas OR >2 complex → re-evaluate SPLIT before estimating.
Risk Margin (drives max bound):
| likely_days | Base margin |
|---|
| <1d trivial | +10% |
| 1-2d small additive | +20% |
| 3-4d real feature | +35% |
| 5-7d large | +50% |
| 8-10d very large | +75% |
| >10d | +100% AND flag SHOULD SPLIT |
Risk-factor add-ons (additive — enumerate in risk_factors):
| Factor | +margin |
|---|
touches-complex-existing-feature (>500 LOC, multi-handler, central) | +20% |
cross-service-contract change | +25% |
schema-migration-on-populated-data | +25% |
new-tech-or-unfamiliar-pattern | +30% |
regression-fan-out (≥3 downstream areas re-test) | +20% |
performance-or-latency-critical | +20% |
concurrency-race-event-ordering | +25% |
shared-common-code (multi-consumer/multi-app) | +25% |
unclear-requirements-or-design | +30% |
Collapse rule: total margin >100% → STOP, split (padding past 2x is dishonesty). Margin <15% on likely_days ≥5 → under-estimated, widen.
Work-Type Caps (hard ceilings on likely_days):
| Work type | Max SP | Max likely |
|---|
| Single field / config flag / style fix | 1 | 0.5d |
| Add property to existing model + bind to existing UI | 2 | 1d |
| Additive endpoint + minor UI control (button/menu/column), reuses fixtures | 3 | 2-3d |
| Additive endpoint + NEW UI surface OR additive multi-layer + new domain rule + 2+ test files | 5 | 3-5d |
| NEW model/aggregate OR migration OR cross-module contract OR heavy test (>1.5d) OR NEW UI + non-trivial backend | 8 | 5-7d |
| NEW UI surface + (NEW aggregate OR migration OR cross-service contract) | 13 | SHOULD split |
| Cross-service contract + migration combined | 13 | SHOULD split |
| Beyond | 21 | MUST split |
SP→Days (validation only): 1=0.5d/0.25d · 2=1d/0.35d · 3=2d/0.65d · 5=4d/1.0d · 8=6d/1.5d · 13=10d/2.0d (Trad/AI likely)
AI speedup: SP 1≈2x · 2-3≈3x · 5-8≈4x · 13+≈5x. AI cost = (code_gen × 1.3) + (test_gen × 1.3) (30% review overhead).
MANDATORY frontmatter:
story_points: <n>
complexity: low | medium | high | critical
man_days_traditional: '<min>-<max>d'
man_days_ai: '<min>-<max>d'
risk_margin_pct: <n>
risk_factors: [touches-complex-existing-feature, regression-fan-out]
blast_radius:
touched_areas: <n>
complex_touched: <n>
downstream_consumers: [list or count]
shared_common_code: yes | no
estimate_scope_included: [code, integration-tests, frontend, i18n, docs]
estimate_scope_excluded: [unit-tests, e2e, perf, deployment, code-review-rounds]
estimate_reasoning: |
5-7 lines covering:
(a) UI tier — row applied
(b) Backend tier — row applied
(c) Test scope — case breakdown by driver, file count, fixtures, tier row
(d) Cost driver — dominant tier + why
(e) Blast radius — touched, complex, regression scope
(f) Risk factors — list driving margin; why not larger/smaller
Example: "UI: compose Form/Table/Dialog → NEW screen (~1.5d). Backend: NEW command on existing aggregate,
reuses validation+repo (~1d). Tests: 4 transitions × 2 actors + 3 validation + 2 UI states = 13 cases,
1 new fixture → tier 13-25 ~1.5d. Driver: UI composition + new states. Blast: 4 areas, 1 complex.
Risk: base 35% + touches-complex +20% = 55% → max 3.9d → range 2.5-4d."
Sanity self-check:
likely_days ≥3d and single-point? → reject, must be range
- Margin <15% on
likely_days ≥5d? → under-estimated, widen
- Margin >100%? → STOP, split instead of buffer
- Complex existing feature touched, no regression budget in
(c)? → reject
- Blast
>5 areas OR >2 complex, no split discussion? → reject
- Purely additive on existing model AND existing UI? → cap SP 3 unless tests >1.5d
- NEW UI surface (page/complex form/dashboard)? → SP 5+ even if backend one endpoint
- Backend cross-service / migration / multi-aggregate? → SP 8+ regardless of UI
bottom_up_hours / 6 vs SP-Days disagreement >50%? → trust bottom-up, downgrade SP
- Without tests, SP drops ≥1 bucket? → tests dominate; state explicitly
- Reasoning called out UI vs backend vs blast vs risk factors? → if missing, add
Red Flag Stop Conditions — STOP and escalate to user via ask the user directly when:
- Confidence drops below 60% on any critical decision
- Changes would affect >20 files (blast radius too large)
- Cross-service boundary is being crossed
- Security-sensitive code (auth, crypto, PII handling)
- Breaking change detected (interface, API contract, DB schema)
- Test coverage would decrease after changes
- Approach requires technology/pattern not in the project
NEVER proceed past a red flag without explicit user approval.
Fix-Layer Accountability — NEVER fix at the crash site. Trace the full flow, fix at the owning layer.
AI default behavior: see error at Place A → fix Place A. This is WRONG. The crash site is a SYMPTOM, not the cause.
MANDATORY before ANY fix:
- Trace full data flow — Map the complete path from data origin to crash site across ALL layers (storage → backend → API → frontend → UI). Identify where the bad state ENTERS, not where it CRASHES.
- Identify the invariant owner — Which layer's contract guarantees this value is valid? That layer is responsible. Fix at the LOWEST layer that owns the invariant — not the highest layer that consumes it.
- One fix, maximum protection — Ask: "If I fix here, does it protect ALL downstream consumers with ONE change?" If fix requires touching 3+ files with defensive checks, you are at the wrong layer — go lower.
- Verify no bypass paths — Confirm all data flows through the fix point. Check for: direct construction skipping factories, clone/spread without re-validation, raw data not wrapped in domain models, mutations outside the model layer.
BLOCKED until: - [ ] Full data flow traced (origin → crash) - [ ] Invariant owner identified with file:line evidence - [ ] All access sites audited (grep count) - [ ] Fix layer justified (lowest layer that protects most consumers)
Anti-patterns (REJECT these):
- "Fix it where it crashes" — Crash site ≠ cause site. Trace upstream.
- "Add defensive checks at every consumer" — Scattered defense = wrong layer. One authoritative fix > many scattered guards.
- "Both fix is safer" — Pick ONE authoritative layer. Redundant checks across layers send mixed signals about who owns the invariant.
MUST ATTENTION apply critical + sequential thinking — every claim needs appropriate traced evidence (file:line for repo/code claims; source URL or artifact section for research, product, content, and docs claims); confidence >80% to act, <60% DO NOT recommend. Anti-hallucination: never present guess as fact, admit uncertainty freely, cross-reference independently, stay skeptical of own confidence.
MUST ATTENTION apply sequential-thinking — multi-step Thought N/M, REVISION/BRANCH/HYPOTHESIS markers, confidence % closer; see $sequential-thinking skill.
MUST ATTENTION apply AI mistake prevention — verify generated content against evidence, trace downstream references before deleting or renaming, verify all affected outputs, re-read files after context loss, and surface ambiguity before acting.
- MANDATORY Bootstrap task tracking before target work; transition one task at a time.
- MANDATORY Persist plan/review findings to
plans/reports/ incrementally and synthesize from disk.
- MANDATORY After task-tracking bootstrap and before target/source work, read required project-reference docs and cite
Reference docs read: ....
- MANDATORY Always include
lessons.md; project conventions override generic defaults.
- MANDATORY If project config, root instruction files, or any required reference doc is missing or stale, auto-run
$project-init or the narrow lower-level route before ordinary project-specific work.
IMPORTANT MUST ATTENTION debugger trace gate: for non-trivial bug/fix/investigation/review work, start at the observed final output and trace backward through reader -> storage/projection -> writer -> consumer/job -> producer/trigger. Enumerate all feeder paths and hypotheses before fixing. BLOCKED until trace, hypothesis matrix, owning fix layer, and forward convergence proof exist.
- MANDATORY Parent workflow rows do not replace child phase tracking; expand phases and link the parent when nested.
- MANDATORY Orchestrators pre-expand child skill phases before invocation; use
[N.M] $skill-name — phase prefixes and one-in_progress discipline.
Prompt-Enhance Closing Anchors
IMPORTANT MUST ATTENTION follow declared step order for this skill; NEVER skip, reorder, or merge steps without explicit user approval
IMPORTANT MUST ATTENTION for every step/sub-skill call: set in_progress before execution, set completed after execution
IMPORTANT MUST ATTENTION every skipped step MUST include explicit reason; every completed step MUST include concise evidence
IMPORTANT MUST ATTENTION if Task tools unavailable, maintain an equivalent step-by-step plan tracker with synchronized statuses
Closing Reminders
IMPORTANT MUST ATTENTION Goal: Deliver a $why-review-validated root cause pinned to file:line at the invariant-owning layer — investigation-ONLY, so $fix corrects the cause, not the symptom — or an honest "hypothesis, not confirmed" naming the evidence gaps.
Protocols in force (concise digest of the SYNC/shared blocks this skill carries):
- End-to-Start Debugger Trace: MUST ATTENTION trace backward from final state.
- Root Cause Debugging: reproduce, isolate, trace — NEVER guess-and-check.
- Incremental Persistence: append findings to report per file.
- Sub-Agent Return Contract: return summary only, full report on disk.
- Source/Test Drift Check: changed behavior — reconcile affected tests from evidence.
- AI Mistake Prevention: verify generated content against evidence, trace downstream references, verify all affected outputs, re-read after context loss, surface ambiguity.
- Nested Task Creation: expand child phases, link parent when nested.
- Project Reference Docs: ALWAYS read required project docs, cite them.
- Task Tracking & External Report: bootstrap tasks, persist findings incrementally.
- Critical Thinking: traced proof per claim, confidence >80%.
- Sequential Thinking: multi-step Thought N/M with confidence closer.
- Understand Code First: read code, grep 3+ patterns before concluding.
- Evidence: cite
file:line, declare confidence — NEVER speculate.
- Cross-Service Check: scan producers/consumers/sagas/contracts for silent regressions.
- Estimation Framework: bottom-up hours, derived SP, min-max range.
- Red Flag Stop Conditions: escalate on confidence/blast/boundary/security flags.
- Fix-Layer Accountability: fix lowest invariant-owning layer — NEVER crash site.
IMPORTANT MUST ATTENTION investigation-ONLY — NEVER patch here; hand confirmed cause to $fix — why: a fix from un-validated findings patches the symptom and masks the real defect.
IMPORTANT MUST ATTENTION Phase 0 FIRST (BLOCKING) — classify bug type, route to the specialized agent (debugger / performance-optimizer / security-auditor) before any investigation — why: classification decides which evidence matters and which agent has the right checklist.
IMPORTANT MUST ATTENTION NEVER fix at the crash site — trace full data flow origin → crash, fix at the lowest invariant-owning layer that protects ALL downstream consumers — why: the crash site is a symptom; scattered guards at consumers signal nobody owns the invariant.
MUST ATTENTION trace END-to-START — name Frame 0 (observed final state), walk reader → storage/projection → writer → consumer/job → producer, enumerate ALL feeder paths, build the hypothesis matrix BEFORE proposing any fix — why: the bug enters where bad state is WRITTEN, not where it crashes.
MUST ATTENTION every root-cause claim carries Confidence: X% + file:line proof; <60% → report "hypothesis, not confirmed" with named evidence gaps, NEVER a guess — why: self-confirmed findings rationalize their own gaps.
MUST ATTENTION NEVER declare a confirmed root cause without passing the $why-review gate (SAME session, SAME main agent, NO sub-agent); 2 rounds without passing → STOP, escalate via a direct user question.
MUST ATTENTION search 3+ existing patterns and READ the actual code before concluding — cite file:line; inference alone is insufficient — why: trial-and-error and assumed APIs hallucinate causes.
MUST ATTENTION run a graph trace when graph.db exists — callers_of / importers_of / tests_for / trace reveal MESSAGE_BUS consumers and event handlers grep cannot see — why: cross-service chains are invisible to text search.
MUST ATTENTION prove convergence FORWARD after choosing the fix layer — walk start → end, map each root cause to a fix part and each fix part to a test/proof; $prove-fix MUST run after $fix applies changes.
MUST ATTENTION OOM/memory → check row COUNT before row SIZE (unbounded query > large row); 3+ failed fixes → STOP, question the architecture, escalate to user.
MUST ATTENTION bootstrap task tracking task tracking BEFORE first file read; persist findings incrementally to plans/reports/; standalone (outside workflow) → add a $review-changes task as the LAST task — why: context cutoff loses in-memory findings.
Anti-Rationalization:
| Evasion | Rebuttal |
|---|
| "I see the problem, let me fix it" | Symptom ≠ root cause. This skill is investigation-ONLY — trace end-to-start first. |
| "Too simple for Phase 0" | Root-cause assumptions waste more time than classification. Apply Phase 0 anyway. |
| "Already traced, no graph needed" | Show file:line evidence. No proof = no trace. Run graph trace if graph.db exists. |
"Skip $why-review, findings look solid" | Self-confirmed findings rationalize their own gaps. The $why-review gate is non-negotiable. |
| "This is a frontend bug, no graph" | Frontend → backend → bus chains exist. Run trace first. |
| "It's OOM, must be a large object" | Check row COUNT before row SIZE. Unbounded query > large single row. |
| "Just try changing X and see" | One hypothesis at a time. Scientific method, not trial and error. |
IMPORTANT MUST ATTENTION investigation-ONLY: trace end-to-start to the invariant-owning layer, NEVER patch here.
IMPORTANT MUST ATTENTION every root-cause claim needs Confidence: X% + file:line proof; <60% = "hypothesis, not confirmed", NEVER a guess.
IMPORTANT MUST ATTENTION NEVER declare confirmed without the $why-review gate; task tracking before starting.
[TASK-PLANNING] Before acting, analyze task scope and systematically break it into small todo tasks and sub-tasks using task tracking.
Hookless Prompt Protocol Mirror (Auto-Synced)
Source: .claude/.ck.json + .claude/skills/shared/sync-inline-versions.md (:full blocks) + .claude/scripts/lib/hookless-prompt-protocol.cjs
[WORKFLOW-EXECUTION-PROTOCOL] [BLOCKING] Workflow Execution Protocol — MANDATORY IMPORTANT MUST CRITICAL. Do not skip for any reason.
Generic portability boundary: Reusable skills and protocol text stay project-neutral; project-specific conventions are discovered from docs/project-config.json and docs/project-reference/. Apply shared AI-SDD from shared/sdd-artifact-contract.md. Read docs/project-config.json and docs/project-reference/docs-index-reference.md, then open the project reference docs named there. For spec, test-case, behavior-change, public-contract, or docs/specs/ work, route through the local spec docs named by the docs index: feature-spec-reference.md, spec-system-reference.md, spec-principles.md, and workflow-spec-test-code-cycle-reference.md when specs/tests/code must stay synchronized. If either file or a required reference doc is missing or stale, auto-run $project-init (or the narrow lower-level route such as $project-config, $docs-init, $scan-all, or $scan --target=<key>) before ordinary project-specific work. Any supported AI tool may execute when this shared context and local docs are available.
- DETECT: If the prompt starts with an explicit slash skill/workflow command, execute it directly. Otherwise match the prompt against the workflow catalog and skill list.
- ANALYZE: Choose the best option: execute directly, invoke a skill, activate a standard workflow, or compose a custom step combination.
- AUTO-SELECT: Pick the best option yourself. Do not ask the user to choose between direct execution, skill, standard workflow, or custom workflow.
- ACTIVATE: For a selected workflow, call
$start-workflow <workflowId>; for a selected skill, invoke that skill; for a custom workflow, sequence custom steps directly; for direct execution, proceed with the task.
- CREATE TASKS: task tracking for ALL workflow/skill/custom steps before execution when the selected path has multiple steps.
- EXECUTE: Advance per the Workflow Step Advancement & Parallel Phases rule in your context instructions — model-driven; a sub-agent completion advances a step identically to an inline call; a parallel-phase group is an all-return barrier (advance only after ALL members return, never serialize it)
Shared AI-SDD Protocol Markers
Source: .claude/skills/shared/sync-inline-versions.md
SYNC:ai-sdd-artifact-contract
AI-SDD Artifact Contract — Shared spec-driven development rules stay portable and source-owned.
- Keep reusable AI-SDD principles in
.claude; put repository-specific paths, commands, owners, products, and formats in project config/reference docs.
- Preserve cycle:
spec -> plan -> tasks -> implement -> verify -> update spec/docs.
- Trace every requirement or invariant through decision, task, TC/test, source evidence, and docs/spec update.
- Treat code-to-spec extraction as reference-only until accepted by the canonical spec owner.
- Any supported AI tool may plan, implement, review, or verify with synced context; using multiple tools is optional.
- Update
.claude source first, then sync generated mirrors; do not manually edit .agents, .codex, or AGENTS.md. — why: mirrors are generated artifacts; hand-edits are overwritten on the next sync
- If
docs/project-config.json, root instruction files, or a required project-reference doc is missing or stale, auto-run $project-init or the narrow lower-level route before ordinary project-specific work.
Active reference: shared/sdd-artifact-contract.md in the active skills root.
SYNC:ai-sdd-artifact-contract:reminder
- MANDATORY Apply
shared/sdd-artifact-contract.md; keep reusable AI-SDD in .claude and local rules in project docs.
- MANDATORY Code-to-spec extraction is reference-only until canonical acceptance; any supported AI tool may execute with synced context.
- MANDATORY Update
.claude source before syncing generated mirrors; do not manually edit .agents, .codex, or AGENTS.md.
- MANDATORY Missing or stale project config, root instruction files, or required reference docs route project-specific work through
$project-init or the narrow setup route automatically.
[TASK-PLANNING] [MANDATORY] BEFORE executing any workflow or skill step, create/update task tracking for all planned steps, then keep it synchronized as each step starts/completes.
[LESSON-LEARNED-REMINDER] [BLOCKING] Task Planning & Continuous Improvement — MANDATORY. Do not skip.
Break work into small tasks (task tracking) before starting. Add final task: "Analyze AI mistakes & lessons learned".
Extract lessons — ROOT CAUSE ONLY, not symptom fixes:
- Name the FAILURE MODE (reasoning/assumption failure), not symptom — "assumed API existed without reading source" not "used wrong enum value".
- Generality test: does this failure mode apply to ≥3 contexts/codebases? If not, abstract one level up.
- Write as a universal rule — strip project-specific names/paths/classes. Useful on any codebase.
- Consolidate: multiple mistakes sharing one failure mode → ONE lesson.
- Recurrence gate: "Would this recur in future session WITHOUT this reminder?" — No → skip
$learn.
- Auto-fix gate: "Could
$code-review/$code-simplifier/$security-review/$lint catch this?" — Yes → improve review skill instead.
- BOTH gates pass → ask user to run
$learn.
[CRITICAL-THINKING-MINDSET] Apply critical thinking, sequential thinking. Every claim needs traced proof, confidence >80% to act.
Anti-hallucination principle: Never present guess as fact — cite sources for every claim, admit uncertainty freely, self-check output for errors, cross-reference independently, stay skeptical of own confidence — certainty without evidence root of all hallucination.
AI Attention principle (Primacy-Recency): Put the 3 most critical rules at both top and bottom of long prompts/protocols so instruction adherence survives long context windows.
Goal-driven execution: Define success criteria first, loop until verified, and stop only when observable checks pass.
Tests verify intent: Tests must protect business rules/invariants and fail when the protected intent breaks, not only mirror current behavior.
Common AI Mistake Prevention (System Lessons)
- Re-read files after context compaction. Edit requires prior Read in same context; compaction wipes read state. Re-read before editing.
- Grep for old terms after bulk replacements. AI over-trusts find/replace completeness. Grep full repo after bulk edits for missed refs in docs/configs/catalogs.
- Check downstream references before deleting. Deletions cascade doc/code staleness. Map referencing files before removal.
- After memory loss, check existing state before creating new. Compaction wipes prior-work memory. Query current state to resume — never blindly duplicate.
- Verify AI-generated content against actual code. AI hallucinates APIs, class names, method signatures. Grep to confirm existence before documenting/referencing.
- Trace full dependency chain after edits. Changing a definition misses downstream consumers. Trace the full chain.
- When renaming, grep ALL consumer file types. Some file types silently ignore missing refs (no compile error). Search code, templates, configs, generated files.
- Trace ALL code paths when verifying correctness. Code existing ≠ code executing. Trace early exits, error branches, conditional skips — not just happy path.
- Update docs that embed canonical data when source changes. Docs inlining derived data (workflows, schemas, configs) go stale silently. Update all embedding docs alongside source.
- Verify sub-agent results after context recovery. Background agents may finish while parent compacted — grep-verify output, don't trust assumed completion.
- Cross-check full target list against sub-agent assignments. Parallel sub-agents by category miss boundary items. Reconcile union of assignments against target list before proceeding.
- Sub-agents inherit knowledge only from their agent .md definition — use custom agent types, not built-in Explore. Tool adoption = permission + knowledge + enforcement (numbered workflow step).
- Persist sub-agent findings incrementally, not as a final batch. Long sub-agents hit cutoffs before final write — findings lost. Instruct append-per-section to report file.
- When debugging, ask "whose responsibility?" before fixing. Trace caller (wrong data) vs callee (wrong handling). Fix at responsible layer — never patch symptom site.
- Grep ALL removed names after extraction/refactoring. Primary file "done" ≠ secondary files clean. Grep entire scope for every removed symbol before declaring complete.
- Assume existing values are intentional — ask WHY before changing. Pattern-matching as "wrong" skips context. Before changing any constant/limit/flag: read comments, git blame, surrounding code.
- Verify ALL affected outputs, not just the first. One build green ≠ all green. Multi-stack changes (backend/frontend/tests/docs) require verifying EVERY output.
- Evaluate fit before copying a nearby pattern. Closest example ≠ matching preconditions — verify the new context shares the same constraints, base classes, scope, lifetime.
- Holistic-first debugging — resist nearest-attention trap. Don't dive into first plausible cause. List EVERY precondition (config, env vars, paths, DB, endpoints, creds, versions, DI, data). Verify each against evidence (grep/query — not reasoning). Ask "what would falsify this?" — if nothing, it's not a hypothesis. Most expensive failure: going deeper in "obvious" layer while bug sits in layer never questioned.
- Surgical changes — apply the diff test (context-aware). Two modes: (1) Bug fix → every line traces to the bug; no restyling; orphan cleanup only for imports YOUR changes made unused. (2) Review/enhancement → implement improvements AND announce as "Enhancement beyond main request: [what]". Never silently scope-creep. Diff test: "Would this line exist if I wasn't asked to do X?" — if no, delete or announce.
- Surface ambiguity before coding — don't pick silently. Multiple valid interpretations → present each with effort: "[Request] could mean (1) [N h], (2) [N h]. Which matters?" List scope/format/volume/constraints assumptions first. If simpler path exists, say so. Never silently pick.
- [MANDATORY FIRST ACTION] ALWAYS activate a suitable skill or workflow BEFORE responding. Match task against workflow catalog + skill list; invoke via skill invocation or
$start-workflow <workflowId>. NEVER answer or write code before checking. Skip = protocol violation.
- Why-Review adversarial mindset — apply when reviewing any plan, decision, or design. Default SKEPTIC not VALIDATOR: steel-man a rejected alternative, invert each stated reason ("what does it sacrifice?"), stress-test top 2-3 assumptions, run pre-mortem ("ships, fails in 3 months — what breaks?"), surface 1-2 alternatives author missed. Section presence ≠ quality; quality = causal reasoning + concrete mitigations + evidence, not "it's better" or "monitor closely".
- Front-load report-write in sub-agent prompts for large reviews. Many-file sub-agents hit budget before final write — findings lost. Design prompts so: (1) report-write is first explicit deliverable, (2) append per-file/section (not batched), (3) scope bounded so reads don't exhaust budget. Truncated mid-sentence with no report file → spawn narrower scope, don't retry same prompt.
- After context compaction, re-verify all prior phase outcomes before continuing. Summaries describe intent, not environment state (git index, filesystem, processes). On resume, FIRST audit: git status, re-read modified files, verify filesystem. Every "completed" claim is an untested hypothesis until evidence confirms.
- OOM/memory: check row count before row size. Triage: (1) Unbounded query — no DB filter for trigger? Push filter to DB; eliminates OOM. (2) Large rows? Projection reduces proportionally. Row reduction > projection in ROI.
- Keep domain concepts out of generic/shared/infrastructure layers. Reusable layer (shared library, framework, infra module) must reference NO consumer-specific domain concept — tenant/customer/product IDs, business entities, feature rules. Leak compiles + runs → passes review silently while coupling the "reusable" layer to one consumer. Keep shared type domain-free; push domain fields/logic down into the consumer via subclass/composition. — why: a layer coupled to one consumer's domain is no longer reusable.