| name | debug |
| description | Systematic debugging methodology based on Zeller's scientific method. Reproduce, hypothesize, predict, test, isolate via binary search, fix root cause. Every fix gets a regression test. Triggers: bug, error, fix, broken, not working, fails, crashed, unexpected, stack trace, regression, exception. |
| allowed-tools | Read, Bash, Grep, Glob |
Debugging is forming and testing a THEORY that explains the bug.
Not random changes. Not guessing. Scientific method applied to code.
DEFECT (in code) → INFECTION (in state) → FAILURE (visible symptom).
The failure you see is NOT where the bug is. Binary search upstream.
Systematic methodology achieves ~95% first-time fix rate vs ~40% ad-hoc. The process is the multiplier.
AgentDB read-start has run. Check past failures—you may have seen this pattern.
Debuggability setup (run once per project, not per session): ensure the test suite runs headlessly (`npm test` / `pytest -x` / `cargo test`), add structured logging at service/module boundaries, and document known failure modes in `_meta/context/DEBUG.md`. These three give Claude the foundation to trace and verify without blind file exploration. Without them, debugging sessions spend the first third just establishing a reproduction path.
Skill-specific: skills/debug/reference/debug-research.md
-
REPRODUCE — get specific before touching code.
- Document: exact input, expected output, actual output (full stack trace), environment, frequency.
- "Sometimes fails" is not a reproduction. Get deterministic.
- (gate: can reproduce consistently, OR have added targeted logging to wait for next occurrence)
-
HYPOTHESIZE — list 3 causes before pursuing any.
- Read ALL error output first (anchoring bias mitigation).
- Write each hypothesis to AgentDB. Prevents circular re-investigation.
- (gate: 3 candidate hypotheses written; none pursued yet)
-
ISOLATE — binary search, O(log n) not O(n).
- Code: call chain A→B→C→D→E fails → check midpoint C → recurse into failing half.
- Time:
git bisect between known-good and known-bad commit. ~10 tests for 1000 commits.
- Input: large failing input → split in half → recurse to minimal reproduction case.
- Instrument at boundaries: log inputs/outputs at each layer boundary.
- Mock external dependencies to isolate which one causes failure.
- (gate: failure localized to a specific function/commit/input subset)
-
ROOT CAUSE — the error line is the FAILURE. The DEFECT is upstream.
- Ask: what assumption was violated? What invariant broke?
- If you can't explain WHY it broke, you haven't found root cause.
- Top causes by frequency: wrong input shape/type · off-by-one · missing null check · race condition · shared-state mutation · wrong comparison operator · variable scope · swallowed error · API contract mismatch · environment difference.
- (gate: can state root cause in one sentence explaining the violated invariant)
-
FIX — root cause, not symptom.
- Fix the DEFECT, not the FAILURE site. (Null check at crash site = symptom fix.)
- Write regression test that fails before fix, passes after.
- Run: original failing case + edge cases + full regression suite.
- Commit fix + test together.
- (gate: regression test green; original failing case passes)
<cognitive_biases>
Seek DISCONFIRMING evidence first. Ask: "What would I see if my hypothesis were WRONG?"
Read ALL output before investigating. List 3 causes before pursuing any.
Past patterns are hypotheses, not conclusions. Check AgentDB but test them.
15 min with no evidence for hypothesis → abandon it. Write what you tested, move on.
Re-run the EXACT original failing case. "Seems to work" is not evidence.
</cognitive_biases>
<anti_patterns>
Random changes until it works. Even if it works, you don't know why. Fragile.
Change something, don't test original case, declare victory. Bug returns.
Null check at crash instead of asking why null. Real defect is upstream.
Logging everywhere. Use binary search first, then targeted logging.
It's almost never the library. Check your usage first.
Asking Claude to "investigate" without scope. Claude reads hundreds of files, filling context before doing anything useful. Scope investigations narrowly ("look at src/auth only") or use a subagent — the summary returns, the file reads don't.
</anti_patterns>
<when_stuck>
- Explain problem in writing (rubber duck — engages System 2 cognition).
- Read error message carefully. Answer is there 80% of the time.
- Simplify to minimal reproduction case.
- "What changed?" Check git log, git diff, dependency updates, env changes.
- Search exact error message in quotes.
- Step away. Bias accumulates under sustained focus.
- Plan Mode: Paste stack traces in Plan Mode first. Analyze, form hypotheses, get approval — then switch to Act mode for fixes.
- Rewind:
Esc+Esc or /rewind restores code to a pre-bug checkpoint without losing conversation context. Checkpoints survive terminal closes.
- Visibility-first: Paste raw terminal output / error logs / screenshots directly. Raw data beats your interpretation.
- Evidence-first, not assertion: Show the actual evidence (test output, exact command + result, screenshot) rather than stating "it works." Reviewing evidence is faster than re-running verification, and catches cases where "seems to work" masks a different failure path.
- Extended thinking: for complex bugs with no clear hypothesis after all above, request deep analysis using extended thinking. Deliberate multi-step reasoning before output catches subtle root causes that fast responses miss.
- --verbose mode: run
claude --verbose for hard-to-reproduce bugs — shows tool calls, thinking steps, and execution paths in real time. Catches silent failures and misparsed outputs.
- Native debugger: when an IDE or runtime debugger is available (VS Code, Xcode LLDB, Chrome DevTools), prefer it over print-statement flooding — set a breakpoint at the suspected boundary, inspect locals at the failure point, evaluate expressions mid-run without re-running from scratch. The call stack tells you the execution path that led to the state.
- Pipe raw data:
cat error.log | claude or npm run build 2>&1 | claude — pipe terminal output directly instead of copy-pasting. Claude receives full fidelity output without truncation or formatting artifacts.
- Test-driven debugging: when failing behavior is covered by a test suite, run the tests first — failing test names + assertions mark the exact defect entry point. Trace backwards from the failing assertion rather than reading code cold; the test already has the failure isolated to a function boundary.
npm test -- --watch <file> or pytest -x stops on first failure. Lets the test suite do the reproduction work.
- Domain context injection for business logic bugs: before asking Claude to debug a business-rule violation, state the rule in plain English ("the discount should apply before tax, not after"). Claude cannot infer invisible business invariants from code alone — naming the violated rule converts an opaque logic mystery into a targeted search. Works especially well for pricing/calculation bugs, multi-step workflows, and permission logic where the correct behavior isn't encoded anywhere in the code.
- Traditional debugger vs Claude Code: use a traditional debugger (IDE breakpoints, Chrome DevTools, LLDB) when you need to inspect runtime state at a specific execution point — step through, watch locals, evaluate expressions. Use Claude Code when you need to trace a bug across multiple files or apply a coordinated fix to many call sites. Combining both is faster than forcing either tool outside its strength.
</when_stuck>
- 30+ min on single hypothesis with no evidence → abandon it.
- 3+ hypotheses rejected → step back, re-examine assumptions.
- 2 failed fix attempts → orchestrator invokes tearitapart. May be design problem, not implementation.
- 2+ failed corrections in same session → `/clear`, rewrite initial prompt with lessons learned. Clean session beats polluted long session.
- 3 consecutive responses under ~500 tokens with no progress → anchor-drift. `/clear`, strip to minimal reproduction, restart.
- Bug only in production → add targeted monitoring, document, move on.
<parallel_debug_strategy>
Competing hypothesis agents: for bugs with 3+ plausible causes, spawn one agent per hypothesis with fresh context.
Each reports: evidence_for | evidence_against | confidence (0-1). Coroner agent synthesizes.
Fresh context catches what a long session anchors past.
When stuck >30 min, spawn fresh agent with ONLY: exact input, expected vs actual, relevant code section.
Trial segmentation: when debugging multi-agent or long-running systems, break the execution trace into segments and analyze each independently. Shorter segments → cleaner causal attribution. Full traces cause spurious correlations that obscure root cause. One segment = one causal claim.
See: skills/debug/reference/debug-research.md — Parallel Debug Strategy.
</parallel_debug_strategy>
<agentic_debugging>
- Check whether bug was introduced by an AI edit:
git log --oneline -20, then git bisect.
- AI edits tend to: drop early returns, misplace null checks, silently change API call signatures.
- Distinguish tool-call failure (environment/permissions) from logic failure (code) — different fixes.
- Check
git status / git diff for interrupted-state partial writes before assuming canonical version.
- Reduce to minimal reproduction BEFORE spawning agents. Agents given vague reproductions reproduce the wrong thing.
- Verify AI's understanding before accepting a fix: ask "What does this function currently do?" before applying a suggested change. Catches hallucinated APIs, misread variable names, and wrong architecture assumptions — catching these before the edit is faster than reverting after.
</agentic_debugging>
<persistent_truth_file>
Create _meta/context/DEBUG.md at session start. Update throughout. Claude reads it before each attempt.
Prevents circular re-investigation across context compression boundaries.
Template fields: Problem Statement · What We Know · Approaches Tried (with failure reason) · Current Hypothesis · Next Steps.
Full template: skills/debug/reference/debug-research.md — Persistent Investigation File.
</persistent_truth_file>
<on_complete>
agentdb write-end '{"skill":"debug","bug":"","root_cause":"<what_broke>","fix":"<what_fixed>","test":"<regression_test_name>","learned":"<pattern_for_future>"}'
Always record debugging learnings. They prevent repeat investigation.
</on_complete>