| name | debug |
| description | Systematic debugging methodology based on Zeller's scientific method. Reproduce, hypothesize, predict, test, isolate via binary search, fix root cause. Every fix gets a regression test. Triggers: bug, error, fix, broken, not working, fails, crashed, unexpected, stack trace, regression, exception. |
| allowed-tools | Read, Bash, Grep, Glob |
Debugging is forming and testing a THEORY that explains the bug.
Not random changes. Not guessing. Scientific method applied to code.
DEFECT (in code) → INFECTION (in state) → FAILURE (visible symptom).
The failure you see is NOT where the bug is. Binary search upstream.
AgentDB read-start has run. Check past failures; you may have seen this pattern.
Skill-specific: skills/debug/reference/debug-research.md
<scientific_method>
- OBSERVE: What exactly failed? Document input, expected, actual.
- HYPOTHESIZE: Why might this happen? List 3 possible causes before pursuing any.
- PREDICT: If this hypothesis is correct, what else should I see?
- TEST: Does the prediction hold? Seek DISCONFIRMING evidence first.
- REFINE or REJECT: Based on evidence, narrow or abandon hypothesis.
- REPEAT: Until the defect, not the symptom, is isolated.
Write each hypothesis and test result to AgentDB. Prevents circular investigation.
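A minimal sketch of one cycle written as a log record; the helper name, log path, and example strings are assumptions, and a project-specific AgentDB write command should replace the file append if one exists:
```bash
# Record one full hypothesis cycle before and after testing it.
# Path, helper name, and example values are assumptions, not a fixed interface.
log_hypothesis() {
  local hypothesis="$1" prediction="$2" result="$3"
  mkdir -p _meta/context
  printf '%s | HYPOTHESIS: %s | PREDICTION: %s | RESULT: %s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$hypothesis" "$prediction" "$result" \
    >> _meta/context/hypotheses.log
}

log_hypothesis \
  "cache returns stale entries after TTL expiry" \
  "disabling the cache should make the failure disappear" \
  "REJECTED: failure persists with cache disabled"
```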
</scientific_method>
Before touching code, get SPECIFIC:
- Input: exact values, sequence, timing that triggers the bug.
- Expected: what should happen.
- Actual: what happens instead (full error, stack trace).
- Environment: versions, OS, relevant state.
- Frequency: always? sometimes? specific conditions?
"It sometimes fails" is NOT a reproduction. Get deterministic.
If it can't be reproduced, add targeted logging and wait.
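A minimal sketch of what "deterministic" means in practice: a repro script with the exact input, the expected output, and a pass/fail exit code (entry point and values are placeholders):
```bash
#!/usr/bin/env bash
# repro.sh - deterministic reproduction; exits 0 when behavior is correct, 1 when the bug shows.
# The entry point, input, and expected output below are placeholders.
set -u

INPUT='{"user_id": 42, "page": 0}'             # exact input that triggers the bug
EXPECTED='{"items": [], "next_cursor": null}'  # what should happen

ACTUAL="$(./bin/query --json "$INPUT")"        # hypothetical entry point under test

if [ "$ACTUAL" = "$EXPECTED" ]; then
  echo "PASS: behavior correct"
  exit 0
fi
echo "FAIL: expected: $EXPECTED"
echo "      actual:   $ACTUAL"
exit 1
```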
Binary search is O(log n). Random guessing is O(n). Use binary search.
CODE: Call chain A→B→C→D→E fails. Check C. Works? Bug in D-E. Fails? Bug in A-C. Recurse.
TIME: git bisect between known-good and known-bad commit. ~10 tests for 1000 commits.
INPUT: Large failing input? Split in half. Which half fails? Recurse to minimal case.
Instrument at boundaries: log inputs/outputs at each step.
Mock external dependencies to isolate which one causes failure.
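With a deterministic repro script in hand, binary search over TIME runs itself (commit references below are placeholders):
```bash
# Binary search over commit history; git runs repro.sh on each candidate.
# Exit 0 marks a commit good, exit 1 marks it bad (125 would mean "skip").
git bisect start
git bisect bad HEAD          # bug reproduces here
git bisect good v1.4.0       # last point known to be bug-free (placeholder ref)
git bisect run ./repro.sh    # ~10 automatic tests narrow 1000 commits to one
git bisect reset             # return to the original branch when done
```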
The error line is the FAILURE. The DEFECT is upstream.
Ask: what assumption was violated? What invariant broke?
<common_causes rank="by_frequency">
- Wrong input shape or type
- Off-by-one (loop bounds, indices, slicing)
- Missing null/undefined check
- Race condition or async timing
- Mutating shared state (aliasing)
- Wrong comparison operator (=, ==, ===, >, >=)
- Variable scope (closure captures, shadowing)
- Swallowed error (empty catch block)
- API contract mismatch
- Environment difference
</common_causes>
If you can't explain WHY it broke, you haven't found root cause.
1. Fix ROOT CAUSE, not symptom. (Null check at crash site = symptom fix.)
2. Write test that reproduces original bug. Must fail before fix, pass after.
3. Run regression: original case + edge cases + happy path.
4. Commit fix + test together.
Every bug fix MUST include a regression test. No exceptions.
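A sketch of the verification sequence, assuming the fix is still uncommitted and repro.sh is the reproduction script from above; the test runner and commit message are placeholders:
```bash
# 1. Prove the regression test fails WITHOUT the fix.
git stash                      # temporarily set aside the uncommitted fix
./repro.sh && echo "WARNING: passes without the fix - the test proves nothing"
git stash pop                  # restore the fix

# 2. Prove the original failing case now passes.
./repro.sh || { echo "fix incomplete"; exit 1; }

# 3. Full regression: original case + edge cases + happy path (hypothetical runner).
npm test

# 4. Commit fix and test together.
git add -A && git commit -m "fix: <root cause summary> with regression test"
```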
<cognitive_biases>
You look for evidence your hypothesis is right. Actively seek DISCONFIRMING evidence instead.
First error message anchors you. Read ALL output first. List 3 causes before pursuing any.
"Last time it was the database." This time might be different. Past patterns are hypotheses, not conclusions.
15 min with no evidence for a hypothesis → abandon it. Write what you tested, move on.
"It probably works now." Re-run the EXACT original failing case. "Seems to work" is not evidence.
</cognitive_biases>
<anti_patterns>
Random changes until it works. Even if it works, you don't know why. Fragile.
Change something, don't test original case, declare victory. Bug returns.
Null check at crash instead of asking why null. Real defect is upstream.
Logging everywhere. Use binary search first, then targeted logging.
It's almost never the library. Check your usage first.
</anti_patterns>
<when_stuck>
- Explain the problem in writing (rubber duck; engages System 2 cognition).
- Read error message carefully. Answer is there 80% of the time.
- Simplify to minimal reproduction case.
- "What changed?" Check git log, git diff, dependency updates, env changes.
- Search exact error message in quotes.
- Step away. Bias accumulates under sustained focus.
- Plan Mode: For complex or sensitive bugs, use Plan Mode to analyze errors and form hypotheses without making any code changes until the approach is approved. Paste stack traces and error messages in Plan Mode first.
- Rewind to pre-bug state:
Esc+Esc or /rewind opens checkpoint history. Restore code state to before the suspected change without losing conversation context. Faster than manual git reset when the bug was introduced in the current session.
- Visibility-first: Don't describe the problem; show the raw evidence. Paste terminal output, actual error logs, or screenshots directly. Let Claude read the data, not your interpretation of it. A description of a bug introduces your assumptions; the raw output doesn't.
</when_stuck>
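The "what changed?" sweep from the list above as shell commands; paths and the environment baseline are placeholder assumptions:
```bash
git log --oneline -20                       # recent commits in the affected area
git diff HEAD~5 -- src/                     # actual code changes, not just messages
git diff HEAD~5 -- package-lock.json        # dependency drift (use your lockfile)
git status --short                          # uncommitted or partial edits
printenv | sort > /tmp/env.now
diff /tmp/env.known-good /tmp/env.now       # env drift, if a baseline was captured earlier
```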
- 30+ min on a single hypothesis with no evidence → abandon it.
- 3+ hypotheses rejected → step back, re-examine assumptions.
- 2 failed fix attempts → orchestrator should invoke tearitapart. May be a design problem, not an implementation one.
- 2+ failed corrections in the same session → `/clear` and rewrite the initial prompt incorporating lessons learned. A clean session with a better prompt outperforms a long session polluted with failed approaches.
- Bug only in production → add targeted monitoring, document, move on.
<parallel_debug_strategy>
Competing hypothesis agents: For bugs with multiple plausible causes, spawn separate agents
per hypothesis. Each investigates independently with a fresh, unpolluted context window.
Agents with fresh context catch what a single long, anchored session misses.
Agent A: Hypothesis → race condition in cache layer
Agent B: Hypothesis → API contract mismatch on response shape
Agent C: Hypothesis → off-by-one in pagination cursor
Each reports: evidence_for | evidence_against | confidence (0-1)
Coroner agent synthesizes findings.
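A hypothetical fan-out sketch using the headless CLI; the prompt wording, file paths, and report schema are assumptions, not a prescribed interface:
```bash
mkdir -p reports
REPRO="$(cat _meta/context/repro.md)"   # only the minimal reproduction, not chat history

for hypothesis in \
  "race condition in cache layer" \
  "API contract mismatch on response shape" \
  "off-by-one in pagination cursor"
do
  # One headless agent per hypothesis, each with a fresh context.
  claude -p "Investigate exactly ONE hypothesis: ${hypothesis}.
Reproduction:
${REPRO}
Report JSON only: {\"evidence_for\": [], \"evidence_against\": [], \"confidence\": 0.0}" \
    > "reports/$(echo "$hypothesis" | tr ' ' '_').json" &
done
wait   # coroner step: read reports/*.json and synthesize findings
```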
Fresh context advantage: Long debugging sessions accumulate cognitive anchoring.
A fresh agent given only the minimal reproduction case (not 200 lines of chat history)
reasons more clearly. When stuck >30 min, spawn a fresh agent with only:
- The exact input that triggers the bug
- The expected vs actual output
- The relevant code section (not the whole file)
AgentDB as debug log: Write each hypothesis and its test result to AgentDB before
abandoning it. Prevents the same hypothesis being re-investigated in the next session.
</parallel_debug_strategy>
<agentic_debugging>
When debugging in an agentic context, additional failure modes arise:
Agent-introduced bugs: Check whether the bug was introduced by an AI edit. Run
git log --oneline -20 to identify when it appeared. git bisect between last known-good
and first known-bad agent commit. AI edits tend to: drop early returns, misplace null
checks, and silently change API call signatures.
Tool-call failures vs logic failures: Distinguish between:
- Tool call returned wrong result (environment/permissions issue)
- Tool call succeeded but logic was wrong (code issue)
These have completely different fixes. Check tool outputs before assuming logic error.
Interrupted state: If a previous agent session was interrupted, check for partial
writes or uncommitted changes (git status, git diff) before assuming the code is
the canonical version.
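A quick canonical-state check before trusting the working tree:
```bash
git status --short    # partial writes or uncommitted changes from the interrupted session
git diff --stat       # how far the tree has drifted from HEAD
git stash list        # work a previous session may have parked
git log --oneline -5  # confirm HEAD is the commit you expect
```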
Reproduce with minimal agent context: If bug is hard to reproduce, reduce the
problem to its smallest form BEFORE spawning any agents. Agents given a vague reproduction
reproduce the wrong thing.
</agentic_debugging>
<persistent_truth_file>
Long debugging sessions go in circles: Claude tries a fix, it fails, the context compresses, and Claude forgets what was already tried.
Prevent this with a persistent investigation file that survives auto-compaction.
Create _meta/context/DEBUG.md at session start. Update it throughout. Claude reads it before each attempt.
Template:
# Debugging Session: [Issue Title]
## Problem Statement
[Concise problem + exact error message]
## What We Know
- [confirmed fact 1]
- [confirmed fact 2]
## Approaches Tried
- [ ] Approach A: [description] → Failed because [reason]
- [ ] Approach B: [description] → Failed because [reason]
- [x] Approach C: [description] → Partially works, need to [next step]
## Current Hypothesis
[Best current theory of root cause]
## Next Steps
1. [specific action]
Instruct Claude: "Read _meta/context/DEBUG.md before each attempt. Update it after each attempt."
Prevents re-trying failed approaches across context compression boundaries.
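A sketch of bootstrapping the file at session start; the skeleton mirrors the template above and is skipped if the file already exists:
```bash
mkdir -p _meta/context
[ -f _meta/context/DEBUG.md ] || cat > _meta/context/DEBUG.md <<'EOF'
# Debugging Session: <issue title>
## Problem Statement
## What We Know
## Approaches Tried
## Current Hypothesis
## Next Steps
EOF
```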
</persistent_truth_file>
<tooling_2026>
In 2026, Claude Code's default tooling handles more of the debugging burden automatically:
- Compaction: Auto-manages context window limits. You no longer need to manually /compact during debug sessions; the system does it. But instruct compaction to preserve _meta/context/DEBUG.md so investigation state survives.
- Plan Mode: Use for complex or multi-file bugs before touching code. Scope the exploration, form hypotheses, get approval, then switch to Act mode for fixes.
- Agent tool for parallel exploration: Spawn competing-hypothesis agents without polluting the main context. Each agent gets only the minimal reproduction, not 200 lines of chat history.
Root cause over fast fix: Claude solves the immediate problem by finding the fastest path to making the error go away. The fastest fix is frequently wrong: it patches the symptom and breaks something downstream. When a fix "works" but you can't explain why the bug occurred, you haven't fixed it.
</tooling_2026>
<on_complete>
agentdb write-end '{"skill":"debug","bug":"<observed_symptom>","root_cause":"<what_broke>","fix":"<what_fixed>","test":"<regression_test_name>","learned":"<pattern_for_future>"}'
Always record debugging learnings. They prevent repeat investigation.
</on_complete>