with one click
diagnose
// Use when debugging hard bugs or perf regressions. Disciplined loop — reproduce, minimise, hypothesise, instrument, fix, regression-test. Triggers: debug this, broken, throwing, failing.
// Use when debugging hard bugs or perf regressions. Disciplined loop — reproduce, minimise, hypothesise, instrument, fix, regression-test. Triggers: debug this, broken, throwing, failing.
[HINT] Download the complete skill directory including SKILL.md and all related files
| name | diagnose |
| description | Use when debugging hard bugs or perf regressions. Disciplined loop — reproduce, minimise, hypothesise, instrument, fix, regression-test. Triggers: debug this, broken, throwing, failing. |
| metadata | {"upstream":"mattpocock/skills/engineering/diagnose","upstream-sha":"49cec7be019bc408e87b77670f3d442f536da254","adapted-date":"2026-04-28"} |
Discipline for hard bugs. Skip phases only when explicitly justified.
Domain context runs automatically when the skill loads — output replaces each line below:
cat CONTEXT.md 2>/dev/null || truels docs/adr/ 2>/dev/null || trueIf CONTEXT.md content is present above, use that vocabulary. If ADR list showed files, read the relevant ones before exploring the codebase. If both were empty, proceed silently.
This is the skill. Everything else is mechanical. Fast, deterministic, agent-runnable pass/fail signal for the bug → you will find the cause. Without one, no amount of staring at code will save you.
Spend disproportionate effort here. Be aggressive. Be creative. Refuse to give up.
git bisect run it.${CLAUDE_SKILL_DIR}/scripts/hitl-loop.template.sh so loop is still structured. Captured output feeds back to you.Build the right feedback loop → bug is 90% fixed.
Treat loop as product. Once you have a loop, ask:
30-second flaky loop ≈ no loop. 2-second deterministic loop = debugging superpower.
Goal is not clean repro but higher reproduction rate. Loop trigger 100×, parallelise, add stress, narrow timing windows, inject sleeps. 50%-flake bug is debuggable; 1% is not — keep raising rate until debuggable.
Stop and say so explicitly. List what you tried. Ask user for: (a) access to whatever environment reproduces it, (b) captured artifact (HAR file, log dump, core dump, screen recording with timestamps), or (c) permission to add temporary production instrumentation. Do not proceed to hypothesise without a loop.
Do not proceed to Phase 2 until you have a loop you believe in.
Run the loop. Watch bug appear.
Confirm:
Do not proceed until you reproduce the bug.
Generate 3–5 ranked hypotheses before testing any. Single-hypothesis generation anchors on first plausible idea.
Each hypothesis must be falsifiable: state the prediction it makes.
Format: "If is the cause, then will make bug disappear / will make it worse."
Can't state prediction → hypothesis is a vibe — discard or sharpen.
Show ranked list to user before testing. They often have domain knowledge that re-ranks instantly ("we just deployed a change to #3"), or know hypotheses already ruled out. Cheap checkpoint, big time saver. Don't block on it — proceed with your ranking if user is AFK.
Each probe must map to specific prediction from Phase 3. Change one variable at a time.
Tool preference:
Tag every debug log with unique prefix, e.g. [DEBUG-a4f2]. Cleanup at end becomes single grep. Untagged logs survive; tagged logs die.
Perf branch. For performance regressions, logs usually wrong. Instead: establish baseline measurement (timing harness, performance.now(), profiler, query plan), then bisect. Measure first, fix second.
Write regression test before the fix — but only if there is a correct seam for it.
Correct seam = test exercises real bug pattern as it occurs at call site. If only available seam is too shallow (single-caller test when bug needs multiple callers, unit test that can't replicate trigger chain), regression test there gives false confidence.
If no correct seam exists, that itself is the finding. Note it. Codebase architecture prevents bug from being locked down. Flag for next phase.
If correct seam exists:
Required before declaring done:
[DEBUG-...] instrumentation removed (grep the prefix)Then ask: what would have prevented this bug? If answer involves architectural change (no good test seam, tangled callers, hidden coupling) hand off to the codebase-improve skill with specifics. Make recommendation after fix is in, not before — you have more information now than when you started.