| name | investigate |
| description | Systematic debugging with root cause investigation. Four phases: investigate,
analyze, hypothesize, implement. Iron Law: no fixes without root cause.
Use when asked to "debug this", "fix this bug", "why is this broken",
"investigate this error", or "root cause analysis".
Proactively invoke this skill (do NOT debug directly) when the user reports
errors, 500 errors, stack traces, unexpected behavior, "it was working
yesterday", or is troubleshooting why something stopped working. (gstack)
|
Systematic Debugging
Iron Law
NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST.
Fixing symptoms creates whack-a-mole debugging. Every fix that doesn't address root cause makes the next bug harder to find. Find the root cause, then fix it.
BitFun Team Mode Dispatch
When this skill is invoked by BitFun Team Mode, this skill supplies the debugging methodology. Use existing Task sub-agents to gather independent evidence, then keep hypothesis selection and fixes in the main Team session.
- Do not assume a Debugger sub-agent exists. Choose only from the Task tool's available agents.
- Prefer matching custom debugging/domain sub-agents if available; otherwise use
Explore for code-path tracing and FileFinder for locating logs, configs, tests, and affected files.
- Split independent evidence tracks into parallel Task calls when useful: reproduction path, recent-change audit, config/environment audit, and suspected subsystem trace.
- Keep Task work read-only until root cause is proven. Ask for facts, file paths, commands tried, observations, and confidence.
- The main Team orchestrator owns the root-cause statement, fix plan, implementation, and regression test.
Phase 1: Root Cause Investigation
Gather context before forming any hypothesis.
-
Collect symptoms: Read the error messages, stack traces, and reproduction steps. If the user hasn't provided enough context, ask ONE question at a time via AskUserQuestion.
-
Read the code: Trace the code path from the symptom back to potential causes. Use Grep to find all references, Read to understand the logic.
-
Check recent changes:
git log --oneline -20 -- <affected-files>
Was this working before? What changed? A regression means the root cause is in the diff.
-
Reproduce: Can you trigger the bug deterministically? If not, gather more evidence before proceeding.
Prior Learnings
Use only BitFun in-session memory, project docs, .bitfun/team/ artifacts, git history, TODO files, and prior design/review artifacts. Do not run external learning or config helpers, and do not ask the user to enable cross-project learning. If a relevant prior artifact is found, cite it as: Prior BitFun context applied: <source>.
Output: "Root cause hypothesis: ..." ā a specific, testable claim about what is wrong and why.
Scope Lock
After forming your root cause hypothesis, lock edits to the affected module to prevent scope creep.
[ -x "BitFun built-in freeze check" ] && echo "FREEZE_AVAILABLE" || echo "FREEZE_UNAVAILABLE"
If FREEZE_AVAILABLE: Identify the narrowest directory containing the affected files. Write it to the freeze state file:
STATE_DIR="${BITFUN_TEAM_HOME:-$HOME/.bitfun/team}"
mkdir -p "$STATE_DIR"
echo "<detected-directory>/" > "$STATE_DIR/freeze-dir.txt"
echo "Debug scope locked to: <detected-directory>/"
Substitute <detected-directory> with the actual directory path (e.g., src/auth/). Tell the user: "Edits restricted to <dir>/ for this debug session. This prevents changes to unrelated code. Run /unfreeze to remove the restriction."
If the bug spans the entire repo or the scope is genuinely unclear, skip the lock and note why.
If FREEZE_UNAVAILABLE: Skip scope lock. Edits are unrestricted.
Phase 2: Pattern Analysis
Check if this bug matches a known pattern:
| Pattern | Signature | Where to look |
|---|
| Race condition | Intermittent, timing-dependent | Concurrent access to shared state |
| Nil/null propagation | NoMethodError, TypeError | Missing guards on optional values |
| State corruption | Inconsistent data, partial updates | Transactions, callbacks, hooks |
| Integration failure | Timeout, unexpected response | External API calls, service boundaries |
| Configuration drift | Works locally, fails in staging/prod | Env vars, feature flags, DB state |
| Stale cache | Shows old data, fixes on cache clear | Redis, CDN, browser cache, Turbo |
Also check:
TODOS.md for related known issues
git log for prior fixes in the same area ā recurring bugs in the same files are an architectural smell, not a coincidence
External pattern search: If the bug doesn't match a known pattern above, WebSearch for:
- "{framework} {generic error type}" ā sanitize first: strip hostnames, IPs, file paths, SQL, customer data. Search the error category, not the raw message.
- "{library} {component} known issues"
If WebSearch is unavailable, skip this search and proceed with hypothesis testing. If a documented solution or known dependency bug surfaces, present it as a candidate hypothesis in Phase 3.
Phase 3: Hypothesis Testing
Before writing ANY fix, verify your hypothesis.
-
Confirm the hypothesis: Add a temporary log statement, assertion, or debug output at the suspected root cause. Run the reproduction. Does the evidence match?
-
If the hypothesis is wrong: Before forming the next hypothesis, consider searching for the error. Sanitize first ā strip hostnames, IPs, file paths, SQL fragments, customer identifiers, and any internal/proprietary data from the error message. Search only the generic error type and framework context: "{component} {sanitized error type} {framework version}". If the error message is too specific to sanitize safely, skip the search. If WebSearch is unavailable, skip and proceed. Then return to Phase 1. Gather more evidence. Do not guess.
-
3-strike rule: If 3 hypotheses fail, STOP. Use AskUserQuestion:
3 hypotheses tested, none match. This may be an architectural issue
rather than a simple bug.
A) Continue investigating ā I have a new hypothesis: [describe]
B) Escalate for human review ā this needs someone who knows the system
C) Add logging and wait ā instrument the area and catch it next time
Red flags ā if you see any of these, slow down:
- "Quick fix for now" ā there is no "for now." Fix it right or escalate.
- Proposing a fix before tracing data flow ā you're guessing.
- Each fix reveals a new problem elsewhere ā wrong layer, not wrong code.
Phase 4: Implementation
Once root cause is confirmed:
-
Fix the root cause, not the symptom. The smallest change that eliminates the actual problem.
-
Minimal diff: Fewest files touched, fewest lines changed. Resist the urge to refactor adjacent code.
-
Write a regression test that:
- Fails without the fix (proves the test is meaningful)
- Passes with the fix (proves the fix works)
-
Run the full test suite. Paste the output. No regressions allowed.
-
If the fix touches >5 files: Use AskUserQuestion to flag the blast radius:
This fix touches N files. That's a large blast radius for a bug fix.
A) Proceed ā the root cause genuinely spans these files
B) Split ā fix the critical path now, defer the rest
C) Rethink ā maybe there's a more targeted approach
Phase 5: Verification & Report
Fresh verification: Reproduce the original bug scenario and confirm it's fixed. This is not optional.
Run the test suite and paste the output.
Output a structured debug report:
DEBUG REPORT
āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā
Symptom: [what the user observed]
Root cause: [what was actually wrong]
Fix: [what was changed, with file:line references]
Evidence: [test output, reproduction attempt showing fix works]
Regression test: [file:line of the new test]
Related: [TODOS.md items, prior bugs in same area, architectural notes]
Status: DONE | DONE_WITH_CONCERNS | BLOCKED
āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā
Capture Learnings
If you discovered a non-obvious pattern, pitfall, or architectural insight during
this session, log it for future sessions:
true
Types: pattern (reusable approach), pitfall (what NOT to do), preference
(user stated), architecture (structural decision), tool (library/framework insight),
operational (project environment/CLI/workflow knowledge).
Sources: observed (you found this in the code), user-stated (user told you),
inferred (AI deduction), cross-model (both BitFun and outside-voice sub-agent agree).
Confidence: 1-10. Be honest. An observed pattern you verified in the code is 8-9.
An inference you're not sure about is 4-5. A user preference they explicitly stated is 10.
files: Include the specific file paths this learning references. This enables
staleness detection: if those files are later deleted, the learning can be flagged.
Only log genuine discoveries. Don't log obvious things. Don't log things the user
already knows. A good test: would this insight save time in a future session? If yes, log it.
Important Rules
- 3+ failed fix attempts ā STOP and question the architecture. Wrong architecture, not failed hypothesis.
- Never apply a fix you cannot verify. If you can't reproduce and confirm, don't ship it.
- Never say "this should fix it." Verify and prove it. Run the tests.
- If fix touches >5 files ā AskUserQuestion about blast radius before proceeding.
- Completion status:
- DONE ā root cause found, fix applied, regression test written, all tests pass
- DONE_WITH_CONCERNS ā fixed but cannot fully verify (e.g., intermittent bug, requires staging)
- BLOCKED ā root cause unclear after investigation, escalated