with one click
systematic-debugging
// 4-phase root cause debugging: understand bugs before fixing.
// 4-phase root cause debugging: understand bugs before fixing.
| name | systematic-debugging |
| description | 4-phase root cause debugging: understand bugs before fixing. |
| version | 1.1.0 |
| author | Hermes Agent (adapted from obra/superpowers) |
| license | MIT |
| metadata | {"hermes":{"tags":["debugging","troubleshooting","problem-solving","root-cause","investigation"],"related_skills":["test-driven-development","writing-plans","subagent-driven-development"]}} |
Random fixes waste time and create new bugs. Quick patches mask underlying issues.
Core principle: ALWAYS find root cause before attempting fixes. Symptom fixes are failure.
Violating the letter of this process is violating the spirit of debugging.
NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST
If you haven't completed Phase 1, you cannot propose fixes.
Use for ANY technical issue:
Use this ESPECIALLY when:
Don't skip when:
You MUST complete each phase before proceeding to the next.
BEFORE attempting ANY fix:
Action: Use read_file on the relevant source files. Use search_files to find the error string in the codebase.
Action: Use the terminal tool to run the failing test or trigger the bug:
# Run specific failing test
pytest tests/test_module.py::test_name -v
# Run with verbose output
pytest tests/test_module.py -v --tb=long
Action:
# Recent commits
git log --oneline -10
# Uncommitted changes
git diff
# Changes in specific file
git log -p --follow src/problematic_file.py | head -100
WHEN system has multiple components (API → service → database, CI → build → deploy):
BEFORE proposing fixes, add diagnostic instrumentation:
For EACH component boundary:
Run once to gather evidence showing WHERE it breaks. THEN analyze evidence to identify the failing component. THEN investigate that specific component.
WHEN error is deep in the call stack:
Action: Use search_files to trace references:
# Find where the function is called
search_files("function_name(", path="src/", file_glob="*.py")
# Find where the variable is set
search_files("variable_name\\s*=", path="src/", file_glob="*.py")
STOP: Do not proceed to Phase 2 until you understand WHY it's happening.
Find the pattern before fixing:
Action: Use search_files to find comparable patterns:
search_files("similar_pattern", path="src/", file_glob="*.py")
Scientific method:
Fix the root cause, not the symptom:
test-driven-development skill# Run the specific regression test
pytest tests/test_module.py::test_regression -v
# Run full suite — no regressions
pytest tests/ -q
Pattern indicating an architectural problem:
STOP and question fundamentals:
Discuss with the user before attempting more fixes.
This is NOT a failed hypothesis — this is a wrong architecture.
If you catch yourself thinking:
ALL of these mean: STOP. Return to Phase 1.
If 3+ fixes failed: Question the architecture (Phase 4 step 5).
| Excuse | Reality |
|---|---|
| "Issue is simple, don't need process" | Simple issues have root causes too. Process is fast for simple bugs. |
| "Emergency, no time for process" | Systematic debugging is FASTER than guess-and-check thrashing. |
| "Just try this first, then investigate" | First fix sets the pattern. Do it right from the start. |
| "I'll write test after confirming fix works" | Untested fixes don't stick. Test first proves it. |
| "Multiple fixes at once saves time" | Can't isolate what worked. Causes new bugs. |
| "Reference too long, I'll adapt the pattern" | Partial understanding guarantees bugs. Read it completely. |
| "I see the problem, let me fix it" | Seeing symptoms ≠ understanding root cause. |
| "One more fix attempt" (after 2+ failures) | 3+ failures = architectural problem. Question the pattern, don't fix again. |
| Phase | Key Activities | Success Criteria |
|---|---|---|
| 1. Root Cause | Read errors, reproduce, check changes, gather evidence, trace data flow | Understand WHAT and WHY |
| 2. Pattern | Find working examples, compare, identify differences | Know what's different |
| 3. Hypothesis | Form theory, test minimally, one variable at a time | Confirmed or new hypothesis |
| 4. Implementation | Create regression test, fix root cause, verify | Bug resolved, all tests pass |
Use these Hermes tools during Phase 1:
search_files — Find error strings, trace function calls, locate patternsread_file — Read source code with line numbers for precise analysisterminal — Run tests, check git history, reproduce bugsweb_search/web_extract — Research error messages, library docsFor complex multi-component debugging, dispatch investigation subagents:
delegate_task(
goal="Investigate why [specific test/behavior] fails",
context="""
Follow systematic-debugging skill:
1. Read the error message carefully
2. Reproduce the issue
3. Trace the data flow to find root cause
4. Report findings — do NOT fix yet
Error: [paste full error]
File: [path to failing code]
Test command: [exact command]
""",
toolsets=['terminal', 'file']
)
When fixing bugs:
"Default Cascade" — When a value flows through multiple layers and each layer has its own fallback default for that value, a missing or null value at any layer silently triggers the default instead of raising an error. When all defaults are the same wrong value, this creates silent data corruption with no error message.
Example (ATLAS-ERP DISPMAG bug): A "BAJA" (02) movement type flows Frontend → API Schema → Service → Adapter → Engine. At 7 separate points, the default is "08" (ALTA). If the type gets lost at any layer (null serialization, missing key, Optional field), it silently becomes ALTA — the user generates a low FILE that's actually an ALTA, with no error anywhere in the stack.
How to detect during Phase 1:
default, or, get("key", fallback), Optional[X] = fallback across the data flow pathRed flags:
Optional[field] = most_common_value in Pydantic schemasdata.get("key", most_common_value) in dict lookupsarg or most_common_value in function signaturesmovement_type or status field with a "reasonable default" — these should NEVER default silentlyFix pattern:
Before: def process(type: Optional[str] = "08") # Silent default
After: def process(type: str) # Required — fail loudly
Before: data.get("tipo_movimiento", "08") # Silent default
After: data["tipo_movimiento"] # KeyError if missing — catch upstream
From debugging sessions:
No shortcuts. No guessing. Systematic always wins.
[HINT] Download the complete skill directory including SKILL.md and all related files