ワンクリックで
systematic-debugging
// Use when encountering any bug, test failure, build failure, or unexpected behavior in amd-smi — before proposing any fix. Enforces root-cause investigation before symptom patching.
// Use when encountering any bug, test failure, build failure, or unexpected behavior in amd-smi — before proposing any fix. Enforces root-cause investigation before symptom patching.
Use before any creative work — designing a new amdsmi_* API, adding a CLI command, building a feature, or modifying behavior. Explores intent, requirements, and design before any code is written.
Use when facing 2+ independent problems with no shared state — e.g., unrelated test failures in different subsystems, multiple independent bug investigations, parallel research tasks. Dispatch one focused subagent per domain instead of investigating sequentially.
Use when a written implementation plan exists at docs/dev/plans/ and you need to execute it task-by-task in the current session with verification at each step.
Use when finishing an amd-smi development branch — consolidating commits into logical groups with clean messages AND deciding how to integrate the work (merge to develop, push and open PR, keep as-is, or discard). Covers commit restructuring plus the merge/PR/cleanup workflow.
Use when implementing any amd-smi feature, bug fix, or behavior change — before writing implementation code. Enforces strict RED-GREEN-REFACTOR: failing test first, watch it fail, minimal code to pass, refactor.
Use when starting amd-smi feature work, executing an implementation plan, or reviewing a PR that needs an isolated workspace away from the main checkout. Sets up a worktree following the rocm-systems-pr<PR#> convention.
| name | systematic-debugging |
| description | Use when encountering any bug, test failure, build failure, or unexpected behavior in amd-smi — before proposing any fix. Enforces root-cause investigation before symptom patching. |
Random fixes waste time and create new bugs. Quick patches mask underlying issues.
Core principle: ALWAYS find the root cause before attempting a fix. Symptom fixes are failure.
Violating the letter of this process is violating the spirit of debugging.
NO FIXES WITHOUT ROOT-CAUSE INVESTIGATION FIRST
If you haven't completed Phase 1, you cannot propose a fix.
Use for any technical issue:
amdsmi_* outputEspecially when:
Complete each phase before proceeding to the next.
Read error messages carefully
amdsmi_status_t errors: which enum value? Where is it set?Reproduce consistently
Check recent changes
git log --oneline -20 -- <affected files>git diff HEAD~5 -- <affected file>Gather evidence across cascade layers
amd-smi bugs often span layers. Add instrumentation at every boundary:
# Layer 1: CLI argparse
echo "argparse args: $@"
# Layer 2: Python interface
python3 -c "import amdsmi; print(amdsmi.amdsmi_get_lib_version())"
# Layer 3: C wrapper loaded
python3 -c "from amdsmi import amdsmi_wrapper; print(amdsmi_wrapper.libamd_smi._name)"
# Layer 4: Actual C call return value
strace -e trace=openat python3 -c "import amdsmi; amdsmi.amdsmi_init()" 2>&1 | grep libamd_smi
This reveals which layer breaks, not just that something breaks.
Trace data flow backward
Create a failing test that reproduces the bug
test-driven-development skillImplement a single fix — address the root cause. One change. No bundled refactors.
Verify
verification-before-completion)If the fix doesn't work — STOP. Count attempts.
If 3+ fixes failed: question the architecture
Pattern indicating an architectural problem:
Stop. Discuss with the user. This is not a failed hypothesis — this is a wrong architecture.
All of these mean: stop, return to Phase 1.
If 3+ fixes failed: question the architecture (Phase 4 step 5).
| Excuse | Reality |
|---|---|
| "Issue is simple, no process needed" | Simple issues have root causes too. Process is fast for simple bugs. |
| "Emergency, no time for process" | Systematic debugging is FASTER than guess-and-check thrashing. |
| "Just try this, then investigate" | First fix sets the pattern. Do it right from the start. |
| "I'll write the test after the fix works" | Untested fixes don't stick. Test-first proves the fix actually addresses the bug. |
| "Multiple fixes at once saves time" | Can't isolate what worked. Causes new bugs. |
| "Reference is too long, I'll adapt the pattern" | Partial understanding guarantees more bugs. Read it fully. |
| "I see the problem, let me fix it" | Seeing the symptom ≠ understanding the cause. |
| "One more fix attempt" (after 2+ failures) | 3+ failures = architectural problem. Question the pattern. |
| Phase | Activities | Done When |
|---|---|---|
| 1. Root cause | Read errors, reproduce, check changes, instrument layers | You understand WHAT and WHY |
| 2. Pattern | Find working examples, compare differences | You can list every relevant difference |
| 3. Hypothesis | State theory, test minimally | Confirmed, or new hypothesis formed |
| 4. Implementation | Failing test → single fix → verify | Bug resolved, tests green |
grep across header + cc + wrapper + interface + CLI) — gaps are often the root causetools/update_wrapper.sh — the wrapper driftedamdsmi-build-install skill)rm -rf build between attempts — stale CMake cache hides real causestest-driven-development — Phase 4 Step 1 (failing test)verification-before-completion — Phase 4 Step 3 (verify the fix)dispatching-parallel-agents — when there are 2+ independent failures