systematic-debugging
Use when encountering any bug, test failure, or unexpected behavior. Enforces a strict reproduce-first, root-cause-first, failing-test-first debugging workflow before fixing.
| name | systematic-debugging |
| description | Use when encountering any bug, test failure, or unexpected behavior. Enforces a strict reproduce-first, root-cause-first, failing-test-first debugging workflow before fixing. |
A strict debugging workflow. Use when dealing with bugs, test failures, or unexpected behavior.
Three core purposes:
These rules have no exceptions.
Violating this process is considered a debugging failure.
Use this skill in the following situations:
The following excuses are not accepted:
When using this skill, the following items must be locked internally:
If any of these seven items are missing, the work is not done.
Follow the steps below in order.
First, condense the problem.
Output format:
Problem: <expected> but got <actual> under <condition>
Do not mix symptoms with speculation.
Good: Product detail API returns 500 when brand is null.
Bad: Serializer is broken because brand mapping seems wrong.
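As a minimal sketch, the one-line format above can be produced by a small helper so that every report fills the same three slots. The `ProblemStatement` class and its field names are illustrative assumptions, not part of this skill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProblemStatement:
    """Hypothetical container for the three slots of the problem line."""

    expected: str
    actual: str
    condition: str

    def render(self) -> str:
        # Symptoms only: no guesses about the cause belong in any field.
        return f"Problem: {self.expected} but got {self.actual} under {self.condition}"


statement = ProblemStatement(
    expected="product detail API returns 200",
    actual="500",
    condition="a null brand",
)
print(statement.render())
```

Keeping the cause out of the fields mirrors the Good/Bad pair above: the statement records what was observed, not why.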
You must be able to see the failure again before fixing it.
Priority:
Rules:
What to do when reproduction is not possible:
Collect only observable facts.
Always check:
For multi-component problems, check at each boundary.
Examples:
At each boundary, check:
Do not fix until you have pinpointed the problem location.
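One way to check boundaries, sketched under assumptions: pass a value through each stage, print what crosses each boundary, and stop at the first stage whose output breaks the expectation. The `first_broken_boundary` helper, the stage names, and the sample pipeline are all hypothetical.

```python
from typing import Any, Callable, Optional, Sequence, Tuple

Stage = Tuple[str, Callable[[Any], Any]]


def first_broken_boundary(
    stages: Sequence[Stage],
    value: Any,
    is_valid: Callable[[Any], bool],
) -> Optional[str]:
    """Return the name of the first stage whose output fails is_valid."""
    for name, stage in stages:
        value = stage(value)
        # Observable fact at this boundary: what actually came out.
        print(f"[{name}] output={value!r}")
        if not is_valid(value):
            return name
    return None


# Hypothetical three-stage pipeline; the "normalize" stage loses the brand.
stages = [
    ("load", lambda product_id: {"id": product_id, "brand": "Acme"}),
    ("normalize", lambda row: {"id": row["id"], "brand": None}),
    ("serialize", lambda row: {"id": row["id"], "brand": row["brand"]}),
]

location = first_broken_boundary(
    stages, 1, is_valid=lambda row: row.get("brand") is not None
)
print("first broken boundary:", location)
```

The helper only locates the problem; it deliberately changes nothing, which keeps evidence gathering separate from fixing.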
Formulate exactly one cause candidate.
Format:
Hypothesis: <root cause> because <evidence>
Qualities of a good hypothesis:
Examples of bad hypotheses:
Trace the cause back to the source. If the error appears deep in the stack, trace the origin of the input, not the symptom.
Lock the failure before fixing.
Priority:
Rules:
If an automated test is feasible, use the test-driven-development skill alongside this one.
The fix addresses only one hypothesis.
Allowed:
Forbidden:
If the fix fails, immediately return to Phase 1 or Phase 3. The previous hypothesis was wrong.
All of the following must be satisfied before closing:
For intermittent bugs, a single pass is not enough. Verification under repeated runs or varying conditions is required.
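A minimal sketch of repeated verification, assuming the check can be expressed as a callable that raises `AssertionError` on failure. The `count_failures` helper and the deterministic stand-in for a flaky check are hypothetical.

```python
from typing import Callable


def count_failures(check: Callable[[], None], runs: int = 200) -> int:
    """Run check repeatedly and count how many runs raise AssertionError."""
    failures = 0
    for _ in range(runs):
        try:
            check()
        except AssertionError:
            failures += 1
    return failures


def make_flaky_check() -> Callable[[], None]:
    # Stand-in for an intermittent bug: every seventh call fails.
    calls = {"n": 0}

    def check() -> None:
        calls["n"] += 1
        assert calls["n"] % 7 != 0, "intermittent failure"

    return check


single_run = count_failures(make_flaky_check(), runs=1)
many_runs = count_failures(make_flaky_check(), runs=70)
print("failures in 1 run:", single_run)
print("failures in 70 runs:", many_runs)
```

In this stand-in, the single run passes even though the bug is present, which is why one green run is not treated as verification.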
Stop and reframe in the following situations.
If reproduction fails after multiple attempts:
Changing code without reproduction is forbidden.
If three consecutive fixes miss the mark, conclude:
From this point, a "fourth patch" is not the answer; a structural discussion is needed.
If you cannot create a failing test or equivalent reproduction mechanism, do not declare completion. At minimum, leave behind the reproduction command and observed results.
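One possible way to leave that record behind, sketched with the standard `subprocess` module: run the reproduction command once and capture exactly what was observed. The `record_reproduction` helper and the stand-in command are assumptions for illustration.

```python
import subprocess
import sys
from typing import Dict, Sequence, Union


def record_reproduction(command: Sequence[str]) -> Dict[str, Union[str, int]]:
    """Run a reproduction command and capture the observed results."""
    completed = subprocess.run(list(command), capture_output=True, text=True)
    return {
        "command": " ".join(command),
        "exit_code": completed.returncode,
        "stdout": completed.stdout.strip(),
        "stderr": completed.stderr.strip(),
    }


# Stand-in for a real reproduction command: a child Python process that fails.
record = record_reproduction(
    [
        sys.executable,
        "-c",
        "import sys; print('brand is null', file=sys.stderr); sys.exit(1)",
    ]
)
for key, value in record.items():
    print(f"{key}: {value}")
```

The record contains only the command and what it produced, so someone else can rerun it and compare results.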
If any of the following thoughts arise, stop immediately and return to an earlier phase.
Use this checklist for self-verification during execution.
The completion criterion for this skill is not "the code changed."
Completion criteria:
Without these four, debugging is not finished.