---
name: geniro:debug
description: Use when a bug needs systematic investigation — observe, hypothesize, test, isolate, propose fix, author reproduction test. Then escalates to /geniro:follow-up or /geniro:implement to apply the patch. Adversarial mode authors F→P tests against a diff. Skip for bugs with obvious root cause — use /geniro:follow-up directly.
context: main
model: opus
allowed-tools: ["Read","Write","Edit","Bash","Glob","Grep","Agent","AskUserQuestion","WebSearch"]
argument-hint: "[bug description | verify <diff-range> | verify last changes]"
---
## Subagent Model Tiering

| Subagent | Model | Why |
|---|---|---|
| adversarial-tester-agent | inherit | Carve-out — reasoning-grade test authoring. Matches the canonical rule in skills/_shared/model-tiering.md and call sites in /geniro:review Phase 4c, /geniro:implement Phase 6 Stage D, /geniro:follow-up Medium Phase 5. Synthesis tier mirrors the orchestrator's; the agent's F→P verification + 3× flake check enforce correctness regardless of inherited tier. |

Every Agent(...) spawn in this skill MUST pass an explicit model= argument per the canonical rule in ${CLAUDE_PLUGIN_ROOT}/skills/_shared/model-tiering.md. For plugin-defined subagents (adversarial-tester), also follow ${CLAUDE_PLUGIN_ROOT}/skills/_shared/spawn-agent.md — bare-name first; on Agent type '<name>' not found, degrade to general-purpose with the agent body inlined.
## Evidence Standard

A hypothesis is confirmed ONLY when its Result: field cites one of these artifact kinds:

| # | Kind | Example |
|---|---|---|
| 1 | File:line + verified snippet (orchestrator re-read confirms the text) | src/cache/user.ts:42-58 snippet pasted |
| 2 | Captured command / test / build output | pnpm test src/cache/user.test.ts → 1 failing, AssertionError: ... |
| 3 | Log line or stack trace from the running system | 2026-04-01T12:34Z ERROR ... NullPointerException at ... |
| 4 | Query result against the actual datastore | SELECT count(*) FROM sessions WHERE user_id=42 → 0 rows |
| 5 | User-provided artifact (screenshot, log paste, captured request body, env-var dump) | user pastes the request body that triggered the bug |
Reasoning, "the symptom matches the hypothesis", "the agent reported confirmed", and "the user described it verbally" are NOT evidence — they are hypotheses that still need verification. Symptom-matching is correlation; only reproduction with a captured artifact (kind 2-4) confirms causation.
If the orchestrator's tools cannot produce evidence for a hypothesis (no DB access, no production logs, no credentials, no environment access), do NOT mark it inconclusive by default — use the missing-data gate in Step 3 to ask the user for the artifact.
# Debug: Scientific-Method Investigation

Use this skill to systematically debug complex issues. Replaces guessing with evidence gathering and hypothesis testing. Each investigation is tracked so you can review what was tried and why.

## The Scientific Debug Loop

OBSERVE → BUILD FEEDBACK LOOP → HYPOTHESIZE → TEST → ISOLATE → PROPOSE FIX → AUTHOR REPRO TEST & VERIFY → ESCALATE → DOCUMENT

This is not a suggestion — it's the required process. Do NOT skip steps or guess.

## Input
Load custom instructions from .geniro/instructions/global.md and .geniro/instructions/debug.md. Read any found. Apply rules as constraints, additional steps at specified phases, and hard constraints throughout the run.
$ARGUMENTS
Mode routing (inspect $ARGUMENTS BEFORE the empty-check):
- If `$ARGUMENTS` matches any verify-intent signal, route to Adversarial Mode (see ## Adversarial Mode: Verify Last Changes below) and skip the scientific-method Workflow. Every signal below is anchored — bare keywords alone are NOT enough, because phrases like "verify that login returns 500" or "stress-test revealed a memory leak" are scientific-method bug reports, not verify requests:
  - Anchored keyword signals (keyword + anchor token): `verify <changes|diff|last|recent|my|this|PR>`, `break <my|the> diff`, `hunt for bugs in <diff|change|PR>`, `find edge cases in <diff|change|PR>`, `adversarial <mode|pass|scan|run>`, `stress-test <the diff|my change|last changes>`
  - Phrase signals: `verify last changes`, `verify recent changes`, `verify my changes`, `check last changes`, `break my diff`
  - Explicit diff range signals: `HEAD~N..HEAD`, `HEAD~N`, `main...HEAD`, a bare PR ref (#1234 or GitHub PR URL), or a bare branch name used alongside any verify keyword
- Otherwise → standard scientific-method flow (the ## Workflow: Observe → ... section, unchanged). When in doubt (ambiguous input), default to scientific-method mode — the user can re-invoke with explicit adversarial phrasing if needed.
If $ARGUMENTS is empty, use the AskUserQuestion tool (do NOT output options as plain text) with header "Mode": "What are we doing?" with options "Describe the symptoms" / "Paste error message" / "Point to a failing test" / "Verify last changes (adversarial)". The first three route to scientific-method mode (the selected option becomes the initial bug description seed); the fourth routes to Adversarial Mode. Do not proceed until the user answers.
## Hypothesis Tracking Format

Store hypotheses in .geniro/state/debug/HYPOTHESES-<slug>.md (compute <slug> per ${CLAUDE_PLUGIN_ROOT}/skills/_shared/within-skill-state-handoff.md § Slug rules):
Branch: <git branch --show-current OR detached-<short-sha>>
Worktree: <git rev-parse --show-toplevel>
Timestamp: <ISO-8601 UTC>
# Bug: [Bug Title]
**Date:** 2026-04-03
**Description:** [What's broken and how to reproduce]
**Severity:** Critical | High | Medium | Low
## Feedback Loop
**Command:** [exact command/script that reproduces — written by Step 1.5]
**Expected output:** [what happens on a working system]
**Actual output:** [captured artifact — error / log line / wrong value]
**Re-run cost:** [seconds; flag if >30s]
**Determinism:** [3-run signature comparison; flag if divergent]
## Hypothesis 1
- **Hypothesis:** Cache not invalidating on user role change
- **Evidence For:** User sees stale permissions after role update
- **Evidence Against:** Cache invalidation was tested in unit tests
- **Status:** pending → testing → confirmed | rejected | inconclusive
- **Test Plan:** Trace cache invalidation flow for role changes
- **Result:** [Confirmed/Rejected + why]
## Hypothesis 2
- **Hypothesis:** Race condition between role endpoint and permission check
- **Evidence For:** Bug only occurs under load
- **Evidence Against:** Single-user testing shows issue too
- **Status:** pending
- **Test Plan:** Add async logging to both endpoints, test sequentially
- **Result:** [pending]
## Root Cause
[Only filled once hypothesis is confirmed]
## Proposed Fix
[File path(s), diff or before/after snippet, one-sentence rationale — text only, NOT applied to source]
## Fix Evidence (experimental)
[How the patch was verified locally (throwaway experiment), reproduction result, confirmation that experimental edits were reverted]
## Reproduction Decision
[Default: reproduction test authored at <project test path> — see "Fix Evidence" above for F→P run capture. Escape hatch only: orchestrator judged the bug non-reproducible at the test layer; user picked alternative regression guard via AskUserQuestion (assertion / fuzz seed / monitor / accepted-risk) — record the choice and the rationale.]
## Escalation
[/geniro:follow-up or /geniro:implement; handoff via `<PRIMARY_ROOT>/.geniro/state/debug/findings-state.md`]
(Capital Branch:/Worktree:/Timestamp: headers at top are mandatory per ${CLAUDE_PLUGIN_ROOT}/skills/_shared/within-skill-state-handoff.md § Producer contract — they let consumers on resume detect when the file targets a different branch than the current session.)
Fields:
- Hypothesis: Specific, testable claim ("X is causing Y")
- Evidence For/Against: What supports or contradicts this?
- Status: pending → testing → confirmed | rejected | inconclusive
- Test Plan: How will you confirm or reject this?
- Result: What did the test show?
Inconclusive means the test could not distinguish whether the hypothesis is true or false. Common causes: (1) test environment differs from production, (2) bug is intermittent and didn't manifest, (3) test was too coarse to isolate this hypothesis, (4) multiple interacting causes mask effects. An inconclusive result is NOT a rejection — it means you need a better test or more data.
## Universal Rule: All Choice Questions Use AskUserQuestion

Every user-facing choice in this skill — including ad-hoc gates NOT explicitly enumerated below (e.g. "verify root cause or fall back to a band-aid?", "experiment needs a credential I don't have — how do you want to handle it?") — MUST go through the AskUserQuestion tool. The enumerated gates are examples, not an exhaustive list. If you're about to type (A)... or (B)... in chat, stop and call the tool instead.

## Workflow: Observe → Build Feedback Loop → Hypothesize → Test → Isolate → Propose Fix → Escalate
### 0. Retrieve Prior Knowledge (1 min)
Before investigating, check for relevant prior learnings:
- Scan <PRIMARY_ROOT>/.geniro/knowledge/learnings.jsonl for gotchas and patterns related to the affected area (use Grep with keywords from the bug description). Resolve <PRIMARY_ROOT> per ${CLAUDE_PLUGIN_ROOT}/skills/_shared/primary-worktree.md Mode A.
- If relevant learnings exist, use them to inform initial hypotheses — don't re-discover known issues
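The retrieval step above can be sketched as a plain grep; the learnings file path matches the text, but the entry shape and the keywords ("cache", "role") are invented stand-ins for this demo, not the real schema.

```shell
# Sketch of the Step 0 scan against a throwaway knowledge dir.
mkdir -p /tmp/demo/.geniro/knowledge
printf '%s\n' \
  '{"area":"cache","note":"user cache key omits role; stale permissions after role change"}' \
  '{"area":"build","note":"codegen must rerun after schema edits"}' \
  > /tmp/demo/.geniro/knowledge/learnings.jsonl

# Grep with keywords pulled from the bug description
grep -i -e cache -e role /tmp/demo/.geniro/knowledge/learnings.jsonl
```

Matching entries seed initial hypotheses; no match means nothing prior is known about the area.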
### 1. Observe (5 min)
- Reproduce the bug consistently
- Gather error messages, logs, stack traces
- Identify what changed (recent commit? config? user action?)
- Record the exact steps to reproduce
- If reproduction steps are unclear or missing: use the AskUserQuestion tool (do NOT output options as plain text) to ask the user for specific details (environment, steps to trigger, expected vs actual behavior). Do NOT guess at reproduction — ask.
### 1.5. Build Feedback Loop (5–10 min) — REQUIRED before Step 2
A feedback loop is a fast, deterministic signal that reproduces the bug AND can be re-run cheaply (≤30 seconds, ideally ≤5 seconds). The whole hypothesis-test cycle in Steps 2-3 iterates on this signal — without one, every hypothesis test is slow and every result is noisy.
This is NOT the project's reproduction test from Step 6. Step 6 authors a unit/integration test in the project framework and ships with the fix as the regression guard. Step 1.5 builds a fast-iteration scratch signal so Steps 2-3 can move quickly. The two are separate deliverables: the Step 6 test STAYS on disk; the Step 1.5 scratch signal is reverted at Cleanup.
Pick the cheapest option that reliably reproduces:
| Option | Use when | Example |
|---|---|---|
| Failing assertion in REPL / test runner | Bug is in pure logic, no I/O | `node -e "require('./src/cache').compute(...) // expect 5, got 7"` |
| `curl` against running dev server | Bug is in HTTP/API behavior | `curl -X POST localhost:3000/api/foo -d '{...}' -i` |
| SQL query against test DB | Bug is in query/migration logic | `psql -c "SELECT * FROM users WHERE ..."` |
| Headless browser script | Bug is UI-rendered | Playwright snippet that takes one screenshot |
| Differential test (good vs bad commit) | Regression — works at commit X, broken now | `git checkout <good>; <repro>; git checkout <bad>; <repro>` |
| Fuzz / loop reproducer | Bug is intermittent | `for i in {1..100}; do <repro>; done` |
| Manual click-through script | Genuinely UI-only with no automation seam | numbered steps in HYPOTHESES-<slug>.md (use as fallback only) |
Quality bar — the loop must be:
- Fast: re-runs in seconds, not minutes. If the only loop you can build takes 5 minutes per cycle, hypothesis testing will be unbearable. Stop and ask: "can I shrink the scope (smaller payload, in-memory mock, skip auth) to make this faster?"
- Deterministic: same input → same observed failure, every time. If 3 consecutive runs produce 3 different signatures, you have a flake or two bugs — note this in HYPOTHESES-<slug>.md before continuing; Step 2 hypotheses must distinguish "intermittent failure" from "two bugs."
- Captured: produces an artifact that satisfies the Evidence Standard (kind 2-4) — failing assertion, log line, query result. "I see it crash" is not a captured artifact.
Save the loop in HYPOTHESES-<slug>.md under a ## Feedback Loop section between "Description" and "Hypothesis 1":
## Feedback Loop
**Command:** `<exact command or script that reproduces>`
**Expected output:** `<what happens on a working system>`
**Actual output:** `<paste captured artifact — error, log line, wrong value>`
**Re-run cost:** `<seconds; flag if >30s>`
**Determinism:** `<3-run signature comparison; flag if divergent>`
If you cannot build a feedback loop in 10 minutes, do NOT proceed to Step 2 by guessing — fire the missing-data gate via AskUserQuestion (header "Repro signal", options like "Paste a failing log line" / "Run this command and paste output" / "I can't repro — escalate to user-with-context"). Hypothesis testing without a reliable loop is guessing.
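The 3-run determinism check above can be sketched in shell; `repro` is a stand-in for your actual feedback-loop command, rigged here to fail with a fixed signature so the sketch is self-contained.

```shell
# "repro" stands in for the real feedback-loop command; swap in your own.
repro() { echo "AssertionError: expected 5, got 7" >&2; return 1; }

: > /tmp/sigs.txt
for i in 1 2 3; do
  # Keep the last line of combined output as the failure signature
  repro 2>&1 | tail -n 1 >> /tmp/sigs.txt
done

if [ "$(sort -u /tmp/sigs.txt | wc -l)" -eq 1 ]; then
  echo "deterministic: same failure signature on all 3 runs"
else
  echo "divergent: flake or two bugs; record it before continuing"
fi
```

A real loop would also time each run to fill the Re-run cost field; anything over 30 seconds gets flagged.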
### 2. Hypothesize (5 min)
- Based on observation AND the feedback-loop output (Step 1.5), form 2–3 competing hypotheses
- Each hypothesis must be testable AGAINST THE FEEDBACK LOOP — Step 3's tests will toggle one variable, re-run the loop, observe whether the captured signature changes
- Avoid "it's probably X" without evidence
- Write hypotheses in .geniro/state/debug/HYPOTHESES-<slug>.md
- Consider infrastructure causes alongside code causes — connection timeouts, resource exhaustion, DNS failures, container restarts, database connection pool limits, cloud service rate limits, and deployment-related changes (new config, changed env vars, scaled-down replicas) are common root causes that code inspection alone will miss. If symptoms include timeouts, intermittent failures, or errors that only appear in deployed environments, form at least one infrastructure hypothesis.
### 3. Test (10–30 min)
- Design a minimal test for each hypothesis. The test must produce a captured artifact per the Evidence Standard (kind 2–4) — a failing assertion, a log line, a query row count.
- Add logging, breakpoints, or unit tests to gather evidence
- Do NOT implement a fix yet—you're gathering data
- Missing-data gate: if testing a hypothesis requires data the orchestrator's tools cannot reach (production logs, runtime state, third-party API responses, DB rows behind credentials, screenshots), do NOT mark the hypothesis inconclusive by default. Use the AskUserQuestion tool (do NOT output options as plain text — use the tool's structured UI) with header "Missing data" and 2-4 concrete options for the artifact you need. Examples:
- "Paste the failing log line at the time of the error" / "Paste the request body that triggered the error" / "I don't have it — mark inconclusive"
- "Run this query against the production DB and paste the result: <query>" / "I can't run that query" / "Skip this hypothesis"
- "Provide a screenshot of the broken state" / "I don't have it — skip"
Apply the same "ask-don't-guess" rule that already governs Step 1's reproduction-step ambiguity (see Step 1 above) — extend it to test data, not only repro steps.
- Record results: confirmed, rejected, or inconclusive — every Result: field MUST cite an artifact per the Evidence Standard. "Confirmed" with a narrative-only Result is rejected.
### 4. Isolate (5–15 min)
- Once hypothesis is confirmed, identify exact code location
- Trace the data flow or control flow leading to the bug
- Understand why the bug happens (not just where)
### 5. Propose Fix (5–15 min)
- Refresh custom instructions (~5 sec): re-read .geniro/instructions/global.md, .geniro/instructions/debug.md, and .geniro/instructions/code-style.md (if any are present). Their rules / additional steps / hard constraints still apply to this step — re-load to ensure they survive any compaction since Phase 1.
- Multi-path fix gate (Always-WAIT, per Universal Rule). If the confirmed root cause has more than one valid fix path with real trade-offs (e.g., snapshot-vs-live-fetch, COALESCE vs CHECK constraint vs catch+log, fix-at-source vs fix-at-call-site), do NOT pick one and write a single text proposal. Fire AskUserQuestion per the canonical shape at ${CLAUDE_PLUGIN_ROOT}/skills/_shared/per-finding-question.md § Investigation-driven fix gate (debug-flavored). Set header: "Fix path". Render the question text with the confirmed root cause's path:lines and hypothesis title; render each option's preview with the investigation context (Root cause / Evidence from "Fix Evidence" / hypothesis-confirmed status / hypothesis number from .geniro/state/debug/HYPOTHESES-<slug>.md) so the user can compare fix paths against the same evidence. Each option's label (1-5 words) names the path; description carries the one-line trade-off. Record the user's choice in .geniro/state/debug/HYPOTHESES-<slug>.md under "Proposed Fix" before drafting the unified diff. The single-text-proposal default below applies ONLY when there is one obvious right fix; multi-path is the explicit branch the § Universal Rule above requires the tool for. Empty AskUserQuestion answer = upstream bug — fall back to plain text and re-ask. This is symmetric with [PRODUCT-DECISION] handling in the rest of the pipeline (see ${CLAUDE_PLUGIN_ROOT}/skills/implement/implement-reference.md §Auto Mode Behavior, [PRODUCT-DECISION] finding encountered row).
- Formulate the minimal fix for the root cause as a text proposal: file path(s), exact change (unified diff or before/after snippet), and a one-sentence rationale.
- Do NOT write the fix to production/source files. Write/Edit are available for EXPERIMENTS only (tests, logging, debug scripts, .geniro/state/debug/ artifacts) — not for applying the proposed patch. If any experiment modified non-test source (e.g., a temporary log line, patched value), revert those edits before escalation; the escalated skill applies the real fix cleanly.
- Do NOT refactor adjacent code.
### 6. Author Reproduction Test & Verify Root Cause (10–15 min)
- Refresh custom instructions (~5 sec): re-read .geniro/instructions/global.md, .geniro/instructions/debug.md, and .geniro/instructions/code-style.md (if any are present). Their rules / additional steps / hard constraints still apply to this step — re-load to ensure they survive any compaction since Phase 1.
- Author the reproduction as a unit or integration test in the project's test framework, placed at the project's normal test path next to the source it covers (detect framework + naming convention from CLAUDE.md Essential Commands and an exemplar test file). Scripts, curl commands, and ad-hoc queries are NOT acceptable substitutes — they get deleted at Cleanup and leave no regression guard. The test stays on disk and ships with the fix as the regression gate.
- Test name and comments must be self-contained. The reproduction test name AND any comments inside the test describe the bug behavior — the input, condition, or observable failure — never the hypothesis number from HYPOTHESES-<slug>.md or any other thread-local label. Tags like Bug A/B/C, Hypothesis 1/2, Test 1, Case X, Issue #N from this run, regression from review run, found by review-gate, or confirmed by this <skill> run are meaningless once the investigation conversation ends; the test ships with the fix as a regression guard, and a reader weeks later won't have that conversation. Prefer cacheKey omits userId so role change leaves stale cached profile over Bug C.
- F→P invariant. Pre-fix: run the authored test at least 2× and confirm the SAME failure signature both times (same exception type + same failing assertion / same status code / same row count). Two divergent failures are NOT confirmation — they're flakiness or two different bugs; investigate before continuing.
- Verify the proposed fix — monkey-patch in the test by default; production-source edits are an explicit escape hatch. Apply the patch locally as a monkey-patch inside the authored test file (mock, fixture, test-local shim, or a throwaway helper imported only by the test) — NOT by editing production/source files. Re-run the authored test at least 2× post-fix and confirm the failure DISAPPEARS both times. If the bug genuinely cannot be verified without editing production source (the patch lives in a hard-to-mock chain — DI container, framework hook, native module, generated code), treat each such edit as an explicit escape hatch: list every touched production file under "Verification edits to revert:" in the Phase 6.5a findings, confirm each is reverted before escalation, and re-run git diff to prove the working tree contains only the reproduction test. The reproduction test stays on disk; production source must end Phase 6 unchanged.
- Escape hatch — non-deterministic bugs only. If the bug is genuinely non-reproducible at the test layer (race conditions only seen under load, environment-only failures, UI flake), use the AskUserQuestion tool (do NOT output options as plain text) per the canonical shape at ${CLAUDE_PLUGIN_ROOT}/skills/_shared/per-finding-question.md § Investigation-driven fix gate (debug-flavored). Set header: "Repro infeasible". Render the question text with the best-guess root-cause path:lines (or "unknown" if not isolated) and hypothesis title; render each option's preview with the investigation context — the Reproduction status field carries "Reproduction infeasible — <reason>" pulled from the Reproduction Decision rationale below. Each option's label names a regression guard (1-5 words, e.g. "Add runtime assertion", "Author fuzz seed", "Add monitor/alert", "Skip regression guard"); description carries the one-line trade-off (e.g. "at <file:line>", "triggers ~50% of runs", "in <observability tool>", "accept the risk"). Record the user's selection AND the rationale in .geniro/state/debug/HYPOTHESES-<slug>.md under "Reproduction Decision". The default is mandatory; the escape hatch is opt-in with a paper trail.
- Do NOT run the full project test suite here — that belongs to the escalated skill. The goal is the F→P-verified test artifact + evidence the proposed patch turns it green.
- Record the experimental evidence in .geniro/state/debug/HYPOTHESES-<slug>.md under "Fix Evidence" — paste the captured pre-fix output AND the captured post-fix output, not narrative summaries.
- If the project uses code generation (check CLAUDE.md) AND the proposed fix touches DTOs, schemas, or controllers: note this in the escalation so the receiving skill runs codegen.
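The F→P invariant above can be sketched as follows; `run_test` stands in for the project test command against the authored reproduction test, and a flag file simulates the monkey-patch so the sketch runs anywhere.

```shell
# Simulated reproduction test: red until the (simulated) patch is applied.
run_test() {
  if [ -f /tmp/patch-applied ]; then echo "PASS"
  else echo "FAIL AssertionError: stale cached profile"; return 1
  fi
}

rm -f /tmp/patch-applied
pre1=$(run_test); pre2=$(run_test)    # pre-fix: expect the SAME failure twice
touch /tmp/patch-applied              # stands in for the in-test monkey-patch
post1=$(run_test); post2=$(run_test)  # post-fix: expect green twice

if [ "$pre1" = "$pre2" ] && [ "$post1" = "PASS" ] && [ "$post2" = "PASS" ]; then
  echo "F->P verified"
fi
```

Divergent pre-fix signatures would mean flakiness or two distinct bugs, exactly the case the invariant guards against.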
### 6.5. Present Findings & Escalate (WAIT)

Before asking where to route the fix, you MUST present a human-readable findings summary to the user. Do NOT jump straight to AskUserQuestion — the user chooses the escalation target based on this summary, so it has to be visible first. HYPOTHESES-<slug>.md is a working scratchpad, not a substitute for a user-facing report.
#### 6.5a — Present Findings Summary & Persist State
Output the following markdown block directly in the chat AND write the same block to <PRIMARY_ROOT>/.geniro/state/debug/findings-state.md — resolve <PRIMARY_ROOT> per ${CLAUDE_PLUGIN_ROOT}/skills/_shared/primary-worktree.md Mode A so the handoff survives worktree teardown (single file per branch, overwritten on each run — mirrors the /geniro:review state-artifact convention). Fill with the actual values from your investigation. Use "none" for any field that truly doesn't apply — don't omit fields. Prepend an ISO-8601 timestamp header (# Debug Findings — <timestamp>) to the file version so downstream skills and resumed sessions can tell stale artifacts from fresh ones. Capture Source branch: from git branch --show-current and Source worktree: from git rev-parse --show-toplevel so consumer skills (/geniro:follow-up, /geniro:implement) can detect when the user later starts work in a different worktree or branch — see ${CLAUDE_PLUGIN_ROOT}/skills/_shared/debug-handoff.md for the consumer contract.
## Debug Findings
**Source branch:** [from `git branch --show-current` — the branch debug ran on]
**Source worktree:** [from `git rev-parse --show-toplevel` — absolute path]
**Why escalating to <target>:** [one sentence — which target (`/geniro:follow-up` trivial / `/geniro:implement` non-trivial) and the concrete reason this scope fits it; user makes the final routing choice in 6.5b.]
**Root cause:** [one sentence, plain language — why the bug happens]
**Reproduction:** [exact steps that trigger the bug]
**Confirmed hypothesis:** [which numbered hypothesis from HYPOTHESES-<slug>.md was confirmed, and the test result that confirmed it]
**Rejected hypotheses:** [brief — which hypotheses were ruled out and why, so the user sees what was considered]
**Proposed fix:**
- Files: [path(s) that need to change]
- Change: [unified diff or before/after snippet]
- Rationale: [one sentence tying the change to the root cause]
**Evidence the fix works:** [what happened when you applied the patch in Step 6 — e.g., "failing test went green under in-test monkey-patch; production source untouched" (default), or "<n> production files edited as escape hatch and reverted; bug stopped reproducing"]
**Reproduction test:** [<path>, <F→P status — example: "verified red on current code; verified green under throwaway patch"> — OR — "escape hatch: <alternative guard with rationale>"]
**Special handling:** [codegen, migrations, schema changes, env/config updates — or "none"]
The receiving skill pre-loads findings from <PRIMARY_ROOT>/.geniro/state/debug/findings-state.md — the state file is the handoff channel, not a chat paste. Do NOT re-derive, reword, or inline the summary into the escalation command; the file path is the contract.
#### 6.5b — Escalation Decision
Only after the summary above is visible AND written to <PRIMARY_ROOT>/.geniro/state/debug/findings-state.md, use the AskUserQuestion tool (do NOT output options as plain text) with header "Escalate" and these options:
- Trivial — run /geniro:follow-up; pre-load findings from <PRIMARY_ROOT>/.geniro/state/debug/findings-state.md; suggests relocating authored tests if you switch branch/worktree — ≤2 files, obvious target, no architecture or auth/permissions change.
- Non-trivial — run /geniro:implement; pre-load findings from <PRIMARY_ROOT>/.geniro/state/debug/findings-state.md; suggests relocating authored tests if you switch branch/worktree — touches multiple modules, changes interfaces, needs architecture review, or introduces a new pattern.
- Cannot verify — request specific data from user — pick this when one or more hypotheses are unverified because the orchestrator's tools cannot reach the artifact. Trigger a follow-up AskUserQuestion with concrete options for the missing data (paste log line / run query / provide screenshot / dump env vars). When the data arrives, return to Step 3; do NOT escalate to fix mode yet.
- Leave it to me — the user will apply the patch manually using the state file as reference. Skip to Step 7.
Do NOT auto-invoke the next skill — surface the suggestion only. The user runs the slash command themselves; the state file is the handoff channel. You do NOT apply the patch yourself. Full-suite validation is the receiving skill's responsibility.
### 7. Document

- Update .geniro/state/debug/HYPOTHESES-<slug>.md with final outcome
- Extract Learnings: Follow the canonical rubric in skills/_shared/learnings-extraction.md. Bias hard toward flow, architectural, and recurring-mistake learnings; do NOT save narrow interface/field shapes, single-file behaviors, or facts re-derivable by reading the code. Apply the Reflect → Abstract → Generalize pre-pass before every save: if you cannot restate the finding one level up, drop it. Route per canonical: transferable debugging insights and class-of-bugs patterns → learnings.jsonl; user corrections during investigation → feedback_* memory; project-wide ongoing-investigation facts → project_* memory. UPDATE existing entries rather than duplicate. Skip if nothing novel.
### 8. Suggest Improvements (project scope only)
After documenting, follow the canonical routing in skills/_shared/improvement-routing.md. Debug runs typically surface: (a) coding conventions / style or naming patterns discovered while reproducing the bug → .claude/rules/<scope>.md with paths: glob frontmatter (Anthropic-native, file-scoped — auto-loads when matching files are touched); (b) docs that described behavior not matching reality → CLAUDE.md or project docs; (c) new/changed commands discovered during debugging → CLAUDE.md; (d) non-obvious debugging insights or workarounds → learnings.jsonl; (e) skill-behavior quality gates / workflow steps the user enforced manually for debug runs → .geniro/instructions/debug.md. Plugin-internal paths (${CLAUDE_PLUGIN_ROOT}/…) are out of scope — use /improve-template.
## Adversarial Mode: Verify Last Changes

### A. Purpose
Attacker-mindset pass that AUTHORS executable F→P failing tests against a diff. Complements the scientific-method mode: that mode REPORTS hypotheses about a known bug; adversarial mode hunts for unknown bugs in recent changes by writing tests that fail on today's code. Test authoring is delegated to adversarial-tester-agent; the orchestrator independently re-runs authored tests to confirm the failure before surfacing findings.
### B. Diff Resolution
Resolve the diff scope using the same multi-form parser as /geniro:review Phase 1 (see ${CLAUDE_PLUGIN_ROOT}/skills/review/SKILL.md §Phase 1: Collect Context & Triage — do NOT duplicate the parser here).
Default when no explicit range is passed: scope follows ${CLAUDE_PLUGIN_ROOT}/skills/_shared/scope-anchor.md — anchor on the current cwd's worktree and the currently checked-out branch; do NOT invoke gh pr list or git checkout to discover targets. Resolve the base branch per scope-anchor rule #3 (git symbolic-ref --short refs/remotes/origin/HEAD → typically origin/main or origin/master; fall back to local main/master if no remote — do NOT hardcode main). Compute git diff <base>...HEAD (covers everything on the current branch not on the base) and git diff --name-only <base>...HEAD for the file list. If the current branch IS the base branch, fall back to HEAD~1..HEAD. Include uncommitted work via a trailing git diff (unstaged) + git diff --cached (staged) snapshot when present.
Supported shapes:
- Bare keyword ("verify last changes") → default (<base>...HEAD per scope-anchor rule #3, or HEAD~1..HEAD if currently on the base branch)
- Explicit range (HEAD~3..HEAD, abc123..def456)
- Branch (feat/foo...HEAD)
- PR ref (strip leading #, resolve via gh pr diff <number-or-url>)
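The default-scope resolution above can be sketched in shell; a throwaway repo stands in for the project checkout here, since a real run anchors on the current worktree instead.

```shell
# Throwaway repo so the sketch is self-contained (git init -b needs git >= 2.28).
tmp=$(mktemp -d); cd "$tmp"
git init -q -b main .
git -c user.email=a@b -c user.name=a commit -q --allow-empty -m init
git checkout -q -b feat/demo
git -c user.email=a@b -c user.name=a commit -q --allow-empty -m change

# Rule #3: prefer origin's default branch, fall back to local main/master.
base=$(git symbolic-ref --short refs/remotes/origin/HEAD 2>/dev/null)
if [ -z "$base" ]; then
  for b in main master; do
    git show-ref --verify --quiet "refs/heads/$b" && { base=$b; break; }
  done
fi

current=$(git branch --show-current)
if [ "${base#origin/}" = "$current" ]; then
  range="HEAD~1..HEAD"   # already on the base branch itself
else
  range="$base...HEAD"   # everything on this branch not yet on base
fi
echo "$range"
```

`git diff --name-only "$range"` then yields the changed-file list the skip matrix in §C consumes.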
### C. Skip Conditions
Apply the same skip-matrix philosophy as skills/follow-up/SKILL.md Step 1.5 (see that file for the canonical matrix — do NOT duplicate). Adversarial mode is SKIPPED and the skill reports "no adversarial pass — <reason>" when any of these hold:
- Empty diff (nothing to test)
- Diff contains zero production-code files (docs / config / lock / generated only)
- Diff is >50 changed files OR >1000 changed LOC → suggest /geniro:review for oversized diffs (the agent's 10-test hard cap wastes budget on diffs this large)
### D. Adversarial Workflow
1. Resolve the diff (see B). Pre-inline full diff + changed-file contents for the spawn prompt.
2. Detect the project test framework. Read CLAUDE.md Essential Commands section + package.json scripts / pyproject.toml / Cargo.toml to extract (a) the test command, (b) test-file naming convention, (c) 1–2 exemplar test files closest to the changed code.
3. Spawn adversarial-tester-agent — see Spawn Template in §E below.
4. Independently re-run authored tests. Read the agent's report at <PRIMARY_ROOT>/.geniro/state/debug/adversarial-tests.md, extract authored test file paths, then run the project test command once per authored test. Single independent re-run per authored test (the agent already ran its own 3× flake check per its Step 5; duplicating would waste budget). Any test that does not fail deterministically on the re-run is deleted from disk AND removed from the report.
5. Present Adversarial Findings — see §F below.
6. Escalate. Reuse the Step 6.5b AskUserQuestion (header "Escalate") with three of its options — Trivial → /geniro:follow-up, Non-trivial → /geniro:implement, Leave-it-to-me. Before asking, ensure the Adversarial Findings summary from §F has been written to <PRIMARY_ROOT>/.geniro/state/debug/adversarial-tests.md (the agent already wrote it at Step 3; append the re-verification delta if tests were discarded). The escalation option labels MUST reference that file by path (e.g., "Trivial — run /geniro:follow-up; pre-load findings from <PRIMARY_ROOT>/.geniro/state/debug/adversarial-tests.md") — the authored test file paths inside are the escalation targets, and the receiving skill applies the fix and confirms the now-green test suite. If zero red tests survived re-verification, SKIP Step 6.5b entirely — report "no bugs found in scanned diff" and go to DoD.
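The independent re-verification described above can be sketched like this; the report line format (`authored: <file>`) and the stand-in test command are assumptions for the demo, not the agent's real report schema.

```shell
# Fake report listing two authored tests.
report=/tmp/adversarial-tests.md
printf '%s\n' 'authored: tests/a.test.ts' 'authored: tests/b.test.ts' > "$report"

# Stand-in for "<project test command> <file>"; "false" keeps every test red.
run_one() { false; }

: > /tmp/rerun.log
grep '^authored: ' "$report" | sed 's/^authored: //' | while read -r t; do
  if run_one "$t"; then
    echo "discard: $t passed on re-run; delete it and drop it from the report"
  else
    echo "confirmed red: $t"
  fi
done >> /tmp/rerun.log
cat /tmp/rerun.log
```

Only tests that stay red on this single orchestrator-side run survive into the findings summary.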
E. Spawn Template
Agent(subagent_type="adversarial-tester-agent", prompt="""
## Task: Adversarial Edge-Case Test Authoring (Debug — Verify Changes)
WORKTREE: [from `git rev-parse --show-toplevel`]
BRANCH: [from `git branch --show-current`]
### Diff (changed files + contents)
[Pre-inline `git diff <resolved-range>` output AND full contents of every changed source file from Step 1]
### Shared Edge-Case Checklist (READ this file yourself at runtime — do NOT paste here)
`${CLAUDE_PLUGIN_ROOT}/skills/review/tests-criteria.md`
### Project Test Framework
- Test command (from CLAUDE.md Essential Commands): [e.g. `pnpm test`, `pytest`]
- Test-file naming convention: [project's pattern — e.g. `*.test.ts` adjacent to source]
- Exemplar test files (1-2, pre-inlined): [closest existing test files to the changed code]
### Hypothesis Seeds
none — adversarial mode runs a fresh pass (no prior reviewer findings available in debug).
### Output
Write your report to `<PRIMARY_ROOT>/.geniro/state/debug/adversarial-tests.md` (resolve `<PRIMARY_ROOT>` per `${CLAUDE_PLUGIN_ROOT}/skills/_shared/primary-worktree.md` Mode A). Authored test files go to the project's normal test paths. Do NOT git add/commit/push.
### F→P Invariant (NON-NEGOTIABLE)
Every test you keep MUST fail 3 times in a row on the current code. If it passes today, delete the test and mark `discarded-cannot-repro`. Flaky = discard.
### Scope
Diff-only — the orchestrator resolved the scope above. Do NOT author tests for files outside the changed-files list. Hard cap: 10 authored tests.
Anchor: stay within WORKTREE on BRANCH — verify with `pwd && git branch --show-current` on first Bash call; abort if either differs. See `skills/_shared/scope-anchor.md` § Subagent spawn anchor.
""", description="Adversarial tests: /geniro:debug verify-changes")
F. Findings Summary
After re-verification, present this block directly in chat:
## Adversarial Findings
**Diff scope:** [range + file count + LOC]
**Hypotheses generated:** [N]
**Tests authored (kept after re-verify):** [M]
**Tests discarded (F→P failed on re-run):** [K]
### CRITICAL / HIGH findings
[For each: test file path, targeted source, category, confidence, hypothesis, reproduction command, suggested direction for fix (NOT the patch itself)]
### MEDIUM findings
[same shape]
### Discarded / Inconclusive
[brief list with reasons]
**Zero red tests?** [If M == 0 after re-verify: state plainly "no bugs found in scanned diff" — this is a valid outcome.]
If zero red tests survive, skip escalation entirely and go directly to Cleanup/DoD. Otherwise proceed to escalation per §D Step 6.
Escalation Limits (Scientific-Method Mode)
These limits apply to scientific-method mode only. Adversarial mode inherits the agent-level stop rules defined in agents/adversarial-tester-agent.md (5 consecutive discards stop hypothesis generation; 10 authored tests is the hard cap per run).
- Hypothesis testing: If 5 hypothesis tests across all hypotheses are inconclusive, do NOT escalate directly to the user with findings — first run the missing-data gate (Step 3, "Missing-data gate"). Use the AskUserQuestion tool (do NOT output options as plain text) with header "Missing data" and concrete options for the specific artifacts that would flip an inconclusive into a result (logs, query output, screenshot, env-var dump, repro from a real environment). Only escalate to user-with-findings if the user explicitly says they cannot supply the artifact; at that point the investigation likely needs domain expertise or more reproduction data.
- Fix attempts: If 2 fix attempts fail verification, stop and present findings to the user via the AskUserQuestion tool (do NOT output options as plain text) with header "Fix-fail" and these options:
- Try different approach — go back to Step 2 (Hypothesize) with a fresh angle.
- Escalate to /geniro:implement — hand off for deeper architectural rework.
- Show investigation summary — print the current HYPOTHESES-<slug>.md state and stop.
Git Constraint
Do NOT run git add, git commit, git push, or git checkout. The orchestrating skill handles all version control. You may use git bisect, git log, git diff, and git blame for investigation.
Fix Constraint
Do NOT apply the bug fix to production/source code. You MAY Write/Edit for two distinct purposes:
- Deliverables (kept on disk, ship with the fix): the reproduction test authored in Step 6, placed at the project's normal test path
- Experiments (reverted before escalation): debug logging, temporary print statements, scratch scripts, throwaway patches used in Step 6 to verify the root cause, and .geniro/state/debug/HYPOTHESES-<slug>.md working notes
The actual fix is delivered as a text proposal (diff or before/after) and escalated via Step 6.5 to /geniro:follow-up (trivial) or /geniro:implement (non-trivial). The reproduction test is the regression guard the receiving skill turns green; this keeps architecture/review gates in play and preserves a clean audit trail.
Isolation Techniques
When narrowing down the bug source:
- Binary search: Disable half the relevant code path, check if bug reproduces. Narrow the range iteratively.
- Git bisect: For regressions, use git bisect to identify the commit that introduced the bug.
- Profiling: For performance bugs, use profiling tools to get quantitative data (timing, memory) rather than inspecting code.
Infrastructure Investigation
When symptoms suggest the bug may not be in the code (timeouts, intermittent failures, environment-specific errors, deployment regressions), investigate infrastructure before or alongside code hypotheses:
Logs & error tracking:
- Check application logs for error spikes, unusual patterns, or upstream failures (docker logs, cloud logging CLI, log aggregator)
- Look for correlation: did errors start at a specific time? Does that coincide with a deployment, config change, or infrastructure event?
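The correlation check is easy to script: bucket ERROR lines by minute and read off when the spike began. The log lines below are stubbed inline so the sketch is self-contained; it assumes ISO-8601 timestamps at the start of each line.

```shell
# Count ERROR lines per minute; the top bucket marks the spike onset.
log=$(mktemp)
cat > "$log" <<'EOF'
2026-04-01T12:01:10Z INFO ok
2026-04-01T12:01:55Z ERROR timeout
2026-04-01T12:02:03Z ERROR timeout
2026-04-01T12:02:41Z ERROR timeout
2026-04-01T12:02:59Z ERROR timeout
EOF
# cut -c1-16 keeps "YYYY-MM-DDTHH:MM" — the per-minute bucket key.
spike=$(grep ' ERROR ' "$log" | cut -c1-16 | sort | uniq -c | sort -rn \
        | head -n1 | awk '{print $2}')
echo "error spike began at minute: $spike"
```

Compare that minute against deploy timestamps and config-change history before forming code hypotheses.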
Service health:
- Check database connectivity and query performance — connection pool exhaustion and slow queries are common silent killers
- Check external service dependencies — are APIs returning errors or timing out?
- Check container/process health — OOM kills, restart loops, CPU throttling
Environment & config:
- Compare environment variables between working and broken environments
- Check for recent config changes, secret rotations, or certificate expirations
- Verify DNS resolution, network connectivity, and firewall rules
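Comparing environments is mechanical once each side is captured with `env | sort`. A minimal sketch, with both dumps stubbed inline so it runs standalone; the variable names are hypothetical.

```shell
# Diff env dumps from a working and a broken environment.
working=$(mktemp); broken=$(mktemp)
printf 'API_URL=https://api.internal\nCACHE_TTL=300\nDB_POOL_SIZE=20\n' > "$working"
printf 'API_URL=https://api.internal\nCACHE_TTL=300\nDB_POOL_SIZE=2\n'  > "$broken"
# comm -3 suppresses lines common to both; what remains are the suspects.
drift=$(comm -3 "$working" "$broken")
echo "$drift"
```

Each surviving line is a concrete hypothesis seed ("DB_POOL_SIZE dropped from 20 to 2") rather than a vague "the environments differ".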
Resource limits:
- Check memory usage, CPU utilization, disk space, and file descriptor limits
- Check database connection pool size vs active connections
- Check rate limits on external APIs
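A one-shot snapshot covers the limits that most often bite. This assumes a POSIX shell with coreutils; the 90% threshold is illustrative, not a standard.

```shell
# Snapshot file-descriptor ceiling and disk usage for the current host.
fd_limit=$(ulimit -n)
# df -P guarantees the portable single-line-per-filesystem format;
# column 5 is the use percentage, e.g. "42%".
disk_pct=$(df -P . | awk 'NR==2 { sub("%", "", $5); print $5 }')
echo "open-file limit: $fd_limit"
echo "disk used (cwd volume): ${disk_pct}%"
if [ "$disk_pct" -ge 90 ]; then echo "WARN: disk above 90%"; fi
```

Record the numbers in the hypothesis file; "disk at 97%" is evidence, "disk looks full" is not.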
Form infrastructure hypotheses with the same rigor as code hypotheses — record them in HYPOTHESES-<slug>.md with test plans. "The database connection pool is exhausted under load" is a testable hypothesis; "something is wrong with the server" is not.
Compliance — Do Not Skip Steps
| Your reasoning | Why it's wrong |
|---|---|
| "It's probably a cache issue" — guess and code | Guesses waste time. Form a hypothesis, then test it with evidence. |
| "I know what this is, let me just fix it" | Intuition-based fixes mask the real cause. Gather evidence first. |
| "It looks right, no need to test" | "Looks right" is the #1 predictor of broken fixes. Run the tests. |
| "Let me fix these three things at once" | Multi-variable changes make it impossible to know what worked. Test one hypothesis at a time. |
| "The error message says X, so it must be X" | Error messages lie. Verify with logs, debuggers, and traces. |
| "The fix is one line, I'll just write it and escalate nothing" | Escalate every fix. One-line fixes go to /geniro:follow-up; the architecture/review gate still applies. |
| "I added experimental logging and while I'm here I'll patch the bug too" | Experiments and fixes are separate deliverables. Revert experimental edits; escalate the proposed patch. |
| "The user said just fix it" | If the user explicitly overrides, pick "Leave it to me" in Step 6.5 and produce the patch as text — still do NOT write it to source. The user applies it manually. |
| "Changes look fine, I'll skip adversarial mode" | "Looks fine" is the attacker's favorite surface. If the user asked for verify-changes, run the adversarial pass — a zero-red-tests outcome is still a valid deliverable, but only after the agent actually ran. |
| "Small diff, adversarial pass is overkill" | The 10-test hard cap and single-agent cost make adversarial mode cheap even on small diffs. Skip only when the skip-matrix rules fire (empty / docs-only / oversized), not on vibes. |
| "I'll reason about edges instead of authoring tests" | Reasoning is reviewer-mindset. Adversarial mode AUTHORS executable failing tests because reasoning misses what running code catches. Delegate to the agent. |
| "The agent reported F→P, I'll trust it" | The orchestrator MUST independently re-run authored tests per the agent's own Delegation Boundary. Self-reported F→P is evidence, not proof. |
| "A finding improves an agent prompt, I'll include it in Step 8" | Plugin files are out of scope. Suggest only project-owned targets (CLAUDE.md, .geniro/instructions/, .geniro/knowledge/learnings.jsonl). |
| "The findings are in HYPOTHESES-<slug>.md, I'll just ask the escalation question" | HYPOTHESES-<slug>.md is a scratchpad, not a user-facing report. Step 6.5a requires an explicit findings summary in chat AND persisted to <PRIMARY_ROOT>/.geniro/state/debug/findings-state.md before the escalation question — the user decides where to route based on the chat summary, and the receiving skill pre-loads from the state file. |
| "I'll paste the full findings summary into the escalation command" | The escalation options reference <PRIMARY_ROOT>/.geniro/state/debug/findings-state.md by path — that file IS the handoff. Inlining the summary into the command bloats context and lets the two copies drift. File path only. |
| "The hypothesis matches the symptom — that's confirmation" | Symptom-matching is correlation, not causation. A hypothesis is confirmed only by reproduction with a captured artifact per the Evidence Standard, not by "the story fits". |
| "I have no DB / log / production access — mark this hypothesis inconclusive" | Inconclusive-by-default is a fabrication shortcut. Run the Step 3 missing-data gate first: AskUserQuestion for the specific artifact. Only mark inconclusive if the user confirms they cannot supply it. |
| "The user described the reproduction verbally, that's enough" | Verbal repro is a hypothesis seed, not a re-runnable artifact. Step 6 requires a captured artifact (failing test, script, curl + response). Convert verbal repro to a captured form, or ask the user to paste the actual output. |
| "I have a script / curl / query that reproduces the bug, that's enough" | Scripts get deleted at Cleanup and leave no regression guard. Step 6 mandates the reproduction be authored as a unit or integration test in the project's test framework — that test ships with the fix as the regression gate. The escape hatch is invoked by the orchestrator only when the bug is genuinely non-reproducible at the test layer (race conditions under load, environment-only, UI flake), with the user choosing the alternative guard via AskUserQuestion and the rationale recorded — not a way to skip Step 6 silently. |
| "The agent reported the hypothesis confirmed — I'll trust it and move on" | Self-reported confirmation is evidence, not proof — the same rule that already governs adversarial mode (see existing row about F→P self-reports). The orchestrator MUST independently re-run the test / re-read the file:line / re-execute the query before advancing to Step 4 Isolate. |
| "Per protocol I should ask via AskUserQuestion, but this specific intermediate question isn't in the enumerated gates — I'll inline (A)/(B) in chat" | The Universal Rule at the top of this skill makes the tool mandatory for ANY choice question, not just the enumerated ones. Ad-hoc gates ("verify root cause vs band-aid", "need runtime data — proceed how?", "this hypothesis needs your input") are exactly when the rule fires hardest. If you catch yourself rationalizing "but this case is different / needs runtime confirmation / is just a quick check" — stop and call the tool. |
| "I'll name the reproduction test after the confirmed hypothesis number from HYPOTHESES-<slug>.md — it's the cleanest reference back to the investigation." | HYPOTHESES-<slug>.md gets deleted at Cleanup; the test ships with the fix. A name like Bug C or Hypothesis 2 reproduction is meaningless to whoever reads the test in CI weeks later. The test name AND any comments inside it must describe the bug behavior on their own. |
| "I see two valid fixes for this root cause — I'll just pick one and write the text proposal" | Step 5's multi-path fix gate (Always-WAIT, per § Universal Rule above) requires AskUserQuestion whenever the root cause has more than one valid fix path with real trade-offs. The single-text-proposal default applies only when one fix is obviously right. Picking one path silently ships a product decision the user did not authorize. If you catch yourself reasoning "Option A is cleaner / more idiomatic / smaller diff" — that's the rationalization; both options are valid by your own evidence, and the user picks. |
Cleanup
After the debug session completes (fix verified or escalated):
Cleanup is best-effort — if a command fails, ignore the failure and continue.
Definition of Done
For each debug session, confirm the checklist for the mode that ran.
Scientific-Method Mode
Adversarial Mode
When to Use This Skill
Use /geniro:debug:
- Bug has unclear root cause
- Quick fix didn't work and you need to understand why
- Bug is intermittent or hard to reproduce
- You're tempted to guess at a fix
- Multiple possible causes exist
- Bug involves async code, concurrency, or state
Don't use:
- Obvious one-line fix (typo, off-by-one) — go straight to /geniro:follow-up
- Bug is already understood and fix is clear — /geniro:follow-up or /geniro:implement directly
- Need system-wide refactor — /geniro:implement
Remember: debug investigates and proposes — it never applies the fix. If the proposed patch looks obvious after Step 4, that's a signal you should have gone straight to /geniro:follow-up in the first place.
Examples
Example 1: Cache Not Invalidating
/geniro:debug User sees stale data after profile update
→ Observe: User updates name, refresh page shows old name
→ Hypothesis 1: Cache invalidation broken
→ Hypothesis 2: Update endpoint not called
→ Test: Add logging to cache invalidation and endpoint
→ Result: Hypothesis 1 confirmed (cache key mismatch)
→ Propose: patch cacheKey builder in src/cache/user.ts to include user ID
→ Verify root cause: local experiment shows bug disappears with patch (reverted)
→ Escalate: /geniro:follow-up with the proposed patch
Example 2: Intermittent Timeout
/geniro:debug API endpoint times out randomly under load
→ Observe: Happens ~5% of requests during stress test
→ Hypothesis 1: Database query too slow
→ Hypothesis 2: External service timeout
→ Test: Profile database queries, check service logs
→ Result: Hypothesis 2 confirmed (service is slow)
→ Propose: add timeout and fallback around the external service call
→ Verify root cause: local experiment shows timeouts disappear with patch (reverted)
→ Escalate: /geniro:implement with the proposed patch
Example 3: Memory Leak
/geniro:debug Heap grows unbounded in React component
→ Observe: Memory increases 10MB/min over 1 hour
→ Hypothesis 1: Event listener not cleaned up
→ Hypothesis 2: Large cache never evicted
→ Test: Add heap snapshot profiling, check cleanup
→ Result: Hypothesis 1 confirmed (useEffect missing cleanup)
→ Propose: add return cleanup function to the offending useEffect
→ Verify root cause: local experiment shows heap stabilizes with patch (reverted)
→ Escalate: /geniro:follow-up with the proposed patch