with one click
hunt
// Finds root cause of errors, crashes, regressions, screenshot-reported defects, unexpected behavior, and failing tests before applying any fix. Not for code review or new features.
// Finds root cause of errors, crashes, regressions, screenshot-reported defects, unexpected behavior, and failing tests before applying any fix. Not for code review or new features.
Reviews code diffs and release-ready changes after implementation, runs on-demand project-wide code-quality audits with a 4-axis scorecard, executes approved implementation plans, extracts project-specific constraints from repository context, auto-fixes safe issues, and drives approved release, publish, push, release-reaction, and issue/PR follow-through. Also triages issues and PRs when the user mentions them. Not for exploring ideas, debugging, or document prose review.
Produces distinctive, production-grade UI for components, pages, and visual interfaces, with artifact-grounded screenshot iteration for real app polish. Not for backend logic or data pipelines.
Generate visually striking PPT slides via OpenAI's gpt-image-2 -- 10 curated styles (Spatial Glass / Tech Blue / Editorial Mono / Dark Aurora / Risograph / Wabi / Swiss Grid / Hand Sketch / Y2K Chrome / Retro Vector) plus a template-clone mode that mimics any user-supplied .pptx; ships an HTML viewer and a 16:9 .pptx. Use when the user asks to make a presentation, slides, deck, pitch deck, investor PPT, magazine-style PPT, or 做一份 PPT / 生成幻灯片 / 用 gpt-image 生成 PPT / 按这个模板生成 PPT.
Runs a budget-aware Agent Health audit for Codex, Claude Code, agent instructions, verifier surfaces, and AI maintainability when agents ignore instructions, hooks/MCP fail, validation is missing, or AI-written code is hard to maintain. Flags issues by severity. Not for debugging code or reviewing PRs.
Runs a six-phase research workflow to turn unfamiliar domains or collected sources into publish-ready output. Not for quick lookups or single-file reads.
Fetches any URL or PDF as clean Markdown for reading, quoting, citation, or downstream work. Handles paywalls, JS-heavy pages, X/Twitter, and Chinese platforms via proxy cascade. Not for local text files already in the repo.
| name | hunt |
| description | Finds root cause of errors, crashes, regressions, screenshot-reported defects, unexpected behavior, and failing tests before applying any fix. Not for code review or new features. |
| when_to_use | 排查, 查查, 报错, 崩溃, 不工作, 不对, 跑不通, 以前是好的, 回归, 截图回归, 判断错误原因, 判断为什么报错, 反复修不好, debug, regression, used to work, broke after update, why broken, not working, what's wrong, fix error, stack trace |
| dispatch_intent | Error, crash, regression, screenshot-reported defect, test failure, stale cache, runtime boundary, why broken |
Prefix your first line with 🥷 inline, not as its own paragraph.
A patch applied to a symptom creates a new bug somewhere else.
Do not touch code until you can state the root cause in one sentence:
"I believe the root cause is [X] because [evidence]."
Name a specific file, function, line, or condition. "A state management issue" is not testable. "Stale cache in useUser at src/hooks/user.ts:42 because the dependency array is missing userId" is testable. If you cannot be that specific, you do not have a hypothesis yet.
Good progress: a log line matches the hypothesis, you can predict the next error before running it, you understand the propagation path from root cause to symptom, you can write a test that fails on the old code. At each of these signals, find one more independent piece of evidence before committing.
Hypothesis quality gate: before acting on a hypothesis, list all observable symptoms (not just the one the user reported first). The hypothesis must explain every symptom; if it only covers some, it is a symptom-level guess, not a root cause. For timing-dependent issues (flicker, intermittent failure, race condition), reproduce reliably before diagnosing.
Rationalization warning: "I'll just try this" means no hypothesis, write it first. "I'm confident" means run an instrument that proves it. "Probably the same issue" means re-read the execution path from scratch. "It works on my machine" means enumerate every env difference before dismissing. "One more restart" means read the last error verbatim; never restart more than twice without new evidence.
See rules/durable-context.md for when to read durable context, the read-order budget, and the memory-type mapping.
For /hunt, diagnostic constraints are decision, preference, and principle entries; pattern and learning can seed hypotheses. Current code, logs, repro steps, tests, environment versions, and remote state override memory. Durable context is hypothesis fuel only. It never replaces a fresh root-cause sentence, a reproducible symptom list, or evidence from the current state.
sw_vers / node --version / grep first. No results = re-examine the path.Activate when: "以前是好的", "之前是好的", "used to work", "上一次提交还是对的", "broke after update", or the user remembers a specific good commit or version.
git tag --sort=-version:refname | head -10 or ask the user for the last known-good commit.git bisect start && git bisect bad HEAD && git bisect good <tag-or-hash>git bisect good or git bisect bad.git bisect reset when done.Read large files once and reference from notes rather than re-reading at each bisect step.
Activate when the user says the same issue is still wrong after a fix, provides a "good" screenshot/version/file, or describes a visual result as previously correct.
Treat the reference as evidence, not decoration:
If the issue is purely subjective UI taste, route to /design. If it is rendering, state, timing, build output, font generation, or a regression from a known-good version, stay in /hunt.
Activate after fixing a root-cause pattern, before declaring the bug done; also when the user says "举一反三", "举一反三深入看看", or "其他地方有没有同样问题". The same shape often hides in N other places; one local fix that ignores the blast leaves N - 1 bugs in the tree.
grep -rn <pattern> across the repo (exclude generated dirs, build output, vendored deps). For class-of-bug patterns (e.g. "any handler missing the lock"), grep for the surrounding shape, not just the literal text.Common triggers:
If the blast surfaces unrelated bugs, list them but do not fix in this PR unless the user agrees; scope creep is its own anti-pattern.
Add one targeted instrument: a log line, a failing assertion, or the smallest test that would fail if the hypothesis is correct. Run it. If the evidence contradicts the hypothesis, discard it completely and re-orient with what was just learned. Do not preserve a hypothesis the evidence disproves.
Use this ladder before claiming a bug is fixed:
Compile-only is not enough for UI, native-app, visual, rendering, or generated-artifact bugs. If the runtime check is impossible in the environment, say why and hand off the exact screen, command, or artifact to verify.
For recurring classes of failures, load references/failure-patterns.md before adding a second fix.
Use logs as a scalpel, not as noise. Before adding a log, write the question it answers:
"If this log prints X before Y, hypothesis A is still possible; if it does not, hypothesis A is wrong."
Load references/logging-techniques.md for the full logging playbook: binary-search instrumentation, discriminating log content, boundary-first placement, timing bug logging, and removal discipline.
Quick rules:
If adding logs changes the behavior, treat that as evidence of a timing, lifecycle, or concurrency problem.
| What happened | Rule |
|---|---|
| Patched client pane instead of local pane | Trace the execution path backward before touching any file |
| MCP not loading, switched tools instead of diagnosing | Check server status, API key, config before switching methods |
| Orchestrator said RUNNING but TTS vendor was misconfigured | In multi-stage pipelines, test each stage in isolation |
| Race condition diagnosed as a stale-state bug | For timing-sensitive issues, inspect event timestamps and ordering before state |
| Added logs everywhere and still could not explain the bug | Rewrite each log as a yes/no question. Delete logs that do not rule a hypothesis in or out |
| Reproduced locally but failed in CI | Align the environment first (runtime version, env vars, timezone), then chase the code |
| Stack trace points deep into a library | Walk back 3 frames into your own code; the bug is almost always there, not in the dependency |
| Worked when launched from app, broke when opened via file association / drag-drop / deep link / external proxy | Reproduce using the exact entry point the user described. App-internal init differs from cold-launch-with-file init; state may not be ready when the document arrives. |
| Build passed but UI still looked wrong | Move up the Runtime Evidence Ladder and verify the real rendered surface or artifact. |
Root cause: [what was wrong, file:line]
Fix: [what changed, file:line]
Confirmed: [evidence or test that proves the fix]
Tests: [pass/fail count, regression test location]
Regression guard: [test file:line] or [none, reason]
Status: resolved, resolved with caveats (state them), or blocked (state what is unknown).
Regression guard rule: for any bug that recurred or was previously "fixed", the fix is not done until:
Symptom:
[Original error description, one sentence]
Hypotheses Tested:
1. [Hypothesis 1] → [Test method] → [Result: ruled out because...]
2. [Hypothesis 2] → [Test method] → [Result: ruled out because...]
3. [Hypothesis 3] → [Test method] → [Result: ruled out because...]
Evidence Collected:
- [Log snippets / stack traces / file content]
- [Reproduction steps]
- [Environment info: versions, config, runtime]
Ruled Out:
- [Root causes that have been eliminated]
Unknowns:
- [What is still unclear]
- [What information is missing]
Suggested Next Steps:
1. [Next investigation direction]
2. [External tools or permissions that may be needed]
3. [Additional context the user should provide]
Status: blocked
Activate when: "PDF looks wrong", "page break issue", "font not rendering", broken PDF output, or print layout wrong.
Load references/rendering-debug.md for the full diagnosis checklist (WeasyPrint quirks, font loading, page overflow, browser print CSS). Static analysis first, then reproduce if needed.
For input method, character rendering, or text encoding bugs (IME state, cursor drift, emoji splitting, composition events), check references/ime-unicode.md first before forming a hypothesis.