result-diagnosis
// Diagnose surprising or negative ML/AI results. Use when methods fail, metrics conflict, seeds vary, baselines win, plots look suspicious, or next action is unclear.
| Field | Value |
| --- | --- |
| name | result-diagnosis |
| description | Diagnose surprising or negative ML/AI results. Use when methods fail, metrics conflict, seeds vary, baselines win, plots look suspicious, or next action is unclear. |
| argument-hint | [project-dir] [--result <summary>] [--mode quick\|full\|debug\|decision] |
| allowed-tools | Read, Write, Edit, Bash, Glob, WebSearch, WebFetch |
Diagnose what an experiment result means for the project. This skill is for decision-making after results exist, especially when they are negative, surprising, unstable, or hard to interpret.
Use this skill when:
Do not use this skill to write a polished report. Pair it with experiment-report-writer after the diagnosis is clear.
Pair this skill with:
- research-project-memory when the diagnosis should update claims, evidence, risks, actions, or worktree status
- experiment-report-writer when results need a shareable report
- algorithm-design-planner when the diagnosis points to method revision
- experiment-design-planner when the diagnosis requires a new controlled experiment
- run-experiment when the next step is a rerun, sanity check, or ablation
- conference-writing-adapter when the right action is to narrow or reframe paper claims

<installed-skill-dir>/
├── SKILL.md
└── references/
    ├── diagnosis-taxonomy.md
    ├── evidence-audit.md
    ├── next-decision-rules.md
    ├── report-template.md
    └── triage-protocol.md
- Read references/diagnosis-taxonomy.md, references/triage-protocol.md, and references/next-decision-rules.md.
- Read references/evidence-audit.md when inspecting logs, configs, metrics, plots, runs, or code state.
- Use references/report-template.md for full diagnosis reports.

Extract:
Rewrite vague input into:
Expected [method] to improve [metric/diagnostic] over [baseline] on [setting], but observed [result] under [controls].
If expected behavior was never defined, route back to experiment-design-planner.
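The expected-versus-observed statement can be treated as a template with six slots. The sketch below is one possible way to fill it and to detect undefined slots before diagnosis starts; the class name `ExpectedVsObserved` and its field names are illustrative assumptions, not part of the skill.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ExpectedVsObserved:
    """Slots of the one-sentence result statement (names are hypothetical)."""
    method: str
    metric: str
    baseline: str
    setting: str
    result: str
    controls: str

    def missing(self) -> List[str]:
        # Empty or whitespace-only slots count as undefined.
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self) -> str:
        gaps = self.missing()
        if gaps:
            # An undefined expectation is a reason to go back to experiment design.
            raise ValueError(f"undefined slots: {', '.join(gaps)}")
        return (
            f"Expected {self.method} to improve {self.metric} over {self.baseline} "
            f"on {self.setting}, but observed {self.result} under {self.controls}."
        )
```

A non-empty `missing()` list is a cheap signal that the expectation was never fully stated, which is the case that routes back to experiment-design-planner.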
Read references/diagnosis-taxonomy.md.
Classify the primary symptom:
Then classify likely diagnosis categories:
Read references/evidence-audit.md.
Prefer primary artifacts:
Mark missing evidence rather than guessing.
Read references/triage-protocol.md.
Use this order:
Stop early only when a blocking bug or invalid comparison is found.
For each plausible explanation, state:
At minimum consider:
Read references/next-decision-rules.md.
Choose one primary decision:
- debug: result is not trustworthy until a bug or provenance issue is resolved
- rerun: result is plausible but underpowered or missing controls
- ablate: result needs mechanism isolation
- revise-method: mechanism likely needs design change
- narrow-claim: evidence supports a smaller or different claim
- write: evidence is trustworthy enough to report
- park: result is inconclusive and not worth immediate compute
- kill: claim or direction is falsified under fair controls

Do not pick write if basic provenance or fairness is unresolved.
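The eight decision labels and the restriction on write can be sketched as an enum plus a guard. This is a minimal illustration; the function `check_decision` and its boolean flags `provenance_ok` and `fairness_ok` are hypothetical names, and the skill itself does not prescribe any code.

```python
from enum import Enum


class Decision(Enum):
    DEBUG = "debug"
    RERUN = "rerun"
    ABLATE = "ablate"
    REVISE_METHOD = "revise-method"
    NARROW_CLAIM = "narrow-claim"
    WRITE = "write"
    PARK = "park"
    KILL = "kill"


def check_decision(decision: Decision, provenance_ok: bool, fairness_ok: bool) -> Decision:
    """Return the decision unchanged, rejecting `write` while basics are unresolved."""
    if decision is Decision.WRITE and not (provenance_ok and fairness_ok):
        raise ValueError("cannot choose 'write': provenance or fairness is unresolved")
    return decision
```

Modeling the choice as a single enum value mirrors the rule that exactly one primary decision is selected.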
Use references/report-template.md for full reports.
If saving to a project and no path is given, use:
docs/diagnosis/result_diagnosis_YYYY-MM-DD_<short-name>.md
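As a rough sketch, the default path could be built as follows. The helper name `default_report_path` and the slug rule (lowercase, runs of other characters collapsed to hyphens) are assumptions; the skill only fixes the directory, the prefix, and the date format.

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional


def default_report_path(short_name: str, on: Optional[date] = None) -> Path:
    """Build docs/diagnosis/result_diagnosis_YYYY-MM-DD_<short-name>.md."""
    day = on or date.today()
    # Assumed slug rule: lowercase, non-alphanumerics collapsed to single hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", short_name.lower()).strip("-")
    if not slug:
        raise ValueError("short_name must contain at least one letter or digit")
    # date.isoformat() yields YYYY-MM-DD.
    return Path("docs/diagnosis") / f"result_diagnosis_{day.isoformat()}_{slug}.md"
```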
Required output:
# Result Diagnosis: [Short Name]
## Result Snapshot
## Expected vs Observed
## Symptom Classification
## Evidence Checked
## Competing Explanations
## Most Likely Diagnosis
## Decision
## Next Checks or Actions
## Claim Impact
## Project Memory Writeback
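
A draft report can be checked against the required sections listed above before it is saved. The following is a minimal sketch; the function name `missing_sections` is hypothetical, and it only looks for exact level-two headings.

```python
from typing import List

REQUIRED_SECTIONS = [
    "Result Snapshot",
    "Expected vs Observed",
    "Symptom Classification",
    "Evidence Checked",
    "Competing Explanations",
    "Most Likely Diagnosis",
    "Decision",
    "Next Checks or Actions",
    "Claim Impact",
    "Project Memory Writeback",
]


def missing_sections(markdown: str) -> List[str]:
    """Return required section titles that have no matching '## ' heading."""
    present = {
        line[3:].strip()
        for line in markdown.splitlines()
        if line.startswith("## ")
    }
    # Preserve template order so the gaps read top to bottom.
    return [title for title in REQUIRED_SECTIONS if title not in present]
```

An empty return value means every required heading is present; it says nothing about whether the sections are filled in.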
If the project uses research-project-memory, update:
- memory/evidence-board.md: observed result, limitations, and source paths
- memory/provenance-board.md: mark result provenance verified, stale, contradictory, or missing when diagnosis depends on source validity
- memory/claim-board.md: claims supported, weakened, revised, evidence-needed, provisional, parked, or cut
- memory/risk-board.md: bugs, metric risks, baseline risks, mechanism risks, or claim risks
- memory/action-board.md: debug, rerun, ablation, method revision, writing, park, or kill actions
- memory/handoff-board.md: create handoffs to method design, experiment design, paper evidence, or writing when diagnosis changes downstream work
- memory/phase-dashboard.md: update the active gate when diagnosis advances evidence production or regresses the project to debugging, method revision, or claim narrowing
- memory/decision-log.md: durable decisions such as killing a claim, changing method, or narrowing scope
- .agent/worktree-status.md: latest result and exit condition if a branch/worktree is involved

Use observed for verified results and inferred for explanations. Mark stale claims explicitly.
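
One way to keep the observed/inferred distinction explicit is to tag each memory entry when it is written. The sketch below is illustrative only: the `BoardEntry` class, the line format, and the rule that an observed entry must cite a source path are assumptions, not the format used by research-project-memory.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BoardEntry:
    text: str
    status: str  # "observed" for verified results, "inferred" for explanations
    sources: List[str] = field(default_factory=list)
    stale: bool = False

    def __post_init__(self) -> None:
        if self.status not in ("observed", "inferred"):
            raise ValueError(f"unknown status: {self.status!r}")
        # Assumption: a verified result should point at the artifact that verifies it.
        if self.status == "observed" and not self.sources:
            raise ValueError("observed entries need at least one source path")

    def to_line(self) -> str:
        # Stale entries are marked explicitly next to their status tag.
        tags = [self.status] + (["stale"] if self.stale else [])
        cited = f" (sources: {', '.join(self.sources)})" if self.sources else ""
        return f"- [{', '.join(tags)}] {self.text}{cited}"
```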
Before finalizing: