experiment-design
Transform validated hypotheses into rigorous, executable experiment designs
| Field | Value |
|---|---|
| name | experiment-design |
| description | Transform validated hypotheses into rigorous, executable experiment designs |
| version | 1.0.0 |
| category | experiment-execution |
| type | campaign |
| strategies | ["factor-level-design","ablation-design","comparison-design","scaling-design","robustness-design"] |
| tactics | ["statistical-method-selection","reproducibility-protocol","budget-constrained-design"] |
| dependencies | {"strategies":["ablation-design","comparison-design","experiment-execution-factor-level-design","robustness-design","scaling-design"],"tactics":["budget-constrained-design","reproducibility-protocol","statistical-method-selection"],"sops":["context-checkpoint","context-init","design-synthesis","experiment-execution-paper-overview","experiment-execution-paper-research","experiment-execution-paper-search","experiment-execution-quality-gate-check","experiment-execution-saturation-detection","experiment-execution-web-research","experiment-execution-web-search"]} |
Positioning: What experiment to run — transform a validated hypothesis into a rigorous experiment design that maximizes information yield per compute dollar.
Before entering this campaign, the following must be satisfied:
| Gate | Requirement |
|---|---|
| Hypothesis | A falsifiable hypothesis with clearly stated IV/DV exists |
| Scope | Research question is bounded (not open-ended exploration) |
| Resources | Preliminary compute/time budget is stated |
| Prior Work | Relevant baselines and datasets have been identified |
If any gate fails, route back to hypothesis-generation or research-question refinement.
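The gate check can be expressed as a simple all-must-pass test. This is a minimal sketch, not the campaign's actual implementation: the EntryGates record and its field names are hypothetical, and each boolean stands in for a human or agent judgment about the corresponding requirement.

```python
from dataclasses import dataclass


@dataclass
class EntryGates:
    """Hypothetical record of the four entry gates (field names are illustrative)."""
    hypothesis: bool   # falsifiable hypothesis with clearly stated IV/DV
    scope: bool        # research question is bounded
    resources: bool    # preliminary compute/time budget is stated
    prior_work: bool   # baselines and datasets have been identified


def check_entry(g: EntryGates) -> tuple[bool, list[str]]:
    """Return (may_enter, failed_gate_names); any failure means routing back upstream."""
    checks = {
        "Hypothesis": g.hypothesis,
        "Scope": g.scope,
        "Resources": g.resources,
        "Prior Work": g.prior_work,
    }
    failed = [name for name, ok in checks.items() if not ok]
    return (not failed, failed)
```

Returning the names of the failed gates, rather than a bare boolean, tells the caller what to fix before re-entering.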
Produce a complete experiment design document that specifies:
| Signal in Hypothesis | Strategy | When to Use |
|---|---|---|
| "Factor X affects Y" | factor-level-design | Testing effects of specific variables |
| "Component C contributes to performance" | ablation-design | Understanding component contributions |
| "Method M outperforms baseline B" | comparison-design | Claiming superiority over existing work |
| "Performance scales with resource R" | scaling-design | Understanding scaling behavior |
| "Method works under condition C" | robustness-design | Testing failure boundaries |
Multiple strategies may be composed for complex hypotheses.
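The routing table can be approximated with keyword matching. This is a deliberately naive sketch: the regex patterns are assumptions derived from the "Signal in Hypothesis" column, and a real router would need to read the hypothesis semantically. Returning every match rather than the first one reflects that strategies may be composed.

```python
import re

# Illustrative patterns only, one per row of the strategy-selection table.
SIGNALS = [
    (r"\baffects?\b", "factor-level-design"),
    (r"\bcontributes? to\b", "ablation-design"),
    (r"\boutperforms?\b", "comparison-design"),
    (r"\bscales? with\b", "scaling-design"),
    (r"\bworks under\b", "robustness-design"),
]


def select_strategies(hypothesis: str) -> list[str]:
    """Return every matching strategy in table order; several matches imply composition."""
    text = hypothesis.lower()
    return [strategy for pattern, strategy in SIGNALS if re.search(pattern, text)]
```

For example, "Method M outperforms baseline B and scales with data size" would route to both comparison-design and scaling-design.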
| Tier | GPU-hours | Max Factors | Max Runs | Strategy Constraint |
|---|---|---|---|---|
| Micro | < 10 | 3 | 20 | Fractional factorial or single ablation |
| Small | 10-100 | 5 | 50 | Full factorial on key factors |
| Medium | 100-1000 | 8 | 200 | Multi-strategy composition |
| Large | > 1000 | Unlimited | Unlimited | Full design space exploration |
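A quick feasibility check is to enumerate the full factorial over the planned factor levels and compare its size against the tier caps. This is a sketch under stated assumptions: the table's ranges share the 100 and 1000 GPU-hour endpoints, so placing exactly 100 in Small and exactly 1000 in Medium is a choice made here. Counting each seed replicate as a separate run is also an assumption.

```python
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    max_factors: Optional[int]  # None means unlimited
    max_runs: Optional[int]     # None means unlimited


def budget_tier(gpu_hours: float) -> Tier:
    """Map a GPU-hour budget to a tier from the table above."""
    # Assumption: exactly 100 GPU-hours is Small and exactly 1000 is Medium.
    if gpu_hours < 10:
        return Tier("Micro", 3, 20)
    if gpu_hours <= 100:
        return Tier("Small", 5, 50)
    if gpu_hours <= 1000:
        return Tier("Medium", 8, 200)
    return Tier("Large", None, None)


def full_factorial(levels: dict[str, list]) -> list[dict]:
    """Enumerate every combination of factor levels (one dict per run)."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in itertools.product(*levels.values())]


def fits_tier(levels: dict[str, list], tier: Tier, seeds: int = 1) -> bool:
    """Check factor count and total runs (configs x seeds) against the tier caps."""
    runs = len(full_factorial(levels)) * seeds  # assumption: each seed counts as a run
    ok_factors = tier.max_factors is None or len(levels) <= tier.max_factors
    ok_runs = tier.max_runs is None or runs <= tier.max_runs
    return ok_factors and ok_runs
```

Three factors at 3, 2, and 2 levels give 12 configurations. With 3 seeds that is 36 runs, which fits the Small tier but not Micro. When the check fails, the Micro row points toward a fractional factorial or a single ablation instead.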
Every campaign invocation must produce at minimum:
The research process is persisted to disk through context-management, in files separate from the final report:
experiment-design,
Create this campaign's process context file. init is idempotent: re-entering the same Phase returns the original file. The experiment-design-report file is written to disk (see that SOP).

Optional, no fixed order; the final leaf is always an SOP.
| Strategy | When to use |
|---|---|
| ablation-design | Design ablation studies to isolate component contributions in ML systems |
| comparison-design | Design fair comparison experiments against baselines and competing methods |
| experiment-execution-factor-level-design | Design factorial experiments to test how specific factors affect outcomes |
| robustness-design | Design experiments to identify failure boundaries and robustness limits |
| scaling-design | Design scaling experiments to characterize performance-resource relationships |
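A common ablation scheme is leave-one-out: run the full system once, then once more with each component disabled. The sketch below uses that scheme as an illustration and is not necessarily the one ablation-design prescribes. The component names and scores are hypothetical.

```python
def ablation_configs(components: list[str]) -> dict[str, dict[str, bool]]:
    """Full system plus one leave-one-out variant per component."""
    full = {c: True for c in components}
    configs = {"full": dict(full)}
    for c in components:
        variant = dict(full)
        variant[c] = False  # disable only this component
        configs[f"without_{c}"] = variant
    return configs


def contributions(scores: dict[str, float], components: list[str]) -> dict[str, float]:
    """Estimated contribution of each component: score(full) - score(without it)."""
    return {c: scores["full"] - scores[f"without_{c}"] for c in components}
```

Leave-one-out needs only n + 1 runs for n components. However, it cannot detect interactions between components. A factorial design over the same on/off switches can, at the cost of 2^n runs.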
Optional, no fixed order; the final leaf is always an SOP.
| Tactic | When to use |
|---|---|
| budget-constrained-design | Optimize experiment design under compute and time budget constraints |
| reproducibility-protocol | Ensure experiment reproducibility through systematic environment and seed control |
| statistical-method-selection | Select appropriate statistical methods for experiment analysis |
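When two methods are compared over the same seeds or data splits, one common analysis is a paired percentile bootstrap on the per-pair differences. This is offered as an example of the kind of method statistical-method-selection might choose, not as its actual rule. A paired t-test or Wilcoxon signed-rank test are standard alternatives. The function name and defaults are illustrative.

```python
import random
import statistics


def paired_bootstrap_ci(a, b, n_resamples=10_000, alpha=0.05, seed=0):
    """Mean of the paired differences a - b and its percentile bootstrap CI.

    a[i] and b[i] must come from the same seed / data split (paired runs).
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    rng = random.Random(seed)  # fixed seed so the analysis itself is reproducible
    boot_means = sorted(
        statistics.fmean(rng.choices(diffs, k=len(diffs))) for _ in range(n_resamples)
    )
    k = int(alpha / 2 * n_resamples)  # number of resamples cut from each tail
    return statistics.fmean(diffs), (boot_means[k], boot_means[n_resamples - 1 - k])
```

Pairing removes seed-to-seed variance that both methods share. With only a handful of seeds, however, the bootstrap interval can be noticeably too narrow, so the seed count belongs in the design itself.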
Optional, no fixed order; the final leaf is always an SOP.
| SOP | When to use |
|---|---|
| context-checkpoint | Append research process and results to the current Phase's context file. Each append MUST contain >=500 lines of markdown covering both process and results. Use this skill at plan-designated checkpoint points — typically after each strategy completes or at key decision nodes within a research Phase. |
| context-init | Create a new context file for a research Phase. Called once at Phase start to initialize the file that subsequent context-checkpoint calls will append to. Use this skill whenever a new research Phase begins and a fresh context file is needed. |
| design-synthesis | SOP: synthesize complete experiment design report |
| experiment-execution-paper-overview | Import SOP: paper landscape scan (from literature-engine skill) |
| experiment-execution-paper-research | Import SOP: paper full-text reading (from literature-engine skill) |
| experiment-execution-paper-search | Import SOP: paper AI summary reading (from literature-engine skill) |
| experiment-execution-quality-gate-check | Shared SOP: verify quality gate criteria are met before proceeding |
| experiment-execution-saturation-detection | Shared SOP: detect information saturation — know when to stop searching/analyzing |
| experiment-execution-web-research | Import SOP: deep full-page content analysis (from web-browsing skill) |
| experiment-execution-web-search | Import SOP: quick web scan discovery (from web-browsing skill) |
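The context-init and context-checkpoint contracts (idempotent creation per Phase, append-only checkpoints of at least 500 lines) can be sketched with plain file I/O. The directory layout and file naming below are hypothetical; the real SOPs may store context files differently.

```python
from pathlib import Path

MIN_LINES = 500  # context-checkpoint requires >= 500 lines of markdown per append


def context_init(root: Path, campaign: str, phase: str) -> Path:
    """Create the Phase's context file once; re-entering the same Phase returns the same file."""
    path = root / campaign / f"{phase}.context.md"  # hypothetical layout
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():  # idempotent: never overwrite an existing context file
        path.write_text(f"# {campaign} / {phase}\n", encoding="utf-8")
    return path


def context_checkpoint(path: Path, markdown: str) -> None:
    """Append a checkpoint, rejecting appends shorter than the required minimum."""
    n_lines = len(markdown.splitlines())
    if n_lines < MIN_LINES:
        raise ValueError(f"checkpoint has {n_lines} lines; at least {MIN_LINES} required")
    with path.open("a", encoding="utf-8") as f:
        f.write(markdown if markdown.endswith("\n") else markdown + "\n")
```

Opening in append mode keeps earlier checkpoints intact. The existence check in context_init makes a repeated init for the same Phase a no-op that returns the existing path.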