experiment-design
Transform validated hypotheses into rigorous, executable experiment designs
| Field | Value |
|---|---|
| name | experiment-design |
| description | Transform validated hypotheses into rigorous, executable experiment designs |
| version | 1.0.0 |
| category | experiment-execution |
| type | campaign |
| strategies | ["factor-level-design","ablation-design","comparison-design","scaling-design","robustness-design"] |
| tactics | ["statistical-method-selection","reproducibility-protocol","budget-constrained-design"] |
| dependencies | {"strategies":["ablation-design","comparison-design","experiment-execution-factor-level-design","robustness-design","scaling-design"],"tactics":["budget-constrained-design","reproducibility-protocol","statistical-method-selection"],"sops":["context-checkpoint","context-init","design-synthesis","experiment-execution-paper-overview","experiment-execution-paper-research","experiment-execution-paper-search","experiment-execution-quality-gate-check","experiment-execution-saturation-detection","experiment-execution-web-research","experiment-execution-web-search"]} |
Positioning: What experiment to run — transform a validated hypothesis into a rigorous experiment design that maximizes information yield per compute dollar.
Before entering this campaign, the following must be satisfied:
| Gate | Requirement |
|---|---|
| Hypothesis | A falsifiable hypothesis with clearly stated IV/DV exists |
| Scope | Research question is bounded (not open-ended exploration) |
| Resources | Preliminary compute/time budget is stated |
| Prior Work | Relevant baselines and datasets have been identified |
If any gate fails, route back to hypothesis-generation or research-question refinement.
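The entry gates above can be sketched as a simple pre-flight check. This is a minimal illustration, not the skill's actual implementation: the `CampaignInput` container, its field names, and `failed_gates` are all hypothetical. Whether a hypothesis is genuinely falsifiable cannot be checked mechanically, so the sketch only verifies that a hypothesis statement and its IV/DV are present.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CampaignInput:
    """Hypothetical container for what the campaign receives (names are assumptions)."""
    hypothesis: str
    independent_vars: List[str]
    dependent_vars: List[str]
    scope_is_bounded: bool
    budget_gpu_hours: Optional[float]
    baselines: List[str]
    datasets: List[str]


def failed_gates(inp: CampaignInput) -> List[str]:
    """Return the names of the entry gates that are not satisfied, in table order."""
    failed = []
    # Hypothesis gate: a statement plus at least one IV and one DV.
    if not (inp.hypothesis.strip() and inp.independent_vars and inp.dependent_vars):
        failed.append("Hypothesis")
    # Scope gate: the research question must be bounded.
    if not inp.scope_is_bounded:
        failed.append("Scope")
    # Resources gate: a preliminary, positive budget must be stated.
    if inp.budget_gpu_hours is None or inp.budget_gpu_hours <= 0:
        failed.append("Resources")
    # Prior Work gate: baselines and datasets must both be identified.
    if not (inp.baselines and inp.datasets):
        failed.append("Prior Work")
    return failed
```

A non-empty result would mean routing back to hypothesis-generation or research-question refinement rather than proceeding with the design.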
Produce a complete experiment design document that specifies:
| Signal in Hypothesis | Strategy | When to Use |
|---|---|---|
| "Factor X affects Y" | factor-level-design | Testing effects of specific variables |
| "Component C contributes to performance" | ablation-design | Understanding component contributions |
| "Method M outperforms baseline B" | comparison-design | Claiming superiority over existing work |
| "Performance scales with resource R" | scaling-design | Understanding scaling behavior |
| "Method works under condition C" | robustness-design | Testing failure boundaries |
Multiple strategies may be composed for complex hypotheses.
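As a rough illustration of the selection table and of strategy composition, the mapping can be written as a lookup. The signal labels (`factor_effect` and so on) are hypothetical shorthand for the hypothesis patterns in the table; the strategy names are the ones listed above. How a hypothesis is classified into signals is left outside this sketch.

```python
from typing import Iterable, List

# Assumed labels for the five hypothesis signals; only the strategy names come from the table.
SIGNAL_TO_STRATEGY = {
    "factor_effect": "factor-level-design",
    "component_contribution": "ablation-design",
    "outperforms_baseline": "comparison-design",
    "scales_with_resource": "scaling-design",
    "works_under_condition": "robustness-design",
}


def select_strategies(signals: Iterable[str]) -> List[str]:
    """Map signal labels to strategies, keeping first-seen order and dropping duplicates."""
    selected: List[str] = []
    for signal in signals:
        strategy = SIGNAL_TO_STRATEGY[signal]  # an unknown label raises KeyError on purpose
        if strategy not in selected:
            selected.append(strategy)
    return selected


# A hypothesis claiming superiority and attributing it to one component composes two strategies.
print(select_strategies(["outperforms_baseline", "component_contribution"]))
```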
| Tier | GPU-hours | Max Factors | Max Runs | Strategy Constraint |
|---|---|---|---|---|
| Micro | < 10 | 3 | 20 | Fractional factorial or single ablation |
| Small | 10-100 | 5 | 50 | Full factorial on key factors |
| Medium | 100-1000 | 8 | 200 | Multi-strategy composition |
| Large | > 1000 | Unlimited | Unlimited | Full design space exploration |
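The tier table can be turned into a quick feasibility check. The sketch below makes several assumptions that the table does not state: boundary values are resolved as Small = [10, 100] and Medium = (100, 1000], "Max Runs" is read as counting every seed replicate as a separate run, and all function names are hypothetical.

```python
import math
from typing import NamedTuple, Sequence


class Tier(NamedTuple):
    name: str
    max_factors: float  # math.inf stands in for "Unlimited"
    max_runs: float


MICRO = Tier("Micro", 3, 20)
SMALL = Tier("Small", 5, 50)
MEDIUM = Tier("Medium", 8, 200)
LARGE = Tier("Large", math.inf, math.inf)


def tier_for(gpu_hours: float) -> Tier:
    """Pick a tier from a GPU-hour budget (boundary handling is an assumption)."""
    if gpu_hours < 0:
        raise ValueError("gpu_hours must be non-negative")
    if gpu_hours < 10:
        return MICRO
    if gpu_hours <= 100:
        return SMALL
    if gpu_hours <= 1000:
        return MEDIUM
    return LARGE


def full_factorial_runs(levels_per_factor: Sequence[int], seeds: int = 1) -> int:
    """Runs in a full factorial design: product of level counts times seed replicates."""
    return math.prod(levels_per_factor) * seeds


def fits(tier: Tier, levels_per_factor: Sequence[int], seeds: int = 1) -> bool:
    """Check a full factorial design against the tier's factor and run caps."""
    return (len(levels_per_factor) <= tier.max_factors
            and full_factorial_runs(levels_per_factor, seeds) <= tier.max_runs)
```

Under this reading, a full factorial over eight two-level factors needs 2^8 = 256 runs, which exceeds the Medium cap of 200 even with a single seed, so a Medium-tier design at the factor limit would need a fractional design or fewer levels.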
Every campaign invocation must produce at minimum:
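The research process is written to disk through context-management, in a file separate from the final report:
experiment-design,
Creates the process context file for this campaign. init is idempotent: re-entering the same Phase returns the original file. The experiment-design-report file is written to disk (see that SOP).
Optional, no fixed order; the final leaf is always a sop.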
| Strategy | When to use |
|---|---|
| ablation-design | Design ablation studies to isolate component contributions in ML systems |
| comparison-design | Design fair comparison experiments against baselines and competing methods |
| experiment-execution-factor-level-design | Design factorial experiments to test how specific factors affect outcomes |
| robustness-design | Design experiments to identify failure boundaries and robustness limits |
| scaling-design | Design scaling experiments to characterize performance-resource relationships |
Optional, no fixed order; the final leaf is always a sop.
| Tactic | When to use |
|---|---|
| budget-constrained-design | Optimize experiment design under compute and time budget constraints |
| reproducibility-protocol | Ensure experiment reproducibility through systematic environment and seed control |
| statistical-method-selection | Select appropriate statistical methods for experiment analysis |
Optional, no fixed order; the final leaf is always a sop.
| SOP | When to use |
|---|---|
| context-checkpoint | Append research process and results to the current Phase's context file. Each append MUST contain >=500 lines of markdown covering both process and results. Use this skill at plan-designated checkpoint points — typically after each strategy completes or at key decision nodes within a research Phase. |
| context-init | Create a new context file for a research Phase. Called once at Phase start to initialize the file that subsequent context-checkpoint calls will append to. Use this skill whenever a new research Phase begins and a fresh context file is needed. |
| design-synthesis | SOP: synthesize complete experiment design report |
| experiment-execution-paper-overview | Import SOP: paper landscape scan (from literature-engine skill) |
| experiment-execution-paper-research | Import SOP: paper full-text reading (from literature-engine skill) |
| experiment-execution-paper-search | Import SOP: paper AI summary reading (from literature-engine skill) |
| experiment-execution-quality-gate-check | Shared SOP: verify quality gate criteria are met before proceeding |
| experiment-execution-saturation-detection | Shared SOP: detect information saturation — know when to stop searching/analyzing |
| experiment-execution-web-research | Import SOP: deep full-page content analysis (from web-browsing skill) |
| experiment-execution-web-search | Import SOP: quick web scan discovery (from web-browsing skill) |