experiment-design

Stars: 378

Forks: 27

Updated: June 16, 2026, 16:55

Transform validated hypotheses into rigorous, executable experiment designs

Source: yogsoth-ai/de-anthropocentric-research-engine

Campaign: Experiment Design

Positioning: What experiment to run — transform a validated hypothesis into a rigorous experiment design that maximizes information yield per compute dollar.

HARD-GATE

Before entering this campaign, the following must be satisfied:

Gate	Requirement
Hypothesis	A falsifiable hypothesis with clearly stated IV/DV exists
Scope	Research question is bounded (not open-ended exploration)
Resources	Preliminary compute/time budget is stated
Prior Work	Relevant baselines and datasets have been identified

If any gate fails, route back to hypothesis-generation or research-question refinement.
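The gate check above can be sketched as a small function. This is a minimal illustration only: the `GateInputs` record and its field names are hypothetical and not part of the skill; only the four gate names and the "route back if any gate fails" rule come from the table.

```python
from dataclasses import dataclass


@dataclass
class GateInputs:
    """Hypothetical record of what the caller brings to the campaign."""
    hypothesis_falsifiable: bool  # Hypothesis gate: falsifiable, IV/DV stated
    question_bounded: bool        # Scope gate: not open-ended exploration
    budget_stated: bool           # Resources gate: compute/time budget given
    baselines_identified: bool    # Prior Work gate: baselines and datasets known


def failed_gates(inputs: GateInputs) -> list[str]:
    """Return the names of the gates that are not satisfied, in table order."""
    checks = {
        "Hypothesis": inputs.hypothesis_falsifiable,
        "Scope": inputs.question_bounded,
        "Resources": inputs.budget_stated,
        "Prior Work": inputs.baselines_identified,
    }
    return [name for name, ok in checks.items() if not ok]


def may_enter_campaign(inputs: GateInputs) -> bool:
    """The campaign may start only when every gate passes."""
    return not failed_gates(inputs)
```

A caller would route back to hypothesis-generation or research-question refinement whenever `failed_gates` returns a non-empty list.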

Campaign Goal

Produce a complete experiment design document that specifies:

What factors to vary and at what levels
What to measure and how to determine significance
What baselines to compare against
How to ensure reproducibility
An executable configuration ready for implementation

Strategy Selection

Signal in Hypothesis	Strategy	When to Use
"Factor X affects Y"	factor-level-design	Testing effects of specific variables
"Component C contributes to performance"	ablation-design	Understanding component contributions
"Method M outperforms baseline B"	comparison-design	Claiming superiority over existing work
"Performance scales with resource R"	scaling-design	Understanding scaling behavior
"Method works under condition C"	robustness-design	Testing failure boundaries

Multiple strategies may be composed for complex hypotheses.
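The selection table can be read as a lookup from hypothesis signal to strategy. The sketch below assumes the signal has already been classified into one of five hypothetical labels (the label names are invented for illustration); the strategy names are taken from the table.

```python
# Hypothetical signal labels (keys) mapped to the strategies named in the table.
STRATEGY_BY_SIGNAL = {
    "factor_effect": "factor-level-design",        # "Factor X affects Y"
    "component_contribution": "ablation-design",   # "Component C contributes..."
    "outperforms_baseline": "comparison-design",   # "Method M outperforms B"
    "scales_with_resource": "scaling-design",      # "Performance scales with R"
    "works_under_condition": "robustness-design",  # "Method works under C"
}


def select_strategies(signals: list[str]) -> list[str]:
    """Compose strategies for a hypothesis that carries several signals.

    Order follows the order of the signals; duplicates are dropped.
    Raises KeyError for a label that is not in the table.
    """
    selected: list[str] = []
    for signal in signals:
        strategy = STRATEGY_BY_SIGNAL[signal]
        if strategy not in selected:
            selected.append(strategy)
    return selected
```

For example, a hypothesis claiming both superiority over a baseline and a component contribution would yield `["comparison-design", "ablation-design"]`.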

Budget Gate

Tier	GPU-hours	Max Factors	Max Runs	Strategy Constraint
Micro	< 10	3	20	Fractional factorial or single ablation
Small	10-100	5	50	Full factorial on key factors
Medium	100-1000	8	200	Multi-strategy composition
Large	> 1000	Unlimited	Unlimited	Full design space exploration
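The tier table can be expressed as a classifier plus a limit check. One assumption is needed, because the table's ranges touch at 100 and 1000: this sketch treats upper bounds as inclusive, so exactly 100 GPU-hours is Small and exactly 1000 is Medium, which is consistent with the strict "< 10" and "> 1000" entries. The `Tier` type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    max_factors: Optional[int]  # None means unlimited
    max_runs: Optional[int]     # None means unlimited


def budget_tier(gpu_hours: float) -> Tier:
    """Classify a GPU-hour budget into a tier (boundary handling is assumed)."""
    if gpu_hours < 0:
        raise ValueError("gpu_hours must be non-negative")
    if gpu_hours < 10:
        return Tier("Micro", 3, 20)
    if gpu_hours <= 100:
        return Tier("Small", 5, 50)
    if gpu_hours <= 1000:
        return Tier("Medium", 8, 200)
    return Tier("Large", None, None)


def fits_tier(tier: Tier, n_factors: int, n_runs: int) -> bool:
    """Check a proposed design against the tier's factor and run limits."""
    factors_ok = tier.max_factors is None or n_factors <= tier.max_factors
    runs_ok = tier.max_runs is None or n_runs <= tier.max_runs
    return factors_ok and runs_ok
```

A design that fails `fits_tier` would need fewer factors or runs, for instance by moving from a full to a fractional factorial.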

Context Management

Each SOP runs as a subagent to preserve main-thread context
Strategy orchestrators pass structured JSON between SOPs
Final design-synthesis SOP compiles all outputs into a single document
Intermediate artifacts are stored as structured data, not prose
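One way to picture "structured JSON between SOPs" is a small record that each SOP serializes and the synthesis step merges. The skill does not specify a schema, so the `SopArtifact` shape, its field names, and the `synthesize` helper below are assumptions made purely for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SopArtifact:
    """Hypothetical intermediate artifact emitted by one SOP."""
    sop: str       # name of the SOP that produced the artifact
    payload: dict  # structured data, not prose


def to_message(artifact: SopArtifact) -> str:
    """Serialize an artifact to the JSON string passed between SOPs."""
    return json.dumps(asdict(artifact), sort_keys=True)


def from_message(message: str) -> SopArtifact:
    """Parse a JSON message back into an artifact."""
    return SopArtifact(**json.loads(message))


def synthesize(messages: list[str]) -> dict:
    """Compile all SOP outputs into one document keyed by SOP name."""
    return {a.sop: a.payload for a in map(from_message, messages)}
```

Keeping the payload as a plain dictionary means the final synthesis step can merge outputs without parsing free text.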

Minimum Yield

Every campaign invocation must produce at minimum:

A design matrix (even if single-factor)
Metric specification with significance threshold
Seed protocol
Estimated total compute cost
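The four minimum outputs can be illustrated with a small full-factorial example. Everything concrete here is an assumption chosen for the illustration: the factor names and levels, the metric, the 0.05 threshold, the three seeds, and the 0.5 GPU-hours per run.

```python
from itertools import product


def design_matrix(factors: dict[str, list]) -> list[dict]:
    """Full-factorial design: one row per combination of factor levels."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


def estimated_gpu_hours(n_conditions: int, n_seeds: int, hours_per_run: float) -> float:
    """Total cost when every condition is run once per seed."""
    return n_conditions * n_seeds * hours_per_run


# Hypothetical two-factor experiment.
factors = {"learning_rate": [1e-4, 1e-3], "batch_size": [32, 64]}
matrix = design_matrix(factors)
seeds = [0, 1, 2]

minimum_yield = {
    "design_matrix": matrix,
    "metric": {"name": "validation_accuracy", "significance_threshold": 0.05},
    "seed_protocol": {"seeds": seeds},
    "estimated_gpu_hours": estimated_gpu_hours(len(matrix), len(seeds), 0.5),
}
```

Here the matrix has 4 conditions, so 3 seeds give 12 runs and an estimate of 6.0 GPU-hours, which stays within the Micro tier limits of the Budget Gate table.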

Context Recording

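The research process is persisted to disk through context-management, in a file separate from the final report:

On entering this campaign: import context-init, pass experiment-design as the topic-slug, and create this campaign's process context file. init is idempotent: re-entering the same Phase returns the original file.
After each strategy completes: import context-checkpoint (a hard constraint that cannot be skipped) and append that strategy's process and intermediate outputs to the process file created in the previous step.
The final report is not written to this process file. The campaign's synthesis SOP (design-synthesis) writes it to a separate experiment-design-report file (see that SOP).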
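The init-then-append pattern can be sketched with plain files. This is not the actual context-init or context-checkpoint implementation: the function names, the `<topic-slug>-context.md` file name, and the markdown layout are all assumptions, and the sketch does not enforce the 500-line minimum that the context-checkpoint SOP requires per append.

```python
from pathlib import Path


def context_init(root: Path, topic_slug: str) -> Path:
    """Create the process context file once; later calls return the same file."""
    path = root / f"{topic_slug}-context.md"  # hypothetical naming scheme
    if not path.exists():
        path.write_text(f"# Context: {topic_slug}\n", encoding="utf-8")
    return path


def context_checkpoint(path: Path, strategy: str, notes: str) -> None:
    """Append one strategy's process and intermediate outputs to the file."""
    with path.open("a", encoding="utf-8") as handle:
        handle.write(f"\n## {strategy}\n{notes}\n")
```

Because `context_init` only writes when the file is missing, re-entering the same Phase leaves earlier checkpoints intact, which is the idempotence described above.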

Available Strategies

Optional, no fixed order; the final leaf is always an SOP.

Strategy	When to use
ablation-design	Design ablation studies to isolate component contributions in ML systems
comparison-design	Design fair comparison experiments against baselines and competing methods
experiment-execution-factor-level-design	Design factorial experiments to test how specific factors affect outcomes
robustness-design	Design experiments to identify failure boundaries and robustness limits
scaling-design	Design scaling experiments to characterize performance-resource relationships

Available Tactics

Optional, no fixed order; the final leaf is always an SOP.

Tactic	When to use
budget-constrained-design	Optimize experiment design under compute and time budget constraints
reproducibility-protocol	Ensure experiment reproducibility through systematic environment and seed control
statistical-method-selection	Select appropriate statistical methods for experiment analysis

Available SOPs

Optional, no fixed order; the final leaf is always an SOP.

SOP	When to use
context-checkpoint	Append the research process and results to the current Phase's context file. Each append MUST contain >=500 lines of markdown covering both process and results. Use this skill at the checkpoints designated in the plan, typically after each strategy completes or at key decision nodes within a research Phase.
context-init	Create a new context file for a research Phase. Called once at Phase start to initialize the file that subsequent context-checkpoint calls will append to. Use this skill whenever a new research Phase begins and a fresh context file is needed.
design-synthesis	SOP: synthesize complete experiment design report
experiment-execution-paper-overview	Import SOP: paper landscape scan (from literature-engine skill)
experiment-execution-paper-research	Import SOP: paper full-text reading (from literature-engine skill)
experiment-execution-paper-search	Import SOP: paper AI summary reading (from literature-engine skill)
experiment-execution-quality-gate-check	Shared SOP: verify quality gate criteria are met before proceeding
experiment-execution-saturation-detection	Shared SOP: detect information saturation — know when to stop searching/analyzing
experiment-execution-web-research	Import SOP: deep full-page content analysis (from web-browsing skill)
experiment-execution-web-search	Import SOP: quick web scan discovery (from web-browsing skill)

name	experiment-design
description	Transform validated hypotheses into rigorous, executable experiment designs
version	1.0.0
category	experiment-execution
type	campaign
strategies	["factor-level-design","ablation-design","comparison-design","scaling-design","robustness-design"]
tactics	["statistical-method-selection","reproducibility-protocol","budget-constrained-design"]
dependencies	{"strategies":["ablation-design","comparison-design","experiment-execution-factor-level-design","robustness-design","scaling-design"],"tactics":["budget-constrained-design","reproducibility-protocol","statistical-method-selection"],"sops":["context-checkpoint","context-init","design-synthesis","experiment-execution-paper-overview","experiment-execution-paper-research","experiment-execution-paper-search","experiment-execution-quality-gate-check","experiment-execution-saturation-detection","experiment-execution-web-research","experiment-execution-web-search"]}
