| name | experiment-pipeline |
| description | Stage 2 of the research workflow: turn a validated idea into implemented experiments, completed runs, analyzed results, and a writing-ready narrative package. Use when the user wants the experiment stage only. |
Experiment Pipeline
Run the experiment stage for: $ARGUMENTS
Purpose
This skill handles only the experiment stage. It starts from a validated idea and ends with completed results plus a writing-ready summary.
Use this skill to go from:
validated idea -> experiment plan -> implementation -> launched runs -> monitored jobs -> analyzed results -> narrative package
Do not use this skill to:
- discover or validate the core research idea from scratch
- draft the paper
- generate final publication figures
Idea discovery and validation belong to /idea-discovery; paper drafting and final figures belong to /paper-writing.
Design Principles
- Plan before coding. The run matrix should be explicit before jobs are launched.
- Implementation must serve evaluation. Save outputs in ways that later analysis and paper writing can consume.
- Results before narrative. Do not write a polished story until the runs are complete and analyzed.
- Reproducibility is mandatory. Configs, seeds, metrics, and outputs must be recoverable.
Domain Persona
When the topic is multi-agent formation control or cooperative control, work like a senior IEEE controls researcher:
- design experiments that test stability, tracking, collision avoidance, communication failures, graph variation, and scalability
- distinguish theorem-validating experiments from engineering performance experiments
- treat unsupported robustness claims as a hard blocker so they are not overclaimed at the paper stage
Constants
- STRATEGY_MODEL = opus - Use for deciding the minimum convincing experiment package and judging evidence sufficiency
- EXECUTION_MODEL = sonnet - Use for implementation, logging, configs, result extraction, and narrative packaging
- REVIEWER_MODEL = gpt-5.4 - Optional for plan or result sanity review when needed
- AUTO_PROCEED = true - Continue across phases unless blocked
- DEFAULT_SEEDS = 3 - Use more only when justified
- TARGET_OUTPUT_PLAN = EXPERIMENT_PLAN.md
- TARGET_OUTPUT_RESULTS = RESULTS_SUMMARY.md
- TARGET_OUTPUT_NARRATIVE = NARRATIVE_REPORT.md
If opus or sonnet is not available in the host environment, preserve the same strategy/execution split with the strongest locally available replacements.
Required Inputs
Gather at least:
- FINAL_PROPOSAL.md or equivalent refined idea package
- codebase or experiment workspace
- known compute environment or execution target
- optional benchmark constraints, evaluation rules, or budget limits
If the idea is still unstable, redirect back to /idea-discovery.
Outputs
This skill should produce:
- EXPERIMENT_PLAN.md
- implementation changes in the project codebase
- completed run artifacts and logs
- RESULTS_SUMMARY.md
- NARRATIVE_REPORT.md
These outputs should be sufficient for /paper-writing to begin.
Workflow
Phase 1: Convert the Idea into an Experiment Plan
Invoke:
/experiment-plan "$ARGUMENTS"
Goal:
- define primary claim tests
- define baselines
- define ablations
- define metrics and datasets
- define run order and compute budget
The plan must distinguish:
- must-run experiments
- should-run experiments
- nice-to-have experiments
The Strategy Model decides what is truly necessary for publication-level evidence. The Execution Model writes the concrete run plan.
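The tiered plan above can be made explicit as a run matrix before anything is launched. The following is a minimal sketch, not a prescribed implementation: `RunSpec`, `build_run_matrix`, and the block and method names in `example_blocks` are hypothetical and only illustrate how priorities and seeds could be expanded into an ordered list of runs.

```python
from dataclasses import dataclass
from itertools import product

# Priority tiers from the plan; a lower number launches first.
PRIORITY_ORDER = {"must-run": 0, "should-run": 1, "nice-to-have": 2}
DEFAULT_SEEDS = 3  # mirrors the DEFAULT_SEEDS constant of this skill


@dataclass(frozen=True)
class RunSpec:
    block: str     # experiment block name (hypothetical)
    method: str    # method or baseline under test (hypothetical)
    seed: int
    priority: str  # one of the keys of PRIORITY_ORDER


def build_run_matrix(blocks, n_seeds=DEFAULT_SEEDS):
    """Expand {block: (priority, [methods])} into an ordered list of runs."""
    runs = []
    for block, (priority, methods) in blocks.items():
        if priority not in PRIORITY_ORDER:
            raise ValueError(f"unknown priority: {priority}")
        for method, seed in product(methods, range(n_seeds)):
            runs.append(RunSpec(block, method, seed, priority))
    # sorted() is stable: must-run first, insertion order kept within a tier.
    return sorted(runs, key=lambda r: PRIORITY_ORDER[r.priority])


# Illustrative input only; these blocks and methods are made up.
example_blocks = {
    "ablation_no_comm": ("should-run", ["ours_no_comm"]),
    "main_claim": ("must-run", ["ours", "baseline_a"]),
}
matrix = build_run_matrix(example_blocks)
```

Keeping the matrix as data makes it straightforward to copy into the Run Matrix and Run Order sections of EXPERIMENT_PLAN.md.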
Phase 2: Bridge Plan to Code
Invoke:
/experiment-bridge "EXPERIMENT_PLAN.md"
Goal:
- map experiment blocks to actual code paths
- implement missing config, logging, export, and evaluation plumbing
- ensure outputs are machine-readable for later analysis
Before launching, verify:
- seeds are configurable
- outputs save to JSON/CSV or equivalent structured format
- key metrics are logged consistently
- run names and directories are stable
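One way to satisfy the pre-launch checks above is sketched below. It assumes a hypothetical directory layout and naming scheme (`block__method__seedN/result.json`); `run_name` and `save_run` are illustrative helpers, not functions of any existing tool.

```python
import json
from pathlib import Path


def run_name(block, method, seed):
    """Deterministic run name: the same inputs always map to the same directory."""
    return f"{block}__{method}__seed{seed}"


def save_run(root, block, method, seed, config, metrics):
    """Write config and metrics as JSON under a stable per-run directory."""
    run_dir = Path(root) / run_name(block, method, seed)
    run_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "block": block,
        "method": method,
        "seed": seed,        # stored so the run can be reproduced
        "config": config,
        "metrics": metrics,  # use the same metric keys in every run
    }
    out_path = run_dir / "result.json"
    out_path.write_text(json.dumps(record, indent=2, sort_keys=True))
    return out_path
```

Storing the seed and config next to the metrics keeps each run directory self-describing, which is what later aggregation needs.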
Phase 3: Launch Experiments
Invoke:
/run-experiment "[command or plan target]"
Goal:
- launch the must-run block first
- avoid starting the full matrix blindly if early runs already expose failures
- preserve reproducible commands and configs
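The launch goals above amount to a staged, fail-fast loop. The sketch below is one possible shape under stated assumptions: `launch_staged` is a hypothetical helper, and `launch_fn` stands in for whatever actually runs a command on the execution target and returns its exit code.

```python
def launch_staged(commands, launch_fn, max_failures=1):
    """Launch commands in order and stop once failures reach max_failures.

    commands:  list of command strings, with the must-run block first.
    launch_fn: hypothetical callable that runs one command and returns
               its exit code (0 means success).
    Returns a dict of commands that completed, failed, or were skipped.
    """
    log = {"completed": [], "failed": [], "skipped": []}
    for i, cmd in enumerate(commands):
        if len(log["failed"]) >= max_failures:
            # Early runs already exposed failures: do not start the rest blindly.
            log["skipped"] = list(commands[i:])
            break
        status = "completed" if launch_fn(cmd) == 0 else "failed"
        log[status].append(cmd)
    return log
```

For local execution, `launch_fn` could wrap `subprocess.run(cmd, shell=True).returncode`; on a cluster it would wrap the scheduler's submit-and-wait call instead. The returned log keeps the exact command strings, so every run stays reproducible.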
Phase 4: Monitor and Recover
Invoke:
/monitor-experiment "[target]"
Goal:
- track job health
- catch crashes early
- resume or relaunch failed runs when appropriate
- record incomplete or invalid runs explicitly
Do not silently drop failed experiments from later summaries.
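A simple run ledger is one way to keep failed and incomplete runs visible. The sketch below assumes a hypothetical ledger format (a list of dicts with `run`, `status`, and an optional `reason`); the status names are illustrative.

```python
from collections import Counter

# Assumed status vocabulary; adjust to the project's own conventions.
VALID_STATUSES = {"completed", "failed", "incomplete", "invalid"}


def summarize_ledger(ledger):
    """Count runs per status and list every non-completed run with its reason."""
    for entry in ledger:
        if entry["status"] not in VALID_STATUSES:
            raise ValueError(f"unknown status for {entry['run']}: {entry['status']}")
    counts = Counter(entry["status"] for entry in ledger)
    problems = [
        (entry["run"], entry["status"], entry.get("reason", "unspecified"))
        for entry in ledger
        if entry["status"] != "completed"
    ]
    return dict(counts), problems
```

The `problems` list can be pasted directly into the Failures and Caveats section of RESULTS_SUMMARY.md, so no run disappears between monitoring and analysis.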
Phase 5: Analyze Results
Invoke:
/analyze-results "[results directory or key outputs]"
Goal:
- aggregate seeds and metrics
- compute comparisons
- identify which claims are supported, weakly supported, or unsupported
- separate real gains from noise
This stage should create RESULTS_SUMMARY.md.
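To illustrate the analysis goals above, here is a rough sketch of seed aggregation and claim labeling. The decision rule is an assumption made for illustration, not a statistical test and not a rule this skill prescribes: a gain counts as supported only when the gap between means exceeds the combined seed-level standard deviations. With only a few seeds, treat the label as a first-pass screen.

```python
from statistics import mean, stdev


def aggregate(values):
    """Mean and sample standard deviation across seeds."""
    return mean(values), (stdev(values) if len(values) > 1 else 0.0)


def classify_gain(ours, baseline, higher_is_better=True):
    """Label a claimed gain by comparing the mean gap to seed-level spread.

    Heuristic (an assumption, not a significance test):
      supported         gap exceeds the sum of both standard deviations
      weakly supported  gap is positive but within that spread
      unsupported       gap is zero or negative
    """
    mean_ours, std_ours = aggregate(ours)
    mean_base, std_base = aggregate(baseline)
    gap = (mean_ours - mean_base) if higher_is_better else (mean_base - mean_ours)
    if gap <= 0:
        return "unsupported"
    return "supported" if gap > std_ours + std_base else "weakly supported"
```

For example, `classify_gain(per_seed_scores_ours, per_seed_scores_baseline)` takes one value per seed for each method; for metrics such as tracking error, pass `higher_is_better=False`.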
Phase 6: Prepare the Writing Handoff
Write NARRATIVE_REPORT.md from the analyzed results.
It should include:
- problem and method summary
- experiment setup
- key quantitative findings
- ablation takeaways
- failure cases and limitations
- claims that remain unsupported
This document is a bridge to /paper-writing, not a finished paper.
Output Format
EXPERIMENT_PLAN.md should contain:
# Experiment Plan
## Claims to Validate
## Datasets and Metrics
## Baselines
## Ablations
## Run Matrix
## Run Order
## Budget and Risks
RESULTS_SUMMARY.md should contain:
# Results Summary
## Executive Summary
## Main Results
## Ablations
## Robustness / Sensitivity
## Unsupported or Mixed Claims
## Failures and Caveats
NARRATIVE_REPORT.md should contain:
# Narrative Report
## Problem
## Method
## Experimental Setup
## Main Findings
## Interpretation
## Limitations
## Assets for Paper Stage
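The section lists above can be checked mechanically before handoff. The sketch below is a hypothetical helper (`missing_sections` is not part of any existing tool) that reports which required headings are absent from a draft.

```python
# Required headings, copied from the Output Format lists of this skill.
REQUIRED_SECTIONS = {
    "EXPERIMENT_PLAN.md": [
        "# Experiment Plan", "## Claims to Validate", "## Datasets and Metrics",
        "## Baselines", "## Ablations", "## Run Matrix", "## Run Order",
        "## Budget and Risks",
    ],
    "RESULTS_SUMMARY.md": [
        "# Results Summary", "## Executive Summary", "## Main Results",
        "## Ablations", "## Robustness / Sensitivity",
        "## Unsupported or Mixed Claims", "## Failures and Caveats",
    ],
    "NARRATIVE_REPORT.md": [
        "# Narrative Report", "## Problem", "## Method", "## Experimental Setup",
        "## Main Findings", "## Interpretation", "## Limitations",
        "## Assets for Paper Stage",
    ],
}


def missing_sections(filename, text):
    """Return the required headings that do not appear as a line in text."""
    present = {line.strip() for line in text.splitlines()}
    return [h for h in REQUIRED_SECTIONS[filename] if h not in present]
```

An empty list means every required heading is present; it says nothing about whether the content under each heading is adequate.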
Quality Gates
Before finishing, verify:
- at least one recommended run block has completed successfully
- must-run baselines are present or explicitly missing with justification
- results are saved in structured files usable by later figure generation
- unsupported claims are labeled honestly
- the narrative handoff is grounded in actual results
Stop Conditions
Stop and report clearly if:
- code implementation is incomplete
- compute is unavailable
- must-run experiments fail repeatedly
- results are too incomplete to support any meaningful claim
In any of these cases, report what is blocked and the minimal next action that would unblock it.
Composition
Typical previous command:
/idea-discovery "research direction"
Typical next command:
/paper-writing "NARRATIVE_REPORT.md"