
experiment-pipeline

Stage 2 of the research workflow: turn a validated idea into implemented experiments, completed runs, analyzed results, and a writing-ready narrative package. Use when the user wants the experiment stage only.



Install command

npx skills add https://github.com/tqLi99/claude-skills-for-writing --skill experiment-pipeline

Copy and paste this command into Claude Code to install the skill

Source

tqLi99/claude-skills-for-writing

Stars: 0

Forks: 0

Updated: April 2, 2026 at 02:21

SKILL.md


name: experiment-pipeline
description: Stage 2 of the research workflow: turn a validated idea into implemented experiments, completed runs, analyzed results, and a writing-ready narrative package. Use when the user wants the experiment stage only.

Experiment Pipeline

Run the experiment stage for: $ARGUMENTS

Purpose

This skill handles only the experiment stage. It starts from a validated idea and ends with completed results plus a writing-ready summary.

Use this skill to go from:

validated idea -> experiment plan -> implementation -> launched runs -> monitored jobs -> analyzed results -> narrative package

Do not use this skill to:

discover or validate the core research idea from scratch
draft the paper
generate final publication figures

Those belong to /idea-discovery and /paper-writing.

Design Principles

Plan before coding. The run matrix should be explicit before jobs are launched.
Implementation must serve evaluation. Save outputs in ways that later analysis and paper writing can consume.
Results before narrative. Do not write a polished story until the runs are complete and analyzed.
Reproducibility is mandatory. Configs, seeds, metrics, and outputs must be recoverable.

Domain Persona

When the topic is multi-agent formation control or cooperative control, work like a senior IEEE controls researcher:

design experiments that test stability, tracking, collision avoidance, communication failures, graph variation, and scalability
distinguish theorem-validating experiments from engineering performance experiments
treat any unsupported robustness claim as a hard blocker: it must not be carried into the paper stage as if it were established

Constants

STRATEGY_MODEL = opus - Use for deciding the minimum convincing experiment package and judging evidence sufficiency
EXECUTION_MODEL = sonnet - Use for implementation, logging, configs, result extraction, and narrative packaging
REVIEWER_MODEL = gpt-5.4 - Optional for plan or result sanity review when needed
AUTO_PROCEED = true - Continue across phases unless blocked
DEFAULT_SEEDS = 3 - Use more only when justified
TARGET_OUTPUT_PLAN = EXPERIMENT_PLAN.md
TARGET_OUTPUT_RESULTS = RESULTS_SUMMARY.md
TARGET_OUTPUT_NARRATIVE = NARRATIVE_REPORT.md

If opus or sonnet is not available on the host, preserve the same strategy/execution split using the strongest models that are available locally.

Required Inputs

Gather at least:

FINAL_PROPOSAL.md or equivalent refined idea package
codebase or experiment workspace
known compute environment or execution target
optional benchmark constraints, evaluation rules, or budget limits

If the idea is still unstable, redirect back to /idea-discovery.

Outputs

This skill should produce:

EXPERIMENT_PLAN.md
implementation changes in the project codebase
completed run artifacts and logs
RESULTS_SUMMARY.md
NARRATIVE_REPORT.md

These outputs should be sufficient for /paper-writing to begin.

Workflow

Phase 1: Convert the Idea into an Experiment Plan

Invoke:

/experiment-plan "$ARGUMENTS"

Goal:

define primary claim tests
define baselines
define ablations
define metrics and datasets
define run order and compute budget

The plan must distinguish:

must-run experiments
should-run experiments
nice-to-have experiments

The Strategy Model decides what is truly necessary for publication-level evidence. The Execution Model writes the concrete run plan.
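One way to make the three tiers concrete is to expand the plan into an explicit run matrix before anything is launched. The Python sketch below is illustrative only: the `RunSpec` structure, the `build_run_matrix` helper, and the block and method names are assumptions made for this example, not part of the skill. Only the three tier names and the default of 3 seeds come from the text above.

```python
from dataclasses import dataclass
from itertools import product

# Tier labels mirror the plan's three categories; lower rank runs first.
TIER_RANK = {"must-run": 0, "should-run": 1, "nice-to-have": 2}
DEFAULT_SEEDS = 3  # matches the DEFAULT_SEEDS constant of this skill


@dataclass(frozen=True)
class RunSpec:
    block: str    # hypothetical experiment block name
    method: str   # hypothetical method or baseline name
    seed: int
    tier: str


def build_run_matrix(blocks, n_seeds=DEFAULT_SEEDS):
    """Expand {block: (tier, [methods])} into one RunSpec per method and seed."""
    runs = []
    for block, (tier, methods) in blocks.items():
        if tier not in TIER_RANK:
            raise ValueError(f"unknown tier: {tier}")
        for method, seed in product(methods, range(n_seeds)):
            runs.append(RunSpec(block, method, seed, tier))
    # Stable sort: must-run blocks come first, original order kept within a tier.
    return sorted(runs, key=lambda r: TIER_RANK[r.tier])


# Placeholder blocks for illustration only.
example_blocks = {
    "ablation_no_component": ("should-run", ["ours_minus_x"]),
    "main_comparison": ("must-run", ["ours", "baseline_a"]),
    "extra_scaling": ("nice-to-have", ["ours"]),
}
matrix = build_run_matrix(example_blocks)
print(len(matrix), matrix[0])
```

Sorting by tier keeps the run order explicit, so the must-run block can be launched and inspected before any compute is spent on the lower tiers.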

Phase 2: Bridge Plan to Code

Invoke:

/experiment-bridge "EXPERIMENT_PLAN.md"

Goal:

map experiment blocks to actual code paths
implement missing config, logging, export, and evaluation plumbing
ensure outputs are machine-readable for later analysis

Before launching, verify:

seeds are configurable
outputs save to JSON/CSV or equivalent structured format
key metrics are logged consistently
run names and directories are stable
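The pre-launch checks above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the config keys (`method`, `seed`, `lr`), the `run_name` and `run_once` helpers, and the file names `config.json` and `metrics.json` are hypothetical, and the random score stands in for a real evaluation.

```python
import hashlib
import json
import random
import tempfile
from pathlib import Path


def run_name(config):
    """Derive a stable name from the config so reruns land in the same directory."""
    canonical = json.dumps(config, sort_keys=True)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:8]
    return f"{config['method']}_seed{config['seed']}_{digest}"


def run_once(config, out_root):
    """Run a placeholder experiment and save config plus metrics as JSON."""
    rng = random.Random(config["seed"])  # seed comes from config, never hard-coded
    metrics = {"score": rng.random()}    # stand-in for a real evaluation
    run_dir = Path(out_root) / run_name(config)
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "config.json").write_text(json.dumps(config, sort_keys=True, indent=2))
    (run_dir / "metrics.json").write_text(json.dumps(metrics, indent=2))
    return run_dir


with tempfile.TemporaryDirectory() as tmp:
    cfg = {"method": "baseline_a", "seed": 0, "lr": 0.01}  # hypothetical config keys
    first = run_once(cfg, tmp)
    second = run_once(cfg, tmp)
    same_dir = first == second
    saved = json.loads((first / "metrics.json").read_text())
    print(first.name, saved)
```

Hashing the sorted config means the directory name depends only on the settings, not on launch time or key order, which is one simple way to keep run names stable across relaunches.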

Phase 3: Launch Experiments

Invoke:

/run-experiment "[command or plan target]"

Goal:

launch the must-run block first
avoid starting the full matrix blindly if early runs already expose failures
preserve reproducible commands and configs
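The launch goals above can be illustrated with a small staged launcher. This is a sketch, not the behavior of /run-experiment: the `launch_block` and `launch_staged` helpers are hypothetical, and the two commands are placeholders that stand in for real training or simulation jobs.

```python
import subprocess
import sys


def launch_block(commands):
    """Run each command in order; return (command, return code) pairs."""
    results = []
    for cmd in commands:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results.append((cmd, proc.returncode))
    return results


def launch_staged(blocks_in_order):
    """Launch blocks in priority order; stop before later blocks if one has failures."""
    log = {}
    for name, commands in blocks_in_order:
        log[name] = launch_block(commands)
        if any(code != 0 for _, code in log[name]):
            break  # do not start the rest of the matrix on top of a failing block
    return log


# Placeholder commands that stand in for real training or simulation jobs.
ok = [sys.executable, "-c", "print('run finished')"]
bad = [sys.executable, "-c", "raise SystemExit(1)"]
log = launch_staged([
    ("must-run", [ok, bad]),
    ("should-run", [ok]),
])
print(list(log))
```

Because the log stores the exact command next to its return code, the same structure also serves as a record of reproducible commands.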

Phase 4: Monitor and Recover

Invoke:

/monitor-experiment "[target]"

Goal:

track job health
catch crashes early
resume or relaunch failed runs when appropriate
record incomplete or invalid runs explicitly

Do not silently drop failed experiments from later summaries.
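A simple ledger is one way to keep failed and incomplete runs visible. In the sketch below, the status labels, the `record` and `ledger_to_csv` helpers, and the run identifiers and notes are all assumptions chosen for illustration; the skill itself does not prescribe a format.

```python
import csv
import io
from collections import Counter

VALID_STATUSES = {"completed", "failed", "incomplete", "invalid"}  # assumed labels


def record(ledger, run_id, status, note=""):
    """Append one row per run; failed runs stay in the ledger with a reason."""
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status}")
    ledger.append({"run_id": run_id, "status": status, "note": note})


def ledger_to_csv(ledger):
    """Serialize the ledger so later summaries can report every run."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["run_id", "status", "note"])
    writer.writeheader()
    writer.writerows(ledger)
    return buf.getvalue()


ledger = []
record(ledger, "ours_seed0", "completed")
record(ledger, "ours_seed1", "failed", "placeholder reason: out of memory")
record(ledger, "ours_seed2", "incomplete", "placeholder reason: job preempted")
counts = Counter(row["status"] for row in ledger)
print(dict(counts))
print(ledger_to_csv(ledger))
```

Keeping every run in the same table means the results summary can state how many runs failed and why, instead of reporting only the survivors.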

Phase 5: Analyze Results

Invoke:

/analyze-results "[results directory or key outputs]"

Goal:

aggregate seeds and metrics
compute comparisons
identify which claims are supported, weakly supported, or unsupported
separate real gains from noise

This stage should create RESULTS_SUMMARY.md.
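To illustrate what separating real gains from noise can look like, the sketch below aggregates per-seed scores and compares the mean difference with the seed-to-seed spread. The `judge_gain` helper and its thresholds are an illustrative heuristic assumed for this example; they are not a statistical test and not a rule defined by the skill. The scores are made-up placeholders, not results.

```python
from statistics import mean, stdev


def aggregate(scores):
    """Return mean and sample standard deviation across seeds."""
    return mean(scores), stdev(scores)


def judge_gain(ours, baseline, k=2.0):
    """Label a gain by comparing the mean difference with seed-to-seed spread.

    The thresholds (k and k/2 pooled standard deviations) are an assumed
    heuristic for illustration, higher score meaning better.
    """
    m_o, s_o = aggregate(ours)
    m_b, s_b = aggregate(baseline)
    spread = ((s_o ** 2 + s_b ** 2) / 2) ** 0.5
    gain = m_o - m_b
    if gain > k * spread:
        return "supported"
    if gain > (k / 2) * spread:
        return "weakly supported"
    return "unsupported"


# Made-up placeholder scores for three seeds each.
baseline = [0.70, 0.71, 0.69]
clear_gain = [0.80, 0.82, 0.81]
noisy_gain = [0.71, 0.69, 0.73]
print(judge_gain(clear_gain, baseline))
print(judge_gain(noisy_gain, baseline))
```

With only three seeds the spread estimate is itself noisy, so a borderline label is a reason to add seeds or a proper significance test rather than a verdict.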

Phase 6: Prepare the Writing Handoff

Write NARRATIVE_REPORT.md from the analyzed results.

It should include:

problem and method summary
experiment setup
key quantitative findings
ablation takeaways
failure cases and limitations
claims that remain unsupported

This document is a bridge to /paper-writing, not a finished paper.

Output Format

EXPERIMENT_PLAN.md should contain:

# Experiment Plan

## Claims to Validate
## Datasets and Metrics
## Baselines
## Ablations
## Run Matrix
## Run Order
## Budget and Risks

RESULTS_SUMMARY.md should contain:

# Results Summary

## Executive Summary
## Main Results
## Ablations
## Robustness / Sensitivity
## Unsupported or Mixed Claims
## Failures and Caveats

NARRATIVE_REPORT.md should contain:

# Narrative Report

## Problem
## Method
## Experimental Setup
## Main Findings
## Interpretation
## Limitations
## Assets for Paper Stage
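The three templates above lend themselves to a mechanical check before handoff. The sketch below is a hypothetical helper, not part of the skill: `missing_sections` assumes each required heading appears on its own line exactly as written in the templates, and the draft text is a placeholder.

```python
REQUIRED_SECTIONS = {
    "EXPERIMENT_PLAN.md": [
        "# Experiment Plan", "## Claims to Validate", "## Datasets and Metrics",
        "## Baselines", "## Ablations", "## Run Matrix", "## Run Order",
        "## Budget and Risks",
    ],
    "RESULTS_SUMMARY.md": [
        "# Results Summary", "## Executive Summary", "## Main Results",
        "## Ablations", "## Robustness / Sensitivity",
        "## Unsupported or Mixed Claims", "## Failures and Caveats",
    ],
    "NARRATIVE_REPORT.md": [
        "# Narrative Report", "## Problem", "## Method", "## Experimental Setup",
        "## Main Findings", "## Interpretation", "## Limitations",
        "## Assets for Paper Stage",
    ],
}


def missing_sections(filename, text):
    """Return required headings that do not appear as heading lines in text."""
    headings = {line.strip() for line in text.splitlines() if line.startswith("#")}
    return [h for h in REQUIRED_SECTIONS[filename] if h not in headings]


# Placeholder draft that covers only the first three headings.
draft = "# Results Summary\n\n## Executive Summary\n\n## Main Results\n"
print(missing_sections("RESULTS_SUMMARY.md", draft))
```

A check like this only confirms that the sections exist; whether their content is grounded in actual results still has to be reviewed by hand.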

Quality Gates

Before finishing, verify:

at least one recommended run block has completed successfully
must-run baselines are present or explicitly missing with justification
results are saved in structured files usable by later figure generation
unsupported claims are labeled honestly
the narrative handoff is grounded in actual results

Stop Conditions

Stop and report clearly if:

code implementation is incomplete
compute is unavailable
must-run experiments fail repeatedly
results are too incomplete to support any meaningful claim

In that case, report what is blocked and what minimal next action would unblock it.

Composition

Typical previous command:

/idea-discovery "research direction"

Typical next command:

/paper-writing "NARRATIVE_REPORT.md"

