experiment
Automated optimization loop with scalar fitness function. Proposes changes in isolated worktrees, measures with a metric command, keeps improvements, discards failures. Supports convergence detection and diminishing returns.
| Field | Value |
|---|---|
| name | experiment |
| description | Automated optimization loop with scalar fitness function. Proposes changes in isolated worktrees, measures with a metric command, keeps improvements, discards failures. Supports convergence detection and diminishing returns. |
| user-invocable | true |
| auto-trigger | false |
| last-updated | "2026-03-21T00:00:00.000Z" |
The user provides three things:
Example metric command: `npm run build 2>&1 | tail -1 | grep -oP '\d+'`
If any input is missing, ask for it. The metric MUST output a single number to stdout.
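The single-number requirement can be checked before any iteration runs. The following is a minimal sketch, assuming a hypothetical helper named `parse_metric` (not part of this skill) that receives the metric command's stdout and rejects anything other than exactly one finite number.

```python
import math


def parse_metric(stdout: str) -> float:
    """Return the metric value, or raise ValueError if stdout is not a single number.

    Hypothetical helper for illustration; the skill itself does not define it.
    """
    tokens = stdout.split()
    if len(tokens) != 1:
        raise ValueError(f"expected exactly one number, got {len(tokens)} tokens: {stdout!r}")
    try:
        value = float(tokens[0])
    except ValueError:
        raise ValueError(f"metric output is not numeric: {stdout!r}") from None
    if not math.isfinite(value):
        # "nan" and "inf" parse as floats but cannot serve as a fitness value.
        raise ValueError(f"metric output is not a finite number: {stdout!r}")
    return value
```

Output such as `12 passing` fails this check, which matches the rule that the command must print a bare number.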
Baseline: {value} ({metric command})
For each iteration (up to budget):
Make the change in an isolated worktree (`isolation: "worktree"`).
Measure under a timeout (`node scripts/run-with-timeout.js 300`).
Iteration {N}: {value} ({delta from baseline}) → {KEEP|DISCARD}
Change: {one-line description of what was tried}
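The keep-or-discard decision and the iteration log line can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, and comparing each candidate against the best value kept so far (rather than against the baseline) is an assumption, since the skill only says that improvements are kept.

```python
def verdict(candidate: float, best_so_far: float, direction: str) -> str:
    """Return "KEEP" if the candidate strictly improves on the best kept value.

    direction is "lower" or "higher", meaning which way is better.
    Assumption: ties count as no improvement and are discarded.
    """
    if direction not in ("lower", "higher"):
        raise ValueError(f"direction must be 'lower' or 'higher', got {direction!r}")
    if direction == "lower":
        improved = candidate < best_so_far
    else:
        improved = candidate > best_so_far
    return "KEEP" if improved else "DISCARD"


def format_iteration(n: int, value: float, baseline: float, result: str) -> str:
    """Render the per-iteration log line with a signed delta from the baseline."""
    delta = value - baseline
    return f"Iteration {n}: {value:g} ({delta:+g}) → {result}"
```

For example, with a baseline of 500 and `direction="lower"`, a candidate of 480 is kept and logged as `Iteration 1: 480 (-20) → KEEP`.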
After each iteration, check:
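The report records one of three stop reasons: convergence, diminishing, or budget. A minimal sketch of such a check follows. The definitions used here are assumptions for illustration only: convergence is taken to mean that the last `window` iterations were all discarded, and diminishing returns to mean that the last `window` kept gains each fell below `min_gain`. The function name, `window=3`, and `min_gain=0.01` are hypothetical and do not come from this skill.

```python
from typing import List, Optional


def stop_reason(
    verdicts: List[str],
    kept_gains: List[float],
    budget: int,
    window: int = 3,        # hypothetical: how many recent iterations to inspect
    min_gain: float = 0.01,  # hypothetical: 1% relative gain threshold
) -> Optional[str]:
    """Return "convergence", "diminishing", "budget", or None to continue.

    verdicts: "KEEP"/"DISCARD" for each iteration run so far.
    kept_gains: relative gain of each kept iteration, as a fraction (0.02 = 2%).
    """
    recent = verdicts[-window:]
    if len(recent) == window and all(v == "DISCARD" for v in recent):
        return "convergence"
    recent_gains = kept_gains[-window:]
    if len(recent_gains) == window and all(g < min_gain for g in recent_gains):
        return "diminishing"
    if len(verdicts) >= budget:
        return "budget"
    return None
```

Checking the budget last means a run that stalls on its final iteration is reported as converged rather than as out of budget; swapping the order is equally defensible.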
Write results to .planning/research/experiment-{slug}.md:
# Experiment: {Description}
> Metric: `{command}`
> Direction: {lower|higher} is better
> Scope: {glob pattern}
> Budget: {N iterations}
> Date: {ISO date}
## Results
| Iteration | Value | Delta | Verdict | Change |
|-----------|-------|-------|---------|--------|
| baseline | {N} | — | — | — |
| 1 | {N} | {+/-} | KEEP | {desc} |
| 2 | {N} | {+/-} | DISCARD | {desc} |
## Outcome
- **Start**: {baseline}
- **End**: {final value}
- **Improvement**: {percentage}
- **Iterations**: {kept}/{total}
- **Stop reason**: {convergence|diminishing|budget}
## Kept Changes
{List of changes that were kept, with commit hashes}
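The **Improvement** percentage in the Outcome section depends on the metric's direction. A minimal sketch, assuming a hypothetical helper `improvement_pct` and assuming that improvement is expressed relative to the baseline value:

```python
def improvement_pct(baseline: float, final: float, direction: str) -> float:
    """Return the improvement as a percentage of the baseline.

    Positive means the metric moved in the desired direction.
    direction is "lower" or "higher", meaning which way is better.
    """
    if direction not in ("lower", "higher"):
        raise ValueError(f"direction must be 'lower' or 'higher', got {direction!r}")
    if baseline == 0:
        # A relative improvement is undefined against a zero baseline.
        raise ValueError("cannot compute a percentage against a baseline of 0")
    change = (baseline - final) if direction == "lower" else (final - baseline)
    return 100.0 * change / abs(baseline)
```

A bundle that shrinks from 500 to 450 with `direction="lower"` yields 10.0; a pass count that rises from 80 to 100 with `direction="higher"` yields 25.0.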
Also log to .planning/telemetry/agent-runs.jsonl:
{"event":"experiment-complete","slug":"{slug}","baseline":0,"final":0,"improvement":"0%","kept":0,"total":0,"timestamp":"ISO"}
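Appending that record could look like the sketch below. The function name `log_experiment` is hypothetical, and using a UTC ISO 8601 string for `timestamp` is an assumption based on the `"ISO"` placeholder.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_experiment(path, slug, baseline, final, improvement, kept, total):
    """Append one experiment-complete record as a single JSON line and return it."""
    record = {
        "event": "experiment-complete",
        "slug": slug,
        "baseline": baseline,
        "final": final,
        "improvement": improvement,  # string such as "10%", as in the template
        "kept": kept,
        "total": total,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    log_path = Path(path)
    # Create .planning/telemetry/ if it does not exist yet.
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode keeps earlier runs; each record occupies exactly one line.
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
    return record
```

Opening the file in append mode preserves earlier runs, which is what makes the JSONL file usable as a running log.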
| Goal | Metric Command |
|---|---|
| Reduce bundle size | `npm run build 2>&1 \| grep -oP 'Total size: \K\d+'` |
| Reduce type errors | `npx tsc --noEmit 2>&1 \| grep -c 'error TS'` |
| Increase test pass rate | `npm test 2>&1 \| grep -oP '\d+(?= passing)'` |
| Reduce file count | `find src -name '*.ts' \| wc -l` |
| Reduce line count | `wc -l src/**/*.ts \| tail -1 \| awk '{print $1}'` |
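Commands like these rely on shell pipes, so a harness has to run them through a shell and capture stdout. The following is a minimal sketch using Python's standard `subprocess` module. The function name `run_metric` is hypothetical, and the 300-second default is an assumption that mirrors the `300` passed to `run-with-timeout.js`, whose unit is not stated here.

```python
import subprocess


def run_metric(command: str, timeout_s: float = 300) -> str:
    """Run a metric command through the shell and return its stripped stdout.

    Raises subprocess.TimeoutExpired if the command exceeds timeout_s.
    The exit code is deliberately not checked: grep exits non-zero when it
    finds no match, and the caller should judge the output itself.
    """
    result = subprocess.run(
        command,
        shell=True,           # needed so pipes and redirects in the command work
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return result.stdout.strip()
```

Because `shell=True` executes the string as given, only metric commands supplied by the user should be passed to it.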
Disclosure: "Running experiment loop on [target] with fitness: [function]. Each iteration commits. Budget: [N iterations]."
Reversibility: amber — modifies source files across iterations; each iteration is committed; undo with git revert on kept commits.
Trust gates:
.planning/research/experiment-{slug}.md with all iteration rows filled
Metric command outputs nothing or non-numeric text: Treat as a metric failure. Ask the user to provide a command that outputs a single number to stdout before starting iterations.
No worktree support (e.g., shallow clone): Fall back to branch isolation. Create a branch, run changes there, measure, then delete or merge the branch. Never modify the working tree directly.
If .planning/research/ does not exist: Create it before writing the experiment report. If .planning/ itself doesn't exist, create the full path or output the report inline.
Budget exhausted with zero kept iterations: Report outcome as "no improvement found". This is a valid result — do not continue past the budget.
---HANDOFF---
- Experiment: {description}
- Result: {baseline} → {final} ({improvement}%)
- Kept: {N}/{total} iterations
- Stop reason: {reason}
- Report: .planning/research/experiment-{slug}.md
- Reversibility: amber — undo kept iterations with `git revert` on each kept commit
---