| field | value |
| --- | --- |
| name | experiment |
| description | Scientific method for development rules. Every rule is a hypothesis. Nothing proven until tested. Triggers: experiment, hypothesis, prove, test rule, validate methodology, scientific, evidence. |
| allowed-tools | Read, Grep, Bash, Write |
Rules without evidence are superstitions. Apply the scientific method: register a hypothesis, design an experiment, execute it, record evidence, update confidence, then graduate or kill the rule.
Reference for deep patterns, the SQL schema, domain design templates, and confidence-scoring examples: skills/experiment/reference/experiment-research.md
--domain : filter to domain (methodology|coordination|testing|git|security|performance|quality)
--status : filter by status (unproven|testing|supported|refuted|graduated)
--confidence : filter by minimum confidence (0.0–1.0)
-
Seed — parse CLAUDE.md for imperative/assertive statements → register as hypotheses
- (gate: each hypothesis has id, statement, source, domain, status=unproven, confidence=0.0)
- Auto-assign sequential IDs H001, H002, …; classify each hypothesis into one domain from the --domain list
agentdb learn hypothesis "H{N}: {statement}" "{source}:{line}"
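A minimal Python sketch of the Seed step, assuming a "rule" is any line that opens with an imperative cue word. The cue list, the keyword-to-domain map, the methodology fallback, and the names `seed`, `classify`, and `Hypothesis` are all hypothetical; only the record fields, the unproven/0.0 defaults, and the H001-style IDs come from the step above.

```python
import re
from dataclasses import dataclass

# Hypothetical cue words: a line is treated as a rule if it opens with one of them.
# The trailing \b stops "use" from matching words such as "user".
RULE_CUE = re.compile(r"(always|never|must|do not|don't|prefer|avoid|use)\b", re.IGNORECASE)
# Strips a leading "-", "*" or "1." list marker before testing for a cue.
LIST_MARKER = re.compile(r"^\s*(?:[-*]|\d+\.)\s+")

# Hypothetical keyword map; domain names come from the --domain list.
DOMAIN_KEYWORDS = {
    "git": ("commit", "branch", "rebase", "merge"),
    "testing": ("test", "coverage", "assert"),
    "security": ("secret", "auth", "inject", "sanitize"),
    "performance": ("latency", "cache", "benchmark"),
    "coordination": ("agent", "handoff", "parallel"),
    "quality": ("lint", "review", "refactor"),
}

@dataclass
class Hypothesis:
    id: str
    statement: str
    source: str
    domain: str
    status: str = "unproven"
    confidence: float = 0.0

def classify(statement: str) -> str:
    """Return the first domain whose keywords appear in the statement."""
    lowered = statement.lower()
    for domain, words in DOMAIN_KEYWORDS.items():
        if any(word in lowered for word in words):
            return domain
    return "methodology"  # assumed fallback when no keyword matches

def seed(text: str, path: str = "CLAUDE.md") -> list[Hypothesis]:
    """Register every cue-word line of `text` as an unproven hypothesis."""
    hypotheses = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = LIST_MARKER.sub("", raw).strip()
        if RULE_CUE.match(line):
            hypotheses.append(Hypothesis(
                id=f"H{len(hypotheses) + 1:03d}",
                statement=line,
                source=f"{path}:{lineno}",
                domain=classify(line),
            ))
    return hypotheses
```

Each record carries what the agentdb learn hypothesis call needs: the ID and statement for "H{N}: {statement}" and a source already in {source}:{line} form. Substring matching is deliberately crude ("test" also hits "latest"), so domains should be reviewed before registration.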
-
Design — for each hypothesis under test, define experiment BEFORE running anything
- method: what you will do (A/B, timed comparison, fuzz, track-and-count)
- measurement: quantitative metric preferred (time, error count, rework frequency)
- control condition: what happens WITHOUT the rule applied
- pass_criteria: specific observable that supports the hypothesis
- fail_criteria: specific observable that refutes it
- (gate: experiment_designed — must be falsifiable; if no outcome could refute, redesign)
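The experiment_designed gate can be partly mechanized. The sketch below assumes a hypothetical `ExperimentDesign` record whose fields mirror the five Design bullets; it only catches structural failures (an empty field, or a fail criterion that merely restates the pass criterion). Whether a design is genuinely falsifiable still needs human judgment.

```python
from dataclasses import dataclass

@dataclass
class ExperimentDesign:
    hypothesis_id: str
    method: str         # A/B, timed comparison, fuzz, track-and-count
    measurement: str    # quantitative metric, e.g. time, error count, rework frequency
    control: str        # what happens WITHOUT the rule applied
    pass_criteria: str  # observable that supports the hypothesis
    fail_criteria: str  # observable that refutes it

DESIGN_FIELDS = ("method", "measurement", "control", "pass_criteria", "fail_criteria")

def design_gate(design: ExperimentDesign) -> list[str]:
    """Return detected problems; an empty list means the structural check passed."""
    problems = [f"{name} is empty" for name in DESIGN_FIELDS
                if not getattr(design, name).strip()]
    passing = design.pass_criteria.strip().lower()
    failing = design.fail_criteria.strip().lower()
    # If the refuting observable is the same as the supporting one,
    # no outcome can refute the hypothesis.
    if failing and failing == passing:
        problems.append("fail_criteria restates pass_criteria (not falsifiable)")
    return problems
```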
-
Execute — run the experiment; record raw observations
- (gate: control condition was actually tested, not just assumed)
- minimum sample sizes: methodology/coordination/git/quality ≥ 3 comparisons; testing ≥ 5 tasks per condition; security ≥ 50 fuzz inputs
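A sketch of the Execute gate, assuming each comparison, task, or fuzz input is counted once per condition and that the smaller of the control and rule-applied counts must meet the minimum. The thresholds are transcribed from the line above. The step gives no minimum for the performance domain, so the sketch raises rather than guessing one; `execute_gate` is a hypothetical name.

```python
# Minimum samples per condition, transcribed from the Execute step.
# "performance" is deliberately absent: no minimum is documented for it.
MIN_SAMPLES = {
    "methodology": 3, "coordination": 3, "git": 3, "quality": 3,
    "testing": 5,
    "security": 50,
}

def execute_gate(domain: str, control_n: int, rule_n: int) -> bool:
    """True if the control was really run and both conditions meet the minimum."""
    if domain not in MIN_SAMPLES:
        raise ValueError(f"no documented minimum sample size for {domain!r}")
    if control_n == 0:
        return False  # control condition assumed, not tested
    return min(control_n, rule_n) >= MIN_SAMPLES[domain]
```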
-
Score — apply Bayesian update to confidence
- supports:
confidence += (1 - confidence) * 0.25
- refutes:
confidence -= confidence * 0.3
- inconclusive: no change (record in evidence log)
agentdb learn experiment "H{N} result={supports|refutes|inconclusive}" "{evidence}"
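The Score rule is a fixed-fraction update: a supporting result removes 25% of the remaining doubt, and a refuting result removes 30% of the current confidence. It moves belief in the right direction, like a Bayesian update, but it is not a posterior computed from likelihoods. A minimal sketch (the names `update` and `replay` are hypothetical):

```python
SUPPORT_STEP = 0.25  # fraction of remaining doubt (1 - confidence) removed per support
REFUTE_STEP = 0.3    # fraction of current confidence removed per refutation

def update(confidence: float, result: str) -> float:
    if result == "supports":
        return confidence + (1 - confidence) * SUPPORT_STEP
    if result == "refutes":
        return confidence - confidence * REFUTE_STEP
    if result == "inconclusive":
        return confidence  # no change; the result still goes in the evidence log
    raise ValueError(f"unknown result: {result!r}")

def replay(results, start: float = 0.0) -> float:
    """Apply a sequence of results to a starting confidence."""
    confidence = start
    for result in results:
        confidence = update(confidence, result)
    return confidence

print(round(replay(["supports"] * 5), 3))  # 0.763
print(round(replay(["supports"] * 6), 3))  # 0.822
```

Starting from 0.0, n supporting results give 1 - 0.75^n at most, so at least six supports are needed before the 0.8 threshold in Transition can be met. Confidence approaches 1 or 0 asymptotically and never reaches either bound.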
-
Transition — update hypothesis status per lifecycle rules
- unproven → testing: first experiment registered
- testing → supported: confidence ≥ 0.8 AND evidence_for ≥ 3 AND evidence_for:evidence_against ratio ≥ 3:1
- testing → refuted: confidence < 0.2 AND evidence_against ≥ 2
- supported → graduated: human approval after sustained confidence
- refuted → killed: human approval to remove from rules
- any → unproven: rule is modified (resets all evidence)
- (gate: evidence_recorded — verdict must include specific, measurable evidence string)
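The lifecycle above is a small state machine. A sketch under these assumptions: the 3:1 ratio is checked as evidence_for ≥ 3 × evidence_against (which avoids dividing by zero when nothing refutes); "sustained confidence" is left to the human and modeled only as a `human_approved` flag; and a modified rule returns to unproven with confidence reset to 0.0 as well as evidence cleared. `Record`, `next_status`, and `on_rule_modified` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Record:
    status: str = "unproven"
    confidence: float = 0.0
    evidence_for: int = 0
    evidence_against: int = 0
    experiments: int = 0

def next_status(r: Record, human_approved: bool = False) -> str:
    """Return the status after applying one lifecycle transition, if any."""
    if r.status == "unproven" and r.experiments >= 1:
        return "testing"
    if r.status == "testing":
        ratio_ok = r.evidence_for >= 3 * r.evidence_against  # ratio >= 3:1
        if r.confidence >= 0.8 and r.evidence_for >= 3 and ratio_ok:
            return "supported"
        if r.confidence < 0.2 and r.evidence_against >= 2:
            return "refuted"
    if r.status == "supported" and human_approved:
        return "graduated"
    if r.status == "refuted" and human_approved:
        return "killed"
    return r.status

def on_rule_modified(r: Record) -> Record:
    # Any change to the rule makes it a new hypothesis: all evidence is discarded.
    # Resetting confidence to 0.0 follows the Seed defaults (an assumption).
    return Record()
```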
-
Report — surface verdict
- verdict: SUPPORTED | REFUTED | INCONCLUSIVE
- confidence: current 0.0–1.0 value
- evidence: required (specific, measurable, not narrative)
- if GRADUATED: propose rule promotion to CLAUDE.md with human approval
- if KILLED: propose rule removal from CLAUDE.md with human approval
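A sketch of the Report step. The verdict set and the GRADUATED/KILLED proposals come from the bullets above; treating "contains at least one number" as the test for measurable evidence is a hypothetical, deliberately crude proxy, and `build_report` is an illustrative name.

```python
import re

VERDICTS = ("SUPPORTED", "REFUTED", "INCONCLUSIVE")

def build_report(hypothesis_id: str, verdict: str, confidence: float,
                 evidence: str, status: str) -> dict:
    if verdict not in VERDICTS:
        raise ValueError(f"verdict must be one of {VERDICTS}")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be within 0.0-1.0")
    # Hypothetical proxy for "specific, measurable": require at least one digit.
    if not re.search(r"\d", evidence):
        raise ValueError("evidence must be specific and measurable, not narrative")
    report = {"id": hypothesis_id, "verdict": verdict,
              "confidence": round(confidence, 3), "evidence": evidence}
    # Lifecycle outcomes only produce proposals; a human approves any CLAUDE.md edit.
    if status == "graduated":
        report["proposal"] = "rule promotion to CLAUDE.md (needs human approval)"
    elif status == "killed":
        report["proposal"] = "rule removal from CLAUDE.md (needs human approval)"
    return report
```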
<anti_patterns>
Confirm a hypothesis without running a real experiment.
Use a single data point to graduate a hypothesis.
Ignore refuting evidence because the rule "feels right".
Test a hypothesis with a method that can only confirm (design for falsifiability).
Modify the hypothesis after seeing results (that is a new hypothesis).
</anti_patterns>
<on_complete>
agentdb write-end '{"skill":"experiment","hypotheses_tested":N,"graduated":[],"killed":[],"inconclusive":[]}'
</on_complete>
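The on_complete payload above is a template: N and the empty lists are placeholders. A minimal sketch that fills it with valid JSON and shell-quotes it, assuming the lists hold hypothesis IDs; it builds only the command string already shown and assumes nothing else about agentdb. `write_end_command` is a hypothetical name.

```python
import json
import shlex

def write_end_command(tested: int, graduated: list, killed: list, inconclusive: list) -> str:
    payload = {
        "skill": "experiment",
        "hypotheses_tested": tested,
        "graduated": graduated,
        "killed": killed,
        "inconclusive": inconclusive,
    }
    # Compact separators reproduce the template's layout; shlex.quote wraps the
    # JSON in single quotes so the shell passes it through as one argument.
    return "agentdb write-end " + shlex.quote(json.dumps(payload, separators=(",", ":")))
```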