| name | explore-run |
| description | Rigor Improve / Rigor Explore run leaf skill for bounded exploratory evidence in deep learning research repositories. Use when the researcher explicitly authorizes exploratory runs such as small-subset validation, short-cycle guess-and-check, batch sweeps, idle-GPU search, or quick transfer-learning trials, with fair-comparison caveats and no-overclaim summaries in `explore_outputs/`. Do not use for end-to-end exploration orchestration on top of `current_research`, trusted baseline execution, conservative training verification, default routing, verified SOTA claims, or implicit experimentation. |
explore-run
Use this as the Rigor Improve / Rigor Explore run leaf skill. The installed slug
remains `explore-run` for compatibility.
Use the shared operating principles in
`../../references/agent-operating-principles.md`; this skill should guide
candidate run planning while preserving model judgment about the active repo.
When to apply
- When the researcher explicitly authorizes exploratory runs.
- When the task is a small-subset validation, short-cycle training probe, batch sweep, idle-GPU search, or quick transfer-learning trial.
- When the output should rank candidate runs rather than certify trusted success.
When not to apply
- When the user wants trusted training execution or conservative verification.
- When there is no explicit exploratory authorization.
- When the task is repository setup, intake, or debugging.
Clear boundaries
- This skill owns exploratory execution planning and summary only.
- Use `ai-research-explore` instead when the task spans both `current_research` coordination and exploratory code changes.
- It may hand off actual command execution to `minimal-run-and-audit` or `run-train`.
- It should keep experiment state isolated from the trusted baseline.
- It should prefer small-subset and short-cycle checks before heavier exploratory runs.
- It should label run results as bounded evidence and explain when a comparison
is not directly fair.
Ranking Semantics
- Pre-execution candidate selection uses three factors: `cost`, `success_rate`, and `expected_gain`.
- Default weights should stay conservative unless the researcher explicitly provides `selection_weights`.
- After scoring, budget pruning still applies through `max_variants` and `max_short_cycle_runs`.
- If runs are executed later, downstream ranking should switch to real execution evidence, not stay purely heuristic.
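The scoring and pruning rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the behaviour of `scripts/plan_variants.py`: the linear scoring formula, the default weight values, the `[0, 1]` normalization of each factor, and the names `Candidate`, `score`, and `select_candidates` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical conservative defaults; the real default weights are not stated here.
DEFAULT_WEIGHTS = {"cost": 0.4, "success_rate": 0.4, "expected_gain": 0.2}


@dataclass(frozen=True)
class Candidate:
    name: str
    cost: float           # assumed normalized to [0, 1]; higher means more expensive
    success_rate: float   # assumed estimate in [0, 1]
    expected_gain: float  # assumed normalized to [0, 1]


def score(candidate, weights=None):
    """Heuristic pre-execution score: reward likely success and gain, penalize cost."""
    w = {**DEFAULT_WEIGHTS, **(weights or {})}
    return (
        w["success_rate"] * candidate.success_rate
        + w["expected_gain"] * candidate.expected_gain
        - w["cost"] * candidate.cost
    )


def select_candidates(candidates, max_variants, weights=None):
    """Score every candidate, then prune to the max_variants budget."""
    ranked = sorted(candidates, key=lambda c: score(c, weights), reverse=True)
    return ranked[:max_variants]


candidates = [
    Candidate("lr-sweep-small", cost=0.2, success_rate=0.8, expected_gain=0.3),
    Candidate("bigger-backbone", cost=0.9, success_rate=0.5, expected_gain=0.8),
    Candidate("aug-tweak", cost=0.1, success_rate=0.9, expected_gain=0.1),
]
selected = select_candidates(candidates, max_variants=2)
print([c.name for c in selected])
```

With the assumed defaults, cheap and likely-to-succeed candidates survive pruning. Passing researcher-provided `selection_weights` as `weights` shifts the ordering, for example toward `expected_gain`. Once runs have actually executed, these heuristic scores should give way to measured results.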
Variant Spec Hints
- Use `variant_axes` to define the candidate dimension grid.
- Use `subset_sizes` and `short_run_steps` to express exploratory run scale.
- Use `selection_weights` to rebalance `cost`, `success_rate`, and `expected_gain`.
- Use `primary_metric` and `metric_goal` so downstream ranking can order executed candidates consistently.
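As a rough illustration of how these fields might fit together, the sketch below builds a spec as a plain Python dict, expands `variant_axes` into a grid, and orders executed runs by `primary_metric`. The field names come from the hints above; everything else is assumed for the example, including the sample values, the `"maximize"`/`"minimize"` vocabulary for `metric_goal`, and the helpers `expand_variants` and `rank_executed`. The authoritative schema lives in `../../references/explore-variant-spec.md`.

```python
from itertools import product

# Hypothetical spec; values are illustrative only.
variant_spec = {
    "variant_axes": {"lr": [1e-3, 3e-4], "batch_size": [32, 64]},
    "subset_sizes": [1000],
    "short_run_steps": [200],
    "selection_weights": {"cost": 0.5, "success_rate": 0.3, "expected_gain": 0.2},
    "primary_metric": "val_accuracy",
    "metric_goal": "maximize",  # assumed vocabulary: "maximize" or "minimize"
}


def expand_variants(spec):
    """Cross every combination of variant_axes with every run scale."""
    axes = spec["variant_axes"]
    names = list(axes)
    scales = list(product(spec["subset_sizes"], spec["short_run_steps"]))
    variants = []
    for values in product(*(axes[name] for name in names)):
        for subset_size, steps in scales:
            variant = dict(zip(names, values))
            variant["subset_size"] = subset_size
            variant["short_run_steps"] = steps
            variants.append(variant)
    return variants


def rank_executed(results, spec):
    """Order executed runs by primary_metric, best first, according to metric_goal."""
    goal = spec["metric_goal"]
    if goal not in ("maximize", "minimize"):
        raise ValueError(f"unsupported metric_goal: {goal!r}")
    metric = spec["primary_metric"]
    return sorted(results, key=lambda r: r[metric], reverse=(goal == "maximize"))


variants = expand_variants(variant_spec)
print(len(variants))  # 2 learning rates x 2 batch sizes x 1 run scale

# Made-up metric values, only to show the ordering.
results = [
    {"name": "a", "val_accuracy": 0.71},
    {"name": "b", "val_accuracy": 0.78},
    {"name": "c", "val_accuracy": 0.69},
]
ranked = rank_executed(results, variant_spec)
print([r["name"] for r in ranked])
```

Keeping `primary_metric` and `metric_goal` in the spec means every executed candidate is ordered by the same rule, so a loss-style metric does not get ranked as if higher were better.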
Output expectations
- `explore_outputs/CHANGESET.md`
- `explore_outputs/SCIENTIFIC_CHANGELOG.md`
- `explore_outputs/COMPARABILITY_REPORT.md`
- `explore_outputs/TOP_RUNS.md`
- `explore_outputs/status.json`
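One simple way to confirm that a run produced the full output set is to check for each expected file. The helper below is a hypothetical sketch, not part of `scripts/write_outputs.py`; it assumes only the five file names listed above and says nothing about their contents.

```python
import tempfile
from pathlib import Path

# File names taken from the output list above.
EXPECTED_OUTPUTS = [
    "CHANGESET.md",
    "SCIENTIFIC_CHANGELOG.md",
    "COMPARABILITY_REPORT.md",
    "TOP_RUNS.md",
    "status.json",
]


def missing_outputs(root="."):
    """Return the expected files that are absent from <root>/explore_outputs."""
    out_dir = Path(root) / "explore_outputs"
    return [name for name in EXPECTED_OUTPUTS if not (out_dir / name).is_file()]


# Demonstration in a throwaway directory containing only status.json.
with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / "explore_outputs"
    out_dir.mkdir()
    (out_dir / "status.json").write_text("{}")
    missing = missing_outputs(tmp)
print(missing)
```

An empty list means all five files are present. A non-empty list is a cue to treat the exploratory summary as incomplete.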
Notes
Use `references/execution-policy.md`, `../../references/explore-variant-spec.md`, `../../references/deep-learning-experiment-principles.md`, `scripts/plan_variants.py`, and `scripts/write_outputs.py`.