| name | experiment-run-scripts |
| description | Write and operate robust, reusable experiment runner scripts (Bash) for ML/data pipelines: config-driven runs, GPU sharding, CPU fallback, structured logs, checkpoint/resume, and post-processing chaining. Use when the user asks how to run a project efficiently via scripts, or asks you to create/standardize run scripts. |
Experiment Run Scripts (Fast execution + robust authoring)
Goals
- Run efficiently: launch parallel workers (often per-GPU) with clean logs and deterministic inputs.
- Write maintainable scripts: keep scripts as orchestrators (resource selection, sharding, logging, chaining), with the experiment logic living in Python modules/configs.
Non-negotiable conventions
- Run from repo root (or make paths robust): treat configs/data/outputs as repo-relative by default.
- Config-driven: the script takes a single
CONFIG path (YAML/JSON) and passes it through; avoid editing code to switch datasets/params.
- Stable outputs: results and logs always land in a predictable
outputs/<run-name>/... folder (or a config-defined output dir).
- Resume-first: rerunning the same command should be safe. Prefer idempotent “skip if output exists” behavior in the underlying program.
Operator quickstart (copy/paste)
Environment
- Python deps: install with the project’s dependency manager (
requirements.txt, pyproject.toml, conda env, etc.).
- GPU deps (optional): if
nvidia-smi is available, the script can shard across GPUs; otherwise it must fallback to CPU single-shard.
Common pattern: batch parallelism over seeds/tasks
Typical invocation:
bash scripts/run_experiment.sh path/to/config.yaml
Behavioral requirements:
- Parallel model: launch N background jobs, then
wait for the batch to finish.
- Per-unit logs: one log per seed/task/shard, named predictably.
- Final aggregation: after all workers finish, optionally run a merge/aggregate step.
Common pattern: GPU sharding with CPU fallback
Run:
bash scripts/run_experiment.sh path/to/config.yaml
Logs (example):
outputs/<run-name>/logs/shard_*.log
Failure handling:
- Any shard fails: rerun the same script. If resume is implemented, completed work is skipped automatically.
- Insufficient GPU memory: the script should refuse to launch and print actionable diagnostics (free memory, processes).
Post-processing chaining
Typical sequence (names are project-specific; keep the pattern):
python -m package.module --mode merge --config path/to/config.yaml
python -m package.module --mode postprocess --config path/to/config.yaml
python -m package.module --mode analyze --config path/to/config.yaml
Quality checks & debugging (preferred: dedicated scripts)
Spot-check outputs (human-auditable report)
Requirements for a good spotcheck tool:
- Accept
--input_dir / --out_dir, --n, --seed, optional --stratify.
- Produce CSV + HTML (or Markdown) so humans can audit correctness quickly.
Diagnose “feature extraction / peak detection / metrics not found”
Requirements for a good diagnostics tool:
- Never re-run expensive inference; scan existing artifacts.
- Save a summary JSON plus a sweep CSV for threshold sensitivity.
How to author new runner scripts (the agent must follow)
0) Provide one stable entrypoint
- Naming:
scripts/run_<pipeline>.sh (one script per pipeline).
- Usage:
bash scripts/run_<pipeline>.sh [config_path] with a sensible default config.
- Inputs: accept one primary argument (
CONFIG) and keep the rest as constants or environment-variable overrides.
1) Required Bash skeleton
The script must include:
set -euo pipefail
CONFIG=${1:-"path/to/default.yaml"}
LOG_DIR="outputs/<run-name>/logs" and mkdir -p "$LOG_DIR"
- GPU detection / fallback:
- If
nvidia-smi is missing: run single-shard and tee logs to shard_0.log.
- If
nvidia-smi exists: select GPUs (by free memory or explicit allowlist) and set TOTAL_SHARDS=${#AVAILABLE_GPUS[@]}.
- Shard launch: for
shard_idx=0..TOTAL_SHARDS-1, set CUDA_VISIBLE_DEVICES=$gpu_id and pass:
--shard "$shard_idx"
--total_shards "$TOTAL_SHARDS"
- redirect logs to
"$LOG_DIR/shard_${shard_idx}.log"
- Wait + failure summary: store
PIDS[], wait each PID, count failures, exit non-zero if any failed.
- Post steps: run merge/aggregate and optional analysis steps.
2) Logging & outputs (must be enforceable)
- One log per shard:
shard_${shard_idx}.log.
- Output dir is a parameter: do not hardcode file names for downstream artifacts; print where to find results.
- Print grep-friendly banners: config path, start/end time, GPU list, per-shard PID, and failing shard log paths.
3) Minimum reproducibility bar
- No code edits for runs: dataset/size/output must be configurable.
- Safe to rerun: reruns should not corrupt completed artifacts; prefer atomic writes and “skip-if-exists”.
4) Common pitfalls to avoid
- Confusing physical GPU id with shard id: shard indices should be
0..TOTAL_SHARDS-1 even if physical GPU ids are [2,5,7].
- Brittle paths: avoid assuming the current directory unless explicitly enforced.
- Silent failures: always surface exit codes and point to the exact failing log file.