name	experiment-run-scripts
description	Write and operate robust, reusable experiment runner scripts (Bash) for ML/data pipelines: config-driven runs, GPU sharding, CPU fallback, structured logs, checkpoint/resume, and post-processing chaining. Use when the user asks how to run a project efficiently via scripts, or asks you to create/standardize run scripts.

Experiment Run Scripts (Fast execution + robust authoring)

Goals

Run efficiently: launch parallel workers (often per-GPU) with clean logs and deterministic inputs.
Write maintainable scripts: keep scripts as orchestrators (resource selection, sharding, logging, chaining), with the experiment logic living in Python modules/configs.

Non-negotiable conventions

Run from repo root (or make paths robust): treat configs/data/outputs as repo-relative by default.
Config-driven: the script takes a single CONFIG path (YAML/JSON) and passes it through; avoid editing code to switch datasets/params.
Stable outputs: results and logs always land in a predictable outputs/<run-name>/... folder (or a config-defined output dir).
Resume-first: rerunning the same command should be safe. Prefer idempotent “skip if output exists” behavior in the underlying program.
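
A minimal sketch of the path and output conventions above. It assumes the runner lives in scripts/ directly under the repo root and that the run name is taken from the config file name; both choices, and the helper names repo_root_of and run_name_of, are illustrative assumptions rather than part of any existing project.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the repo root, assuming the runner script
# sits in <repo>/scripts/.
repo_root_of() {
  local script_path="$1"
  (cd "$(dirname "$script_path")/.." && pwd)
}

# Hypothetical helper: derive a run name from the config file name,
# e.g. configs/baseline.yaml -> baseline.
run_name_of() {
  local config="$1"
  local base
  base="$(basename "$config")"
  echo "${base%.*}"
}

# Typical use at the top of a runner script:
#   cd "$(repo_root_of "${BASH_SOURCE[0]}")"
#   OUT_DIR="outputs/$(run_name_of "$CONFIG")"
```

Changing into the repo root once, at the top of the script, keeps every later path repo-relative regardless of where the operator invoked the script from.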

Operator quickstart (copy/paste)

Environment

Python deps: install with the project’s dependency manager (requirements.txt, pyproject.toml, conda env, etc.).
GPU deps (optional): if nvidia-smi is available, the script can shard across GPUs; otherwise it must fall back to a single CPU shard.

Common pattern: batch parallelism over seeds/tasks

Typical invocation:

bash scripts/run_experiment.sh path/to/config.yaml

Behavioral requirements:

Parallel model: launch N background jobs, then wait for the batch to finish.
Per-unit logs: one log per seed/task/shard, named predictably.
Final aggregation: after all workers finish, optionally run a merge/aggregate step.
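
The three requirements above can be sketched as a small batching helper. The helper name run_seed_batches, the --seed flag appended to the worker command, and the seed_<n>.log naming are assumptions made for illustration; adapt them to the project's real CLI.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run one job per seed, at most BATCH_SIZE at a time.
# Usage: run_seed_batches LOG_DIR BATCH_SIZE "SEED..." CMD [ARGS...]
# Each job is invoked as: CMD [ARGS...] --seed <seed>  (assumed flag).
run_seed_batches() {
  local log_dir="$1" batch_size="$2" seeds="$3"
  shift 3
  mkdir -p "$log_dir"
  local failed=0 seed pid
  local -a pids=()
  for seed in $seeds; do
    "$@" --seed "$seed" > "$log_dir/seed_${seed}.log" 2>&1 &
    pids+=("$!")
    # Batch is full: wait for every job in it before launching more.
    if (( ${#pids[@]} >= batch_size )); then
      for pid in "${pids[@]}"; do
        wait "$pid" || failed=$((failed + 1))
      done
      pids=()
    fi
  done
  # Wait for the final, possibly partial, batch.
  if (( ${#pids[@]} > 0 )); then
    for pid in "${pids[@]}"; do
      wait "$pid" || failed=$((failed + 1))
    done
  fi
  echo "[batch] failed_jobs=$failed"
  (( failed == 0 ))
}

# Example: run_seed_batches outputs/demo/logs 4 "0 1 2 3 4 5 6 7" \
#            python -m package.module --config path/to/config.yaml
```

An aggregation step would follow only if the helper returns zero, so that a merge never runs over partial results.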

Common pattern: GPU sharding with CPU fallback

Run:

bash scripts/run_experiment.sh path/to/config.yaml

Logs (example):

outputs/<run-name>/logs/shard_*.log

Failure handling:

Any shard fails: rerun the same script. If resume is implemented, completed work is skipped automatically.
Insufficient GPU memory: the script should refuse to launch and print actionable diagnostics (free memory, processes).
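
One possible shape for the memory check, as a sketch. The helper names and the MiB threshold argument are assumptions; the nvidia-smi query flags used here (--query-gpu, --query-compute-apps, --format=csv) are standard, but verify the field names against the installed driver version.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: read "index, free_mib" lines on stdin (the format
# printed by the nvidia-smi query below) and print the ids of GPUs with at
# least MIN_FREE_MIB free, one per line.
filter_gpus_by_free_mem() {
  local min_free_mib="$1"
  awk -F',' -v min="$min_free_mib" '
    NF >= 2 {
      gsub(/ /, "", $1); gsub(/ /, "", $2)
      if (($2 + 0) >= (min + 0)) print $1
    }
  '
}

# Hypothetical helper: print the selected ids, or refuse with diagnostics.
select_gpus_or_refuse() {
  local min_free_mib="$1" report selected
  report="$(nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits)"
  selected="$(printf '%s\n' "$report" | filter_gpus_by_free_mem "$min_free_mib")"
  if [[ -z "$selected" ]]; then
    echo "[error] no GPU has >= ${min_free_mib} MiB free" >&2
    echo "[error] free memory per GPU (index, MiB):" >&2
    printf '%s\n' "$report" >&2
    echo "[error] processes currently using GPUs:" >&2
    nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv >&2 || true
    return 1
  fi
  printf '%s\n' "$selected"
}
```

Keeping the filter separate from the nvidia-smi call makes the selection logic checkable on a machine without GPUs by piping sample lines into it.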

Post-processing chaining

Typical sequence (names are project-specific; keep the pattern):

python -m package.module --mode merge --config path/to/config.yaml
python -m package.module --mode postprocess --config path/to/config.yaml
python -m package.module --mode analyze --config path/to/config.yaml
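
The sequence above can be wrapped so that a failed step stops the chain. This is a sketch: run_post_steps is a hypothetical helper, and the command prefix is passed in as arguments so the project-specific module name stays outside it.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run the post-processing modes in order and stop at
# the first failure.
# Usage: run_post_steps CONFIG CMD [ARGS...]
run_post_steps() {
  local config="$1"
  shift
  local mode
  for mode in merge postprocess analyze; do
    echo "[post] start mode=$mode config=$config"
    if ! "$@" --mode "$mode" --config "$config"; then
      echo "[post] FAILED mode=$mode; later steps skipped" >&2
      return 1
    fi
  done
  echo "[post] all steps finished"
}

# Example: run_post_steps path/to/config.yaml python -m package.module
```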

Quality checks & debugging (preferred: dedicated scripts)

Spot-check outputs (human-auditable report)

Requirements for a good spotcheck tool:

Accept --input_dir and --out_dir, --n (number of items to sample), --seed, and an optional --stratify.
Produce CSV + HTML (or Markdown) so humans can audit correctness quickly.

Diagnose “feature extraction / peak detection / metrics not found”

Requirements for a good diagnostics tool:

Never re-run expensive inference; scan existing artifacts.
Save a summary JSON plus a sweep CSV for threshold sensitivity.

How to author new runner scripts (the agent must follow these rules)

0) Provide one stable entrypoint

Naming: scripts/run_<pipeline>.sh (one script per pipeline).
Usage: bash scripts/run_<pipeline>.sh [config_path] with a sensible default config.
Inputs: accept one primary argument (CONFIG) and keep the rest as constants or environment-variable overrides.
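
A sketch of the entrypoint contract above: one positional CONFIG with a default, and everything else as environment-variable overrides. The names resolve_config, DEFAULT_CONFIG, MIN_FREE_MIB, and OUT_ROOT are hypothetical, chosen only for this example.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: resolve the config path from the first argument,
# falling back to a default, and fail early if the file is missing.
resolve_config() {
  local config="${1:-${DEFAULT_CONFIG:-path/to/default.yaml}}"
  if [[ ! -f "$config" ]]; then
    echo "[error] config not found: $config" >&2
    echo "usage: bash scripts/run_<pipeline>.sh [config_path]" >&2
    return 2
  fi
  echo "$config"
}

# Secondary knobs stay as environment-variable overrides with defaults.
# Both names are illustrative, not part of any real project.
MIN_FREE_MIB="${MIN_FREE_MIB:-8000}"
OUT_ROOT="${OUT_ROOT:-outputs}"

# Typical use: CONFIG="$(resolve_config "$@")"
```

Failing before any worker starts, with the usage line printed, is cheaper than discovering a mistyped path in a shard log.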

1) Required Bash skeleton

The script must include:

set -euo pipefail
CONFIG=${1:-"path/to/default.yaml"}
LOG_DIR="outputs/<run-name>/logs" and mkdir -p "$LOG_DIR"
GPU detection / fallback:
- If nvidia-smi is missing: run single-shard and tee logs to shard_0.log.
- If nvidia-smi exists: select GPUs (by free memory or explicit allowlist) and set TOTAL_SHARDS=${#AVAILABLE_GPUS[@]}.
Shard launch: for shard_idx=0..TOTAL_SHARDS-1, set CUDA_VISIBLE_DEVICES=$gpu_id and pass:
- --shard "$shard_idx"
- --total_shards "$TOTAL_SHARDS"
- redirect logs to "$LOG_DIR/shard_${shard_idx}.log"
Wait + failure summary: store PIDS[], wait each PID, count failures, exit non-zero if any failed.
Post steps: run merge/aggregate and optional analysis steps.
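
The shard launch and wait steps of the skeleton might look like the sketch below. launch_shards is a hypothetical helper; the worker command is passed in as arguments, and the GPU list is assumed to have been selected already (an empty list triggers the single-shard CPU path). The --shard and --total_shards flags are the ones named in the skeleton above.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: launch one worker per GPU id and wait for all.
# Usage: launch_shards LOG_DIR "GPU_ID..." CMD [ARGS...]
# Pass an empty GPU list to run a single CPU shard.
launch_shards() {
  local log_dir="$1" gpu_list="$2"
  shift 2
  mkdir -p "$log_dir"

  local rc=0
  local -a gpus=()
  read -r -a gpus <<< "$gpu_list"
  if (( ${#gpus[@]} == 0 )); then
    # CPU fallback: one shard, output mirrored to the terminal and the log.
    echo "[run] no GPUs selected; running 1 CPU shard"
    "$@" --shard 0 --total_shards 1 2>&1 | tee "$log_dir/shard_0.log" || rc=$?
    return "$rc"
  fi

  local total_shards="${#gpus[@]}" shard_idx failed=0
  local -a pids=()
  for shard_idx in "${!gpus[@]}"; do
    # shard_idx runs 0..total_shards-1; gpus[shard_idx] is the physical id.
    CUDA_VISIBLE_DEVICES="${gpus[$shard_idx]}" "$@" \
      --shard "$shard_idx" --total_shards "$total_shards" \
      > "$log_dir/shard_${shard_idx}.log" 2>&1 &
    pids+=("$!")
    echo "[run] shard=$shard_idx gpu=${gpus[$shard_idx]} pid=$!"
  done

  # Wait on each PID individually so every failure is counted and reported.
  for shard_idx in "${!pids[@]}"; do
    if ! wait "${pids[$shard_idx]}"; then
      failed=$((failed + 1))
      echo "[fail] shard=$shard_idx log=$log_dir/shard_${shard_idx}.log" >&2
    fi
  done
  echo "[run] shards=$total_shards failed=$failed"
  (( failed == 0 ))
}

# Example: launch_shards "$LOG_DIR" "2 5 7" python -m package.module --config "$CONFIG"
```

Iterating over array indices rather than GPU ids is what keeps shard numbering dense (0, 1, 2) when the physical ids are sparse.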

2) Logging & outputs (must be enforceable)

One log per shard: shard_${shard_idx}.log.
Output dir is a parameter: do not hardcode file names for downstream artifacts; print where to find results.
Print grep-friendly banners: config path, start/end time, GPU list, per-shard PID, and failing shard log paths.
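
One way to make banners grep-friendly is a fixed prefix plus key=value pairs, sketched below. The [run] prefix, the helper names, and the choice of UTC timestamps are assumptions for illustration.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print one "key=value" banner line with a fixed
# prefix so that grep '^\[run\]' recovers the whole run summary.
banner() {
  local key="$1"
  shift
  printf '[run] %s=%s\n' "$key" "$*"
}

# Hypothetical helper: UTC timestamp in ISO 8601 form.
now_utc() {
  date -u +%Y-%m-%dT%H:%M:%SZ
}

# Hypothetical helper: print the banners that open a run.
print_run_header() {
  local config="$1" gpu_list="$2"
  banner config "$config"
  banner start_time "$(now_utc)"
  banner gpus "${gpu_list:-none (CPU fallback)}"
}
```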

3) Minimum reproducibility bar

No code edits for runs: dataset, size, and output location must be configurable through the config file or environment-variable overrides.
Safe to rerun: reruns should not corrupt completed artifacts; prefer atomic writes and “skip-if-exists”.
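
Skip-if-exists and atomic writes are best implemented inside the underlying program, but the idea can be illustrated at the script level. In this sketch, run_unit is a hypothetical helper that captures a command's stdout into a temporary file next to the target and renames it only on success, so an interrupted run never leaves a partial artifact under the final name.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: produce OUT_FILE from a command's stdout.
# Usage: run_unit OUT_FILE CMD [ARGS...]
run_unit() {
  local out="$1"
  shift
  # Resume: a non-empty output file means this unit is already done.
  if [[ -s "$out" ]]; then
    echo "[skip] $out already exists"
    return 0
  fi
  mkdir -p "$(dirname "$out")"
  # The temp file lives in the same directory so the final mv is a rename.
  local tmp="${out}.tmp.$$"
  if "$@" > "$tmp"; then
    mv "$tmp" "$out"
    echo "[done] $out"
  else
    local rc=$?
    rm -f "$tmp"
    echo "[fail] $out (exit $rc)" >&2
    return "$rc"
  fi
}
```

Rerunning the same command after a crash then redoes only the units whose final files are missing.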

4) Common pitfalls to avoid

Confusing physical GPU id with shard id: shard indices should be 0..TOTAL_SHARDS-1 even if physical GPU ids are [2,5,7].
Brittle paths: avoid assuming the current directory unless explicitly enforced.
Silent failures: always surface exit codes and point to the exact failing log file.
