en un clic
monitor-run
// Monitor an ongoing prime-rl training run — find the output directory, tail logs, check key metrics, inspect SLURM jobs, and restart safely. Use when asked to check on a run, debug training, or investigate performance.
// Monitor an ongoing prime-rl training run — find the output directory, tail logs, check key metrics, inspect SLURM jobs, and restart safely. Use when asked to check on a run, debug training, or investigate performance.
| name | monitor-run |
| description | Monitor an ongoing prime-rl training run — find the output directory, tail logs, check key metrics, inspect SLURM jobs, and restart safely. Use when asked to check on a run, debug training, or investigate performance. |
{output_dir}/configs/ (start with rl.toml).{output_dir}/STATUS.md.Default cadence: 1 hour (researcher can override). At each check-in:
{output_dir}/STATUS.md (never overwrite):## YYYY-MM-DD HH:MM UTC
**Step**: {current_step} / {max_steps}
**Health**: {Healthy | Degraded | Down}
**Progress**: reward/mean, seq_len, truncation, eval scores, env-specific metrics.
**Stability**: entropy, mismatch_kl, grad_norm — flag spikes.
**Performance**: trainer vs orchestrator step time, env lag, inference pressure.
**Notes**: anything unusual (errors, restarts, hangs). Omit if nothing notable.
Never restart unless the researcher explicitly asked. Confirm the exact restart command and the conditions that warrant one.
Never run kill or launch commands from your own shell. Dispatch them to the tmux Launcher window so the researcher sees what was executed:
SESSION=$(tmux display-message -p '#S')
tmux send-keys -t "$SESSION:Launcher" 'your command here' Enter
After a restart, verify all processes are back up and progress resumed before the next check-in.
scripts/tmux.sh launches the run with a Launcher window in the named tmux session. The Claude window receives the output dir and session name in its appended prompt — if either is missing, ask rather than guess.{output_dir}/configs/ — resolved TOMLs (rl.toml has the full picture).{output_dir}/logs/ — see below.{output_dir}/rollouts/step_N/ — saved rollouts.{output_dir}/logs/
├── trainer.log # rank 0 stdout
├── orchestrator.log # orchestrator stdout
├── inference.log # vLLM stdout
├── trainer/
│ ├── node_*.log # per-node (multi-node only)
│ └── torchrun/ # per-rank stdout/stderr
├── inference/
│ ├── node_*.log # per-node (multi-node only)
│ └── router_0.log # vllm-router per replica (multi-node only)
└── envs/{train,eval}/{env_name}/
├── env_server.log
└── env_worker_*.log
Usually tailing trainer.log, orchestrator.log, and inference.log is enough. Drop into per-node or per-rank logs only when debugging. All logs are loguru with HH:mm:ss LEVEL message; levels: DEBUG, INFO, SUCCESS, WARNING, ERROR.
Scan for problems:
grep -E "WARNING|ERROR" {output_dir}/logs/{trainer,orchestrator,inference}.log
grep -E "WARNING|ERROR" {output_dir}/logs/envs/train/*/env_{server,worker_*}.log
All metrics print to the console log (and W&B when configured).
Progress — orchestrator log:
| Metric | Description |
|---|---|
reward/{all,env}/mean | mean training reward |
seq_len/{all,env}/mean | avg sequence length (tokens) |
num_turns/{all,env}/mean | avg turns per rollout (multi-turn only) |
is_truncated/{all,env}/mean | fraction truncated |
empty_rollouts/{all,env}, errored_rollouts/{all,env} | fraction empty/errored |
metrics/{env}/{metric} | env-specific (e.g. pass rate) |
eval/{env}/{avg@k,pass@k} | eval scores when configured |
Stability — trainer log:
| Metric | Description |
|---|---|
mismatch_kl/{all,env}/{mean,std,max} | KL between trainer and (old) inference policy over trainable tokens |
entropy/{all,env}/{mean,std,max} | policy entropy over trainable tokens |
masked_advantage_{positive,negative}/mean | fraction of DPPO-masked tokens with +/- advantage |
optim/grad_norm | spikes may precede divergence |
Performance — trainer and orchestrator step independently, so comparing step times shows who's waiting on whom.
| Source | Metric | Description |
|---|---|---|
| trainer | time/step | total trainer step |
| trainer | time/wait_for_batch | high → orchestrator is bottleneck |
| trainer | time/forward_backward, time/broadcast_weights, time/save_ckpt | phase timings |
| trainer | perf/throughput, perf/mfu | tokens/s and MFU % |
| orchestrator | time/step, time/generate_completions, time/update_weights | phase timings |
| orchestrator | time/wait_for_ckpt | high → trainer is bottleneck |
| orchestrator | scheduler/async_level, scheduler/inflight_rollouts | scheduler state |
| env server | event loop lag (min/mean/p90/p99/max), active task distribution | periodic |
For live vLLM stats, query Prometheus directly:
curl -s http://localhost:8000/metrics | grep -E "num_requests|gpu_cache_usage"
# vllm:num_requests_running, vllm:num_requests_waiting, vllm:gpu_cache_usage_perc (→1.0 = KV cache saturated)
{output_dir}/rollouts/step_N/
├── train_rollouts.jsonl # all train rollouts (vf.RolloutOutput, trajectory excluded)
├── eval_rollouts.jsonl # only present when eval ran
└── train_rollouts.bin # binary batch consumed by the trainer
wc -l {output_dir}/rollouts/step_42/train_rollouts.jsonl
head -1 {output_dir}/rollouts/step_42/train_rollouts.jsonl | python -m json.tool
jq '.reward' {output_dir}/rollouts/step_42/train_rollouts.jsonl
A few warnings are normal. Escalate when errors are persistent, growing, or hit a large fraction of rollouts.
All processes use setproctitle so they're visible in ps/htop/pstree:
PRIME-RL::Launcher
├── PRIME-RL::Inference (vLLM server, GPU 0)
├── PRIME-RL::Orchestrator (CPU-only)
│ └── Verifiers::EnvServer (ZMQ env server per environment)
│ └── Verifiers::EnvWorker0..N
├── torchrun
│ └── PRIME-RL::Trainer (GPU 1+)
└── tail trainer.log
For multi-node runs, trainer and inference processes are on separate nodes — use srun or ssh to inspect them.
How the prime-rl config system works — TOML files, CLI overrides, composition, and special patterns. Use when creating configs, debugging config errors, or overriding values via CLI.
How to prepare and publish GitHub releases for prime-rl. Use when drafting release notes, tagging versions, or publishing releases.
How to install prime-rl and its optional dependencies. Use when setting up the project, installing extras like deep-gemm for FP8 models, or troubleshooting dependency issues.
Launch and monitor prime-rl training runs. Use when starting, supervising, or debugging an RL/SFT run. Routes to `start-run` (entrypoints + how to launch) and `monitor-run` (logs, metrics, check-ins).
How to launch prime-rl training runs — the `rl`, `sft`, and `inference` entrypoints, their config classes, and single-node/SLURM/dry-run modes. Use when starting a run or picking the right entrypoint.