configs
| name | configs |
| description | How the prime-rl config system works — TOML files, CLI overrides, composition, and special patterns. Use when creating configs, debugging config errors, or overriding values via CLI. |
prime-rl uses pydantic-config — a Pydantic-based TOML + CLI config system (no tyro). Every entrypoint accepts TOML files via @ and CLI overrides.
uv run rl @ examples/reverse_text/rl.toml # single TOML
uv run rl @ examples/reverse_text/rl.toml --max-steps 50 # CLI override
uv run rl @ base.toml @ overlay.toml # left-to-right merge
uv run rl --model @ model.toml --data @ data.toml # nested section files
uv run rl @ base.toml --trainer @ trainer.toml --trainer.lr 1e-3 # mixed
Resolution order: CLI > config files (left-to-right) > class defaults. Merging is deep — unset fields in an overlay are preserved from the base.
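The resolution order above can be pictured as a chain of deep merges. The following is only a minimal sketch of those semantics, not the pydantic-config implementation; `deep_merge`, `resolve`, and the sample values (including the `seq_len` key) are hypothetical names chosen for illustration.

```python
from copy import deepcopy
from typing import Any


def deep_merge(base: dict[str, Any], overlay: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict in which overlay wins, recursing into nested tables."""
    merged = deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested table: merge field by field so unset fields survive.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars and lists are replaced wholesale by the later source.
            merged[key] = deepcopy(value)
    return merged


def resolve(
    defaults: dict[str, Any],
    files: list[dict[str, Any]],
    cli: dict[str, Any],
) -> dict[str, Any]:
    """Apply class defaults, then config files left to right, then CLI overrides."""
    resolved = deepcopy(defaults)
    for layer in [*files, cli]:
        resolved = deep_merge(resolved, layer)
    return resolved


# Illustrative values only; they are not real prime-rl defaults.
defaults = {"max_steps": 100, "trainer": {"lr": 1e-5, "seq_len": 2048}}
base = {"trainer": {"lr": 3e-4}}
overlay = {"trainer": {"seq_len": 4096}}
cli = {"max_steps": 50}

resolved = resolve(defaults, [base, overlay], cli)
print(resolved)
```

Here `trainer.lr` comes from `base`, `trainer.seq_len` from `overlay`, and `max_steps` from the CLI layer, which is the precedence the text describes.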
Naming: CLI uses kebab-case (--model.max-model-len); TOML uses snake_case (max_model_len).
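To find the TOML key for a CLI flag (or the reverse), the naming rule can be applied mechanically. This is a small sketch of the convention as stated; both helper functions are hypothetical and not part of prime-rl.

```python
def cli_flag_to_toml_path(flag: str) -> list[str]:
    """Split a long option on dots and convert each segment to snake_case."""
    if not flag.startswith("--"):
        raise ValueError(f"expected a long option, got {flag!r}")
    return [part.replace("-", "_") for part in flag[2:].split(".")]


def toml_path_to_cli_flag(path: list[str]) -> str:
    """Join snake_case TOML keys into a dotted kebab-case long option."""
    return "--" + ".".join(part.replace("_", "-") for part in path)


print(cli_flag_to_toml_path("--model.max-model-len"))
print(toml_path_to_cli_flag(["model", "max_model_len"]))
```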
uv run rl --help # all fields and defaults
uv run rl @ rl.toml --dry-run --output-dir /tmp/x # write resolved TOML to /tmp/x/configs
Incompatible combinations (e.g. CP requires flash attention) must raise in a model_validator at resolve time, not at runtime. When renaming a field, emit a deprecation warning with a migration hint — never silently drop.
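The two rules above (fail at resolve time, warn on renames) can be sketched without any dependencies. The text places these checks in a Pydantic `model_validator`; the example below uses a stdlib dataclass with `__post_init__` purely as a stand-in for the same fail-fast idea. The field names `cp` and `attn`, the backend strings, and the `learning_rate` to `lr` rename are all hypothetical.

```python
import warnings
from dataclasses import dataclass


@dataclass
class ModelSketch:
    cp: int = 1                      # hypothetical: context-parallel degree
    attn: str = "flash_attention_2"  # hypothetical: attention backend name

    def __post_init__(self) -> None:
        # Reject the incompatible combination as soon as the config is built,
        # long before any training code runs.
        if self.cp > 1 and not self.attn.startswith("flash"):
            raise ValueError(
                f"cp={self.cp} requires a flash attention backend, got attn={self.attn!r}"
            )


def migrate_renamed_field(raw: dict, old: str, new: str) -> dict:
    """Map a deprecated key onto its replacement, warning instead of dropping it."""
    migrated = dict(raw)
    if old in migrated:
        warnings.warn(
            f"'{old}' is deprecated; rename it to '{new}' in your config",
            DeprecationWarning,
            stacklevel=2,
        )
        value = migrated.pop(old)
        migrated.setdefault(new, value)  # an explicit new-style value wins
    return migrated
```

In a Pydantic model the first check would typically live in an after-mode validator and the migration in a before-mode one, so that both run while the config is being resolved.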
Booleans — CLI --flag / --no-flag; TOML must be explicit (enforce_eager = true).
None — TOML has no null, use the string "None" (max_model_len = "None"); CLI: --model.max-model-len None.
Lists — TOML uses an array of tables; later config files replace lists wholesale, so overlays must include the full desired list:
[[orchestrator.env]]
id = "reverse-text"
CLI: --env.0.id reverse-text --env.1.id math-env.
Dicts — TOML uses a section; CLI takes a JSON string: --vllm-extra '{"key1": "value1"}'.
Discriminated unions — set the type field to pick the variant ([trainer.loss] type = "sft"). Omit type to keep the default variant.
BaseModel | None fields — bare flag enables defaults; nested override enables and sets:
--model.compile # enables compile with defaults
--model.compile.fullgraph # enables and sets fullgraph=true
In TOML, an empty section header ([ckpt]) does the same.
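Several of the patterns above (the "None" string, indexed list overrides, JSON-encoded dicts) amount to simple transformations on a nested dict. The sketch below illustrates those semantics only; it is not how pydantic-config parses arguments, and `coerce_none` and `set_by_path` are hypothetical helpers.

```python
import json
from typing import Any


def coerce_none(value: Any) -> Any:
    """Turn the "None" string sentinel into a real None, recursing into containers."""
    if value == "None":
        return None
    if isinstance(value, dict):
        return {k: coerce_none(v) for k, v in value.items()}
    if isinstance(value, list):
        return [coerce_none(v) for v in value]
    return value


def _ensure_slot(items: list, index: int) -> None:
    """Pad a list with None so that items[index] exists."""
    while len(items) <= index:
        items.append(None)


def set_by_path(config: dict, dotted: str, value: Any) -> None:
    """Apply an override such as 'env.1.id', creating tables and list slots as needed."""
    parts = dotted.split(".")
    node: Any = config
    for part, nxt in zip(parts, parts[1:]):
        # A numeric next segment means the child is a list, otherwise a table.
        empty: Any = [] if nxt.isdigit() else {}
        if part.isdigit():
            index = int(part)
            _ensure_slot(node, index)
            if node[index] is None:
                node[index] = empty
            node = node[index]
        else:
            node = node.setdefault(part, empty)
    last = parts[-1]
    if last.isdigit():
        _ensure_slot(node, int(last))
        node[int(last)] = value
    else:
        node[last] = value


config: dict = {}
set_by_path(config, "env.0.id", "reverse-text")          # --env.0.id reverse-text
set_by_path(config, "env.1.id", "math-env")              # --env.1.id math-env
set_by_path(config, "vllm_extra", json.loads('{"key1": "value1"}'))  # JSON dict
set_by_path(config, "model.max_model_len", "None")       # "None" sentinel
config = coerce_none(config)
print(config)
```

After these calls `config["env"]` holds two tables, `config["vllm_extra"]` is a plain dict, and `config["model"]["max_model_len"]` is a real `None`.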
For rollout debugging, enable trainer-side token export under trainer.experimental.token_export (or experimental.token_export when running the trainer entrypoint directly). It writes one JSONL record per exported sequence under output_dir/token_exports/step_<step>/rank_<rank>.jsonl. Each record stores aligned per-token arrays for token ids, loss mask, advantage, reward, entropy, mismatch KL, inference/trainer logprobs, importance ratios, probability deltas, and masking diagnostics. It does not decode token text in the trainer.
[trainer.experimental.token_export]
Leave it unset for normal training. When enabled, it exports every sequence from each exporting rank.
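Once token export is enabled, the JSONL files can be inspected offline. The following sketch assumes only the directory layout described above and that the per-token arrays appear as JSON lists; it also assumes the step number is not zero-padded in the directory name. It deliberately does not rely on any field names, since those are not listed here. `iter_token_exports`, `array_lengths`, and `is_aligned` are hypothetical helpers.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_token_exports(output_dir: str, step: int) -> Iterator[tuple[int, dict]]:
    """Yield (rank, record) for every exported sequence of one step."""
    step_dir = Path(output_dir) / "token_exports" / f"step_{step}"
    for path in sorted(step_dir.glob("rank_*.jsonl")):
        rank = int(path.stem.removeprefix("rank_"))
        with path.open() as handle:
            for line in handle:
                if line.strip():  # tolerate blank lines
                    yield rank, json.loads(line)


def array_lengths(record: dict) -> dict[str, int]:
    """Map each list-valued field to its length."""
    return {key: len(value) for key, value in record.items() if isinstance(value, list)}


def is_aligned(record: dict) -> bool:
    """True when all list-valued fields share one length (or there are none)."""
    return len(set(array_lengths(record).values())) <= 1
```

A quick sanity check on a run would loop over `iter_token_exports(output_dir, step)` and report any record for which `is_aligned` is false, printing `array_lengths(record)` to see which fields disagree.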
packages/prime-rl-configs/src/prime_rl/ — config classes under configs/; utils/config.py re-exports BaseConfig and cli
configs/debug/ — minimal debug configs
configs/private/ — private configs submodule (internal)
examples/ — full example configs