start-run
How to launch prime-rl training runs: the `rl`, `sft`, and `inference` entrypoints, their config classes, and single-node/SLURM/dry-run modes. Use when starting a run or picking the right entrypoint.
| name | start-run |
| description | How to launch prime-rl training runs — the `rl`, `sft`, and `inference` entrypoints, their config classes, and single-node/SLURM/dry-run modes. Use when starting a run or picking the right entrypoint. |
All entrypoints run via `uv run <command>` and accept TOML configs via `@ path/to.toml` plus CLI overrides.
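As a rough mental model of that layering, the sketch below merges class defaults, values from a TOML file, and one dotted CLI override, in that order. It is not `pydantic-config`'s implementation: `deep_merge`, `set_dotted`, and the default values are hypothetical names and data chosen for illustration only.

```python
from copy import deepcopy
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict where `override` wins and nested dicts are merged key by key."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def set_dotted(target: dict[str, Any], dotted_key: str, value: Any) -> None:
    """Apply a dotted override such as `model.name` to a nested dict in place."""
    *parents, leaf = dotted_key.split(".")
    node = target
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value


# Hypothetical defaults and file contents, used only to show the layering order.
class_defaults = {"model": {"name": "default-model", "enforce_eager": False}, "max_steps": 100}
toml_values = {"model": {"name": "Qwen/Qwen3-0.6B"}}

config = deep_merge(class_defaults, toml_values)
set_dotted(config, "model.enforce_eager", True)  # stands in for --model.enforce-eager
```

Keys the TOML file does not mention (here `max_steps`) keep their defaults, which is the practical consequence of merging rather than replacing whole sections.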
`pydantic-config`: Pydantic-based TOML + CLI loader. Highlights (see the `configs` skill for full mechanics):

- Config files load via `@ path` (TOML / YAML / JSON); CLI args layer on top, deep-merged with class defaults.
- Boolean fields: `--flag` enables, `--no-flag` disables (nested too).
- Optional sub-configs (e.g. `WandbConfig | None`): bare `--wandb` enables defaults; `--wandb @ wandb.toml` enables from a file; `--no-wandb` disables.
- Tagged variants are selected by their `type` tag (e.g. `--optimizer.type muon`).
- Pre-validation hooks via `model_validator(mode="before")`.
- `--help` panels from `Field(description=...)` or PEP 224 docstrings.

## `rl`: RL training

Launches inference server, orchestrator, and trainer as subprocesses.
uv run rl @ examples/reverse_text/rl.toml
uv run rl @ examples/reverse_text/rl.toml @ examples/reverse_text/slurm_rl.toml # SLURM
uv run rl @ examples/reverse_text/rl.toml --dry-run # write scripts, don't run
- Config class: `RLConfig` (`packages/prime-rl-configs/src/prime_rl/configs/rl.py`)
- Entrypoint: `src/prime_rl/entrypoints/rl.py`

## `sft`: SFT training

Launches `torchrun` internally; never call `torchrun` directly.
uv run sft @ examples/reverse_text/sft.toml
uv run sft @ examples/reverse_text/sft.toml --slurm
uv run sft @ examples/reverse_text/sft.toml --dry-run
- Config class: `SFTConfig` (`packages/prime-rl-configs/src/prime_rl/configs/sft.py`)
- Entrypoint: `src/prime_rl/entrypoints/sft.py`

## `inference`: vLLM server

OpenAI-compatible API plus prime-rl custom endpoints (`/update_weights`, `/load_lora_adapter`, `/init_broadcaster`). Always use this entrypoint; never run `vllm serve` directly.
uv run inference @ configs/debug/infer.toml
uv run inference --model.name Qwen/Qwen3-0.6B --model.enforce-eager
Smoke checks:
curl http://<host>:<port>/health
curl http://<host>:<port>/v1/models
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen3-0.6B", "messages": [{"role": "user", "content": "Hi"}], "max_tokens": 50}'
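The same smoke checks can be scripted. The sketch below uses only the Python standard library and the three endpoints shown above; `build_chat_request` and `smoke_check` are hypothetical helper names, and the host, port, and model simply mirror the curl example.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str, max_tokens: int = 50) -> urllib.request.Request:
    """Build (but do not send) a POST request for the OpenAI-compatible chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def smoke_check(base_url: str, model: str, timeout: float = 10.0) -> dict:
    """Hit /health and /v1/models, then send one chat completion and return its parsed JSON."""
    base = base_url.rstrip("/")
    for path in ("/health", "/v1/models"):
        # urlopen raises urllib.error.HTTPError on 4xx/5xx, so reaching the body means success.
        with urllib.request.urlopen(f"{base}{path}", timeout=timeout) as resp:
            resp.read()
    with urllib.request.urlopen(build_chat_request(base, model, "Hi"), timeout=timeout) as resp:
        return json.loads(resp.read())


# Building the request needs no running server; sending it does.
request = build_chat_request("http://localhost:8000", "Qwen/Qwen3-0.6B", "Hi")
```

Calling `smoke_check("http://localhost:8000", "Qwen/Qwen3-0.6B")` only makes sense once the `inference` entrypoint is up at that address.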
- Config class: `InferenceConfig` (`packages/prime-rl-configs/src/prime_rl/configs/inference.py`)
- Entrypoint: `src/prime_rl/entrypoints/inference.py`

| Command | Purpose | Typical use |
|---|---|---|
| `rl` | Full RL pipeline | Production RL training |
| `sft` | Supervised fine-tuning | SFT and hard-distill |
| `inference` | vLLM server | Standalone serving / debugging |
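When launches are scripted, it can help to assemble the command line in one place. The following is a minimal sketch, assuming only the `uv run <command> @ <config> [overrides]` shape used throughout this page; `build_launch_command` and `ENTRYPOINTS` are hypothetical names, not part of prime-rl.

```python
import shlex
from typing import Sequence

# The three entrypoints listed in the table above.
ENTRYPOINTS = ("rl", "sft", "inference")


def build_launch_command(
    entrypoint: str,
    config_paths: Sequence[str],
    extra_args: Sequence[str] = (),
) -> list[str]:
    """Return an argv list of the form: uv run <entrypoint> @ <config> ... <overrides>."""
    if entrypoint not in ENTRYPOINTS:
        raise ValueError(f"unknown entrypoint {entrypoint!r}; expected one of {ENTRYPOINTS}")
    argv = ["uv", "run", entrypoint]
    for path in config_paths:
        argv += ["@", path]  # each config file gets its own "@ path" pair
    argv += list(extra_args)
    return argv


rl_dry_run = build_launch_command("rl", ["examples/reverse_text/rl.toml"], ["--dry-run"])
print(shlex.join(rl_dry_run))
```

The resulting list can be passed to `subprocess.run(rl_dry_run, check=True)` from the repository root; rejecting unknown entrypoints up front keeps a typo from turning into an unrelated `uv run` invocation.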

- `src/prime_rl/entrypoints/`: `rl`, `sft`, `inference` (+ `trainer`, `orchestrator` for direct launches)
- `packages/prime-rl-configs/src/prime_rl/configs/`: all config classes
- `configs/debug/`: minimal debug configs
- `examples/`: full example configs (e.g. `reverse_text/`)