config-conventions
// Configuration conventions for NeMo-RL. YAML is the single source of truth for defaults. Covers TypedDict usage, exemplar YAML updates, and forbidden default patterns.
| name | config-conventions |
| description | Configuration conventions for NeMo-RL. YAML is the single source of truth for defaults. Covers TypedDict usage, exemplar YAML updates, and forbidden default patterns. |
| when_to_use | Adding or modifying config fields; reviewing config changes; 'where do I set defaults', 'TypedDict pattern', 'exemplar YAML', 'forbidden default patterns', during code review of config files. |
YAML is the single source of truth for defaults. Do not set non-None defaults in code for configuration values. The loaded YAML (and any user overrides) must supply required values.
For required attributes, write code like policy_cfg["precision"] and assume it is present. Do not introduce hidden defaults deep in the code.
Use typing.NotRequired to mark optional attributes. Optional attributes may be absent/None; code may check for their presence.
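The required/optional split above can be sketched as a pair of TypedDict schemas. This is a minimal illustration, not the actual NeMo-RL schema: the class names `SchedulerConfig` and `PolicyConfig`, the `name` and `scheduler` keys, and the `describe` helper are hypothetical. Only `precision` and `milestones` are taken from the examples in this document.

```python
try:
    from typing import NotRequired, TypedDict  # Python 3.11+
except ImportError:  # older interpreters need the backport for both names
    from typing_extensions import NotRequired, TypedDict


class SchedulerConfig(TypedDict):
    # Hypothetical schema, for illustration only.
    name: str                            # required: YAML must supply it
    milestones: NotRequired[list[int]]   # optional: may be absent


class PolicyConfig(TypedDict):
    # Hypothetical schema, for illustration only.
    precision: str                       # required: no default in code
    scheduler: SchedulerConfig           # required


def describe(policy_cfg: PolicyConfig) -> str:
    # Required attributes are indexed directly and assumed present.
    precision = policy_cfg["precision"]
    scheduler_cfg = policy_cfg["scheduler"]
    # Optional attributes are checked for presence, never defaulted.
    if "milestones" in scheduler_cfg:
        return f"{precision}, milestones={scheduler_cfg['milestones']}"
    return f"{precision}, no milestones"
```

Note that the schema carries no default values at all; it only records which keys the loaded YAML is expected to provide.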
Exemplar YAMLs under examples/configs/*.yaml include documented defaults.
Recipe YAMLs under examples/configs/recipes/**/*.yaml are runnable snapshots and may omit documentation.
When adding a new config key to a TypedDict subclass, document it and reflect its default in the exemplar YAMLs under examples/configs/*.yaml.
Recipe YAMLs under examples/configs/recipes/**/*.yaml must set defaults: <exemplar>.yaml to inherit from one of the exemplar configs in examples/configs/*.yaml. This keeps recipes minimal: they override only what differs from the exemplar.
If a recipe YAML does not have a defaults key, run:
uv run ./tools/config_cli.py minimize <recipe.yaml>
This will minimize the config and assign the appropriate defaults key.
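To make the idea of "override only what differs" concrete, here is a toy sketch of an exemplar being combined with a recipe's overrides. It is not NeMo-RL's loader, and the recursive-merge semantics are an assumption. The function `merge_overrides`, the key `train_global_batch_size`, and all values are hypothetical.

```python
from copy import deepcopy
from typing import Any


def merge_overrides(base: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `base` with `overrides` applied recursively."""
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: descend instead of replacing wholesale.
            merged[key] = merge_overrides(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical exemplar contents (stands in for an examples/configs/*.yaml file).
exemplar = {"policy": {"precision": "bfloat16", "train_global_batch_size": 512}}

# Hypothetical recipe contents once its `defaults` key has been resolved:
# only the value that differs from the exemplar is listed.
recipe_overrides = {"policy": {"train_global_batch_size": 64}}

effective = merge_overrides(exemplar, recipe_overrides)
```

In this sketch the recipe never restates `precision`; it is inherited from the exemplar, which is why a minimized recipe stays short.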
When accessing a NotRequired field, use an in check or .get(key) / .get(key, None). Never provide a non-None default — that hides behavior and defeats the purpose of making the field optional.
Do:
# .get() with None (not a hidden default)
stop_properly_penalty_coef = cfg.get("stop_properly_penalty_coef", None)
# Truthiness check for optional booleans
if master_config.grpo.get("skip_reference_policy_logprobs_calculation"):
    ...
# Nested NotRequired: check presence at each level explicitly
if "megatron_cfg" in policy_config and policy_config["megatron_cfg"]["enabled"]:
    ...
Don't:
# Hidden boolean default — should come from YAML
disable_ppo_ratio = cfg.get("disable_ppo_ratio", False)
# Hidden non-trivial default — caller has no idea True is the fallback
normalize_rewards = grpo_config.get("normalize_rewards", True)
# Chained .get() with hidden defaults at each level
megatron_enable = config.get("megatron_cfg", {}).get("enabled", False)
If a NotRequired field is absent, the code should handle that explicitly — not paper over it with a magic default.
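The practical difference between the two styles shows up when a config is incomplete. The sketch below assumes that `enabled` is a required key inside the optional `megatron_cfg` section, which the direct indexing in the example above suggests but this document does not state outright. The function names and the `tensor_parallel_size` key are hypothetical.

```python
def megatron_enabled_explicit(policy_config: dict) -> bool:
    # The optional section is checked for presence; "enabled" is indexed
    # directly, so a section that forgot it raises KeyError.
    return "megatron_cfg" in policy_config and bool(
        policy_config["megatron_cfg"]["enabled"]
    )


def megatron_enabled_chained(policy_config: dict) -> bool:
    # Discouraged: two hidden defaults, {} and False.
    return policy_config.get("megatron_cfg", {}).get("enabled", False)


# Hypothetical broken config: the section exists but "enabled" was forgotten.
incomplete = {"megatron_cfg": {"tensor_parallel_size": 2}}

# The chained version silently reports the feature as off.
chained_result = megatron_enabled_chained(incomplete)

# The explicit version surfaces the mistake.
try:
    megatron_enabled_explicit(incomplete)
    explicit_raised = False
except KeyError:
    explicit_raised = True
```

Both functions agree when the section is absent or complete; they differ only on the malformed config, where the chained form hides the error.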
Don't:
# Hidden default in code
precision = policy_cfg.get("precision", "bfloat16")
# Function parameter defaulting a config value
def build_policy(policy_cfg, precision: str = "bfloat16"):
    ...
Do:
# Required attribute: expect it from YAML or user override
precision: str = policy_cfg["precision"]
# Optional attribute: check for presence
if "milestones" in scheduler_cfg:
    configure_milestones(scheduler_cfg["milestones"])
See also: @docs/design-docs/design-and-philosophy.md (TypedDict and Configuration Defaults section).