build-and-dependency
Dev environment setup for NeMo AutoModel: container-based development, uv package management, installation options, environment variables, and common build pitfalls.
| Field | Value |
|---|---|
| name | build-and-dependency |
| description | Dev environment setup for NeMo AutoModel: container-based development, uv package management, installation options, environment variables, and common build pitfalls. |
| when_to_use | Setting up a dev environment, adding or removing dependencies, switching container images, configuring environment variables, 'uv sync fails', 'ModuleNotFoundError', 'TransformerEngine version mismatch', stale .venv issues. |
Clone and install:
git clone https://github.com/NVIDIA-NeMo/Automodel.git && cd Automodel
uv sync --locked --all-groups --extra all
Alternatively, use the NeMo AutoModel container from NVIDIA NGC. Pick a published tag from the NGC catalog (for example, 26.04):
docker pull nvcr.io/nvidia/nemo-automodel:26.04
docker run --gpus all -it nvcr.io/nvidia/nemo-automodel:26.04
The container ships with the source and all dependencies pre-installed at /opt/Automodel (the WORKDIR), with the virtual environment at /opt/venv. To run it as-is:
docker run --gpus all --network=host -it --rm --shm-size=32g \
nvcr.io/nvidia/nemo-automodel:26.04 /bin/bash
To develop against your host checkout, bind-mount it over /opt/Automodel to
override the installed source:
docker run --gpus all --network=host -it --rm --shm-size=32g \
-v <local-Automodel-path>:/opt/Automodel \
nvcr.io/nvidia/nemo-automodel:26.04 /bin/bash
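The two docker run variants above differ only in the bind mount. Below is a minimal sketch of a hypothetical helper that prints either form. The names automodel_docker_cmd and NEMO_AUTOMODEL_TAG are illustrative and not part of the project. The default tag, 26.04, is the example tag from above, so substitute the tag you picked from NGC:

```bash
# Hypothetical helper (not part of NeMo AutoModel): prints the docker run
# command instead of executing it.
# Usage: automodel_docker_cmd [host-Automodel-path]
automodel_docker_cmd() {
  # Assumption: NEMO_AUTOMODEL_TAG is an illustrative override; defaults to the example tag.
  local tag="${NEMO_AUTOMODEL_TAG:-26.04}"
  local mount=""
  if [ -n "${1:-}" ]; then
    # Bind-mount the host checkout over the source installed in the image.
    mount=" -v $1:/opt/Automodel"
  fi
  printf '%s\n' "docker run --gpus all --network=host -it --rm --shm-size=32g$mount nvcr.io/nvidia/nemo-automodel:$tag /bin/bash"
}
```

The helper does not quote the host path, so if the path contains spaces, quote it yourself before running the printed line.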
Inside the container, patch pyproject.toml / uv.lock for the PyTorch base
image, then re-sync:
cd /opt/Automodel
bash docker/common/update_pyproject_pytorch.sh /opt/Automodel
uv sync --locked --all-groups --extra all
Warning: the update_pyproject_pytorch.sh step is required. Without it, uv sync tries to reinstall torch because uv cannot recognize the torch baked into the PyTorch base container. The result is CUDA version mismatches and TransformerEngine (TE) import failures.
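Because the patch step must come before the sync, one option is to chain the two commands with && so that a failed patch never falls through to uv sync. Below is a minimal sketch. The function name resync_in_container is hypothetical, and the default path assumes the /opt/Automodel layout described above:

```bash
# Hypothetical helper (not part of NeMo AutoModel): patch for the PyTorch base
# image, then re-sync. The "( ... )" body runs in a subshell, so the cd does
# not change the caller's working directory.
resync_in_container() (
  repo="${1:-/opt/Automodel}"
  cd "$repo" || exit 1
  # Stop if patching fails; otherwise uv sync would try to reinstall torch.
  bash docker/common/update_pyproject_pytorch.sh "$repo" || exit 1
  uv sync --locked --all-groups --extra all
)
```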
--all-groups pulls the build, docs, and test dev groups (defined in
pyproject.toml); drop it for a runtime-only install.
uv sync --locked --all-groups # base + dev groups
uv sync --locked --all-groups --extra cuda # CUDA support
uv sync --locked --all-groups --extra fa # flash-attention
uv sync --locked --all-groups --extra moe # mixture-of-experts
uv sync --locked --all-groups --extra vlm # vision-language models
uv sync --locked --all-groups --extra diffusion # diffusion models
uv sync --locked --all-groups --extra delta-databricks # Delta Lake / Databricks
uv sync --locked --all-groups --extra all # everything
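uv accepts --extra more than once, so you can combine several extras from the list above in a single sync. Below is a minimal sketch of a hypothetical helper, uv_sync_cmd (not part of the repo). It builds the command from extra names and rejects any name that is not in the list above:

```bash
# Hypothetical helper (not part of NeMo AutoModel): prints a uv sync command;
# it does not run uv.
# Extras copied from the list above.
KNOWN_EXTRAS="cuda fa moe vlm diffusion delta-databricks all"

uv_sync_cmd() {
  local cmd="uv sync --locked --all-groups"
  local extra
  for extra in "$@"; do
    # Reject names outside the documented list to catch typos early.
    case " $KNOWN_EXTRAS " in
      *" $extra "*) cmd="$cmd --extra $extra" ;;
      *)
        echo "unknown extra: $extra" >&2
        return 1
        ;;
    esac
  done
  printf '%s\n' "$cmd"
}
```

For example, uv_sync_cmd vlm fa prints uv sync --locked --all-groups --extra vlm --extra fa.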
pip-based full install (equivalent to uv sync --extra all):
pip install -e ".[all]"
Login-node or submitter-only install: a lightweight package for submitting jobs through Slurm, Kubernetes, or NeMo-Run without local CUDA dependencies:
pip install "nemo-automodel[cli]"
For development in this repository, always use uv. Do not introduce pip install commands in scripts or docs.
| Task | Command |
|---|---|
| Install from lockfile | uv sync --locked |
| Add a new dependency | uv add <package> |
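| Add a dependency to an optional extra | uv add --optional <extra> <package> |
| Add a dependency to a dev group | uv add --group <group> <package> |
| Remove a dependency | uv remove <package> |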
| Regenerate the lockfile | uv lock |
export HF_TOKEN="hf_..." # Hugging Face token for gated models
export WANDB_API_KEY="..." # Weights & Biases logging
export HF_HOME="/path/to/hf_cache" # Hugging Face cache directory
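Not every variable is needed for every run. HF_TOKEN matters only for gated models, and WANDB_API_KEY only when Weights & Biases logging is on. Below is a minimal bash sketch of a hypothetical pre-launch check, require_env (not part of the project). It fails if any named variable is unset or empty:

```bash
# Hypothetical pre-launch check (not part of NeMo AutoModel).
# Usage: require_env HF_TOKEN [WANDB_API_KEY ...]
require_env() {
  local missing=0 name
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name.
    if [ -z "${!name:-}" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, running require_env HF_TOKEN before a gated-model finetune stops the run before it starts if the token is missing.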
The entry point is automodel (defined at nemo_automodel._cli.app:main).
Pattern: automodel <command> <domain> -c <config.yaml>
# LLM
automodel finetune llm -c examples/llm_finetune/llama3_2/llama3_2_1b_squad.yaml
automodel pretrain llm -c config.yaml
automodel kd llm -c config.yaml
automodel benchmark llm -c config.yaml
# VLM
automodel finetune vlm -c config.yaml
# Diffusion
automodel finetune diffusion -c config.yaml
# Retrieval
automodel finetune retrieval -c config.yaml
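You can check the <command> <domain> pairs above before launching. Below is a minimal sketch of a hypothetical wrapper, automodel_cmd, that accepts only the pairs listed in these examples and prints the resulting invocation. The real CLI may support combinations not shown here, so treat the allow-list as an assumption:

```bash
# Hypothetical wrapper (not part of NeMo AutoModel): validates <command>/<domain>
# against the examples above, then prints the invocation. Any arguments after
# the config path are appended unchanged.
automodel_cmd() {
  if [ "$#" -lt 3 ]; then
    echo "usage: automodel_cmd <command> <domain> <config.yaml> [args...]" >&2
    return 2
  fi
  local cmd="$1" domain="$2" config="$3"
  shift 3
  case "$cmd/$domain" in
    finetune/llm | finetune/vlm | finetune/diffusion | finetune/retrieval) ;;
    pretrain/llm | kd/llm | benchmark/llm) ;;
    *)
      echo "not among the listed examples: $cmd $domain" >&2
      return 1
      ;;
  esac
  printf '%s\n' "automodel $cmd $domain -c $config${*:+ $*}"
}
```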
Override any config value from the CLI:
automodel finetune llm -c config.yaml --model.name_or_path meta-llama/Llama-3.2-1B
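An override uses the dotted config path as the flag name, followed by the value. Below is a minimal sketch of a hypothetical helper, to_overrides (not part of the project). It turns key=value pairs into those flag and value arguments and prints one token per line, so a value containing spaces stays a single argument when read back into a bash array:

```bash
# Hypothetical helper (not part of NeMo AutoModel): turns key=value into
# "--key" and "value", each on its own line.
# Only the first "=" splits, so values may themselves contain "=".
# Usage (bash):
#   mapfile -t args < <(to_overrides model.name_or_path=meta-llama/Llama-3.2-1B)
#   automodel finetune llm -c config.yaml "${args[@]}"
to_overrides() {
  local pair
  for pair in "$@"; do
    case "$pair" in
      *=*) printf '%s\n%s\n' "--${pair%%=*}" "${pair#*=}" ;;
      *)
        echo "expected key=value, got: $pair" >&2
        return 1
        ;;
    esac
  done
}
```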
| Problem | Cause | Fix |
|---|---|---|
| Stale .venv after switching branches | Cached environment out of sync | Delete .venv and re-run uv sync --locked |
| Import errors for optional features (TE, flash-attn, MoE) | Missing extras | Install the matching uv extra (--extra fa, --extra moe, etc.) |
| TransformerEngine version mismatch | The TE that uv sync installs into the venv takes precedence over the version baked into the container | Pin the desired TE version in pyproject.toml, regenerate uv.lock with uv lock, and re-run uv sync. Python imports the venv's TE, not the container's. |
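You can wrap the stale-.venv fix from the first row so that it only deletes .venv inside a uv project. Below is a minimal sketch. The name reset_venv is hypothetical, and any extra arguments (for example, the extras you normally sync) are passed through to uv sync --locked:

```bash
# Hypothetical helper (not part of NeMo AutoModel): delete .venv and re-sync
# from the lockfile. The "( ... )" body runs in a subshell, so the cd does not
# leak into the caller's shell.
# Usage: reset_venv <repo-path> [uv sync args...]
reset_venv() (
  repo="${1:?usage: reset_venv <repo-path> [uv sync args...]}"
  shift
  cd "$repo" || exit 1
  # Guard: refuse to delete anything outside a directory with a pyproject.toml.
  [ -f pyproject.toml ] || { echo "no pyproject.toml in $repo" >&2; exit 1; }
  rm -rf .venv
  uv sync --locked "$@"
)
```

For example, reset_venv . --all-groups --extra moe rebuilds the environment with the MoE extra.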