mcore-linting-and-formatting
Linting and formatting for Megatron-LM. Covers running autoformat.sh, tools (ruff, black, isort, pylint, mypy), and code style rules.
| name | mcore-linting-and-formatting |
| description | Linting and formatting for Megatron-LM. Covers running autoformat.sh, tools (ruff, black, isort, pylint, mypy), and code style rules. |
| license | Apache-2.0 |
| when_to_use | Running linting or autoformat; fixing style violations before a PR; 'pre-commit fails', 'ruff error', 'isort', 'mypy', 'style violation', 'how do I format', 'autoformat.sh'. |
| metadata | {"author":"Philip Petrakian <ppetrakian@nvidia.com>"} |
Run before opening a PR:

    # Check mode (no changes applied)
    BASE_REF=main CHECK_ONLY=true SKIP_DOCS=false bash tools/autoformat.sh

    # Fix mode
    BASE_REF=main CHECK_ONLY=false bash tools/autoformat.sh

Tools invoked: black, isort, pylint, ruff, mypy.
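
When the check needs to run from a script (for example a local pre-push hook), the invocation above could be wrapped in Python. This is only a sketch: the helper names `autoformat_env` and `run_autoformat` are hypothetical, and only the environment variables and the script path are taken from the commands above. It also assumes the script exits non-zero when check mode finds violations, which is not confirmed here.

```python
from __future__ import annotations

import os
import subprocess


def autoformat_env(base_ref: str = "main", check_only: bool = True) -> dict[str, str]:
    """Build the environment for tools/autoformat.sh (hypothetical helper)."""
    env = dict(os.environ)
    env["BASE_REF"] = base_ref
    # The commands above pass lowercase "true"/"false" strings.
    env["CHECK_ONLY"] = "true" if check_only else "false"
    if check_only:
        # Mirrors the check-mode command above, which also sets SKIP_DOCS=false.
        env["SKIP_DOCS"] = "false"
    return env


def run_autoformat(check_only: bool = True, base_ref: str = "main") -> int:
    """Run the script from the repository root and return its exit code."""
    result = subprocess.run(
        ["bash", "tools/autoformat.sh"],
        env=autoformat_env(base_ref, check_only),
        check=False,  # let the caller decide how to react to a failure
    )
    return result.returncode
```

Calling `run_autoformat(check_only=True)` from the repository root would then correspond to the check-mode command, and `run_autoformat(check_only=False)` to fix mode.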
After editing imports in any Python files, always run `uv run isort` on those files before committing:

    uv run isort <file1>.py <file2>.py

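
To avoid typing the file list by hand, the command can be built from a list of changed paths. The sketch below is illustrative: `isort_command` is a hypothetical helper, and it assumes the caller already has the changed paths (for instance from `git diff --name-only`).

```python
from __future__ import annotations


def isort_command(changed_paths: list[str]) -> list[str]:
    """Return the `uv run isort` argv for the Python files in changed_paths.

    Returns an empty list when no Python files changed, so the caller
    can skip the run entirely.
    """
    python_files = [path for path in changed_paths if path.endswith(".py")]
    if not python_files:
        return []
    return ["uv", "run", "isort", *python_files]
```

The returned list can be passed directly to `subprocess.run` when it is non-empty.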
Inside the container:

    uv sync --locked --only-group linting

This installs ruff, black, isort, and pylint, the same tools used by `tools/autoformat.sh` and CI's linting job.
- Use `X | None`, not `Optional[X]`.
- `snake_case` for functions and variables, `PascalCase` for classes.
- … `pyproject.toml`).
- No bare `except:`; always catch specific exception types.
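
As an illustration of the style rules above, here is a small sketch. The names `CheckpointConfig` and `parse_retry_count` are made up for this example and do not come from the Megatron-LM codebase.

```python
from __future__ import annotations


class CheckpointConfig:
    """Class names use PascalCase."""

    def __init__(self, path: str, max_retries: int | None = None) -> None:
        # `int | None` rather than `Optional[int]`.
        self.path = path
        self.max_retries = max_retries


def parse_retry_count(raw_value: str | None) -> int | None:
    """Functions and variables use snake_case.

    Returns None when the value is missing or is not an integer.
    """
    if raw_value is None:
        return None
    try:
        return int(raw_value)
    except ValueError:  # a specific exception type, never a bare `except:`
        return None
```

The `from __future__ import annotations` line keeps the `X | None` annotations valid on Python versions older than 3.10; on newer interpreters it can be dropped.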