一键导入
reproduce-ci-locally
// Reproduce RAPIDS CI builds and tests locally using Docker containers that mirror the CI environment. Use when the user asks to reproduce CI, run CI locally, debug a CI failure, or simulate a CI job on their machine.
// Reproduce RAPIDS CI builds and tests locally using Docker containers that mirror the CI environment. Use when the user asks to reproduce CI, run CI locally, debug a CI failure, or simulate a CI job on their machine.
| name | reproduce-ci-locally |
| description | Reproduce RAPIDS CI builds and tests locally using Docker containers that mirror the CI environment. Use when the user asks to reproduce CI, run CI locally, debug a CI failure, or simulate a CI job on their machine. |
RAPIDS CI jobs are shell scripts executed inside CI containers. You can reproduce them locally by running the same container and scripts.
Reference: https://docs.rapids.ai/resources/reproducing-ci
docker run --rm hello-world)--gpus flag support)GH_TOKEN (GitHub personal access token with repo scope)Before running anything, collect these values (ask for anything not provided):
| Parameter | Description | Examples |
|---|---|---|
| CI script | Which ci/*.sh script to run | ci/build_cpp.sh, ci/test_python.sh |
| Container image | The exact CI container image tag (see below) | rapidsai/ci-conda:cuda12.8.1-ubuntu24.04-py3.12-26.06-latest |
| Build type | One of pull-request, nightly, branch | pull-request |
| Ref name | PR number, branch, or nightly branch | pull-request/123, main, release/26.06 |
| Nightly date | Date for nightly builds (YYYY-MM-DD) | 2025-12-12 (only for nightly) |
CI jobs run across a matrix of container images that vary by CUDA version, OS, Python version, and architecture. When the user asks to reproduce a CI job, you should look up all the matrix variants from the CI run (see below) and determine which ones are compatible with the host machine.
NVIDIA supports CUDA Enhanced Compatibility: a container image built for CUDA X.Y can run on any host driver that supports CUDA major version X, regardless of the minor version. As a rule of thumb, any image from the same CUDA major version as the host driver works. For example, a host with driver 535 (which ships with CUDA 12.2) can run images for CUDA 12.2, 12.5, 12.8, 12.9, etc. — but not CUDA 13.x images.
To determine the host's CUDA major version, check the driver version with
nvidia-smi and map it to the CUDA major version:
| Driver series | CUDA major version |
|---|---|
| 525–560+ | 12 |
| 570+ | 13 |
When there are multiple compatible images for the same CI job (e.g., CUDA 12.2.2 and CUDA 12.9.1 variants both compatible via CEC), always present all compatible options to the user and ask them to choose. Do not silently pick one. List each option with its key attributes (CUDA version, OS, Python version, GPU type used in CI) so the user can make an informed decision.
Only auto-select an image when exactly one variant is compatible with the host.
If the user provides the image tag directly, use it. Otherwise, look up the images from the CI run:
actions/runs/{run_id}/jobs endpoint).GH_TOKEN with repo
scope) and look for the docker pull line in the Initialize Containers
/ Starting job container section — it contains the full image tag
(e.g., rapidsai/ci-conda:26.06-cuda12.9.1-ubuntu22.04-py3.11)..github/workflows/ that set container_image
explicitly, read the value from the workflow file. However, most jobs use
shared workflows (e.g., conda-cpp-tests.yaml) which select the image
from a build matrix — in that case the CI job logs are the only reliable
source.Conda builds (no GPU needed):
ci/build_cpp.sh — Build C++ conda packageci/build_python.sh — Build Python conda package (needs C++ artifacts)ci/build_docs.sh — Build documentation (needs C++ and Python artifacts)Conda tests (GPU needed):
ci/test_cpp.sh — Test C++ conda packageci/test_python.sh — Test Python conda packageci/test_cpp_memcheck.sh — Test C++ with compute-sanitizer memcheckWheel builds (no GPU needed):
ci/build_wheel_librapidsmpf.sh — Build librapidsmpf wheelci/build_wheel_rapidsmpf.sh — Build rapidsmpf wheelci/build_wheel_singlecomm.sh — Build rapidsmpf single-comm wheel (no MPI/UCXX)Wheel tests (GPU needed):
ci/test_wheel.sh — Test rapidsmpf wheelLinters / checks (no GPU needed):
ci/check_style.sh — Pre-commit style checksci/cpp_linters.sh — C++ clang-tidy lintingFor build jobs (no GPU required):
docker run \
--rm \
--pull=always \
--volume $PWD:/repo \
--workdir /repo \
<CONTAINER_IMAGE>
For test jobs (GPU required):
docker run \
--rm \
--gpus all \
--pull=always \
--cap-add CAP_SYS_PTRACE \
--shm-size=8g \
--ulimit nofile=1000000:1000000 \
--volume $PWD:/repo \
--workdir /repo \
<CONTAINER_IMAGE>
The --cap-add, --shm-size, and --ulimit flags match what CI uses for
RapidsMPF test jobs (see container-options in the workflow files).
Note on -it flag: Do NOT use -it (or -t) when running a command
non-interactively via bash -c "...". Docker requires a real TTY for -t and
will fail with "the input device is not a TTY" in background/scripted execution.
Only add -it when launching an interactive shell for manual exploration.
Test scripts download build artifacts from GitHub Actions. Inside the container, set these variables so the download commands don't prompt interactively:
export RAPIDS_BUILD_TYPE=pull-request # or "nightly" or "branch"
export RAPIDS_REPOSITORY=rapidsai/rapidsmpf
# For pull-request builds:
export RAPIDS_REF_NAME=pull-request/123
# For nightly builds:
export RAPIDS_REF_NAME=main
export RAPIDS_NIGHTLY_DATE=2025-12-12
# For branch builds:
export RAPIDS_REF_NAME=release/26.06
RAPIDS_SHA (required for test jobs)The rapids-download-conda-from-github / rapids-download-from-github scripts
need the exact commit SHA that produced the build artifacts. This must match the
SHA used by the CI build job — not the current tip of the branch. For
example, if nightly ran on commit abcdef0 but a later commit abcdef1 was
pushed to main afterward, the artifacts only exist for abcdef0. Using
abcdef1 will fail to find them.
When the local repo is a fork (i.e., origin points to a user fork rather
than rapidsai/rapidsmpf), the scripts also cannot determine the SHA
automatically and will fail with:
There was a problem acquiring the HEAD commit sha from the current directory.
Always set RAPIDS_SHA explicitly to the SHA from the CI run you want to
reproduce. There are three ways to look it up:
Option A — GitHub REST API (no extra tools needed):
For nightly builds, query the build.yaml workflow runs:
curl -s "https://api.github.com/repos/rapidsai/rapidsmpf/actions/workflows/build.yaml/runs?per_page=5" \
| python3 -c "import sys,json; runs=json.load(sys.stdin)['workflow_runs']; print('\n'.join(f\"{r['head_sha'][:12]} {r['created_at']}\" for r in runs))"
For pull-request builds, query the pr.yaml workflow runs for a specific PR
branch:
curl -s "https://api.github.com/repos/rapidsai/rapidsmpf/actions/workflows/pr.yaml/runs?branch=pull-request/123&per_page=5" \
| python3 -c "import sys,json; runs=json.load(sys.stdin)['workflow_runs']; print('\n'.join(f\"{r['head_sha'][:12]} {r['created_at']}\" for r in runs))"
Pick the head_sha from the run matching your target date.
Option B — gh CLI (if installed):
gh run list --repo rapidsai/rapidsmpf --workflow build.yaml --json headSha,createdAt --limit 5
Option C — GitHub web UI:
Browse to https://github.com/rapidsai/rapidsmpf/actions/workflows/build.yaml, click the run for the target date, and note the commit SHA shown at the top.
After obtaining the SHA, check out that exact commit locally so the repo contents match the artifacts, then set the variable:
git fetch <upstream-remote> <branch>
git checkout <sha>
export RAPIDS_SHA=<sha>
Where <upstream-remote> is the remote pointing to rapidsai/rapidsmpf
(commonly upstream). For example, for a nightly build on main:
RAPIDS_SHA=$(curl -s "https://api.github.com/repos/rapidsai/rapidsmpf/actions/workflows/build.yaml/runs?per_page=1" \
| python3 -c "import sys,json; print(json.load(sys.stdin)['workflow_runs'][0]['head_sha'])")
git fetch upstream main
git checkout $RAPIDS_SHA
export RAPIDS_SHA
Test scripts use rapids-download-conda-from-github /
rapids-download-from-github which require GitHub authentication via GH_TOKEN.
Option A — pass the token directly:
docker run \
...
--env "GH_TOKEN=<your-token>" \
...
Option B — use an env file:
Create a .env file with GH_TOKEN=<your-token> and mount it:
docker run \
...
--env-file "$(pwd)/.env" \
...
On shared machines, passing tokens via --env may expose them in ps output.
Prefer --env-file in that case.
Ask the user for their GH_TOKEN if not already set in the environment.
./ci/build_cpp.sh # or whichever script you want to reproduce
CI builds and tests run on separate machines. To do a complete local build+test cycle without downloading artifacts from GitHub, redirect the channel variables to use locally-built packages:
sed -ri '/rapids-download.*from-github/ s/_CHANNEL=.*/_CHANNEL=${RAPIDS_CONDA_BLD_OUTPUT_DIR}/' ci/*.sh
./ci/build_cpp.sh
./ci/build_python.sh
./ci/test_cpp.sh
./ci/test_python.sh
Important: this sed modifies CI scripts in your working tree. Either do
this on a throwaway branch or revert the changes afterward
(git checkout -- ci/).
Some builds need version info from git tags. If you see errors like
'GIT_DESCRIBE_NUMBER' is undefined, fetch tags from upstream:
git fetch git@github.com:rapidsai/rapidsmpf.git --tags
CI tests may run on different GPU driver versions. If local test results differ
from CI, compare nvidia-smi output between your machine and the CI job logs.
Locally-built artifacts cannot be uploaded to CI artifact storage. Fix build failures locally, push to PR, and let CI produce the artifacts for test jobs.
The agent should construct and run the full docker run command based on user
input. Here is an example reproducing a failing test_python.sh from PR #123:
docker run \
--rm \
--gpus all \
--pull=always \
--cap-add CAP_SYS_PTRACE \
--shm-size=8g \
--ulimit nofile=1000000:1000000 \
--volume $PWD:/repo \
--workdir /repo \
--env-file "$(pwd)/.env" \
--env RAPIDS_BUILD_TYPE=pull-request \
--env RAPIDS_REPOSITORY=rapidsai/rapidsmpf \
--env RAPIDS_REF_NAME=pull-request/123 \
--env RAPIDS_SHA=<commit-sha> \
<CONTAINER_IMAGE> \
bash -c "./ci/test_python.sh"
Replace <CONTAINER_IMAGE> with the exact image from the CI job logs (see
"Identifying the container image" above) and <commit-sha> with the full SHA
from the CI run (see "Setting RAPIDS_SHA" above).
To get an interactive shell instead, drop bash -c "..." and add -it:
docker run \
--rm \
--gpus all \
--pull=always \
--cap-add CAP_SYS_PTRACE \
--shm-size=8g \
--ulimit nofile=1000000:1000000 \
--volume $PWD:/repo \
--workdir /repo \
--env-file "$(pwd)/.env" \
--env RAPIDS_BUILD_TYPE=pull-request \
--env RAPIDS_REPOSITORY=rapidsai/rapidsmpf \
--env RAPIDS_REF_NAME=pull-request/123 \
--env RAPIDS_SHA=<commit-sha> \
-it <CONTAINER_IMAGE>