| name | reproduce-ci-locally |
| description | Reproduce RAPIDS CI builds and tests locally using Docker containers that mirror the CI environment. Use when the user asks to reproduce CI, run CI locally, debug a CI failure, or simulate a CI job on their machine. |
Reproduce CI Locally
RAPIDS CI jobs are shell scripts executed inside
CI containers. You can reproduce them
locally by running the same container and scripts.
Reference: https://docs.rapids.ai/resources/reproducing-ci
Prerequisites
- Docker installed and working (docker run --rm hello-world)
- For test jobs: NVIDIA GPU + NVIDIA Container Toolkit (--gpus flag support)
- For artifact downloads: a GH_TOKEN (GitHub personal access token with repo scope)
Information to gather from the user
Before running anything, collect these values (ask for anything not provided):
| Parameter | Description | Examples |
|---|---|---|
| CI script | Which ci/*.sh script to run | ci/build_cpp.sh, ci/test_python.sh |
| Container image | The exact CI container image tag (see below) | rapidsai/ci-conda:cuda12.8.1-ubuntu24.04-py3.12-26.06-latest |
| Build type | One of pull-request, nightly, branch | pull-request |
| Ref name | PR number, branch, or nightly branch | pull-request/123, main, release/26.06 |
| Nightly date | Date for nightly builds (YYYY-MM-DD) | 2025-12-12 (only for nightly) |
Identifying the container image
CI jobs run across a matrix of container images that vary by CUDA version,
OS, Python version, and architecture. When the user asks to reproduce a CI job,
you should look up all the matrix variants from the CI run (see below) and
determine which ones are compatible with the host machine.
CUDA Enhanced Compatibility (CEC)
NVIDIA supports CUDA Enhanced Compatibility: a container image built for
CUDA X.Y can run on any host driver that supports CUDA major version X,
regardless of the minor version. As a rule of thumb, any image from the same
CUDA major version as the host driver works. For example, a host with driver
535 (which ships with CUDA 12.2) can run images for CUDA 12.2, 12.5, 12.8,
12.9, etc. — but not CUDA 13.x images.
To determine the host's CUDA major version, check the driver version with
nvidia-smi and map it to the CUDA major version:
| Driver series | CUDA major version |
|---|---|
| 525–575 | 12 |
| 580+ | 13 |
Drivers are backward compatible, so a host with a 580+ driver can also run
CUDA 12 images.
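The nvidia-smi header also reports the highest CUDA version the installed driver supports (the "CUDA Version: X.Y" field), so the major version can be read directly instead of being mapped from the driver series. A minimal sketch, assuming that header format; the function name cuda_major_from_smi is hypothetical:

```shell
# Read nvidia-smi output on stdin and print the major part of "CUDA Version: X.Y".
# Prints nothing if the field is absent or not numeric (e.g. "N/A").
cuda_major_from_smi() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Usage on a machine with an NVIDIA driver installed:
#   nvidia-smi | cuda_major_from_smi
```

Reading the header avoids keeping a driver-to-CUDA table up to date, at the cost of depending on the nvidia-smi text layout.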
Choosing an image when multiple are compatible
When there are multiple compatible images for the same CI job (e.g.,
CUDA 12.2.2 and CUDA 12.9.1 variants both compatible via CEC), always
present all compatible options to the user and ask them to choose. Do not
silently pick one. List each option with its key attributes (CUDA version,
OS, Python version, GPU type used in CI) so the user can make an informed
decision.
Only auto-select an image when exactly one variant is compatible with the host.
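The selection rule above can be sketched as a small filter over candidate image tags. This is an illustration only: the function name compatible_images and its exit-code convention are hypothetical, and it assumes every tag contains a cudaX.Y component as in the examples in this document.

```shell
# $1: host CUDA major version. Candidate image tags are read from stdin, one per line.
# Prints the tags whose CUDA major version matches the host.
# Exit status: 0 = exactly one match (safe to auto-select),
#              1 = no compatible image,
#              2 = several matches (present them and ask the user).
compatible_images() {
  local matches count
  matches="$(grep -E "cuda${1}\.[0-9]+" || true)"
  if [ -z "$matches" ]; then
    return 1
  fi
  printf '%s\n' "$matches"
  count="$(printf '%s\n' "$matches" | grep -c .)"
  if [ "$count" -gt 1 ]; then
    return 2
  fi
}
```

An exit status of 2 is the signal to stop and list the options for the user rather than choosing silently.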
Finding the images from CI job logs
If the user provides the image tag directly, use it. Otherwise, look up the
images from the CI run:
- Query the GitHub API for the jobs in the workflow run (see the actions/runs/{run_id}/jobs endpoint).
- For each relevant job, download its logs (requires GH_TOKEN with repo scope) and look for the docker pull line in the Initialize Containers / Starting job container section. It contains the full image tag (e.g., rapidsai/ci-conda:26.06-cuda12.9.1-ubuntu22.04-py3.11).
- Alternatively, for jobs in .github/workflows/ that set container_image explicitly, read the value from the workflow file. However, most jobs use shared workflows (e.g., conda-cpp-tests.yaml) which select the image from a build matrix; in that case the CI job logs are the only reliable source.
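The log lookup can be sketched as two small helpers: one downloads a job log through the GitHub REST API (GET /repos/{owner}/{repo}/actions/jobs/{job_id}/logs), the other pulls the image tags out of any docker pull lines. The function names are hypothetical, and the exact text around docker pull in the log is an assumption, so the extraction only relies on the "docker pull <image>" part.

```shell
# Print the unique image tags found in "docker pull" lines of a job log read from stdin.
extract_pulled_images() {
  sed -n 's/.*docker pull \([^[:space:]]*\).*/\1/p' | sort -u
}

# Download the plain-text log of one job. Arguments: <owner/repo> <job_id>.
# Requires GH_TOKEN; -L follows the redirect to the log download URL.
download_job_log() {
  curl -sSL \
    -H "Authorization: Bearer ${GH_TOKEN:?GH_TOKEN is not set}" \
    -H "Accept: application/vnd.github+json" \
    "https://api.github.com/repos/$1/actions/jobs/$2/logs"
}

# Usage (job id taken from the actions/runs/{run_id}/jobs response):
#   download_job_log rapidsai/rapidsmpf <job_id> | extract_pulled_images
```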
Available CI scripts in this project
Conda builds (no GPU needed):
- ci/build_cpp.sh: Build C++ conda package
- ci/build_python.sh: Build Python conda package (needs C++ artifacts)
- ci/build_docs.sh: Build documentation (needs C++ and Python artifacts)
Conda tests (GPU needed):
- ci/test_cpp.sh: Test C++ conda package
- ci/test_python.sh: Test Python conda package
- ci/test_cpp_memcheck.sh: Test C++ with compute-sanitizer memcheck
Wheel builds (no GPU needed):
- ci/build_wheel_librapidsmpf.sh: Build librapidsmpf wheel
- ci/build_wheel_rapidsmpf.sh: Build rapidsmpf wheel
- ci/build_wheel_singlecomm.sh: Build rapidsmpf single-comm wheel (no MPI/UCXX)
Wheel tests (GPU needed):
- ci/test_wheel.sh: Test rapidsmpf wheel
Linters / checks (no GPU needed):
- ci/check_style.sh: Pre-commit style checks
- ci/cpp_linters.sh: C++ clang-tidy linting
Running a CI job locally
Step 1: Launch the container
For build jobs (no GPU required):
docker run \
  --rm \
  --pull=always \
  --volume "$PWD:/repo" \
  --workdir /repo \
  <CONTAINER_IMAGE>
For test jobs (GPU required):
docker run \
  --rm \
  --gpus all \
  --pull=always \
  --cap-add CAP_SYS_PTRACE \
  --shm-size=8g \
  --ulimit nofile=1000000:1000000 \
  --volume "$PWD:/repo" \
  --workdir /repo \
  <CONTAINER_IMAGE>
The --cap-add, --shm-size, and --ulimit flags match what CI uses for
RapidsMPF test jobs (see container-options in the workflow files).
Note on -it flag: Do NOT use -it (or -t) when running a command
non-interactively via bash -c "...". Docker requires a real TTY for -t and
will fail with "the input device is not a TTY" in background/scripted execution.
Only add -it when launching an interactive shell for manual exploration.
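One way to avoid the TTY error in wrappers that run both interactively and from scripts is to add -it only when stdin is actually a terminal. A minimal sketch; the helper name tty_flags is hypothetical:

```shell
# Print "-it" only when stdin is a terminal; print nothing otherwise,
# so scripted or background runs never request a TTY.
tty_flags() {
  if [ -t 0 ]; then
    printf '%s\n' "-it"
  fi
}

# Usage:
#   docker run --rm $(tty_flags) --volume "$PWD:/repo" --workdir /repo <CONTAINER_IMAGE>
```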
Step 2: Set environment variables for artifact downloads
Test scripts download build artifacts from GitHub Actions. Inside the container,
set these variables so the download commands don't prompt interactively:
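export RAPIDS_BUILD_TYPE=pull-request  # or "branch" or "nightly"
export RAPIDS_REPOSITORY=rapidsai/rapidsmpf
# Set RAPIDS_REF_NAME once, according to the build type.
# Pull-request builds:
export RAPIDS_REF_NAME=pull-request/123
# Branch builds (RAPIDS_BUILD_TYPE=branch):
# export RAPIDS_REF_NAME=main
# Nightly builds (RAPIDS_BUILD_TYPE=nightly):
# export RAPIDS_NIGHTLY_DATE=2025-12-12
# export RAPIDS_REF_NAME=release/26.06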
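Because a missing variable makes the download commands prompt interactively, it can help to check the environment before starting a script. The following is a sketch based only on the parameters listed in this document; the function name check_rapids_env is hypothetical and is not part of the RAPIDS tooling.

```shell
# Return non-zero (with a message on stderr) if the variables needed for
# artifact downloads are missing or inconsistent.
check_rapids_env() {
  case "${RAPIDS_BUILD_TYPE:-}" in
    pull-request|branch|nightly) ;;
    *)
      echo "RAPIDS_BUILD_TYPE must be pull-request, branch, or nightly" >&2
      return 1
      ;;
  esac
  if [ -z "${RAPIDS_REPOSITORY:-}" ] || [ -z "${RAPIDS_REF_NAME:-}" ]; then
    echo "RAPIDS_REPOSITORY and RAPIDS_REF_NAME must be set" >&2
    return 1
  fi
  # The nightly date is only required for nightly builds.
  if [ "$RAPIDS_BUILD_TYPE" = "nightly" ] && [ -z "${RAPIDS_NIGHTLY_DATE:-}" ]; then
    echo "RAPIDS_NIGHTLY_DATE must be set for nightly builds" >&2
    return 1
  fi
}

# Usage inside the container:
#   check_rapids_env && ./ci/test_python.sh
```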
Setting RAPIDS_SHA (required for test jobs)
The rapids-download-conda-from-github / rapids-download-from-github scripts
need the exact commit SHA that produced the build artifacts. This must match the
SHA used by the CI build job — not the current tip of the branch. For
example, if nightly ran on commit abcdef0 but a later commit abcdef1 was
pushed to main afterward, the artifacts only exist for abcdef0. Using
abcdef1 will fail to find them.
When the local repo is a fork (i.e., origin points to a user fork rather
than rapidsai/rapidsmpf), the scripts also cannot determine the SHA
automatically and will fail with:
There was a problem acquiring the HEAD commit sha from the current directory.
Always set RAPIDS_SHA explicitly to the SHA from the CI run you want to
reproduce. There are three ways to look it up:
Option A — GitHub REST API (no extra tools needed):
For nightly builds, query the build.yaml workflow runs:
curl -s "https://api.github.com/repos/rapidsai/rapidsmpf/actions/workflows/build.yaml/runs?per_page=5" \
| python3 -c "import sys,json; runs=json.load(sys.stdin)['workflow_runs']; print('\n'.join(f\"{r['head_sha'][:12]} {r['created_at']}\" for r in runs))"
For pull-request builds, query the pr.yaml workflow runs for a specific PR
branch:
curl -s "https://api.github.com/repos/rapidsai/rapidsmpf/actions/workflows/pr.yaml/runs?branch=pull-request/123&per_page=5" \
| python3 -c "import sys,json; runs=json.load(sys.stdin)['workflow_runs']; print('\n'.join(f\"{r['head_sha'][:12]} {r['created_at']}\" for r in runs))"
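Pick the head_sha from the run matching your target date. The one-liners above print only the first 12 characters of each SHA for readability; remove the [:12] slice to print the full SHA.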
Option B — gh CLI (if installed):
gh run list --repo rapidsai/rapidsmpf --workflow build.yaml --json headSha,createdAt --limit 5
Option C — GitHub web UI:
Browse to https://github.com/rapidsai/rapidsmpf/actions/workflows/build.yaml,
click the run for the target date, and note the commit SHA shown at the top.
After obtaining the SHA, check out that exact commit locally so the repo
contents match the artifacts, then set the variable:
git fetch <upstream-remote> <branch>
git checkout <sha>
export RAPIDS_SHA=<sha>
Where <upstream-remote> is the remote pointing to rapidsai/rapidsmpf
(commonly upstream). For example, for a nightly build on main:
RAPIDS_SHA=$(curl -s "https://api.github.com/repos/rapidsai/rapidsmpf/actions/workflows/build.yaml/runs?per_page=1" \
| python3 -c "import sys,json; print(json.load(sys.stdin)['workflow_runs'][0]['head_sha'])")
git fetch upstream main
git checkout $RAPIDS_SHA
export RAPIDS_SHA
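Since a truncated or mistyped SHA is an easy mistake here, a quick format check before launching the container can save a failed download. This sketch assumes the download scripts expect the full 40-character hexadecimal SHA, as the example at the end of this document suggests; the function name is_full_sha is hypothetical.

```shell
# Succeed only if $1 looks like a full 40-character lowercase hex commit SHA.
is_full_sha() {
  printf '%s' "$1" | grep -Eq '^[0-9a-f]{40}$'
}

# Usage:
#   is_full_sha "$RAPIDS_SHA" || echo "RAPIDS_SHA is not a full commit SHA" >&2
```

The check validates the format only. It does not confirm that artifacts exist for that commit.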
Step 3: Authenticate with GitHub (for test jobs)
Test scripts use rapids-download-conda-from-github /
rapids-download-from-github which require GitHub authentication via GH_TOKEN.
Option A — pass the token directly:
docker run \
...
--env "GH_TOKEN=<your-token>" \
...
Option B — use an env file:
Create a .env file with GH_TOKEN=<your-token> and pass it with --env-file:
docker run \
...
--env-file "$(pwd)/.env" \
...
On shared machines, passing tokens via --env may expose them in ps output.
Prefer --env-file in that case.
Ask the user for their GH_TOKEN if not already set in the environment.
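When the token is already exported in the host shell, the env file can be generated without typing the token on a command line. A minimal sketch; the function name write_env_file is hypothetical, and the restrictive file mode is an extra precaution for shared machines rather than something the CI tooling requires.

```shell
# Write GH_TOKEN from the current environment to the env file at path $1,
# readable and writable by the current user only.
write_env_file() {
  (
    umask 077
    printf 'GH_TOKEN=%s\n' "${GH_TOKEN:?GH_TOKEN is not set}" > "$1"
  )
}

# Usage:
#   write_env_file "$(pwd)/.env"
#   docker run ... --env-file "$(pwd)/.env" ...
```

If the file lives inside the repository directory, make sure it is not committed.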
Step 4: Run the CI script
./ci/build_cpp.sh
Full build-then-test workflow (local artifacts)
CI builds and tests run on separate machines. To do a complete local
build+test cycle without downloading artifacts from GitHub, redirect the
channel variables to use locally-built packages:
sed -ri '/rapids-download.*from-github/ s/_CHANNEL=.*/_CHANNEL=${RAPIDS_CONDA_BLD_OUTPUT_DIR}/' ci/*.sh
./ci/build_cpp.sh
./ci/build_python.sh
./ci/test_cpp.sh
./ci/test_python.sh
Important: this sed modifies CI scripts in your working tree. Either do
this on a throwaway branch or revert the changes afterward
(git checkout -- ci/).
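To see what the substitution does before editing files in place, the same sed expression can be run as a filter. This is a sketch: the helper name preview_channel_rewrite and the sample variable name CPP_CHANNEL are hypothetical, and -E is used as the portable spelling of -r.

```shell
# Apply the channel rewrite to stdin and print the result, without touching any file.
# Only lines that call a rapids-download...from-github command are changed.
preview_channel_rewrite() {
  sed -E '/rapids-download.*from-github/ s/_CHANNEL=.*/_CHANNEL=${RAPIDS_CONDA_BLD_OUTPUT_DIR}/'
}

# Usage: show the changes the rewrite would make to one script.
#   preview_channel_rewrite < ci/test_cpp.sh | diff ci/test_cpp.sh -
```

For example, a line such as CPP_CHANNEL=$(rapids-download-conda-from-github cpp) becomes CPP_CHANNEL=${RAPIDS_CONDA_BLD_OUTPUT_DIR}, while lines that do not mention the download command pass through unchanged.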
Common issues
Missing git tags
Some builds need version info from git tags. If you see errors like
'GIT_DESCRIBE_NUMBER' is undefined, fetch tags from upstream:
git fetch git@github.com:rapidsai/rapidsmpf.git --tags
GPU driver version mismatch
CI tests may run on different GPU driver versions. If local test results differ
from CI, compare nvidia-smi output between your machine and the CI job logs.
Build artifacts cannot be uploaded
Locally built artifacts cannot be uploaded to CI artifact storage. Fix build
failures locally, push to the PR, and let CI produce the artifacts for test jobs.
Putting it all together — example
The agent should construct and run the full docker run command based on user
input. Here is an example reproducing a failing test_python.sh from PR #123:
docker run \
  --rm \
  --gpus all \
  --pull=always \
  --cap-add CAP_SYS_PTRACE \
  --shm-size=8g \
  --ulimit nofile=1000000:1000000 \
  --volume "$PWD:/repo" \
  --workdir /repo \
  --env-file "$(pwd)/.env" \
  --env RAPIDS_BUILD_TYPE=pull-request \
  --env RAPIDS_REPOSITORY=rapidsai/rapidsmpf \
  --env RAPIDS_REF_NAME=pull-request/123 \
  --env RAPIDS_SHA=<commit-sha> \
  <CONTAINER_IMAGE> \
  bash -c "./ci/test_python.sh"
Replace <CONTAINER_IMAGE> with the exact image from the CI job logs (see
"Identifying the container image" above) and <commit-sha> with the full SHA
from the CI run (see "Setting RAPIDS_SHA" above).
To get an interactive shell instead, drop bash -c "..." and add -it:
docker run \
  --rm \
  --gpus all \
  --pull=always \
  --cap-add CAP_SYS_PTRACE \
  --shm-size=8g \
  --ulimit nofile=1000000:1000000 \
  --volume "$PWD:/repo" \
  --workdir /repo \
  --env-file "$(pwd)/.env" \
  --env RAPIDS_BUILD_TYPE=pull-request \
  --env RAPIDS_REPOSITORY=rapidsai/rapidsmpf \
  --env RAPIDS_REF_NAME=pull-request/123 \
  --env RAPIDS_SHA=<commit-sha> \
  -it <CONTAINER_IMAGE>