| name | ci-workflow-guide |
| description | Guide to SGLang CI workflow orchestration — stage ordering, fast-fail, gating, partitioning, execution modes, and debugging CI failures. Use when modifying CI workflows, adding stages, debugging CI pipeline issues, or understanding how tests are dispatched and gated across stages. |
SGLang CI Workflow Orchestration Guide
This skill covers the CI infrastructure layer — how tests are dispatched, gated, and fast-failed across stages. For test authoring (templates, fixtures, registration, model selection), see the write-sglang-test skill.
Naming Conventions
- Suite:
base-{a,b,c}-test-{gpu_count}-gpu-{hardware} (e.g., base-b-test-1-gpu-small)
- Test group: Directory-level registered test group under
test/registered/ (e.g., hicache maps to test/registered/hicache/test_*.py)
- CI runner:
{gpu_count}-gpu-{hardware} (e.g., 1-gpu-5090, 4-gpu-h100, 8-gpu-h200)
Key Files
| File | Role |
|---|
.github/workflows/pr-test.yml | Main workflow — all stages, jobs, conditions, matrix definitions |
.github/workflows/pr-test-extra.yml | Extra workflow — gated by BOTH run-ci and run-ci-extra labels |
.github/workflows/pr-gate.yml | PR gating: draft check, run-ci label, per-user rate limiting |
.github/actions/check-pr-test-health/action.yml | Cross-job fast-fail: queries API for any failed job |
.github/actions/wait-for-jobs/action.yml | Stage gating: polls API until stage jobs complete |
.github/actions/check-maintenance/action.yml | Maintenance mode check |
test/run_suite.py | Suite runner: collects, filters, partitions, executes tests |
python/sglang/test/ci/ci_register.py | Test registration (AST-parsed markers), LPT auto-partition |
python/sglang/test/ci/ci_utils.py | run_unittest_files(): execution, retry, continue-on-error |
scripts/ci/utils/slash_command_handler.py | Handles slash commands from PR comments |
Architecture Overview
┌──────────────┐
│ build kernel │
└──────┬───────┘
│
├─ check-changes ──── detects which packages changed
│ (main_package, sgl_kernel, jit_kernel, multimodal_gen)
│
├─ call-gate ──────── pr-gate.yml (draft? label? rate limit?)
│
├─────────────────────────────────────────────────────┐
│ │
▼ │
┌─────────────────────────────────────┐ │
│ Base A (~3 min) │ │
│ pre-flight check │ │
│ │ │
│ ┌─────────────────────────────┐ │ │
│ │ base-a-test-1-gpu-small │ │ │
│ │ (small GPUs) │ │ │
│ └─────────────────────────────┘ │ │
│ ┌─────────────────────────────┐ │ │
│ │ base-a-test-cpu │ │ │
│ │ (CPU) │ │ │
│ └─────────────────────────────┘ │ │
└──────┬──────────────────────────────┘ │
│ │
▼ ▼
┌─────────────────────────────────────┐ ┌──────────────────────────┐
│ Base B (~30 min) │ │ kernel test │
│ base tests │ └──────────────────────────┘
│ │ ┌──────────────────────────┐
│ ┌─────────────────────────────┐ │ │ multimodal gen test │
│ │ base-b-test-1-gpu-small │ │ └──────────────────────────┘
│ │ (small GPUs, e.g. 5090) │ │
│ └─────────────────────────────┘ │
│ ┌─────────────────────────────┐ │
│ │ base-b-test-1-gpu-large │ │
│ │ (large GPUs, e.g. H100) │ │
│ └─────────────────────────────┘ │
│ ┌─────────────────────────────┐ │
│ │ base-b-test-2-gpu-large │ │
│ │ (large GPUs, e.g. H100) │ │
│ └─────────────────────────────┘ │
└──────┬──────────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ Base C (~30 min) │
│ advanced tests │
│ │
│ ┌─────────────────────────────┐ │
│ │ base-c-test-4-gpu-h100 │ │
│ │ (H100 GPUs) │ │
│ └─────────────────────────────┘ │
│ ┌─────────────────────────────┐ │
│ │ base-c-test-8-gpu-h200 │ │
│ │ (8 x H200 GPUs) │ │
│ └─────────────────────────────┘ │
│ ┌─────────────────────────────┐ │
│ │ base-c-test-4-gpu-b200 │ │
│ │ (4 x B200 GPUs) │ │
│ └─────────────────────────────┘ │
│ ┌─────────────────────────────┐ │
│ │ Other advanced tests │ │
│ │ (DeepEP, PD Disagg, GB300) │ │
│ └─────────────────────────────┘ │
└──────┬──────────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ pr-test-finish │
│ aggregates all results, fails if │
│ any job failed/cancelled │
└─────────────────────────────────────┘
Every stage test job includes a check-pr-test-health step after checkout — if any job in the run has already failed, the job fast-fails (red X) with a root cause annotation.
Scheduled runs skip wait-for-base-* jobs, running all stages in parallel. Fast-fail is also disabled.
Fast-Fail Layers
4 layers of fast-fail, from fine to coarse:
| Layer | Mechanism | Granularity | Disabled on schedule? |
|---|
| 1. Test method → file | unittest -f (failfast) | One test method fails → entire test file stops immediately | Yes |
| 2. File → suite | run_unittest_files() default | One test file fails → entire suite stops (--continue-on-error off) | Yes |
| 3. Job → job (same stage) | check-pr-test-health action | One job fails → other waiting jobs in same stage fast-fail (red X) | Yes |
| 4. Stage → stage (cross-stage) | wait-for-base-* + needs | Base A fails → base B/C jobs skip entirely (never get a runner) | Yes (wait jobs skipped) |
- Layer 1:
-f flag appended to all python3 -m pytest / unittest invocations in ci_utils.py
- Layer 2:
--continue-on-error flag in run_suite.py — off for PRs, on for scheduled runs
- Layer 3:
check-pr-test-health auto-detects schedule event and skips; filters out cascade failures to show only root cause jobs
- Layer 4:
wait-for-base-* jobs are conditioned on github.event_name == 'pull_request' — skipped for scheduled runs
Execution Modes
| Aspect | PR (pull_request) | Scheduled (cron, every 6h) | /rerun-stage (workflow_dispatch) |
|---|
| Stage ordering | Sequential: A → B → C via wait-for-base-* | Parallel (all at once) | Single target stage only |
| Cross-job fast-fail | Yes (check-pr-test-health) | Yes | Yes |
| continue-on-error | No (stop at first failure within suite) | Yes (run all tests) | No |
| Retry | Enabled | Enabled | Enabled |
| max_parallel | 3 (default), 14 if high priority label | 14 | 3 (default), 14 if high priority |
| PR gate | Yes (draft, label, rate limit) | Skipped | Skipped |
| Concurrency | cancel-in-progress: true per branch | Queue (no cancel) | Isolated per stage+SHA |
Stage Gating (wait-for-jobs action)
wait-for-base-a and wait-for-base-b are lightweight ubuntu-latest jobs that poll the GitHub Actions API.
How it works:
- Calls
listJobsForWorkflowRun to list all jobs in the current run
- Matches jobs by exact name or prefix (for matrix jobs, e.g.,
base-b-test-1-gpu-small (3))
- If any matched job has
conclusion === 'failure' → fail immediately (fast-fail)
- If all matched jobs are completed and count matches
expected_count → success
- Otherwise → sleep
poll-interval-seconds (default: 60s) and retry
- Timeout after
max-wait-minutes (240 min for base-a, 480 min for base-b)
Job specs example (base-b):
[
{"prefix": "base-b-test-1-gpu-small", "expected_count": 8},
{"prefix": "base-b-test-1-gpu-large", "expected_count": 14},
{"prefix": "base-b-test-2-gpu-large", "expected_count": 4},
{"prefix": "base-b-test-4-gpu-b200", "expected_count": 1}
]
Critical: expected_count must match the matrix size. If you add/remove matrix entries, update the wait job's spec accordingly.
PR only: Condition github.event_name == 'pull_request' && !inputs.target_stage — scheduled runs and /rerun-stage skip these entirely, allowing parallel execution.
Cross-Job Fast-Fail (check-pr-test-health action)
Composite action called after checkout in every stage test job (21 jobs total across pr-test.yml, pr-test-multimodal-gen.yml, pr-test-sgl-kernel.yml, pr-test-jit-kernel.yml).
How it works:
- Queries
listJobsForWorkflowRun for the current workflow run
- Filters for root cause failures only — jobs with
conclusion === 'failure' whose failing step is NOT check-pr-test-health (excludes cascade failures)
- If root cause failures found → calls
core.setFailed() with the list of root cause job names
- If none → does nothing (step succeeds)
Cascade filtering: When job A fast-fails due to health check, it also has conclusion: failure. Without filtering, job B would list both the original failure AND job A's fast-fail. The filter checks each failed job's steps array — if the failing step name contains check-pr-test-health or Check PR test health, it's excluded from the root cause list.
Usage pattern:
steps:
- name: Checkout code
uses: actions/checkout@v4
...
- uses: ./.github/actions/check-pr-test-health
id: pr-test-health
- name: Install dependencies
...
- name: Run test
...
Visual effect: Job shows red X (failure) with error annotation showing root cause job names. Subsequent steps are naturally skipped (default if: success() is false after a failed step). No per-step if guards needed.
No stage filtering: Checks ALL jobs in the run, not just the current stage. Any failure anywhere triggers fast-fail.
Error message example:
Fast-fail: skipping — root cause job(s): base-b-test-1-gpu-small (0), base-b-test-1-gpu-small (1)
Within-Suite Failure Handling
Controlled by run_unittest_files() in python/sglang/test/ci/ci_utils.py.
Flags
| Flag | PR default | Scheduled default | Effect |
|---|
--continue-on-error | Off | On | Off: stop at first failure. On: run all files, report all failures at end |
--enable-retry | On | On | Retry retriable failures (accuracy/perf assertions) |
--max-attempts | 2 | 2 | Max attempts per file including initial run |
Retry Classification
When a test fails and retry is enabled, the output is classified:
Non-retriable (checked first — real code errors):
SyntaxError, ImportError, ModuleNotFoundError, NameError, TypeError, AttributeError, RuntimeError, CUDA out of memory, OOM, Segmentation fault, core dumped, ConnectionRefusedError, FileNotFoundError
Retriable (accuracy/performance):
AssertionError with comparison patterns (not greater than, not less than, not equal to), accuracy, score, latency, throughput, timeout
Default: Unknown AssertionError → retriable. Other unknown failures → not retriable.
How continue_on_error is set
In pr-test.yml's check-changes job:
schedule runs or run_all_tests flag → continue_on_error = 'true'
- PR runs →
continue_on_error = 'false'
Each test job propagates via:
env:
CONTINUE_ON_ERROR_FLAG: ${{ needs.check-changes.outputs.continue_on_error == 'true' && '--continue-on-error' || '' }}
run: |
python3 run_suite.py --hw cuda --suite <name> $CONTINUE_ON_ERROR_FLAG
Test Partitioning
Large suites are split across matrix jobs using the LPT (Longest Processing Time) heuristic in ci_register.py:auto_partition():
- Sort tests by
est_time descending, filename as tie-breaker (deterministic)
- Greedily assign each test to the partition with smallest cumulative time
- Result: roughly equal total time per partition
Partition table (CUDA per-commit suites):
| Suite | Partitions | Runner | max_parallel |
|---|
base-a-test-1-gpu-small | 1 (no matrix) | 1-gpu-5090 | — |
base-a-test-cpu | 4 | ubuntu-latest | — |
base-b-test-1-gpu-small | 8 | 1-gpu-5090 | 8 |
base-b-test-1-gpu-large | 14 | 1-gpu-h100 | dynamic (3 or 14) |
base-b-test-2-gpu-large | 4 | 2-gpu-h100 | — |
base-b-test-4-gpu-b200 | 1 (no matrix) | 4-gpu-b200 | — |
base-b-kernel-unit-1-gpu-large | 1 (no matrix) | 1-gpu-h100 | — |
base-b-kernel-unit-1-gpu-b200 | 1 (no matrix) | 4-gpu-b200 | — |
base-b-kernel-unit-8-gpu-h200 | 1 (no matrix) | 8-gpu-h200 | — |
base-b-kernel-benchmark-1-gpu-large | 1 (no matrix) | 1-gpu-h100 | — |
base-c-test-4-gpu-h100 | 3 | 4-gpu-h100 | — |
base-c-test-8-gpu-h200 | 4 | 8-gpu-h200 | — |
base-c-test-8-gpu-h20 | 2 | 8-gpu-h20 | — |
base-c-test-deepep-4-gpu-h100 | 1 (no matrix) | 4-gpu-h100 | — |
base-c-test-4-gpu-b200 | 3 | 4-gpu-b200 | — |
base-c-test-4-gpu-b200-small | 3 | 4-gpu-b200-low-disk | — |
base-c-test-8-gpu-b200 | registered only | 8-gpu-b200 | — |
base-c-test-4-gpu-gb200 | registered only | 4-gpu-gb200 | — |
Note: Kernel suites (base-b-kernel-*) run via pr-test-jit-kernel.yml and pr-test-sgl-kernel.yml, not the main pr-test.yml. base-c-test-8-gpu-b200 is registered in test/run_suite.py but not wired to PR CI. The GB200 job is currently commented out in pr-test.yml until a company-owned runner is provisioned. Multimodal diffusion uses python/sglang/multimodal_gen/test/run_suite.py, not test/run_suite.py.
Workflow usage:
strategy:
matrix:
partition: [0, 1, 2, 3, 4, 5, 6, 7]
steps:
- run: python3 run_suite.py --hw cuda --suite base-b-test-1-gpu-small \
--auto-partition-id ${{ matrix.partition }} --auto-partition-size 8
check-changes Job
Determines which test suites to run based on file changes.
Detection Methods
| Trigger | Method | Details |
|---|
pull_request | dorny/paths-filter | Detects changes via GitHub diff |
workflow_dispatch (with pr_head_sha) | GitHub API | repos/{repo}/compare/main...{sha} |
schedule / run_all_tests | Force all true | Runs everything |
Output Flags
| Output | Triggers |
|---|
main_package | Base A/B/C test suites |
sgl_kernel | Kernel wheel builds + kernel test suites; also switches B200 jobs to kernel-build runner labels outside target_stage mode |
jit_kernel | JIT kernel test workflow |
multimodal_gen | Multimodal-gen test workflow |
Note: In target_stage mode, sgl_kernel is only active when include_wheel_build=true. Without that opt-in, kernel-change reruns fail validation instead of running a target stage without freshly built wheels. Outside target_stage, sgl_kernel=true switches B200 jobs from 4-gpu-b200 / 4-gpu-b200-low-disk to 4-gpu-b200-kernel / 4-gpu-b200-kernel-low-disk.
Concurrency Control
group: pr-test-{event_name}-{branch}-{pr_sha}-{stage}
| Segment | Source | Purpose |
|---|
event_name | github.event_name | Prevents scheduled runs colliding with fork PRs named main |
branch | github.head_ref || github.ref_name | Per-branch isolation |
pr_sha | inputs.pr_head_sha || 'current' | Isolates /rerun-stage from main runs |
stage | inputs.target_stage || 'all' | Allows parallel stage dispatches |
cancel-in-progress: true for pull_request events (new push cancels old run), false for workflow_call.
How To: Add a New Stage Job
- Define the job in
pr-test.yml with needs: [check-changes, call-gate, wait-for-base-X, ...]
- Copy the
if: condition pattern from an existing same-stage job (handles target_stage, schedule, main_package)
- Add
checkout step
- Add
check-pr-test-health step (after checkout) — if any prior job failed, core.setFailed() fires and all subsequent steps auto-skip via default if: success()
- Add
check-maintenance step
- Add
download-artifact step if sgl_kernel changed
- Add
install dependencies step
- Add
run test step with $CONTINUE_ON_ERROR_FLAG
- Add
upload-cuda-coredumps step with if: always()
- Register the suite name in
PER_COMMIT_SUITES in test/run_suite.py
- If using matrix, add
--auto-partition-id and --auto-partition-size to the run command
- Update
wait-for-base-X job spec with the new job name and expected_count (if matrix)
- Add the job to
pr-test-finish.needs list
How To: Debug CI Failures
| Symptom | Likely cause | What to check |
|---|
| All stage-B/C jobs green but steps skipped | Earlier job failed, check-pr-test-health triggered | Find the actual failed job (red X) |
wait-for-base-b timeout | expected_count doesn't match matrix size | Verify job spec counts match matrix: array length |
pr-test-finish fails but all jobs green | A job was cancelled (counts as failure in finish) | Check concurrency cancellation |
| Tests pass locally but fail in CI | Partition assignment, runner GPU type, or est_time inaccuracy | Check which partition the test lands in; verify runner label |
| Flaky test retried and passed | Retriable failure (accuracy/perf) | Check [CI Retry] markers in job logs |
| Flaky test NOT retried | Matched non-retriable pattern | Check if error matches NON_RETRIABLE_PATTERNS in ci_utils.py |
Slash Commands
| Command | Effect |
|---|
/tag-run-ci-label | Adds run-ci label to PR |
/tag-run-ci-label extra | Adds both run-ci and run-ci-extra labels |
/rerun-failed-ci | Reruns failed jobs in the latest workflow run |
/tag-and-rerun-ci | Adds run-ci label + reruns failed |
/tag-and-rerun-ci extra | Adds both run-ci and run-ci-extra labels + reruns failed |
/rerun-stage <stage> | Deprecated; posts deprecation notice |
/rerun-test <test-file> | Reruns a specific test file via rerun-test.yml |
/rerun-group <group> [<group> ...] | Expands registered test groups, then reuses /rerun-test |
Handled by scripts/ci/utils/slash_command_handler.py → .github/workflows/slash-command-handler.yml.
Label-gated workflow dispatch (pr-test, pr-test-extra)
pr-test.yml and pr-test-extra.yml both listen for pull_request.labeled (in addition to opened/synchronize/reopened). The check-changes.if gate has two clauses:
- For
labeled events: the just-added label must be one of the gating labels (run-ci for pr-test, run-ci or run-ci-extra for pr-test-extra) — otherwise every unrelated label addition would dispatch a full CI run.
- All events: the PR must currently carry the required labels.
This is what lets /tag-run-ci-label (and the extra variant) trigger a fresh CI run without an extra push.
Caveat — skipped runs cannot be un-skipped by run.rerun(): GitHub's rerun API reuses the original event payload, so rerunning a pull_request-event run that was skipped because of missing labels will skip again (label set in the frozen payload doesn't update). The only way to recover a label-skipped run is to add the missing label, which fires a fresh labeled event with the current label set. handle_rerun_failed_ci in the slash handler is for rerunning failed/non-label-skipped runs; it cannot revive label-skipped ones.