| name | ci-efficiency |
| description | Audit GitHub Actions workflows for efficiency and recommend fixes to reduce CI minutes and costs. Use when asked to improve CI performance. |
CI Efficiency Audit
Inspect the repository's GitHub Actions workflows, identify waste sources, and recommend targeted fixes to reduce CI minutes and cost.
This Codebase's CI Structure
The CI entry point is MISE (mise run). All builds and tests go through it:
mise run [--variant fips|non-fips] [--link static|dynamic] <task>
Key workflows:
.github/workflows/pr.yml — pull request CI
.github/workflows/main.yml + main_base.yml — push CI
.github/workflows/test_all.yml — full test matrix
.github/workflows/release.yml — release automation
.github/workflows/packaging.yml, packaging-docker.yml, packaging-tests.yml — packaging
Test matrix variants: sqlite, psql, mariadb, percona, wasm, google_cse, gcp_cmek, otel_export, hsm, redis (non-fips), aws_xks (non-fips), azure_ekm (non-fips), ui (non-fips).
Step 1 — Measure First
rg -n "on:|concurrency:|paths:|paths-ignore:|strategy:|matrix:|cache:" .github/workflows
GH_PAGER=cat gh run list --limit 10 --repo Cosmian/kms
run_id=$(GH_PAGER=cat gh run list --limit 1 --json databaseId --jq '.[0].databaseId' --repo Cosmian/kms)
GH_PAGER=cat gh run view "$run_id" --log-failed --repo Cosmian/kms
Look for:
- Missing dependency caches (Rust ~/.cargo, target/, Nix store, pnpm store)
- Missing concurrency groups to cancel stale runs on the same PR
- Over-broad triggers (full matrix on every push to any branch)
- Duplicate workflow coverage (same job in both pr.yml and main.yml)
- Expensive jobs running regardless of what changed (e.g. UI E2E triggered by Rust-only changes)
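A first pass over the checklist above can be automated with a plain text search. The sketch below is an assumption about how one might do it, not part of this repository's tooling: missing_key is a hypothetical helper that lists workflow files containing no line that starts with a given key. It does not parse YAML, so treat each hit as a lead to inspect by hand.

```shell
# missing_key <key-regex> [dir]: print workflow files with no "<key>:" line.
# Hypothetical helper; text match only, it does not understand YAML structure.
missing_key() {
  local key="$1" dir="${2:-.github/workflows}"
  # grep -L prints the names of files that contain no matching line.
  # Its exit status differs between grep versions, so it is ignored here.
  grep -L -E "^[[:space:]]*${key}:" "$dir"/*.yml || true
}

# Only run the audit when invoked from a repository root.
if [ -d .github/workflows ]; then
  echo "No concurrency group:"
  missing_key concurrency
  echo "No path filter:"
  missing_key 'paths(-ignore)?'
fi
```

A workflow listed under "No path filter" is not automatically wasteful; security-critical workflows are expected to appear there.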
Step 2 — Apply Guardrails
Before recommending any fix, verify it passes all guardrails:
- Does not hide required validation — do not remove FIPS/non-FIPS test matrix legs that have explicit version commitments.
- Does not reduce parallelism without justification.
- Preserves security-critical checks — secret scanning, Dependabot, SBOM generation must not be gated behind path filters.
- Write-back jobs (auto-formatting, CLI doc regeneration) must use opt-in triggers, not run on every PR.
- Nix hash update jobs must not be silently skipped.
Step 3 — Select Top 3 Fixes
From these candidates, keep only those supported by audit evidence AND passing all guardrails. Rank by estimated daily CI minutes saved:
- Dependency caching: cache the Nix store, Rust ~/.cargo/registry, and pnpm store with lockfile-based keys
- Concurrency cancellation: add concurrency: { group: "${{ github.workflow }}-${{ github.ref }}", cancel-in-progress: true } to PR workflows. Include github.workflow in the group, because a group keyed on github.ref alone is shared by every workflow running on that ref and lets them cancel each other's runs
- Path-based triggers: use paths: filters so Rust-only changes don't trigger the full UI E2E suite and vice versa
- Matrix reduction: run expensive test variants (hsm, cloud providers) only on push to develop/main, not on every PR
- Job parallelism: identify jobs currently running sequentially that could run in parallel
- Duplicate workflow removal: merge overlapping jobs between pr.yml and main.yml
- Redundant test detection: identify tests that exercise the same or near-identical code paths under different names (e.g. two test-vector tests that execute similar flows). Redundancy is not limited to textual repetition; look for semantic overlap in test logic
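The ranking rule above is simple arithmetic: estimated daily minutes saved is affected runs per day times minutes saved per affected run. The sketch below shows one way to apply it. The rank_fixes helper is hypothetical, and every number in the sample input is a made-up placeholder, not a measurement of this repository; substitute figures from the Step 1 audit.

```shell
# rank_fixes: read tab-separated "fix, runs per day, minutes saved per run"
# lines on stdin and print the top 3 by estimated daily minutes saved.
rank_fixes() {
  awk -F'\t' '{ printf "%d\t%s\n", $2 * $3, $1 }' | sort -rn | head -n 3
}

# Placeholder inputs only; replace with audited values.
printf '%s\t%s\t%s\n' \
  'concurrency cancellation' 20 15 \
  'dependency caching'       30 6 \
  'path-based triggers'      12 20 \
  'matrix reduction'         10 12 |
  rank_fixes
```

Candidates that fail a guardrail should be dropped from the input before ranking, so the guardrails always take precedence over estimated savings.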
Step 4 — Verify
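If the gh CLI is available, validate path gating and concurrency cancellation against real runs, for example by triggering a test run on a scratch branch or PR and inspecting the result with gh run list.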
If live validation is not possible, state that explicitly.
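One possible live check for concurrency cancellation, assuming a scratch branch that received two pushes in quick succession: list the recent runs on that branch and confirm the older run ended with the conclusion cancelled. The function name and the branch name are placeholders; the flags and JSON fields are those of gh run list.

```shell
# recent_runs <branch>: print id, workflow, status, and conclusion for the
# latest runs on a branch, one tab-separated line per run.
# Hypothetical helper; the branch name is supplied by the caller.
recent_runs() {
  local branch="$1"
  GH_PAGER=cat gh run list --repo Cosmian/kms --branch "$branch" --limit 10 \
    --json databaseId,workflowName,status,conclusion \
    --jq '.[] | [.databaseId, .workflowName, .status, .conclusion] | @tsv'
}

# Example (placeholder branch name):
#   recent_runs ci-efficiency-test
```

If the older run shows a conclusion other than cancelled, the concurrency group is probably not shared between the two runs. For path gating, the same listing should show no run at all for a workflow whose paths: filter excludes the changed files.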
Required Output
- Waste sources — top cost/latency drivers found in step 1
- Proposed fixes — top 3 (or all remaining) with supporting audit evidence
- Validation — what was proven live vs. checked statically, and any remaining risk
- Impact — expected savings (separate PR wall-clock time from total runner time)
If shell or gh CLI access is unavailable, ask the user to paste the contents of .github/workflows/ and the output of gh run list --limit 10. Begin static-only responses with: "Static-only analysis (not confirmed with live runs)."
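For the Impact item, the two figures can be separated per run from the job timestamps that gh run view exposes. The sketch below is one way to do it under stated assumptions: run_times is a hypothetical helper, it only counts jobs whose status is completed, and it reports raw seconds. Billing typically rounds each job up to a whole minute, so the runner figure is a lower bound on billed time.

```shell
# run_times <run_id>: print summed job time (runner time) and the span from
# the first job start to the last job end (wall-clock time), in seconds.
run_times() {
  local run_id="$1"
  GH_PAGER=cat gh run view "$run_id" --repo Cosmian/kms --json jobs \
    --jq '.jobs[] | select(.status == "completed")
          | [(.startedAt | fromdateiso8601), (.completedAt | fromdateiso8601)]
          | @tsv' |
    awk -F'\t' '
      NR == 1 || $1 < first { first = $1 }        # earliest job start
      $2 > last             { last = $2 }         # latest job end
                            { total += $2 - $1 }  # sum of job durations
      END { printf "runner_seconds=%d wall_clock_seconds=%d\n", total, last - first }'
}

# Example, reusing the run id captured in Step 1:
#   run_times "$run_id"
```

Parallel jobs make runner_seconds exceed wall_clock_seconds, which is why a fix such as added parallelism can shorten PR wall-clock time without reducing total runner time.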