ci-reproduce-on-gcp-vm
Reproduce CI test failures on a GCP VM matching the CI environment. Use when a CI job fails and the issue cannot be reproduced locally (e.g., kernel-dependent BPF verifier failures, kernel version-specific bugs).
| name | ci-reproduce-on-gcp-vm |
| description | Reproduce CI test failures on a GCP VM matching the CI environment. Use when a CI job fails and the issue cannot be reproduced locally (e.g., kernel-dependent BPF verifier failures, kernel version-specific bugs). |
CI runs Felix tests on GCP VMs using specific Ubuntu image families. The local dev machine may have a different kernel, so some failures only reproduce on the CI kernel. This skill creates a GCP VM matching the CI environment, runs the failing test, and cleans up.
Prerequisite: gcloud CLI authenticated with access to the tigera-dev project.

Check felix/.semaphore/fv-prologue to find the image family for the failing CI job. The mapping is based on FELIX_TEST_GROUP:
| Test group pattern | IMAGE_FAMILY | Ubuntu version |
|---|---|---|
| 22.04 | ubuntu-2204-lts | 22.04 Jammy |
| 24.04 | ubuntu-2404-lts-amd64 | 24.04 Noble |
| 25.10 | ubuntu-2510-amd64 | 25.10 Questing |
For example, the bpf-24.04-ipt-with-ut test group uses ubuntu-2404-lts-amd64.
If in doubt, read felix/.semaphore/fv-prologue and .semaphore/vms/vm-bootstrap.sh for the latest mappings.
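The table above can be sketched as a small shell helper. The function name `image_family_for_group` is hypothetical (it is not part of the repo), and it only covers the three rows listed here; felix/.semaphore/fv-prologue remains the source of truth.

```shell
# Hypothetical helper (not part of the Calico repo): map a FELIX_TEST_GROUP
# value to the image family from the table above.
image_family_for_group() {
  case "$1" in
    *22.04*) echo "ubuntu-2204-lts" ;;
    *24.04*) echo "ubuntu-2404-lts-amd64" ;;
    *25.10*) echo "ubuntu-2510-amd64" ;;
    *)
      echo "unknown test group: $1 (check felix/.semaphore/fv-prologue)" >&2
      return 1
      ;;
  esac
}
```

For example, `image_family=$(image_family_for_group "bpf-24.04-ipt-with-ut")` would set `image_family` to `ubuntu-2404-lts-amd64`.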
zone=us-central1-a
vm_name=<user>-debug # replace <user> with your username
image_family=ubuntu-2404-lts-amd64 # from Step 1
gcloud config set project tigera-dev
gcloud --quiet compute instances create "${vm_name}" \
  --zone="${zone}" \
  --image-family="${image_family}" \
  --image-project=ubuntu-os-cloud \
  --machine-type=n4-highcpu-4 \
  --boot-disk-size=20G \
  --boot-disk-type=hyperdisk-balanced
The machine type and disk size above match CI defaults (see felix/.semaphore/fv-prologue).
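Before bootstrapping, it can help to confirm the VM actually came up. A minimal sketch, assuming the `zone` and `vm_name` variables set above; the function name `vm_status` is hypothetical, while `describe` and `--format='value(status)'` are standard gcloud options.

```shell
# Hypothetical helper: print the VM's lifecycle status (e.g. RUNNING).
# Arguments: <vm name> <zone>
vm_status() {
  gcloud --quiet compute instances describe "$1" \
    --zone="$2" \
    --format='value(status)'
}
```

Calling `vm_status "${vm_name}" "${zone}"` should print `RUNNING` once the instance has started.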
The VM bootstrap in CI is done by .semaphore/vms/vm-bootstrap.sh. Replicate its key steps:
ssh_cmd="gcloud --quiet compute ssh --zone=${zone} ubuntu@${vm_name} --"
# Wait for SSH
for i in $(seq 1 10); do
  ${ssh_cmd} echo "SSH ready" && break
  sleep 2
done
# Install prerequisites
${ssh_cmd} "sudo apt-get update -y && sudo apt-get install -y --no-install-recommends apt-transport-https ca-certificates curl software-properties-common"
# Add Docker repo (DEB822 format, matching .semaphore/vms/vm-bootstrap.sh)
${ssh_cmd} "sudo install -d -m 0755 /etc/apt/keyrings && curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /tmp/docker.gpg && sudo mv /tmp/docker.gpg /etc/apt/keyrings/docker.asc && sudo chmod 0644 /etc/apt/keyrings/docker.asc"
${ssh_cmd} "ubuntu_codename=\$(. /etc/os-release && echo \"\${UBUNTU_CODENAME:-\$VERSION_CODENAME}\") && printf '%s\n' 'Types: deb' 'URIs: https://download.docker.com/linux/ubuntu' \"Suites: \${ubuntu_codename}\" 'Components: stable' 'Architectures: amd64' 'Signed-By: /etc/apt/keyrings/docker.asc' | sudo tee /etc/apt/sources.list.d/docker.sources > /dev/null"
${ssh_cmd} "sudo apt-get update -y"
# Install Docker and tools — pin versions to match CI (see .semaphore/vms/vm-bootstrap.sh)
# Noble (24.04): docker-ce=5:27.5.1-1~ubuntu.24.04~noble
# Jammy (22.04): docker-ce=5:20.10.14~3-0~ubuntu-jammy
# If unsure, omit the version pin to get the latest.
${ssh_cmd} "sudo apt-get install -y --no-install-recommends docker-ce docker-ce-cli docker-buildx-plugin containerd.io git make iproute2 wireguard"
# Post-install setup
${ssh_cmd} "sudo usermod -a -G docker ubuntu"
${ssh_cmd} "sudo modprobe ipip"
# Configure Docker with IPv6 (required by many FV tests)
${ssh_cmd} 'echo "{\"ipv6\": true, \"fixed-cidr-v6\": \"2001:db8:1::/64\"}" | sudo tee /etc/docker/daemon.json'
${ssh_cmd} "sudo systemctl restart docker"
# Match CI's sysctl setting (loose reverse path filtering)
${ssh_cmd} "sudo sysctl -w net.ipv4.conf.all.rp_filter=2"
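The SSH wait loop above falls through silently if all 10 attempts fail, so later commands may run against an unreachable VM. One possible variant is a retry helper that returns non-zero when it gives up. The name `retry` and its argument order are assumptions for illustration, not something CI uses.

```shell
# Hypothetical helper: run a command up to <attempts> times, sleeping <delay>
# seconds between tries. Returns 0 on the first success, 1 if every try fails.
# Usage: retry <attempts> <delay> <command> [args...]
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "${i}" -le "${attempts}" ]; do
    "$@" && return 0
    if [ "${i}" -lt "${attempts}" ]; then
      sleep "${delay}"
    fi
    i=$((i + 1))
  done
  echo "command failed after ${attempts} attempts: $*" >&2
  return 1
}
```

With this, the wait step could become `retry 10 2 ${ssh_cmd} echo "SSH ready" || exit 1`, which stops the script early instead of continuing without a connection.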
# Get the current branch name (the branch must already be pushed to origin;
# local-only commits will not be present on the VM)
branch=$(git rev-parse --abbrev-ref HEAD)
remote_url=$(git remote get-url origin)
${ssh_cmd} "git clone ${remote_url} calico && cd calico && git checkout ${branch}"
If the remote is an SSH URL and the VM doesn't have SSH keys, use the HTTPS URL instead:
# Convert git@github.com:user/repo.git to https://github.com/user/repo.git
https_url=$(echo "${remote_url}" | sed 's|git@github.com:|https://github.com/|')
${ssh_cmd} "git clone ${https_url} calico && cd calico && git checkout ${branch}"
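The `sed` conversion above only handles the `git@github.com:` form. A slightly more general sketch follows; the function name `to_https_url` is hypothetical, and it assumes a GitHub remote. Note that an unauthenticated HTTPS clone only works for public repositories.

```shell
# Hypothetical helper: rewrite common GitHub SSH remote forms to HTTPS.
# URLs that are already HTTPS (or are not GitHub SSH URLs) pass through unchanged.
to_https_url() {
  printf '%s\n' "$1" | sed -E \
    -e 's|^git@github\.com:|https://github.com/|' \
    -e 's|^ssh://git@github\.com/|https://github.com/|'
}
```

For example, `https_url=$(to_https_url "${remote_url}")` can be used for both SSH and HTTPS remotes without checking the form first.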
Check the kernel version first to confirm it differs from the local kernel:
${ssh_cmd} "uname -r"
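The comparison can also be scripted. This is a sketch only: `compare_kernels` is a hypothetical name, and it simply compares the two `uname -r` strings.

```shell
# Hypothetical helper: compare the local and VM kernel release strings.
# Returns 1 (with a warning) when they are identical.
compare_kernels() {
  local local_kernel="$1" remote_kernel="$2"
  if [ "${local_kernel}" = "${remote_kernel}" ]; then
    echo "same kernel (${local_kernel}): the failure may not be kernel-specific"
    return 1
  fi
  echo "local=${local_kernel} vm=${remote_kernel}"
}
```

A possible invocation is `compare_kernels "$(uname -r)" "$(${ssh_cmd} "uname -r")"`.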
Then run the specific test. Common patterns:
# BPF unit test (e.g., verifier loadability)
${ssh_cmd} "cd calico/felix && make FOCUS=TestPrecompiledBinariesAreLoadable ut-bpf"
# Specific BPF unit test
${ssh_cmd} "cd calico/felix && make FOCUS=TestNATNodePortNoFWD ut-bpf"
# Felix FV test
${ssh_cmd} "cd calico/felix && make fv GINKGO_FOCUS='TestName'"
# BPF FV test
${ssh_cmd} "cd calico/felix && make fv-bpf GINKGO_FOCUS='TestName'"
The first run will be slow (pulls Docker build images). Subsequent runs are faster.
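The patterns above differ in which variable carries the test name: `ut-bpf` takes `FOCUS`, while `fv` and `fv-bpf` take `GINKGO_FOCUS`. A small sketch that builds the remote command from a target and a test name; `felix_test_cmd` is a hypothetical helper, not part of the repo.

```shell
# Hypothetical helper: build the remote command for a make target and test name.
# ut-bpf uses FOCUS; fv and fv-bpf use GINKGO_FOCUS (per the patterns above).
felix_test_cmd() {
  local target="$1" focus="$2"
  case "${target}" in
    ut-bpf)
      echo "cd calico/felix && make FOCUS=${focus} ut-bpf"
      ;;
    fv | fv-bpf)
      echo "cd calico/felix && make ${target} GINKGO_FOCUS='${focus}'"
      ;;
    *)
      echo "unsupported target: ${target}" >&2
      return 1
      ;;
  esac
}
```

It could be used as `${ssh_cmd} "$(felix_test_cmd ut-bpf TestNATNodePortNoFWD)"`.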
Always delete the VM when done:
gcloud --quiet compute instances delete "${vm_name}" --zone="${zone}"
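If the steps are collected into a wrapper script, the delete can be registered with `trap` so it still runs when an earlier step fails. A minimal sketch, assuming `vm_name` and `zone` are set as above; the function name `delete_vm` is hypothetical.

```shell
# Hypothetical helper: delete the debug VM named by ${vm_name} in ${zone}.
delete_vm() {
  gcloud --quiet compute instances delete "${vm_name}" --zone="${zone}"
}

# In a wrapper script, register it right after the VM is created so the
# cleanup also runs on failure:
#   trap delete_vm EXIT
```

The trap line is left as a comment so that sourcing the helper in an interactive shell does not delete the VM when that shell exits.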
| File | Purpose |
|---|---|
| felix/.semaphore/fv-prologue | Maps test groups to image families, sets env vars |
| .semaphore/vms/vm-bootstrap.sh | VM startup script (Docker install, sysctl, IPv6) |
| .semaphore/vms/run-tests-on-vms | Orchestrates VM creation and test execution |
| .semaphore/vms/configure-test-vm | Per-VM configuration after bootstrap |
| .semaphore/semaphore.yml.d/blocks/20-felix.yml | Felix CI job definitions and test groups |