| name | parallel-test-execution |
| description | Use when planning, reviewing, or reporting addon smoke/functional/chaos tests on resource-rich clusters such as idc, idc2, or idc4. Ensures independent suites run in parallel safely by checking capacity, namespace isolation, batch sizing, cleanup scope, and environment-vs-product failure classification. |
| allowed-tools | Bash(kubectl *) Bash(rg *) Read |
Parallel Test Execution
Hard Rules
- Parallelize independent suites, not everything.
- Measure capacity before choosing N. CPU, memory, storage, pod IPs, namespace quota, and image pull path all count.
- Every parallel suite needs a unique namespace or resource prefix.
- Every suite needs its own evidence directory and cleanup selector.
- Destructive chaos must prove target disjointness before it runs beside another suite.
- Environment pressure is not a product failure. Setup, image pull, PVC, CNI, or runner startup failure must be classified before blaming the addon.
- Increase N gradually. Start with N=1, then N=2, then N=4; stop at the first shared-resource pressure signal.
- Escalate real IDC environment issues with evidence. For idc / idc2 / idc4 IP shortage, cluster instability, entry/routing failure, or base resource shortage, send a plain-language evidence packet to @Musk3 for Feishu user 李国银. Do not send vague "environment unstable" reports.
When To Invoke
Use this skill when:
- planning tests on idc / idc2 / idc4 or another resource-rich shared cluster
- deciding whether to run addon suites sequentially or in parallel
- reviewing a runner that starts multiple namespaces / topologies / chaos suites
- summarizing a parallel test batch
- investigating failures that appear only when several suites run together
Workflow
- Classify suites
| Lane | Examples | Parallel Rule |
|---|---|---|
| Static / read-only | render, lint, schema, image audit | freely parallel unless checkout state is shared |
| Isolated functional | smoke in unique namespace | parallel up to capacity cap |
| Destructive chaos | pod kill, network partition, disk pressure | parallel only with disjoint targets/selectors |
| Long soak / large data | 24h soak, large restore, high writer count | reserve capacity; avoid blind mixing |
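The disjoint-target rule for the destructive chaos lane can be checked mechanically before launch. The following is a minimal sketch, assuming each suite can dump its intended targets to a file with one `namespace/pod` entry per line; the function name `shared_targets` is hypothetical and not part of any existing runner.

```shell
# Print the targets that appear in both lists (one "namespace/pod" per line).
# Empty output means the two chaos suites have disjoint targets.
shared_targets() {
  a_sorted=$(mktemp)
  b_sorted=$(mktemp)
  # comm requires sorted input; -u also drops duplicate entries.
  sort -u "$1" > "$a_sorted"
  sort -u "$2" > "$b_sorted"
  # -12 suppresses lines unique to either file, leaving only the overlap.
  comm -12 "$a_sorted" "$b_sorted"
  rm -f "$a_sorted" "$b_sorted"
}
```

A runner could refuse to start the second suite whenever `shared_targets a.txt b.txt` prints anything. How the target files are produced (for example from each suite's label selector) is left to the runner.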
- Check cluster capacity
- nodes: kubectl get nodes -o wide
- namespace quota / ResourceQuota
- pod IP / CNI capacity if known
- StorageClass / CSI health
- image registry / mirror availability
- KubeBlocks controller health
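The capacity checks above can be gathered into one read-only snapshot, and a rough cap for N can be derived from the result. This is a sketch under assumptions: the `kb-system` namespace for KubeBlocks, the function names, and the reserve percentage are illustrative, not values taken from this skill.

```shell
# Read-only capacity snapshot. Every command is a plain listing; nothing is modified.
capacity_snapshot() {
  kubectl get nodes -o wide
  kubectl get resourcequota --all-namespaces
  kubectl get storageclass
  kubectl get csidrivers
  # Assumption: KubeBlocks runs in kb-system; adjust if it is installed elsewhere.
  kubectl get pods -n kb-system
}

# Largest N whose total request fits in free capacity after holding back a reserve.
# Args: free units, units per suite, reserve percent
# (integers in the same unit, e.g. millicores or MiB).
max_parallel_n() {
  free=$1
  per_suite=$2
  reserve_pct=$3
  usable=$(( free * (100 - reserve_pct) / 100 ))
  echo $(( usable / per_suite ))
}
```

Apply `max_parallel_n` separately to each constrained resource (CPU, memory, storage, pod IPs) and take the smallest result as the cap, since any single exhausted resource blocks the batch.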
- Assign suite identity
- namespace: <addon>-<suite>-<version>-<run-id>
- evidence directory: same run id
- cleanup selector: same run id
- report row: same run id
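The naming pattern can be generated rather than typed by hand, which also catches names Kubernetes would reject. Namespace names must be DNS-1123 labels (lowercase alphanumerics and `-`, at most 63 characters), so a version such as `7.2` needs its dot replaced. The sketch below assumes a hypothetical helper name `suite_namespace` and does not check that the name starts and ends with an alphanumeric character.

```shell
# Build <addon>-<suite>-<version>-<run-id> as a namespace-safe name.
# Uppercase is lowered and any character outside [a-z0-9-] becomes '-'.
suite_namespace() {
  name=$(printf '%s-%s-%s-%s' "$1" "$2" "$3" "$4" \
    | tr '[:upper:]' '[:lower:]' \
    | sed 's/[^a-z0-9-]/-/g')
  # DNS-1123 labels are limited to 63 characters.
  if [ "${#name}" -gt 63 ]; then
    echo "namespace name too long: $name" >&2
    return 1
  fi
  printf '%s\n' "$name"
}
```

For example, `suite_namespace redis smoke 7.2 r42` prints `redis-smoke-7-2-r42`. Reusing the same run id as a namespace label (the label key is the runner's choice) gives the evidence directory, cleanup selector, and report row a single shared handle.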
- Ramp N gradually
- N=1: baseline
- N=2: isolation proof
- N=4: useful batch
- N=8+: only with explicit headroom evidence
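The ramp can be expressed as a small decision function. This is a sketch, not the runner's actual logic: the name `next_n` and its flags are hypothetical, and falling back to the previous step on pressure is an assumption based on the blocked example in Closeout Format (blocked at N=8, lower the cap to N=4).

```shell
# Decide the next batch size in the 1 -> 2 -> 4 -> 8 ramp.
# Args: current N,
#       pressure ("yes" if any shared-resource pressure signal was seen),
#       headroom ("yes" if explicit headroom evidence exists for N=8+; default "no").
next_n() {
  current=$1
  pressure=$2
  headroom=${3:-no}
  if [ "$pressure" = yes ]; then
    # Fall back to the last clean step, never below 1.
    prev=$(( current / 2 ))
    [ "$prev" -lt 1 ] && prev=1
    echo "$prev"
    return 0
  fi
  next=$(( current * 2 ))
  # N=8 and above requires explicit headroom evidence.
  if [ "$next" -ge 8 ] && [ "$headroom" != yes ]; then
    echo "$current"
    return 0
  fi
  echo "$next"
}
```

Without headroom evidence the function holds at N=4, which matches the "useful batch" step above.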
- Watch shared-resource pressure
- Pending pods by reason
- image pull latency / failures
- PVC pending / attach latency
- pod IP allocation errors
- controller reconcile lag
- namespace residue after cleanup
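Several of these signals surface as Pending pods and Warning events, so a periodic read-only snapshot can make pressure visible between ramp steps. The sketch below uses hypothetical function names; controller reconcile lag and namespace residue are not covered and need their own checks.

```shell
# Count occurrences of each reason read from stdin (one reason per line),
# printed as "<reason> <count>" in sorted order.
summarize_reasons() {
  awk 'NF { count[$1]++ } END { for (r in count) print r, count[r] }' | sort
}

# Read-only pressure snapshot across all namespaces.
pressure_snapshot() {
  echo "== Pending pods =="
  kubectl get pods --all-namespaces --field-selector=status.phase=Pending
  echo "== Warning events by reason =="
  kubectl get events --all-namespaces --field-selector type=Warning \
    -o jsonpath='{range .items[*]}{.reason}{"\n"}{end}' | summarize_reasons
}
```

A rising count of reasons such as `FailedScheduling` or `FailedMount` between N=2 and N=4 is the kind of signal that should stop the ramp before any product verdict is recorded.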
- Classify failures
- setup/env failure: namespace, image, PVC, CNI, runner startup, quota
- harness failure: cleanup selector overlap, shared temp path, kubeconfig/context bleed
- product failure: the product path fails again in an isolated rerun, or there is direct product evidence
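The three classes can be encoded so that a report never records a product failure by default. In this sketch the stage keywords and the function name `classify_failure` are hypothetical labels for the categories listed above; a real runner would map its own stage names onto them.

```shell
# Map the stage where a suite first failed to a failure class.
# Args: stage keyword,
#       confirmed ("yes" if the failure reproduced in an isolated rerun
#                  or direct product evidence exists; default "no").
classify_failure() {
  stage=$1
  confirmed=${2:-no}
  case "$stage" in
    namespace|image|pvc|cni|runner-startup|quota)
      echo "env" ;;
    selector-overlap|shared-tmp|context-bleed)
      echo "harness" ;;
    product-path)
      # A product verdict needs confirmation; otherwise rerun in isolation first.
      if [ "$confirmed" = yes ]; then
        echo "product"
      else
        echo "unclassified"
      fi ;;
    *)
      echo "unclassified" ;;
  esac
}
```

An `unclassified` result is a prompt to rerun the suite alone at N=1, not a verdict.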
- Escalate only confirmed IDC environment blockers
- collect environment name, namespace/run id, first blocked command, exact output, and capacity signal
- state whether lowering N, splitting the batch, or an isolated rerun was already tried
- ask @Musk3 to relay to Feishu user 李国银 with the concrete request
- continue non-dependent evidence collection or test execution while waiting
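One way to keep vague reports out of the escalation path is to refuse to build the packet until every field is filled in. The sketch below assumes a plain-text packet; the function name `evidence_packet` and the field labels are illustrative, and sending the result to @Musk3 remains a manual step.

```shell
# Print an evidence packet, or fail if any field is missing or empty.
# Args: environment, namespace/run id, first blocked command, exact output,
#       capacity signal, mitigation already tried, concrete request.
evidence_packet() {
  if [ "$#" -ne 7 ]; then
    echo "evidence_packet: expected 7 fields, got $#" >&2
    return 1
  fi
  for field in "$@"; do
    if [ -z "$field" ]; then
      echo "evidence_packet: empty field; collect it before escalating" >&2
      return 1
    fi
  done
  printf 'env: %s\n' "$1"
  printf 'namespace/run-id: %s\n' "$2"
  printf 'first blocked command: %s\n' "$3"
  printf 'exact output: %s\n' "$4"
  printf 'capacity signal: %s\n' "$5"
  printf 'already tried: %s\n' "$6"
  printf 'request: %s\n' "$7"
}
```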
Review Checklist
Before approving a parallel run:
Closeout Format
Use this shape:
Parallel batch accepted: cluster=<idc2>, requested_N=4, effective_N=4,
4 isolated namespaces, product_fail=0, env_pressure=0, cleanup_residue=0,
no cross-suite selector overlap.
If blocked:
Parallel batch blocked at N=8: pod IP exhaustion observed before product path.
Lower cap to N=4 and rerun; do not count this as addon failure.
Escalation: @Musk3 -> Feishu user 李国银, with env=idc2, namespace/run-id,
first blocked command, CNI/IP event, impact, and requested action.
Related Docs
docs/addon-resource-rich-cluster-parallel-test-guide.md
docs/addon-idc-vcluster-migration-checklist-guide.md
docs/addon-multi-ns-registry-scan-preflight-guide.md
docs/addon-test-environment-gate-hygiene-guide.md
docs/addon-test-intensity-templates-guide.md