| name | stress-test |
| description | Generates comprehensive smoke and stress tests for infrastructure components (sandbox runtimes, container orchestration, firewall rules, etc.), prioritizing end-to-end realism that catches real setup issues across platforms. Activate when the user asks to "stress test", "smoke test", "test the sandbox", "verify isolation", or wants comprehensive infrastructure testing for a component.
|
Stress Test Skill
Generate comprehensive smoke and stress tests for infrastructure components. Prioritize end-to-end realism over unit-test purity — the goal is catching setup issues that surface on real platforms (Linux, macOS Colima/OrbStack, WSL2), not exercising code paths in isolation.
When to Use
- "Stress test X" / "Smoke test X"
- "Add tests for the sandbox" / "Test the container setup"
- "Verify isolation" / "Test the firewall"
- "Make sure this works on macOS/Linux/WSL2"
- Any request for comprehensive infrastructure testing
Principles
- Two tiers of tests: Static config validation (pytest, no Docker needed, runs everywhere) + live runtime checks (bash, needs Docker, proves isolation actually works).
- Parametrize aggressively: If 3+ tests share a pattern (check property X on services A/B/C), use
@pytest.mark.parametrize. One parametrized test > three copy-pasted tests.
- No duplicates: Before writing a new test, search existing test files for overlap. If
test_claude_wrapper.py already checks runtime defaults, don't re-check in test_sandbox_config.py.
- Fixtures for repeated I/O: If multiple tests read the same file (compose YAML, allowlist JSON), use a
@pytest.fixture so the file is parsed once per session.
- Catch real failures, not keywords:
assert "chown" in content is weak. assert "chown -R root:root" in content catches the actual operation. Prefer asserting on the specific string that would break if the behavior changed.
- Bash smoke tests should be self-contained: Each check should pass/fail/warn independently. Use
FAILURES counter, not set -e for individual checks. Print PASS:/FAIL: prefixes for grep-ability.
- Guard against false positives: When comparing values that could be empty (e.g., namespace readlinks), assert non-empty before comparing. An empty-vs-empty comparison silently passes.
- Test what matters for each platform: macOS issues are usually volume mounts (virtiofs/9p) and runtime installation (Colima SSH). Linux issues are usually KVM availability and cgroup drivers. WSL2 issues are usually nested virtualization.
Workflow
Step 1: Identify the Component
Determine what infrastructure is being tested: sandbox runtime (runsc/kata), network isolation (firewall/squid/dnsmasq), container orchestration (compose lifecycle), entrypoint hardening, or cross-platform setup scripts.
Step 2: Inventory Existing Tests
Search tests/ and bin/check-*.bash for existing coverage. List what's already tested to avoid duplication.
Step 3: Static Config Tests (pytest)
Write pytest tests that validate configuration correctness by parsing config files (YAML, JSON, bash scripts) directly. These catch misconfigurations without needing Docker:
- Parse
docker-compose.yml → verify capabilities, network topology, resource limits, environment variables
- Parse domain allowlist → verify access modes, no wildcards, no raw IPs
- Parse entrypoint/firewall scripts → verify hardening steps present
- Parse setup.bash → verify platform detection logic exists
Use @pytest.mark.parametrize for service-level properties that repeat across containers.
Step 4: Live Runtime Tests (bash)
Write bash scripts that exercise the actual runtime. These require Docker but prove isolation works end-to-end:
- Container starts with the sandbox runtime
- Process isolation:
/proc only shows container processes
- Device isolation: no host devices visible
- Network isolation: internal network blocks egress
- Filesystem: read-only enforcement works
- Volume mounts: bind mounts work correctly (critical on macOS)
- Capabilities:
CapEff=0 with --cap-drop=ALL
Step 5: CI Workflow
Create a GitHub Actions workflow that runs both tiers:
- Static tests: run on every PR (fast, no Docker needed)
- Runtime tests: run on sandbox-related path changes (needs Docker, installs runtime)
- Both push and pull_request triggers with consistent
paths filters
- Always set
timeout-minutes
Step 6: Iterative Critique-Fix Loop
After writing tests, run an iterative critique-fix loop until a full pass finds nothing worth changing. Read what you actually wrote, not what you intended to write.
Each pass, hunt for:
- Duplicates: tests that overlap with existing files (search
tests/ before writing)
- Constants-as-tests:
assert value == "hardcoded" that will break when someone legitimately changes the config. Replace with invariant/relationship checks (e.g., "proxy env vars reference the firewall's IP" reads both from compose, not a hardcoded IP)
- Compression: 3+ tests sharing a pattern → parametrize into one. Near-identical assertions → table-driven
- False positives: comparisons where both sides could be empty (assert non-empty first), substring checks that match comments instead of code
- Fragile bash:
A && B || C (not if-then-else), missing cleanup for Docker networks/temp dirs, set -e killing the whole script on a non-critical check failure
- Missing coverage: important invariant not tested, platform-specific failure mode not covered
Fix each issue, then start a fresh pass — the previous critique output is stale after fixes (fixes introduce their own bugs).
Stop when a full pass returns nothing actionable. Cap at 3 passes; if real issues remain at pass 3, list them and ask the user.
Skip for trivial additions (one new test for an obvious gap) — say so explicitly.
Examples
Example 1: "Stress test the runsc setup"
Claude's actions:
- Searches
tests/ for existing runsc tests, finds test_compose_runtime_defaults_to_runsc in wrapper tests
- Creates
tests/test_sandbox_config.py with parametrized config checks (caps, limits, network, proxy)
- Creates
bin/check-runsc-smoke.bash with live isolation checks (process, device, network, mount, filesystem)
- Creates
.github/workflows/runsc-smoke.yaml with static + runtime jobs
- Runs pytest locally, verifies all pass
- Self-critiques: finds 3 duplicate tests with wrapper file, removes them; parametrizes 8 similar tests into 2
Example 2: "Test the firewall rules"
Claude's actions:
- Checks existing
check-compose-lifecycle.bash — finds basic connectivity check
- Adds pytest tests parsing
init-firewall.bash for DROP policies, IPv6 lockdown, egress quota, conntrack limits
- Adds bash tests that run inside compose stack: blocked domain unreachable, allowed domain resolves, squid blocks POST to ro domain, DNS exfil via subdomain blocked
- Parametrizes the "blocked vs allowed" tests over a domain list