تشغيل أي مهارة في Manus بنقرة واحدة

stress-test

النجوم٣٢

التفرعات٣

آخر تحديث٢٧ مايو ٢٠٢٦ في ١٦:١٣

Generates comprehensive smoke and stress tests for infrastructure components (sandbox runtimes, container orchestration, firewall rules, etc.), prioritizing end-to-end realism that catches real setup issues across platforms. Activate when the user asks to "stress test", "smoke test", "test the sandbox", "verify isolation", or wants comprehensive infrastructure testing for a component.

التثبيت

التثبيت باستخدام Codex أو Claude انسخ هذا Prompt والصقه في Codex أو Claude أو مساعد آخر ليراجع صفحة Skill ويثبّتها لك.

تشغيل في Manus

المصدر

alexander-turner

alexander-turner/claude-guard

فتح مستودع GitHub عرض مستودعات المنشئ

تنزيل

تشغيل في Manus

SKILL.md

readonly

name	stress-test
description	Generates comprehensive smoke and stress tests for infrastructure components (sandbox runtimes, container orchestration, firewall rules, etc.), prioritizing end-to-end realism that catches real setup issues across platforms. Activate when the user asks to "stress test", "smoke test", "test the sandbox", "verify isolation", or wants comprehensive infrastructure testing for a component.

Stress Test Skill

Generate comprehensive smoke and stress tests for infrastructure components. Prioritize end-to-end realism over unit-test purity — the goal is catching setup issues that surface on real platforms (Linux, macOS Colima/OrbStack, WSL2), not exercising code paths in isolation.

When to Use

"Stress test X" / "Smoke test X"
"Add tests for the sandbox" / "Test the container setup"
"Verify isolation" / "Test the firewall"
"Make sure this works on macOS/Linux/WSL2"
Any request for comprehensive infrastructure testing

Principles

Two tiers of tests: Static config validation (pytest, no Docker needed, runs everywhere) + live runtime checks (bash, needs Docker, proves isolation actually works).
Parametrize aggressively: If 3+ tests share a pattern (check property X on services A/B/C), use @pytest.mark.parametrize. One parametrized test > three copy-pasted tests.
No duplicates: Before writing a new test, search existing test files for overlap. If test_claude_wrapper.py already checks runtime defaults, don't re-check in test_sandbox_config.py.
Fixtures for repeated I/O: If multiple tests read the same file (compose YAML, allowlist JSON), use a @pytest.fixture so the file is parsed once per session.
Catch real failures, not keywords: assert "chown" in content is weak. assert "chown -R root:root" in content catches the actual operation. Prefer asserting on the specific string that would break if the behavior changed.
Bash smoke tests should be self-contained: Each check should pass/fail/warn independently. Use FAILURES counter, not set -e for individual checks. Print PASS:/FAIL: prefixes for grep-ability.
Guard against false positives: When comparing values that could be empty (e.g., namespace readlinks), assert non-empty before comparing. An empty-vs-empty comparison silently passes.
Test what matters for each platform: macOS issues are usually volume mounts (virtiofs/9p) and runtime installation (Colima SSH). Linux issues are usually KVM availability and cgroup drivers. WSL2 issues are usually nested virtualization.

Workflow

Step 1: Identify the Component

Determine what infrastructure is being tested: sandbox runtime (runsc/kata), network isolation (firewall/squid/dnsmasq), container orchestration (compose lifecycle), entrypoint hardening, or cross-platform setup scripts.

Step 2: Inventory Existing Tests

Search tests/ and bin/check-*.bash for existing coverage. List what's already tested to avoid duplication.

Step 3: Static Config Tests (pytest)

Write pytest tests that validate configuration correctness by parsing config files (YAML, JSON, bash scripts) directly. These catch misconfigurations without needing Docker:

Parse docker-compose.yml → verify capabilities, network topology, resource limits, environment variables
Parse domain allowlist → verify access modes, no wildcards, no raw IPs
Parse entrypoint/firewall scripts → verify hardening steps present
Parse setup.bash → verify platform detection logic exists

Use @pytest.mark.parametrize for service-level properties that repeat across containers.

Step 4: Live Runtime Tests (bash)

Write bash scripts that exercise the actual runtime. These require Docker but prove isolation works end-to-end:

Container starts with the sandbox runtime
Process isolation: /proc only shows container processes
Device isolation: no host devices visible
Network isolation: internal network blocks egress
Filesystem: read-only enforcement works
Volume mounts: bind mounts work correctly (critical on macOS)
Capabilities: CapEff=0 with --cap-drop=ALL

Step 5: CI Workflow

Create a GitHub Actions workflow that runs both tiers:

Static tests: run on every PR (fast, no Docker needed)
Runtime tests: run on sandbox-related path changes (needs Docker, installs runtime)
Both push and pull_request triggers with consistent paths filters
Always set timeout-minutes

Step 6: Iterative Critique-Fix Loop

After writing tests, run an iterative critique-fix loop until a full pass finds nothing worth changing. Read what you actually wrote, not what you intended to write.

Each pass, hunt for:

Duplicates: tests that overlap with existing files (search tests/ before writing)
Constants-as-tests: assert value == "hardcoded" that will break when someone legitimately changes the config. Replace with invariant/relationship checks (e.g., "proxy env vars reference the firewall's IP" reads both from compose, not a hardcoded IP)
Compression: 3+ tests sharing a pattern → parametrize into one. Near-identical assertions → table-driven
False positives: comparisons where both sides could be empty (assert non-empty first), substring checks that match comments instead of code
Fragile bash: A && B || C (not if-then-else), missing cleanup for Docker networks/temp dirs, set -e killing the whole script on a non-critical check failure
Missing coverage: important invariant not tested, platform-specific failure mode not covered

Fix each issue, then start a fresh pass — the previous critique output is stale after fixes (fixes introduce their own bugs).

Stop when a full pass returns nothing actionable. Cap at 3 passes; if real issues remain at pass 3, list them and ask the user.

Skip for trivial additions (one new test for an obvious gap) — say so explicitly.

Examples

Example 1: "Stress test the runsc setup"

Claude's actions:

Searches tests/ for existing runsc tests, finds test_compose_runtime_defaults_to_runsc in wrapper tests
Creates tests/test_sandbox_config.py with parametrized config checks (caps, limits, network, proxy)
Creates bin/check-runsc-smoke.bash with live isolation checks (process, device, network, mount, filesystem)
Creates .github/workflows/runsc-smoke.yaml with static + runtime jobs
Runs pytest locally, verifies all pass
Self-critiques: finds 3 duplicate tests with wrapper file, removes them; parametrizes 8 similar tests into 2

Example 2: "Test the firewall rules"

Claude's actions:

Checks existing check-compose-lifecycle.bash — finds basic connectivity check
Adds pytest tests parsing init-firewall.bash for DROP policies, IPv6 lockdown, egress quota, conntrack limits
Adds bash tests that run inside compose stack: blocked domain unreachable, allowed domain resolves, squid blocks POST to ro domain, DNS exfil via subdomain blocked
Parametrizes the "blocked vs allowed" tests over a domain list

المزيد من هذا المستودع

نفس المستودع

pr-creation

alexander-turner/claude-guard

Creates high-quality pull requests with an iterative compress-critique-fix loop before submission. Activate this skill whenever you are asked to create, open, submit, or push a pull request, OR whenever a new feature, fix, or refactor is complete and ready to ship. Also activate when the user says "make a PR", "open a PR", "submit this for review", "push and create a PR", "I'm done, create the PR", "the feature is done", "I'm finished", or any variation of completing work / requesting a pull request. Always activate before running `gh pr create`.

2026-06-2332

update-pr

alexander-turner/claude-guard

Updates an existing pull request with new changes, commits, and optionally revises the PR description. Activate when the user asks to update, fix, or add to an existing PR. Also activate when the user says "update the PR", "fix the PR", "add this to the PR", or any variation of modifying an existing pull request.

2026-06-0532

name	stress-test
description	Generates comprehensive smoke and stress tests for infrastructure components (sandbox runtimes, container orchestration, firewall rules, etc.), prioritizing end-to-end realism that catches real setup issues across platforms. Activate when the user asks to "stress test", "smoke test", "test the sandbox", "verify isolation", or wants comprehensive infrastructure testing for a component.

Stress Test Skill

When to Use

"Stress test X" / "Smoke test X"
"Add tests for the sandbox" / "Test the container setup"
"Verify isolation" / "Test the firewall"
"Make sure this works on macOS/Linux/WSL2"
Any request for comprehensive infrastructure testing

Principles

Two tiers of tests: Static config validation (pytest, no Docker needed, runs everywhere) + live runtime checks (bash, needs Docker, proves isolation actually works).
Parametrize aggressively: If 3+ tests share a pattern (check property X on services A/B/C), use @pytest.mark.parametrize. One parametrized test > three copy-pasted tests.
No duplicates: Before writing a new test, search existing test files for overlap. If test_claude_wrapper.py already checks runtime defaults, don't re-check in test_sandbox_config.py.
Fixtures for repeated I/O: If multiple tests read the same file (compose YAML, allowlist JSON), use a @pytest.fixture so the file is parsed once per session.
Catch real failures, not keywords: assert "chown" in content is weak. assert "chown -R root:root" in content catches the actual operation. Prefer asserting on the specific string that would break if the behavior changed.
Bash smoke tests should be self-contained: Each check should pass/fail/warn independently. Use FAILURES counter, not set -e for individual checks. Print PASS:/FAIL: prefixes for grep-ability.
Guard against false positives: When comparing values that could be empty (e.g., namespace readlinks), assert non-empty before comparing. An empty-vs-empty comparison silently passes.
Test what matters for each platform: macOS issues are usually volume mounts (virtiofs/9p) and runtime installation (Colima SSH). Linux issues are usually KVM availability and cgroup drivers. WSL2 issues are usually nested virtualization.

Workflow

Step 1: Identify the Component

Step 2: Inventory Existing Tests

Search tests/ and bin/check-*.bash for existing coverage. List what's already tested to avoid duplication.

Step 3: Static Config Tests (pytest)

Write pytest tests that validate configuration correctness by parsing config files (YAML, JSON, bash scripts) directly. These catch misconfigurations without needing Docker:

Parse docker-compose.yml → verify capabilities, network topology, resource limits, environment variables
Parse domain allowlist → verify access modes, no wildcards, no raw IPs
Parse entrypoint/firewall scripts → verify hardening steps present
Parse setup.bash → verify platform detection logic exists

Use @pytest.mark.parametrize for service-level properties that repeat across containers.

Step 4: Live Runtime Tests (bash)

Write bash scripts that exercise the actual runtime. These require Docker but prove isolation works end-to-end:

Container starts with the sandbox runtime
Process isolation: /proc only shows container processes
Device isolation: no host devices visible
Network isolation: internal network blocks egress
Filesystem: read-only enforcement works
Volume mounts: bind mounts work correctly (critical on macOS)
Capabilities: CapEff=0 with --cap-drop=ALL

Step 5: CI Workflow

Create a GitHub Actions workflow that runs both tiers:

Static tests: run on every PR (fast, no Docker needed)
Runtime tests: run on sandbox-related path changes (needs Docker, installs runtime)
Both push and pull_request triggers with consistent paths filters
Always set timeout-minutes

Step 6: Iterative Critique-Fix Loop

After writing tests, run an iterative critique-fix loop until a full pass finds nothing worth changing. Read what you actually wrote, not what you intended to write.

Each pass, hunt for:

Duplicates: tests that overlap with existing files (search tests/ before writing)
Constants-as-tests: assert value == "hardcoded" that will break when someone legitimately changes the config. Replace with invariant/relationship checks (e.g., "proxy env vars reference the firewall's IP" reads both from compose, not a hardcoded IP)
Compression: 3+ tests sharing a pattern → parametrize into one. Near-identical assertions → table-driven
False positives: comparisons where both sides could be empty (assert non-empty first), substring checks that match comments instead of code
Fragile bash: A && B || C (not if-then-else), missing cleanup for Docker networks/temp dirs, set -e killing the whole script on a non-critical check failure
Missing coverage: important invariant not tested, platform-specific failure mode not covered

Fix each issue, then start a fresh pass — the previous critique output is stale after fixes (fixes introduce their own bugs).

Stop when a full pass returns nothing actionable. Cap at 3 passes; if real issues remain at pass 3, list them and ask the user.

Skip for trivial additions (one new test for an obvious gap) — say so explicitly.

Examples

Example 1: "Stress test the runsc setup"

Claude's actions:

Searches tests/ for existing runsc tests, finds test_compose_runtime_defaults_to_runsc in wrapper tests
Creates tests/test_sandbox_config.py with parametrized config checks (caps, limits, network, proxy)
Creates bin/check-runsc-smoke.bash with live isolation checks (process, device, network, mount, filesystem)
Creates .github/workflows/runsc-smoke.yaml with static + runtime jobs
Runs pytest locally, verifies all pass
Self-critiques: finds 3 duplicate tests with wrapper file, removes them; parametrizes 8 similar tests into 2

Example 2: "Test the firewall rules"

Claude's actions:

Checks existing check-compose-lifecycle.bash — finds basic connectivity check
Adds pytest tests parsing init-firewall.bash for DROP policies, IPv6 lockdown, egress quota, conntrack limits
Adds bash tests that run inside compose stack: blocked domain unreachable, allowed domain resolves, squid blocks POST to ro domain, DNS exfil via subdomain blocked
Parametrizes the "blocked vs allowed" tests over a domain list