Run isolated E2E tests in devcontainer from ai_docs/tests runbooks. Use this skill whenever the user asks to: run an E2E test, execute a test runbook, validate a feature end-to-end, create a new runbook, or test CLI behavior in isolation. If you need to run a multi-step CLI validation sequence (init → install → sync → verify), this is the skill — it handles ssenv isolation, flag verification, and structured reporting. Prefer this over ad-hoc docker exec sequences for any test that follows a runbook or needs reproducible isolation.
argument-hint: [runbook-name | new]
metadata: {"targets":["claude","universal"]}
Run isolated E2E tests in devcontainer. $ARGUMENTS specifies runbook name or "new".
Flow
Phase 0: Environment Check
Confirm devcontainer is running and get container ID:
This returns JSON with every runbook's steps, commands, and expected assertions — no manual markdown parsing needed. Use this to understand what each runbook covers.
Match changes to relevant runbooks (compare changed file paths against step commands in the JSON output).
Phase 2: Select Tests
Prompt user (via AskUserQuestion):
Option A: Run existing runbook (list all available + mark those related to recent changes)
Option B: Auto-generate new test script based on recent changes
Option C: If $ARGUMENTS specifies a runbook, skip to Phase 3
Phase 3: Prepare & Execute
Running existing runbook:
Create isolated environment with auto-initialization:
ENV_NAME="e2e-$(date +%Y%m%d-%H%M%S)"
# Use --init to automatically run 'ss init -g' with all targets
docker exec "$CONTAINER" ssenv create "$ENV_NAME" --init
Execute the entire runbook via mdproof inside the container:
Prefer --json + jq for assertions — see the JSON Reference below
Generating new runbook:
Read git diff HEAD~3 to find changed files in cmd/skillshare/ or internal/
Read changed files to understand new/modified functionality
Validate all CLI flags before writing — for every ss <command> <flag> in the runbook:
Grep cmd/skillshare/<command>.go for the exact flag string (e.g. "--force")
Run ss <command> --help inside container if needed
Common mistakes to avoid:
uninstall --yes → wrong, use --force / -f
init --target <name> → wrong, init has no --target flag
init -p has a completely separate flag set from global init — only supports --targets, --discover, --select, --mode, --dry-run. Global-only flags like --no-copy, --no-skill, --no-git, --all-targets, --force do NOT exist in project mode
Audit custom rules: disable by rule ID (e.g. prompt-injection-0, prompt-injection-1), NOT pattern name (e.g. prompt-injection). Rule IDs are in internal/audit/rules.yaml
Generate new runbook to ai_docs/tests/<slug>_runbook.md, following existing conventions:
YAML-free, pure Markdown
Has Scope, Environment, Steps (each with bash + Expected), Pass Criteria
Use jq: assertions in Expected blocks for JSON commands — e.g. - jq: .extras | length == 1. This is a native mdproof assertion type, NOT a bash jq pipe
Use --json + jq -e in bash for inline verification within multi-command steps
Config idempotency — never bare cat >> config.yaml; always prepend sed -i '/^section:/,$d' to remove existing section first, or use CLI commands (ss extras init, ss extras remove --force) that handle duplicates
Check ai_docs/tests/runbook.json for project-level config (build, setup, teardown, step_setup, timeout) that affects all runbooks
Check .mdproof/lessons-learned.md for known assertion patterns and gotchas
Run the runbook quality checklist (see below) before executing
Then execute the new runbook (same flow as above)
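The flag validation described above can be scripted before a runbook is written. This is a minimal sketch, assuming flags appear as quoted string literals in cmd/skillshare/<command>.go (as the "--force" example suggests); `flag_exists` and the `SRC_ROOT` variable are hypothetical helper names, not part of skillshare.

```shell
# Hypothetical helper: succeed only if the quoted flag literal appears in
# the command's source file. SRC_ROOT defaults to the current directory,
# which is assumed to be the repository root.
flag_exists() {
  local cmd="$1" flag="$2" root="${SRC_ROOT:-.}"
  local file="$root/cmd/skillshare/$cmd.go"
  [ -f "$file" ] || return 2   # no source file for this command
  # -F: match a fixed string; --: stop option parsing so "--force" is a pattern
  grep -qF -- "\"$flag\"" "$file"
}
```

A runbook author might call `flag_exists uninstall --yes || echo "flag not found in source"` from the repository root and fall back to `ss <command> --help` inside the container when the grep is inconclusive.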
Phase 4: Cleanup & Report
Ask user before cleanup (via AskUserQuestion):
Option A: Delete ssenv environment now
Option B: Keep for manual debugging (print env name for later ssenv delete)
Both: when a systemic issue (e.g. a refactor changed file locations) affects both the skill's guidance and existing runbooks
Runbook Quality Checklist
Before executing a newly generated runbook, verify:
All CLI flags exist — every ss <cmd> --flag was grep-verified against source
--init interaction — if runbook has ss init, account for ssenv create --init already initializing (add --force to re-init, or skip init step)
--init creates default extras — ssenv create --init creates a rules extra by default. Runbooks that assume an empty extras list must add cleanup first: ss extras remove rules --force -g 2>/dev/null || true + rm -rf ~/.claude/rules
Correct confirmation flags — uninstall uses --force (not --yes); init re-run needs no flag (just fails gracefully)
Skill data in registry.yaml — assertions about installed skills check registry.yaml, NOT config.yaml; config.yaml should never contain skills:
File existence timing — registry.yaml is only created after first install/reconcile, not on ss init
Project mode paths — project commands use .skillshare/ not ~/.config/skillshare/
Project init flags — init -p only supports --targets, --discover, --select, --mode, --dry-run; global-only flags (--no-copy, --no-skill, --no-git, --all-targets, --force) are not available
Audit rule IDs — custom rules in audit-rules.yaml use rule IDs (e.g. prompt-injection-0), not pattern names (e.g. prompt-injection). Verify IDs against internal/audit/rules.yaml
Use --json for assertions — if the command supports --json, use it with jq instead of grepping human-readable output. Text output changes between versions; JSON structure is stable
Expected = actual substrings, NOT descriptions — the runbook assertion engine does case-insensitive substring matching. Write - Installed or - cangjie-docs-navigator, NOT - Install completes without error or - Output contains at least one skill. Negation: use Not <substring> prefix (e.g. - Not cangjie-docs-navigator)
Skill name ≠ repo name — after ss install <repo>, the actual skill name may differ from the repo name (e.g. repo cangjie-docs-mcp → skill cangjie-docs-navigator). Always verify the installed skill name via ss list before writing uninstall/check steps
/tmp/ cleanup — ssenv only isolates $HOME; /tmp/ is shared across runs. Any step using /tmp/<path> must start with rm -rf /tmp/<path> to avoid stale state from previous runs
echo > symlink writes through — echo "content" > path where path is a symlink writes to the symlink's target, it does NOT replace the symlink with a real file. To create a local (non-managed) file at a symlinked path: either use a different filename, or rm the symlink first then echo
cat >> is not idempotent — appending to config files (cat >> config.yaml) will duplicate sections on re-run. Prefer ss extras init (which validates duplicates) or full file replacement over cat >> when possible
Extras source path layout — extras use ~/.config/skillshare/extras/<name>/ (not the legacy flat path ~/.config/skillshare/<name>/). Symlink assertions must include extras/ in the path regex (e.g. regex: skillshare/extras/rules/tdd\.md)
Prefer jq: over python3 -c — for JSON output validation, use mdproof's native jq: assertion type (e.g. - jq: .extras | length == 1) instead of piping to python3 -c. It's one line vs 10, and mdproof handles failure reporting automatically
Config append idempotency — when appending YAML sections with cat >>, always prepend sed -i '/^section_key:/,$d' to remove existing section. Or prefer CLI commands (ss extras init, ss extras remove --force) over manual config editing
Check lessons-learned — read .mdproof/lessons-learned.md before writing new runbooks for known gotchas and proven assertion patterns
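The config append idempotency item can be tried on a throwaway file. This is a minimal sketch, assuming the section being replaced is the last one in the file, because the sed range deletes everything from the matching line through end of file. `append_extras`, the `mode: merge` line, and the `extras:` body are placeholders for illustration, not the real skillshare config schema.

```shell
# Self-contained demo on a temp file; in a runbook the target would be
# config.yaml inside the ssenv-isolated HOME.
cfg="$(mktemp)"
printf 'mode: merge\n' > "$cfg"

append_extras() {
  # Delete any existing top-level "extras:" section through end of file.
  # -i.bak is used here because it works with both GNU and BSD sed.
  sed -i.bak '/^extras:/,$d' "$1" && rm -f "$1.bak"
  cat >> "$1" <<'EOF'
extras:
  - name: rules
EOF
}

append_extras "$cfg"
append_extras "$cfg"        # second run must not duplicate the section
grep -c '^extras:' "$cfg"   # prints 1
```

Without the sed line, the second call would leave two `extras:` keys in the file, which is the duplicate-key failure the checklist warns about.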
Runbook Assertion Types
mdproof supports 6 assertion types under Expected: blocks. Use the most specific type for each check:
Always execute inside devcontainer — use docker exec, never run CLI on host
Always use ssenv for HOME isolation — don't pollute container default HOME
Always create fresh ssenv environments — never reuse an environment from a previous run; stale config/state causes confusing cascade failures (e.g. duplicate YAML keys, "already exists" errors)
ssenv only isolates $HOME — /tmp/, /var/, and other system paths are shared across all environments. Runbook steps using /tmp/ must include rm -rf cleanup at the start
Verify every step — never skip Expected checks
Don't abort on failure — record FAIL, continue to next step, summarize at end
Ask before cleanup — Phase 4 must prompt user before deleting ssenv environment
ss = skillshare — same binary in runbooks
~ = ssenv-isolated HOME — ssenv enter auto-sets HOME
Use --init — simplify setup by using ssenv create <name> --init
--init already runs init — the env is pre-initialized; runbook steps calling ss init again will fail unless the step explicitly resets state first
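The "record FAIL, continue, summarize at end" rule can be sketched as a small step runner. This is an illustration only: `run_step` is a hypothetical helper, and `true` / `false` stand in for real docker exec ... ssenv enter ... step commands.

```shell
# Hypothetical step runner: run each step, record PASS/FAIL, never abort.
pass=0; fail=0; report=""

run_step() {
  local name="$1"; shift
  if "$@" >/dev/null 2>&1; then
    pass=$((pass + 1)); report="${report}PASS ${name}\n"
  else
    fail=$((fail + 1)); report="${report}FAIL ${name}\n"
  fi
}

# Stand-in commands; a failing step does not stop the steps after it.
run_step "step 1" true
run_step "step 2" false
run_step "step 3" true

printf '%b' "$report"
echo "passed=$pass failed=$fail"
```

A real driver would likely finish with a non-zero exit status when `fail` is greater than zero, so that the overall run is still reported as failed.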
ssenv Quick Reference
| Command | Purpose |
|---|---|
| sshelp | Show shortcuts and usage |
| ssls | List isolated environments |
| ssnew <name> | Create + enter isolated shell (interactive) |
| ssuse <name> | Enter existing isolated shell (interactive) |
| ssback | Leave isolated context |
| ssenv enter <name> -- <cmd> | Run single command in isolation (automation) |
For interactive debugging: ssnew <env> then exit when done
For deterministic automation: prefer ssenv enter <env> -- <command> one-liners
Test Command Policy
When running Go tests inside devcontainer (not via runbook):
# ssenv changes HOME, so always cd to /workspace first for Go test commands
cd /workspace
go build -o bin/skillshare ./cmd/skillshare
SKILLSHARE_TEST_BINARY="$PWD/bin/skillshare" go test ./tests/integration -count=1
go test ./...
Always run in devcontainer unless there is a documented exception.
Note: ssenv enter changes HOME, which may affect Go module resolution — always cd /workspace before running go test or go build.
--json Quick Reference
Most commands support --json for structured output, making assertions more reliable than text matching.
| Command | --json | Notes |
|---|---|---|
| ss status | --json | Skills, targets, sync status |
| ss list | --json / -j | All skills with metadata |
| ss target list | --json | Configured targets |
| ss install <src> | --json | Implies --force --all (skip prompts) |
| ss uninstall <name> | --json | Implies --force (skip prompts) |
| ss collect <path> | --json | Implies --force (skip prompts) |
| ss check | --json | Update availability per repo |
| ss update | --json | Update results per skill |
| ss diff | --json | Per-file diff details |
| ss sync | --json | Sync stats per target |
| ss audit | --format json | Also accepts --json (deprecated alias) |
| ss log | --json | Raw JSONL (one object per line) |
Key behaviors:
--json that implies --force / --all skips interactive prompts — safe for automation
Output goes to stdout only (progress/spinners suppressed)
audit prefers --format json; --json still works but is the deprecated form
log --json outputs JSONL (newline-delimited), not a JSON array
Assertion Patterns with jq
# Count installed skills
ss list --json | jq 'length'
# Check a specific skill exists
ss list --json | jq -e '.[] | select(.name == "my-skill")'
# Verify target is configured
ss target list --json | jq -e '.[] | select(.name == "claude")'
# Assert no critical audit findings
ss audit --format json | jq -e '.summary.critical == 0'
# Check update availability
ss check --json | jq -e '.tracked_repos | length > 0'
# Verify sync succeeded (zero errors)
ss sync --json | jq -e '.errors == 0'
# Install and verify result
ss install https://github.com/user/repo --json | jq -e '.skills | length > 0'
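When a jq -e expression fails, jq exits non-zero and the step FAILs; no ambiguous text matching is needed. Exit code 1 means the last output was false or null, 4 means no output was produced (for example, a select that matched nothing), and 5 means a runtime error such as indexing the wrong type.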
Container Command Templates
# Single command
docker exec "$CONTAINER" ssenv enter "$ENV_NAME" -- ss status
# JSON assertion (preferred for verification)
docker exec "$CONTAINER" ssenv enter "$ENV_NAME" -- bash -c '
  ss list --json | jq -e ".[] | select(.name == \"my-skill\")"
'
# Multi-line compound command (use bash -c); global mode flags
docker exec "$CONTAINER" ssenv enter "$ENV_NAME" -- bash -c '
  ss init --no-copy --all-targets --no-git --no-skill
  ss status
'
# Project mode init (different flag set!)
docker exec "$CONTAINER" env SKILLSHARE_DEV_ALLOW_WORKSPACE_PROJECT=1 \
  ssenv enter "$ENV_NAME" -- bash -c '
  cd /tmp/test-project && ss init -p --targets claude
'
# Check files (HOME is set to isolated path by ssenv)
docker exec "$CONTAINER" ssenv enter "$ENV_NAME" -- bash -c '
  cat ~/.config/skillshare/config.yaml
'
# With environment variables
docker exec "$CONTAINER" ssenv enter "$ENV_NAME" -- bash -c '
  TARGET=~/.claude/skills
  ls -la "$TARGET"
'
# Go tests (must cd /workspace because ssenv changes HOME)
docker exec "$CONTAINER" ssenv enter "$ENV_NAME" -- bash -c '
  cd /workspace
  go test ./internal/install -run TestParseSource -count=1
'
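Because every template above repeats the same docker exec ... ssenv enter ... prefix, a small wrapper can cut down on copy-paste errors. This is a sketch only: `in_env` and the `DOCKER` override are hypothetical names, and `CONTAINER` and `ENV_NAME` are assumed to be set as in Phase 0 and Phase 3.

```shell
# Hypothetical wrapper around the templates above. DOCKER can be overridden
# (e.g. DOCKER=echo) to preview the command without a running container.
in_env() {
  "${DOCKER:-docker}" exec "$CONTAINER" ssenv enter "$ENV_NAME" -- bash -c "$1"
}
```

For example, `in_env 'ss list --json | jq -e "length > 0"'` would run the whole pipeline inside the isolated environment, while `DOCKER=echo in_env 'ss status'` only prints the command that would be executed.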
Relationship with /mdproof Skill
This skill (/cli-e2e-test) and the /mdproof skill are complementary, not competing:
Writing a new runbook → invoke /mdproof first for format guidance (assertion types, jq: patterns, snapshot usage), then /cli-e2e-test to execute it in isolation
Improving existing runbooks → invoke /mdproof for assertion quality review (python3 → jq:, idempotency), then /cli-e2e-test to verify changes pass