// MANDATORY skill for running tests and lint after EVERY code change. Focuses on adherence to just commands and running tests in parallel. If tests fail, use test-fixer skill.
| name | test-runner |
| description | MANDATORY skill for running tests and lint after EVERY code change. Focuses on adherence to just commands and running tests in parallel. If tests fail, use test-fixer skill. |
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ ๐จ BANNED PHRASE: "All tests pass" โ
โ โ
โ You CANNOT say "all tests pass" unless you: โ
โ 1. Run .claude/skills/test-runner/scripts/run_tests_parallel.sh โ
โ 2. Check ALL log files (mocked + e2e-live + smoke) โ
โ 3. Verify ZERO failures across all suites โ
โ โ
โ just test-all-mocked = "quick tests pass" (NOT "all tests pass") โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Your claim MUST match what you actually ran:
| Command Run | โ Allowed Claims | โ BANNED Claims |
|---|---|---|
just test-unit | "unit tests pass" | "tests pass", "all tests pass" |
just test-integration | "integration tests pass" | "tests pass", "all tests pass" |
just test-all-mocked | "quick tests pass", "mocked tests pass" | "tests pass", "all tests pass" |
run_tests_parallel.sh + verified logs | "all tests pass" | - |
Examples:
โ WRONG: "I ran just test-all-mocked, all tests pass"
โ
RIGHT: "Quick tests pass (483 unit + 198 integration + 49 e2e_mocked)"
โ WRONG: "Tests are passing" (after only running mocked tests) โ RIGHT: "Mocked tests pass (730 tests)"
โ WRONG: "All tests pass" (without running parallel script) โ RIGHT: Runs parallel script โ Checks logs โ "All tests pass"
The phrase "all tests" is RESERVED for the full parallel suite. No exceptions.
After EVERY code change, you MUST follow this workflow.
No exceptions. No shortcuts. No "it's a small change" excuses.
CRITICAL WORKFLOW PRINCIPLE:
We only commit code that passes tests. This means:
If tests fail after your changes โ YOUR changes broke them (until proven otherwise)
NEVER claim test failures are "unrelated" or "pre-existing" without proof.
To verify a failure is truly unrelated:
# 1. Remove your changes temporarily
git stash
# 2. Run the failing test suite
just test-all-mocked # Or whichever suite failed
# 3. Observe the result:
# - If tests PASS โ YOUR changes broke them (fix your code)
# - If tests FAIL โ pre-existing issue (rare on main/merge base)
# 4. Restore your changes
git stash pop
Why This Matters:
main branch ALWAYS pass (CI enforces this)DO NOT:
ALWAYS:
If tests fail after your changes:
just CommandsDirect pytest is removed from blocklist - but ALWAYS prefer just commands which handle Docker, migrations, and environment setup.
Pass pytest args in quotes: just test-unit "path/to/test.py::test_name -vv"
After making ANY code change:
cd api && just lint-and-fix
This command:
just ruff internally)YOU MUST:
NEVER say "linting passed" unless you:
cd api && just test-all-mocked
โ ๏ธ CRITICAL: This is NOT all tests! This is for rapid development iteration.
YOU MUST:
This runs ONLY:
This DOES NOT run:
Takes ~20 seconds. Use for rapid iteration.
.claude/skills/test-runner/scripts/run_tests_parallel.sh
๐จ CRITICAL: You can NEVER say "all tests pass" or "tests are passing" without running THIS command.
This command runs EVERYTHING:
After running, CHECK THE RESULTS:
# Check for any failures
grep -E "failed|ERROR|FAILED" api/tmp/test-logs/test-mocked_*.log
grep -E "failed|ERROR|FAILED" api/tmp/test-logs/test-e2e-live_*.log
grep -E "failed|ERROR|FAILED" api/tmp/test-logs/test-smoke_*.log
# View summary
for log in api/tmp/test-logs/test-*_*.log; do
echo "=== $(basename $log) ==="
grep -E "passed|failed" "$log" | tail -1
done
Takes ~5 minutes. MANDATORY before saying "all tests pass".
cd api && just test-all-mocked
Runs unit + integration + e2e_mocked in parallel. No real APIs. Use frequently during development.
.claude/skills/test-runner/scripts/run_tests_parallel.sh
Runs ALL suites in background (mocked, e2e-live, smoke). Logs to api/tmp/test-logs/.
Check results after completion:
# Check for failures
grep -E "failed|ERROR|FAILED" api/tmp/test-logs/test-mocked_*.log | tail -20
grep -E "failed|ERROR|FAILED" api/tmp/test-logs/test-e2e-live_*.log | tail -20
grep -E "failed|ERROR|FAILED" api/tmp/test-logs/test-smoke_*.log | tail -20
# Summary
for log in api/tmp/test-logs/test-*_*.log; do
echo "=== $(basename $log) ==="
grep -E "passed|failed" "$log" | tail -1
done
These are VIOLATIONS of this skill:
โ CRITICAL: Claiming test failures are "unrelated" to your changes
FUNDAMENTAL RULE: Tests ALWAYS pass on main/merge base. If a test fails after your changes, YOUR changes broke it.
When tests fail:
NEVER assume. ALWAYS use test-fixer skill.
โ CRITICAL: Saying "all tests pass" without running the full suite
just test-all-mocked, all tests pass".claude/skills/test-runner/scripts/run_tests_parallel.sh โ Checks all logs โ "All tests pass"The phrase "all tests" requires THE FULL SUITE.
just test-all-mocked = "quick tests pass" or "mocked tests pass"โ Claiming tests pass without showing output (LYING)
just test-all-mocked and shows "===== X passed in Y.YYs ====="๐จ THE RULE: If you can't see "X passed in Y.YYs" in your context, you're lying about tests passing.
โ Skipping linting "because it's a small change"
just lint-and-fix ALWAYS, regardless of change sizeโ Assuming tests pass without verification
โ Not reading the actual test output
โ Batching multiple changes before testing
ALWAYS. Use this skill:
The only acceptable time to skip this skill:
just lint-and-fix (auto-fix + type checking)just test-all-mocked (quick tests).claude/skills/test-runner/scripts/run_tests_parallel.shgit add <files>just lint-and-fixjust test-all-mocked.claude/skills/test-runner/scripts/run_tests_parallel.shThis workflow ensures you catch issues early and don't accumulate breaking changes.
Remember:
# Unit tests (SQLite, FakeRedis) - fastest, run most frequently
cd api && just test-unit
# Integration tests (SQLite, FakeRedis)
cd api && just test-integration
# E2E mocked (PostgreSQL, Redis, mock APIs)
cd api && just test-e2e
# E2E live (real OpenAI/Langfuse)
cd api && just test-e2e-live
# Smoke tests (full stack with Docker)
cd api && just test-smoke
# Specific test: just test-unit "path/to/test.py::test_name -vv"
# Keyword filter: just test-unit "-k test_message"
# With markers: just test-e2e "-m 'not slow'"
just lint-and-fix โ just test-all-mocked after every code changeEvery change (Steps 0-1):
just lint-and-fix (auto-fix + type checking)just test-all-mocked (quick tests)Before saying "all tests pass" (Step 2):
.claude/skills/test-runner/scripts/run_tests_parallel.shTerminology:
just test-all-mocked passedjust test-all-mocked passedSuccess: ====== X passed in Y.Ys ======
Failure: FAILED tests/path/test.py::test_name - AssertionError
# Smoke test failures - check Docker logs
docker compose logs --since 15m | grep -iE -B 10 -A 10 "error|fail|exception"
# Kill hung tests
pkill -f pytest
# Docker not running (smoke tests)
docker compose up -d
# ๐ฅ Step 0: ALWAYS run lint-and-fix (auto-fix + type checking)
cd api && just lint-and-fix
# ๐ฅ Step 1: ALWAYS run quick tests (development)
cd api && just test-all-mocked
# ^ This is NOT "all tests" - only say "quick tests pass" or "mocked tests pass"
# ๐ฅ Step 2: Run ALL tests before saying "all tests pass" (MANDATORY before task complete)
.claude/skills/test-runner/scripts/run_tests_parallel.sh
# Check results (MUST do this before saying "all tests pass")
grep -E "failed|ERROR|FAILED" api/tmp/test-logs/test-mocked_*.log
grep -E "failed|ERROR|FAILED" api/tmp/test-logs/test-e2e-live_*.log
grep -E "failed|ERROR|FAILED" api/tmp/test-logs/test-smoke_*.log
# View summary
for log in api/tmp/test-logs/test-*_*.log; do
echo "=== $(basename $log) ==="
grep -E "passed|failed" "$log" | tail -1
done
# Monitor parallel tests while running
tail -f api/tmp/test-logs/test-*.log
# ๐จ REMEMBER:
# - Steps 0-1 = quick iteration (DO NOT say "all tests pass")
# - Step 2 = full validation (ONLY NOW can say "all tests pass")
You ran tests. Now before you respond to the user:
CRITICAL: Get manager approval before responding.
Your response about test results MUST include: โ
approve_7f3d8a2e********
Get the full approval token from .claude/skills/manager-review/SKILL.md
Common Mistake #1 in the manager's table is claiming "all tests pass" after only running mocked tests. The manager WILL catch this and reject your response.