| name | testing-patterns |
| description | Testing strategy: pyramid, AAA, mocks/fakes/stubs, flaky tests, coverage. Triggers: test, fixture, mock, stub, e2e, TDD, Playwright, Cypress, flaky, coverage, property-based. |
| effort | medium |
| user-invocable | false |
| allowed-tools | Read |
Testing Patterns Skill
Test Structure (AAA Pattern)
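def test_function_does_expected_thing():
    """Test description explaining what and why."""
    # Arrange: build the input and the expected outcome
    input_data = {"key": "value"}
    expected = "result"
    # Act: call the unit under test exactly once
    result = function_under_test(input_data)
    # Assert: check the observable result
    assert result == expected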
Test Organization
tests/
├── conftest.py              # Shared fixtures
├── unit/                    # Unit tests (isolated)
│   ├── test_search_core.py
│   └── test_utils.py
├── integration/             # Integration tests
│   └── test_api.py
└── e2e/                     # End-to-end tests
    └── test_workflow.py
Quality Targets
| Metric | Target |
|---|---|
| Coverage | >70% overall |
| New code | 100% |
| Core modules | >80% |
| Flaky tests | 0 |
Language-Specific References
| Language | Reference | Key Topics |
|---|---|---|
| Python | reference/python-pytest.md | Fixtures, mocking, parametrize, markers, conftest, running tests |
| TypeScript | reference/typescript-vitest.md | Vitest/Jest, React Testing Library, mocking, running tests |
| PHP | reference/php-phpunit.md | PHPUnit test cases, mocking, running tests |
| Go | reference/go-testing.md | Table-driven tests, testify mocking, running tests |
| Flutter/Dart | reference/flutter-testing.md | Widget tests, unit tests, running tests |
For Python pytest patterns, see reference/python-pytest.md.
For TypeScript Vitest/Jest patterns, see reference/typescript-vitest.md.
For PHP PHPUnit patterns, see reference/php-phpunit.md.
For Go testing patterns, see reference/go-testing.md.
For Flutter/Dart testing patterns, see reference/flutter-testing.md.
Common Rationalizations
| Excuse | Why It's Wrong |
|---|---|
| "It's too simple to test" | Simple code breaks in integration; test the contract, not the complexity |
| "Tests slow down development" | Tests slow down bugs reaching production; that's the point |
| "We'll add tests later" | Untested code accumulates; "later" means never, and coverage gaps compound |
| "Mocking everything is fine" | Over-mocking tests the mocks, not the code; mock at boundaries only |
| "100% coverage means no bugs" | Coverage measures execution, not correctness; focus on behavior assertions |
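The last row can be illustrated with a minimal sketch. The function `apply_discount` and both tests are hypothetical names invented for this example; the point is only that the two tests produce identical line coverage while just one of them would catch a wrong formula.

```python
def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test."""
    return price * (1 - percent / 100)


def test_runs_without_asserting():
    # Executes every line of apply_discount, so coverage reports it as
    # fully covered, yet this test would still pass if the formula were wrong.
    apply_discount(200.0, 25.0)


def test_asserts_behavior():
    # Arrange
    price, percent = 200.0, 25.0
    # Act
    result = apply_discount(price, percent)
    # Assert: the contract is checked, not just executed
    assert result == 150.0
```

Both tests count equally toward the coverage target, which is why the target alone says little about the quality of the suite.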
Rules
- MUST follow Arrange-Act-Assert (AAA) structure in every test; unstructured tests degrade into procedural smoke tests
- MUST test behavior through the public interface, not internal implementation; tests coupled to internals break on every refactor
- NEVER test implementation details (private method return values, internal state flags); they are not the contract
- NEVER hit real external services in unit tests; use fakes/stubs for boundaries and save real integration for integration tests
- CRITICAL: integration tests must hit real dependencies (database, message queue, external API) when mock-vs-prod divergence is a real risk. Mocked integration tests create false confidence.
- MANDATORY: flaky tests are bugs, not noise. Quarantine or delete them; a tolerated flaky test erodes the suite's credibility.
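As one possible reading of the boundary and public-interface rules above, the sketch below swaps a hand-written fake for an external mail provider and asserts only on what callers can observe. `EmailSender`, `FakeEmailSender`, and `SignupService` are hypothetical names invented for illustration, not part of any referenced codebase.

```python
from dataclasses import dataclass, field
from typing import Protocol


class EmailSender(Protocol):
    """Hypothetical boundary: anything that can send an email."""

    def send(self, to: str, subject: str) -> None: ...


@dataclass
class FakeEmailSender:
    """In-memory fake standing in for a real mail provider."""

    sent: list = field(default_factory=list)

    def send(self, to: str, subject: str) -> None:
        self.sent.append((to, subject))


class SignupService:
    """Hypothetical unit under test."""

    def __init__(self, sender: EmailSender) -> None:
        self._sender = sender
        self._users = set()

    def register(self, email: str) -> bool:
        """Register a user; return False if the email is already known."""
        if email in self._users:
            return False
        self._users.add(email)
        self._sender.send(email, "Welcome")
        return True


def test_register_sends_exactly_one_welcome_email():
    # Arrange
    sender = FakeEmailSender()
    service = SignupService(sender)
    # Act
    first = service.register("a@example.com")
    second = service.register("a@example.com")
    # Assert: only return values and the boundary are inspected,
    # never the private _users set
    assert first is True
    assert second is False
    assert sender.sent == [("a@example.com", "Welcome")]
```

Because the test never reads `_users`, the service could switch to a different internal store without the test changing.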
Gotchas
- Coverage numbers are easy to game: counting generated code, writing test files that import but do not assert, or using `# pragma: no cover` widely all inflate the figure. A reported 95% coverage backed by behavior assertions on only 60% of the code is common.
- Snapshot tests (Jest `.toMatchSnapshot()`, pytest-regressions) accept any output as "correct" on first run. An incorrect initial snapshot becomes the accepted baseline, so review snapshots as carefully as code.
- Mocks configured with `any` matchers, or assertions that read `.mock.calls[0][0]` without checking its shape against a schema, pass even when the production call shape changes. Assert on specific arguments, not just "was called".
- Test isolation fails when globals leak (module-level mutable state, module-scoped fixtures, env vars set in one test). Flakiness that appears only under `pytest -n auto` (pytest-xdist) or Jest's default parallel workers, and disappears with `jest --runInBand`, is usually shared state.
- Property-based tests (Hypothesis, fast-check) shrink failing examples to minimal reproducers, but shrinking time can dominate the run. For complex generators, limit the time spent shrinking, or pin the failing example as an explicit test case so the next run reproduces it.
- Test pyramid vs trophy: the "right" ratio depends on stack. Frontend apps with rendering concerns benefit from more integration tests (trophy); pure backend services align better with pyramid. Don't cargo-cult one model.
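The mock-matcher gotcha above can be sketched with the standard library's `unittest.mock`. `notify_user` and its payload are hypothetical, invented for this example; `assert_called` and `assert_called_once_with` are the real `Mock` methods.

```python
from unittest.mock import Mock


def notify_user(client, user_id: int, message: str) -> None:
    """Hypothetical code under test: posts a notification payload."""
    client.post("/notifications", json={"user_id": user_id, "message": message})


def test_weak_assertion_only_checks_that_a_call_happened():
    client = Mock()
    notify_user(client, 42, "hi")
    # Passes for any arguments, so a changed payload shape goes unnoticed.
    client.post.assert_called()


def test_strong_assertion_checks_the_arguments():
    client = Mock()
    notify_user(client, 42, "hi")
    # Fails as soon as the path or the payload keys drift.
    client.post.assert_called_once_with(
        "/notifications", json={"user_id": 42, "message": "hi"}
    )
```

The weak test keeps passing if `notify_user` starts sending a different payload, while the strong test fails; that difference is the whole argument for asserting on specific arguments.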
When NOT to Load
- For running the test suite, use `/test`
- For test-first development workflow, use `/tdd`
- For debugging a specific test failure, use `/debug` on the failure output
- For test framework choice in a new project, use `/app-builder`
- For performance/load testing: this skill covers correctness tests, not load tests