| name | test-best-pratice |
| description | A general skill for writing high-quality automated tests. Use it to design and implement stable, maintainable, and diagnosable tests for functions, modules, APIs, components, pages, and critical business workflows. |
test-best-pratice
Goal
Write tests with a high signal-to-noise ratio:
- Fail when something is actually wrong, and do not fail randomly when nothing is wrong.
- Make it easy to locate the cause after a failure.
- Require minimal changes when internal implementation is refactored.
- Run fast enough for daily development and CI.
- Stay maintainable over time instead of becoming team overhead.
This skill is not tied to any specific framework. It can be used for unit tests, integration tests, API tests, component tests, and end-to-end tests.
Core Principles
1. The goal of testing is to build confidence, not to chase coverage
Ask first:
- What are the most important user scenarios for this code?
- Which paths would cause the greatest business impact if they broke?
- Which edge cases are most likely to produce bugs?
Do not start with “I need to test every function and every branch.” Start with use cases. Coverage can be a useful signal, but it cannot replace judgment about test value.
2. Test behavior and contracts, not implementation details
Prefer testing:
- Public API behavior
- User-observable outcomes
- Input/output contracts between modules
- Whether critical side effects happen, such as database writes, message dispatch, or state changes
Avoid directly testing:
- Private methods
- Internal state layout
- Temporary variables
- Component internals
- Assertions written only to fit the current implementation
Principle: implementations may be refactored, but contracts should not change casually.
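A minimal sketch of the difference, assuming a hypothetical `ShoppingCart` class invented for illustration. The test touches only the public `add` and `total` methods, so the private storage could change shape without breaking it.

```python
class ShoppingCart:
    """Hypothetical example class; `_items` is a private implementation detail."""

    def __init__(self):
        self._items = {}  # sku -> (unit_price, quantity); free to change later

    def add(self, sku, unit_price, quantity=1):
        price, count = self._items.get(sku, (unit_price, 0))
        self._items[sku] = (price, count + quantity)

    def total(self):
        return sum(price * count for price, count in self._items.values())


def test_total_sums_price_times_quantity():
    cart = ShoppingCart()
    cart.add("apple", unit_price=3, quantity=2)
    cart.add("pear", unit_price=5)

    # Contract: the total reflects everything that was added.
    assert cart.total() == 11

    # Brittle alternative to avoid: asserting on cart._items directly,
    # which would fail as soon as the storage layout is refactored.


if __name__ == "__main__":
    test_total_sums_price_times_quantity()
```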
3. Choose the smallest test that gives enough confidence
Neither “smaller is always better” nor “bigger is always better” holds for test size. Choose the cheapest layer that gives enough confidence.
Default decision order:
- If a unit test can verify it reliably, do not jump to UI or E2E.
- If you need to verify collaboration between modules, prefer an integration test.
- Use a small number of end-to-end tests only for high-value flows such as payments, login, or submission.
Remember: a large number of slow, brittle high-level tests is worse than a well-layered test suite.
4. Stability matters more than “it looks like we tested a lot”
A flaky test destroys trust in the test suite.
Actively control anything that introduces instability:
- Time
- Time zone
- Randomness
- Network jitter
- Third-party services
- Test execution order
- Shared state across tests
- Uncertain async timing
5. A test is both documentation and an alarm
A high-quality test should make it obvious, from its name and assertions alone:
- What guarantee this feature provides
- Which edge case or historical pitfall it protects against
- Which business rule has been broken when it fails
So tests should first be clear, and only then “beautifully abstract.”
When to Use
Use this skill when you need to:
- Add tests for a new feature
- Write regression tests for a bug fix
- Build a safety net before refactoring
- Define behavioral contracts for shared modules
- Verify APIs, components, pages, or workflows reliably
- Clean up flaky tests or low-value tests
Output Standards
A finished test should satisfy these standards as much as possible:
- Clear purpose: every test should answer “what does this protect?”
- Single failure reason: a failure should not leave people guessing.
- Independent execution: it should pass alone and in random order.
- Repeatability: the same commit should produce the same result in the same environment.
- Readability: names, data, and assertions should be easy to understand.
- Maintainability: refactoring internals should not require broad test rewrites.
- Reasonable speed: tests should be fast by default; slow tests need a strong reason.
Workflow
Step 1: Identify the test target and the risk
Before writing a test, clarify:
- What is the target: function, module, API, component, page, or workflow?
- What is its external contract?
- What is the main success path?
- What are the main failure paths?
- Where are the high-risk edges?
- How expensive is failure?
Prioritize coverage for:
- Core success paths
- Boundary conditions most likely to fail
- Previously reported bugs
- Critical rules that refactors could easily break
Step 2: Choose the test layer
Use the following guidance:
Unit tests
Good for:
- Pure functions
- Rule evaluation
- Data transformation
- Validation logic
- Sorting, filtering, aggregation
- Single-step state-machine behavior
Characteristics:
- Fast
- Stable
- Precise diagnosis
- Low cost
Do not use them to verify:
- Real collaboration across many modules
- UI interaction flows
- Real network or database integration
Integration tests
Good for:
- Service and database collaboration
- Combined behavior across modules
- API routes with middleware and persistence
- Components working with state management or the data layer
Characteristics:
- More realistic than unit tests
- Higher confidence
- Moderate cost
They are often high leverage because they balance realism and maintenance cost well.
End-to-end tests (E2E)
Good for:
- Login
- Registration
- Checkout and payment
- Form submission
- Critical business flows
- Cross-page workflows
Characteristics:
- Closest to the user
- Most expensive, slowest, and easiest to make brittle
Strategy:
- Cover only a small number of critical happy paths
- Do not use E2E as a substitute for lower-level tests
Step 3: Turn scenarios into a test checklist
For every test target, list at least:
- Valid input
- Boundary input
- Invalid input
- Null or missing values
- Duplicate or conflicting input
- Insufficient permissions
- Timeout, failure, or exception paths
- Idempotency, if relevant
- Sorting, pagination, precision, and time zone, if relevant
Do not start coding immediately. List scenarios first.
Step 4: Design test data
Test data should be:
- Minimal, containing only what the test needs
- Semantically clear, so the purpose is obvious from the name
- Free from magic numbers and meaningless strings
- Explicit rather than implicitly reused
Prefer:
- Factory functions
- Fixtures
- Builders
- Clearly named test samples
Avoid:
- Oversized shared test datasets
- “Universal” objects created only for reuse
- Implicit dependence on a global seed database
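One possible shape for a factory function, sketched with a hypothetical `User` type and `discount_rate` rule that are not part of any real codebase. Each test overrides only the field it cares about, so the relevant input stays visible in the test body.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class User:
    """Hypothetical domain object used only for this illustration."""
    name: str
    email: str
    is_vip: bool
    country: str


def make_user(**overrides):
    """Return a valid default user; tests override only what matters to them."""
    defaults = User(
        name="Test User",
        email="user@example.com",
        is_vip=False,
        country="US",
    )
    return replace(defaults, **overrides)


def discount_rate(user):
    """Hypothetical rule under test: VIP users get 10% off."""
    return 0.10 if user.is_vip else 0.0


def test_vip_user_gets_ten_percent_discount():
    vip_user = make_user(is_vip=True)  # the only field relevant to this rule
    assert discount_rate(vip_user) == 0.10


def test_regular_user_gets_no_discount():
    assert discount_rate(make_user(is_vip=False)) == 0.0


if __name__ == "__main__":
    test_vip_user_gets_ten_percent_discount()
    test_regular_user_gets_no_discount()
```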
Step 5: Write tests in AAA structure
Recommended structure:
- Arrange: prepare inputs, dependencies, and initial state
- Act: execute one core action
- Assert: verify externally visible results
Constraints:
- A test should usually perform only one core action
- Do not scatter assertions everywhere
- If there are too many assertions, check whether the scope is too large
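The AAA structure above might look like the following sketch. `Account` and `transfer` are hypothetical names chosen for the example; the point is the three visibly separate phases and the single core action.

```python
class InsufficientFunds(Exception):
    """Raised when the source account cannot cover the transfer."""


class Account:
    def __init__(self, balance):
        self.balance = balance


def transfer(source, target, amount):
    """Hypothetical function under test."""
    if amount > source.balance:
        raise InsufficientFunds(f"balance {source.balance} is less than {amount}")
    source.balance -= amount
    target.balance += amount


def test_transfer_moves_amount_between_accounts():
    # Arrange: inputs and initial state
    source = Account(balance=100)
    target = Account(balance=20)

    # Act: exactly one core action
    transfer(source, target, amount=30)

    # Assert: externally visible results
    assert source.balance == 70
    assert target.balance == 50


if __name__ == "__main__":
    test_transfer_moves_amount_between_accounts()
```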
Step 6: Prefer meaningful assertions
Characteristics of good assertions:
- They assert business outcomes rather than procedural noise
- They are easy to understand when they fail
- They are strongly tied to the purpose of the test
Prefer asserting:
- Return values and output structure
- Persisted results
- User-visible text and state
- Clear error types and messages
- Required side effects
Avoid:
- Meaningless “object exists” assertions
- Low-value assertions such as “the page rendered”
- Assertions about intermediate steps that are irrelevant to the business outcome
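As a rough illustration of the contrast between weak and meaningful assertions, the sketch below uses a hypothetical `register_user` function. The first test asserts the full business outcome instead of "result is not None"; the second asserts the error type and the key part of the message.

```python
def register_user(email, existing_emails):
    """Hypothetical function under test."""
    if not email:
        raise ValueError("email must not be empty")
    if email in existing_emails:
        raise ValueError(f"email already registered: {email}")
    return {"email": email, "status": "pending_verification"}


def test_new_user_starts_in_pending_verification():
    result = register_user("ada@example.com", existing_emails=set())

    # Weak alternative to avoid: assert result is not None
    # Meaningful: the outcome the caller actually relies on.
    assert result == {"email": "ada@example.com", "status": "pending_verification"}


def test_duplicate_email_is_rejected_with_clear_message():
    try:
        register_user("ada@example.com", existing_emails={"ada@example.com"})
    except ValueError as error:
        assert "already registered" in str(error)
    else:
        raise AssertionError("expected ValueError for duplicate email")


if __name__ == "__main__":
    test_new_user_starts_in_pending_verification()
    test_duplicate_email_is_rejected_with_clear_message()
```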
Step 7: Handle external dependencies
Use real collaboration when practical; do not mock by reflex
For collaboration between modules you control, prefer assembling the real pieces and testing them together. Only consider mocks or stubs when the dependency is:
- Unstable
- Slow
- Expensive
- Hard to construct
- Uncontrollable
- Side-effectful, such as sending real SMS or charging money
Mock boundaries
Good places to mock:
- Third-party payment services
- SMS or email services
- Cloud storage
- External HTTP APIs
- Slow dependencies that are irrelevant to the current test
Do not over-mock:
- Large parts of your own internal system
- Everything, just to make the test easier to write
Rule of thumb: mock at the system boundary, not across the entire inside of the system.
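A sketch of mocking only at the boundary, under the assumption of a hypothetical checkout flow. `FakePaymentGateway` stands in for the third-party service, while `OrderRepository` and `CheckoutService` are internal pieces assembled for real. All three names are invented for illustration.

```python
class FakePaymentGateway:
    """Stand-in for a third-party payment API (the system boundary)."""

    def __init__(self, succeed=True):
        self.succeed = succeed
        self.charges = []  # recorded so tests can assert the side effect

    def charge(self, customer_id, amount_cents):
        self.charges.append((customer_id, amount_cents))
        return self.succeed


class OrderRepository:
    """Internal collaborator, used for real rather than mocked."""

    def __init__(self):
        self._orders = {}

    def save(self, order_id, status):
        self._orders[order_id] = status

    def status_of(self, order_id):
        return self._orders[order_id]


class CheckoutService:
    def __init__(self, gateway, orders):
        self._gateway = gateway
        self._orders = orders

    def checkout(self, order_id, customer_id, amount_cents):
        paid = self._gateway.charge(customer_id, amount_cents)
        self._orders.save(order_id, "paid" if paid else "payment_failed")
        return paid


def test_successful_charge_marks_order_paid():
    gateway = FakePaymentGateway(succeed=True)
    orders = OrderRepository()

    CheckoutService(gateway, orders).checkout("order-1", "customer-7", 1999)

    assert orders.status_of("order-1") == "paid"
    assert gateway.charges == [("customer-7", 1999)]


def test_declined_charge_marks_order_payment_failed():
    gateway = FakePaymentGateway(succeed=False)
    orders = OrderRepository()

    CheckoutService(gateway, orders).checkout("order-2", "customer-7", 1999)

    assert orders.status_of("order-2") == "payment_failed"


if __name__ == "__main__":
    test_successful_charge_marks_order_paid()
    test_declined_charge_marks_order_payment_failed()
```

Because the gateway is passed in through the constructor, the test needs no patching of module internals, and the real repository still exercises the service's persistence logic.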
Step 8: Control stability
Eliminate these sources of non-determinism whenever possible:
- Freeze the clock or inject a time source
- Fix the random seed
- Create state explicitly before each test and do not rely on someone else to clean up
- Avoid leftover shared database state
- Avoid dependence on execution order
- Do not call real third-party networks
- Do not use arbitrary sleeps or waits
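Two of the items above, controlling time and randomness, can be sketched by passing them in as parameters. `is_token_expired` and `pick_winner` are hypothetical functions written for this example; the technique is plain dependency injection using the standard library.

```python
import random
from datetime import datetime, timedelta, timezone


def is_token_expired(issued_at, ttl, now):
    """Hypothetical function: receives `now` instead of reading the system clock."""
    return now - issued_at >= ttl


def pick_winner(entrants, rng):
    """Hypothetical function: the random source is injected by the caller."""
    return rng.choice(entrants)


def test_token_expires_exactly_at_ttl():
    issued_at = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    ttl = timedelta(minutes=30)

    just_before = issued_at + timedelta(minutes=29, seconds=59)
    assert not is_token_expired(issued_at, ttl, now=just_before)
    assert is_token_expired(issued_at, ttl, now=issued_at + ttl)


def test_pick_winner_is_reproducible_with_fixed_seed():
    entrants = ["ana", "ben", "cy"]

    # Two generators with the same seed produce the same sequence.
    first = pick_winner(entrants, random.Random(42))
    second = pick_winner(entrants, random.Random(42))

    assert first == second
    assert first in entrants


if __name__ == "__main__":
    test_token_expires_exactly_at_ttl()
    test_pick_winner_is_reproducible_with_fixed_seed()
```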
Waiting strategy:
- Wait for a clear condition
- Wait for an element to become visible, text to appear, a request to finish, or state to be persisted
- Do not write “wait 2 seconds and see”
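A minimal sketch of waiting for a condition instead of a fixed duration. `wait_until` is a hypothetical helper, not part of any framework; many test frameworks ship their own equivalent, which is usually preferable when available.

```python
import threading
import time


def wait_until(condition, timeout=2.0, interval=0.01):
    """Poll `condition` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)  # short poll interval, bounded by the deadline
    return condition()  # one final check at the deadline


def test_background_job_eventually_writes_result():
    results = []
    # Simulated asynchronous work that finishes after a short delay.
    worker = threading.Timer(0.05, lambda: results.append("done"))
    worker.start()

    # Wait for the observable outcome, not for a guessed duration.
    assert wait_until(lambda: results == ["done"]), "job did not finish within timeout"
    worker.join()


if __name__ == "__main__":
    test_background_job_eventually_writes_result()
```

The test returns as soon as the condition holds, so it stays fast on a quick machine and still passes on a slow one, up to the timeout.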
Step 9: Check whether the test is brittle
After writing the test, ask:
- If I refactor the internals without changing behavior, will this test fail for no good reason?
- If the execution order changes, will this test break?
- If the network or machine is slightly slower, will this test break?
- If the UI copy or DOM structure changes slightly, will this test trigger mass failures?
- If this test fails, can I know the rough cause within one minute?
If the answers are poor, improve the test.
Specific Rules by Test Type
A. Unit test best practices
- Prefer testing pure logic and boundary conditions.
- Test the public API, not private methods.
- Each test should protect one rule.
- Use parameterized tests for similar input families.
- Test error paths too, and assert error type or key message.
- Name tests as “scenario + expected result.”
- Do not turn the test into another complicated business program.
- Avoid using real databases or networks in unit tests.
Good naming style
returns_discounted_price_for_vip_user
rejects_empty_email
keeps_original_order_when_scores_are_equal
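The parameterized-test rule above can be sketched with `unittest.TestCase.subTest` from the standard library. `is_valid_email` is a deliberately simplified, hypothetical validator used only to give the cases something to exercise.

```python
import unittest


def is_valid_email(value):
    """Hypothetical, simplified validator; not a full address parser."""
    if not value or value.count("@") != 1:
        return False
    local, domain = value.split("@")
    return bool(local) and "." in domain


class EmailValidationTest(unittest.TestCase):
    def test_rejects_malformed_addresses(self):
        cases = {
            "empty string": "",
            "missing at sign": "ada.example.com",
            "missing local part": "@example.com",
            "missing domain dot": "ada@example",
            "two at signs": "ada@@example.com",
        }
        for label, value in cases.items():
            # Each case is reported separately if it fails.
            with self.subTest(case=label):
                self.assertFalse(is_valid_email(value))

    def test_accepts_simple_address(self):
        self.assertTrue(is_valid_email("ada@example.com"))


# Run the suite explicitly so the outcome is available as a value.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(EmailValidationTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Labeling each case keeps the "scenario + expected result" naming idea even when many inputs share one test method.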
B. Integration test best practices
- Cover key collaboration paths, not every possible combination.
- Prefer real assembly, mocking only external boundaries.
- Manage the lifecycle of test data carefully.
- For databases, use controlled schema setup, cleanup, transaction rollback, or temporary instances.
- Assert side effects clearly for queues, caches, and file systems.
- In API tests, verify status code, response body, persisted results, and permission constraints together.
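For the database guidance above, one low-cost option is a temporary instance per test. The sketch below assumes SQLite's in-memory mode via the standard `sqlite3` module; the `users` table and `add_user` function are hypothetical.

```python
import sqlite3


def create_schema(connection):
    connection.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)"
    )


def add_user(connection, email):
    """Hypothetical data-access function under test."""
    cursor = connection.execute("INSERT INTO users (email) VALUES (?)", (email,))
    return cursor.lastrowid


def make_test_database():
    """Give each test its own throwaway in-memory database."""
    connection = sqlite3.connect(":memory:")
    create_schema(connection)
    return connection


def test_add_user_persists_row():
    connection = make_test_database()
    try:
        user_id = add_user(connection, "ada@example.com")
        row = connection.execute(
            "SELECT email FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        assert row == ("ada@example.com",)
    finally:
        connection.close()


def test_duplicate_email_violates_unique_constraint():
    connection = make_test_database()
    try:
        add_user(connection, "ada@example.com")
        try:
            add_user(connection, "ada@example.com")
        except sqlite3.IntegrityError:
            pass
        else:
            raise AssertionError("expected IntegrityError for duplicate email")
        count = connection.execute("SELECT COUNT(*) FROM users").fetchone()[0]
        assert count == 1
    finally:
        connection.close()


if __name__ == "__main__":
    test_add_user_persists_row()
    test_duplicate_email_violates_unique_constraint()
```

Since every test builds and discards its own database, no state leaks between tests and execution order does not matter.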
C. UI and component test best practices
- Query elements from the user’s perspective, preferring role, label, and text.
- Do not depend on class names, DOM hierarchy, or nth-child unless necessary.
- Avoid testing only “it renders”; test real interaction and outcomes.
- Assert user-visible state changes, not component internals.
- After interactions, wait for a clear result instead of sleeping.
- Accessible UI is usually easier to test reliably.
Recommended query priority
Prefer:
- role
- label text
- placeholder text
- visible text
Use only as a fallback:
D. E2E best practices
- Keep them few and high value.
- Protect only critical main flows, not every branch.
- Every test must be independent and must not depend on earlier tests.
- Manage login state, test accounts, and seed data explicitly.
- Wait for system state, not fixed time.
- Use stable selectors that align with user semantics.
- Do not validate every detail in E2E.
- Preserve enough diagnostic information on failure: logs, screenshots, traces, HAR files, and error responses.
Flaky Test Handling Rules
If a test sometimes passes and sometimes fails on the same code, investigate in this order:
- Is it using arbitrary sleep or wait?
- Does it depend on execution order or shared state?
- Does it depend on the real network or third-party services?
- Is the assertion happening too early?
- Is the selector too brittle?
- Is there a problem with time, time zone, or randomness?
- Is the scope too large, mixing several possible failure sources?
Treatment principles:
- Fix it first; do not normalize rerunning.
- If it cannot be fixed immediately, isolate it temporarily and record the reason.
- Do not tolerate flaky tests sitting on the main branch long term.
Coverage Strategy
Do not focus only on code coverage numbers. Care more about these forms of coverage:
- Use-case coverage
- Risk coverage
- Boundary coverage
- Permission coverage
- Exception-path coverage
- Regression coverage
Recommended priority:
- Core main flows
- High-risk edges
- Regression tests for historical bugs
- Permission and security logic
- Contracts likely to break during refactors
Maintainability Rules
Tests should be DAMP, not excessively DRY
Some duplication in tests is acceptable if it makes intent clearer.
Prefer:
- Clear readability
- Explicit data
- Independent scenarios
Abstract carefully:
- Extract helpers only when the repetition is truly stable and improves understanding
- Do not hide test data, test behavior, and assertions inside black-box helpers
Rule of thumb:
If the abstraction forces the reader to jump through many layers just to understand what the test does, the abstraction has probably gone too far.
Test failures should be diagnosable
When an assertion fails, it should ideally show:
- The expected value
- The actual value
- The current scenario
- The key inputs
When needed, also provide:
- Request and response snapshots
- Database state summaries
- Page screenshots
- Traces or logs
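A small sketch of an assertion message that carries the scenario, key inputs, expected value, and actual value. `apply_discount` is a hypothetical function; the pattern is simply a formatted message on a plain `assert`.

```python
def apply_discount(price_cents, percent):
    """Hypothetical function under test: rounds down to whole cents."""
    return price_cents * (100 - percent) // 100


def test_discount_rounds_down_to_whole_cents():
    cases = [
        # (price_cents, percent, expected_cents)
        (1000, 10, 900),
        (999, 10, 899),
        (1, 50, 0),
    ]
    for price_cents, percent, expected in cases:
        actual = apply_discount(price_cents, percent)
        # The message names the inputs, so a failure is readable on its own.
        assert actual == expected, (
            f"apply_discount(price_cents={price_cents}, percent={percent}): "
            f"expected {expected}, got {actual}"
        )


if __name__ == "__main__":
    test_discount_rounds_down_to_whole_cents()
```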
Anti-Patterns
Common bad smells:
- Writing tests only for coverage numbers
- Testing private implementation details
- Using lots of fixed sleep or wait calls
- Tests depending on one another
- Sharing dirty data or shared account state
- Too many actions and intentions inside a single test
- Assertions that only check “exists” or “does not throw”
- Too many E2E tests and too few unit or integration tests
- Mocking everything until the test no longer reflects the real system
- Over-abstracted helpers that hide test intent
- Vague names such as “should work”
- Fixing a bug without adding a regression test
- Rerunning flaky tests instead of finding the root cause
Recommended Working Templates
Template 1: Writing tests for a new feature
- List the core use cases
- Choose the test layer
- Write the main success path first
- Add high-risk boundary cases
- Add exception paths
- Run locally multiple times to check stability
- Before merging, verify that failure messages are clear
Template 2: Writing regression tests for a bug fix
- Reproduce the bug first
- Write a failing test first
- Fix the code
- Confirm the test turns green
- Add nearby boundary cases to prevent the same class of issue from returning
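As an illustration of the steps above, suppose a hypothetical bug report said that players with tied scores were being reordered. A regression test for it might look like the sketch below; `rank_players` is an invented function, and the fix shown relies on Python's `sorted` being stable.

```python
def rank_players(players):
    """Hypothetical function: highest score first, ties keep input order.

    `sorted` is stable, including with reverse=True, so equal scores
    retain their original relative order.
    """
    return sorted(players, key=lambda player: player["score"], reverse=True)


def test_keeps_original_order_when_scores_are_equal():
    # Reproduces the reported scenario: two players tied on 10 points.
    players = [
        {"name": "ana", "score": 10},
        {"name": "ben", "score": 30},
        {"name": "cy", "score": 10},
    ]

    ranked = rank_players(players)

    assert [player["name"] for player in ranked] == ["ben", "ana", "cy"]


def test_all_equal_scores_keep_input_order():
    # Nearby boundary case: every score is tied.
    players = [{"name": name, "score": 5} for name in ("x", "y", "z")]

    ranked = rank_players(players)

    assert [player["name"] for player in ranked] == ["x", "y", "z"]


if __name__ == "__main__":
    test_keeps_original_order_when_scores_are_equal()
    test_all_equal_scores_keep_input_order()
```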
Template 3: Adding tests to existing code
- Start from the clearest external contract
- Cover the most critical success path first
- Then add the most fragile boundaries
- Avoid diving into private internals at the beginning
- If the code is hard to test, record the design smell and refactor in small steps
Expected Output
When using this skill to produce tests, the default output should include:
- Test strategy explanation
  - Why this test layer was chosen
  - Which scenarios are covered
  - Which scenarios are intentionally not covered, and why
- Test code
  - Runnable as-is
  - Clearly structured
  - Clearly named
- Stability notes
  - How flakiness is avoided
  - How test data, time, randomness, and network dependencies are controlled
- Follow-up suggestions
  - Whether integration or E2E tests should be added later
  - Whether the design could be improved to make the code more testable
Short Checklist
Before submitting, quickly check:
- Which business rule does this test protect?
- Does it test behavior or implementation details?
- Can it run independently?
- Does it rely on fixed sleep?
- Does it rely on shared state?
- If it fails, can the cause be located quickly?
- Is it worth maintaining long term?
If you cannot answer two or more of these clearly, do not submit it yet.
One-Sentence Principle
Write tests that provide real confidence. Prefer user behavior and system contracts, cover the highest-risk scenarios with the smallest sufficient test layer, and reject brittle, vague, and flaky tests.