| name | write-tests |
| description | Write high-quality unit tests (Vitest) and E2E tests (Playwright) following senior test-engineering practices. Use this skill whenever asked to write tests, add test coverage, create test cases, fix failing tests, add missing assertions, test a new feature, write specs, or cover edge cases. Also trigger when the user says "write tests for", "add tests", "test this", "cover this", "needs tests", "add E2E test", "add unit test", "test coverage", or when reviewing code and noticing missing test coverage.
|
Write Tests
Tests are not production code. They are documentation — each one is a tiny executable spec that says
"this system does X." A reader should grasp the intent in seconds. A failure should point to exactly
one broken behavior, not leave you digging through a 40-line test body.
Workflow
Follow these steps in order before writing any test code.
- Decide the framework. Testing a function's return value, side effects, or module interactions
→ Vitest (lives under packages/<name>/test/). Testing that a real HTTP request to a running app
produces the correct Sentry envelope → Playwright (lives under
dev-packages/e2e-tests/test-applications/<app>/tests/). Testing Node SDK instrumentation
against real envelope output → node-integration-tests (lives under
dev-packages/node-integration-tests/suites/).
Parameterization differs by framework — pick the right one:
| Framework | How to parameterize |
|---|---|
| Vitest | it.each / it.for (runner-integrated, one test each) |
| Playwright E2E | .forEach() outside test() (registers separate tests) |
| Node integration tests | Loops inside a single test() body (one Node.js process) |
- Read 2–3 existing test files in the target test/ directory. Specifically note:
  - Which vi.mock style they use (string path or import form)
  - What cleanup they do in beforeEach (clearAllMocks vs restoreAllMocks)
  - How they import the module under test (../../src/... vs @sentry/...)
  - The describe/it nesting depth and naming style
  - What setup functions are called together: does the function under test require companion
    initialization? (e.g., does patchRoute also need patchAppUse to work correctly?)
Match what you find. Consistency within a package matters more than idealized best practice.
- Check for existing test utilities before writing helpers from scratch:
  - packages/core/test/mocks/: TestClient, getDefaultTestClientOptions, fake transports
  - packages/core/test/testutils.ts: clearGlobalScope(), version gating
  - dev-packages/test-utils/: waitForTransaction, waitForError, waitForSession,
    getPlaywrightConfig, mock Sentry server, event proxy
  - dev-packages/node-integration-tests/utils/: createEsmAndCjsTests, assertion helpers
- Identify the behaviors that matter most: edge cases, error paths, boundary conditions.
Don't aim for quantity; aim for the tests that would catch real regressions.
Core principles
Fewer tests, better tests
The goal is not to maximize test count. A large suite of shallow happy-path tests gives a false
sense of coverage — they pass on every change, including changes that introduce bugs. A smaller
suite that targets edge cases, error paths, and boundary conditions catches far more regressions.
Before writing a test, ask: "If this test didn't exist, what bug could ship?" If you can't answer
that concretely, the test probably isn't worth writing. Prioritize:
- Edge cases and boundaries: the off-by-one, the empty array, the null input
- Error paths: does the function fail gracefully or silently swallow the error?
- Integration seams: where two modules or systems interact (e.g., middleware calling next())
- Behavior that previously broke: regression tests for known bugs
Don't waste tests on: trivial getters/setters, pure delegation to well-tested libraries,
TypeScript type constraints (the compiler already checks those), or re-testing the same behavior
that a higher-level test already covers.
Arrange → Act → Assert
Structure every test with the AAA pattern, separated by blank lines. The whitespace makes the
phases obvious — no labels or comments needed.
it('skips errors already captured by middleware', () => {
  const error = new Error('already captured');
  Object.defineProperty(error, '__sentry_captured__', { value: true });

  responseHandler(createMockContext(500, error));

  expect(mockCaptureException).not.toHaveBeenCalled();
});
One behavior, one reason to fail
Each test makes exactly one behavioral claim. Multiple expect calls are fine when they assert on
different facets of the same outcome. But if you're checking two unrelated behaviors, those are
two tests. No conditional logic, no branching, no try/catch — a test is a straight line.
Assert behavior, not implementation
If someone refactored the internals but the function still returned the correct result, would this
test break? If yes, you're testing wiring, not behavior.
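// Weak: only proves the handler does not throw; it says nothing about what the handler does
it('handles the request', async () => {
  expect(() => handler(mockReq)).not.toThrow();
});

// Better: asserts the observable outcome
it('sets transaction name from route path', () => {
  responseHandler(createMockContext(200));

  expect(mockSetTransactionName).toHaveBeenCalledWith('GET /test');
});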
Precise assertions
Default to exact matching. toMatchObject, expect.objectContaining, and expect.arrayContaining
silently ignore fields that matter. This has caused real bugs to ship in this codebase.
Use toEqual by default. The same applies to toHaveBeenCalledWith — spell out every
argument rather than wrapping in objectContaining. This is the single most common place where
loose assertions creep in:
// Bad: any other argument (onlyIfParent, parentSpan, attributes) can change without failing
expect(startSpan).toHaveBeenCalledWith(expect.objectContaining({ name: 'middleware', op: 'middleware.hono' }));

// Good: every argument is spelled out
expect(startSpan).toHaveBeenCalledWith({
  name: 'middleware',
  op: 'middleware.hono',
  onlyIfParent: true,
  parentSpan: fakeRootSpan,
  attributes: { 'sentry.op': 'middleware.hono', 'sentry.origin': 'auto.middleware.hono' },
});
The only valid reasons to use toMatchObject or objectContaining are: (1) the object is
generated by a framework or third-party library and contains fields you don't control (timestamps,
random IDs, internal framework state), or (2) the object has 10+ fields and the test only
cares about 2–3 of them (in which case individual .toBe() checks on those fields are still
preferred). If you wrote the object being asserted, you can spell it out — use toEqual.
When you do fall back, prefer individual .toBe() checks over objectContaining:
expect(event.transaction).toBe('GET /users/:id');
expect(event.contexts?.trace?.op).toBe('http.server');
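To make the risk of partial matching concrete, here is a dependency-free sketch. `matchesSubset` and `matchesExactly` are hypothetical stand-ins written for this illustration, not Vitest's implementations of `objectContaining` or `toEqual`, and they only compare top-level primitive values.

```typescript
type SpanOptions = Record<string, unknown>;

// Hypothetical stand-in for a partial matcher: only keys listed in `expected` are compared.
function matchesSubset(actual: SpanOptions, expected: SpanOptions): boolean {
  return Object.keys(expected).every(key => actual[key] === expected[key]);
}

// Hypothetical stand-in for an exact matcher: same keys, same values, nothing extra.
function matchesExactly(actual: SpanOptions, expected: SpanOptions): boolean {
  const actualKeys = Object.keys(actual).sort();
  const expectedKeys = Object.keys(expected).sort();
  return (
    actualKeys.length === expectedKeys.length &&
    actualKeys.every((key, i) => key === expectedKeys[i] && actual[key] === expected[key])
  );
}

// Assume a regression flipped onlyIfParent from true to false.
const regressedCall: SpanOptions = { name: 'middleware', op: 'middleware.hono', onlyIfParent: false };

// The partial check never looks at onlyIfParent, so it still passes.
const partialPasses = matchesSubset(regressedCall, { name: 'middleware', op: 'middleware.hono' });

// The exact check compares every field, so it fails on the flipped flag.
const exactPasses = matchesExactly(regressedCall, {
  name: 'middleware',
  op: 'middleware.hono',
  onlyIfParent: true,
});
```

Under these assumptions the partial check passes while the exact check fails, which is the gap the exact-matching default is meant to close.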
Every toContain / toContainEqual needs a toHaveLength companion. Without it, the
assertion passes even if the array has unexpected extra items:
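// Good: membership plus exact length
expect(spanNames).toContain('authMiddleware');
expect(spanNames).toHaveLength(1);

// Bad: passes even if spanNames holds unexpected extra entries
expect(spanNames).toContain('authMiddleware');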
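The difference can be shown without a test runner. In the sketch below, `cleanRun` and `regressedRun` are made-up span name lists, and the two arrow functions only mimic what a membership check and a membership-plus-length check would observe.

```typescript
// Hypothetical span names recorded for the same request before and after a regression.
const cleanRun: string[] = ['authMiddleware'];
const regressedRun: string[] = ['authMiddleware', 'authMiddleware']; // middleware wrapped twice

// Membership only: roughly what a lone toContain observes.
const containsOnly = (names: string[]): boolean => names.includes('authMiddleware');

// Membership plus exact length: roughly what toContain and toHaveLength(1) observe together.
const containsWithLength = (names: string[]): boolean =>
  names.includes('authMiddleware') && names.length === 1;

const outcome = {
  cleanContainsOnly: containsOnly(cleanRun),
  regressedContainsOnly: containsOnly(regressedRun), // still true: the duplicate goes unnoticed
  cleanWithLength: containsWithLength(cleanRun),
  regressedWithLength: containsWithLength(regressedRun), // false: the duplicate is caught
};
```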
Use exported constants, not magic numbers. If the code under test uses named constants like
SPAN_STATUS_OK, reference those same constants in assertions. If the constant's value ever
changes, tests using magic numbers silently pass with wrong expectations.
Naming
Names should be concise, descriptive, and read as correct English. Lead with the verb.
| Quality | Example |
|---|---|
| Good | 'captures error when context.error is set' |
| Good | 'does not re-capture errors already captured by wrapMiddlewareWithSpan' |
| Good | 'returns empty array when no items match' |
| Bad | 'should correctly return the formatted price string when given a valid positive number' |
| Bad | 'test error handling' / 'works correctly' |
Drop "should" — it adds words without adding meaning.
Input quality
Use realistic data
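// Bad: placeholder data that exercises none of the parsing a real URL would
const url = 'http://test';

// Good: realistic scheme, host, path, and query string
const url = 'https://api.example.com/users/42?include=profile&format=json';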
Boundary Value Analysis
If the valid range is 1–100, test -2, -1, 0, 1, 2, 99, 100, 101, Number.POSITIVE_INFINITY. Bugs cluster at boundaries — off-by-one
errors, inclusive/exclusive confusion, type coercion.
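One way to avoid hand-picking boundary inputs is to derive them from the range. The sketch below assumes an inclusive integer range; `boundaryValues` and `isInRange` are hypothetical names invented for this example, not helpers from the repository.

```typescript
// Hypothetical helper: the values just outside, on, and just inside each end of the range.
function boundaryValues(min: number, max: number): number[] {
  const candidates = [min - 1, min, min + 1, max - 1, max, max + 1];
  // De-duplicate so very small ranges (for example 1 to 2) do not repeat inputs.
  return Array.from(new Set(candidates)).sort((a, b) => a - b);
}

// Hypothetical function under test: accepts integers from 1 to 100 inclusive.
function isInRange(value: number): boolean {
  return Number.isInteger(value) && value >= 1 && value <= 100;
}

const inputs = boundaryValues(1, 100); // [0, 1, 2, 99, 100, 101]
const accepted = inputs.filter(isInRange); // [1, 2, 99, 100]
```

The resulting array could be fed to a parameterized test, with extreme inputs such as Number.POSITIVE_INFINITY added by hand.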
Test the unhappy path as hard as the happy path
- Empty inputs: '', [], {}, undefined, null
- Falsy-but-valid: 0, false, '', NaN (these trip up loose truthiness checks)
- Error conditions: network failure, malformed input, missing required fields, timeout
- Concurrency: what if called twice simultaneously? What if called after cleanup?
Each edge case gets its own test with a descriptive name.
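The falsy-but-valid case is the easiest to demonstrate. In this sketch both functions are hypothetical: they resolve a configured sample rate with a default of 1, once with a loose truthiness check and once with a nullish check.

```typescript
// Hypothetical helper with a loose truthiness check: every falsy value counts as "missing".
function sampleRateLoose(configured: number | undefined): number {
  return configured ? configured : 1;
}

// Same helper with a nullish check: only undefined (or null) falls back to the default.
function sampleRateStrict(configured: number | undefined): number {
  return configured ?? 1;
}

const looseResult = sampleRateLoose(0); // 1: the explicit 0 was silently replaced
const strictResult = sampleRateStrict(0); // 0: the explicit 0 is respected
```

A test that only passes 0.5 or undefined would never tell these two implementations apart; a dedicated test for 0 would.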
Writing Vitest tests
File structure
- Name test files *.test.ts, mirroring the source path: src/shared/patchRoute.ts →
  test/shared/patchRoute.test.ts.
- Import the module under test from its source path (../../src/...). When importing from a
  different package (e.g., @sentry/core in a @sentry/node test), use the package name, because
  that is a real dependency, not the code under test.
- For browser-environment tests, put /** @vitest-environment jsdom */ at the top of the file.
Mocking
Prefer spies and stubs over full module mocks. A spy observes behavior without replacing the
system under test. A full mock replaces it — and now you're testing your mock, not your code.
const warnSpy = vi.spyOn(SentryCore.debug, 'warn');
sentry(app);
expect(warnSpy).toHaveBeenCalledWith(expect.stringContaining('not initialized'));
When you need vi.mock: If the package's existing tests use string paths
(vi.mock('../../src/utils')), match that style. If you're creating the first test file for a
package, prefer the import form for type safety:
vi.mock(import('../../src/utils'), async importOriginal => {
const actual = await importOriginal();
return { ...actual, helperFn: vi.fn() };
});
Always restore mocks. This repo does not set restoreMocks: true globally — you are
responsible for cleanup. Leaked mocks cause mysterious failures in unrelated tests. Use whatever
cleanup the existing tests in your package use. If creating the first test file, use:
beforeEach(() => {
vi.restoreAllMocks();
});
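Why a leaked mock breaks unrelated tests can be shown with a hand-rolled stub. This is only an illustration of the mechanism: `logger` and `stubWarn` are hypothetical, and Vitest's own spies are more capable than this.

```typescript
// Hypothetical shared module-level object, standing in for something like a debug logger.
const logger = {
  warn: (message: string): string => `warn: ${message}`,
};

// Hypothetical minimal stub: swaps the method and returns a function that puts the original back.
function stubWarn(replacement: (message: string) => string): () => void {
  const original = logger.warn;
  logger.warn = replacement;
  return () => {
    logger.warn = original;
  };
}

// "Test A" installs a stub.
const restore = stubWarn(() => 'stubbed');
const duringTestA = logger.warn('not initialized'); // 'stubbed'

// "Test B" runs before anything was restored and still sees the stub from test A.
const leakedIntoTestB = logger.warn('something else'); // 'stubbed'

// After restoring, the real implementation is back.
restore();
const afterRestore = logger.warn('something else'); // 'warn: something else'
```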
Error testing
Use the library's built-in matchers. Never use try/catch in tests.
expect(() => parseConfig(null)).toThrow('config is required');
await expect(asyncOp()).rejects.toThrow(TimeoutError);
For async callbacks where you need to verify an assertion actually ran, use expect.assertions(n).
Parameterized tests (Vitest)
Use it.each or it.for for data-driven cases — not raw for loops. it.each integrates with
the test runner, gives better output, and each case runs independently.
it.each([
{ input: 0, expected: 'zero' },
{ input: 1, expected: 'one' },
{ input: -1, expected: 'negative' },
])('classifies $input as $expected', ({ input, expected }) => {
expect(classify(input)).toBe(expected);
});
Test isolation
Tests must never depend on execution order or share mutable state. For this codebase, many tests
need to reset global Sentry state:
beforeEach(() => {
clearGlobalScope();
getCurrentScope().clear();
getIsolationScope().clear();
});
Grouping
1-2 levels of describe is usually enough. Deeper nesting makes tests harder to find and read.
describe('patchRoute', () => {
describe('sub-app middleware wrapping', () => {
it('wraps .use() middleware handlers', async () => { ... });
it('does not wrap sole route handlers', async () => { ... });
});
});
Writing node-integration-tests
Node integration tests (dev-packages/node-integration-tests/) use createEsmAndCjsTests to
run a real Node scenario file and assert on captured Sentry envelopes.
Minimize test() calls — each one spawns a separate Node process
This is the opposite of the Playwright rule. In Playwright, each test() is cheap — use
.forEach() to register many tests. In node-integration-tests, each test() forks a fresh Node
process with full startup cost. A describe.each matrix that looks reasonable in a unit test
context balloons into dozens of cold starts and slows CI by a large factor.
Rule: loop inside the test body, not around test() calls.
// Bad: every generated test() forks its own Node process
createEsmAndCjsTests(__dirname, 'scenario.mjs', 'instrument.mjs', (createRunner, test) => {
  describe.each(['/sync', '/async'])('when using %s route', route => {
    describe.each(['get', 'post', 'put', 'delete', 'patch'])('when using %s method', method => {
      test('handles transaction', async () => {
        // ...
      });
    });
  });
});
// Good: one test() call, so one Node process; the matrix is looped inside the body
createEsmAndCjsTests(__dirname, 'scenario.mjs', 'instrument.mjs', (createRunner, test) => {
  test('handles transactions for all route/method combinations', async () => {
    const runner = createRunner();
    const requests: Array<{ method: string; url: string }> = [];

    for (const route of ['/sync', '/async']) {
      for (const method of ['get', 'post', 'put', 'delete', 'patch']) {
        // Queue one expected transaction per request that will be made below
        runner.expect({
          transaction: { transaction: `${method.toUpperCase()} ${route}` },
        });
        requests.push({ method, url: route });
      }
    }

    const started = runner.start();
    for (const req of requests) {
      await started.makeRequest(req.method, req.url);
    }
    await started.completed();
  }, 60_000);
});
If a subset of cases has meaningfully different expectations (e.g., error vs. success), split
into two tests — not thirty.
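When the matrix grows, it can help to build the request plan as plain data first and keep the test body to a single loop. The sketch below is an assumption about how that could look; `PlannedRequest` and `planRequests` are hypothetical and not part of the node-integration-tests utilities.

```typescript
interface PlannedRequest {
  method: string;
  url: string;
  expectedTransaction: string;
}

// Hypothetical helper: expands routes × methods into one flat list of planned requests.
function planRequests(routes: string[], methods: string[]): PlannedRequest[] {
  const planned: PlannedRequest[] = [];
  for (const route of routes) {
    for (const method of methods) {
      planned.push({
        method,
        url: route,
        expectedTransaction: `${method.toUpperCase()} ${route}`,
      });
    }
  }
  return planned;
}

// Ten combinations, all of which would be exercised inside a single test() call.
const plan = planRequests(['/sync', '/async'], ['get', 'post', 'put', 'delete', 'patch']);
```

Inside the test body, each entry would feed one `runner.expect(...)` call and one `makeRequest(...)` call, as in the looped example above.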
Writing Playwright E2E tests
When to write E2E tests
Write E2E tests when you need to verify that the SDK correctly instruments a real application.
Unit tests can't catch integration bugs between the SDK and a framework's request lifecycle.
Also use the /e2e skill for running E2E tests.
File structure
- Tests live in dev-packages/e2e-tests/test-applications/<app-name>/tests/*.test.ts.
- Shared constants (like APP_NAME) go in tests/constants.ts.
- Each test app has a playwright.config.ts using getPlaywrightConfig from
  @sentry-internal/test-utils.
The waitFor pattern
Set up a promise for the expected Sentry event, trigger the action, then await and assert.
test('captures transaction for GET /users/:id', async ({ baseURL }) => {
  const transactionPromise = waitForTransaction(APP_NAME, event => {
    return event.contexts?.trace?.op === 'http.server' && event.transaction === 'GET /users/:id';
  });

  const response = await fetch(`${baseURL}/users/42`);
  expect(response.status).toBe(200);

  const transaction = await transactionPromise;
  expect(transaction.contexts?.trace?.op).toBe('http.server');
  expect(transaction.transaction).toBe('GET /users/:id');
});
The predicate must be specific enough to match only your event. A vague predicate can match an
unrelated event from a parallel test, causing flaky passes or hangs.
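The effect of a vague predicate can be simulated with a plain array. The event shape and the `incoming` list below are simplified assumptions made for this sketch; real Sentry events carry the op under `contexts.trace.op`, and the real helpers resolve asynchronously.

```typescript
// Simplified, hypothetical event shape for illustration only.
interface CapturedEvent {
  transaction: string;
  op: string;
}

// Hypothetical events reaching a shared proxy while two tests run in parallel.
const incoming: CapturedEvent[] = [
  { transaction: 'GET /health', op: 'http.server' }, // produced by another test
  { transaction: 'GET /users/:id', op: 'http.server' }, // the one this test cares about
];

// Vague: any server transaction qualifies.
const vaguePredicate = (event: CapturedEvent): boolean => event.op === 'http.server';

// Specific: op and transaction name must both match.
const specificPredicate = (event: CapturedEvent): boolean =>
  event.op === 'http.server' && event.transaction === 'GET /users/:id';

// A waiter resolves with the first event its predicate accepts.
const vagueMatch = incoming.find(vaguePredicate); // the unrelated 'GET /health' event
const specificMatch = incoming.find(specificPredicate); // the intended event
```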
Asserting on spans
Prefer asserting on the exact span count alongside individual field checks:
const spans = transaction.spans || [];
expect(spans).toHaveLength(2);
const middlewareSpan = spans.find(s => s.description === 'middlewareA');
expect(middlewareSpan?.op).toBe('middleware.hono');
expect(middlewareSpan?.origin).toBe('auto.middleware.hono');
expect(middlewareSpan?.status).toBe('ok');
Error event assertions
Check both the exception value and the mechanism. The mechanism tells you how the error was
captured — that's the SDK's actual responsibility:
const errorEvent = await errorPromise;
expect(errorEvent.exception?.values?.[0]?.value).toBe('connection refused');
const mechanism = errorEvent.exception?.values?.[0]?.mechanism;
expect(mechanism?.handled).toBe(false);
expect(mechanism?.type).toBe('auto.http.hono.context_error');
Parameterized E2E tests
For Playwright tests (unlike Vitest), use a plain JavaScript .forEach(), which is the approach
Playwright recommends; it.each and it.for are not available in Playwright. The .forEach() runs at
discovery time, registering each case as its own independent test. All cases then run separately
at execution time.
[
{ a: 1, b: 1, expected: 2 },
{ a: 1, b: 2, expected: 3 },
{ a: 2, b: 1, expected: 3 },
].forEach(({ a, b, expected }) => {
test(`given ${a} and ${b} as arguments, returns ${expected}`, ({ page }) => {
expect(a + b).toEqual(expected);
});
});
Don't put the loop inside a single test. That collapses all cases into one test body — a
failure in one iteration aborts the rest, and the runner reports a single failure with no
per-case visibility:
// Bad: one test body for every route; the first failing route hides the rest
test('captures transactions for all routes', async ({ baseURL }) => {
  for (const route of ['/users', '/posts', '/comments']) {
    const txnPromise = waitForTransaction(APP_NAME, e => e.transaction === `GET ${route}`);
    await fetch(`${baseURL}${route}`);
    const txn = await txnPromise;
    expect(txn.contexts?.trace?.op).toBe('http.server');
  }
});
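The loss of per-case visibility can be simulated without Playwright. In the sketch below, `runAsSingleTest` and `runAsSeparateTests` are hypothetical miniature runners written for this illustration; they only mimic how a runner reports one looped test versus several registered tests.

```typescript
interface RouteCase {
  route: string;
  check: () => void;
}

// Records which checks actually ran.
const executed: string[] = [];

// Hypothetical cases: the middle one fails.
const routeCases: RouteCase[] = [
  { route: '/users', check: () => { executed.push('/users'); } },
  { route: '/posts', check: () => { executed.push('/posts'); throw new Error('wrong op'); } },
  { route: '/comments', check: () => { executed.push('/comments'); } },
];

const messageOf = (error: unknown): string => (error instanceof Error ? error.message : String(error));

// Loop inside one test: the first failure aborts the loop and yields a single report line.
function runAsSingleTest(all: RouteCase[]): string[] {
  try {
    for (const routeCase of all) routeCase.check();
    return ['all routes: passed'];
  } catch (error) {
    return [`all routes: failed (${messageOf(error)})`];
  }
}

// One test per case: every case runs and is reported on its own.
function runAsSeparateTests(all: RouteCase[]): string[] {
  return all.map(routeCase => {
    try {
      routeCase.check();
      return `${routeCase.route}: passed`;
    } catch (error) {
      return `${routeCase.route}: failed (${messageOf(error)})`;
    }
  });
}

const singleReport = runAsSingleTest(routeCases);
const executedInSingle = executed.slice(); // '/comments' never ran
executed.length = 0;
const separateReport = runAsSeparateTests(routeCases);
```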
Common pitfalls
- Proxy name mismatch: APP_NAME must match proxyServerName in start-event-proxy.mjs.
- Flaky predicates: Add enough specificity (path, method, unique marker) to disambiguate.
- Forgetting await: The waitFor* helpers return a promise. Without await, the test passes
  vacuously and the assertion never runs.
Checklist
Before you're done, verify each test against these criteria: