| name | qa-test-data-management |
| description | Test data management — fixtures generated from real schemas (TS types, DB schema, OpenAPI), PII hygiene (faker/factory_boy for synthetics), prod-like masking when copying prod data, environment isolation (testcontainers, transactional rollback, tempdir), fixture lifecycle. Defense against Mode 1 (mock obsession) — fixture does not drift from reality. |
| type | discretionary |
| domain | development |
| owners | ["tester"] |
| gates | ["TEST"] |
| tech | ["docker","faker","factory_boy"] |
| topic | ["qa"] |
| triggers | ["test data management","test fixtures","PII in tests","faker","factory_boy","testcontainers","prod data masking","fixture drift","environment isolation"] |
| related | ["qa-test-plan","qa-regression-baseline","qa-mutation-testing","tests-quality-review"] |
| budget_lines | 250 |
| schema_version | 1 |
Skill: QA Test Data Management
Manage test data so that fixtures do not drift from reality, do not contain PII, and do not pollute the environment between runs. This is a defense against Mode 1 of Test Integrity Defense (mock obsession), which often starts with a "hand-crafted" fixture that diverges from the real schema.
Sections:
- When to apply
- Fixture derivation from real schemas
- PII hygiene
- Prod-like data masking
- Environment isolation
- Fixture lifecycle
- Anti-patterns
- Output template
- DoD
1. When to apply
Triggers:
- First integration test on new module / new DB table / new API endpoint
- Security alert: PII detected in test fixtures (auto-scan, see $qa-test-integrity-audit)
- DB schema migration — fixtures of all affected tests need re-generation
- Stabilization sprint includes test data quality audit
Outputs:
- Fixture generation script / decorator / factory class
- Mask config for prod-data clone (if applicable)
- Cleanup hooks for testcontainers / tempdir
- PII audit report
2. Fixture derivation from real schemas
Rule: generate fixtures from the source-of-truth schema; do not hand-craft them.
JS/TS — from TypeScript types via faker:
import { faker } from "@faker-js/faker";
import type { User } from "@/types/user";

export function makeUser(overrides?: Partial<User>): User {
  return {
    id: faker.string.uuid(),
    email: faker.internet.email(),
    name: faker.person.fullName(),
    role: "user",
    createdAt: faker.date.recent(),
    ...overrides,
  };
}
Python — from dataclass via factory_boy:
import factory
from app.models import User


class UserFactory(factory.Factory):
    class Meta:
        model = User

    id = factory.Faker("uuid4")
    email = factory.Faker("email")
    name = factory.Faker("name")
    role = "user"
From OpenAPI / JSON Schema: use openapi-typescript-codegen (TS) or datamodel-code-generator (Python) for type derivation, then factory as above.
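Why: when the schema changes, the type changes, and the factory fails (at compile time in TypeScript, at object construction in Python), so tests get rewritten against the new schema. Without this, the fixture silently grows stale: tests stay green while prod fails.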
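Python has no compile step, so the drift check has to happen when the fixture is built. A minimal sketch using only the standard library, assuming a hypothetical `User` dataclass standing in for `app.models.User`; `make_user` is an illustrative helper, not part of factory_boy:

```python
import dataclasses
import uuid
from dataclasses import dataclass


@dataclass
class User:
    # Hypothetical stand-in for the real model (app.models.User in the text above).
    id: str
    email: str
    name: str
    role: str = "user"


def make_user(**overrides) -> User:
    """Build a User from defaults, rejecting overrides the schema does not know."""
    known = {f.name for f in dataclasses.fields(User)}
    unknown = set(overrides) - known
    if unknown:
        # A renamed or removed field surfaces here instead of going stale silently.
        raise TypeError(f"unknown User fields: {sorted(unknown)}")
    defaults = {
        "id": str(uuid.uuid4()),
        "email": f"user-{uuid.uuid4().hex[:8]}@example.com",
        "name": "Test User",
    }
    # If the model gains a required field, this constructor call raises too.
    return User(**{**defaults, **overrides})
```

A test that passes a field the model no longer has, for example `make_user(nickname="x")`, fails immediately with a `TypeError` rather than running against a stale shape.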
3. PII hygiene
What must NOT appear in test fixtures (ever):
- Real emails / phones / addresses (even your own)
- Real payment data (CC numbers, IBAN, SWIFT)
- Real names of real people
- Real API keys / tokens / passwords
- Real customer IDs from prod
What to use instead:
- faker (JS/TS): faker.internet.email(), faker.phone.number(), faker.location.streetAddress()
- factory_boy + faker (Python): factory.Faker("email"), factory.Faker("phone_number")
Detection: ESLint rule no-restricted-syntax plus a custom regex for email/phone patterns in test files. A pre-commit hook blocks the commit if anything resembling real PII is found (see $qa-test-integrity-audit §1).
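The regex side of such a scan could look roughly like the sketch below. It covers emails only and treats the domains reserved for documentation and testing (example.com/.org/.net and the .test, .example, .invalid, .localhost TLDs) as safe. The function name and the allow-list layout are assumptions for illustration, not the rules defined in $qa-test-integrity-audit:

```python
import re
from typing import List

# Domains and TLDs reserved for documentation and testing (RFC 2606).
RESERVED_DOMAINS = ("example.com", "example.org", "example.net")
RESERVED_TLDS = (".test", ".example", ".invalid", ".localhost")

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@([A-Za-z0-9.-]+\.[A-Za-z]{2,})")


def _is_reserved(domain: str) -> bool:
    """True for reserved domains, their subdomains, and reserved TLDs."""
    subdomain_suffixes = tuple("." + d for d in RESERVED_DOMAINS)
    return (
        domain in RESERVED_DOMAINS
        or domain.endswith(subdomain_suffixes)
        or domain.endswith(RESERVED_TLDS)
    )


def find_suspicious_emails(text: str) -> List[str]:
    """Return email addresses whose domain is not reserved for testing."""
    hits = []
    for match in EMAIL_RE.finditer(text):
        if not _is_reserved(match.group(1).lower()):
            hits.append(match.group(0))
    return hits
```

A pre-commit hook would run this over staged test files and exit non-zero when the returned list is non-empty. Expect false positives, which is the intended trade-off for a blocking check.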
If PII is accidentally committed:
- Rotate compromised credentials immediately
- Rewrite git history with git filter-repo (NOT rebase) so that the original commits become inaccessible
- Notify owners if real customer data was involved
- Document incident in security-baseline-dev runbook
4. Prod-like data masking
Sometimes tests need a realistic data distribution (volume, edge cases). In that case, copy prod data into staging/test, but mask PII.
Masking rules:
- Emails: replace local part with hash, keep domain category (user-a3f9@example.com)
- Names: replace with faker.person.fullName() seeded by hash of original (consistent within run)
- Phones: randomize last 4 digits, keeping country code
- Dates: shift by a random Δ within ±30 days; use a whole number of weeks if weekday patterns must be preserved
- IDs: preserve referential integrity through deterministic mapping (user_123 → user_anon_xyz consistently)
- Payment: zeroize CC, randomize amounts within ±20% range
Tools: pgmasq (Postgres), Faker.unique + custom transformers, in-house masking script. Run AS PART OF prod→staging sync, not in test runtime.
Audit: after masking, sample 100 rows and manually check them for leakage. Keep the mask config in the repo so a reviewer can audit it.
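A few of the masking rules above can be sketched with the standard library alone. This is an illustration under assumptions, not a replacement for a dedicated masking tool: `MASK_KEY`, the function names, and the output formats are hypothetical, and the date helper shifts by whole weeks, which is one way to keep weekdays intact while staying inside the ±30 day window.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Assumption: the real key is kept outside the repo and rotated per sync.
MASK_KEY = b"replace-me-per-sync"


def _digest(value: str) -> str:
    """Keyed hash: stable within one sync, not reversible by plain lookup."""
    return hmac.new(MASK_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_email(email: str) -> str:
    """Replace the local part with a short hash and use a reserved domain."""
    return f"user-{_digest(email.lower())[:8]}@example.com"


def mask_id(original_id: str) -> str:
    """Deterministic mapping, so foreign keys still point at the same masked row."""
    return f"user_anon_{_digest(original_id)[:12]}"


def shift_date(d: date, owner_id: str, max_weeks: int = 4) -> date:
    """Shift by a per-owner whole number of weeks in [-max_weeks, +max_weeks]."""
    span = 2 * max_weeks + 1
    weeks = int(_digest("date:" + owner_id), 16) % span - max_weeks
    return d + timedelta(weeks=weeks)
```

Because every helper derives its output from the same keyed hash, re-running the sync with the same key yields the same masked values, which keeps joins between tables consistent.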
5. Environment isolation
Every test run must start with clean state and not leave artifacts for the next.
DB isolation patterns:
- Testcontainers (Docker-based fresh DB per test suite): @testcontainers/postgresql (JS), testcontainers (Python). Spin up a real Postgres/MongoDB in a Docker container, tear it down afterwards.
- Transactional rollback (faster than testcontainers): wrap the test in a transaction, roll back at the end. Works for SQL DBs, not for MongoDB.
- In-memory adapters (fastest, but less real): SQLite for Postgres-like, mongodb-memory-server.
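The transactional rollback pattern can be sketched as follows. In-memory SQLite is used here only to keep the example self-contained; the table and the `rolled_back` helper are hypothetical, and a real suite would wrap its actual database connection in the same way.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def rolled_back(conn: sqlite3.Connection):
    """Run a test body inside a transaction that is always rolled back."""
    conn.execute("BEGIN")
    try:
        yield conn
    finally:
        conn.execute("ROLLBACK")


# isolation_level=None puts sqlite3 in autocommit mode, so BEGIN/ROLLBACK are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")

with rolled_back(conn) as tx:
    tx.execute("INSERT INTO users (email) VALUES (?)", ("a@example.com",))
    inside = tx.execute("SELECT COUNT(*) FROM users").fetchone()[0]

# The insert is visible inside the block and gone once the block exits.
after = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

The pattern breaks down if the code under test commits on its own connection, which is one reason to fall back to testcontainers for such suites.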
File system isolation:
- tmp directories per test (os.tmpdir() + uuid)
- Cleanup in afterEach / pytest fixture teardown
- Do NOT use shared tests/fixtures/output/ directory
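On the Python side, per-test temp directories with guaranteed cleanup can be sketched with `tempfile.TemporaryDirectory`. `write_report` below is a hypothetical piece of code under test; in a pytest suite the built-in `tmp_path` fixture serves a similar purpose.

```python
import tempfile
from pathlib import Path


def write_report(out_dir: Path) -> Path:
    # Hypothetical code under test that writes an artifact to disk.
    target = out_dir / "report.txt"
    target.write_text("ok\n", encoding="utf-8")
    return target


def run_isolated() -> tuple:
    """Run the code under test in a fresh directory that is removed afterwards."""
    with tempfile.TemporaryDirectory(prefix="test-") as tmp:
        out_dir = Path(tmp)
        report = write_report(out_dir)
        existed_during = report.exists()
    # The directory and everything in it are deleted when the with-block exits.
    return existed_during, out_dir.exists()
```

Each call gets its own uniquely named directory, so parallel test workers do not collide on output paths.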
Network isolation:
- MSW (JS) / responses (Python) to mock external HTTP at the boundary
- Never hit real API in test — flaky + slow + cost
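To illustrate stubbing at the HTTP boundary without extra dependencies, the sketch below uses `unittest.mock` on `urllib.request.urlopen` as a standard-library stand-in for `responses` (which targets the `requests` library). The `fetch_plan` client and its URL are hypothetical.

```python
import json
import urllib.request
from unittest import mock


def fetch_plan(user_id: str) -> str:
    # Hypothetical client code; the URL is illustrative only.
    url = f"https://billing.example.com/users/{user_id}"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.loads(resp.read())["plan"]


def check_fetch_plan_with_stub() -> str:
    """Exercise fetch_plan with the network call replaced by a canned response."""
    fake = mock.MagicMock()
    # urlopen() is used as a context manager, so stub what __enter__ returns.
    fake.__enter__.return_value.read.return_value = b'{"plan": "pro"}'
    with mock.patch("urllib.request.urlopen", return_value=fake) as stub:
        plan = fetch_plan("u-1")
    stub.assert_called_once()
    return plan
```

Only the outermost call is replaced; the JSON parsing and the client's own logic still run for real, which keeps the test close to production behavior.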
State between tests:
- Module-level state NOT shared (avoid singletons in production code or wipe in beforeEach)
- DB indexes/sequences reset between suites
6. Fixture lifecycle
Create:
- Generation via factory (not hardcode)
- Minimum data needed for test (no "kitchen sink" fixtures)
- Specific test → specific fixture builder
Use:
- One test → one fixture instance (not shared)
- If related entities are needed: factory composition (in factory_boy, declare user = factory.SubFactory(UserFactory) on OrderFactory)
- Overrides only for test-specific variations
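The "one instance per test, composed via factories, overrides only where needed" rules can be sketched with plain dataclasses. factory_boy expresses the same composition through `factory.SubFactory`; the models and helper names below are hypothetical.

```python
import itertools
from dataclasses import dataclass

_ids = itertools.count(1)


@dataclass
class User:
    id: int
    email: str


@dataclass
class Order:
    id: int
    user: User
    total_cents: int = 1000


def make_user(**overrides) -> User:
    n = next(_ids)
    return User(**{"id": n, "email": f"user{n}@example.com", **overrides})


def make_order(**overrides) -> Order:
    # Build the parent entity only when the test did not supply one.
    user = overrides.pop("user", None) or make_user()
    return Order(**{"id": next(_ids), "user": user, **overrides})
```

A test that only cares about the total calls `make_order(total_cents=5)` and gets a fresh, unshared user for free; a test about ownership passes its own `user=` instead.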
Cleanup:
- Testcontainers — auto-teardown on suite exit
- DB transactional rollback — auto on test exit
- File system: afterEach cleanup hook
- Tempdir — registered for OS cleanup
Storage convention:
- Factory definitions: tests/factories/ (JS) or tests/conftest.py + factory module (Python)
- Test data files: do NOT commit large fixture JSON files into git; generate on demand
- Snapshot files OK in git (Playwright traces, vitest snapshots) — but without PII
7. Anti-patterns
- 🔴 Hardcoded real PII: expect(user.email).toBe("user@example.com") with a real person's address in place of the placeholder. Security leak, and the test breaks when that user changes email.
- 🔴 Shared mutable fixture: beforeAll(() => fixture = makeUser()) plus tests that mutate fixture. Order dependence, flaky.
- 🔴 Fixture drift: handcrafted {id: 1, name: "test"} that doesn't use the TS type / schema. Schema changes, fixture doesn't fail, prod fails.
- 🟠 No cleanup: testcontainers started, teardown forgotten. CI runner clogged with zombie containers within hours.
- 🟠 Kitchen sink fixture: makeUserWithEverything() returns user + orders + payments + sessions. Each test gets more than it needs, and one update breaks all tests.
- 🟡 Faker without seed: randomness is good, but without a seed a failing test cannot be reproduced. Pin the seed for reproducible builds (faker.seed(42) per test).
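The seeding point can be illustrated without Faker itself. The sketch below uses a private `random.Random` instance, and `TEST_SEED` is a hypothetical environment variable for replaying a failed run; with Faker the equivalent step is seeding its generator before building fixtures.

```python
import os
import random


def pick_seed() -> int:
    """Use TEST_SEED when set (to replay a failure), otherwise draw a fresh seed."""
    raw = os.environ.get("TEST_SEED")
    if raw is not None:
        return int(raw)
    return random.SystemRandom().randrange(2**32)


def make_amounts(seed: int, n: int = 5) -> list:
    """Generate pseudo-random amounts from a private, seeded generator."""
    rng = random.Random(seed)  # no global random state is touched
    return [rng.randint(1, 10_000) for _ in range(n)]
```

Logging the value returned by `pick_seed()` at the start of a run is what makes a random failure reproducible: rerun with `TEST_SEED` set to the logged value and the same data comes back.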
8. Output template
Tester includes in TEST report (when applicable):
### Test Data Management
- Factories used: N (list)
- PII audit: pass / N findings
- Masking config: configured / N/A
- Environment isolation:
- testcontainers: N suites
- transactional rollback: N suites
- tempdir cleanup: configured
- Fixture seeding: seeded (reproducible) / random
- Action items: [fixture refactors / PII findings to fix]
9. DoD
See also
- $qa-test-plan: test plan accounts for test data requirements per flow
- $qa-regression-baseline: baseline tracks fixture changes between releases
- $qa-mutation-testing: mutation depends strongly on fixture quality
- $qa-test-integrity-audit: PII detection rules + fixture drift detection
- $qa-flaky-test-protocol: env isolation issues → flaky cause
- $tests-quality-review: Reviewer-side audit including test data quality
- $security-baseline-dev: PII handling rules consistent with production code