| name | tdd |
| description | Strict Test-Driven Development workflow enforcing the RED-GREEN-REFACTOR cycle. Ensures every line of production code is justified by a failing test. Use when building new features, fixing bugs, or adding behavior. |
| category | process |
| triggers | ["test first","test-driven","red green refactor","write test before code"] |
TDD
Purpose: Enforce strict Test-Driven Development with verified RED-GREEN-REFACTOR cycles
Phases: RED -> Verify RED -> GREEN -> Verify GREEN -> REFACTOR
Usage: /tdd <feature or behavior description>
Iron Laws
- NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST -- Every line of production code must be justified by a test that fails without it. No exceptions. No "just this once."
- WRITE THE MINIMUM CODE TO PASS -- The GREEN phase produces only what the test demands. No extra features, no future-proofing, no refactoring. Just make the red test green.
- DELETE AND RESTART IF VIOLATED -- If production code was written before its test, delete it completely. Then write the test first. There is no shortcut that preserves TDD's guarantees.
Note: Command examples use npm as default. Adapt to the project's package manager per ai-assistant-protocol — Project Commands.
Why TDD for AI Agents
Everything that makes TDD tedious for humans makes it ideal for AI: clear measurable goals per cycle, mechanical verification at each step, and tight feedback loops. Tests serve as natural-language specs that guide the agent toward exactly the behavior you expect.
When to Use
- Building new features or modules from scratch
- Fixing bugs (write test that reproduces bug first)
- Adding behavior to existing code
- Implementing interfaces or contracts
- Any work where correctness matters more than speed
When NOT to Use
- Exploratory prototyping (explore first, then delete and TDD) ->
/explore
- Code already written that needs tests after the fact ->
/test-coverage
- Refactoring working code with passing tests ->
/refactor
- Investigating a bug you don't understand yet ->
/debug
- Configuration or infrastructure files with no testable logic
Never Do
- Never keep production code written before its test -- Delete it. Write the test. Then rewrite the code. The test must fail first or you have no proof it catches anything.
- Never skip the verify step -- Running the test and confirming it fails (RED) or passes (GREEN) is not optional. The verify step is what separates TDD from "writing tests."
- Never mock what you don't own -- Don't mock third-party libraries or framework internals. Wrap them in your own interface and mock that. See testing-anti-patterns.md.
- Never test implementation details -- Test behavior, not how the code achieves it. If refactoring breaks your tests but not your behavior, your tests are wrong.
- Never write more than one failing test at a time -- One RED test. Make it GREEN. Refactor. Then write the next test. Multiple failing tests create confusion and split focus.
- Never refactor during GREEN -- GREEN means "make it pass." REFACTOR is a separate phase. Mixing them means you can't tell if breakage came from the fix or the cleanup.
- Never commit with failing tests -- Every commit must be GREEN. If you can't make it green, revert to the last green state.
Gate Enforcement
CRITICAL: Each phase transition requires verification.
- RED -> Verify RED: Test MUST fail. Failure MUST be for the right reason.
- Verify RED -> GREEN: Test MUST pass after code change.
- GREEN -> Verify GREEN: All tests MUST pass, not just the new one.
- Verify GREEN -> REFACTOR: Only begin cleanup with a fully green suite.
Phase 0: Setup -- Decompose and Plan
Mode: Analysis and test file creation -- no production code.
Step 0.1: Decompose into Testable Behaviors
Decompose features into testable behaviors, not methods. A behavior is a single observable outcome: "When [condition], it should [result]." Methods may have multiple behaviors -- happy paths, edge cases, and error cases are separate behaviors. Decompose complex behaviors into simpler ones until each can be tested with a single assertion.
## Feature Decomposition
**Feature:** [Feature name]
**Behaviors identified:** [count]
| # | Behavior | Expected |
|---|----------|----------|
| 1 | When [condition], it should [result] | [expected output] |
| 2 | When [condition], it should [result] | [expected output] |
| ... | ... | ... |
Step 0.2: Create Test File with Roadmap
Before the first RED phase, create the test file with the describe block and a list of planned test descriptions as skipped/todo tests. This provides a roadmap for the session and makes progress visible:
describe('ModuleName', () => {
it.todo('calculates subtotal from item prices and quantities');
it.todo('returns zero for an empty item list');
it.todo('applies percentage discount code');
it.todo('rejects invalid discount codes with an error');
it.todo('calculates tax based on region-specific rate');
});
Run the test file to confirm it loads without errors (all tests show as "todo/skipped"). This file is the session plan -- each RED phase replaces the next it.todo with a real test.
Phase 1: RED -- Write a Failing Test
Mode: Test files only -- no production code.
Step 1.1: Identify the Next Behavior
Pick the next behavior from the decomposition (Step 0.1). Replace its it.todo entry with a real test. One assertion, one concept.
## TDD Cycle -- Behavior N/M
**Behavior:** [What the code should do]
**Input:** [What goes in]
**Expected output:** [What comes out]
Step 1.2: Write ONE Minimal Failing Test
describe('ModuleName', () => {
it('should [expected behavior] when [condition]', () => {
const input = createInput();
const result = moduleName.doSomething(input);
expect(result).toEqual(expectedOutput);
});
});
Rules:
- Test name describes behavior, not implementation ("should calculate total with tax" not "should call multiply")
- Use real objects over mocks wherever possible
- One logical assertion per test (multiple
expect calls are fine if they assert one concept)
- The test must reference production code that does not yet exist or does not yet handle this case
Step 1.3: Confirm Test is Written
## RED Phase Complete
**Test file:** `path/to/module.spec.ts`
**Test name:** `should [behavior] when [condition]`
**Asserts:** [what it checks]
Ready to verify this test fails.
Phase 2: Verify RED -- Confirm the Test Fails
Mode: Read-only verification -- run tests, do not change anything.
Step 2.1: Run the Test
npm run test -- path/to/module.spec.ts
Step 2.2: Confirm Failure Reason
CRITICAL: The test must fail for the right reason.
| Failure Type | Verdict | Action |
|---|
| Function not found / module not exported | Valid RED | Proceed to GREEN |
| Assertion fails (wrong return value) | Valid RED | Proceed to GREEN |
| Syntax error in test | Invalid RED | Fix the test, re-run |
| Wrong import path | Invalid RED | Fix the test, re-run |
| Unrelated test fails | Invalid RED | Fix the unrelated failure first |
| Test passes unexpectedly | Invalid RED | The behavior already exists -- write a different test or verify your test actually tests what you think |
## RED Verified
**Test result:** FAIL
**Failure reason:** [why it failed]
**Valid failure:** Yes / No
**Action:** Proceed to GREEN / Fix test and re-verify
GATE: Test must fail for a valid reason before proceeding.
Phase 3: GREEN -- Write Minimum Code to Pass
Mode: Production code -- minimal changes only.
Step 3.1: Write the Simplest Code That Passes
- If the test expects a return value, hardcode it if that makes the test pass
- Do not add error handling the test doesn't require
- Do not add features beyond what the test checks
- Do not refactor, rename, or reorganize
Step 3.2: Keep It Minimal
| Temptation | Response |
|---|
| "I should also handle the edge case" | Write a test for it first |
| "This needs error handling" | Write a test for the error first |
| "I should extract a helper" | Do that in REFACTOR |
| "The variable name is bad" | Rename in REFACTOR |
| "I know what the next test will need" | Write that test first |
## GREEN Phase Complete
**File changed:** `path/to/module.ts`
**Change:** [what was added/modified]
**Lines added:** [count]
Ready to verify all tests pass.
Phase 4: Verify GREEN -- Confirm All Tests Pass
Mode: Read-only verification -- run tests, do not change anything.
Step 4.1: Run the Test
npm run test -- path/to/module.spec.ts
Step 4.2: Run Related Tests
npm run test -- "path/to/directory/"
Step 4.3: Confirm Clean Pass
## GREEN Verified
**New test:** PASS
**Related tests:** PASS ({N} tests)
**Clean output:** Yes / No (warnings?)
Ready to refactor.
GATE: ALL tests must pass before proceeding to REFACTOR.
Phase 5: REFACTOR -- Improve Without Changing Behavior
Mode: Production and test code -- behavior must remain identical.
Step 5.1: Identify Improvements
- Extract duplicated logic into helpers
- Rename variables and functions for clarity
- Simplify conditional logic
- Remove dead code
- Improve test readability
- For utilities with well-defined input/output contracts, consider adding property-based tests (e.g., fast-check) to catch edge cases example-based tests miss
Step 5.2: Refactor in Small Steps
After each change:
npm run test -- path/to/module.spec.ts
If any test fails, undo the last change immediately. Refactoring must never break tests.
Step 5.3: Verify Final State
npm run test -- "path/to/directory/"
npm run typecheck
npm run lint
## REFACTOR Complete
**Changes made:**
- [Refactoring 1]
- [Refactoring 2]
**All tests:** PASS
**Type check:** PASS
**Lint:** PASS
Cycle complete. Ready for next behavior or commit.
Step 5.4: Track Progress
Track progress across cycles: "Behavior N/M: [description] -- RED/GREEN/REFACTOR". This prevents losing track in long TDD sessions. Report progress after each cycle completion.
## Progress
- [x] Behavior 1/M: [description] -- COMPLETE
- [x] Behavior 2/M: [description] -- COMPLETE
- [ ] Behavior 3/M: [description]
- [ ] ...
Commit Frequency
Commit at natural TDD boundaries:
- After completing each behavior cycle for complex features (each behavior involves significant logic or multiple files)
- After every 2-3 cycles for simpler features (behaviors are straightforward, single-file changes)
- Always commit before starting a risky refactor -- if the refactor might break things, ensure the current green state is committed first
Commit messages should reference the behavior:
test: add PriceCalculator subtotal calculation
test: add discount validation and tax calculation
refactor: extract shared pricing helpers
Never commit with failing tests. Every commit must be GREEN.
Common Rationalizations
When tempted to skip TDD, consult this table (see references/tdd-rationalizations.md for deeper analysis with real-world scenarios):
| Rationalization | Rebuttal |
|---|
| "Too simple to test" | Simple code breaks. A one-line function with a typo is still a bug in production. |
| "Need exploration first" | Explore freely. Then delete the exploration and rebuild with TDD. Exploration code is throwaway. |
| "Tests after achieve same result" | Tests-after verify what the code IS. Tests-first verify what the code SHOULD BE. Only one of these catches design mistakes. |
| "TDD is dogmatic" | TDD is pragmatic: it finds bugs at write time instead of debug time. Dogma would be following it without evidence. The evidence is overwhelming. |
| "I'll write the test right after" | You won't. And if you do, you can't verify it catches the bug -- because the code already passes. |
| "This is just a config file" | If it can break production, it can have a test. If it can't break production, why are you changing it? |
| "The function is obvious" | Obvious functions get called with non-obvious inputs. The test documents what "obvious" means. |
| "I need to see the implementation shape first" | The test IS the shape. Write what you want to call, then make it work. The test is the first client of your API. |
| "Mocking is too hard for this" | Hard-to-mock means hard-to-test means bad design. Fix the design. The difficulty is the signal. |
| "We're in a hurry" | TDD is faster. Debugging untested code is what wastes time. You're not saving time, you're borrowing it at high interest. |
| "Existing code has no tests" | Start now. Every tested line is one less future debugging session. Don't perpetuate the problem because someone else created it. |
Red Flags -- Stop and Restart
If any of these occur, stop the current cycle and reassess:
- You wrote production code before a test -- Delete the code. Write the test.
- The test passes on the first run -- Your test doesn't test anything new. Rewrite it.
- You're not sure why the test fails -- You don't understand the system well enough. Read the code first.
- You added "just one more thing" in GREEN -- Revert to the last green state. Write a test for the extra thing.
- You're mocking more than two dependencies -- The unit under test has too many collaborators. Refactor the design.
- The test name describes implementation -- "should call database" is wrong. "should return user by email" is right.
- You're testing private methods -- Test the public interface. Private methods are implementation details.
- Multiple tests are failing at once -- You jumped ahead. Revert to last green. One failing test at a time.
- You refactored during GREEN -- Revert. Make it pass first, then clean up.
- The test requires complex setup (>15 lines of arrange) -- The code under test needs a simpler interface. Refactor first.
- You're writing tests to match existing code -- That's
/test-coverage, not /tdd. TDD means the test comes first.
- You feel confident enough to skip verification -- That's exactly when bugs slip through. Run the test.
Example: TDD Cycle for a Bug Fix
RED
A user reports that calculateDiscount returns negative prices for 100% discounts.
describe('calculateDiscount', () => {
it('should return zero when discount is 100%', () => {
const result = calculateDiscount(50.00, 100);
expect(result).toBe(0);
});
});
Verify RED
npm run test -- discount.spec.ts
GREEN
export function calculateDiscount(price: number, discountPercent: number): number {
const discounted = price - (price * discountPercent / 100);
return Math.max(0, discounted);
}
Verify GREEN
npm run test -- discount.spec.ts
REFACTOR
No refactoring needed for this small change. Run full suite to confirm:
npm run test -- "src/pricing/"
Commit the fix with its regression test.
Verification Checklist
Before committing, confirm every item:
When Stuck
See references/tdd-troubleshooting.md for detailed decision tables, the "wish" technique, and debugging integration guidance.
| Problem | Solution |
|---|
| Don't know what test to write | Describe the behavior in plain English first. The test is that sentence turned into code. |
| Test is too complex | Break the behavior into smaller pieces. Test each piece separately. |
| Can't make the test pass simply | The design is wrong. Step back. What interface would make this test trivial? Build that interface. |
| Too many mocks needed | The code has too many dependencies. Extract an interface, inject dependencies, or split the unit. |
| Existing code has no tests | Don't retrofit TDD. Use /test-coverage to add tests to existing code. Use /tdd for new behavior. |
| Feature is unclear | Use /explore or /plan first. Come back to /tdd when you know what to build. |
| Tests pass but behavior is wrong | Your tests don't cover the actual requirements. Write a new test for the failing scenario. |
| Refactoring breaks tests | Undo the refactoring. Refactoring should not change behavior. If it does, you're changing functionality -- write a test first. |
References
Acceptance Tests
| ID | Type | Prompt / Condition | Expected |
|---|
| TDD-T1 | Positive | "Write the test first, then implement" | Skill triggers |
| TDD-T2 | Positive | "Use red-green-refactor for this feature" | Skill triggers |
| TDD-T3 | Positive | "Test-driven development for the parser" | Skill triggers |
| TDD-T4 | Negative | "Add tests for existing code" | Does NOT trigger (→ /test-coverage) |
| TDD-T5 | Negative | "Build the login page" | Does NOT trigger (→ /implement) |
| TDD-T6 | Negative | "Run the test suite" | Does NOT trigger (→ /validate) |
| TDD-T7 | Boundary | "Implement with tests" | Does NOT trigger (→ /implement, has TDD-lite) |