forge-tdd
// WHEN: About to write any production code. HARD-GATE: Iron law - write test first, watch fail, write minimal code, watch pass. No exceptions.
| name | forge-tdd |
| description | WHEN: About to write any production code. HARD-GATE: Iron law - write test first, watch fail, write minimal code, watch pass. No exceptions. |
| type | rigid |
| version | 1.0.1 |
| preamble-tier | 3 |
| triggers | ["write test first","TDD","test-driven development","RED test before code"] |
| allowed-tools | ["Bash","Write"] |
HARD-GATE: Non-negotiable. No production code without failing test first.
| Rationalization | The Truth |
|---|---|
| "This is a simple feature, TDD feels slow" | Simplicity hides the hardest bugs. TDD catches them. Slow startup, faster debugging. Net win. |
| "I'll write tests after to save time" | You won't. Post-hoc tests miss 40% of edge cases TDD would have caught. Write first. |
| "The spec is clear enough, I don't need tests to clarify it" | No spec is ever clear enough. Test is the spec. Test forces you to think through edge cases. |
| "I can skip the test-run-fail step, I know it will fail" | NO. You MUST run it and see the failure. Seeing the failure teaches you what you're fixing. |
| "Our codebase doesn't do TDD, I'll follow convention" | YOU are disciplined. Convention doesn't override discipline. Do it. |
| "I'll test as I code (test-parallel) instead of test-first" | Wrong order. Test FIRST, then code. The order matters. Test-first catches what test-parallel misses. |
| "This code is internal/hidden, no one will use it, I'll skip TDD" | Internal code is harder to test and debug. TDD is MORE important, not less. |
| "I already know what tests to write, I'll code first" | You don't. Writing code first blinds you to edge cases. Test first clarifies. |
| "The test infrastructure is broken, I'll work around it" | STOP. Report BLOCKED. Don't workaround. Fix or escalate. |
| "I wrote a test that passes but it doesn't actually test anything" | Weak tests are worse than no tests. Test must verify behavior, not just syntax. |
| "RED tests only mirror the tech plan, not approved QA cases" | When qa/manual-test-cases.csv exists for the task, RED should map to those atomic rows (or explicitly document gaps). Otherwise TDD and P4.4 semantic machine eval drift from the acceptance inventory the team signed. |
NO PRODUCTION CODE EXISTS BEFORE THE TEST.
If you write 1 line of code without a test first, you have failed.
If you notice any of these, STOP and do not proceed:
Inputs: delivery_mechanism + implementation_stack (or legacy ui_implementation_stack) from prd-locked.md Q10, plus approved QA rows when present.
Write a test that describes the desired behavior
Run the test
Success Criterion: Red test fails with clear, meaningful error.
When qa/manual-test-cases.csv exists for the task, tie each RED test to an acceptance Id and/or a qa/semantic-automation.csv step Id using a comment the verifier scans:
def test_valid_login_succeeds(self):
    # forge-tdd: TC-001 (manual-test-cases.csv)
    # forge-tdd: step-login (semantic-automation.csv)
    ...
Marker lines start with # forge-tdd:. Parenthetical hints are optional.
A Required column on the manual CSV (yes / true / 1 / y) means at least one scanned test must include # forge-tdd: <that Id>.
Scan scope under ~/forge/brain/prds/<task-id>/: all test_*.py and *_test.py files, or only the paths/globs listed in qa/tdd-scan-paths.txt (one entry per line, relative to the task dir).
python3 tools/verify/verify_forge_task.py --verify-tdd-csv-trace fails the build when markers reference unknown Ids or required rows lack markers.
forge_drift_check.py adds these # forge-tdd: lines to the success-criteria haystack by default (use --skip-tdd-marker-hay to omit them).
Write minimal code to make the test pass
Run the test
Success Criterion: Test passes. Only minimal code added.
Refactor the implementation (not the test)
Run tests again
Stop refactoring once the code is clean; do not expand scope
Success Criterion: Tests still pass, code is cleaner, scope unchanged.
Example:
# TASK: Add method to validate email format
# BAD TEST: Tests only that the function exists and returns something
def test_validate_email():
    assert email_validator.validate_email("test@example.com") is not None

# GOOD TEST: Tests the specific behavior
def test_validate_email_accepts_valid_address():
    assert email_validator.validate_email("test@example.com") == True

def test_validate_email_rejects_invalid_address():
    assert email_validator.validate_email("invalid") == False

def test_validate_email_rejects_missing_at_symbol():
    assert email_validator.validate_email("testexample.com") == False
# Run your new test
$ pytest test_email_validator.py
# MUST see: Test FAILED / Red
Do not proceed until you see the test fail.
# MINIMAL implementation
def validate_email(email: str) -> bool:
    return "@" in email and "." in email.split("@")[1]
$ pytest test_email_validator.py
# MUST see: Test PASSED / Green
Do not proceed until the test passes.
$ pytest # All tests, not just the new one
# MUST see: All tests still pass
If existing tests now fail, you broke something. Fix it before proceeding.
Only after tests pass, ask: can the code be cleaner without changing behavior?
If yes → refactor. Then re-run all tests to verify.
Read existing tests in the project. Copy the pattern. Ask for clarification if needed.
Report BLOCKED: Test infrastructure broken [details]. Do not attempt workarounds. Escalate.
These are typically not TDD-able. But: if possible, write a test that verifies the logging/docs exist and are correct. If not possible, report NEEDS_CONTEXT: Task not TDD-compatible [reason].
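Where the logging half is verifiable, a minimal sketch of such a test (assuming pytest's caplog fixture; process_order and its log message are hypothetical):

import logging

def test_process_order_logs_completion(caplog):
    with caplog.at_level(logging.INFO):
        process_order(order_id=42)  # hypothetical function under test
    assert "order 42 processed" in caplog.text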
That's normal and correct. Complex behavior requires complex tests. Tests are not overhead; they're part of the implementation.
Stop. Report NEEDS_CONTEXT: Unclear acceptance criteria [details]. Don't guess.
You probably wrote too much code. Roll back. Write LESS code. Make the first test pass. Then write the next test.
Don't. Refactor ONLY the code you touched. Out-of-scope refactoring is not part of the task.
Create a test file. Use standard test framework for the language. Start with one test. But: if test infrastructure is fundamentally broken, report BLOCKED.
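A sketch of that first file (assuming pytest; reusing the email example from above):

# test_email_validator.py — start with a single RED test
from email_validator import validate_email  # hypothetical module under test

def test_validate_email_accepts_valid_address():
    assert validate_email("test@example.com") == True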
✅ PASS:
❌ BLOCKED:
Before claiming task is done, verify:
# 1. New test exists
$ grep "def test_" <test_file> # New test present?
# 2. Test passes
$ pytest <test_file> # All tests in file pass?
# 3. All tests pass
$ pytest # No regressions?
# 4. Code is minimal
$ git diff <file> # Only necessary changes? No extra features?
# 5. Code is clear
# Read the code. Is it obvious what it does?
If all 5 pass → done. Otherwise → iterate.
Situation: Test is valid and correct, but takes 15+ seconds to run. Each RED-GREEN cycle takes minutes instead of seconds.
Example: Database integration test that creates 1000 records and validates queries. Valid test, but too slow for fast feedback loop.
Do NOT: Skip the test or reduce test scope because it's slow. Slow tests catch real issues.
Action:
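One common pattern (a sketch, not a prescription, assuming pytest with a slow marker registered in pytest.ini; the test name is illustrative): tag the slow test so the inner RED-GREEN loop runs only fast tests, and run the full suite before merge.

import pytest

@pytest.mark.slow  # inner loop runs: pytest -m "not slow"
def test_bulk_insert_queries_stay_consistent():
    ...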
Situation: Test passes sometimes, fails sometimes. No clear pattern (timing, state, environment).
Example: Test that polls for eventual consistency; sometimes data appears in 10ms, sometimes 500ms. Test has hard-coded 50ms wait.
Do NOT: Accept flakiness or increase timeouts. Flakiness is a real bug in code or test.
Action:
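One deterministic remedy for the hard-coded wait (a sketch; save_user_profile and fetch_user_profile are hypothetical): poll against a deadline instead of sleeping a fixed interval.

import time

def wait_for(predicate, timeout=5.0, interval=0.05):
    # Poll until the predicate holds or the deadline passes.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False

def test_profile_becomes_visible_after_write():
    save_user_profile(user_id=7, name="Ada")  # hypothetical write
    assert wait_for(lambda: fetch_user_profile(7) is not None)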
Situation: Test is valid, but requires external service that's not running or accessible.
Example: Test for payment gateway integration requires live API connection. API service is down.
Do NOT: Skip the test or mock the service permanently. Integration tests must eventually test real integration.
Action:
Situation: Code to test is tightly coupled to global state, static calls, or framework internals. Cannot unit test without refactoring code itself.
Example: Class that calls Database.getInstance().query() globally; Database is a singleton with no way to inject a test double.
Do NOT: Skip TDD or write test after code. Tight coupling is the problem.
Action:
# BEFORE: tightly coupled
class UserRepository:
    def get_user(self, id):
        return Database.get_instance().query(...)

# AFTER: injectable
class UserRepository:
    def __init__(self, database):
        self.db = database

    def get_user(self, id):
        return self.db.query(...)
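With the dependency injectable, the unit test can supply a double (a sketch; FakeDatabase and the returned row are illustrative):

class FakeDatabase:
    def query(self, *args, **kwargs):
        return {"id": 1, "name": "Ada"}

def test_get_user_returns_row_from_database():
    repo = UserRepository(FakeDatabase())
    assert repo.get_user(1) == {"id": 1, "name": "Ada"}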
Situation: Behavior cannot be verified with unit test alone. Service boundary requires integration test (multiple services, eventual consistency, network behavior).
Example: Feature: "Cache invalidated when user data changes". Requires backend write → cache invalidation → frontend read. One service cannot test alone.
Do NOT: Force a unit test for inherently distributed behavior. Integration tests are valid TDD.
Action:
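An integration-style RED test for the cache example might look like this (a sketch reusing the wait_for helper from the flaky-test section; the api client is hypothetical):

def test_cache_invalidated_after_user_update():
    api.update_user(7, name="Ada")  # backend write triggers invalidation
    assert wait_for(lambda: api.get_user(7)["name"] == "Ada")  # fresh read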
Use this tree to decide which test to write first:
START: I need to test behavior X
Q1: Does X require multiple services/processes?
├─ YES → Q2
└─ NO → UNIT TEST (test single component in isolation)
Q2: Does X depend on eventual consistency, timing, or network?
├─ YES → INTEGRATION TEST (test end-to-end, multiple services)
└─ NO → Could be unit test with mocks (see Q3)
Q3: Is the external service/boundary testable in isolation?
├─ NO → INTEGRATION TEST (no way around it)
├─ YES, easy to mock → UNIT TEST (mock the boundary)
└─ YES, but mocking loses important behavior → INTEGRATION TEST (test real behavior)
Q4: Is the integration test too slow (>5 sec)?
├─ YES → Split: UNIT TEST (fast path) + INTEGRATION TEST (slow path, run separately)
└─ NO → Single INTEGRATION TEST suffices
DECISION RULE:
- Unit test for single component logic (validation, calculation, formatting)
- Integration test for multi-component behavior (contracts, APIs, eventual consistency)
- Always write test first (RED), regardless of type
- Fast feedback: prefer unit tests for development
- Final verification: integration tests before merge
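For the Q3 "easy to mock" branch, a unit test can stub the boundary (a sketch, assuming unittest.mock; compute_invoice_total and the rates service are hypothetical):

from unittest.mock import Mock

def test_invoice_total_includes_regional_tax():
    rates = Mock()
    rates.for_region.return_value = 0.2  # stub the external boundary
    assert compute_invoice_total(100.0, region_rates=rates) == 120.0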
Output: TDD PASS (test first, minimal code, all tests pass) or BLOCKED (test infrastructure broken, untestable legacy code, infrastructure unavailable after attempts to restore)
TDD is not about writing tests. It's about:
TDD feels slow at first. After the RED phase (writing the test), you're 60% done. The code is the easy 40%.
This skill is RIGID: type=rigid. Do not bend it.
TDD is the foundation. Every other discipline depends on it.
Note: This version includes edge cases and decision tree for complex testing scenarios (slow tests, flaky tests, infrastructure dependencies, legacy code, integration vs. unit testing).