ワンクリックで
evanflow-tdd
// Vertical-slice TDD for any production code. One test → one impl → repeat. Tests verify behavior through public interfaces, not internals. Use when implementing any feature, bugfix, or behavior change.
// Vertical-slice TDD for any production code. One test → one impl → repeat. Tests verify behavior through public interfaces, not internals. Use when implementing any feature, bugfix, or behavior change.
| name | evanflow-tdd |
| description | Vertical-slice TDD for any production code. One test → one impl → repeat. Tests verify behavior through public interfaces, not internals. Use when implementing any feature, bugfix, or behavior change. |
See evanflow meta-skill. Key terms: vertical slice, behavior through public interface, deep module.
Tests verify behavior through public interfaces, not implementation details. Code can change entirely; tests shouldn't break unless behavior changes.
Good test: "user can perform action X within their weekly rate limit" — describes capability.
Bad test: "calls createX() with status 'QUEUED' then queues a job" — describes mechanics. Renames break it.
DO NOT write all tests first then all implementation. That produces tests of imagined behavior, not actual behavior. They become insensitive to real changes.
DO vertical slices: one test → one implementation → repeat. Each test responds to what you learned from the previous cycle.
database.types.ts)Before writing any test, confirm with the user:
Anti-tailoring check (vertical slicing's biggest risk): before each new test, ask:
If the test only makes sense given your specific impl, it's an internals test wearing a behavior costume. Rewrite it against the contract, or drop it.
Default to integration-style tests against real services (real DB, real queue, real cache) where feasible. Mocked dependencies frequently mask divergence between test and production behavior. Document any project-specific exception in your CLAUDE.md.
Write ONE test for ONE behavior end-to-end. Prove the path works.
RED: Write test → run → confirm it fails for the RIGHT reason
GREEN: Write minimal code → run → confirm it passes
REFACTOR: Clean the impl + the test you just wrote, while it's fresh and green
The REFACTOR step is non-optional and per-cycle — it happens with the test you just wrote as your safety net, not after all tests are done. Refactoring cold code days later is a different (weaker) activity; that lives in evanflow-iterate.
For each remaining behavior, repeat the full RED-GREEN-REFACTOR cycle:
RED: Write next test → fails for the right reason
GREEN: Minimal code to pass → passes
REFACTOR: Clean before moving on (see checklist below)
Rules:
After each GREEN, before writing the next failing test, scan the just-touched code:
Run tests after each refactor step. Never refactor while RED — get to GREEN first.
If a refactor would change behavior, stop: that's a new test, not a refactor.
evanflow-iterate)Cross-cutting refactors that span the whole feature (extracting a shared module across multiple cycles, pulling out a deeper abstraction, restructuring the file layout) belong in evanflow-iterate's self-review pass — after all per-cycle refactors are done. Don't conflate the two: per-cycle refactor uses a fresh test as safety net; macro refactor uses the whole test suite.
[ ] Test describes behavior, not implementation
[ ] Test uses public interface only
[ ] Test would survive an internal refactor (rename, restructure)
[ ] Code is minimal for this test
[ ] No speculative features added
[ ] Test fails for the right reason before code is written
[ ] ASSERTION IS CORRECT — see warning below
Industry research (HumanEval evaluation across four LLMs) found that over 62% of LLM-generated test assertions were incorrect. This is the single most likely failure mode in LLM-driven TDD: the test passes, but it's testing the wrong thing.
Before writing any test assertion, verify:
response.status when the meaningful field is response.body.error.When in doubt about what to assert, STOP and ask the user rather than guess. An asserted-on-the-wrong-thing test is worse than no test — it provides false confidence.
evanflow-executing-plans to mark task doneevanflow-design-interface to redesignevanflow-improve-architectureSingle entry-point orchestrator for the entire EvanFlow loop. The user says "let's evanflow this", "use evanflow", "evanflow this idea", "run this through evanflow", "implement with evanflow", or any similar phrasing — and you walk the full loop (brainstorm → plan → execute → tdd → iterate → STOP) end-to-end, announcing each step, respecting checkpoints, and stopping where the user retains control. Trigger keywords: "evanflow", "let's evanflow", "use evanflow", "evanflow this".
Meta skill for the EvanFlow system. Loads the shared vocabulary (deep modules, deletion test, vertical slice, grill, mockup quick-mode, no-auto-commit) and describes when to invoke each evanflow-* skill. Use when starting a new task and unsure which evanflow skill applies, or when you need to ground reasoning in the shared vocabulary.
Clarify intent, propose 2-3 approaches, embedded grill to stress-test the chosen path. Use before any creative work — new features, components, behavior changes, design questions. Mockup-only requests use mockup quick-mode (no spec/plan ceremony).
Orchestrate parallel implementation with coder/overseer pairs. Coders implement decomposed tasks using evanflow-tdd; overseers review each coder's output for bugs, gaps, errors, AND cohesion violations against a shared contract. A final integration overseer checks cross-coder cohesion. Use for plans with 3+ truly independent tasks that share an interface contract.
Manage long-session context to prevent drift and degradation. Strategies for proactive summarization, branch isolation, and /clear decisions. Invoke when context feels heavy, when accuracy starts slipping, or proactively after a major phase boundary. Addresses the
Root-cause discipline for bugs, test failures, and unexpected behavior. Embedded grill on the hypothesis before writing fix code. Use when encountering any bug, failing test, or behavior that doesn't match expectation.