
evanflow-tdd

Name: Evanflow Tdd
Author: evanklem

Vertical-slice TDD for any production code. One test → one impl → repeat. Tests verify behavior through public interfaces, not internals. Use when implementing any feature, bugfix, or behavior change.


stars:408

forks:17

updated: 2026-04-27 15:30

SKILL.md


name	evanflow-tdd
description	Vertical-slice TDD for any production code. One test → one impl → repeat. Tests verify behavior through public interfaces, not internals. Use when implementing any feature, bugfix, or behavior change.

EvanFlow: TDD

Vocabulary

See evanflow meta-skill. Key terms: vertical slice, behavior through public interface, deep module.

Core Principle

Tests verify behavior through public interfaces, not implementation details. Code can change entirely; tests shouldn't break unless behavior changes.

Good test: "user can perform action X within their weekly rate limit" — describes capability.

Bad test: "calls createX() with status 'QUEUED' then queues a job" — describes mechanics. Renames break it.
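The contrast between the two test styles can be sketched as follows. This is an illustrative sketch only: `ActionService`, `perform`, `WEEKLY_LIMIT`, and the limit of 3 are hypothetical names and values, not part of EvanFlow.

```python
from dataclasses import dataclass, field

WEEKLY_LIMIT = 3  # hypothetical limit, chosen only for illustration


@dataclass
class ActionService:
    """Hypothetical service: a user may perform an action a few times per week."""

    _counts: dict = field(default_factory=dict)

    def perform(self, user_id: str) -> bool:
        """Return True if the action is accepted, False once the limit is used up."""
        used = self._counts.get(user_id, 0)
        if used >= WEEKLY_LIMIT:
            return False
        self._counts[user_id] = used + 1
        return True


def test_user_can_act_within_weekly_limit():
    # Behavior test: calls only the public method and checks the observable result.
    service = ActionService()
    results = [service.perform("alice") for _ in range(WEEKLY_LIMIT + 1)]
    assert results == [True, True, True, False]


def test_mechanics_style_for_contrast():
    # Mechanics test (the style to avoid): it reads a private field, so renaming
    # _counts would break it even though the behavior is unchanged.
    service = ActionService()
    service.perform("alice")
    assert service._counts == {"alice": 1}


test_user_can_act_within_weekly_limit()
test_mechanics_style_for_contrast()
```

Both tests pass today, but only the first would survive replacing the dictionary with, say, a database table.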

Anti-Pattern: Horizontal Slices

DO NOT write all the tests first and then all the implementation. That produces tests of imagined behavior, not actual behavior, and such tests become insensitive to real changes.

DO vertical slices: one test → one implementation → repeat. Each test responds to what you learned from the previous cycle.

When to Use

Any production code change (new feature, bug fix, behavior change, refactor with behavior implications)
All new code in your backend's routers and services
All new code in your frontend that has testable logic (not pure-presentation components)

When to Skip (with explicit user approval)

Throwaway prototypes
Generated code (e.g., database.types.ts)
Configuration files
Pure-presentation React components with no logic

The Flow

1. Embedded Grill — "What to Test"

Before writing any test, confirm with the user:

"Which behaviors matter most? We can't test everything."
"What's the public interface — what will callers actually use?"
"Are there opportunities to make this a deep module (small interface, complex internals)?"
"Where do tests need to integrate with real services (DB, payment provider, email provider) vs. where can we test in isolation?"

Anti-tailoring check (vertical slicing's biggest risk): before each new test, ask:

"Am I pinning behavior the spec/contract names, or am I pinning the impl I've already imagined?"
"Could I write this next test knowing only the public contract, before reading any of the impl I just wrote?"
"If a different impl satisfied the same contract, would this test still pass?"

If the test only makes sense given your specific impl, it's an internals test wearing a behavior costume. Rewrite it against the contract, or drop it.
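One way to exercise the third question is to run the same contract check against two different implementations. The sketch below assumes a hypothetical `slugify` contract (lowercase, whitespace runs collapsed to a single hyphen, ends trimmed); the function names are invented for illustration.

```python
import re


def slugify_with_split(title: str) -> str:
    # Implementation A: split on whitespace, then join with hyphens.
    return "-".join(title.lower().split())


def slugify_with_regex(title: str) -> str:
    # Implementation B: trim, then replace each whitespace run with one hyphen.
    return re.sub(r"\s+", "-", title.strip().lower())


def check_slug_contract(slugify) -> None:
    # Written from the contract alone; nothing here depends on how slugify works.
    assert slugify("Hello World") == "hello-world"
    assert slugify("  Many   spaces here ") == "many-spaces-here"
    assert slugify("") == ""


# A contract-level test passes for every implementation that honors the contract.
for implementation in (slugify_with_split, slugify_with_regex):
    check_slug_contract(implementation)
```

If the check passed for only one of the two, it would be pinning that implementation rather than the contract.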

Default to integration-style tests against real services (real DB, real queue, real cache) where feasible. Mocked dependencies frequently mask divergence between test and production behavior. Document any project-specific exception in your CLAUDE.md.
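As a minimal sketch of an integration-style test, the example below uses Python's built-in `sqlite3` with an in-memory database as a stand-in for a project's test DB. The `users` table and `register_user` function are hypothetical, and a real project would normally test against the same database engine it runs in production.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # The uniqueness rule lives in the database, where a mock would not enforce it.
    conn.execute("CREATE TABLE users (email TEXT PRIMARY KEY)")


def register_user(conn: sqlite3.Connection, email: str) -> bool:
    """Return True if the user was registered, False if the email already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        return False


def test_duplicate_email_is_rejected():
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    assert register_user(conn, "a@example.com") is True
    assert register_user(conn, "a@example.com") is False
    count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    assert count == 1
    conn.close()


test_duplicate_email_is_rejected()
```

A mocked connection would happily accept the second insert, which is the kind of divergence the paragraph above warns about.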

2. Tracer Bullet

Write ONE test for ONE behavior end-to-end. Prove the path works.

RED:      Write test → run → confirm it fails for the RIGHT reason
GREEN:    Write minimal code → run → confirm it passes
REFACTOR: Clean the impl + the test you just wrote, while it's fresh and green

The REFACTOR step is non-optional and per-cycle — it happens with the test you just wrote as your safety net, not after all tests are done. Refactoring cold code days later is a different (weaker) activity; that lives in evanflow-iterate.
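A single tracer-bullet cycle might look like the sketch below. The `cart_total` function and its three stages are hypothetical, and the tiny `run_cycle_step` runner exists only to make RED and GREEN visible without a test framework.

```python
from typing import Callable, Sequence

CartTotal = Callable[[Sequence[int]], int]


def check_cart_total_sums_item_prices(cart_total: CartTotal) -> None:
    # One behavior, checked through the public function only (prices in cents).
    result = cart_total([250, 500])
    assert result == 750, f"expected 750, got {result}"


def run_cycle_step(cart_total: CartTotal) -> str:
    """Run the single test and report RED or GREEN."""
    try:
        check_cart_total_sums_item_prices(cart_total)
    except AssertionError as exc:
        return f"RED ({exc})"
    return "GREEN"


# RED: a stub that exists but does nothing useful. The test fails on the
# assertion (the right reason), not on a NameError or ImportError.
def cart_total_stub(prices: Sequence[int]) -> int:
    return 0


# GREEN: the least code that makes the test pass.
def cart_total_green(prices: Sequence[int]) -> int:
    total = 0
    for price in prices:
        total = total + price
    return total


# REFACTOR: same behavior, cleaner code and naming, guarded by the same test.
def cart_total(prices_in_cents: Sequence[int]) -> int:
    return sum(prices_in_cents)


print(run_cycle_step(cart_total_stub))   # RED (expected 750, got 0)
print(run_cycle_step(cart_total_green))  # GREEN
print(run_cycle_step(cart_total))        # GREEN
```

Seeing the RED message name the expected and actual values is what confirms the test fails for the right reason.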

3. Incremental Loop

For each remaining behavior, repeat the full RED-GREEN-REFACTOR cycle:

RED:      Write next test → fails for the right reason
GREEN:    Minimal code to pass → passes
REFACTOR: Clean before moving on (see checklist below)

Rules:

One test at a time
Only enough code to pass the current test
Don't anticipate future tests
Tests focus on observable behavior, not internals
Never skip REFACTOR. "I'll clean it up later" is how dead code, duplication, and shallow modules accumulate.

4. Per-Cycle Refactor Checklist

After each GREEN, before writing the next failing test, scan the just-touched code:

Duplication — extract if used twice with the same intent (not just structurally similar)
Naming — does the new name match what the code does? Rename now, while the test pins behavior
Deletion test — does the new module/function earn its existence, or did GREEN add bloat?
Deep-module check — small interface hiding the complexity, or shallow wrapper leaking it?
Test cleanliness — does the test still describe behavior crisply? Names, setup, assertion all clear?

Run tests after each refactor step. Never refactor while RED — get to GREEN first.

If a refactor would change behavior, stop: that's a new test, not a refactor.
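The duplication item of the checklist can be sketched as follows. All names here (`receipt_line`, `total_line`, `format_money`) are hypothetical; the point is that the same behavior check passes before and after the extraction.

```python
# Before the refactor: GREEN left the same money formatting in two places.
def receipt_line_before(name: str, cents: int) -> str:
    return f"{name}: ${cents // 100}.{cents % 100:02d}"


def total_line_before(cents: int) -> str:
    return f"TOTAL: ${cents // 100}.{cents % 100:02d}"


# After the refactor: the shared intent ("format cents as dollars") is extracted once.
def format_money(cents: int) -> str:
    return f"${cents // 100}.{cents % 100:02d}"


def receipt_line(name: str, cents: int) -> str:
    return f"{name}: {format_money(cents)}"


def total_line(cents: int) -> str:
    return f"TOTAL: {format_money(cents)}"


def check_receipt_behavior(receipt_line_fn, total_line_fn) -> None:
    # The test pins output only, so it is unaffected by the extraction.
    assert receipt_line_fn("Tea", 250) == "Tea: $2.50"
    assert total_line_fn(1005) == "TOTAL: $10.05"


# Run the same check after the refactor step; behavior must be unchanged.
check_receipt_behavior(receipt_line_before, total_line_before)
check_receipt_behavior(receipt_line, total_line)
```

If the extraction had required changing an expected string, that would be a behavior change and therefore a new test, not a refactor.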

5. Macro Refactor (deferred to `evanflow-iterate`)

Cross-cutting refactors that span the whole feature (extracting a shared module across multiple cycles, pulling out a deeper abstraction, restructuring the file layout) belong in evanflow-iterate's self-review pass — after all per-cycle refactors are done. Don't conflate the two: per-cycle refactor uses a fresh test as safety net; macro refactor uses the whole test suite.

Per-Cycle Checklist

[ ] Test describes behavior, not implementation
[ ] Test uses public interface only
[ ] Test would survive an internal refactor (rename, restructure)
[ ] Code is minimal for this test
[ ] No speculative features added
[ ] Test fails for the right reason before code is written
[ ] ASSERTION IS CORRECT — see warning below

⚠️ Assertion-Correctness Warning

Industry research (HumanEval evaluation across four LLMs) found that over 62% of LLM-generated test assertions were incorrect. This is the single most likely failure mode in LLM-driven TDD: the test passes, but it's testing the wrong thing.

Before writing any test assertion, verify:

Does this assertion match what the user actually wants? Don't assert on behavior you imagined — assert on behavior the spec/contract names.
Is this the assertion's most-precise form? "result is truthy" is weaker than "result equals 42". Loose assertions let wrong results pass and fail to pin down the right one.
Would this assertion still pass if the code were subtly wrong? Mentally introduce a one-character bug: does the assertion catch it? If not, the assertion is too weak.
Are you asserting on the right field? A common failure: asserting response.status when the meaningful field is response.body.error.
For computed values: did you compute the expected value correctly? Don't trust your own arithmetic — verify by hand or another path.

When in doubt about what to assert, STOP and ask the user rather than guess. An asserted-on-the-wrong-thing test is worse than no test — it provides false confidence.
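The one-character-bug check can be tried mechanically. In this sketch, `apply_discount` and its buggy twin are hypothetical, and `catches` is a small helper written for the example rather than part of any test framework.

```python
from typing import Callable

Discount = Callable[[int, int], int]


def apply_discount(cents: int, percent: int) -> int:
    # Correct version: subtract the discount.
    return cents - cents * percent // 100


def apply_discount_buggy(cents: int, percent: int) -> int:
    # One-character bug: '+' instead of '-'.
    return cents + cents * percent // 100


def weak_assertion(discount: Discount) -> None:
    # "result is truthy": passes for any non-zero result.
    assert discount(1000, 10)


def precise_assertion(discount: Discount) -> None:
    # Expected value checked by hand: 10% of 1000 is 100, and 1000 - 100 = 900.
    assert discount(1000, 10) == 900


def catches(assertion: Callable[[Discount], None], discount: Discount) -> bool:
    """Return True if the assertion fails (that is, catches a problem) for this code."""
    try:
        assertion(discount)
    except AssertionError:
        return True
    return False


print(catches(weak_assertion, apply_discount_buggy))     # False: the bug slips through
print(catches(precise_assertion, apply_discount_buggy))  # True: the bug is caught
print(catches(precise_assertion, apply_discount))        # False: correct code passes
```

The weak assertion gives exactly the false confidence described above: it is green for both the correct and the buggy code.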

Hard Rules

Vertical slices only. Never write all tests first.
REFACTOR is per-cycle, not deferred. Every GREEN is followed by a refactor pass on the just-written code, with the fresh test as safety net. Deferring all refactor to the end strips the safety net and is the most common way TDD-shaped code ends up with TDD-shaped scars.
Test behavior, not internals. If a rename breaks a test but behavior didn't change, the test was wrong.
Watch the test fail. If you didn't see RED, you don't know it tests the right thing.
Never auto-commit. The TDD cycle is RED-GREEN-REFACTOR, not RED-GREEN-REFACTOR-COMMIT.
Default to real services for integration tests. Mocked databases routinely diverge from production behavior — prefer a test DB unless your project documents a specific exception.

Hand-offs

Tests + impl complete for the task → return to evanflow-executing-plans to mark task done
Discovered the interface is wrong → evanflow-design-interface to redesign
Discovered deeper architectural issue → evanflow-improve-architecture

related-skills.json

Same repository

evanflow-go.md

from "evanklem/evanflow"

Single entry-point orchestrator for the entire EvanFlow loop. The user says "let's evanflow this", "use evanflow", "evanflow this idea", "run this through evanflow", "implement with evanflow", or any similar phrasing — and you walk the full loop (brainstorm → plan → execute → tdd → iterate → STOP) end-to-end, announcing each step, respecting checkpoints, and stopping where the user retains control. Trigger keywords: "evanflow", "let's evanflow", "use evanflow", "evanflow this".

updated: 2026-04-27 · stars: 408

evanflow.md

from "evanklem/evanflow"

Meta skill for the EvanFlow system. Loads the shared vocabulary (deep modules, deletion test, vertical slice, grill, mockup quick-mode, no-auto-commit) and describes when to invoke each evanflow-* skill. Use when starting a new task and unsure which evanflow skill applies, or when you need to ground reasoning in the shared vocabulary.

updated: 2026-04-27 · stars: 408

evanflow-brainstorming.md

from "evanklem/evanflow"

Clarify intent, propose 2-3 approaches, embedded grill to stress-test the chosen path. Use before any creative work — new features, components, behavior changes, design questions. Mockup-only requests use mockup quick-mode (no spec/plan ceremony).

updated: 2026-04-27 · stars: 408

evanflow-coder-overseer.md

from "evanklem/evanflow"

Orchestrate parallel implementation with coder/overseer pairs. Coders implement decomposed tasks using evanflow-tdd; overseers review each coder's output for bugs, gaps, errors, AND cohesion violations against a shared contract. A final integration overseer checks cross-coder cohesion. Use for plans with 3+ truly independent tasks that share an interface contract.

updated: 2026-04-27 · stars: 408

evanflow-compact.md

from "evanklem/evanflow"

Manage long-session context to prevent drift and degradation. Strategies for proactive summarization, branch isolation, and /clear decisions. Invoke when context feels heavy, when accuracy starts slipping, or proactively after a major phase boundary. Addresses the

updated: 2026-04-27 · stars: 408

evanflow-debug.md

from "evanklem/evanflow"

Root-cause discipline for bugs, test failures, and unexpected behavior. Embedded grill on the hypothesis before writing fix code. Use when encountering any bug, failing test, or behavior that doesn't match expectation.

updated: 2026-04-27 · stars: 408

package.json

"author": "evanklem"

"repository": "evanklem/evanflow"


