
reviewing-test-design

Stars: 5

Forks: 1

Updated: May 15, 2026, 18:04

Evaluates test quality using Dave Farley's 8 properties. Use when reviewing tests, assessing test suite quality, or analyzing test effectiveness against TDD best practices.


Source

bnadlerjr

bnadlerjr/dotfiles



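Related occupations (SOC)

Based on the SOC occupational classification

Software Quality Assurance Analysts and Testers · Computer and Mathematical Occupations · SOC 15-1253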


SKILL.md


Other skills in this repository

Same repository

planning-tdd

bnadlerjr/dotfiles

Creates TDD implementation plans where tests ARE the plan. Specifies test specs and structural context per phase — implementation emerges during execution, not planning. Verification is fully automated; no manual verification step. Use when planning features test-first, creating TDD plans from design artifacts, or when the user asks for a test-driven implementation plan. NOT for executing a plan test-first (the Red-Green-Refactor cycle during implementation) — that is practicing-tdd.

2026-06-23 · 5 stars

decomposing-epics

bnadlerjr/dotfiles

Breaks an epic into an ordered backlog of vertical, sprint-sized user stories. Use when splitting an epic into stories, planning a feature's story backlog, or when a story feels too large but isn't ready for task breakdown. Produces 4-8 demoable stories, each independently shippable and grounded in user-visible behavior. NOT for sub-story implementation slicing (see slicing-elephant-carpaccio) or for writing one story's full BDD spec (see writing-agile-stories).

2026-06-23 · 5 stars

slicing-elephant-carpaccio

bnadlerjr/dotfiles

Slices a single feature or story into ultra-thin vertical increments (minutes-to-hours each) using Alistair Cockburn's Elephant Carpaccio methodology. Use during implementation planning when one story or feature is too large to build in a session and you need demoable increments across layers. Produces an ordered backlog of 10-20 thin slices, each independently working, testable, and demoable. NOT for splitting an epic into sprint-sized stories (see decomposing-epics).

2026-06-23 · 5 stars

writing-dev-tasks

bnadlerjr/dotfiles

Write well-scoped, verifiable development tasks for non-user-facing work — refactors, test work, and dependency/tooling changes. Use to "write a dev task", create a "refactor task", a "tech-debt ticket", a "test task", or a "dependency upgrade task"; to capture "non-user-facing work" or a "technical task"; or to write a "definition of done for a refactor". Produces an implementation-focused task with a verifiable Definition of Done. For user-facing behavior, use writing-agile-stories instead.

2026-06-20 · 5 stars

creating-agent-skills

bnadlerjr/dotfiles

Expert guidance for creating, writing, and refining Claude Code Skills. Use when working with SKILL.md files, authoring new skills, improving existing skills, or understanding skill structure and best practices.

2026-06-01 · 5 stars

drafting-stack-pr-comments

bnadlerjr/dotfiles

Examines a git-machete stack of dependent branches and drafts preemptive PR comments warning reviewers that code in an earlier PR is already refactored, renamed, moved, deleted, or fixed later in the stack. Output is DRAFT ONLY — the user pastes it manually; the skill never posts to GitHub. Use for stacked PRs, a git-machete stack, when reviewers comment on code that is "already fixed/refactored/handled later in the stack", to preempt reviewer comments on superseded code, to draft PR comments for a branch stack, or to find code in an earlier branch that a later branch supersedes.

2026-06-01 · 5 stars

```yaml
---
name: reviewing-test-design
description: Evaluates test quality using Dave Farley's 8 properties. Use when reviewing tests, assessing test suite quality, or analyzing test effectiveness against TDD best practices.
argument-hint: <test file, directory, or glob pattern>
context: fork
agent: Explore
model: sonnet
---
```

Reviewing Test Design

Evaluate test quality using Dave Farley's 8 properties of good tests.

Quick Start

```
/reviewing-test-design test/models/user_test.exs
```

1. Read the test file(s) specified in $ARGUMENTS
2. Score each of the 8 properties on a 1-10 scale
3. Invoke `scripts/farley-score.sh` to compute the weighted Farley Score
4. Provide prioritized recommendations

Your Expertise

You specialize in evaluating test quality using Dave Farley's testing principles. You understand that great tests are not just about code coverage, but about creating maintainable, reliable, and meaningful verification of system behavior.

Inputs

$ARGUMENTS: test file paths, directories, or glob patterns to review

If $ARGUMENTS is empty, ask which test files to review.
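As a rough illustration of the Inputs step, the sketch below expands a mix of file paths, directories, and glob patterns into a sorted list of test files. The helper `resolve_targets` and the test-file naming patterns it searches for inside directories are assumptions, not part of the skill; an empty result corresponds to the case where the reviewer should ask which files to review.

```python
import glob
from pathlib import Path

# Hypothetical naming conventions used when an argument is a directory.
TEST_NAME_PATTERNS = ("*_test.*", "test_*.*", "*.test.*", "*_spec.*")


def resolve_targets(arguments: list[str]) -> list[Path]:
    """Expand files, directories, and glob patterns into unique test files."""
    found: set[Path] = set()
    for arg in arguments:
        path = Path(arg)
        if path.is_file():
            found.add(path)
        elif path.is_dir():
            # Walk the directory and keep files that look like tests.
            for pattern in TEST_NAME_PATTERNS:
                found.update(p for p in path.rglob(pattern) if p.is_file())
        else:
            # Anything else is treated as a glob; "**" recurses into subdirectories.
            found.update(
                Path(p) for p in glob.glob(arg, recursive=True) if Path(p).is_file()
            )
    return sorted(found)
```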

Review Process

1. Read the tests thoroughly before examining implementation code
2. Load references:
   - Always load `anti-patterns.md`: universal patterns (mirror tests, tautological assertions) applicable in every stack
   - If the tests use Elixir/Ruby/JS/TS/Python, additionally load `language-patterns.md` for stack-specific red flags
   - If the review is likely to recommend removals or refactoring, load `preservation.md` for guardrails and the action-summary format
3. Evaluate each property independently with specific evidence
4. Provide concrete examples from the code for each score
5. Suggest specific improvements with code examples where helpful
6. Invoke `scripts/farley-score.sh` to compute the Farley Score and present it with the breakdown
7. Prioritize recommendations by impact
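The reference-loading step of the process can be sketched as a small decision function. This is a hypothetical helper that assumes the stack is inferred from file extensions; the skill itself leaves that judgment, and the decision about likely removals, to the reviewer.

```python
from pathlib import Path

# Assumed extension-to-stack mapping for the stacks covered by language-patterns.md.
LANGUAGE_PATTERN_EXTENSIONS = {
    ".ex", ".exs",          # Elixir
    ".rb",                  # Ruby
    ".js", ".jsx", ".mjs",  # JavaScript
    ".ts", ".tsx",          # TypeScript
    ".py",                  # Python
}


def references_to_load(test_files: list[str], expects_removals: bool) -> list[str]:
    """Return the reference files to load, in the order the process lists them."""
    refs = ["anti-patterns.md"]  # universal anti-patterns: always loaded
    if any(Path(f).suffix in LANGUAGE_PATTERN_EXTENSIONS for f in test_files):
        refs.append("language-patterns.md")
    if expects_removals:
        refs.append("preservation.md")
    return refs
```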

Evaluation Framework

Score each test file or test suite against these eight properties on a 1-10 scale.

1. Understandable (U)

- 10: Tests read like specifications; behavior is crystal clear without reading implementation
- 7-9: Tests are clear with minor ambiguities; intent is mostly obvious
- 4-6: Tests require some code inspection to understand purpose
- 1-3: Tests are cryptic; heavy reliance on implementation details

2. Maintainable (M)

- 10: Tests use proper abstractions; changes to implementation rarely break tests
- 7-9: Good separation of concerns; occasional brittleness
- 4-6: Some coupling to implementation; moderate refactoring pain
- 1-3: Tightly coupled to implementation; tests break with minor changes

3. Repeatable (R)

- 10: Tests are deterministic; same result every time, anywhere
- 7-9: Rarely flaky; minimal environmental dependencies
- 4-6: Occasional flakiness; some timing or state dependencies
- 1-3: Frequently inconsistent; relies on external state or timing
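To make the Repeatable rubric concrete, here is a minimal sketch with a hypothetical `is_business_hours` function. A test that reads the real clock can pass or fail depending on when the suite runs; a test that passes in a fixed instant gives the same result every time.

```python
from datetime import datetime


def is_business_hours(now: datetime) -> bool:
    """Hypothetical code under test: weekdays, 09:00 up to (not including) 17:00."""
    return now.weekday() < 5 and 9 <= now.hour < 17


# Low Repeatable: the outcome depends on when the suite happens to run.
#   assert is_business_hours(datetime.now())


# Higher Repeatable: each test supplies a fixed instant, so it is deterministic.
def test_open_on_a_weekday_morning():
    assert is_business_hours(datetime(2024, 1, 3, 10, 0))  # a Wednesday


def test_closed_on_saturday():
    assert not is_business_hours(datetime(2024, 1, 6, 10, 0))  # a Saturday
```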

4. Atomic (A)

- 10: Tests are completely isolated; no shared state; parallelizable
- 7-9: Mostly isolated; minor dependencies between tests
- 4-6: Some shared state; test order sometimes matters
- 1-3: Heavy interdependencies; tests must run in specific order
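The sketch below, built around a hypothetical `Cart` class, contrasts tests that share module-level state (order matters, so Atomic scores low) with a test that builds its own fixture and can run in any order.

```python
class Cart:
    """Hypothetical code under test."""

    def __init__(self) -> None:
        self.items: list[str] = []

    def add(self, item: str) -> None:
        self.items.append(item)


# Low Atomic: both tests mutate one module-level cart, so the second
# assertion holds only if the first test already ran.
shared_cart = Cart()


def test_add_first_item_shared():
    shared_cart.add("apple")
    assert shared_cart.items == ["apple"]


def test_add_second_item_shared():
    shared_cart.add("pear")
    assert shared_cart.items == ["apple", "pear"]  # order-dependent


# Higher Atomic: a fresh fixture per test; safe to reorder or parallelize.
def test_add_item_isolated():
    cart = Cart()
    cart.add("pear")
    assert cart.items == ["pear"]
```

Running `test_add_second_item_shared` on its own fails, which is exactly the order dependence the 4-6 and 1-3 bands describe.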

5. Necessary (N)

Before scoring, ask of each test: what production defect class does this catch? Name the smallest possible bug that would cause this test to fail but no other test.

- 10: Every test catches a distinct, real defect class; no test could be deleted without losing safety.
- 7-9: Most tests catch real defects; minor redundancy with higher-level tests.
- 4-6: For some tests, the only catchable defect is "a literal changed in one place but not the matching place." Such tests are tautological: they add maintenance cost without adding safety.
- 1-3: Multiple tests mirror source literals, or assert facts already guaranteed by the type system or compiler.

A "pins a contract" justification is valid only when both conditions hold:

1. The contract is consumed by a system outside this codebase (an external API client, a serialized schema, a published artifact), AND
2. No higher-level test (integration, end-to-end, snapshot) already pins it.

Tests that assert `result.field == "literal that appears verbatim in the source under test"` with no transformation between source and assertion are tautological. Score Necessary ≤4 for such tests, regardless of how the test is named.
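A minimal sketch of the distinction, using a hypothetical `new_user` function under test. The first test only restates a literal that appears verbatim in the source; the second exercises a transformation, so it names a real defect class.

```python
DEFAULT_ROLE = "member"


def new_user(name: str) -> dict:
    """Hypothetical code under test."""
    return {"name": name.strip().title(), "role": DEFAULT_ROLE}


# Tautological: "member" appears verbatim in the source with no transformation.
# The only defect it catches is the literal changing in one place but not the other.
def test_role_mirrors_source_literal():
    assert new_user("ada")["role"] == "member"


# Necessary: fails if the normalization breaks (e.g. strip() or title() is dropped).
def test_name_is_normalized():
    assert new_user("  ada lovelace ")["name"] == "Ada Lovelace"
```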

6. Granular (G)

- 10: Each test asserts one thing; failures pinpoint exact issues
- 7-9: Tests are focused; occasional multiple assertions with clear purpose
- 4-6: Tests cover multiple behaviors; failure diagnosis takes effort
- 1-3: Tests are sprawling; failures require significant investigation

7. Fast (F)

- 10: Tests execute in milliseconds; entire suite runs quickly
- 7-9: Tests are quick; minor optimization opportunities
- 4-6: Some slow tests; suite takes noticeable time
- 1-3: Tests are slow; significant impact on development flow

8. First (T — for TDD)

- 10: Clear evidence of test-first approach; tests drive design
- 7-9: Likely written test-first; good design influence
- 4-6: Unclear if test-first; tests feel like afterthoughts
- 1-3: Clearly written after code; tests follow implementation structure

The Farley Score

Do not compute this by hand. Invoke the bundled script with the 8 property scores:

```shell
scripts/farley-score.sh -u <U> -m <M> -r <R> -a <A> -n <N> -g <G> -f <F> -t <T>
```

The script prints `<score> <rating>` on stdout (e.g. `8.3 Excellent`). Use those values directly in the Output Format section below.

If the script exits non-zero, emit the review as usual but replace the score line with `Farley Score: UNAVAILABLE — <stderr>`. Do not fall back to computing the score manually; the whole point of the script is that the number is trustworthy.
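One way to honor the non-zero-exit rule when driving the script from code is sketched below. It assumes the script accepts the flags shown above and prints `<score> <rating>` on stdout; `farley_score_line` is a hypothetical wrapper, and `command` is injectable so the wrapper can be exercised without the real script. Treating a missing script like a failed run is an additional assumption.

```python
import subprocess

# Flag order follows the documented invocation: -u -m -r -a -n -g -f -t.
FLAGS = ("u", "m", "r", "a", "n", "g", "f", "t")


def farley_score_line(scores: dict[str, int],
                      command: tuple[str, ...] = ("scripts/farley-score.sh",)) -> str:
    """Return the report's score line, or the UNAVAILABLE fallback."""
    args = list(command)
    for flag in FLAGS:
        args += [f"-{flag}", str(scores[flag])]
    try:
        result = subprocess.run(args, capture_output=True, text=True)
    except OSError as exc:
        # Assumption: a missing or non-executable script counts as a failed run.
        return f"Farley Score: UNAVAILABLE — {exc}"
    if result.returncode != 0:
        # Surface the script's stderr; never compute the score by hand.
        return f"Farley Score: UNAVAILABLE — {result.stderr.strip()}"
    score, rating = result.stdout.split(maxsplit=1)
    return f"Farley Score: {score}/10 {rating.strip()}"
```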

Formula (reference only — executed by `scripts/farley-score.sh`)

```
Farley Score = (U*1.5 + M*1.5 + R*1.25 + A*1.0 + N*1.0 + G*1.0 + F*0.75 + T*1.0) / 9
```
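For reference only, the formula translates directly into the sketch below. It is not a substitute for `scripts/farley-score.sh`, whose output remains authoritative; the input validation is an assumption about sensible checks, not documented script behavior.

```python
# Weights copied from the formula; they sum to 9.0.
WEIGHTS = {"u": 1.5, "m": 1.5, "r": 1.25, "a": 1.0,
           "n": 1.0, "g": 1.0, "f": 0.75, "t": 1.0}


def farley_score(scores: dict[str, float]) -> float:
    """Weighted average of the eight 1-10 property scores (unrounded)."""
    missing = sorted(set(WEIGHTS) - set(scores))
    if missing:
        raise ValueError(f"missing property scores: {missing}")
    for key in WEIGHTS:
        if not 1 <= scores[key] <= 10:
            raise ValueError(f"score for {key!r} must be between 1 and 10")
    weighted_sum = sum(WEIGHTS[key] * scores[key] for key in WEIGHTS)
    return weighted_sum / sum(WEIGHTS.values())
```

For example, U=8, M=8, R=9, A=8, N=7, G=8, F=9, T=8 gives a weighted sum of 73, and 73 / 9 ≈ 8.1.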

Weight rationale:

- Understandable (1.5x): Tests as documentation is paramount
- Maintainable (1.5x): Long-term value depends on maintainability
- Repeatable (1.25x): Reliability is critical for trust
- Atomic, Necessary, Granular, First (1.0x): Core principles equally important
- Fast (0.75x): Important but can be optimized later
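Because the weights sum to 9, dividing by 9 turns the formula into a weighted average, so the result stays on the same 1-10 scale as the individual property scores. The weights also fix how far one extra point on a single property moves the total:

$$
1.5 + 1.5 + 1.25 + 1.0 + 1.0 + 1.0 + 0.75 + 1.0 = 9
$$

$$
U = M = R = A = N = G = F = T = s \;\Rightarrow\; \text{Farley Score} = \frac{9s}{9} = s
$$

$$
\Delta U = 1 \;\Rightarrow\; \Delta\,\text{Farley Score} = \frac{1.5}{9} \approx 0.17, \qquad \Delta F = 1 \;\Rightarrow\; \Delta\,\text{Farley Score} = \frac{0.75}{9} \approx 0.08
$$

So a one-point gain in Understandable moves the score twice as much as a one-point gain in Fast.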

Score interpretation:

| Range | Rating | Meaning |
|-------|--------|---------|
| 9.0-10.0 | Exemplary | Model for the industry |
| 7.5-8.9 | Excellent | High quality with minor improvements possible |
| 6.0-7.4 | Good | Solid foundation with clear improvement opportunities |
| 4.5-5.9 | Fair | Functional but needs significant attention |
| 3.0-4.4 | Poor | Limited value; major refactoring needed |
| Below 3.0 | Critical | Tests may be harmful; consider rewriting |
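The interpretation table maps onto a simple threshold lookup. This sketch rounds the score to one decimal place first (an assumption, matching the one-decimal example output `8.3 Excellent`) so that the rating always agrees with the score as displayed.

```python
# Lower bound of each band, highest first; anything below 3.0 is Critical.
RATING_BANDS = [
    (9.0, "Exemplary"),
    (7.5, "Excellent"),
    (6.0, "Good"),
    (4.5, "Fair"),
    (3.0, "Poor"),
]


def farley_rating(score: float) -> str:
    """Map a Farley Score to its rating using the interpretation table."""
    rounded = round(score, 1)
    for floor, rating in RATING_BANDS:
        if rounded >= floor:
            return rating
    return "Critical"
```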

Output Format

```markdown
## Test Design Review: [File/Suite Name]

### Property Scores

| Property | Score | Evidence |
|----------|-------|----------|
| Understandable | X/10 | [Brief justification] |
| Maintainable | X/10 | [Brief justification] |
| Repeatable | X/10 | [Brief justification] |
| Atomic | X/10 | [Brief justification] |
| Necessary | X/10 | [Brief justification] |
| Granular | X/10 | [Brief justification] |
| Fast | X/10 | [Brief justification] |
| First (TDD) | X/10 | [Brief justification] |

### Farley Score: X.X/10 [Rating]

### Detailed Analysis

[Expand on each property with specific code examples]

### Top Recommendations

1. [Highest impact improvement]
2. [Second priority]
3. [Third priority]

### Reference

This review is based on Dave Farley's Properties of Good Tests:
https://www.linkedin.com/pulse/tdd-properties-good-tests-dave-farley-iexge/
```

Guidelines

- Be constructive and specific; vague feedback helps no one
- Acknowledge what's done well before critiquing
- Provide actionable suggestions, not just problems
- Consider the context and constraints of the project
- When uncertain about TDD adherence, note it and score conservatively
- If reviewing multiple test files, provide individual scores per file (no aggregate)
- Always include the reference link to Dave Farley's article in your output

References

- `anti-patterns.md`: universal test anti-patterns (mirror/tautological tests) that apply in every stack. Always load.
- `language-patterns.md`: concrete red flags in Elixir/Ruby/JS/TS/Python tests that map to Farley property scores. Load when reviewing tests in these stacks.
- `preservation.md`: guardrails against over-culling valuable tests; mocking hygiene; action-summary output format. Load when the review is likely to recommend removals or refactoring.
