testing-practices

Updated June 24, 2026, 03:26

Use when deciding what or how to test, or reviewing test quality — adding tests for new logic, judging whether a change needs coverage, or critiquing a test that mocks heavily, reads the clock, hits the network, or asserts on internals. Steers toward testing behavior over implementation, one concept per test, deterministic runs, and covering error paths — not just the happy path. Runs scripts/check_test_hygiene.sh to flag flaky/brittle/falsely-green smells; exits nonzero for a pre-commit hook or CI. Not for running the suite, debugging a failing test, test-tooling/fixture setup, speeding up CI, or red-green TDD (use the TDD skill to write a failing test first).

Source

az9713

az9713/skill-best-practices

SKILL.md

name

testing-practices

description

Testing Practices

Overview

Claude's default test is a happy-path unit test that mocks everything around the function and asserts the function called its collaborators in a certain order. That test is green, brittle, and proves almost nothing: it locks in the current implementation and stays green when the real behavior breaks. This skill pushes toward tests that earn their keep: they assert observable behavior, they're deterministic, they cover the failure paths, and each one checks a single concept. scripts/check_test_hygiene.sh statically flags the smells that make a suite lie (real clock, network, unseeded randomness, over-mocking) and exits nonzero so it can gate a CI job.

This is a "what to test and how" guide, not a tutorial on your test framework. It assumes you know how to write an assertion; it tells you which assertions are worth writing.

When to Use

Use this when:

Writing tests for new code, or adding tests to cover a fix.
Reviewing a diff's tests, or looking at a suite that feels flaky / always-green / slow.
A change adds logic with no test, or adds a test that mocks heavily, sleeps, reads now(), or pokes at private attributes.

Do NOT use this when:

You need general lint/format/style enforcement — that's code-style.
You want a behavioral critique of the production change — that's adversarial-review.
You're choosing a framework or runner; this skill is framework-agnostic.

Running the hygiene check

scripts/check_test_hygiene.sh             # scan tests/ test/ (or whole tree)
scripts/check_test_hygiene.sh tests/api/  # scan a subset

It scans only test files (test_*.py, *_test.py, *.test.*, *.spec.*). Findings are heuristics: a hit means "take a look", not necessarily "this is a defect". Acknowledge a deliberate exception with an inline # test-hygiene: ok on that line. Tune the over-mocking threshold with TEST_HYGIENE_MOCK_LIMIT.
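A deliberate exception might look like the sketch below. The test name and the choice of datetime.now() as the flagged call are illustrative assumptions; the text above only says that the marker goes inline on the flagged line, not which exact patterns the script matches.

```python
from datetime import datetime, timezone


def test_real_clock_returns_timezone_aware_utc():
    # This test reads the real clock on purpose, but asserts only on a
    # property that does not depend on the current time, so it stays
    # deterministic. The inline marker acknowledges the exception.
    stamp = datetime.now(timezone.utc)  # test-hygiene: ok
    assert stamp.tzinfo is timezone.utc
```

The marker is worth reserving for cases like this one, where the assertion cannot change from run to run.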

Wire it as a hook or Action

The script exits nonzero when it flags something, so enforcement is one line. As a pre-commit hook:

.claude/skills/testing-practices/scripts/check_test_hygiene.sh || exit 1

GitHub Action step:

- run: .claude/skills/testing-practices/scripts/check_test_hygiene.sh

What to test (and how)

Test behavior, not implementation

Assert on what the unit produces or causes given an input — the return value, the emitted event, the row written — not on how it got there. Tests that assert "this private method was called" or reach into private attributes break on every refactor even when behavior is identical, and pass even when behavior is wrong. If you can't test a behavior without inspecting internals, that's a design smell in the code, not a reason to test internals.
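As a minimal sketch of the difference, assume a hypothetical Cart class (not part of this skill) that keeps its line items in a private list. The test asserts on the total the cart reports, and the brittle alternative is shown only as a comment.

```python
class Cart:
    """Hypothetical unit under test, invented for this illustration."""

    def __init__(self) -> None:
        self._items: list[tuple[str, int, int]] = []  # private storage detail

    def add(self, name: str, price_cents: int, qty: int = 1) -> None:
        self._items.append((name, price_cents, qty))

    def total_cents(self) -> int:
        return sum(price * qty for _, price, qty in self._items)


def test_total_sums_price_times_quantity():
    cart = Cart()
    cart.add("pen", 150, qty=2)
    cart.add("pad", 400)
    # Behavior: the observable result for a given input.
    assert cart.total_cents() == 700
    # Brittle alternative (avoid): asserting on cart._items would break the
    # moment the storage changes to a dict, even though totals are identical.
```

If Cart later switches its internal storage, this test keeps passing as long as the totals stay correct, which is the property the paragraph above asks for.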

One concept per test

Each test pins down a single behavior, and its name says which one (test_refund_rejected_when_already_refunded). Don't pile six unrelated assertions into one test: when it fails you can't tell which behavior broke, and the first failed assert hides the rest. Multiple asserts that together verify one concept (e.g. status code + body of one response) are fine.
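One way this can look in practice, assuming a hypothetical refund handler and Response type invented for the example: each test names one behavior, and the two asserts inside each test verify a single response.

```python
from dataclasses import dataclass


@dataclass
class Response:
    status: int
    body: dict


def refund(order: dict) -> Response:
    """Hypothetical handler: refuses a second refund of the same order."""
    if order["refunded"]:
        return Response(409, {"error": "already_refunded"})
    order["refunded"] = True
    return Response(200, {"refunded_cents": order["amount_cents"]})


def test_refund_succeeds_for_unrefunded_order():
    order = {"amount_cents": 500, "refunded": False}
    response = refund(order)
    # Two asserts, one concept: the shape of a single successful response.
    assert response.status == 200
    assert response.body == {"refunded_cents": 500}


def test_refund_rejected_when_already_refunded():
    order = {"amount_cents": 500, "refunded": True}
    response = refund(order)
    assert response.status == 409
    assert response.body == {"error": "already_refunded"}
```

When one of these fails, the test name alone says which behavior broke, and the other test still reports independently.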

Cover the error paths, not just the happy path

The happy path is the path you already know works. The bugs live in the edges: empty input, the timeout, the duplicate, the permission denied, the malformed payload, the boundary value. For every happy-path test, ask "what's the failure mode here?" and test that the code fails correctly — right exception, right message, no partial write, no swallow.
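A sketch of "fails correctly", assuming a hypothetical transfer function and InsufficientFunds exception. The tests check the exception type, the message, and that no partial write happened. With pytest available, pytest.raises would be the shorter way to express the try/except; plain Python is used here to keep the example dependency-free.

```python
class InsufficientFunds(Exception):
    """Hypothetical domain error for this illustration."""


def transfer(balances: dict, src: str, dst: str, amount: int) -> None:
    """Move amount from src to dst, or raise without changing any balance."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    if balances[src] < amount:
        raise InsufficientFunds(f"{src} has {balances[src]}, needs {amount}")
    balances[src] -= amount
    balances[dst] += amount


def test_transfer_rejected_when_funds_insufficient():
    balances = {"alice": 50, "bob": 0}
    try:
        transfer(balances, "alice", "bob", 80)
    except InsufficientFunds as exc:
        assert str(exc) == "alice has 50, needs 80"  # right message
    else:
        raise AssertionError("expected InsufficientFunds")
    # No partial write: both balances are untouched after the failure.
    assert balances == {"alice": 50, "bob": 0}


def test_transfer_rejects_zero_amount():
    balances = {"alice": 50, "bob": 0}
    try:
        transfer(balances, "alice", "bob", 0)  # boundary value
    except ValueError as exc:
        assert str(exc) == "amount must be positive"
    else:
        raise AssertionError("expected ValueError")
    assert balances == {"alice": 50, "bob": 0}
```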

Keep tests deterministic

A test that passes or fails depending on the wall clock, the network, or an unseeded RNG is worse than no test — it trains the team to ignore red. Inject a fixed time instead of reading now(). Seed or fix randomness. Mock the network boundary. Wait on a condition, never sleep(). The hygiene script flags all of these.
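The clock and randomness advice can be sketched as follows. The functions is_expired and pick_winner are hypothetical; the point is that the time source and the RNG arrive as parameters, so the test supplies fixed ones.

```python
import random
from datetime import datetime, timedelta, timezone
from typing import Callable


def is_expired(issued_at: datetime, ttl: timedelta,
               now: Callable[[], datetime]) -> bool:
    """Hypothetical unit: the clock is injected instead of read directly."""
    return now() - issued_at >= ttl


def pick_winner(entrants: list[str], rng: random.Random) -> str:
    """Hypothetical unit: the RNG is injected so tests can seed it."""
    return rng.choice(entrants)


FIXED_NOW = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)


def test_token_expires_exactly_at_ttl():
    issued = FIXED_NOW - timedelta(minutes=30)
    assert is_expired(issued, timedelta(minutes=30), now=lambda: FIXED_NOW)


def test_token_valid_one_second_before_ttl():
    issued = FIXED_NOW - timedelta(minutes=29, seconds=59)
    assert not is_expired(issued, timedelta(minutes=30), now=lambda: FIXED_NOW)


def test_pick_winner_is_reproducible_with_seeded_rng():
    entrants = ["ann", "ben", "cy"]
    first = pick_winner(entrants, random.Random(7))
    second = pick_winner(entrants, random.Random(7))
    assert first == second  # same seed, same choice, on every run
```

Production code would pass a real clock such as lambda: datetime.now(timezone.utc); only the tests pin it to a fixed value.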

Gotchas

Testing implementation details. Asserting on call order, private methods, or internal state couples the test to today's code. Refactor → red test → you "fix" the test → you've validated nothing. Assert observable behavior only.
Over-mocking hides real breakage. Mock the function under test's boundaries (network, DB, clock), not its neighbors. When you mock the collaborator you're actually integrating with, the test asserts your wiring matches your mock — and stays green when the real collaborator's contract changes. High mock density (the script's per-file count) is the tell.
Many asserts, one test, no diagnosis. When a 10-assert test fails on assert #2, you learn nothing about asserts #3–10 and can't name the broken behavior. Split by concept.
Hidden non-determinism. datetime.now(), time.time(), random() without a seed, uuid4(), real HTTP, and sleep()-based timing all make tests flaky or irreproducible. The failure shows up at 2am in CI, not on your machine. Inject the clock, seed the RNG, mock the boundary, wait on events.
Only the happy path. A suite that's all happy-path gives false confidence: coverage looks high, but every interesting failure mode is untested. Error paths are where the value is.
Asserting the mock instead of the result. mock.assert_called_with(...) tests that you called your own mock — tautological. Prefer asserting the real output or the real side effect the behavior produces.
Snapshot tests as a behavior substitute. A blob snapshot passes review with one glance and then "updates" silently whenever output drifts. Use targeted assertions on the parts that matter.
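The over-mocking and assert-the-mock gotchas can be illustrated with a small sketch. The quote and price_with_tax functions are hypothetical; fetch_net_price stands in for a network call, which is the only thing the test replaces.

```python
from unittest.mock import Mock


def price_with_tax(net_cents: int, rate_percent: int) -> int:
    """A neighbor of the unit under test: real code, left unmocked."""
    return net_cents + net_cents * rate_percent // 100


def quote(sku: str, fetch_net_price, rate_percent: int = 20) -> dict:
    """Hypothetical unit; fetch_net_price represents the network boundary."""
    net = fetch_net_price(sku)
    return {"sku": sku, "gross_cents": price_with_tax(net, rate_percent)}


def test_quote_includes_tax_on_fetched_price():
    fetch = Mock(return_value=1000)  # stub only the boundary
    result = quote("A1", fetch)
    # Assert the real output; price_with_tax ran for real, so a change in
    # its contract would turn this test red.
    assert result == {"sku": "A1", "gross_cents": 1200}
    # Tautological on its own (avoid as the sole check):
    #     fetch.assert_called_with("A1")
```

Had price_with_tax been mocked as well, the test would only confirm that quote forwards whatever the mock returns.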

Files

SKILL.md — this file; what to test, how, and the smells to avoid.
scripts/check_test_hygiene.sh — static linter for test files: flags real clock, network, unseeded randomness, sleeps, and over-mocking density. Exits nonzero; usable as a pre-commit hook or CI step.
