
test-coverage


Updated 2026-06-24 09:09

Systematically audit, improve, and enforce test coverage, and gate test quality in CI — across any ecosystem (TypeScript, Python, Go, Rust). Use to raise coverage, set thresholds, audit gaps, manage exclusions, merge reports, wire coverage into CI/hooks, or add mutation testing and fuzzing as quality gates. Composes with the hk skill for pre-commit enforcement. For how to design and write good tests — property-based, snapshot/approval, differential, contract, flaky-test handling — use the testing skill.


Source: connorads/dotfiles


Test Coverage

Audit gaps, write targeted tests, enforce thresholds — across any ecosystem.

Mental Model

The testing pyramid encodes an economic truth: each tier tests what only it can test.

Tier	Tests	Cost to write	Cost to run
Unit	Pure functions, domain logic, validation, parsing	Low	Milliseconds
Integration	Database queries, API boundaries, access control, service interactions	Medium	Seconds
Component	Rendered UI in a real browser, user interactions, visual states	Medium	Seconds
E2E	Full user flows across the entire stack	High	Minutes

Coverage is a regression gate, not a quality metric. High coverage with bad tests is worse than moderate coverage with good tests. The goal is that new code cannot silently ship without tests.

Exclusions are architecture, not exceptions. Every exclusion documents a deliberate decision about where code is tested. An exclusion at one tier should have coverage at another.

Decision Tree

Start here. Follow the branch that matches the current state.

Is there any coverage tooling configured?
├── No → Bootstrap (below)
└── Yes
    ├── Coverage below target? → Audit & Improve (below)
    ├── Coverage adequate but not enforced? → Enforce (below)
    ├── Coverage enforced, writing new code? → Write Tests for New Code (below)
    └── Coverage high but tests feel weak / want to raise quality? → Test Quality: Beyond Coverage (below)

Bootstrap: Setting Up Coverage from Scratch

1. Detect the ecosystem

Check for project markers: package.json, go.mod, Cargo.toml, pyproject.toml, setup.py, *.csproj. See ecosystem patterns for tool recommendations per language.
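Here is a sketch of the detection step. It assumes you already have the repository's root-level file names, for example from `fs.readdirSync('.')`. `detectEcosystems` and the ecosystem labels are hypothetical. A polyglot repo can legitimately match several ecosystems.

```typescript
type Ecosystem = 'typescript/js' | 'go' | 'rust' | 'python' | 'dotnet'

// Marker files from the step above, mapped to the ecosystem they signal.
const MARKERS: Array<[matches: (file: string) => boolean, ecosystem: Ecosystem]> = [
  [(f) => f === 'package.json', 'typescript/js'],
  [(f) => f === 'go.mod', 'go'],
  [(f) => f === 'Cargo.toml', 'rust'],
  [(f) => f === 'pyproject.toml' || f === 'setup.py', 'python'],
  [(f) => f.endsWith('.csproj'), 'dotnet'],
]

// Every ecosystem whose marker appears among the root-level file names.
function detectEcosystems(rootFiles: string[]): Ecosystem[] {
  const found = new Set<Ecosystem>()
  for (const file of rootFiles) {
    for (const [matches, ecosystem] of MARKERS) {
      if (matches(file)) found.add(ecosystem)
    }
  }
  return Array.from(found)
}
```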

2. Create tiered configs

Each test tier gets its own configuration file with targeted include/exclude patterns. This prevents slow integration tests from blocking fast unit test feedback.

Key principles:

Each tier has a separate include pattern matching only its source files
Each tier has a separate coverage output directory (avoids conflicts)
CI vs local reporter selection: text-summary locally, full HTML/JSON/LCOV in CI

TypeScript/Vitest example structure:

vitest.unit.config.mts    → tests/unit/**/*.unit.spec.ts             → coverage/unit/
vitest.int.config.mts     → tests/int/**/*.int.spec.ts               → coverage/int/
vitest.browser.config.mts → tests/components/**/*.browser.spec.tsx   → coverage/components/

Python example:

COVERAGE_FILE=.coverage.unit pytest -m unit --cov --cov-report=html:coverage/unit
COVERAGE_FILE=.coverage.int pytest -m integration --cov --cov-report=html:coverage/int
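The key principles can be captured as data that each `vitest.<tier>.config.mts` reads, so tiers never share an output directory. This is a hypothetical helper, not part of Vitest. `tierCoverage` and `reportersFor` are illustrative names. You would copy the fields into Vitest's `test.include`, `test.coverage.reportsDirectory` and `test.coverage.reporter` options yourself, typically passing `Boolean(process.env.CI)` as `isCI`.

```typescript
type TierName = 'unit' | 'int' | 'components'

interface TierCoverage {
  testInclude: string[] // test files this tier runs
  reportsDirectory: string // one output directory per tier avoids clobbering
  reporter: string[]
}

const TEST_GLOBS: Record<TierName, string[]> = {
  unit: ['tests/unit/**/*.unit.spec.ts'],
  int: ['tests/int/**/*.int.spec.ts'],
  components: ['tests/components/**/*.browser.spec.tsx'],
}

// Terse summary locally; full human- and machine-readable reports in CI.
function reportersFor(isCI: boolean): string[] {
  return isCI ? ['text-summary', 'html', 'json', 'json-summary', 'lcov'] : ['text-summary']
}

function tierCoverage(tier: TierName, isCI: boolean): TierCoverage {
  return {
    testInclude: TEST_GLOBS[tier],
    reportsDirectory: `coverage/${tier}`,
    reporter: reportersFor(isCI),
  }
}
```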

3. Set initial thresholds

Run coverage once, note the baseline. Set thresholds at the current level — this prevents regression while you improve.

// Example: start where you are
thresholds: { lines: 72 }  // measured baseline

Then ratchet up as you add tests. Never ratchet down. See enforcement for the full ratcheting strategy.
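To derive the first thresholds, read the `total` entry of an Istanbul `json-summary` report (`coverage-summary.json`) and round each metric down. The sketch below is minimal, and `baselineThresholds` is a hypothetical name. It recomputes each percentage from the `covered` and `total` counts rather than trusting the report's `pct` field.

```typescript
// One metric as it appears in Istanbul's json-summary report.
interface Metric {
  total: number
  covered: number
}
type Totals = Record<'lines' | 'statements' | 'functions' | 'branches', Metric>

// First thresholds = measured baseline, rounded down so the current suite passes.
// Multiplying before dividing avoids float surprises (57 / 100 * 100 is
// 56.99999999999999 in floating point, which would floor to 56).
function baselineThresholds(total: Totals): Record<keyof Totals, number> {
  const pct = ({ covered, total: n }: Metric) => (n === 0 ? 100 : Math.floor((covered * 100) / n))
  return {
    lines: pct(total.lines),
    statements: pct(total.statements),
    functions: pct(total.functions),
    branches: pct(total.branches),
  }
}
```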

4. Add coverage scripts

Create per-tier scripts in your project manifest:

{
  "scripts": {
    "test:unit": "vitest run --config ./vitest.unit.config.mts",
    "test:unit:coverage": "vitest run --coverage --config ./vitest.unit.config.mts",
    "test:int": "vitest run --config ./vitest.int.config.mts",
    "test:int:coverage": "vitest run --coverage --config ./vitest.int.config.mts",
    "test:components": "vitest run --coverage --config ./vitest.browser.config.mts",
    "test:e2e": "playwright test",
    "test": "pnpm test:unit && pnpm test:int && pnpm test:components && pnpm test:e2e"
  }
}

Audit & Improve: Closing Coverage Gaps

Phase 1: Audit

Run coverage for each tier and examine the output.

# Run with coverage, examine the HTML report or text output
<runner> --coverage

Identify three categories:

Untested files — no coverage at all (highest priority)
Untested branches — code paths never exercised
Untested functions — declared but never called in tests
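With a `json-summary` reporter enabled, you can pull the three categories out of `coverage-summary.json` mechanically. This sketch assumes the Istanbul summary shape: a `total` key plus one key per file, each holding `total` and `covered` counts per metric. `auditGaps` is a hypothetical helper.

```typescript
interface Metric {
  total: number
  covered: number
}
interface FileSummary {
  lines: Metric
  functions: Metric
  branches: Metric
}

interface Gaps {
  untestedFiles: string[] // nothing executed at all: highest priority
  untestedBranches: string[]
  untestedFunctions: string[]
}

// `summary` is the parsed coverage-summary.json: a "total" entry plus one per file.
function auditGaps(summary: Record<string, FileSummary>): Gaps {
  const gaps: Gaps = { untestedFiles: [], untestedBranches: [], untestedFunctions: [] }
  for (const [file, s] of Object.entries(summary)) {
    if (file === 'total') continue
    if (s.lines.total > 0 && s.lines.covered === 0) {
      gaps.untestedFiles.push(file)
      continue // branch and function gaps are implied
    }
    if (s.branches.covered < s.branches.total) gaps.untestedBranches.push(file)
    if (s.functions.covered < s.functions.total) gaps.untestedFunctions.push(file)
  }
  return gaps
}
```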

Phase 2: Classify each gap

For every uncovered file or function, ask:

Question	If yes	If no
Business logic or domain rules?	Unit tests (highest priority)	Continue
Access control or authorisation?	Integration tests	Continue
Data validation or parsing?	Unit tests	Continue
API endpoint or mutation?	Integration tests	Continue
UI component with logic?	Component tests	Continue
Full user flow?	E2E tests	Continue
Can it run in the test environment?	Write tests	Document exclusion
Auto-generated code?	Exclude with comment	Write tests
Thin wrapper around tested library?	Consider excluding	Write tests
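One way to encode the table is a hypothetical `classifyGap` over hand-labelled traits. It checks the two exclusion rows first, because a generated file needs no tests whatever it does. As an assumption, it defaults testable but unclassified code to the unit tier. Adjust both choices to taste.

```typescript
type Tier = 'unit' | 'integration' | 'component' | 'e2e'

type Verdict =
  | { action: 'test'; tier: Tier }
  | { action: 'exclude'; reason: string }
  | { action: 'consider-excluding'; reason: string }

// Traits you answer by reading the uncovered file; all optional.
interface GapTraits {
  generated?: boolean
  thinWrapper?: boolean
  domainLogic?: boolean
  accessControl?: boolean
  validation?: boolean
  apiEndpoint?: boolean
  uiWithLogic?: boolean
  userFlow?: boolean
  runsInTestEnv?: boolean // undefined means "assume yes"
}

function classifyGap(g: GapTraits): Verdict {
  // Exclusion rows first: they apply whatever the code does.
  if (g.generated) return { action: 'exclude', reason: 'auto-generated; say so in a comment' }
  if (g.thinWrapper) return { action: 'consider-excluding', reason: 'thin wrapper around a tested library' }
  // Then the table's rows in order; the first "yes" wins.
  if (g.domainLogic) return { action: 'test', tier: 'unit' }
  if (g.accessControl) return { action: 'test', tier: 'integration' }
  if (g.validation) return { action: 'test', tier: 'unit' }
  if (g.apiEndpoint) return { action: 'test', tier: 'integration' }
  if (g.uiWithLogic) return { action: 'test', tier: 'component' }
  if (g.userFlow) return { action: 'test', tier: 'e2e' }
  if (g.runsInTestEnv === false) {
    return { action: 'exclude', reason: 'cannot run here; document which tier covers it' }
  }
  return { action: 'test', tier: 'unit' } // assumption: default to the cheapest tier
}
```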

Phase 3: Prioritise

Triage order (highest value first):

Domain logic and business rules (unit)
Access control and authorisation (integration)
Data validation and input parsing (unit)
API endpoints and mutations (integration)
UI components with conditional logic (component)
Async/server-rendered components (E2E)
Configuration and wiring (tested implicitly by higher tiers)
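Once each gap has a category, the triage order is just a sort key. The sketch below uses hypothetical category names, one per item in the list above.

```typescript
// Triage order from the list above, highest value first.
const PRIORITY = [
  'domain-logic', // unit
  'access-control', // integration
  'validation', // unit
  'api-endpoint', // integration
  'ui-conditional', // component
  'async-server-ui', // E2E
  'config-wiring', // covered implicitly by higher tiers
] as const
type Category = (typeof PRIORITY)[number]

interface ClassifiedGap {
  file: string
  category: Category
}

// Returns a new array. Array.prototype.sort is stable (ES2019+), so files in
// the same category keep their original relative order.
function triage(gaps: ClassifiedGap[]): ClassifiedGap[] {
  return [...gaps].sort((a, b) => PRIORITY.indexOf(a.category) - PRIORITY.indexOf(b.category))
}
```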

Phase 4: Write tests

For each gap, follow the appropriate tier's patterns. Test expected behaviour through the public API, not implementation details.

Unit tests: Pure input → output. No database, no network, no filesystem.

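import { describe, expect, it } from 'vitest'
import { slugify } from '../../src/slugify' // hypothetical module path

describe('slugify', () => {
  it('converts spaces to hyphens', () => {
    expect(slugify('hello world')).toBe('hello-world')
  })
  it('handles empty string', () => {
    expect(slugify('')).toBe('')
  })
})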
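The unit example needs a `slugify` to test. Below is a minimal, hypothetical implementation that satisfies both cases. It is a pure function, so it needs no database, network or filesystem.

```typescript
// Lower-cases, strips diacritics, collapses runs of non-alphanumerics into a
// single hyphen, and trims leading/trailing hyphens.
function slugify(input: string): string {
  return input
    .normalize('NFKD') // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, '') // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '')
}
```

For example, `slugify('  Café Crème! ')` returns `'cafe-creme'`.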

Integration tests: Real database, real service boundaries, no mocks for things you own.

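import { expect, it } from 'vitest'
// Assumed test setup: `payload` is a Payload instance connected to the test
// database; `anonymousUser` is a fixture for an unauthenticated caller.

it('enforces access control on draft posts', async () => {
  const result = await payload.find({
    collection: 'posts',
    where: { _status: { equals: 'draft' } },
    overrideAccess: false, // apply the collection's access rules instead of bypassing them
    user: anonymousUser,
  })
  expect(result.docs).toHaveLength(0)
})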

Component tests: Real browser, real DOM queries (accessibility-first via testing-library).

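import '@testing-library/jest-dom/vitest' // adds toBeInTheDocument (often done in a setup file)
import { render, screen } from '@testing-library/react'
import { expect, it } from 'vitest'
import { FilmCard } from '../../src/components/FilmCard' // hypothetical path

it('renders film title and year', () => {
  render(<FilmCard film={mockFilm} />) // mockFilm: fixture with title 'Film Title', year 2024
  expect(screen.getByText('Film Title')).toBeInTheDocument()
  expect(screen.getByText('2024')).toBeInTheDocument()
})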

E2E tests: Full user flows, real navigation, real network.

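import { expect, test } from '@playwright/test'

test('user can submit a form', async ({ page }) => {
  await page.goto('/submit') // resolved against baseURL in playwright.config
  await page.locator('[name="title"]').fill('My Film')
  await page.locator('button[type="submit"]').click()
  await expect(page).toHaveURL(/\/confirmation/)
})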

See ecosystem patterns for language-specific runner syntax and config examples.

Enforce: Wiring Coverage into Hooks and CI

Pre-commit (composes with hk)

If using the hk skill, add coverage test steps to hk.pkl:

["test-unit"] {
  check = "scripts/quiet-on-success.sh pnpm test:unit:coverage"
}
["test-int"] {
  check = "scripts/quiet-on-success.sh pnpm test:int:coverage"
  depends = List("test-unit")
}

Key principles:

Coverage thresholds live in the test config, not in hook config
E2E tests are too slow for pre-commit — run in CI or manually
Order tiers by speed: unit first (fastest fail), then integration, then components
Wrap in quiet-on-success so passing tests produce no output
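The hk steps call `scripts/quiet-on-success.sh`, which is not shown here. The sketch below expresses the same idea in TypeScript, assuming Node's `child_process`. `quietOnSuccess` is a hypothetical name. It buffers the command's output, prints it only on failure, and propagates the exit code so hk still sees the failure.

```typescript
import { spawnSync } from 'node:child_process'

// Run a command silently; replay its captured output only if it fails.
// Returns the exit status to propagate (non-zero on any failure).
function quietOnSuccess(
  cmd: string,
  args: string[],
  write: (text: string) => void = (text) => {
    process.stderr.write(text)
  },
): number {
  const result = spawnSync(cmd, args, { encoding: 'utf8' })
  // status is null if the process was killed by a signal or failed to start.
  const status = result.status ?? 1
  if (result.error === undefined && status === 0) return 0
  if (result.error) write(`${result.error.message}\n`)
  write(result.stdout ?? '')
  write(result.stderr ?? '')
  return status === 0 ? 1 : status
}
```

As a CLI entry point, you would end with something like `process.exitCode = quietOnSuccess(process.argv[2], process.argv.slice(3))`. Note that stdout and stderr are replayed one after the other, not interleaved.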

CI

Run all tiers with coverage in CI. Upload per-tier reports separately for visibility.

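- name: Unit tests
  run: pnpm test:unit:coverage
- name: Integration tests
  run: pnpm test:int:coverage
- name: Component tests
  run: pnpm test:components
- name: E2E tests
  run: pnpm test:e2e
- name: Upload per-tier coverage reports
  if: always()
  uses: actions/upload-artifact@v4
  with: { name: coverage, path: coverage/ }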

Ratcheting

For projects not yet at target:

Measure current coverage
Set threshold at current level
After each improvement, bump the threshold
Never lower it

See enforcement for detailed CI patterns, PR checks, and ratcheting workflow.
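The ratchet rule fits in a few lines. In this sketch, a hypothetical `ratchet` function moves each threshold up to the floor of the newly measured value. Thresholds never move down, even when a measurement dips.

```typescript
type Thresholds = Record<string, number>

// Move each threshold up to the floor of the new measurement; never down.
// Metrics absent from `measured` keep their current threshold.
function ratchet(current: Thresholds, measured: Thresholds): Thresholds {
  const next: Thresholds = { ...current }
  for (const [metric, pct] of Object.entries(measured)) {
    next[metric] = Math.max(current[metric] ?? 0, Math.floor(pct))
  }
  return next
}
```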

Write Tests for New Code

When adding features to a codebase with established coverage:

Identify the tier: What kind of code are you writing? Match to the classification table above
Write tests first (TDD): Test the expected behaviour before implementing
Run coverage locally: --coverage for the relevant tier
Handle exclusions: If code genuinely cannot be tested at this tier, document why and ensure coverage exists at another tier
Verify thresholds pass: Pre-commit hooks catch regressions, but check early

Cross-tier exclusion pattern

Every exclusion at one tier names the tier that provides coverage:

// Unit config excludes:
// Cross-tier: Service layer - requires database runtime - tested via integration tests
"src/domain/**/service.ts",

// Integration config excludes:
// Cross-tier: React components - requires browser context - tested via component + E2E tests
"src/components/**",

See coverage exclusions for the full exclusion taxonomy and documentation format.
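You can lint the rule that every exclusion names its covering tier. The rough sketch below scans config lines. It assumes each excluded glob sits on its own line directly under a `// Cross-tier: ... tested via <tier> ...` comment, as in the example above. `unjustifiedExclusions` is hypothetical, and a real check would parse the config rather than match lines.

```typescript
const TIERS = ['unit', 'integration', 'component', 'e2e']

// Returns excluded globs that lack a "// Cross-tier: ... tested via <tier>" comment
// on the line directly above them.
function unjustifiedExclusions(configLines: string[]): string[] {
  const missing: string[] = []
  configLines.forEach((line, i) => {
    const glob = line.trim().match(/^"([^"]+)",?$/) // a quoted glob on its own line
    if (!glob) return
    const above = (configLines[i - 1] ?? '').trim().toLowerCase()
    const justified =
      above.startsWith('// cross-tier:') && TIERS.some((tier) => above.includes(`tested via ${tier}`))
    if (!justified) missing.push(glob[1])
  })
  return missing
}
```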

Test Quality: Beyond Coverage

Coverage is necessary but not sufficient. It tells you which lines ran, never whether a test would fail when the code is wrong. A suite can hit 100% and assert nothing. This section is what to reach for once coverage is high and you need evidence the tests actually catch bugs.

Why a coverage number is not a quality target

Coverage is weakly correlated with effectiveness. Controlling for test-suite size, coverage is a poor predictor of fault-detection ability (Inozemtseva & Holmes, Coverage Is Not Strongly Correlated With Test Suite Effectiveness, ICSE 2014). High coverage is consistent with a near-useless suite.
Goodhart's law. "When a measure becomes a target, it ceases to be a good measure." Mandating a coverage percentage incentivises assertion-free tests, deleted edge cases, and excluded hard files — activity that raises the number while lowering quality. (See Seemann, Code coverage is a useless target measure.)
100% is a smell, not a goal. Use coverage to find untested code, not as a KPI. Well-tested code tends to land in the 80–90s naturally; a hard 100% mandate signals gaming (Fowler, TestCoverage).

Practical stance for this skill: keep coverage as the regression gate (the Enforce section), and use the techniques below to measure and raise quality. The single most useful quality signal is the oracle gap — high coverage paired with low mutation score flags files where weak tests execute important code (see mutation testing).
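Computing the oracle gap needs per-file line coverage and per-file mutation score, both as percentages, joined by file path from your coverage and mutation reports. In this sketch, `oracleGaps` and the 80/60 cut-offs are illustrative assumptions, not values from any tool.

```typescript
interface FileQuality {
  file: string
  lineCoverage: number // 0-100
  mutationScore: number // 0-100
}

// Files the tests execute (high coverage) but barely check (low mutation
// score), largest gap first.
function oracleGaps(files: FileQuality[], minCoverage = 80, maxMutationScore = 60): FileQuality[] {
  const gap = (f: FileQuality) => f.lineCoverage - f.mutationScore
  return files
    .filter((f) => f.lineCoverage >= minCoverage && f.mutationScore <= maxMutationScore)
    .sort((a, b) => gap(b) - gap(a))
}
```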

Which technique, when

This skill owns the techniques you measure and gate in CI (the rows whose Home column says "here" below). The test-design techniques (when and how to write a good test) live in the testing skill: load it for those; this skill only points at them.

Situation	Reach for	Home
Coverage high but unsure tests assert enough	Mutation testing	mutation-testing.md — here
Code ingests untrusted/byte input; want crash-finding	Coverage-guided fuzzing	fuzzing.md — here
Function has a statable invariant / round-trip / model	Property-based testing	`testing` skill
No reliable oracle (compilers, ML, numeric, renderers)	Differential / metamorphic	`testing` skill
Verifying serialised output / rendered UI	Snapshot / approval (carefully)	`testing` skill
Service boundary between teams/repos	Contract testing	`testing` skill
Tests pass but assert nothing	Assertion-density check	`testing` skill
Tests fail intermittently	Flaky-test detection & quarantine	`testing` skill

The two quality metrics you can gate on each have a dedicated reference here with per-ecosystem tooling and CI/diff enforcement: mutation testing (does the suite detect changes — the strongest quality signal, and the rigorous version of "do my assertions matter") and fuzzing (coverage-guided crash-finding). The oracle gap above is computed from mutation score.
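To see why mutation score measures what coverage cannot, here is a toy version with hand-written mutants. Real tools such as Stryker or cargo-mutants generate mutants from source. Both suites below execute every line of `isAdult`, but only the one that probes the boundary kills all three mutants. All names are hypothetical.

```typescript
type Impl = (age: number) => boolean

const isAdult: Impl = (age) => age >= 18

// Hand-written mutants, each a one-token change a mutation tool might make.
const mutants: Record<string, Impl> = {
  '>= becomes >': (age) => age > 18,
  '>= becomes <': (age) => age < 18,
  '18 becomes 19': (age) => age >= 19,
}

// A suite passes (true) or fails (false) against an implementation.
type Suite = (impl: Impl) => boolean

// Executes every line of isAdult, but never probes the boundary.
const weakSuite: Suite = (impl) => impl(30) === true
// Pins the boundary from both sides.
const strongSuite: Suite = (impl) => impl(30) && impl(18) && !impl(17)

// Percentage of mutants the suite kills (i.e. fails on).
function mutationScore(suite: Suite): number {
  if (!suite(isAdult)) throw new Error('suite must pass on the original code')
  const names = Object.keys(mutants)
  const killed = names.filter((name) => !suite(mutants[name])).length
  return (killed / names.length) * 100
}
```

Here `mutationScore(weakSuite)` is about 33, because only the `<` mutant dies, while `mutationScore(strongSuite)` is 100.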

For the design side — property-based testing (invariants over generated inputs, shrinking, stateful/model-based), snapshot/approval pitfalls, differential & metamorphic testing, assertion density, and flaky-test management — see the testing skill (../testing/SKILL.md). That's where "how to write the test" lives; this skill is where "how to measure and enforce it" lives.

Enforcing quality techniques in CI/hooks

These are slower than unit tests — keep them fast and actionable the same way coverage stays fast:

Diff-scoped: mutate/fuzz only changed code (mutation --in-diff/--incremental/--git-diff-lines; see refs).
Time-boxed: fuzzing runs a fixed budget (e.g. 5 min smoke test), never open-ended in PR CI.
Capped & separated: cap mutants per file; run mutation/fuzz as a dedicated CI job after the suite passes, not in the fast pre-commit path. PBT runs inside the unit tier with a bounded example count.
Thresholds live in the tool, per the Enforce-section boundary — break (Stryker), --min-msi (go-mutesting), exit-code 2 (cargo-mutants), numRuns (fast-check). hk/CI just runs the command and checks the exit code.
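Here is a time-boxed run in miniature: a seeded random byte fuzzer with both a wall-clock budget and an iteration cap. This is plain random fuzzing, not coverage-guided. It does not keep inputs that reach new code, as libFuzzer-based tools and Go's native fuzzer do. `fuzz` and `mulberry32` are illustrative names. The sketch shows the budget-and-seed shape a CI fuzz job should have.

```typescript
// mulberry32: tiny seeded PRNG, so a failing run can be replayed from its seed.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0
  return () => {
    state = (state + 0x6d2b79f5) >>> 0
    let t = state
    t = Math.imul(t ^ (t >>> 15), t | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296 // in [0, 1)
  }
}

interface FuzzResult {
  iterations: number
  crash?: { input: Uint8Array; error: unknown }
}

// Random (not coverage-guided) fuzzing, stopped by a wall-clock budget or an
// iteration cap, whichever comes first; returns the first crashing input.
function fuzz(
  target: (input: Uint8Array) => void,
  budgetMs: number,
  seed = 1,
  maxIterations = Number.POSITIVE_INFINITY,
): FuzzResult {
  const rand = mulberry32(seed)
  const deadline = Date.now() + budgetMs
  let iterations = 0
  while (iterations < maxIterations && Date.now() < deadline) {
    const input = new Uint8Array(Math.floor(rand() * 64)) // 0-63 bytes
    for (let i = 0; i < input.length; i++) input[i] = Math.floor(rand() * 256)
    iterations++
    try {
      target(input)
    } catch (error) {
      return { iterations, crash: { input, error } }
    }
  }
  return { iterations }
}
```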

Test Organisation Patterns

Directory structure

tests/
  unit/          *.unit.spec.ts       Pure functions, domain logic
  int/           *.int.spec.ts        Database, API, access control
  components/    *.browser.spec.tsx   Rendered UI in real browser
  e2e/           *.e2e.spec.ts        Full user flows
  fixtures/      index.ts             Shared test data factories
  setup/         Per-tier setup files (DB init, browser cleanup)

Naming conventions

Suffix encodes the tier — config include patterns use these suffixes for zero-ambiguity matching:

Tier	Suffix	Example
Unit	`.unit.spec.ts`	`slugify.unit.spec.ts`
Integration	`.int.spec.ts`	`films.int.spec.ts`
Component	`.browser.spec.tsx`	`FilmCard.browser.spec.tsx`
E2E	`.e2e.spec.ts`	`auth.e2e.spec.ts`
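Because no suffix is a suffix of another, mapping a path to its tier is a simple suffix lookup. A spec file that matches none of them is run by no tier at all. The sketch below uses hypothetical helpers `tierForFile` and `orphanSpecs`.

```typescript
type Tier = 'unit' | 'integration' | 'component' | 'e2e'

// Suffix to tier, from the naming table; at most one entry can match a path.
const SUFFIXES: Array<[suffix: string, tier: Tier]> = [
  ['.unit.spec.ts', 'unit'],
  ['.int.spec.ts', 'integration'],
  ['.browser.spec.tsx', 'component'],
  ['.e2e.spec.ts', 'e2e'],
]

function tierForFile(path: string): Tier | undefined {
  return SUFFIXES.find(([suffix]) => path.endsWith(suffix))?.[1]
}

// Spec files no tier's include pattern will pick up, so they never run.
function orphanSpecs(paths: string[]): string[] {
  return paths.filter((p) => /\.spec\.tsx?$/.test(p) && tierForFile(p) === undefined)
}
```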

Test data factories

Use factory functions with auto-incrementing counters for unique identifiers:

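interface TestUser { email: string; name: string }

let counter = 0
function createTestUser(overrides: Partial<TestUser> = {}): TestUser {
  counter++
  return {
    email: `test-${counter}@example.com`,
    name: `Test User ${counter}`,
    ...overrides,
  }
}

// Reset hook, e.g. for beforeEach
function resetTestUserCounter(): void {
  counter = 0
}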

Counter-based (not random) for deterministic debugging. The counter lives for the whole test worker, so reset it between tests when assertions depend on exact values.

Mock boundaries

Do mock: External APIs, third-party SDKs, environment-specific runtimes
Do not mock: Code you own — test through the public API
Database: Use a real local database for integration tests (SQLite, Testcontainers)
Browser: Use a real browser for component tests (Playwright, Vitest browser mode)
Server-side imports: Stub server-only modules when testing in browser context
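This sketch shows the boundary between mocking external APIs and not mocking your own code. The third-party exchange-rate API sits behind an interface you own, and the test swaps in a fake at that point. Your conversion logic still runs for real. `RatesClient`, `convert` and `FakeRates` are hypothetical.

```typescript
// The boundary you own: an interface over the third-party rates API.
interface RatesClient {
  getRate(from: string, to: string): Promise<number>
}

// Your code: runs for real in tests.
async function convert(amount: number, from: string, to: string, rates: RatesClient): Promise<number> {
  if (from === to) return amount
  const rate = await rates.getRate(from, to)
  return Math.round(amount * rate * 100) / 100 // round to 2 decimal places
}

// Test double for the external API only; records calls for assertions.
class FakeRates implements RatesClient {
  readonly calls: Array<[string, string]> = []
  private readonly table: Record<string, number>

  constructor(table: Record<string, number>) {
    this.table = table
  }

  async getRate(from: string, to: string): Promise<number> {
    this.calls.push([from, to])
    const rate = this.table[`${from}->${to}`]
    if (rate === undefined) throw new Error(`no rate for ${from}->${to}`)
    return rate
  }
}
```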

Coverage Providers: Quick Reference

Provider	Environment	When to use	Limitations
v8	Node.js	Unit, integration tests	In Vitest browser mode, Chromium-based browsers only
Istanbul	Browser or Node.js (instruments source)	Component tests	Ignore comments may not survive bundling
c8	Node.js CLI	Standalone v8 wrapper	Alternative to built-in coverage
coverage.py	Python	All tiers via pytest-cov	Set `source` (or `--cov=<package>`) so never-imported files are reported
go cover	Go	Built-in, all tiers	Profiles from separate runs (e.g. per tier) need merging
tarpaulin	Rust	Cargo integration	May miss some async code paths
llvm-cov	Rust	Higher accuracy	Needs the llvm-tools component; branch coverage requires nightly
lcov	Any	Merging multi-tier reports	Format standard, not a provider
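For line coverage, merging multi-tier LCOV reports means summing the `DA:<line>,<hits>` counts per file across tiers and recomputing `LF` and `LH`. This is a minimal sketch, and `mergeLcov` is a hypothetical name. It ignores function and branch records, and it assumes every tier emits the same path for a given file. It is no substitute for `lcov -a` or your ecosystem's merge tool.

```typescript
// Merge LCOV line data (DA records) from several tracefiles, e.g. one per tier.
function mergeLcov(reports: string[]): string {
  // file path -> (line number -> total hits)
  const files = new Map<string, Map<number, number>>()
  for (const report of reports) {
    let current: Map<number, number> | undefined
    for (const raw of report.split('\n')) {
      const line = raw.trim()
      if (line.startsWith('SF:')) {
        const path = line.slice(3)
        current = files.get(path) ?? new Map<number, number>()
        files.set(path, current)
      } else if (line.startsWith('DA:') && current) {
        // DA:<line>,<hits>[,<checksum>]
        const [lineNo, hits] = line.slice(3).split(',').map(Number)
        current.set(lineNo, (current.get(lineNo) ?? 0) + hits)
      } else if (line === 'end_of_record') {
        current = undefined
      }
    }
  }

  const out: string[] = []
  for (const path of Array.from(files.keys()).sort()) {
    const lines = Array.from(files.get(path)!.entries()).sort(([a], [b]) => a - b)
    out.push(`SF:${path}`)
    for (const [lineNo, hits] of lines) out.push(`DA:${lineNo},${hits}`)
    out.push(`LF:${lines.length}`, `LH:${lines.filter(([, hits]) => hits > 0).length}`, 'end_of_record')
  }
  return out.join('\n') + '\n'
}
```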

Gotchas

Issue	Fix
v8 undercounts arrow functions	Lower `functions` threshold or restructure code
Istanbul ignore comments stripped by bundler	Use file-level exclusions in config instead
Concurrent DB writes in integration tests	Disable parallelism, use single worker
Coverage directories conflict across tiers	Separate `reportsDirectory` per tier config
E2E tests too slow for pre-commit	Run in CI only; document in project README
Ignore comment used without justification	Always add a reason after the ignore directive
Coverage passes but tests are meaningless	Review test quality, not just the metric
New file added with no tests	Threshold catches it at commit time only if coverage also counts files no test imports (e.g. Vitest `coverage.include`)
Browser tests import server-only code	Create stub modules, alias in browser config
Flaky tests in pre-commit hooks	Investigate root cause; do not retry or skip

References

Ecosystem Patterns — Index of per-language references:
- TypeScript/JS | Python | Go | Rust | Merging
Coverage Exclusions — How to document and justify every exclusion
Enforcement — Wiring coverage into hk hooks, CI pipelines, and PR checks
Test quality you measure & gate (beyond line/branch coverage):
- Mutation Testing — Does the suite detect changes? Per-ecosystem tools, the oracle gap, diff-based CI
- Fuzzing — Coverage-guided crash-finding (Go native, cargo-fuzz, Atheris, Jazzer, OSS-Fuzz), time-boxed CI
For test design quality — property-based, snapshot/approval, differential/metamorphic, contract, assertion density, flaky-test management — see the testing skill (../testing/SKILL.md)