testing

Stars: 14

Forks: 1

Updated: 25 June 2026, 23:58

Design and write effective tests for behavioural changes, bug fixes, and refactors. Use when choosing a test layer, practising TDD, picking doubles/fakes, reducing brittle or flaky tests, refactoring safely, or applying property-based, snapshot/approval, differential/metamorphic, or contract testing. For coverage, thresholds, mutation testing, fuzzing, and CI/hook enforcement, use the test-coverage skill.

Source

connorads

connorads/dotfiles

SKILL.md

Testing

Use tests as design feedback and regression protection. Prefer tests that prove observable behaviour through public APIs over tests that mirror implementation structure.

Decision Tree

What is the task?
|-- New behaviour or bug fix
|   |-- Can the behaviour be observed through a public API? -> write the test there
|   `-- Is the seam missing or awkward? -> simplify the design before adding doubles
|-- Refactor existing code
|   |-- Behaviour already covered? -> refactor behind the tests
|   `-- Behaviour not covered? -> characterise it first, then refactor
|-- Large input space or invariants
|   `-- Add property-based tests alongside named examples
|-- Flaky or brittle tests
|   `-- Remove time/order/network coupling and implementation-detail assertions
`-- Coverage report, thresholds, or CI/hook enforcement
    `-- Use the test-coverage skill

Core Rules

Prefer TDD for behavioural changes: see the failure, make it pass, then refactor.
Test observable behaviour through public APIs, not implementation details.
Keep tests deterministic and order-independent.
Make the reason for each test clear from its name, setup, and assertions.
A failing test should point at the cause quickly; vague failures are test design problems.
Prefer real values and simple pure tests before introducing doubles.
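As an illustration of these rules, here is a minimal Python sketch (pytest-style test functions that also run without pytest). `is_expired` is a hypothetical rule, not part of this skill. The point is that time is passed in as a value rather than read from the system clock, so the tests are deterministic and order-independent, and each name states the behaviour it protects.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical rule for illustration: a session expires once its TTL has
# elapsed. "now" is a parameter rather than a hidden datetime.now() call,
# so every test controls time and runs deterministically in any order.
def is_expired(issued_at: datetime, ttl: timedelta, now: datetime) -> bool:
    return now >= issued_at + ttl

ISSUED = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
TTL = timedelta(minutes=30)

# Each name states the behaviour it protects; a failure points at the boundary.
def test_session_is_live_one_second_before_ttl_elapses():
    assert not is_expired(ISSUED, TTL, now=ISSUED + TTL - timedelta(seconds=1))

def test_session_expires_exactly_when_ttl_elapses():
    assert is_expired(ISSUED, TTL, now=ISSUED + TTL)

if __name__ == "__main__":
    test_session_is_live_one_second_before_ttl_elapses()
    test_session_expires_exactly_when_ttl_elapses()
    print("ok")
```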

Choosing the Layer

Choose the narrowest layer that proves the behaviour.

Layer	Use for	Shape
Pure core	Business rules, parsing, validation, calculations	Unit tests with real values
Application/use case	Decisions across owned ports	Public API tests with fakes for owned ports
Adapter	Database, queue, filesystem, third-party integration code	Contract/integration tests against real infrastructure where practical
Composition	Wiring, CLI, HTTP handlers, UI journeys	A few integration/e2e checks for critical paths

Do not use e2e tests to compensate for untested domain logic. Do not use unit tests to assert wiring that only fails when components are composed.
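A sketch of the first two rows in Python, using hypothetical names (`apply_discount`, `checkout`, `InMemoryOrders`). The pure rule is tested with real values and no doubles. The use case is tested through its public function, with a fake standing in for the port it owns.

```python
from dataclasses import dataclass, field

# Pure core (hypothetical rule): no I/O, so tests use real values and no doubles.
def apply_discount(total_pence: int, code: str) -> int:
    if code == "SAVE10":
        return total_pence - total_pence // 10
    return total_pence

def test_save10_takes_ten_percent_off():
    assert apply_discount(2000, "SAVE10") == 1800

def test_unknown_code_leaves_total_unchanged():
    assert apply_discount(2000, "BOGUS") == 2000

# Application layer (hypothetical use case): a fake for an owned port stands
# in for a real database adapter.
@dataclass
class InMemoryOrders:
    saved: dict = field(default_factory=dict)

    def save(self, order_id: str, total_pence: int) -> None:
        self.saved[order_id] = total_pence

def checkout(orders, order_id: str, total_pence: int, code: str) -> int:
    final = apply_discount(total_pence, code)
    orders.save(order_id, final)
    return final

def test_checkout_persists_discounted_total():
    orders = InMemoryOrders()
    assert checkout(orders, "o-1", 2000, "SAVE10") == 1800
    # Assert on the fake's observable state, not on which methods were called.
    assert orders.saved == {"o-1": 1800}
```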

Shell, zsh and POSIX sh testing

Choose shell test tooling by the language boundary being tested:

Bash-heavy CLI/code: bats-core is a good default.
Cross-shell shell functions/libraries: prefer ShellSpec.
POSIX sh portability: run the same behaviour tests under multiple real shells (dash, busybox sh, bash --posix, etc.).
zsh-native functions/autoloads/completions: test inside zsh with isolated shell state; Bats can black-box execute zsh commands but is not a zsh-native test language.

Prefer black-box CLI tests for scripts: arguments, exit status, stdout, stderr, and filesystem effects. For shell functions, isolate PATH, HOME/ZDOTDIR, temp dirs, fixtures, and shell options.

See references/shell-testing.md for tool comparison, zsh isolation patterns, POSIX multi-shell loops, and example harnesses.
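Here is a sketch of the black-box, multi-shell idea. It is driven from Python only to keep this page's examples in one language; Bats or ShellSpec would normally own this. The `greet` script is hypothetical. Each run gets a fresh `HOME` and a minimal `PATH`. The harness checks exit status, stdout, stderr, and the file the script writes, under every listed shell that is actually installed; the rest are skipped.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Hypothetical POSIX sh script under test; in a real repo this would be a file.
SCRIPT = """\
if [ "$#" -ne 1 ]; then
  echo "usage: greet NAME" >&2
  exit 2
fi
printf 'hello, %s\\n' "$1"
printf '%s\\n' "$1" > "$HOME/last_greeted"
"""

# Candidate shells; only those installed on this machine are exercised.
CANDIDATES = [["sh"], ["dash"], ["bash", "--posix"], ["busybox", "sh"]]

def available_shells():
    return [cmd for cmd in CANDIDATES if shutil.which(cmd[0])]

def run_script(shell_cmd, args, home):
    script = Path(home) / "greet.sh"
    script.write_text(SCRIPT)
    # Isolated environment: fresh HOME, minimal PATH, nothing else inherited.
    env = {"HOME": str(home), "PATH": "/usr/bin:/bin"}
    exe = shutil.which(shell_cmd[0])
    return subprocess.run([exe, *shell_cmd[1:], str(script), *args],
                          env=env, capture_output=True, text=True)

def check_behaviour(shell_cmd):
    with tempfile.TemporaryDirectory() as home:
        ok = run_script(shell_cmd, ["ada"], home)
        assert ok.returncode == 0, ok.stderr
        assert ok.stdout == "hello, ada\n"
        assert ok.stderr == ""
        assert (Path(home) / "last_greeted").read_text() == "ada\n"

        bad = run_script(shell_cmd, [], home)
        assert bad.returncode == 2
        assert bad.stdout == ""
        assert "usage" in bad.stderr

if __name__ == "__main__":
    for cmd in available_shells():
        check_behaviour(cmd)
        print("passed under", " ".join(cmd))
```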

Scenario (integration) tests

A useful named layer sits between application and composition: run the real application end to end and fake only the externals you don't own (third-party HTTP), switching the faked backend state per test.

Fake at the network boundary (e.g. MSW server-side), not by stubbing your own modules — routing, parsing, middleware, and wiring all run for real.
Define named backend states ("payment succeeds", "auth times out") and select one per test instead of restarting the app or re-mocking by hand.
Isolate parallel tests by tagging each with an id (e.g. an injected header) so concurrent tests don't share mocked state.

Reach for it to prove a vertical slice works without standing up real third-party services. It complements, and does not replace, a few true e2e checks and exhaustive domain tests.
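The following Python sketch shows the pattern, and every name in it is hypothetical. The fake replaces only the third-party payment API, backend states are named, and each test tags its requests with its own id. In a real suite the fake would sit behind an HTTP interception layer (as MSW does for JavaScript). Here it is handed to the application as a `send` callable instead, a simplification that keeps the sketch self-contained.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Named backend states for a hypothetical third-party payment API. Each maps a
# request body to the (status, payload) the real service would return.
SCENARIOS = {
    "payment succeeds": lambda body: (200, {"status": "paid"}),
    "payment declined": lambda body: (402, {"status": "declined"}),
    "auth times out": lambda body: (504, {"error": "timeout"}),
}

class FakePaymentApi:
    """Fake for the external service only, at the transport boundary.

    Tests select a scenario under their own id. Requests carry that id in an
    X-Test-Id header (an assumed convention), so parallel tests never share state.
    """
    def __init__(self):
        self._lock = threading.Lock()
        self._active = {}

    def use(self, test_id, scenario):
        with self._lock:
            self._active[test_id] = SCENARIOS[scenario]

    def send(self, headers, body):
        with self._lock:
            respond = self._active[headers["X-Test-Id"]]
        return respond(body)

# Application code under test (hypothetical): it runs for real; only the
# transport it is given points at the fake instead of the network.
def pay_order(send, order_id, amount, headers):
    status, _payload = send(headers, {"order": order_id, "amount": amount})
    if status == 200:
        return "confirmed"
    if status == 402:
        return "rejected"
    return "retry_later"

def run_scenario(api, test_id, scenario):
    api.use(test_id, scenario)
    return pay_order(api.send, "o-1", 500, {"X-Test-Id": test_id})

if __name__ == "__main__":
    api = FakePaymentApi()
    cases = [("t1", "payment succeeds"), ("t2", "payment declined"), ("t3", "auth times out")]
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda c: run_scenario(api, *c), cases))
    print(results)  # ['confirmed', 'rejected', 'retry_later']
```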

Testing at multiple boundaries

"Narrowest layer" is the default, not an absolute. Deliberately re-test the same business rule at more than one boundary (e.g. the domain function and the HTTP API) when defense in depth earns the duplication:

Duplicate business rules, not plumbing. Re-prove a rule at each public entry point; test plumbing (status codes, parsing, DOM details) only at the layer it lives in.
Why pay for it: the layer whose test fails tells you which boundary broke; when two layers each re-implement a rule, tests at both surface drift as soon as one of them changes; and the rule stays protected even as individual suites erode.
The cost is real: every rule change touches every layer that asserts it.

Stop duplicating and sample at the outer layer instead when: the inner layer becomes a thin wrapper (ceremony outweighs the rule), the outer surface explodes to many endpoints, suite runtime crosses a pain threshold (keep the domain exhaustive, sample the API), or there is only ever one consumer behind a trivial forwarder.
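Here is a sketch of deliberate duplication, using hypothetical names. The seat rule is proved at the domain function and re-proved at the HTTP-style entry point, while parsing and status codes are tested only at the entry point. `handle_booking` returns a `(status, body)` tuple as a stand-in for a real web framework.

```python
import json

# Hypothetical business rule: between 1 and seats_left seats may be booked.
def can_book(requested: int, seats_left: int) -> bool:
    return 1 <= requested <= seats_left

# Hypothetical HTTP-style entry point returning (status, body). Parsing and
# status codes are plumbing that lives only at this layer.
def handle_booking(raw_body: str, seats_left: int):
    try:
        requested = int(json.loads(raw_body)["seats"])
    except (ValueError, KeyError, TypeError):  # JSONDecodeError is a ValueError
        return 400, {"error": "malformed request"}
    if not can_book(requested, seats_left):
        return 409, {"error": "not enough seats"}
    return 201, {"booked": requested}

def test_rule_bounds_requested_seats():  # domain layer: the rule's boundaries
    assert can_book(3, seats_left=3)
    assert not can_book(4, seats_left=3)
    assert not can_book(0, seats_left=3)

def test_api_re_proves_seat_rule():  # same rule, re-proved at the public API
    assert handle_booking('{"seats": 3}', seats_left=3) == (201, {"booked": 3})
    assert handle_booking('{"seats": 4}', seats_left=3)[0] == 409

def test_api_rejects_malformed_body():  # plumbing, tested only where it lives
    assert handle_booking("not json", seats_left=3)[0] == 400
    assert handle_booking('{"seat": 1}', seats_left=3)[0] == 400
```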

Test Doubles

Avoid mocks by default; they tend to couple tests to call order and internal collaboration.

Pure core should not need doubles.
Use fakes for ports you own when real infrastructure would make tests slow or nondeterministic.
Test adapters with real infrastructure where feasible, or with contract tests that prove the adapter fulfils the port.
For expensive or hostile external systems, fake at an application-owned port and keep at least one smoke/integration check where practical.
If a test needs many mocks, reconsider the boundary rather than adding more mocking.

The mirror-test trap: a test that mocks the very collaborator whose behaviour it claims to verify proves wiring, not behaviour — and stays green when the real behaviour breaks. A handler test that stubs validateBooking to return a rejection and then asserts the handler returns 400 never exercises the real rule: delete the rule and the test still passes, because the test supplied the rejection itself. Such tests also survive mutation of the mocked unit. Assert against the real collaborator, or prove the rule at its own layer.
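The trap can be demonstrated in a few lines of Python. `validate_booking` mirrors the example above in snake_case; `handle`, `deleted_rule`, and the dictionary shape are hypothetical. Each "test" takes the rule implementation as an argument so that both the real rule and a deleted one can be tried. The mirror test passes either way, and only the behaviour test notices the deletion.

```python
from typing import Optional
from unittest.mock import Mock

def validate_booking(booking: dict) -> Optional[str]:
    """Real rule (illustrative): an error message, or None if the booking is valid."""
    if booking["seats"] > booking["seats_left"]:
        return "not enough seats"
    return None

def handle(booking: dict, validate=validate_booking) -> int:
    """Hypothetical handler: 400 on a rule violation, 201 otherwise."""
    return 400 if validate(booking) else 201

OVERBOOKED = {"seats": 5, "seats_left": 2}

def mirror_test(rule) -> bool:
    # Anti-pattern: the mock supplies the rejection, so `rule` is never run.
    return handle(OVERBOOKED, validate=Mock(return_value="not enough seats")) == 400

def behaviour_test(rule) -> bool:
    # Exercises the real collaborator through the handler's entry point.
    return handle(OVERBOOKED, validate=rule) == 400

def deleted_rule(booking: dict) -> Optional[str]:
    """Simulates deleting the rule: every booking is accepted."""
    return None

if __name__ == "__main__":
    print(mirror_test(validate_booking), mirror_test(deleted_rule))        # True True
    print(behaviour_test(validate_booking), behaviour_test(deleted_rule))  # True False
```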

Property-Based Tests

Use property-based tests when examples under-sample the behaviour:

parsers and serialisers
normalisation and canonicalisation
permissions matrices
state machines
ordering, sorting, deduplication
arithmetic, date/time, ranges
round trips and invariants

Write properties as invariants over generated inputs, not randomised examples. Keep generators valid by construction where possible. Keep named example tests for edge cases and regression stories; use property tests to explore the input space around them.
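Here is a framework-free Python sketch of the idea, using a hypothetical run-length encoder. Inputs come from a seeded generator that is valid by construction. The checks are invariants (round trip, no empty runs, maximal runs) rather than expected outputs. Hypothesis or fast-check would supply richer generators and automatic shrinking, which this hand-rolled loop does not attempt.

```python
import random

# Hypothetical serialiser under test: run-length encoding of a string.
def rle_encode(s: str) -> list[tuple[str, int]]:
    runs: list[tuple[str, int]] = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs

def rle_decode(runs: list[tuple[str, int]]) -> str:
    return "".join(ch * n for ch, n in runs)

def gen_text(rng: random.Random) -> str:
    # Valid by construction; a tiny alphabet makes repeated runs common.
    return "".join(rng.choice("ab ") for _ in range(rng.randint(0, 20)))

def check_properties(seed: int = 0, trials: int = 500) -> None:
    rng = random.Random(seed)  # a fixed seed makes any failure reproducible
    for _ in range(trials):
        s = gen_text(rng)
        runs = rle_encode(s)
        assert rle_decode(runs) == s, (seed, s)  # round trip
        assert all(n > 0 for _, n in runs), (seed, s)  # no empty runs
        # Runs are maximal: adjacent runs never share a character.
        assert all(a[0] != b[0] for a, b in zip(runs, runs[1:])), (seed, s)

# Named examples stay alongside the properties for edge cases and regressions.
def test_examples():
    assert rle_encode("") == []
    assert rle_encode("aab") == [("a", 2), ("b", 1)]
    assert rle_decode([("x", 3)]) == "xxx"
```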

Failures shrink automatically to a minimal counterexample — persist that case as a regression example so the specific failure is checked deterministically forever. For stateful systems, generate a sequence of operations and check them against a simple in-memory model (model-based testing).
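Here is a hand-rolled Python sketch of model-based testing. It assumes a hypothetical fixed-capacity `RingBuffer` as the system under test, with a plain list as the model. Random operation sequences are applied to both, and after every step the buffer must agree with the model. On failure, the assertion reports the operation history, which is the sequence a PBT framework would then shrink. Hypothesis's `hypothesis.stateful` module, for instance, provides this with shrinking built in.

```python
import random

class RingBuffer:
    """System under test (hypothetical): keeps the newest `capacity` items."""

    def __init__(self, capacity: int):
        self._cap = capacity
        self._data = [None] * capacity
        self._start = 0  # index of the oldest item
        self._size = 0

    def push(self, item) -> None:
        end = (self._start + self._size) % self._cap
        self._data[end] = item
        if self._size < self._cap:
            self._size += 1
        else:  # full: the write above overwrote the oldest item
            self._start = (self._start + 1) % self._cap

    def pop_oldest(self):
        if self._size == 0:
            raise IndexError("pop from empty buffer")
        item = self._data[self._start]
        self._start = (self._start + 1) % self._cap
        self._size -= 1
        return item

    def items(self) -> list:
        return [self._data[(self._start + i) % self._cap] for i in range(self._size)]

def run_model_test(seed: int, steps: int = 200, capacity: int = 3) -> None:
    rng = random.Random(seed)
    sut, model = RingBuffer(capacity), []  # the model is a plain list
    history = []
    for _ in range(steps):
        if rng.random() < 0.6:
            item = rng.randint(0, 9)
            history.append(("push", item))
            sut.push(item)
            model.append(item)
            if len(model) > capacity:
                model.pop(0)  # model rule: drop the oldest when over capacity
        else:
            history.append(("pop_oldest",))
            expected = model.pop(0) if model else IndexError
            try:
                actual = sut.pop_oldest()
            except IndexError:
                actual = IndexError
            assert actual == expected, (seed, history)
        # After every operation the system under test must agree with the model.
        assert sut.items() == model, (seed, history)
```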

See property-based-testing.md for per-ecosystem frameworks (fast-check, Hypothesis, proptest, rapid/gopter), shrinking, stateful/model-based testing, CI integration, and pitfalls.

Differential & Metamorphic Testing

Use these when there is no reliable oracle — you cannot state the correct output, only relationships between outputs. They are the backbone of compiler, parser, database, numeric, and ML testing.

Differential: run the same input through two independent implementations (or old vs new version) and assert they agree. Cheap and powerful for safe refactors and for parsers/compilers — keep the reference implementation as the oracle.
Metamorphic: assert a relation between related inputs when no single output is checkable — sin(x) == sin(pi - x); permuting training data should not change a model's accuracy; add-then-remove restores state. Usually expressed as a property (see above), so reach for your PBT framework.
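The following short Python sketch shows both styles. The differential check compares a hypothetical new `dedupe_fast` against an obviously correct reference on random inputs. The metamorphic check never states what `math.sin(x)` should return; it checks only that `sin(x)` and `sin(pi - x)` agree within a floating-point tolerance. The tolerances are assumptions chosen for inputs in [-10, 10].

```python
import math
import random

def dedupe_reference(xs):
    """Old, obviously correct implementation kept as the oracle."""
    out = []
    for x in xs:
        if x not in out:
            out.append(x)
    return out

def dedupe_fast(xs):
    """Hypothetical new implementation; must agree with the reference."""
    return list(dict.fromkeys(xs))  # dicts preserve insertion order

def differential_check(seed: int = 0, trials: int = 300) -> None:
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(0, 5) for _ in range(rng.randint(0, 15))]
        assert dedupe_fast(xs) == dedupe_reference(xs), xs

def metamorphic_check(seed: int = 0, trials: int = 300) -> None:
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.uniform(-10.0, 10.0)
        # No oracle for sin(x) itself: only the relation between two outputs.
        assert math.isclose(math.sin(x), math.sin(math.pi - x),
                            rel_tol=1e-9, abs_tol=1e-12), x
```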

Snapshot & Approval Tests

Snapshot tools (Jest/Vitest snapshots, insta for Rust, syrupy for Python, ApprovalTests) record output and diff future runs against it. They are useful for large, semantically meaningful serialised output, but they fail open (updating a snapshot accepts whatever the code now produces) and they degrade over time:

Snapshot rot / rubber-stamping: when a snapshot breaks, the path of least resistance is update-and-merge, so the snapshot ends up asserting "what the code currently does", not what it should.
Over-broad snapshots bury the one meaningful line among hundreds of irrelevant ones; every change churns the snapshot and nobody reads the diff.

Use them well: keep snapshots small and targeted (snapshot the one derived value, not the whole DOM/object), review every update as real code, and prefer explicit assertions whenever you can name the expectation. Treat a snapshot-only test as roughly assertion-free for quality purposes. Avoid snapshots for incidental structure.

Assertion Quality

A test with no assertion only proves "it did not throw". Make each test's assertions name the behaviour they protect. As a cheap guard, flag assertion-free tests in lint/CI (e.g. ESLint jest/expect-expect, or AST/grep checks for test functions lacking assert/expect/require). Assertion count is a weak, gameable proxy — the rigorous measure of "do my assertions actually catch bugs" is mutation testing, owned by the test-coverage skill.

Contract Testing

Two senses, both about proving a boundary without a full end-to-end stack:

Adapter/port contract (within one codebase): one test suite run against both the real adapter and any fake proves they satisfy the same port. Prefer this over mocks for owned ports (see Test Doubles).
Consumer-driven contract (across independently deployed services, e.g. Pact): the consumer publishes the requests/responses it relies on; the provider verifies it still satisfies them. Sits between integration and e2e, catching cross-service breaks cheaply. Pitfalls: broker/tooling overhead and false confidence if contracts drift from real usage. For HTTP, schema/OpenAPI contract testing is lighter when one side owns the spec.
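The following Python sketch shows an adapter/port contract, using hypothetical names. One contract function runs against both a SQLite-backed adapter and a dictionary fake. The adapter uses an in-memory database from the standard library's `sqlite3`, standing in for real infrastructure. With pytest this would typically be a parametrised fixture; a plain loop keeps the sketch dependency-free.

```python
import sqlite3
from typing import Callable, Optional, Protocol

class UserStore(Protocol):
    """Owned port (hypothetical)."""
    def put(self, user_id: str, name: str) -> None: ...
    def get(self, user_id: str) -> Optional[str]: ...

class SqliteUserStore:
    """Real adapter: persists users in SQLite."""
    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS users (id TEXT PRIMARY KEY, name TEXT NOT NULL)")

    def put(self, user_id: str, name: str) -> None:
        self._conn.execute("INSERT OR REPLACE INTO users (id, name) VALUES (?, ?)", (user_id, name))

    def get(self, user_id: str) -> Optional[str]:
        row = self._conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
        return row[0] if row else None

class InMemoryUserStore:
    """Fake used by application-layer tests."""
    def __init__(self):
        self._users: dict = {}

    def put(self, user_id: str, name: str) -> None:
        self._users[user_id] = name

    def get(self, user_id: str) -> Optional[str]:
        return self._users.get(user_id)

def check_user_store_contract(make_store: Callable[[], UserStore]) -> None:
    """The contract: every implementation of the port must pass this."""
    store = make_store()
    assert store.get("missing") is None
    store.put("u1", "Ada")
    assert store.get("u1") == "Ada"
    store.put("u1", "Grace")  # writing an existing id replaces the name
    assert store.get("u1") == "Grace"

ADAPTERS = {
    "sqlite": lambda: SqliteUserStore(sqlite3.connect(":memory:")),
    "fake": InMemoryUserStore,
}

if __name__ == "__main__":
    for name, make in ADAPTERS.items():
        check_user_store_contract(make)
        print(name, "satisfies the UserStore contract")
```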

Refactoring Existing Code

Before refactoring, characterise current behaviour through public APIs. Commit those tests separately while the old implementation still exists. Then refactor behind the tests.

If behaviour is unclear, preserve it first and ask before changing it. Use golden or approval tests only when output is large and semantically meaningful. Avoid snapshots for incidental structure.

Fixing Bugs

Prove the bug is detectable before fixing it. Add or adjust a failing test, enable the strict check, or reproduce the failing command. The red step does not need a commit, but it should be real enough to prove the fix.

After the fix, run the narrowest relevant check first, then the broader checks needed for confidence.

Flaky Tests

A flaky test (passes and fails on the same code) erodes trust in the whole suite. Retry-to-green is an anti-pattern — auto-rerunning until a pass hides a real defect (usually a race, shared state, or order-dependency) and lets it ship.

Detect: re-run suspected tests to surface flakiness, not mask it (pytest-rerunfailures, go test -count=N, seed/order shuffling to expose order-dependence). CI test-analytics tools (Datadog, Buildkite, GitHub) track per-test pass/fail history over time.
Quarantine, then fix: move a confirmed-flaky test out of the blocking gate into a tracked quarantine with an owner and a deadline — do not skip and forget, and do not leave it blocking the build. Root-cause it: timing, shared state, network, or nondeterministic ordering.

Prevention is design: keep tests deterministic and order-independent, and remove time/network coupling (see Core Rules).
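Here is a small Python sketch of shuffle-based detection, using hypothetical test cases. The suite is run in many seeded random orders, and any case that both passes and fails on the same code is reported. In this example, `case_cart_starts_empty` depends on shared module state and only passes when it happens to run before `case_add_item`. The fix is to give each case its own state, not to retry it.

```python
import random

# Shared mutable state: the classic cause of order-dependent tests.
CART = []

# Hypothetical test cases (named case_* so a real runner won't collect them).
def case_add_item():
    CART.append("apple")
    assert "apple" in CART

def case_cart_starts_empty():
    # Passes only if it runs before case_add_item: order-dependent, so flaky.
    assert CART == []

def case_independent():
    assert sorted([3, 1, 2]) == [1, 2, 3]

SUITE = [case_add_item, case_cart_starts_empty, case_independent]

def run_once(cases, seed):
    CART.clear()  # state is reset per suite run, not per case, as in a leaky suite
    order = list(cases)
    random.Random(seed).shuffle(order)  # the seed is recorded so the order can be replayed
    results = {}
    for case in order:
        try:
            case()
            results[case.__name__] = True
        except AssertionError:
            results[case.__name__] = False
    return results

def find_order_dependent(cases, runs=50):
    outcomes = {}
    for seed in range(runs):
        for name, passed in run_once(cases, seed).items():
            outcomes.setdefault(name, set()).add(passed)
    # A case that both passed and failed on the same code is flaky.
    return sorted(name for name, seen in outcomes.items() if len(seen) > 1)

if __name__ == "__main__":
    print(find_order_dependent(SUITE))
```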

References

property-based-testing.md — Per-ecosystem PBT frameworks, shrinking, stateful/model-based testing, CI integration, and pitfalls.
shell-testing.md — Bats/ShellSpec/shUnit2/cram trade-offs, zsh isolation, POSIX multi-shell testing, and shell fakes.
For coverage reports, thresholds, exclusions, mutation testing, fuzzing, and CI/hook enforcement of test quality, use the test-coverage skill.