Run any Skill in Manus with one click

audit-tests

Audit the existing test suite for redundancy, quality gaps, and organization: duplicate or self-confirming tests, coverage gaps metrics miss, and safe cleanup fixes. Use `prune-new-tests` instead when the target is tests added in the current uncommitted diff.

Run Skill in Manus

Overview

Install command

npx skills add https://github.com/conn-castle/agent-layer --skill audit-tests

Copy and paste this command into Claude Code to install the skill

Source

conn-castle/agent-layer

Stars7

Forks1

UpdatedMay 31, 2026 at 14:58

SKILL.md

readonly

audit-tests

Audit the health of the existing test suite and fix what can be fixed safely:

discover the project's test conventions and runner configuration
classify existing tests by tier (unit, integration, e2e)
identify redundant, misleading, or low-value tests
identify meaningful coverage gaps that line-coverage metrics miss
identify misclassified tests (e.g., integration tests labeled as unit tests)
remove dead tests and strengthen weak assertions where mechanical
report all findings and fixes

Use boost-coverage when the goal is to write new tests to raise coverage. Use prune-new-tests when the goal is to prune speculative tests added in the current uncommitted diff (burden-of-proof, diff-scoped only). Use this skill when the goal is to assess and clean up the existing test suite.

Defaults

Default scope is all test files in the repository.
Classify tests into tiers based on project conventions, not assumptions.
If the project has no clear tier separation, note that as a finding rather than inventing a classification scheme.
Prioritize findings that affect developer confidence, suite speed, or maintenance burden.

Inputs

Accept any combination of:

explicit paths, directories, or modules
tier filters (unit, integration, e2e, or all)
a maximum finding count
whether to run coverage commands during gap analysis (default: yes if available)

Required artifact

Write the report to:

.agent-layer/tmp/audit-tests.<run-id>.report.md

Use run-id = YYYYMMDD-HHMMSS-<short-rand>. Create the file with touch before writing.

Multi-agent pattern

Recommended roles:

Convention scout: discovers test runner, directory layout, naming patterns, and tier conventions.
Redundancy analyst: identifies duplicative and overlapping tests.
Quality analyst: identifies low-value tests, weak assertions, and misclassified tests.
Gap analyst: identifies meaningful coverage gaps by comparing test targets against production code structure.
Fixer: removes dead tests and applies mechanical fixes.
Reporter: writes the final report.

Global constraints

Do not assume test tier conventions; discover them from the project's configuration, directory structure, and naming patterns.
Do not treat line-coverage metrics as the sole measure of test health.
Do not flag tests as redundant without concrete evidence (shared setup, identical assertions on the same code path, duplicated scenarios).
Keep findings tied to specific test files and functions.
Do not run tests or coverage commands unless required for gap analysis and the user has not opted out.

Human checkpoints

Required: ask when the project's test conventions are ambiguous enough that tier classification would be unreliable.
Required: ask before removing tests that have partial value and no clear negative-value or duplicate finding.
Required: ask when a finding would require changes to production code for testability.
When a checkpoint involves a genuine tradeoff between substantive alternatives, present at least two options with brief pros and cons, state which you recommend and why, and let the human decide.
Stay autonomous for: dead tests, clear negative-value tests, clear duplicates, and mechanical assertion fixes.

Audit workflow

Phase 0: Preflight (Convention scout)

Confirm baseline with git status --porcelain.
Read COMMANDS.md before choosing any test or coverage commands.
Discover test conventions: runner and configuration, directory structure and naming patterns, tier separation (directories, suffixes, tags, markers), fixture/helper patterns, and any categorization system.
If tier conventions are unclear, note this and proceed with best-effort classification. Do not invent conventions.

Phase 1: Inventory and classify (Convention scout)

Build an inventory of all test files in scope.
Classify each test file by tier:
- Unit: tests a single function/method/class in isolation, mocks external dependencies
- Integration: tests interaction between multiple components, may use real databases or services
- E2E: tests complete user-facing workflows or API paths
- Unclassified: does not clearly fit a single tier
Record the classification rationale for each file or group.
Note any tests that appear misclassified relative to their actual behavior (e.g., a "unit" test that starts a database).

Phase 2: Redundancy analysis (Redundancy analyst)

Identify tests that are duplicative or overlapping:

tests that exercise the same code path with the same inputs and equivalent assertions
test functions that are copy-pasted with trivial variations
multiple test files covering the same module with substantial overlap
test helpers or fixtures that duplicate production code behavior

For each redundancy finding, state:

which tests overlap
what they share (code path, setup, assertions)
which test is the more complete or maintainable version

Phase 3: Quality analysis (Quality analyst)

Identify tests with quality concerns:

Tautological/self-confirming tests: tests whose assertions are satisfied by their own setup; delete clear cases instead of counting them as coverage. Also flag runtime tests that only re-check constraints already enforced by a language, compiler, type checker, schema, or static analyzer.
Rubber-stamp tests: tests with no meaningful assertions — they run code but only check that it does not panic/error, assert truthiness without verifying behavior, or assert on implementation details rather than outcomes. These provide false confidence and should be deleted.
Duplicate tests: tests that are substantially identical to another test — same code path, same inputs, equivalent assertions — often created by agents adding a new test per task instead of extending existing tests. Keep the more complete version; delete the rest.
Weak assertions: tests that assert only on happy paths or skip error paths, boundary conditions, or guard clauses for tested functions
Fragile tests: tests tightly coupled to implementation details that would break on safe refactors
Misleading names: test names that do not match what the test actually verifies
Dead tests: tests that are skipped, commented out, or unreachable
Slow unit tests: tests classified as unit tests that perform I/O, network calls, or sleep

Phase 4: Gap analysis (Gap analyst)

Analyze gaps separately for each tier. Every tier must have its own dedicated section in the findings. No tier may be silently omitted.

For each tier, the conclusion must be one of:

gaps exist (list them)
no gaps found (state the evidence)
not applicable (with genuine architectural justification — e.g., a pure library with no running services has no meaningful integration tier)
tier does not exist yet but the project would benefit from it (this is itself a gap finding)

"Not applicable" requires genuine architectural justification, not merely "the project doesn't have these tests yet."

Unit test gaps: untested production functions/methods/modules; uncovered error paths, guard clauses, and boundary conditions; complex branching tested only at higher tiers; recently changed code (last 3 months) with stale or missing unit tests.

Integration test gaps: untested component interactions and interface boundaries; data-layer operations tested only via mocks; configuration/wiring never tested with real components.

E2E test gaps: untested user-facing workflows or API paths; critical business flows relying solely on lower-tier coverage; deployment-sensitive paths (migrations, startup, health checks) with no e2e coverage.

Additional tiers (when discovered): apply the same gap analysis to any project-specific tier (contract, smoke, performance, etc.) and note when a tier is expected by conventions but has no tests.

Focus on gaps that represent real risk, not low line-coverage numbers. If coverage commands are available and the user has not opted out, run them to inform the analysis.

Phase 5: Fix safe findings (Fixer)

Delete clear tautological/self-confirming tests; report the resulting coverage gap instead of replacing them with false coverage.
Delete clearly dead tests (skipped, commented out, unreachable).
Delete rubber-stamp tests that have no meaningful assertions.
Consolidate clear duplicates: keep the more complete version, delete the rest. When tests overlap substantially but each has unique value, merge into a single consolidated test.
Strengthen weak assertions where the fix is mechanical and unambiguous.
For borderline cases where the test has partial value and deletion is not clearly correct, ask the user.
Run the test suite after fixes to confirm nothing broke.

Phase 6: Synthesize findings (Reporter)

Each finding across all phases must include:

Title
Severity: High | Medium | Low
Category: redundancy | quality | gap | misclassification
Tier: unit | integration | e2e | cross-tier (for redundancy/quality findings that span tiers)
Location: test file(s) and function(s)
Evidence: concrete observation
What was done: fixed | needs human decision | recommendation for boost-coverage

Required report structure

Write .agent-layer/tmp/audit-tests.<run-id>.report.md with:

# Test Audit Summary
- scope audited
- test conventions discovered
- short outcome summary
## Fixes Applied
- what was removed, changed, or strengthened
## Test Inventory
- total test count by tier
- tier classification rationale
- misclassified tests
## Redundancy Findings
- ordered by impact (most duplicative first)
## Quality Findings
- ordered by severity
## Gap Findings
- one subsection per tier (### Unit Test Gaps, ### Integration Test Gaps, ### E2E Test Gaps, plus each additional tier discovered)
- ordered by risk; every tier must appear with an explicit conclusion
## Strengths
- well-tested areas, good patterns worth preserving
## Recommended Actions
- prioritized list: what still needs human decision, what to add
- distinguish between actions for this skill and actions for boost-coverage

Guardrails

Do not turn test style preferences into findings unless they affect correctness, maintainability, or developer confidence.
Do not recommend removing tests without evidence of redundancy or negative value.
Do not conflate low coverage with poor test quality; they are separate concerns.
Do not flag framework-generated or conventional boilerplate as redundant.
Do not widen a test audit into a production code audit.

Definition of done

The report exists at .agent-layer/tmp/audit-tests.<run-id>.report.md with every required section (Summary, Fixes Applied, Test Inventory, Redundancy Findings, Quality Findings, Gap Findings, Strengths, Recommended Actions).
The Gap Findings section contains one subsection per discovered tier, each with an explicit conclusion (gaps exist / no gaps / not applicable with justification / tier missing).
Every finding names the tier, test file(s), function(s), evidence, and What was done verdict; no tier is silently omitted.
If any tests were removed or consolidated, the test suite was re-run and the report records the outcome.

Final handoff

After writing the report:

Echo the report path.
Summarize fixes applied and the test inventory by tier.
If significant gaps were found, recommend boost-coverage on the identified areas.

name	audit-tests
description	Audit the existing test suite for redundancy, quality gaps, and organization: duplicate or self-confirming tests, coverage gaps metrics miss, and safe cleanup fixes. Use `prune-new-tests` instead when the target is tests added in the current uncommitted diff.

audit-tests

More from this repository

More from this repository

audit-tests

Defaults

Inputs

Required artifact

Multi-agent pattern

Global constraints

Human checkpoints

Audit workflow

Phase 0: Preflight (Convention scout)

Phase 1: Inventory and classify (Convention scout)

Phase 2: Redundancy analysis (Redundancy analyst)

Phase 3: Quality analysis (Quality analyst)

Phase 4: Gap analysis (Gap analyst)

Phase 5: Fix safe findings (Fixer)

Phase 6: Synthesize findings (Reporter)

Required report structure

Guardrails

Definition of done

Final handoff

audit-tests

Defaults

Inputs

Required artifact

Multi-agent pattern

Global constraints

Human checkpoints

Audit workflow

Phase 0: Preflight (Convention scout)

Phase 1: Inventory and classify (Convention scout)

Phase 2: Redundancy analysis (Redundancy analyst)

Phase 3: Quality analysis (Quality analyst)

Phase 4: Gap analysis (Gap analyst)

Phase 5: Fix safe findings (Fixer)

Phase 6: Synthesize findings (Reporter)

Required report structure

Guardrails

Definition of done

Final handoff