
prune-new-tests

Burden-of-proof review of tests added in the current uncommitted diff: auto-delete any newly added test that cannot defend its existence with a concrete production-code mutation that would flip its assertion. Use `audit-tests` for full-suite health; use `boost-coverage` to add tests.


Install command

npx skills add https://github.com/conn-castle/personal-context --skill prune-new-tests

Copy and paste this command into Claude Code to install the skill

Source

conn-castle/personal-context

Stars: 0

Forks: 0

Updated: May 18, 2026 at 19:57


prune-new-tests

This skill prunes low-value tests that the implementing agent added as a side effect of implementation. Each newly added test must defend its existence with a concrete mutation in production code that would flip the assertion. Tests that cannot are auto-deleted.

Use audit-tests instead when the target is the full existing suite. Use boost-coverage instead when the goal is to add tests to raise coverage. This skill never touches pre-existing tests.

Defaults

Default scope is tests added in the current uncommitted diff only (staged, unstaged, and untracked test files). Pre-existing tests are never touched.
Default disposition is delete unless justified. The implementing agent did not earn a test's place by writing it; the test must demonstrate it can catch a defect.
Tests are auto-deleted in place — no human approval per test. Surviving coverage gaps are reported, never backfilled with replacement assertions.

Inputs

Accept any combination of:

- explicit paths or test files (still must intersect the added-test set)
- a per-test override list to keep without review
- a dry-run flag to produce the verdict report without deleting

Required artifact

Write the report to:

.agent-layer/tmp/prune-new-tests.<run-id>.report.md

Use run-id = YYYYMMDD-HHMMSS-<short-rand>. Create the file with touch before writing.
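
As an illustration of the naming rule above, here is a minimal Python sketch that builds a run-id and creates the empty report file. The function names are hypothetical, and the 6-character hex suffix is an assumption: the skill only asks for a "short" random part and does not fix its length or alphabet.

```python
import secrets
from datetime import datetime
from pathlib import Path
from typing import Optional


def make_run_id(now: Optional[datetime] = None) -> str:
    """Build a run-id shaped like YYYYMMDD-HHMMSS-<short-rand>."""
    now = now or datetime.now()
    # Assumption: 3 random bytes (6 hex characters) counts as "short".
    return f"{now:%Y%m%d-%H%M%S}-{secrets.token_hex(3)}"


def create_report_file(repo_root: Path, run_id: str) -> Path:
    """Create the empty report file (the equivalent of `touch`) and return its path."""
    report = repo_root / ".agent-layer" / "tmp" / f"prune-new-tests.{run_id}.report.md"
    report.parent.mkdir(parents=True, exist_ok=True)
    report.touch()
    return report
```

Calling `create_report_file(Path("."), make_run_id())` from the repository root would leave an empty file at the required location, ready to be filled in by the applier.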

Multi-agent pattern

Required roles:

Diff scout: enumerates every test added in the current uncommitted diff (added test files + new test functions inside existing files).
Burden-of-proof reviewer (fresh-context subagent): receives only the added test code, the production code it exercises, and the rubric below. It does not receive the implementer's narrative, the plan, the task list, or the prior conversation. Its job is to apply the rubric without inheriting the author's rationalizations.
Applier: deletes tests marked delete, runs the project test command afterward to confirm nothing unrelated broke, and writes the report.

The reviewer is invoked once per batch of added tests. If the batch is large, split into chunks of related test files; each chunk gets its own fresh-context invocation so the reviewer never accumulates context across chunks.
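
One possible way to split a large batch is sketched below in Python. The skill does not define what makes test files "related"; this sketch assumes that files in the same directory are related and caps each chunk at a hypothetical `max_files_per_chunk`.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def chunk_test_files(paths, max_files_per_chunk=3):
    """Group test files by parent directory, then cap the size of each chunk."""
    by_dir = defaultdict(list)
    for path in sorted(paths):
        # Assumption: sharing a directory is a reasonable proxy for "related".
        by_dir[str(PurePosixPath(path).parent)].append(path)

    chunks = []
    for directory in sorted(by_dir):
        files = by_dir[directory]
        for start in range(0, len(files), max_files_per_chunk):
            chunks.append(files[start:start + max_files_per_chunk])
    return chunks
```

Each returned chunk would then be handed to its own fresh reviewer invocation, so no verdict depends on what the reviewer saw in an earlier chunk.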

Reviewer subagent prompt

Pass the contents of reviewer-prompt.md to the reviewer subagent verbatim — do not paraphrase, summarize, or modify the rubric. Send no prior conversation, no plan, no implementer notes.

Inputs the reviewer receives alongside the prompt:

The added test code (full text of new test files; for new functions in existing files, the function bodies plus minimal surrounding context).
The production code each test exercises (named imports/symbols followed to their definitions; enough to judge what assertion would flip under what mutation).
Nothing else. No plan, no task list, no context file, no implementer rationale, no prior reviewer output.

Global constraints

Operate only on tests added in the current uncommitted diff. Modified pre-existing tests are out of scope. Deleted tests are out of scope.
Treat the reviewer subagent's verdicts as authoritative for delete decisions. The orchestrator does not second-guess deletions on its own.
After applying deletions, run the project's repo-defined test command (consult COMMANDS.md) and observe the output. Record what ran and the outcome.
Do not replace deleted tests with "stronger" tests in the same run. Report the resulting gap so a separate boost-coverage invocation can address it deliberately.
Do not lower coverage thresholds or skip checks to clear failures. If a deletion causes a real coverage shortfall, surface it in the report.

Human checkpoints

Required: ask when the diff contains zero added tests. The skill has no work to do, so confirm with the user instead of exiting silently.
Required: ask when the project has no discoverable test command in COMMANDS.md and no obvious convention, because the applier cannot verify deletions without one.
Required: ask when applying deletions would empty a brand-new test file whose presence the user clearly intended (file added by user, not agent).
Stay autonomous on all per-test delete verdicts within scope.

Workflow

Phase 1: Enumerate added tests (Diff scout)

Run git status --porcelain, git diff --cached, git diff, and git ls-files --others --exclude-standard to find:
- new test files (untracked or staged)
- new test functions inside otherwise-pre-existing test files (diff hunks that add func TestX, test('...'), it('...'), def test_*, etc., according to the project's discovered test conventions)
Read COMMANDS.md to identify the test command for the verification step.
List each added test by file path, test name, and line range in the report under ## Added Tests.

Phase 2: Burden-of-proof review (Burden-of-proof reviewer)

Group added tests into review chunks (one test file or a small cluster of related files per chunk).
For each chunk, invoke the reviewer subagent with the contents of reviewer-prompt.md and the chunk inputs above. The subagent must be a fresh invocation with no carryover from this conversation.
Collect the JSON-line verdicts into the report under ## Verdicts as a table with columns Location, Name, Verdict, Justification. The Justification column shows the mutation for keep and the reason for delete.
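
A rough Python sketch of the collection step follows. The JSON field names (`location`, `name`, `verdict`, `mutation`, `reason`) are hypothetical: the actual schema is defined in reviewer-prompt.md, which is not reproduced here, so treat them as placeholders.

```python
import json


def verdict_rows(json_lines):
    """Turn reviewer JSON-line output into Markdown table rows for ## Verdicts."""
    rows = [
        "| Location | Name | Verdict | Justification |",
        "| --- | --- | --- | --- |",
    ]
    for raw in json_lines.splitlines():
        if not raw.strip():
            continue
        record = json.loads(raw)
        verdict = record["verdict"]
        # keep -> the mutation that flips the assertion; delete -> the reason.
        # Assumption: the field names "mutation" and "reason" are placeholders.
        justification = record["mutation"] if verdict == "keep" else record["reason"]
        cell = str(justification).replace("|", "\\|")  # keep the table well-formed
        rows.append(f"| {record['location']} | {record['name']} | {verdict} | {cell} |")
    return rows
```

Failing loudly with a `KeyError` on a missing field is deliberate in this sketch, since the definition of done requires every verdict to carry its justification.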

Phase 3: Apply deletions (Applier)

Delete each delete-verdict test:
- if a whole test file becomes empty, delete the file
- if a function within a larger file is deleted, remove the function and any imports/fixtures that become unused as a result
Run the project's test command (from COMMANDS.md). Record the actual command and the observed result in the report.
If unrelated tests break, stop and surface the failure — do not "fix forward" by re-adding deleted tests or weakening assertions.
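
The verification step can be sketched in Python as below. The helper name and the returned keys are hypothetical; the argument list stands in for whatever command COMMANDS.md actually defines.

```python
import shlex
import subprocess


def run_test_command(argv, cwd=None):
    """Run the project's test command and capture what the report needs."""
    completed = subprocess.run(argv, cwd=cwd, capture_output=True, text=True)
    combined = (completed.stdout + completed.stderr).splitlines()
    return {
        "command": shlex.join(argv),          # the exact command, for ## Test Run
        "exit_status": completed.returncode,  # 0 conventionally means success
        "output_tail": combined[-20:],        # last lines, to show any breakage
    }
```

For a Go project whose COMMANDS.md names `go test ./...`, the call would be `run_test_command(["go", "test", "./..."])`. A non-zero `exit_status` is the signal to stop and surface the failure rather than fix forward.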

Phase 4: Survival check (Applier)

Compute survival = keep_count / total_added_tests.
If survival > 0.90, flag the run as suspect under ## Survival Check and note that the reviewer may be rubber-stamping. Recommend re-running with stricter rubric framing or splitting chunks more aggressively.
Otherwise, record the survival ratio for the audit trail.
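
The check is small enough to write out directly. This Python sketch uses a hypothetical function name and return shape; the 0.90 threshold and the strict comparison come from the steps above.

```python
def survival_check(keep_count, total_added_tests, threshold=0.90):
    """Compute the survival ratio and whether the run should be flagged as suspect."""
    if total_added_tests <= 0:
        # Zero added tests is a human checkpoint, not something to divide by.
        raise ValueError("no added tests in scope")
    survival = keep_count / total_added_tests
    return {
        "kept": keep_count,
        "total": total_added_tests,
        "ratio": survival,
        "suspect": survival > threshold,  # strictly greater than 0.90
    }
```

Because the comparison is strict, keeping 9 of 10 tests (ratio 0.90) is not flagged, while keeping 19 of 20 (ratio 0.95) is.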

Phase 5: Report surviving coverage gaps (Applier)

For every deleted test, name the production behavior it nominally targeted (from the reviewer's deletion reason and the test's original assertion). List these under ## Surviving Coverage Gaps so a follow-up boost-coverage run can address them deliberately. Do not fabricate replacement assertions in this run.

Required report structure

Write .agent-layer/tmp/prune-new-tests.<run-id>.report.md with:

# Prune New Tests Summary
## Scope
## Added Tests
## Verdicts
- table: | Location | Name | Verdict | Justification |
## Deletions Applied
- one bullet per file or function actually removed
## Test Run
- the exact command, observed exit status, and any unexpected breakage
## Survival Check
- kept / total, ratio, and the suspect flag if > 0.90
## Surviving Coverage Gaps
- behaviors that lost coverage and warrant a boost-coverage follow-up
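
A quick self-check that a written report contains every required heading might look like this Python sketch. The function name is hypothetical, and the check is purely textual: it confirms the headings exist, not that the sections are filled in correctly.

```python
REQUIRED_HEADINGS = [
    "# Prune New Tests Summary",
    "## Scope",
    "## Added Tests",
    "## Verdicts",
    "## Deletions Applied",
    "## Test Run",
    "## Survival Check",
    "## Surviving Coverage Gaps",
]


def missing_headings(report_text):
    """Return the required headings that do not appear as a line in the report."""
    present = {line.strip() for line in report_text.splitlines()}
    return [heading for heading in REQUIRED_HEADINGS if heading not in present]
```

An empty return value means the skeleton is complete; anything else lists the sections still to be written before the run can be considered done.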

Guardrails

Do not delete pre-existing tests, even when they look low-value. That is audit-tests's domain and requires explicit opt-in.
Do not preserve a test on the strength of the implementer's narrative. The reviewer never saw that narrative, and the orchestrator must not reintroduce it.
Do not "improve" surviving tests during this skill's run. Improvements belong to a separate pass.
Do not collapse multiple deleted tests into a single replacement test. That re-introduces agent-authored speculative coverage.
Do not skip Phase 3 verification. A green test command is part of the contract.
Do not silently widen scope to modified or pre-existing tests; surface them as out-of-scope observations instead.

Definition of done

The report exists at .agent-layer/tmp/prune-new-tests.<run-id>.report.md with every required section (Scope, Added Tests, Verdicts, Deletions Applied, Test Run, Survival Check, Surviving Coverage Gaps).
Every added test in scope appears in the Verdicts table with one verdict and the required justification field populated.
Every delete verdict is reflected by a matching entry under Deletions Applied and the corresponding test/function is gone from the working tree.
Test Run records the exact command and its observed result; deletions did not introduce unrelated test failures.
The survival ratio is recorded; if > 0.90, the suspect flag is set and the recommended next step is named.

Final handoff

After writing the report:

Echo the report path.
State the total added tests and the kept/deleted counts.
Name the survival ratio and whether the suspect flag fired.
If Surviving Coverage Gaps is non-empty, recommend running boost-coverage against the listed behaviors.

