Run any Skill in Manus with one click

$pwd:

test-refine

Name: Test Refine
Author: Roasbeef

// Refines an existing Go test suite — removes trivial/duplicate tests, strengthens weak assertions, reshapes tests around invariants, and closes branch-coverage gaps. Uses code-guided coverage and (when available) gremlins mutation-testing survivor data rather than relying on line coverage alone. Use when test quality is uneven, after a test-generation pass, before opening a PR, or as a quality gate on critical paths (consensus, channel state, payment flows). Triggers: "refine these tests", "tests are bloated", "tighten assertions", "remove trivial tests", "audit test quality", "/test-refine".

Run Skill in Manus

$ git log --oneline --stat

stars:19

forks:3

updated:May 20, 2026 at 00:30

File Explorer

39 files

SKILL.md

readonly

related-skills.json

same repository

lnget.md

from "Roasbeef/claude-files"

Use lnget to fetch resources from L402-protected URLs that require Lightning payments. Covers basic fetching, payment limits (max cost, max routing fee), token cache management, and Lightning backend status. Use when an HTTP request returns 402 Payment Required and a Lightning micropayment is needed, or when downloading files behind a Lightning paywall.

2026-05-2019

mutation-testing.md

from "Roasbeef/claude-files"

Validates Go test suite quality through mutation testing using go-gremlins/gremlins. Mutates production code, runs the test suite against each mutant, and reports which mutants the tests fail to kill — exposing weak assertions that line coverage cannot detect. Use when evaluating test effectiveness, validating newly written tests, or improving test quality for mission-critical code (consensus, channel state, payment flows, crypto). Triggers: "mutation test", "are these tests strong", "validate test quality", "/mutation-testing".

2026-05-2019

go-debug.md

from "Roasbeef/claude-files"

Interactively debug Go programs in a single context using Delve (dlv) driven through tmux. Use when a bug requires runtime inspection — stepping through code, examining variables, walking goroutines, attaching to a live process, or debugging a hanging integration test — rather than just reading the source. Triggers include "step through this", "set a breakpoint", "attach to the running server", "why is this goroutine stuck", "debug this failing test".

2026-05-2019

substrate.md

from "Roasbeef/claude-files"

This skill provides agent mail management via the Subtrate command center. Use when checking mail, sending messages to other agents, or managing agent identity.

2026-03-3019

lnd.md

from "Roasbeef/claude-files"

Run and interact with lnd Lightning Network daemon in Docker. Use for Lightning development, testing payment channels on regtest, managing lnd containers, and calling lnd RPC endpoints (getinfo, connect, open/close channels, pay/receive). Supports bitcoind, btcd, and neutrino backends.

2026-03-3019

agent-cli.md

from "Roasbeef/claude-files"

Design and review CLIs for AI agent consumption. Covers machine-readable output, input hardening against hallucinations, schema introspection, context window discipline, dry-run safety rails, and skill file packaging. Use when building new CLIs, adding agent support to existing CLIs, reviewing CLI designs for agent compatibility, or wrapping APIs as CLI tools. Triggers: agent CLI, CLI for agents, machine-readable CLI, agent-first CLI, CLI agent DX.

2026-03-3019

package.json

"author": "Roasbeef"

"repository": "Roasbeef/claude-files"

View GitHub Repository View Creator Repositories

$ install --global

$ download --local

Run Skill in Manus

$ useful --forSOC

Software Quality Assurance Analysts and TestersComputer and Mathematical Occupations15-1253L4

name

test-refine

description

Refines an existing Go test suite — removes trivial/duplicate tests, strengthens weak assertions, reshapes tests around invariants, and closes branch-coverage gaps. Uses code-guided coverage and (when available) gremlins mutation-testing survivor data rather than relying on line coverage alone. Use when test quality is uneven, after a test-generation pass, before opening a PR, or as a quality gate on critical paths (consensus, channel state, payment flows). Triggers: "refine these tests", "tests are bloated", "tighten assertions", "remove trivial tests", "audit test quality", "/test-refine".

Test Refine

A test suite can be voluminous and yet weak: tests that overlap, assert nothing meaningful, run code without checking outputs, or miss the branches that actually matter. Line coverage doesn't catch this — a 100% line-covered suite can have zero real assertions. This skill operates on an existing test suite and refines it.

This is orthogonal to test generation skills (test-forge, property-based-testing):

test-forge generates new tests.
property-based-testing designs PBT for new or existing functions.
mutation-testing validates whether existing tests catch behavior changes.
test-refine (this skill) consumes the above signals and changes the test suite — strengthening assertions, removing dead weight, reshaping for invariants, closing branch-coverage gaps.

When to Use

After a test-forge pass — sharpen the generated tests.
Before opening a PR — clean up the diff so reviewers see strong tests.
Periodically on critical packages — guard against test-quality drift.
Following a mutation-testing run with survivors — convert each LIVED mutant into a tightened test.

Two-Phase Workflow

The skill runs read-only triage first, produces a markdown report, then applies fixes only after the user reviews and approves.

Phase A — Triage (read-only)

# Default: package in cwd. JSON intermediates land in /tmp; markdown in
# .reviews/test-refinement/<date>-<scope>.md.
~/.claude/skills/test-refine/scripts/triage.sh

# Pin scope to a single file
~/.claude/skills/test-refine/scripts/triage.sh --scope file --target ./internal/wallet/wallet_test.go

# Diff-scoped (changed test files in current branch vs auto-detected
# default branch — origin/HEAD, then main, then master).
~/.claude/skills/test-refine/scripts/triage.sh --scope diff

# Whole repo (slow)
~/.claude/skills/test-refine/scripts/triage.sh --scope repo

# With mutation testing for the strongest signal. Diff scope fans out
# across the unique set of affected packages (capped by
# MUTATION_FANOUT_CAP=5 by default; override via env var). Repo scope
# hard-fails — narrow before re-trying.
~/.claude/skills/test-refine/scripts/triage.sh --scope package --use-mutations
~/.claude/skills/test-refine/scripts/triage.sh --scope diff    --use-mutations

The triage script:

Resolves scope.
Runs go test -cover -covermode=atomic to capture per-function statement+branch coverage.
Optionally calls ~/.claude/skills/mutation-testing/scripts/unleash.sh for LIVED mutant data.
Runs the AST analyzers (detect-smells.go, detect-duplicates.go, domain-checks.go).
Scores findings via score.go (composite priority — see below).
Renders the markdown report.

Phase B — Batch Fix (after user approves report)

The user reviews the report. Each finding has a checkbox. The user checks boxes for the fixes they approve, then:

~/.claude/skills/test-refine/scripts/apply-fixes.sh \
    --report .reviews/test-refinement/2026-05-06-wallet.md

The script applies only the checked items:

Strengthen assertions in place.
Add missing branch tests.
Reshape into invariants (rewrite using rapid for flagged cases).
Remove tests only if explicitly checked.

After fixes, it re-runs go test ./... -race -count=1, appends an "After" metrics section to the same report, and surfaces the diff.

Safety rule: a test is never removed unless its checkbox in the report is explicitly checked. This honors the global rule "never remove/skip tests without asking" from ~/.claude/CLAUDE.md.

Composite Priority Score

When a triage produces dozens of findings, ranking matters. The composite score combines three signals:

priority = w_risk × risk_score(file_path)
         + w_severity × severity(smell_id)
         + w_gap × branch_gap(function)

Component	Range	Source
`risk_score`	0.2–1.0	File-path heuristic: `consensus
`severity`	0.3 (L) / 0.6 (M) / 1.0 (H)	Smell catalog severity (`references/smell-catalog.md`)
`branch_gap`	0.0–1.0	Uncovered branches in the target function / total branches

Default weights: 0.5 / 0.3 / 0.2. Override via --weights risk=0.6,severity=0.3,gap=0.1.

Smell Catalog (summary)

Full catalog with Go examples in references/smell-catalog.md.

ID	Smell	Severity
S01	No assertions at all (recognises testify, helper-named functions, and `t.Run` subtests)	High
S02	Tautological assertion (`x == x`)	High
S03	Getter/setter trivial test	Medium
S04	Asserts no panic only	High
S05	Unchecked error from SUT	High
S06	Sensitive equality on rendered text — `fmt.Sprint*` and non-canonical `.String()`. Skipped for canonical types (chainhash.Hash, UUID, big.Int, time.Time, OutPoint, etc.)	Medium
S07	Conditional/skipped assertion	Medium
S08	Duplicate test body (semantic)	Medium
S09	Assertion roulette — only fires when ≥4 bare asserts share the same call+RHS shape and the test has ≥8 bare asserts; confidence 0.4 (advisory)	Low
S10	Expect-the-expected (`want` derived from `got`)	High
S11	Side-effect not asserted	Medium
S12	Mutation-survivor zone (gremlins data required)	High

Domain-Specific Checks

For systems / distributed / Bitcoin / networking code, "good test" goes beyond standard assertion strength. The skill also checks four dimensions detailed in references/domain-checks.md:

Concurrency: SUT uses goroutines/channels/mutexes? Test must run with -race-friendly patterns and exercise concurrent calls. Sequential test of concurrent code is flagged.
Failure-mode: SUT returns error or accepts context.Context? Tests must cover error path, cancellation, timeout. For network/disk SUTs, missing fault-injection tests are flagged. Inspired by Jepsen nemesis patterns.
Property/invariant: Detects PBT candidates — serialization roundtrips, parser/encoder pairs, normalizers, comparators. Proposes conversion to rapid PBT (see references/reshape-to-invariants.md).
Determinism: time.Now(), rand.* without seeds, os.Getenv, goroutine-ordering assumptions in tests are flagged. Suggests injectable clocks/RNGs (DST-style).

What Counts as Trivial (and Removable)

Findings flagged for removal candidacy (still always require user approval):

No assertions (S01).
Getter/setter-only tests (S03) — testing language semantics, not behavior.
Tautological assertions (S02).
Zero-mutation-killers (S12 special case): when gremlins data shows a test exists in the package whose removal causes no change in test_efficacy. The test catches no mutants the rest of the suite doesn't already catch.

Naming Convention

Test function names must not contain underscores. Use TestEncodeTxRoundtrip, not TestEncodeTx_Roundtrip. For variants of the same logical test, use t.Run("subtest name", func(t *testing.T) {...}). The reshape pass enforces this — any rewrite the skill proposes uses subtests, not underscored function names. See references/strengthening-patterns.md.

Aggressive Reshape Posture

The user opted into "aggressive — reshape tests for invariants". Examples:

A test that asserts Marshal(x) == fixed_bytes for a single value → reshape into a rapid roundtrip property: Unmarshal(Marshal(x)) == x for arbitrary x.
A test that asserts a single state transition → reshape into a rapid.StateMachine covering all transitions.
Multiple tests covering near-identical inputs → consolidate into a table-driven test with the union of cases.

Every reshape proposal in the report shows the original test verbatim alongside the proposed rewrite. If the user rejects the reshape, no change is made.

Critical Pitfall Guards

Never auto-delete. The report's checkbox grammar (- [ ] Remove TestX) is the only way a test is removed. Bulk approval is not a thing.
Reshape passing tests of mission-critical code with care. Phase B optionally re-runs gremlins unleash on touched packages to verify reshape didn't weaken mutation score. Pass --verify-mutations.
PBT conversion for side-effect-bearing code uses state-machine PBT. Pure-function PBT is wrong here. Such cases are flagged separately and link to property-based-testing/references/strategies.md.
Equivalent mutants are not test gaps. When gremlins reports LIVED, the AST is checked for equivalence patterns (mutated value immediately overwritten, mutation in unreachable code, associative no-op) before flagging as S12.

Output Artifacts

The committed markdown report lives at:

.reviews/test-refinement/<YYYY-MM-DD>-<scope-slug>.md

Contents:

Header banners: warn when the data layer is incomplete (coverage = 0%, gremlins skipped) or when priority is severity-only because branch-gap is unavailable. The user sees this before trusting the ranking.
Before metrics: test count, assertion count, statement+branch coverage %, test_efficacy if mutation data available.
Top-N findings table: rank, priority, confidence, fix-kind, file:line, smell, severity, test → SUT. The full message is not in the table — it's in the per-finding detail block, so nothing gets truncated.
Per-finding detail block: smell, suggestion, an Apply fix checkbox, a False positive — won't fix checkbox (so disagreement is recordable across re-runs), and a collapsible <details> block with the test function body so the reviewer can sanity-check without context-switching to the source file.
Domain-check section: concurrency / failure-mode / PBT / determinism findings. Runs for all scopes (package, file, diff, repo) — the diff-scope variant fans out across the unique set of affected package directories.
Removal candidates list: each with one-line justification and explicit checkbox.
PBT conversion candidates: specific functions + property sketch using rapid.
(After fixes) "After" metrics appended to the same file.

Integration

Reuses property-based-testing skill's references for PBT conversion templates.
Consumes mutation-testing skill output (gremlins JSON) when --use-mutations is set.
Invocable from /code-review and /pre-pr-review flows as a sub-step.

test-refine

Test Refine

When to Use

Two-Phase Workflow

Phase A — Triage (read-only)

Phase B — Batch Fix (after user approves report)

Composite Priority Score

Smell Catalog (summary)

Domain-Specific Checks

What Counts as Trivial (and Removable)

Naming Convention

Aggressive Reshape Posture

Critical Pitfall Guards

Output Artifacts

Integration

Further Reading

Test Refine

When to Use

Two-Phase Workflow

Phase A — Triage (read-only)

Phase B — Batch Fix (after user approves report)

Composite Priority Score

Smell Catalog (summary)

Domain-Specific Checks

What Counts as Trivial (and Removable)

Naming Convention

Aggressive Reshape Posture

Critical Pitfall Guards

Output Artifacts

Integration

Further Reading

test-refine

More from this repository

More from this repository

Test Refine

When to Use

Two-Phase Workflow

Phase A — Triage (read-only)

Phase B — Batch Fix (after user approves report)

Composite Priority Score

Smell Catalog (summary)

Domain-Specific Checks

What Counts as Trivial (and Removable)

Naming Convention

Aggressive Reshape Posture

Critical Pitfall Guards

Output Artifacts

Integration

Further Reading

Test Refine

When to Use

Two-Phase Workflow

Phase A — Triage (read-only)

Phase B — Batch Fix (after user approves report)

Composite Priority Score

Smell Catalog (summary)

Domain-Specific Checks

What Counts as Trivial (and Removable)

Naming Convention

Aggressive Reshape Posture

Critical Pitfall Guards

Output Artifacts

Integration

Further Reading