
test-red-team

Name: Test Red Team
Author: adam-s

// Launch an adversarial reviewer (Opus) to find bugs IN THE TEST SUITE — tautological assertions, weak oracles, shim/mock lies, stale fixtures, coverage gaps, clock-mock pitfalls. Use when the user says "red team the tests", "audit the tests", "are our tests any good", "find weak tests", or after adding a large batch of tests that needs scrutiny.


stars: 7

forks: 2

updated: 24 April 2026 at 00:51

SKILL.md


name	test-red-team
description	Launch an adversarial reviewer (Opus) to find bugs IN THE TEST SUITE — tautological assertions, weak oracles, shim/mock lies, stale fixtures, coverage gaps, clock-mock pitfalls. Use when the user says "red team the tests", "audit the tests", "are our tests any good", "find weak tests", or after adding a large batch of tests that needs scrutiny.

Test-suite red-team review

Launches an Opus general-purpose agent to hunt bugs in the tests themselves, not the production code. Tests can create false confidence in three ways — tautological assertions, mocks that lie about real-API behavior, and coverage that looks broad but skips the hard paths. This skill finds all three.

Complementary to red-team-review (which hunts production-code bugs). Run both on major change points.

When to invoke

- User says: "red team the tests", "audit the tests", "find weak tests", "are our tests rigorous", "tests that only check mocks", "test coverage gaps"
- After adding a large batch of tests (a fresh pass often finds tautologies before they rot)
- After a red-team-review finds bugs the existing tests did not catch: ask "why didn't our tests find this?" and run this skill
- Before a release, if the suite hasn't been scrutinized in a while

How to invoke

Use the Agent tool with:

- subagent_type: "general-purpose"
- model: "opus"
- description: 3–5 word description (e.g. "Red-team test suite")
- prompt: follow the template below
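
The call can be sketched as follows, assuming the Agent tool accepts a plain object with the four fields listed above. `AgentCall`, `unfilledPlaceholders`, and `buildAgentCall` are hypothetical helpers for illustration, not part of this skill. The placeholder regex is a heuristic: it treats any bracketed span that starts with a letter as an unfilled template section, so literals such as `[]` or `['Date']` are not flagged.

```typescript
// Hypothetical shape of the Agent tool input; field names mirror the list above.
interface AgentCall {
  subagent_type: "general-purpose";
  model: "opus";
  description: string;
  prompt: string;
}

// Heuristic: a bracketed span starting with a letter, possibly spanning
// several lines, is assumed to be a template section that was never filled.
function unfilledPlaceholders(prompt: string): string[] {
  return prompt.match(/\[[A-Za-z][^\]]*\]/g) ?? [];
}

// Builds the call, refusing prompts that still contain template sections
// and descriptions outside the 3 to 5 word range.
function buildAgentCall(description: string, prompt: string): AgentCall {
  const words = description.trim().split(/\s+/).length;
  if (words < 3 || words > 5) {
    throw new Error(`Description should be 3 to 5 words, got ${words}`);
  }
  const leftover = unfilledPlaceholders(prompt);
  if (leftover.length > 0) {
    throw new Error(`Unfilled template sections: ${leftover.join(", ")}`);
  }
  return { subagent_type: "general-purpose", model: "opus", description, prompt };
}
```

Calling `buildAgentCall("Red-team test suite", prompt)` with an untouched template throws, which guards against sending the template as-is.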

Prompt template

Fill the bracketed sections with current project state. Do NOT send as-is.

You are an adversarial code reviewer performing a red-team bug hunt on
the TEST SUITE of [PROJECT NAME / KIND]. Your target is the correctness
and rigor of the tests themselves — not the production code. You find
tautological assertions, weak oracles, shim/mock lies, stale fixtures,
and coverage gaps. Rank findings CRITICAL / HIGH / MEDIUM / LOW.

## Setup

[bash block: cp project to /tmp/<id>, pnpm install, capture
typecheck/test/harness output to logs for grep]

## Target surface

[tree of tests/ directory, noting unit vs harness vs shim vs fixture]

## What's worth hunting — the "how tests lie" checklist

1. Tautological assertions. For each test file, pick 3–5 assertions and
   ask: "if the production code were replaced with return 0 / return []
   / a no-op, would this test fail?" If no, the test is vacuous.

2. Weakly-constrained oracles. Golden-file snapshots — what fields are
   in, what are out? Would a wrong value in a captured field be caught,
   or is the check just `assert.ok(snapshot)`?

3. Shim lies. Reimplemented Chrome/HTTP/external APIs diverge from the
   real thing. Check filter semantics (strict vs. non-strict >), return
   ordering, event-firing async vs sync, pagination stop conditions.

4. Fixture drift. Tapes/fixtures recorded once against live systems —
   does the current code path still produce requests captured in the
   tape, or has the code drifted?

5. Clock-mocking pitfalls. @sinonjs/fake-timers with toFake: ['Date']
   leaves setTimeout/setInterval/performance.now real. If production
   correctness depends on those, clock tests give false confidence.

6. Shared-state leakage. Under Node's --test runner, tests in the same
   file share module state (separate files run in separate processes
   by default). Globals mutated by one test can mask bugs in the next.
   Inspect cleanup in try/finally.

7. Coverage gaps. Production code paths not exercised by any test.
   Error branches, rate-limit retries, edge-case config values.

8. Assertion granularity. assert.deepEqual on ordered data locks order;
   length-only assertions do not. For ordered outputs (queues, sorted
   lists), was ordering actually asserted?

9. Mock vs. real-API divergence for specific new code paths (list them).

10. Explicit regression tests for prior red-team findings — would they
    fail against an implementation that returns the right answer by
    accident?

## Grep hints

[specific greps that surface weak assertions, length-only checks,
stale tapes, etc.]

## Output

300–600 words. Severity-grouped. For each finding:
- file:line reference
- one-line description of why the test is weak
- concrete trigger: "an implementation that does X would still pass this"

No fixes — diagnose only. "No CRITICAL issues found" is valuable signal;
say so explicitly.
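
To make checklist items 1 and 8 concrete, here is a minimal sketch of the "replace production code with a no-op" question. Everything in it is hypothetical: `sortByScore` stands in for production code, `noopSort` for the degenerate replacement, and the two check functions for a weak and a stronger test. It is an illustration of the idea, not code from the project.

```typescript
type Item = { id: string; score: number };

// Hypothetical production function: highest score first.
function sortByScore(items: Item[]): Item[] {
  return [...items].sort((a, b) => b.score - a.score);
}

// Stand-in for "production code replaced with a no-op".
function noopSort(items: Item[]): Item[] {
  return items;
}

const input: Item[] = [
  { id: "a", score: 1 },
  { id: "b", score: 3 },
  { id: "c", score: 2 },
];

// Weak check: only the length is compared, so ordering is never asserted.
function weakCheck(impl: (items: Item[]) => Item[]): boolean {
  return impl(input).length === input.length;
}

// Stronger check: the exact order of ids is compared.
function strongCheck(impl: (items: Item[]) => Item[]): boolean {
  const ids = impl(input).map((item) => item.id);
  return JSON.stringify(ids) === JSON.stringify(["b", "c", "a"]);
}

console.log(weakCheck(sortByScore), weakCheck(noopSort));     // true true
console.log(strongCheck(sortByScore), strongCheck(noopSort)); // true false
```

The weak check passes against the no-op, which is the signature of a vacuous test; the stronger check is the only one that distinguishes the two implementations.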

Scripts and assets

This skill currently uses no scripts. If future versions need coverage-gap helpers (e.g., parsing c8/nyc output), they go in this folder.
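
As an illustration of what such a helper might look like, the sketch below lists files whose branch coverage falls under a threshold. It assumes the input follows the shape written by the Istanbul `json-summary` reporter (a `total` entry plus one entry per file, each with `lines`, `statements`, `functions`, and `branches` metrics); that shape, the function name `lowBranchCoverage`, and the sample data are assumptions, not something this skill ships.

```typescript
// Assumed metric shape from a json-summary style coverage report.
type Metric = { total: number; covered: number; skipped: number; pct: number | string };
type FileSummary = { lines: Metric; statements: Metric; functions: Metric; branches: Metric };
type CoverageSummary = Record<string, FileSummary>;

// Returns "file: pct% branches" for files under minPct, lowest coverage first.
// Entries whose pct is not a number are skipped rather than guessed at.
function lowBranchCoverage(summary: CoverageSummary, minPct: number): string[] {
  const flagged: { file: string; pct: number }[] = [];
  for (const file of Object.keys(summary)) {
    if (file === "total") continue; // aggregate row, not a file
    const pct = summary[file].branches.pct;
    if (typeof pct === "number" && pct < minPct) flagged.push({ file, pct });
  }
  flagged.sort((a, b) => a.pct - b.pct);
  return flagged.map((f) => `${f.file}: ${f.pct}% branches`);
}

// Hypothetical sample data for illustration only.
const full: Metric = { total: 10, covered: 10, skipped: 0, pct: 100 };
const sample: CoverageSummary = {
  total: { lines: full, statements: full, functions: full, branches: { total: 20, covered: 13, skipped: 0, pct: 65 } },
  "src/retry.ts": { lines: full, statements: full, functions: full, branches: { total: 10, covered: 4, skipped: 0, pct: 40 } },
  "src/parse.ts": { lines: full, statements: full, functions: full, branches: { total: 10, covered: 9, skipped: 0, pct: 90 } },
};

console.log(lowBranchCoverage(sample, 80)); // [ 'src/retry.ts: 40% branches' ]
```

Branch coverage is used here because error branches and retry paths are the gaps the checklist calls out; line coverage can read 100% while those branches stay untested.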

Cleanup discipline

Same rules as red-team-review:

- Agent is read-only by design; it should not create files.
- Temporary prompt files → delete after the agent returns.
- Scratch analysis dumps → delete unless the user asked to keep them.
- Agent's response belongs inline in the conversation, NOT written to a .md file in the repo unless explicitly requested.
- Final git status check before returning control.
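
The final status check could be automated along these lines. This is a sketch under stated assumptions: it takes the text produced by `git status --porcelain` (two status characters, a space, then the path) and reports untracked entries, which are the ones a read-only agent should never leave behind. The function name `untrackedPaths` and the sample output are hypothetical.

```typescript
// Extracts untracked paths ("??" status) from `git status --porcelain` text.
// Modified or staged entries are ignored: they may predate the agent run.
function untrackedPaths(porcelain: string): string[] {
  return porcelain
    .split("\n")
    .filter((line) => line.length > 3)
    .map((line) => ({ status: line.slice(0, 2), path: line.slice(3) }))
    .filter((entry) => entry.status === "??")
    .map((entry) => entry.path);
}

// Hypothetical porcelain output for illustration only.
const sampleStatus = " M src/app.ts\n?? prompt-tmp.md\n?? scratch-analysis.txt\n";

console.log(untrackedPaths(sampleStatus)); // [ 'prompt-tmp.md', 'scratch-analysis.txt' ]
```

An empty result means the run left no stray prompt files or analysis dumps; anything listed should be deleted or confirmed with the user before returning control.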

related-skills.json

Same repository

mutation-red-team.md

from "adam-s/HNswered"

Launch an adversarial mutation-testing agent (Opus) that injects targeted regressions into production code, runs the test suite, and reports which mutations SURVIVED — surviving mutations are direct evidence of test-coverage gaps. Use when the user says "trickster", "mutation test", "break the code", "can the tests catch regressions", "grade the tests", or after adding production code whose test coverage you're not sure about.

2026-04-24 · 7 stars

audit.md

from "adam-s/HNswered"

Run a bounded live audit of the extension against multiple HN handles in parallel — Playwright + real Chromium + real HN, time-series snapshots, deterministic divergence analysis. Use when the user asks to "audit", "sit and observe", "live test", or "check whether the extension is correctly tracking real HN activity across multiple users". Bounded by wall time AND request budget — never unbounded.

2026-04-23 · 7 stars

design-critique.md

from "adam-s/HNswered"

Launch a Jony Ive–style product design adversary (Opus) to critique UI/UX *within* the project's existing aesthetic constraints. Finds what is graceless, disproportionate, clumsy — in the designer's own voice. Use when the user wants a "design critique", "Jony Ive review", "UI review", or says the interface "needs improvement".

2026-04-23 · 7 stars

red-team-review.md

from "adam-s/HNswered"

Launch an adversarial bug-hunting code reviewer (Opus) to find real bugs, correctness issues, concurrency hazards, and server/resource abuse risks — not style nits. Use when the user asks for a "red team", "bug hunt", "adversarial review", or at checkpoints during long coding tasks.

2026-04-23 · 7 stars

package.json

"author": "adam-s"

"repository": "adam-s/HNswered"



