audit-adversarial

Stars: 17,720

Forks: 1,694

Updated: April 6, 2026 08:03

Hostile full-codebase review by parallel adversarial agents with no design context — finds bugs that domain familiarity masks

Source

ben-manes

ben-manes/caffeine

Target

$ARGUMENTS

If no argument is given, review all source files in caffeine/src/main/java/.

Step 1: Inventory source files

List all Java source files in scope. Group into 4-6 subsystems for parallel review.
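The inventory and grouping could be sketched as follows. This is a minimal sketch, not part of the skill: the function names `inventory` and `group_sources` are hypothetical, and it assumes a subsystem can be approximated by packing whole package directories into bins of roughly equal line count.

```python
from collections import defaultdict
from pathlib import Path


def inventory(root: str) -> dict[str, int]:
    """Map each .java file under root (relative POSIX path) to its line count."""
    base = Path(root)
    return {
        path.relative_to(base).as_posix(): len(
            path.read_text(encoding="utf-8", errors="replace").splitlines()
        )
        for path in sorted(base.rglob("*.java"))
    }


def group_sources(files: dict[str, int], groups: int = 4) -> list[list[str]]:
    """Pack whole packages (directories) into `groups` bins, largest first."""
    if not 4 <= groups <= 6:
        raise ValueError("the skill asks for 4-6 subsystems")
    packages: dict[str, list[str]] = defaultdict(list)
    for name in files:
        packages[name.rpartition("/")[0]].append(name)

    def size(pkg: str) -> int:
        return sum(files[f] for f in packages[pkg])

    bins: list[list[str]] = [[] for _ in range(groups)]
    totals = [0] * groups
    # Greedy balancing (an assumption): largest package goes to the smallest bin.
    for pkg in sorted(packages, key=lambda p: (-size(p), p)):
        i = totals.index(min(totals))
        bins[i].extend(sorted(packages[pkg]))
        totals[i] += size(pkg)
    return [b for b in bins if b]
```

A call such as `group_sources(inventory("caffeine/src/main/java"), groups=5)` would yield one file list per reviewer. If most of the code sits in a single package, directory-based packing will be lopsided, and a manual split by responsibility is likely the better choice.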

Step 2: Launch parallel hostile reviewers

Spawn 4-6 subagents simultaneously. Each reviews one subsystem. Critically: DO NOT give them design-decisions.md, synchronization.md, or any .claude/docs. They should review from first principles only.

Each agent gets this prompt (adapted to their subsystem):

You are a senior Java concurrency expert performing a hostile code review.
A competitor built this library and it's gaining adoption over your work.
You want to find every flaw to demonstrate it's not production-worthy.

Your reputation is on the line. Be ruthless but precise — cite methods,
trace code paths, construct failing scenarios.

Rules:
- Dig deep. Zero findings are allowed ONLY with a coverage proof listing
  files inspected, methods traced, interleavings attempted, and attack
  surfaces checked. If coverage is shallow, keep looking.
- Every finding must include: exact location, concrete evidence, a
  falsifiable scenario, and confidence (high/medium).
- Only report issues provable with code evidence
- Construct concrete interleavings, inputs, or scenarios
- Do not critique style — focus on correctness and robustness
- Do not accept "by design" — if the design has consequences, document them
- Read the actual source code before making claims
- Look for what's MISSING, not just what's wrong

Attack surfaces:
1. Memory model violations — insufficient access modes, missing happens-before
2. State corruption interleavings — weight divergence, deque corruption, stuck drain status
3. Resource leaks under failure — OOME/SOE leaving unrecoverable state
4. Silent data loss — values dropped without notification
5. Specification violations — ConcurrentMap contract, Javadoc promises
6. Denial of service — O(n) operations on O(1) paths
7. Sentinel value collisions — can valid input equal an internal sentinel?
8. Validation gaps — inputs accepted at parse time but rejected later
9. API surprises — public methods returning nonsensical values
10. Notification asymmetries — some paths notify, equivalent paths don't

Rate each finding: critical/high/medium/low
Format: numbered list with file:method, description, evidence

Step 3: Validate completeness

If any agent returns zero findings, require a coverage summary from that agent (scope inspected, attack surfaces checked, interleavings attempted). Re-launch with a more specific prompt only if coverage is shallow. Zero findings with thorough coverage proof is acceptable.
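A mechanical pre-check can catch reports that are structurally incomplete before anyone judges their depth. The sketch below assumes each reviewer report has been parsed into a dictionary; the field names and the function `report_problems` are hypothetical, chosen to mirror the rules in the reviewer prompt.

```python
REQUIRED_FINDING_FIELDS = ("location", "evidence", "scenario", "confidence")
REQUIRED_COVERAGE_FIELDS = (
    "files_inspected",
    "methods_traced",
    "interleavings_attempted",
    "attack_surfaces_checked",
)


def report_problems(report: dict) -> list[str]:
    """Return reasons to send a report back; an empty list means it passes."""
    problems = []
    findings = report.get("findings", [])
    if not findings:
        # Zero findings are acceptable only with a complete coverage proof.
        coverage = report.get("coverage", {})
        for field in REQUIRED_COVERAGE_FIELDS:
            if not coverage.get(field):
                problems.append(f"zero findings but coverage proof lacks {field}")
    for i, finding in enumerate(findings, start=1):
        for field in REQUIRED_FINDING_FIELDS:
            if not finding.get(field):
                problems.append(f"finding {i} lacks {field}")
        confidence = finding.get("confidence")
        if confidence and confidence not in ("high", "medium"):
            problems.append(f"finding {i} has confidence outside high/medium")
    return problems
```

This only checks that the required pieces are present. Whether the coverage is shallow remains a judgment call for the orchestrator.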

Step 3.5: Evaluator challenge (per reviewer)

For each reviewer that returned findings OR a zero-findings coverage proof, spawn a separate evaluator subagent. The evaluator gets ONLY the reviewer's report — no source code, no design docs.

You are a hostile evaluator reviewing another auditor's report of a Java
cache library. Your job is to find what the auditor MISSED.

1. For each confirmed invariant, construct a 2-thread interleaving that
   would violate it. If you cannot, explain what prevents it.
2. For each zero-finding claim, identify the most likely bug category
   the auditor could have missed given their stated coverage.
3. For each finding, check whether the evidence is concrete or hand-wavy.
   Flag findings that assert a bug without a specific interleaving.

Output: prioritized list of challenges for the reviewer to address.

Have the original reviewer address each challenge by re-reading source code. Drop findings the reviewer cannot defend. Add new findings from challenges the reviewer confirms.

Step 4: Consolidate and deduplicate

Collect findings from all agents. Deduplicate (same issue found by multiple agents = higher confidence). Remove findings that are clearly wrong (misreading the code). Keep findings even if they might be "by design" — the point is to surface things domain familiarity masks.
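Deduplication might look like the sketch below. It assumes findings are dictionaries with hypothetical `reviewer`, `location`, `category`, and `confidence` keys, and that two findings are the same issue when location and category match after normalization. Raising the merged confidence to "high" is one interpretation of "higher confidence", not something the skill specifies.

```python
def deduplicate(findings: list[dict]) -> list[dict]:
    """Merge findings sharing a location and category; record who reported them."""
    merged: dict[tuple[str, str], dict] = {}
    for finding in findings:
        key = (
            finding["location"].strip().lower(),
            finding["category"].strip().lower(),
        )
        if key not in merged:
            merged[key] = {**finding, "reported_by": []}
        entry = merged[key]
        if finding["reviewer"] not in entry["reported_by"]:
            entry["reported_by"].append(finding["reviewer"])
    for entry in merged.values():
        entry.pop("reviewer", None)
        # Assumption: agreement between independent reviewers raises confidence.
        if len(entry["reported_by"]) > 1:
            entry["confidence"] = "high"
    return list(merged.values())
```

Exact-match keys will miss the same bug described at two different call sites, so a manual pass over near-duplicates is still needed.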

Confidence decay check: If any reviewer's findings are >60% medium-confidence, note this in the report — that reviewer's area may need a more targeted follow-up audit rather than more speculative findings.
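The 60% rule is simple arithmetic per reviewer. A minimal sketch, assuming findings carry hypothetical `reviewer` and `confidence` keys and that the check runs on each reviewer's own findings before deduplication:

```python
from collections import Counter


def needs_targeted_followup(findings: list[dict], threshold: float = 0.60) -> dict[str, float]:
    """Return reviewers whose share of medium-confidence findings exceeds threshold."""
    totals: Counter = Counter()
    mediums: Counter = Counter()
    for finding in findings:
        totals[finding["reviewer"]] += 1
        if finding["confidence"] == "medium":
            mediums[finding["reviewer"]] += 1
    shares = {reviewer: mediums[reviewer] / totals[reviewer] for reviewer in totals}
    # Strictly greater than the threshold, matching ">60%" in the rule.
    return {reviewer: share for reviewer, share in shares.items() if share > threshold}
```

For example, a reviewer with 2 medium findings out of 3 (about 67%) is flagged, while one with 3 out of 5 (exactly 60%) is not.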

Escalation: If any reviewer flagged issues they could not resolve statically (e.g., "depends on JDK internal behavior"), mark these as ESCALATED for dynamic testing (Fray, LinCheck, JCStress) rather than guessing.

Step 5: Adjudicate against design docs

NOW read .claude/docs/design-decisions.md and .claude/rules/design-decisions.md. For each finding, check: is this an intentional trade-off? Reclassify as:

- confirmed: not explained by design docs
- intentional: documented design decision, not a defect
- ambiguous: needs more evidence or maintainer input

Keep intentional findings in the report (labeled as such) but do not count them as bugs. The value is surfacing them for review, not asserting they're wrong.
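The bookkeeping for this step can be sketched as below. The `Verdict` enum and `split_for_report` are hypothetical names, and counting only confirmed findings as bugs is an assumption: the skill excludes intentional findings from the count but does not say how ambiguous ones are counted.

```python
from enum import Enum


class Verdict(Enum):
    CONFIRMED = "confirmed"      # not explained by design docs
    INTENTIONAL = "intentional"  # documented design decision, not a defect
    AMBIGUOUS = "ambiguous"      # needs more evidence or maintainer input


def split_for_report(findings: list[dict]) -> tuple[dict, int]:
    """Keep every finding grouped by verdict; return the groups and the bug count."""
    by_verdict = {verdict: [] for verdict in Verdict}
    for finding in findings:
        by_verdict[Verdict(finding["verdict"])].append(finding)
    # Assumption: only confirmed findings count as bugs.
    return by_verdict, len(by_verdict[Verdict.CONFIRMED])
```

Nothing is dropped here: intentional and ambiguous findings stay in the grouped result so the report can still list them under their labels.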

Step 6: Triage confirmed findings

Classify using .claude/docs/finding-taxonomy.md for severity and categories. Additionally tag each confirmed finding:

- bug: incorrect behavior, provably wrong
- api-issue: public API returns surprising/incorrect values
- validation-gap: input accepted when it shouldn't be
- robustness: works but fragile, could break with minor changes
- cosmetic: dead code, wasteful patterns, poor diagnostics

Step 7: Report

Write the full report to .claude/reports/audit-adversarial.md.

Format:

# Adversarial Review: Caffeine Source Code

[N] parallel auditors reviewed [M] source files (~K lines).
Findings consolidated, deduplicated, and triaged by severity.

## Likely Bugs
...

## API/Behavioral Issues
...

## Robustness/Validation Gaps
...

## Design/Maintenance Concerns
...

## Summary
[N] likely bugs, [M] API issues, [K] validation gaps, [J] concerns.
[N] evaluator challenges received across [M] reviewers.
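Filling in the template above could be automated along these lines. This is a sketch under stated assumptions: the mapping from triage tags to report sections is a guess (the skill does not define it), the finding keys `tag`, `severity`, `location`, and `description` are hypothetical, and the "None." line for empty sections is an addition.

```python
SEVERITY = ("critical", "high", "medium", "low")

# Assumed mapping from Step 6 tags to report sections.
SECTIONS = {
    "Likely Bugs": ("bug",),
    "API/Behavioral Issues": ("api-issue",),
    "Robustness/Validation Gaps": ("validation-gap", "robustness"),
    "Design/Maintenance Concerns": ("cosmetic",),
}


def render_report(findings, auditors, files, lines, challenges, reviewers) -> str:
    """Render confirmed findings into the report layout, most severe first."""
    out = [
        "# Adversarial Review: Caffeine Source Code",
        "",
        f"{auditors} parallel auditors reviewed {files} source files (~{lines} lines).",
        "Findings consolidated, deduplicated, and triaged by severity.",
    ]
    counts = []
    for title, tags in SECTIONS.items():
        matching = sorted(
            (f for f in findings if f["tag"] in tags),
            key=lambda f: SEVERITY.index(f["severity"]),
        )
        counts.append(len(matching))
        out += ["", f"## {title}"]
        out += [
            f"{i}. [{f['severity']}] {f['location']}: {f['description']}"
            for i, f in enumerate(matching, start=1)
        ] or ["None."]
    bugs, api, gaps, concerns = counts
    out += [
        "",
        "## Summary",
        f"{bugs} likely bugs, {api} API issues, {gaps} validation gaps, {concerns} concerns.",
        f"{challenges} evaluator challenges received across {reviewers} reviewers.",
    ]
    return "\n".join(out) + "\n"
```

The returned string would then be written to .claude/reports/audit-adversarial.md.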

Other Skills in this repository

audit-adaptivity

ben-manes/caffeine

Audit the adaptive window hill-climber and region-resize logic for implementation defects (not algorithm quality)

2026-06-17, 17.7k stars

audit-jcache-conformance

ben-manes/caffeine

JSR-107 (JCache) spec-conformance audit

2026-06-17, 17.7k stars

audit-state-machine

ben-manes/caffeine

Audit explicit state machines (drain status, node lifecycle, async-value lifecycle) for illegal or missed transitions

2026-06-17, 17.7k stars

audit-temporal-walk

ben-manes/caffeine

Heavyweight history-mining bug audit. Walks the caffeine module's git history chronologically (oldest to HEAD), maintains a forward-tracked issue database, and surfaces concerns introduced by past commits that were never resolved. Catches bugs that snapshot mining cannot — half-fixes invisible from current state, latent+trigger pairs across multi-commit interactions, and partial refactors. Slow (model/effort-dependent; ~24h on Opus + max effort) and rare-run (every several months or before a major release).

2026-06-17, 17.7k stars

audit-sibling-divergence

ben-manes/caffeine

Differential audit comparing matched code paths that should behave identically. Spawns one auditor per sibling pair (sync/async, bounded/unbounded, view consistency, bulk vs single, generated node variants, read fast vs slow, adapter conformance) and requires a concrete witness scenario where the two paths diverge observably.

2026-06-02, 17.7k stars

audit-contract-drift

ben-manes/caffeine

Find places where documented API contracts and the implementation diverge

2026-04-27, 17.7k stars

name	audit-adversarial
description	Hostile full-codebase review by parallel adversarial agents with no design context — finds bugs that domain familiarity masks
argument-hint	[subsystem or file to focus on, default: all source files]
context	fork
disable-model-invocation	true
allowed-tools	Read, Grep, Glob, Bash, Agent
