원클릭으로
audit-adversarial
Hostile full-codebase review by parallel adversarial agents with no design context — finds bugs that domain familiarity masks
Codex 또는 Claude로 설치 이 Prompt를 복사해 Codex, Claude 또는 다른 어시스턴트에 붙여 넣으면 Skill 페이지를 검토하고 설치를 진행할 수 있습니다.
메뉴
Hostile full-codebase review by parallel adversarial agents with no design context — finds bugs that domain familiarity masks
Codex 또는 Claude로 설치 이 Prompt를 복사해 Codex, Claude 또는 다른 어시스턴트에 붙여 넣으면 Skill 페이지를 검토하고 설치를 진행할 수 있습니다.
SOC 직업 분류 기준
| name | audit-adversarial |
| description | Hostile full-codebase review by parallel adversarial agents with no design context — finds bugs that domain familiarity masks |
| argument-hint | [subsystem or file to focus on, default: all source files] |
| context | fork |
| disable-model-invocation | true |
| allowed-tools | Read, Grep, Glob, Bash, Agent |
Run a hostile adversarial review of the Caffeine source code. Unlike the other audit skills which use the auditor agent (with design context), this skill deliberately gives reviewers NO project context — they review with fresh eyes and a hostile mindset.
$ARGUMENTS
If no argument, review all source files in caffeine/src/main/java/.
List all Java source files in scope. Group into 4-6 subsystems for parallel review.
Spawn 4-6 subagents simultaneously. Each reviews one subsystem. Critically: DO NOT give them design-decisions.md, synchronization.md, or any .claude/docs. They should review from first principles only.
Each agent gets this prompt (adapted to their subsystem):
You are a senior Java concurrency expert performing a hostile code review.
A competitor built this library and it's gaining adoption over your work.
You want to find every flaw to demonstrate it's not production-worthy.
Your reputation is on the line. Be ruthless but precise — cite methods,
trace code paths, construct failing scenarios.
Rules:
- Dig deep. Zero findings are allowed ONLY with a coverage proof listing
files inspected, methods traced, interleavings attempted, and attack
surfaces checked. If coverage is shallow, keep looking.
- Every finding must include: exact location, concrete evidence, a
falsifiable scenario, and confidence (high/medium).
- Only report issues provable with code evidence
- Construct concrete interleavings, inputs, or scenarios
- Do not critique style — focus on correctness and robustness
- Do not accept "by design" — if the design has consequences, document them
- Read the actual source code before making claims
- Look for what's MISSING, not just what's wrong
Attack surfaces:
1. Memory model violations — insufficient access modes, missing happens-before
2. State corruption interleavings — weight divergence, deque corruption, stuck drain status
3. Resource leaks under failure — OOME/SOE leaving unrecoverable state
4. Silent data loss — values dropped without notification
5. Specification violations — ConcurrentMap contract, Javadoc promises
6. Denial of service — O(n) operations on O(1) paths
7. Sentinel value collisions — can valid input equal an internal sentinel?
8. Validation gaps — inputs accepted at parse time but rejected later
9. API surprises — public methods returning nonsensical values
10. Notification asymmetries — some paths notify, equivalent paths don't
Rate each finding: critical/high/medium/low
Format: numbered list with file:method, description, evidence
If any agent returns zero findings, require a coverage summary from that agent (scope inspected, attack surfaces checked, interleavings attempted). Re-launch with a more specific prompt only if coverage is shallow. Zero findings with thorough coverage proof is acceptable.
For each reviewer that returned findings OR a zero-findings coverage proof, spawn a separate evaluator subagent. The evaluator gets ONLY the reviewer's report — no source code, no design docs.
You are a hostile evaluator reviewing another auditor's report of a Java
cache library. Your job is to find what the auditor MISSED.
1. For each confirmed invariant, construct a 2-thread interleaving that
would violate it. If you cannot, explain what prevents it.
2. For each zero-finding claim, identify the most likely bug category
the auditor could have missed given their stated coverage.
3. For each finding, check whether the evidence is concrete or hand-wavy.
Flag findings that assert a bug without a specific interleaving.
Output: prioritized list of challenges for the reviewer to address.
Have the original reviewer address each challenge by re-reading source code. Drop findings the reviewer cannot defend. Add new findings from challenges the reviewer confirms.
Collect findings from all agents. Deduplicate (same issue found by multiple agents = higher confidence). Remove findings that are clearly wrong (misreading the code). Keep findings even if they might be "by design" — the point is to surface things domain familiarity masks.
Confidence decay check: If any reviewer's findings are >60% medium-confidence, note this in the report — that reviewer's area may need a more targeted follow-up audit rather than more speculative findings.
Escalation: If any reviewer flagged issues they could not resolve statically (e.g., "depends on JDK internal behavior"), mark these as ESCALATED for dynamic testing (Fray, LinCheck, JCStress) rather than guessing.
NOW read .claude/docs/design-decisions.md and .claude/rules/design-decisions.md.
For each finding, check: is this an intentional trade-off? Reclassify as:
Keep intentional findings in the report (labeled as such) but do not count them as bugs. The value is surfacing them for review, not asserting they're wrong.
Classify using .claude/docs/finding-taxonomy.md for severity and categories.
Additionally tag each confirmed finding:
Write the full report to .claude/reports/audit-adversarial.md.
Format:
# Adversarial Review: Caffeine Source Code
[N] parallel auditors reviewed [M] source files (~K lines).
Findings consolidated, deduplicated, and triaged by severity.
## Likely Bugs
...
## API/Behavioral Issues
...
## Robustness/Validation Gaps
...
## Design/Maintenance Concerns
...
## Summary
[N] likely bugs, [M] API issues, [K] validation gaps, [J] concerns.
[N] evaluator challenges received across [M] reviewers.
Audit the adaptive window hill-climber and region-resize logic for implementation defects (not algorithm quality)
JSR-107 (JCache) spec-conformance audit
Audit explicit state machines (drain status, node lifecycle, async-value lifecycle) for illegal or missed transitions
Heavyweight history-mining bug audit. Walks the caffeine module's git history chronologically (oldest to HEAD), maintains a forward-tracked issue database, and surfaces concerns introduced by past commits that were never resolved. Catches bugs that snapshot mining cannot — half-fixes invisible from current state, latent+trigger pairs across multi-commit interactions, and partial refactors. Slow (model/effort-dependent; ~24h on Opus + max effort) and rare-run (every several months or before a major release).
Differential audit comparing matched code paths that should behave identically. Spawns one auditor per sibling pair (sync/async, bounded/unbounded, view consistency, bulk vs single, generated node variants, read fast vs slow, adapter conformance) and requires a concrete witness scenario where the two paths diverge observably.
Find places where documented API contracts and the implementation diverge