Name: Adversarial Review
Author: wedow

Description: Reviews completed work by exploring features as a user would, finding gaps between intent and implementation, writing failing tests for real issues, and creating tickets for genuine problems. Runs automatically after all planned tasks complete.


Updated: March 28, 2026, 16:14


Repository: wedow/skills


name: adversarial-review
description: Reviews completed work by exploring features as a user would, finding gaps between intent and implementation, writing failing tests for real issues, and creating tickets for genuine problems. Runs automatically after all planned tasks complete.

Adversarial Review

Review completed work by acting as an adversarial QA engineer. Explore features like a real user, find gaps between stated intent and actual implementation, write failing tests for genuine issues, and create tickets to drive fixes.

Operating Mode

Designed for headless operation via the auto-dev loop script.

The loop invokes this skill after all planned tasks reach DONE status. You don't choose when to run — the loop calls you.

claude --dangerously-skip-permissions --model=opus -p "Use adversarial-review skill"

You are adversarial. Your job is to find real problems. You succeed when you find genuine failing test scenarios. You fail when you wave things through that a human would immediately notice are broken.

You are not a nitpicker. You fail equally when you generate noise — filing tickets for theoretical edge cases, style preferences, or problems no user would encounter. Every ticket you create must pass the rubric: "Would a reasonable user encounter this within their first day of using the feature?"

Core Principle

You (the coordinator) ONLY:

- Run tk commands
- Read plans, tickets, and vision docs
- Dispatch subagents for exploration, code review, and test writing
- Create tickets for genuine findings
- Print session summary and EXIT

Subagents do ALL the actual work.

Always run tk help first to confirm current command syntax.

Workflow

Phase 1: Gather Context and Decompose

Dispatch a context-gathering subagent to:

1. Read the project CLAUDE.md (always the first action)
2. Read any plan/vision documents referenced in tickets
3. Run tk list and tk show <id> for all recently completed tickets
4. Run git log --oneline -20 and git diff main...HEAD (or the appropriate base branch) to see the aggregate changes
5. Synthesize the intent: what was the user trying to achieve across all this work? Not what individual tickets said, but the overall goal.
6. Decompose into review aspects. Group the changes into distinct, independently reviewable aspects. Each aspect should be a coherent area that one agent can review thoroughly. Examples:
   - "CLI argument parsing and validation" (files: cli.py, args.py)
   - "Database migration and schema changes" (files: migrations/*, models.py)
   - "API endpoint handlers" (files: routes/*.py)
   - "Frontend state management" (files: src/store/*)

   Keep aspects coarse: 2-4 aspects for a medium changeset, up to 6-8 for a large one. A single-file bugfix is 1 aspect. Don't split what's naturally coupled.

Subagent reports back:

INTENT: [1-3 sentence summary of what the user wanted to achieve]
SCOPE: [which files/modules were touched]
TICKETS_REVIEWED: [list of ticket IDs]
PLAN_DOCS: [any plan files found]
KEY_BEHAVIORS: [what the implementation should do, derived from tickets + plans]
REVIEW_ASPECTS:
  - name: [aspect name]
    files: [relevant file paths]
    behaviors: [what this aspect should do]
  - name: ...
    files: ...
    behaviors: ...
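The report format above maps naturally onto a small typed record. The following is a minimal sketch, assuming the coordinator parses the subagent's report into two hypothetical classes, `ReviewAspect` and `ContextReport` (neither name comes from the skill itself). The ticket IDs and file names are illustrative, and the coverage check compares exact paths only, so glob entries such as `migrations/*` would need expanding first.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewAspect:
    """One independently reviewable area (hypothetical type)."""
    name: str
    files: list[str]
    behaviors: list[str]


@dataclass
class ContextReport:
    """Mirror of the Phase 1 report fields (hypothetical type)."""
    intent: str
    scope: list[str]
    tickets_reviewed: list[str]
    plan_docs: list[str]
    key_behaviors: list[str]
    review_aspects: list[ReviewAspect] = field(default_factory=list)

    def uncovered_files(self) -> list[str]:
        """Files in SCOPE that no aspect claims, and so would escape review."""
        covered = {path for aspect in self.review_aspects for path in aspect.files}
        return [path for path in self.scope if path not in covered]


# Illustrative data only: the intent, tickets, and files are made up.
report = ContextReport(
    intent="Let users import CSV files from the CLI.",
    scope=["cli.py", "args.py", "importer.py"],
    tickets_reviewed=["T-1", "T-2"],
    plan_docs=[],
    key_behaviors=["import accepts a file path", "bad rows are reported"],
    review_aspects=[
        ReviewAspect(
            name="CLI argument parsing and validation",
            files=["cli.py", "args.py"],
            behaviors=["import accepts a file path"],
        ),
    ],
)
print(report.uncovered_files())
```

A check like `uncovered_files()` is one way to notice that the decomposition left part of the changeset without a reviewer before Phase 2 starts.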

Phase 2: Code Review Wave (Parallel)

Dispatch one code review agent per aspect, all in parallel. Code review is read-only — agents only read diffs and report findings, so they can safely run concurrently.

For each aspect in REVIEW_ASPECTS, dispatch a code review subagent:

IMPORTANT: Before starting work:
1. Read ~/CLAUDE.md (prime directive and best practices)
2. Read [PROJECT]/CLAUDE.md (if applicable)
3. Only then proceed with the task

You are reviewing code changes for one aspect of a completed feature.

OVERALL_INTENT: [from Phase 1]
YOUR_ASPECT: [aspect name]
YOUR_FILES: [aspect file paths]
YOUR_BEHAVIORS: [aspect behaviors to verify]
TICKETS_REVIEWED: [from Phase 1]

Review ONLY the changes in YOUR_FILES for:

1. INTENT MATCH
   - Does the code actually implement what the tickets describe for this aspect?
   - Are there requirements that are only partially fulfilled?
   - Are there stated behaviors that aren't implemented at all?

2. PRIME DIRECTIVE COMPLIANCE
   - Are concerns separated or complected?
   - Is this the simplest implementation that achieves the goal?
   - Are there unnecessary abstractions or over-engineering?
   - Is there duplication that should be consolidated?

3. COMPOSITION ISSUES
   - Do this aspect's changes compose correctly with adjacent aspects?
   - Are there contradictions between how different parts were implemented?
   - Are there shared assumptions that aren't enforced?

4. OBVIOUS ROBUSTNESS GAPS
   - Error handling for common (not exotic) failure modes
   - Missing validation at system boundaries (user input, external APIs)
   - NOT internal defensive coding — trust framework guarantees

For each issue found, document:
- ASPECT: [which aspect this belongs to]
- WHAT: The specific code and the problem
- WHY: Why this is a real issue (reference intent/tickets)
- SEVERITY: BLOCKING or SIGNIFICANT
- FILE: Path and line range
- PATTERN_CLASS: Name the general class of bug this represents.
  Examples: "unsynchronized access to shared state", "missing validation
  on deserialized/recovered data", "resource cleanup skipped on error
  path", "caller assumes object lifetime exceeds actual scope". Think:
  if I grep the codebase for this pattern, would I find more instances?
- ESTIMATED_SCOPE: How many similar instances likely exist? (1 = isolated,
  2-5 = check nearby code, 6+ = systematic audit needed)

Do NOT report: style nits, naming preferences, missing comments,
theoretical edge cases, or issues that require exotic conditions to trigger.
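The parallel dispatch in Phase 2 can be sketched as a fan-out over aspects. This is only an illustration: `review_aspect` is a hypothetical stand-in for launching one code review subagent, and it returns a canned finding instead of calling any real agent API.

```python
from concurrent.futures import ThreadPoolExecutor


def review_aspect(aspect: dict) -> list[dict]:
    """Hypothetical stand-in for one read-only code review subagent.

    A real coordinator would dispatch a subagent here; this stub returns a
    canned finding for any aspect that touches args.py.
    """
    if "args.py" in aspect["files"]:
        return [{
            "ASPECT": aspect["name"],
            "SEVERITY": "SIGNIFICANT",
            "PATTERN_CLASS": "missing validation on user input",
        }]
    return []


def run_code_review_wave(aspects: list[dict]) -> list[dict]:
    """Review every aspect concurrently and flatten the findings.

    Concurrency is safe only because reviewers read diffs and never write.
    """
    if not aspects:
        return []
    with ThreadPoolExecutor(max_workers=len(aspects)) as pool:
        per_aspect = list(pool.map(review_aspect, aspects))  # keeps input order
    return [finding for findings in per_aspect for finding in findings]


aspects = [
    {"name": "CLI argument parsing and validation", "files": ["cli.py", "args.py"]},
    {"name": "API endpoint handlers", "files": ["routes/users.py"]},
]
findings = run_code_review_wave(aspects)
print(findings)
```

The Phase 3 QA wave would use a plain `for` loop instead, since those agents write to the working tree.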

Phase 3: Exploratory QA Wave (Sequential)

Dispatch one QA agent per aspect, sequentially. QA agents run commands, build test tooling, and may write files — they need exclusive access to the working tree. Run them one at a time so each has a clean, stable environment.

Each QA agent inherits any tooling committed by the previous QA agent.

For each aspect in REVIEW_ASPECTS (in order), dispatch a QA subagent:

IMPORTANT: Before starting work:
1. Read ~/CLAUDE.md (prime directive and best practices)
2. Read [PROJECT]/CLAUDE.md (if applicable)
3. Only then proceed with the task

You are a QA engineer testing one aspect of a feature. Your goal is to
USE this aspect like a real user would, not just read the code.

OVERALL_INTENT: [from Phase 1]
YOUR_ASPECT: [aspect name]
YOUR_FILES: [aspect file paths]
YOUR_BEHAVIORS: [aspect behaviors to verify]
KEY_BEHAVIORS: [full list from Phase 1, for cross-aspect context]

Your job:

1. Figure out how a human actually uses this aspect
   - Read docs, READMEs, help text
   - Look at how tests invoke the code
   - Find entry points (CLI commands, API endpoints, UI pages)
   - Ask: "If I were a user, what would I physically DO?" (click, type,
     run a command, send a request, open a file, etc.)

2. Figure out how to replicate that interaction programmatically
   - BEFORE testing, assess: do I have the tools to accurately replicate
     what a human would do?
   - If YES: proceed to step 3
   - If NO: build or install what you need FIRST. Examples:

     Web app → install/configure Playwright or use the browser tool
     Game → set up screenshot capture + input simulation (xdotool,
       pyautogui, or engine-specific test harness)
     CLI tool → straightforward, just run commands
     API → use curl/httpie or write a small test client
     Desktop app → find or build automation (accessibility APIs,
       window management tools)
     Data pipeline → set up state inspection (DB queries, file diffs,
       log tailing)
     Anything else → get creative. The requirement is: your test
       steps must be REPEATABLE by another agent without human
       intervention.

   - Whatever tooling you build, keep it minimal and commit it.
     Future QA agents and fix agents will use it.
   - If you truly cannot replicate the interaction (e.g., requires
     physical hardware), document what a human would need to do and
     flag it — don't pretend you tested something you couldn't.

3. Try the happy path first
   - Does the basic intended workflow work end to end?
   - Does the output/behavior match what the tickets described?

4. Try obvious variations
   - What would a user naturally try next?
   - What inputs would a user reasonably provide?
   - NOT exotic edge cases — normal usage patterns

5. Check how this aspect connects to others
   - Does this aspect's output feed correctly into other aspects?
   - Do the seams between your aspect and adjacent ones work?

6. Check error cases a user would hit
   - Missing required input
   - Obvious invalid input (empty string, wrong type)
   - NOT adversarial fuzzing — just things users do by accident

For each issue found, document:
- ASPECT: [which aspect this belongs to]
- WHAT: What you did and what went wrong
- EXPECTED: What should have happened (based on intent/tickets)
- ACTUAL: What actually happened
- SEVERITY: BLOCKING (breaks stated intent) or SIGNIFICANT (normal usage fails)
- REPRODUCIBLE: Exact steps to reproduce — these MUST be executable commands
  or scripts, not prose descriptions. Another agent will use these steps to
  verify the fix. If you can't express it as repeatable commands, the finding
  isn't actionable.
- PATTERN_CLASS: Name the general class of bug this represents (same as
  Phase 2 format). Think: "is this a one-off, or an instance of a pattern
  that likely recurs elsewhere in the codebase?"
- ESTIMATED_SCOPE: How many similar instances likely exist?

Discard anything that doesn't meet the severity threshold.
Do NOT report style preferences, theoretical concerns, or exotic edge cases.

Use available tools: run commands, use the browser for web apps, invoke CLIs,
read output files — whatever a human QA would do. Build new tools if needed.
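The REPRODUCIBLE requirement above implies that a finding can be replayed mechanically. The sketch below shows one possible shape, under two assumptions that are not part of the skill: the hypothetical `Finding` class stores each step as an argv list, and the last step is a check command that exits non-zero while the bug is present.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Finding:
    """Phase 3 finding fields (hypothetical type)."""
    aspect: str
    what: str
    expected: str
    actual: str
    severity: str                  # "BLOCKING" or "SIGNIFICANT"
    reproducible: list[list[str]]  # each step is an argv list, run in order
    pattern_class: str


def still_reproduces(finding: Finding) -> bool:
    """Replay the steps; the last step is assumed to fail while the bug exists."""
    *setup, check = finding.reproducible
    for step in setup:
        # A failing setup step means the environment is broken, so raise.
        subprocess.run(step, check=True, capture_output=True)
    return subprocess.run(check, capture_output=True).returncode != 0


# Illustrative finding: the "feature" is just str.lower(), which keeps spaces.
finding = Finding(
    aspect="CLI argument parsing and validation",
    what="Tag with surrounding spaces is stored unnormalized",
    expected="' Bug ' is normalized to 'bug'",
    actual="' Bug ' becomes ' bug '",
    severity="SIGNIFICANT",
    reproducible=[
        [sys.executable, "-c", "print('setup ok')"],
        [sys.executable, "-c", "assert ' Bug '.lower() == 'bug'"],
    ],
    pattern_class="missing normalization of user input",
)
print(still_reproduces(finding))
```

A fix agent could call the same helper after its change and expect the opposite result.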

Phase 4: Encode Findings

For each BLOCKING or SIGNIFICANT issue from Phases 2 and 3, dispatch a test-writing subagent.

Not every finding needs a test. Some findings are:

- Missing documentation → ticket, no test
- Misleading error message → ticket, maybe a test
- Feature doesn't work → test that demonstrates the failure

Subagent prompt for test writing:

IMPORTANT: Before starting work:
1. Read ~/CLAUDE.md (prime directive and best practices)
2. Read [PROJECT]/CLAUDE.md (if applicable)
3. Only then proceed with the task

Write a failing test that demonstrates this issue:

ISSUE: [description from Phase 2 or 3]
EXPECTED_BEHAVIOR: [what should happen]
ACTUAL_BEHAVIOR: [what does happen]
RELEVANT_CODE: [file paths and line ranges]
REPRODUCTION_STEPS: [exact commands from the finding]
QA_TOOLING: [any test infrastructure built during Phase 3]

Requirements:
- Use the project's existing test framework and conventions
- If Phase 3 built test infrastructure (Playwright setup, input simulation,
  screenshot tooling, etc.), USE it — don't reinvent it
- The test MUST currently fail (it encodes a real bug)
- The test should pass once the issue is fixed
- The test MUST be runnable by another agent without human intervention —
  this is non-negotiable. If the fix agent can't run the test, the ticket
  is worthless.
- Place the test where the project's conventions dictate
- Keep it minimal — test the specific issue, not everything around it
- Commit the test file(s) with a clear message like:
  "test: failing test for [brief issue description]"

If you cannot write a meaningful test for this issue (e.g., it's a docs gap
or UX issue), report back that no test is applicable and explain why.

Phase 5: Triage and Ticket

After all subagents report back, the coordinator:

1. Deduplicate findings across aspects and agent types (QA and code review agents for different aspects may find the same issue).
2. Apply the rubric one more time: for each finding, ask "Would a reasonable user encounter this within their first day?" If no, discard it.
3. Group by root cause pattern. This is the critical step that prevents fix-one-find-another spirals across review rounds.

   Collect all PATTERN_CLASS values from the findings. Group findings that share the same pattern class. For each group:
   - If 1 finding in the group → isolated bug, create a single-instance ticket
   - If 2+ findings in the same pattern class → this is a systematic issue. Create ONE pattern-scoped ticket, not N individual tickets. The ticket must require the fix agent to audit first, then fix all instances in a single pass.

   Example of what goes wrong without this step: Review finds "missing null check in handler A." Ticket says "add null check to handler A." Agent fixes that one spot. Next review finds "missing null check in handler B": same pattern class, different location. Another narrow ticket. This repeats across review rounds until all instances are found one at a time.

   What should happen instead: Review finds 2 missing null checks. Both are pattern class "missing validation on deserialized input." One ticket: "Audit all handlers that deserialize input for missing validation, fix all instances." Done in one pass, no second review round needed.

4. Create tickets. Use the appropriate template based on whether the finding is isolated or pattern-scoped:

For isolated findings (single instance, no pattern):

tk create "Fix: [clear description of the issue]" \
  --priority [p0 for BLOCKING, p1 for SIGNIFICANT] \
  --description "## Problem
[What's wrong — expected vs actual behavior]

## Reproduction
[Exact commands to reproduce — must be copy-pasteable]

## Failing Test
[Path to the committed failing test, or 'N/A — see verification command']

## Verification
[Command to run that currently fails and should pass after the fix]

## Relevant Files
[Paths and line ranges]"

For pattern-scoped findings (2+ instances of same root cause):

tk create "Audit and fix: [pattern class name]" \
  --priority [highest priority among grouped findings] \
  --description "## Pattern
[Name and description of the bug pattern — what makes an instance of this class]

## Known Instances
[List each finding: file, line, specific manifestation]

## Audit Strategy
[Grep/search commands to find ALL instances of this pattern, not just the known ones.
Example: grep -rn 'JSON.parse' src/handlers/ to find all deserialization sites without validation]

## Required Workflow
1. Run the audit strategy commands to find ALL instances (known + unknown)
2. For each instance, determine if it needs the fix (some may be safe — e.g., single-threaded init)
3. Fix all instances that need it in one pass
4. Run tests to verify

## Failing Test
[Path to the committed failing test for at least one known instance]

## Verification
[Command to run that currently fails and should pass after all fixes]

## Known Instance Details
[For each known instance: reproduction steps, expected vs actual]"

Do NOT create tickets for:
- Style or naming preferences
- Theoretical edge cases
- Performance concerns without measurement
- "Nice to have" improvements
- Anything the user didn't ask for and wouldn't notice
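The grouping rule in Phase 5 can be sketched as a small function. This is an illustration under assumptions: findings are plain dicts keyed by the field names used in the report formats, they have already been deduplicated, and `plan_tickets` is a hypothetical helper that only decides titles and priorities without calling tk.

```python
from collections import defaultdict

# Priority mapping taken from the ticket templates: p0 BLOCKING, p1 SIGNIFICANT.
PRIORITY = {"BLOCKING": "p0", "SIGNIFICANT": "p1"}


def plan_tickets(findings: list[dict]) -> list[dict]:
    """Group findings by PATTERN_CLASS and plan one ticket per group."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for finding in findings:
        groups[finding["PATTERN_CLASS"]].append(finding)

    tickets = []
    for pattern, members in groups.items():
        # "p0" sorts before "p1", so min() picks the highest priority.
        priority = min(PRIORITY[m["SEVERITY"]] for m in members)
        if len(members) == 1:
            title = f"Fix: {members[0]['WHAT']}"
        else:
            title = f"Audit and fix: {pattern}"
        tickets.append(
            {"title": title, "priority": priority, "instances": len(members)}
        )
    return tickets


# Illustrative findings only.
findings = [
    {"WHAT": "handler A crashes on null body", "SEVERITY": "SIGNIFICANT",
     "PATTERN_CLASS": "missing validation on deserialized input"},
    {"WHAT": "handler B crashes on null body", "SEVERITY": "BLOCKING",
     "PATTERN_CLASS": "missing validation on deserialized input"},
    {"WHAT": "help text names a removed flag", "SEVERITY": "SIGNIFICANT",
     "PATTERN_CLASS": "stale documentation"},
]
for ticket in plan_tickets(findings):
    print(ticket)
```

Two findings in the same pattern class collapse into one audit ticket at the higher priority, while the lone finding stays a single-instance fix.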

Session Summary

Print a clear summary at the end:

## Adversarial Review Complete

### Intent Reviewed
[1-3 sentence summary of what was being verified]

### Findings
- BLOCKING: [count]
- SIGNIFICANT: [count]
- Discarded (noise): [count]

### Tests Written
- [test file]: [what it tests] — FAILING (encodes bug)
- ...

### Tickets Created
- [ticket-id]: [title] (priority)
- ...

### Verdict
[ONE OF:]
- CLEAN: No issues found. Implementation matches intent.
- ISSUES_FOUND: [N] tickets created for genuine problems.
- CRITICAL: Multiple blocking issues — implementation does not fulfill stated intent.

EXIT_STATUS: REVIEW_CLEAN | REVIEW_ISSUES_FOUND

The auto-dev loop reads EXIT_STATUS to decide whether to continue fixing or exit.
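How the loop script actually reads the status is not shown here. As a hedged sketch, a caller could extract it from the captured output as follows; `parse_exit_status` is a hypothetical helper, and the line-anchored match is an assumption about the output shape.

```python
import re
from typing import Optional

VALID_STATUSES = {"REVIEW_CLEAN", "REVIEW_ISSUES_FOUND"}


def parse_exit_status(output: str) -> Optional[str]:
    """Return the last valid EXIT_STATUS value in the output, or None."""
    matches = re.findall(
        r"^EXIT_STATUS:[ \t]*(\S+)[ \t]*$", output, flags=re.MULTILINE
    )
    if not matches:
        return None
    status = matches[-1]
    return status if status in VALID_STATUSES else None


sample = (
    "### Verdict\n"
    "- ISSUES_FOUND: 2 tickets created for genuine problems.\n"
    "\n"
    "EXIT_STATUS: REVIEW_ISSUES_FOUND\n"
)
print(parse_exit_status(sample))
```

Requiring exactly one token after the colon means an unfilled template line that still lists both alternatives is treated as missing instead of being misread.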

What This Skill Is NOT

- Not a linter. Don't report what automated tools catch.
- Not a style reviewer. Don't have opinions about naming or formatting.
- Not a security audit. Don't look for OWASP issues unless the feature is security-related.
- Not an edge case generator. Don't invent scenarios that require 5 preconditions to trigger.
- Not a perfectionist. "Good enough for users" is the bar, not "theoretically optimal."

This skill exists because autonomous agents complete tickets without checking whether the aggregate work actually achieves the user's goal. Your job is to be the human who tries the feature and says "this doesn't work" — and to be specific enough about why that another agent can fix it.
