vdd-adversarial

Updated: June 10, 2026, 12:36

Use when performing Verification-Driven Development with adversarial approach. Actively challenge assumptions and find weak spots.

Source

MatrixFounder

MatrixFounder/Agentic-development

VDD Adversarial

1. Red Flags (Anti-Rationalization)

STOP and READ THIS if you are thinking:

"The code passes tests, so it's fine" -> WRONG. Tests only cover what the author imagined. You MUST find what they missed.
"This edge case is unlikely" -> WRONG. Unlikely ≠ impossible. If it can crash, it WILL crash in production.
"The happy path works, that's enough" -> WRONG. Adversarial review exists to destroy happy-path assumptions.
"I'll skip the template, it's just a quick review" -> WRONG. Every critique MUST use assets/template_critique.md.

2. VDD Methodology Context

This skill implements the Iterative Adversarial Refinement phase ("The Roast") from the VDD methodology.

Your Role: You are the Adversary. The Builder has already passed the Verification Loop (tests + human-in-the-loop (HITL) review). Your job is to find what survived that phase.

Key Principles (see references/vdd-methodology.md for full methodology):

Anti-Slop Bias: The first "correct" version is the most dangerous — hidden technical debt lurks beneath.
Exhaustive Reporting (supersedes "Forced Negativity"): report every issue, including low-confidence ones, with confidence + severity attached — filtering happens downstream, never in the reviewer's head. Zero tolerance for "lazy" AI patterns (placeholder comments, generic error handling, inefficient loops).
Context Resetting: Each adversarial review MUST use a fresh context window. Why (documented mechanisms, audit-067 C-02): multi-turn assumption lock-in — models lock onto early assumptions and degrade ~39% vs single-turn on the same tasks (arXiv:2505.06120); context rot — accumulated history dilutes attention as context grows (Chroma 2025); pushback-driven sycophantic belief updates within a session (TRUTH DECAY / SYCON-Bench). A fresh window restores single-turn rigor.
Linear Accountability: Every line of code MUST trace to a corresponding issue and verification step.
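The exhaustive-reporting principle above can be sketched as a data shape: every finding carries a confidence and a severity, the reviewer returns all of them, and any filtering lives in a separate downstream step. The `Finding` class, the `Severity` scale, and the `triage` threshold are hypothetical names chosen for illustration; they are not part of the skill or its template.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical scale; the skill itself only names CRITICAL explicitly.
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass(frozen=True)
class Finding:
    title: str
    severity: Severity
    confidence: float  # 0.0 (guess) to 1.0 (certain)


def report(findings: list[Finding]) -> list[Finding]:
    """Reviewer side: return every finding, sorted but never filtered."""
    return sorted(findings, key=lambda f: (-f.severity, -f.confidence))


def triage(findings: list[Finding], min_confidence: float) -> list[Finding]:
    """Downstream side: filtering happens here, outside the reviewer."""
    return [f for f in findings if f.confidence >= min_confidence]


findings = [
    Finding("Possible race in cache refresh", Severity.HIGH, 0.3),
    Finding("Unvalidated user id reaches SQL", Severity.CRITICAL, 0.9),
    Finding("Bare except hides I/O errors", Severity.MEDIUM, 0.8),
]
reported = report(findings)
```

Here the low-confidence race condition still appears in `reported`; only a later call such as `triage(reported, 0.5)` would drop it.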

Empirical positioning (ab-experiment-075, pre-registered rule 3): this skill is a precision tool, not a recall lever. Against a plain exhaustive baseline ("report everything with confidence + severity") the adversarial scaffolding scored −6.9pp recall but −16% false positives and a 3.9% vs 13.0% bikeshedding ratio (N=3, 24 sealed seeded bugs — docs/reviews/ab-experiment-075.md). Load it when noise/FP cost dominates (triage queues, high-volume review); for recall-critical passes prefer the plain exhaustive prompt, or /vdd-multi when class-complete coverage justifies 3× cost.
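The figures above mix two units: recall is compared in percentage points (an absolute difference between two rates), while false positives are compared as a relative percentage. A minimal sketch of that distinction follows. The counts are made up for illustration and are not the experiment's data; the function names are hypothetical.

```python
def recall(found: int, seeded: int) -> float:
    """Share of seeded bugs that the review reported."""
    return found / seeded


def pp_delta(treatment: float, baseline: float) -> float:
    """Absolute difference between two rates, in percentage points."""
    return (treatment - baseline) * 100


def relative_delta(treatment: float, baseline: float) -> float:
    """Relative change versus the baseline, in percent."""
    return (treatment - baseline) / baseline * 100


# Hypothetical counts, for illustration only.
baseline_recall = recall(found=18, seeded=24)
skill_recall = recall(found=15, seeded=24)
recall_change_pp = pp_delta(skill_recall, baseline_recall)

baseline_fp, skill_fp = 50, 40
fp_change_pct = relative_delta(skill_fp, baseline_fp)
```

With these invented counts, recall falls by 12.5 percentage points while false positives fall by 20 percent relative to the baseline, which is why the two numbers cannot be compared directly.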

Convergence Signal (Exit Strategy) — Objective Convergence

The review cycle STOPS only when an objective bar is met:

(1) The full test run has actually been executed, either by you or, in critic/subagent mode, via execution evidence supplied by the orchestrator. If neither exists, the condition is unverifiable: report the finding 'exit-bar condition unverifiable' and never approve.
(2) Zero CRITICAL findings.
(3) Zero legitimate findings in logic / security / slop.
(4) Only bikeshedding/style remains.

That, not "I was forced to invent a flaw", is the signal of "Maximum Viable Refinement" (Zero-Slop). Approval is bound to the objective bar; fabricating a nitpick is never the trigger to approve. Until the bar is met, keep rejecting.
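One way to read the exit bar is as a pure function over the review state, so that approval can only follow from the four conditions. The sketch below assumes a hypothetical `ReviewState` record; its field names and the verdict strings (other than the 'exit-bar condition unverifiable' finding) are illustrative and do not come from the skill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewState:
    # Hypothetical fields mirroring the four exit conditions.
    test_evidence: bool     # full test run executed, or evidence supplied
    critical_findings: int
    legit_findings: int     # logic / security / slop
    style_findings: int     # bikeshedding only


def verdict(state: ReviewState) -> str:
    if not state.test_evidence:
        # No execution evidence: report it as a finding, never approve.
        return "REJECT: exit-bar condition unverifiable"
    if state.critical_findings > 0:
        return "REJECT: critical findings remain"
    if state.legit_findings > 0:
        return "REJECT: legitimate findings remain"
    # Only style remarks (possibly none) are left, so the bar is met.
    return "APPROVE"


clean = verdict(ReviewState(True, 0, 0, 2))
no_evidence = verdict(ReviewState(False, 0, 0, 0))
```

Note that `style_findings` never affects the verdict: leftover bikeshedding neither blocks approval nor triggers it.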

3. Challenge Assumptions

Question Everything: Do NOT accept the "happy path" as truth.
Input Validation: What if input is null? Too long? Invalid chars?
State: What if the DB is down? API is slow? Disk full?
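The input-validation questions above can be turned into a small probe, sketched here under the assumption that the code under review exposes a validator that should reject every one of these inputs. `hostile_inputs`, `probe`, and `naive_validator` are hypothetical names, and the input list is a starting point rather than a complete catalogue.

```python
from typing import Callable, Optional


def hostile_inputs(max_len: int) -> list[Optional[str]]:
    """Inputs that a happy-path test suite tends to skip."""
    return [
        None,                      # null input
        "a" * (max_len + 1),       # one character over the limit
        "name\x00with-nul",        # embedded NUL byte
        "Robert'); DROP TABLE--",  # quote and comment characters
        "\u202egnp.exe",           # right-to-left override character
    ]


def probe(validator: Callable[[Optional[str]], bool],
          max_len: int) -> list[Optional[str]]:
    """Return the hostile inputs the validator accepts or crashes on."""
    survivors = []
    for value in hostile_inputs(max_len):
        try:
            if validator(value):
                survivors.append(value)
        except Exception:
            # A crash on bad input is itself a finding.
            survivors.append(value)
    return survivors


def naive_validator(value: Optional[str]) -> bool:
    # Hypothetical code under review: it only checks the length.
    return len(value) <= 32


survivors = probe(naive_validator, max_len=32)
```

The length-only validator rejects the over-long string but crashes on `None` and accepts the three strings with invalid characters, so each of those four inputs would become a reported finding.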

4. Decision Tree

Is it clear? -> If not, REJECT.
Is it safe? -> If not, REJECT.
Does it break anything? -> Check regression.
Is it tested? -> If not, REJECT.
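The four questions above are evaluated in order, and the first failing one decides the outcome. A minimal sketch, assuming a hypothetical `ChangeFacts` record that the reviewer fills in after inspecting the change; the return strings are illustrative labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeFacts:
    # Hypothetical inputs gathered by the reviewer.
    is_clear: bool
    is_safe: bool
    regressions_checked: bool
    is_tested: bool


def decide(facts: ChangeFacts) -> str:
    if not facts.is_clear:
        return "REJECT: unclear"
    if not facts.is_safe:
        return "REJECT: unsafe"
    if not facts.regressions_checked:
        return "CHECK REGRESSION"
    if not facts.is_tested:
        return "REJECT: untested"
    # Passing the tree is not approval; the exit bar still applies.
    return "PASS"


unsafe = decide(ChangeFacts(True, False, True, True))
unchecked = decide(ChangeFacts(True, True, False, True))
```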

5. Failure Simulation

Simulate Failures: Mentally simulate, or actually inject, network failures, timeouts, and permission errors.
Check Error Handling: Ensure graceful degradation, not silent swallowing.
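Failure injection of this kind can be sketched with the standard library's `unittest.mock`. The example assumes a hypothetical `fetch_profile` function under review and a client object with a `get` method; neither comes from the skill. It checks both halves of the rule above: the call degrades gracefully, and the failure is logged rather than silently swallowed.

```python
import logging
from unittest import mock

logger = logging.getLogger("profile")


def fetch_profile(client, user_id: str) -> dict:
    """Hypothetical code under review."""
    try:
        return client.get(f"/users/{user_id}")
    except (TimeoutError, ConnectionError, PermissionError) as exc:
        # Graceful degradation: fall back, but keep the failure visible.
        logger.warning("profile fetch failed for %s: %r", user_id, exc)
        return {"id": user_id, "degraded": True}


def simulate(failure: Exception) -> tuple[dict, int]:
    """Inject one failure; return the result and the number of log calls."""
    client = mock.Mock()
    client.get.side_effect = failure
    with mock.patch.object(logger, "warning") as warning:
        result = fetch_profile(client, "u1")
    return result, warning.call_count


results = [
    simulate(TimeoutError("timed out")),
    simulate(ConnectionError("network down")),
    simulate(PermissionError("access denied")),
]
```

A handler that returned the fallback without logging would yield a log-call count of zero here, which is the silent swallowing the review is meant to catch.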

6. Output Artifacts

If the User or Workflow requests a Report, Critique, or Artifact, you MUST use the standard template found in: assets/template_critique.md

Read this file using view_file before generating the report.

7. Rationalization Table

Agent Excuse	Reality / Counter-Argument
"The code passes existing tests"	Tests only cover known scenarios. Adversarial review targets unknown unknowns.
"This edge case is too unlikely"	Production systems encounter "unlikely" cases daily at scale.
"I don't want to be too harsh"	Harshness is not the requirement — exhaustive reporting is. Report every issue, including low-confidence ones, with confidence + severity; filtering happens downstream. Withholding a finding to be nice is the only real failure.

8. Examples

Tip: See examples/usage_example.md for a complete adversarial critique walkthrough.

name	vdd-adversarial
description	Use when performing Verification-Driven Development with adversarial approach. Actively challenge assumptions and find weak spots.
tier	2
version	1.5
