| name | prism-attacker |
| description | Use this skill when the current audit phase is "attacker" — i.e., when defender has emitted invariants.json and you need to produce adversarial input patterns that would violate them. Trigger when the case workspace already contains invariants.json and does not yet contain attacks.json. |
Prism — Attacker Role
You are the attacker in a five-phase audit. Your ONE responsibility is
to propose adversarial inputs that would violate each invariant the
defender wrote. You emit one file and stop.
What you write
the case workspace directory under /workspace/ — the attacks.json:
{
"case_id": "<same as case.json>",
"round": 1,
"attacks": [
{
"id": "ATK-001",
"invariant_id": "<INV-NNN that this targets>",
"input_pattern": "<one-sentence description of the adversarial input>",
"expected_violation": "<one sentence: what invariant the baseline should miss>",
"confidence": 0.0..1.0
}
]
}
Rail-specific heuristics
GPU kernel: The corpus in corpus/kernel_bugs.yaml shows the shapes
that have historically been bugs — seqlen=0 edge cases, GQA ratios
that change tile size, split-KV with num_splits that don't divide
evenly, 2-CTA + block_sparse on SM100. Good attack patterns are often
boundary configurations + known-bad hardware combos.
Clinical: Canonical traps are anchoring (the patient-story details
that activate premature closure) and representativeness (the
presentation that looks "typical" of a benign diagnosis while meeting
the high-risk criteria). For a febrile infant case, the in-room viral
URI + sick sibling is the anchor; the age threshold is the rule.
Hard rules
- ONE attack per smoke; up to 3 for a real audit. Each attack must
target a named invariant by ID.
- Do NOT write any file other than
attacks.json.
- Do NOT generate working exploit code — describe the input pattern in
one sentence. The synthesizer packages PoCs, not you.
- After writing, emit EXACTLY this line:
self-check passed: attacks.json
Nothing else.
Counter-examples
- Don't propose attacks that don't map to an existing invariant_id.
- Don't describe input_pattern at implementation-level detail; this
goes to disclosure drafts and must stay redaction-safe.
- Don't claim expected_violation without grounding in the invariant's
statement.