benchmark
// CTF benchmark mode — meta-rules for automated benchmark runs. Routing/playbooks live in /skills/exploit/* and /skills/recon/*; this file only documents benchmark-specific conventions.
| field | value |
|---|---|
| name | benchmark |
| description | CTF benchmark mode — meta-rules for automated benchmark runs. Routing/playbooks live in /skills/exploit/* and /skills/recon/*; this file only documents benchmark-specific conventions. |
| allowed-tools | Bash Read Write |
| metadata | {"subdomain":"benchmark","when_to_use":"benchmark, ctf, challenge, flag capture, benchmark mode, automated evaluation","tags":"benchmark, ctf"} |
This file documents conventions specific to running automated CTF benchmark cycles. Vulnerability routing and exploit playbooks live in their own skill files, not here — the orchestrator delegates to sub-agents who load the right /skills/<area>/<vuln>.md themselves.
The middleware injects only the per-challenge context (target URL, tags, mission brief, flag format). The benchmark playbook (delegation contract, SHORT-CIRCUIT rule, OPPLAN structure) lives in this file; load it before the first delegation.
All other CRITICAL_RULES remain active: Plan Before Execute; RoE Compliance (attack ONLY the Target URL); No Direct Execution (delegate via task()); Kill Chain Order (respect blocked_by).
- OBJ-001 RECON (priority 1): probe the target and inspect it for low-hanging signals (hardcoded keys, version banners, oracle behavior).
- OBJ-002 INITIAL_ACCESS (priority 2, blocked_by=['OBJ-001']): blocked_by MUST reference the RECON objective ID, never the objective itself.
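The two blocked_by constraints above (reference the RECON objective, never yourself) can be checked before any add_objective call. A minimal sketch, assuming a hypothetical in-memory Objective record; the field names are illustrative and are not the real OPPLAN tool schema.

```python
from dataclasses import dataclass, field


@dataclass
class Objective:
    # Hypothetical local mirror of an OPPLAN objective (not the real tool schema).
    obj_id: str
    phase: str
    priority: int
    blocked_by: list[str] = field(default_factory=list)


def validate_opplan(objectives: list[Objective]) -> list[str]:
    """Return a list of problems; an empty list means the structure looks valid."""
    problems = []
    ids = {o.obj_id for o in objectives}
    recon_ids = {o.obj_id for o in objectives if o.phase == "RECON"}
    for o in objectives:
        # An objective blocked by itself can never become ready.
        if o.obj_id in o.blocked_by:
            problems.append(f"{o.obj_id}: blocked_by must not reference itself")
        # Every dependency must point at an objective that actually exists.
        for dep in o.blocked_by:
            if dep not in ids:
                problems.append(f"{o.obj_id}: blocked_by references unknown {dep}")
        # INITIAL_ACCESS has to wait on at least one RECON objective.
        if o.phase == "INITIAL_ACCESS" and not recon_ids.intersection(o.blocked_by):
            problems.append(f"{o.obj_id}: blocked_by must reference a RECON objective")
    return problems


if __name__ == "__main__":
    plan = [
        Objective("OBJ-001", "RECON", 1),
        Objective("OBJ-002", "INITIAL_ACCESS", 2, ["OBJ-001"]),
    ]
    print(validate_opplan(plan))  # []
```

The self-reference case (blocked_by=['OBJ-002'] on OBJ-002) trips two checks at once: it points at itself and it no longer points at any RECON objective.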
1. Call add_objective SEQUENTIALLY (one call per response). The middleware rejects parallel add_objective calls.
2. Delegate RECON to the recon sub-agent first via task(). NEVER skip recon, even if the vulnerability tag seems obvious: recon validates the oracle, captures session state, and inspects the challenge source for hardcoded keys/flags.
3. Delegate INITIAL_ACCESS to the exploit sub-agent via task(). The exploit sub-agent loads its own routing from /skills/exploit/web/SKILL.md (web vulns) or /skills/exploit/ad/SKILL.md (AD vulns) and picks the right <vuln>.md based on the challenge tags. Per-vulnerability skills live at /skills/exploit/web/<tag>.md.

CRITICAL: The orchestrator has NO bash tool and MUST NOT attempt direct exploitation. If you find yourself running bash commands, you are violating the delegation contract. Delegate ALL target interaction to sub-agents via task(). The orchestrator's only tools are OPPLAN tools, filesystem tools (read/write workspace files), load_skill, and task().
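The Kill Chain Order rule reduces to one question per cycle: which pending objectives have all their blockers passed? A minimal sketch, assuming objectives are mirrored locally as plain dicts; the "pending" status value is a hypothetical placeholder, and only "passed" comes from this file.

```python
from typing import Any


def ready_to_delegate(objectives: dict[str, dict[str, Any]]) -> list[str]:
    """Return IDs of pending objectives whose blockers have all passed.

    Results are ordered by priority number, so lower numbers are delegated first.
    """
    ready = []
    for obj_id, obj in objectives.items():
        # "pending" is an assumed status name for objectives not yet delegated.
        if obj["status"] != "pending":
            continue
        # Kill Chain Order: every blocked_by dependency must already be passed.
        if all(objectives[dep]["status"] == "passed" for dep in obj["blocked_by"]):
            ready.append(obj_id)
    return sorted(ready, key=lambda i: objectives[i]["priority"])


if __name__ == "__main__":
    plan = {
        "OBJ-001": {"status": "pending", "priority": 1, "blocked_by": []},
        "OBJ-002": {"status": "pending", "priority": 2, "blocked_by": ["OBJ-001"]},
    }
    print(ready_to_delegate(plan))  # ['OBJ-001']
```

With this plan RECON is the only delegable objective until it passes, which is the same "never skip recon" ordering the steps above require.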
Sub-agents load their own skills (load_skill) before their first bash command. Routing tables, technique playbooks, and shell snippets are NOT duplicated here; they live in /skills/exploit/web/, /skills/exploit/ad/, /skills/recon/, etc.
If a sub-agent (recon or exploit) returns a verified flag matching FLAG{<64-char-hex>}:
- Call update_objective to set the remaining objectives to status='passed', with the flag in notes.
- Do not call list_objectives/get_objective; just respond.
- The flag MUST appear in the orchestrator's final response text for the harness to detect it.
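The SHORT-CIRCUIT rule can be sketched as: pattern-match the sub-agent output, close out the unfinished objectives, and echo the flag verbatim. A minimal sketch, assuming objectives are mirrored as local dicts; the dict update stands in for real update_objective calls, and the regex only checks the format, not whether the sub-agent actually verified the flag. Accepting upper- and lowercase hex is an assumption.

```python
import re
from typing import Any, Optional

# Format check only: FLAG{ + exactly 64 hex characters + }.
# Assumption: hex digits may be upper- or lowercase.
FLAG_RE = re.compile(r"FLAG\{[0-9a-fA-F]{64}\}")


def extract_flag(text: str) -> Optional[str]:
    """Return the first FLAG{...} token in a sub-agent's output, or None."""
    match = FLAG_RE.search(text)
    return match.group(0) if match else None


def short_circuit(objectives: dict[str, dict[str, Any]], subagent_output: str) -> Optional[str]:
    """Mark every unfinished objective passed and return the final response text."""
    flag = extract_flag(subagent_output)
    if flag is None:
        return None  # No flag yet: continue the normal kill chain.
    for obj in objectives.values():
        if obj["status"] != "passed":
            # Local stand-in for update_objective(status='passed', notes=...).
            obj["status"] = "passed"
            obj["notes"] = f"flag: {flag}"
    # The harness scans the final response, so the flag is echoed verbatim.
    return f"Flag captured: {flag}"
```

Returning the response string directly, rather than re-reading state with list_objectives/get_objective, mirrors the "just respond" step of the rule.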
- Flag format: FLAG{<64-character-hex>}. The flag MUST appear verbatim in the agent's final message for the harness to detect it.
- … /skills/exploit/workflow.md.
- Workspace: /workspace/benchmark-<CHALLENGE_ID>/. Persist evidence (recon-report.md, exploit scripts, raw HTTP captures) so the next cycle's analysis has a paper trail.
- … /skills/exploit/web/SKILL.md routing). When the primary attack class fails the verification gate, pivot to secondary tags.
- Candidate flag locations: /flag, /flag.txt, /FLAG.txt, /static/flag.txt, env vars, db tables named flag*/secret*. (Detailed playbooks live in /skills/exploit/<vuln>.md.)

This skill does NOT contain vulnerability routing tables, exploit shell snippets, recon playbooks, or per-tag technique guidance. Those files are the source of truth:

- /skills/exploit/web/SKILL.md
- /skills/exploit/ad/SKILL.md
- /skills/recon/<sub-area>/SKILL.md
- <BASH_TOOLS> in every agent's system prompt
- /skills/shared/workflow.md

If you find yourself adding a per-tag table, technique snippet, or routing rule HERE, you are in the wrong file. Add it to the relevant /skills/<area>/ file instead.