bughunt
// Run competitive AI bug-finding against any website using "BugHunt" — 8 browser agents + 5 Codex code audit agents, Condorcet verification, asymmetric scoring punishes false positives
| name | bughunt |
| description | Run competitive AI bug-finding against any website using "BugHunt" — 8 browser agents + 5 Codex code audit agents, Condorcet verification, asymmetric scoring punishes false positives |
| triggers | ["bug check","bug check this","bughunt","bug competition","find bugs on","find bugs","QA this site","compete against","run competition","bug hunt","code audit","audit code"] |
This skill calls a bughunt binary that ships source-only in COR-CODE. Always run this check first — don't assume the binary exists, and don't invoke it blind.
which bughunt
If it returns a path → tool is installed, proceed to the rest of this skill.
If it returns nothing → tool is NOT installed yet. Walk the user through install BEFORE attempting any bug hunt. Tell them:
"Bughunt isn't installed yet. The source ships in COR-CODE at
tools/bughunt/. Install steps:
- `cd <COR-CODE-root>/tools/bughunt`
- `npm install` (fetches deps incl. Playwright + @openai/codex-sdk + Anthropic SDK)
- `npm run build` (compiles TypeScript → `dist/`)
- `npm link` (symlinks the `bughunt` binary to your global npm bin so it's on `$PATH`)
- Verify: `which bughunt && bughunt --version` (should print 2.1.0 or later)

Want me to run those steps for you, or would you rather run them yourself?"
If the user accepts, run the four npm commands sequentially with output streamed. After the install, re-run which bughunt to confirm and only THEN proceed to the bug hunt.
Do NOT run the install silently without asking — npm install fetches third-party packages from the npm registry (supply-chain surface), and npm link modifies a global PATH-resolved binary. The user should know.
If npm install fails: report the exact error to the user. Common causes: Node.js version too old (need 20+), no internet, npm registry blocked. Don't try clever workarounds — surface the failure.
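The preflight above can be sketched as a small POSIX-sh helper. `ensure_bughunt` and `COR_CODE_ROOT` are illustrative names for this sketch, not part of the tool:

```shell
# Preflight sketch: is the bughunt binary on $PATH?
# ensure_bughunt and COR_CODE_ROOT are hypothetical names for illustration.
ensure_bughunt() {
  if command -v bughunt >/dev/null 2>&1; then
    echo "ok: $(command -v bughunt)"
    return 0
  fi
  # Not installed: point at the COR-CODE source instead of failing silently.
  echo "bughunt not on PATH; install from ${COR_CODE_ROOT:-<COR-CODE-root>}/tools/bughunt"
  echo "  npm install && npm run build && npm link"
  return 1
}
```

A non-zero return signals "stop and ask the user about installing" rather than proceeding to a hunt that can't run.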
Pre-requisites the user also needs in env (warn before invoking if missing):
- ANTHROPIC_API_KEY — for browser agents (Claude-driven)
- OPENAI_API_KEY — for code audit agents (Codex)

Check with:
[ -n "$ANTHROPIC_API_KEY" ] && echo "✓ ANTHROPIC_API_KEY set" || echo "✗ ANTHROPIC_API_KEY missing"
[ -n "$OPENAI_API_KEY" ] && echo "✓ OPENAI_API_KEY set" || echo "✗ OPENAI_API_KEY missing"
If either is missing, tell the user before running. Don't pretend the run will work.
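Both key checks can be folded into one preflight function; `check_keys` is a hypothetical helper name, not part of bughunt:

```shell
# Preflight sketch: fail fast if either API key is absent.
# check_keys is a hypothetical helper, not part of bughunt itself.
check_keys() {
  missing=0
  [ -n "$ANTHROPIC_API_KEY" ] || { echo "✗ ANTHROPIC_API_KEY missing (browser agents)"; missing=1; }
  [ -n "$OPENAI_API_KEY" ]    || { echo "✗ OPENAI_API_KEY missing (code audit agents)"; missing=1; }
  [ "$missing" -eq 0 ] && echo "✓ both keys set"
  return $missing
}
```

A non-zero return means at least one key is missing — report that to the user before invoking bughunt.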
Code audit findings are advisory only. Every [CODE ADVISORY] item is a hypothesis produced by an AI agent that read source in isolation. Treat it as a lead to investigate, NEVER as a confirmed bug to fix.
Before acting on any finding, classify it — confirmed / false alarm / needs closer look — with evidence before proposing any fix. Codex (the engine behind these agents) regularly produces plausible-sounding findings that don't hold up under inspection.
If the report says "this is a bug", the correct response is "that's a hypothesis — let me read the code and verify". Presenting findings as confirmed bugs without verification wastes time on fake issues and risks introducing real regressions chasing phantoms.
Browser bugs are different — those get Condorcet verification (+1 confirmed, -3 false positive) and are reproduced by independent agents. Code findings get no such verification. Claude Code IS the verifier.
Competitive adversarial AI bug-finding. Two attack surfaces:
- Browser bugs — Condorcet verification (+1 confirmed, -3 false positive).
- Code audit findings — advisory only; Claude Code decides what's real.
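The asymmetric scoring rule amounts to a one-liner; `agent_score` is an illustrative helper, not the tool's internal API:

```shell
# Scoring sketch: +1 per confirmed bug, -3 per verified false positive.
# agent_score is a hypothetical helper name for illustration.
agent_score() {
  confirmed=$1
  false_pos=$2
  echo $(( confirmed - 3 * false_pos ))
}
```

One false positive wipes out three confirmed bugs, which is the point: agents are punished for guessing.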
BugHunt output can be huge. When reading the report:
- report.json → look at the confirmedBugs array ONLY (not falsePositives)
- [CODE ADVISORY] prefixed items from the summary

After fixing bugs found by BugHunt, run it AGAIN with the same agents and settings. The scoring function (+1/-3) is the immutable eval:
1. Run bughunt → get confirmed bugs
2. Fix confirmed bugs
3. Run bughunt again (same agents, same URL) → new results
4. Confirmed bugs should DECREASE each iteration
5. If confirmed bugs increase, the fix introduced new bugs
This is Karpathy's keep/discard pattern applied to QA: fix → measure → keep or revert.
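Steps 3–5 reduce to a keep-or-revert comparison. A sketch, assuming you extract the confirmed-bug count from each run's report.json (e.g. with `jq '.confirmedBugs | length'` — jq is an assumption, not bundled with bughunt):

```shell
# Keep/discard sketch: compare confirmed-bug counts across two runs.
# compare_runs is a hypothetical helper; counts come from report.json.
compare_runs() {
  before=$1
  after=$2
  if [ "$after" -lt "$before" ]; then
    echo "keep: confirmed bugs $before -> $after"
  elif [ "$after" -gt "$before" ]; then
    echo "revert: fix introduced bugs ($before -> $after)"
  else
    echo "no change: $before confirmed bugs in both runs"
  fi
}
```

Usage: `compare_runs 5 2` after a fix-and-rerun cycle tells you whether to keep the fix.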
When fixing bugs found by BugHunt, apply the complexity tax:
If BugHunt returns 0 bugs, don't conclude "the site is perfect." Instead:
- --max-actions 25 (agents explore deeper)

BugHunt is the ideal candidate for scheduled autonomous runs:
Project Claude → inject agent with initialPrompt:
"Run bughunt against {deployed_url} with all 8 browser agents + code audit.
Store confirmed bugs in session-memory. Write summary to .claude/handoffs/qa-{date}.md.
Delete this file when complete."
Schedule weekly or after every deployment via N8N webhook.
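If N8N isn't available, a plain cron entry achieves the weekly cadence; the URL, schedule, and output path below are placeholders, and both API keys must be present in cron's environment:

```shell
# Hypothetical crontab line: every Sunday 03:00, quick mode,
# report written to a dated directory (cron requires % escaped as \%).
0 3 * * 0 bughunt https://example.com --no-verify -o "$HOME/qa/bughunt-$(date +\%F)"
```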
Source distributed in COR-CODE at tools/bughunt/. Install once:
cd tools/bughunt
npm install
npm run build
npm link # makes `bughunt` available globally on $PATH
After install, the bughunt binary is globally linked. Verify with which bughunt && bughunt --version.
For full install + wiring details (including alternative install without npm link), see tools/bughunt/README.md.
bughunt <url> --no-verify
bughunt <url> --code /path/to/project --no-verify
bughunt <url> --agents "CodeSecurity,CodeLogic,CodeA11y" --code /path/to/project --no-verify
bughunt <url> --code /path/to/project --budget 10.00
| Agent | Specialist Domain |
|---|---|
| Navigator | Links, routing, 404s, redirects, sitemap |
| Responsive | Viewport breakpoints, mobile layout, touch targets |
| Forms | Input validation, error messages, submission |
| Accessibility | WCAG 2.2 AA, screen reader, keyboard nav, ARIA |
| Visual | Layout breaks, z-index, overflow, alignment |
| Performance | Load times, Core Web Vitals, console errors |
| Content | Broken images, alt text, SEO, headings, placeholder text |
| EdgeCase | JS errors, rapid interactions, special characters |
| Agent | Specialist Domain |
|---|---|
| CodeSecurity | Hardcoded secrets, injection, auth bypass, SSRF |
| CodeLogic | Race conditions, null handling, off-by-one, state bugs |
| CodeA11y | Missing alt/ARIA in JSX, contrast from hex values, heading hierarchy |
| CodePerformance | Memory leaks, N+1 queries, missing cleanup, bundle bloat |
| CodeQuality | Dead imports, broken references, type assertion abuse |
Code audit agents run in read-only sandbox — they can never modify files. Their findings are prefixed [CODE ADVISORY] and require Claude Code to evaluate before acting.
| Flag | Purpose | Default |
|---|---|---|
| --agents <names> | Comma-separated agent names | all browser (+ code if --code set) |
| --code <dir> | Project directory for Codex code audit | none |
| --codex-model <model> | Model for Codex agents | codex default |
| --no-verify | Skip Condorcet verification (quick mode) | verify on |
| --budget <usd> | Max API spend in USD | 10.00 |
| --max-actions <n> | Playwright actions per agent | 15 |
| --parallel <n> | Concurrent agents | 3 |
| --target-score <n> | Score to win | 20 |
| -v, --verbose | Show agent reasoning | off |
| -o, --output <dir> | Report output directory | ./bughunt-report |
| --agent-model <model> | Model for browser hunting (cheaper) | auto |
| --verifier-model <model> | Model for verification (stronger) | auto |
Browser agents: BYOK via OPENAI_API_KEY or ANTHROPIC_API_KEY (or --api-key)
Code audit agents: Uses Codex CLI's own auth (already configured via codex login)
Two report files in the output directory:
- report.json — structured data (scores, bugs, evidence, stats, cost)
- report.html — self-contained dark-theme HTML report with scoreboard and bug cards

Evidence screenshots saved to <output>/evidence/<agent-name>/.
Code audit findings marked with [CODE ADVISORY] prefix and file:// URLs.
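For quick triage, the headline counts can be pulled out of report.json with jq (an assumption — jq is not bundled; the confirmedBugs/falsePositives field names come from the report format described above):

```shell
# Triage sketch: count confirmed bugs vs. false positives in a report.
# summarize_report is a hypothetical helper for illustration.
summarize_report() {
  jq -r '"confirmed: \(.confirmedBugs | length)  false positives: \(.falsePositives | length)"' "$1"
}
```

Usage: `summarize_report ./bughunt-report/report.json` before reading any individual bug card.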
User: "bug check example.com"
Claude: bughunt https://example.com --agents "Navigator,Accessibility,Content,Visual" --no-verify -o /tmp/bughunt-report
Claude: Reads report, presents findings, investigates real bugs
User: "bug check example.com and audit the code"
Claude: bughunt https://example.com --code /path/to/website --agents "Accessibility,Content,CodeSecurity,CodeA11y" --no-verify
Claude: Reads report, evaluates code advisories, fixes confirmed issues
User: "audit the code for security issues"
Claude: bughunt https://example.com --code . --agents "CodeSecurity,CodeLogic" --no-verify
Claude: Reviews advisories, decides which are real, fixes what matters
| Configuration | Typical Cost | Duration |
|---|---|---|
| 2 browser agents, no verify | $0.005-0.01 | 25-60s |
| 4 browser agents, no verify | $0.01-0.03 | 1-2 min |
| 8 browser + verify | $2-8 | 10-30 min |
| 1 code audit agent | Codex billing (~1.9M tokens) | 5-10 min |
| 5 code audit agents | Codex billing | 10-20 min |
| Browser + code audit combined | Mixed billing | 10-25 min |
If the tool needs updating:
cd ~/.claude/tools/bughunt
# Make changes to src/
npm run build
# That's it — npm link means the global binary points here