bughunt
// Run competitive AI bug-finding against any website using "BugHunt" — 8 browser agents + 5 Codex code audit agents, Condorcet verification, asymmetric scoring punishes false positives
| name | bughunt |
| description | Run competitive AI bug-finding against any website using "BugHunt" — 8 browser agents + 5 Codex code audit agents, Condorcet verification, asymmetric scoring punishes false positives |
| triggers | ["bug check","bug check this","bughunt","bug competition","find bugs on","find bugs","QA this site","compete against","run competition","bug hunt","code audit","audit code"] |
This skill calls a bughunt binary that ships source-only in COR-CODE. Always run this check first — don't assume the binary exists, and don't invoke it blind.
which bughunt
If it returns a path → tool is installed, proceed to the rest of this skill.
If it returns nothing → tool is NOT installed yet. Walk the user through install BEFORE attempting any bug hunt. Tell them:
"Bughunt isn't installed yet. The source ships in COR-CODE at
tools/bughunt/. Install steps:
- `cd <COR-CODE-root>/tools/bughunt`
- `npm install` (fetches deps incl. Playwright + @openai/codex-sdk + Anthropic SDK)
- `npm run build` (compiles TypeScript → `dist/`)
- `npm link` (symlinks the `bughunt` binary to your global npm bin so it's on `$PATH`)
- Verify: `which bughunt && bughunt --version` (should print 2.1.0 or later)

Want me to run those steps for you, or would you rather run them yourself?"
If the user accepts, run the four npm commands sequentially with output streamed. After the install, re-run which bughunt to confirm and only THEN proceed to the bug hunt.
Do NOT run the install silently without asking — npm install fetches third-party packages from the npm registry (supply-chain surface), and npm link modifies a global PATH-resolved binary. The user should know.
If npm install fails: report the exact error to the user. Common causes: Node.js version too old (need 20+), no internet, npm registry blocked. Don't try clever workarounds — surface the failure.
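The preflight above can be sketched as a small POSIX-sh helper. `ensure_bughunt` and `COR_CODE_ROOT` are illustrative names for this sketch, not part of the tool:

```shell
# Preflight sketch: is the bughunt binary on $PATH?
# ensure_bughunt and COR_CODE_ROOT are hypothetical names for illustration.
ensure_bughunt() {
  if command -v bughunt >/dev/null 2>&1; then
    echo "ok: $(command -v bughunt)"
    return 0
  fi
  # Not installed: point at the COR-CODE source instead of failing silently.
  echo "bughunt not on PATH; install from ${COR_CODE_ROOT:-<COR-CODE-root>}/tools/bughunt"
  echo "  npm install && npm run build && npm link"
  return 1
}
```

A non-zero return signals "stop and ask the user about installing" rather than proceeding to a hunt that can't run.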
Pre-requisites the user also needs in env (warn before invoking if missing):
- ANTHROPIC_API_KEY — for browser agents (Claude-driven)
- OPENAI_API_KEY — for code audit agents (Codex)

Check with:
[ -n "$ANTHROPIC_API_KEY" ] && echo "✓ ANTHROPIC_API_KEY set" || echo "✗ ANTHROPIC_API_KEY missing"
[ -n "$OPENAI_API_KEY" ] && echo "✓ OPENAI_API_KEY set" || echo "✗ OPENAI_API_KEY missing"
If either is missing, tell the user before running. Don't pretend the run will work.
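Both key checks can be folded into one preflight function; `check_keys` is a hypothetical helper name, not part of bughunt:

```shell
# Preflight sketch: fail fast if either API key is absent.
# check_keys is a hypothetical helper, not part of bughunt itself.
check_keys() {
  missing=0
  [ -n "$ANTHROPIC_API_KEY" ] || { echo "✗ ANTHROPIC_API_KEY missing (browser agents)"; missing=1; }
  [ -n "$OPENAI_API_KEY" ]    || { echo "✗ OPENAI_API_KEY missing (code audit agents)"; missing=1; }
  [ "$missing" -eq 0 ] && echo "✓ both keys set"
  return $missing
}
```

A non-zero return means at least one key is missing — report that to the user before invoking bughunt.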
Code audit findings are advisory only. Every [CODE ADVISORY] item is a hypothesis produced by an AI agent that read source in isolation. Treat it as a lead to investigate, NEVER as a confirmed bug to fix.
Before acting on any finding, classify it — confirmed / false alarm / needs closer look — with evidence before proposing any fix. Codex (the engine behind these agents) regularly produces plausible-sounding findings that don't hold up under inspection.
If the report says "this is a bug", the correct response is "that's a hypothesis — let me read the code and verify". Presenting findings as confirmed bugs without verification wastes time on fake issues and risks introducing real regressions chasing phantoms.
Browser bugs are different — those get Condorcet verification (+1 confirmed, -3 false positive) and are reproduced by independent agents. Code findings get no such verification. Claude Code IS the verifier.
Competitive adversarial AI bug-finding. Two attack surfaces:
- Browser bugs — Condorcet verification (+1 confirmed, -3 false positive).
- Code audit findings — advisory only; Claude Code decides what's real.
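The asymmetric scoring rule amounts to a one-liner; `agent_score` is an illustrative helper, not the tool's internal API:

```shell
# Scoring sketch: +1 per confirmed bug, -3 per verified false positive.
# agent_score is a hypothetical helper name for illustration.
agent_score() {
  confirmed=$1
  false_pos=$2
  echo $(( confirmed - 3 * false_pos ))
}
```

One false positive wipes out three confirmed bugs, which is the point: agents are punished for guessing.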
BugHunt output can be huge. When reading the report:
- report.json → look at the confirmedBugs array ONLY (not falsePositives)
- [CODE ADVISORY] prefixed items from the summary

After fixing bugs found by BugHunt, run it AGAIN with the same agents and settings. The scoring function (+1/-3) is the immutable eval:
1. Run bughunt → get confirmed bugs
2. Fix confirmed bugs
3. Run bughunt again (same agents, same URL) → new results
4. Confirmed bugs should DECREASE each iteration
5. If confirmed bugs increase, the fix introduced new bugs
This is Karpathy's keep/discard pattern applied to QA: fix → measure → keep or revert.
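Steps 3–5 reduce to a keep-or-revert comparison. A sketch, assuming you extract the confirmed-bug count from each run's report.json (e.g. with `jq '.confirmedBugs | length'` — jq is an assumption, not bundled with bughunt):

```shell
# Keep/discard sketch: compare confirmed-bug counts across two runs.
# compare_runs is a hypothetical helper; counts come from report.json.
compare_runs() {
  before=$1
  after=$2
  if [ "$after" -lt "$before" ]; then
    echo "keep: confirmed bugs $before -> $after"
  elif [ "$after" -gt "$before" ]; then
    echo "revert: fix introduced bugs ($before -> $after)"
  else
    echo "no change: $before confirmed bugs in both runs"
  fi
}
```

Usage: `compare_runs 5 2` after a fix-and-rerun cycle tells you whether to keep the fix.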
When fixing bugs found by BugHunt, apply the complexity tax:
If BugHunt returns 0 bugs, don't conclude "the site is perfect." Instead:
- --max-actions 25 (agents explore deeper)

BugHunt is the ideal candidate for scheduled autonomous runs:
Project Claude → inject agent with initialPrompt:
"Run bughunt against {deployed_url} with all 8 browser agents + code audit.
Store confirmed bugs in session-memory. Write summary to .claude/handoffs/qa-{date}.md.
Delete this file when complete."
Schedule weekly or after every deployment via N8N webhook.
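If N8N isn't available, a plain cron entry achieves the weekly cadence; the URL, schedule, and output path below are placeholders, and both API keys must be present in cron's environment:

```shell
# Hypothetical crontab line: every Sunday 03:00, quick mode,
# report written to a dated directory (cron requires % escaped as \%).
0 3 * * 0 bughunt https://example.com --no-verify -o "$HOME/qa/bughunt-$(date +\%F)"
```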
Source distributed in COR-CODE at tools/bughunt/. Install once:
cd tools/bughunt
npm install
npm run build
npm link # makes `bughunt` available globally on $PATH
After install, the bughunt binary is globally linked. Verify with which bughunt && bughunt --version.
For full install + wiring details (including alternative install without npm link), see tools/bughunt/README.md.
bughunt <url> --no-verify
bughunt <url> --code /path/to/project --no-verify
bughunt <url> --agents "CodeSecurity,CodeLogic,CodeA11y" --code /path/to/project --no-verify
bughunt <url> --code /path/to/project --budget 10.00
| Agent | Specialist Domain |
|---|---|
| Navigator | Links, routing, 404s, redirects, sitemap |
| Responsive | Viewport breakpoints, mobile layout, touch targets |
| Forms | Input validation, error messages, submission |
| Accessibility | WCAG 2.2 AA, screen reader, keyboard nav, ARIA |
| Visual | Layout breaks, z-index, overflow, alignment |
| Performance | Load times, Core Web Vitals, console errors |
| Content | Broken images, alt text, SEO, headings, placeholder text |
| EdgeCase | JS errors, rapid interactions, special characters |
| Agent | Specialist Domain |
|---|---|
| CodeSecurity | Hardcoded secrets, injection, auth bypass, SSRF |
| CodeLogic | Race conditions, null handling, off-by-one, state bugs |
| CodeA11y | Missing alt/ARIA in JSX, contrast from hex values, heading hierarchy |
| CodePerformance | Memory leaks, N+1 queries, missing cleanup, bundle bloat |
| CodeQuality | Dead imports, broken references, type assertion abuse |
Code audit agents run in read-only sandbox — they can never modify files. Their findings are prefixed [CODE ADVISORY] and require Claude Code to evaluate before acting.
| Flag | Purpose | Default |
|---|---|---|
| --agents <names> | Comma-separated agent names | all browser (+ code if --code set) |
| --code <dir> | Project directory for Codex code audit | none |
| --codex-model <model> | Model for Codex agents | codex default |
| --no-verify | Skip Condorcet verification (quick mode) | verify on |
| --budget <usd> | Max API spend in USD | 10.00 |
| --max-actions <n> | Playwright actions per agent | 15 |
| --parallel <n> | Concurrent agents | 3 |
| --target-score <n> | Score to win | 20 |
| -v, --verbose | Show agent reasoning | off |
| -o, --output <dir> | Report output directory | ./bughunt-report |
| --agent-model <model> | Model for browser hunting (cheaper) | auto |
| --verifier-model <model> | Model for verification (stronger) | auto |
Browser agents: BYOK via OPENAI_API_KEY or ANTHROPIC_API_KEY (or --api-key)
Code audit agents: Uses Codex CLI's own auth (already configured via codex login)
Two report files in the output directory:
- report.json — structured data (scores, bugs, evidence, stats, cost)
- report.html — self-contained dark-theme HTML report with scoreboard and bug cards

Evidence screenshots saved to <output>/evidence/<agent-name>/.
Code audit findings marked with [CODE ADVISORY] prefix and file:// URLs.
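For quick triage, the headline counts can be pulled out of report.json with jq (an assumption — jq is not bundled; the confirmedBugs/falsePositives field names come from the report format described above):

```shell
# Triage sketch: count confirmed bugs vs. false positives in a report.
# summarize_report is a hypothetical helper for illustration.
summarize_report() {
  jq -r '"confirmed: \(.confirmedBugs | length)  false positives: \(.falsePositives | length)"' "$1"
}
```

Usage: `summarize_report ./bughunt-report/report.json` before reading any individual bug card.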
User: "bug check example.com"
Claude: bughunt https://example.com --agents "Navigator,Accessibility,Content,Visual" --no-verify -o /tmp/bughunt-report
Claude: Reads report, presents findings, investigates real bugs
User: "bug check example.com and audit the code"
Claude: bughunt https://example.com --code /path/to/website --agents "Accessibility,Content,CodeSecurity,CodeA11y" --no-verify
Claude: Reads report, evaluates code advisories, fixes confirmed issues
User: "audit the code for security issues"
Claude: bughunt https://example.com --code . --agents "CodeSecurity,CodeLogic" --no-verify
Claude: Reviews advisories, decides which are real, fixes what matters
| Configuration | Typical Cost | Duration |
|---|---|---|
| 2 browser agents, no verify | $0.005-0.01 | 25-60s |
| 4 browser agents, no verify | $0.01-0.03 | 1-2 min |
| 8 browser + verify | $2-8 | 10-30 min |
| 1 code audit agent | Codex billing (~1.9M tokens) | 5-10 min |
| 5 code audit agents | Codex billing | 10-20 min |
| Browser + code audit combined | Mixed billing | 10-25 min |
If the tool needs updating:
cd ~/.claude/tools/bughunt
# Make changes to src/
npm run build
# That's it — npm link means the global binary points here