performing-ai-assisted-vulnerability-discovery

Using LLMs to accelerate vulnerability research and pentest workflows: generating syntax-valid fuzzing seeds and evolving grammars, fine-tuned mutation dictionaries, parallel agent-based proof-of-vulnerability generation, and evidence-driven passive analysis of real HTTP traffic via the Burp MCP server. Covers concrete prompts, AFL++/libFuzzer wiring, and Burp+Codex/Gemini/Ollama MCP setup.

Stars: 599

Forks: 104

Updated: June 6, 2026 at 16:41

Source

xalgord

xalgord/xalgorix

name	performing-ai-assisted-vulnerability-discovery
description	Using LLMs to accelerate vulnerability research and pentest workflows: generating syntax-valid fuzzing seeds and evolving grammars, fine-tuned mutation dictionaries, parallel agent-based proof-of-vulnerability generation, and evidence-driven passive analysis of real HTTP traffic via the Burp MCP server. Covers concrete prompts, AFL++/libFuzzer wiring, and Burp+Codex/Gemini/Ollama MCP setup.
domain	cybersecurity
subdomain	ai-security
tags	["penetration-testing","ai-security","fuzzing","vulnerability-research","burp-mcp"]
version	1.0
author	xalgorix
license	Apache-2.0

Performing AI-Assisted Vulnerability Discovery

When to Use

During authorized vulnerability research where complex input formats (SQL, URLs, custom/binary protocols) stall a blind fuzzer
When bootstrapping a coverage-guided fuzzer (AFL++, libFuzzer, Honggfuzz) that needs syntax-valid, security-relevant seeds
When you have crash candidates and need to scale proof-of-vulnerability (PoV) generation across many agents/models
When triaging large volumes of real Burp HTTP traffic and want evidence-driven passive analysis + report drafting
When working under cost/time budgets (bug-bounty, CTF, AIxCC-style cyber reasoning systems)

Critical: Techniques Most Often Missed

Teams either ignore LLMs entirely or paste code and hope. The high-value patterns are about feeding the model coverage feedback and keeping the human/Burp as the source of truth.

1. LLM seed generation for semantic validity (deeper coverage early)

SYSTEM: You are a helpful security engineer.
USER: Write a Python3 program that prints 200 unique SQL injection strings targeting common
anti-pattern mistakes (missing quotes, numeric context, stacked queries). Ensure length <= 256
bytes/string so they survive common length limits.

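mkdir -p seeds
python3 gen_sqli_seeds.py > seeds.txt
split -l 1 seeds.txt seeds/seed_               # afl-fuzz -i takes a directory: one test case per file
afl-fuzz -i seeds/ -o findings/ -- ./target @@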

Ask for a single self-contained script and tell it to diversify encoding (UTF-8, URL-encoded, UTF-16-LE).
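The encoding and length advice above can be applied after the model's script has run. The following is a minimal sketch, not the author's tooling: `build_corpus` and `write_corpus` are hypothetical helper names, the 256-byte limit comes from the prompt, and the demo seeds are placeholders.

```python
import hashlib
import tempfile
import urllib.parse
from pathlib import Path

MAX_LEN = 256  # byte limit taken from the prompt above


def encode_variants(seed: str) -> list[bytes]:
    """Return the seed as UTF-8, URL-encoded, and UTF-16-LE byte strings."""
    return [
        seed.encode("utf-8"),
        urllib.parse.quote(seed, safe="").encode("ascii"),
        seed.encode("utf-16-le"),
    ]


def build_corpus(seeds: list[str], max_len: int = MAX_LEN) -> list[bytes]:
    """Expand seeds into encoding variants, dropping oversized and duplicate entries."""
    corpus, seen = [], set()
    for seed in seeds:
        for blob in encode_variants(seed):
            if not blob or len(blob) > max_len or blob in seen:
                continue
            seen.add(blob)
            corpus.append(blob)
    return corpus


def write_corpus(corpus: list[bytes], out_dir: str) -> int:
    """Write one file per test case; names are content hashes so reruns are stable."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for blob in corpus:
        (out / hashlib.sha1(blob).hexdigest()[:16]).write_bytes(blob)
    return len(corpus)


if __name__ == "__main__":
    demo = ["' OR 1=1--", "1; DROP TABLE users", "' OR 1=1--"]  # placeholder seeds
    with tempfile.TemporaryDirectory() as tmp:
        print(write_corpus(build_corpus(demo), tmp))
```

Writing one file per test case matches what a coverage-guided fuzzer expects from its input directory, and deduplicating on the encoded bytes avoids spending fuzzer time on identical inputs.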

2. Coverage-feedback grammar evolution ("Grammar Guy")

The previous grammar triggered 12% of the program edges. Functions not reached: parse_auth,
handle_upload. Add or modify rules to cover these.

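prev_edges = 0
for epoch in range(MAX_EPOCHS):
    grammar = llm.refine(grammar, feedback=coverage_stats)   # request a diff/patch, not a full rewrite
    save(grammar, f"grammar_{epoch}.txt")
    coverage_stats = run_fuzzer(grammar)
    if coverage_stats["edges"] - prev_edges < EPSILON:       # stop when Δcoverage < ε ("edges" is an assumed field)
        break
    prev_edges = coverage_stats["edges"]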
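The loop can be made concrete with the model call and the fuzzer run passed in as functions. This is a sketch under stated assumptions: `refine` and `run_fuzzer` are hypothetical callables standing in for the LLM request and the fuzz-and-measure step, and the `Coverage` record is an illustrative shape, not a format emitted by any particular fuzzer.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Coverage:
    edges_hit: int
    missed_functions: list[str]


def evolve_grammar(
    grammar: str,
    refine: Callable[[str, Coverage], str],   # hypothetical LLM call returning a patched grammar
    run_fuzzer: Callable[[str], Coverage],    # hypothetical fuzz-and-measure step
    max_epochs: int = 10,
    epsilon: int = 1,
) -> tuple[str, list[int]]:
    """Keep refining while each epoch gains at least `epsilon` new edges."""
    coverage = run_fuzzer(grammar)
    history = [coverage.edges_hit]
    for _ in range(max_epochs):
        candidate = refine(grammar, coverage)
        new_coverage = run_fuzzer(candidate)
        if new_coverage.edges_hit - coverage.edges_hit < epsilon:
            break  # coverage delta below epsilon: stop spending tokens
        grammar, coverage = candidate, new_coverage
        history.append(coverage.edges_hit)
    return grammar, history
```

A candidate grammar is kept only when measured coverage improves, so a regression suggested by the model is discarded rather than carried into the next epoch.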

3. Fine-tuned mutation dictionary for memory-safety bugs

# Entries suggested by a model fine-tuned on vuln patterns; feed them to an AFL++ custom mutator (AFL_CUSTOM_MUTATOR_LIBRARY) or convert them to a -x dictionary
{"pattern":"%99999999s"}
{"pattern":"AAAAAAAA....<1024>....%n"}

Prompt: "Give mutation dictionary entries likely to break memory safety in function X." This reportedly cuts time-to-crash by more than 2×; confirm the gain on your own target before relying on it.
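If the model returns entries in the JSON shape shown above, they need converting before AFL++ can load them with `-x`, since its dictionary files use `name="value"` lines with `\xNN` escapes. The sketch below assumes one JSON object per line with a `pattern` key; the `llm_` name prefix is an arbitrary choice.

```python
import json


def afl_escape(token: bytes) -> str:
    """Escape a token for an AFL++ dictionary: printable ASCII stays, the rest becomes \\xNN."""
    out = []
    for b in token:
        ch = chr(b)
        if ch in ('"', "\\") or not 32 <= b < 127:
            out.append(f"\\x{b:02x}")
        else:
            out.append(ch)
    return "".join(out)


def to_afl_dict(json_lines: list[str], prefix: str = "llm") -> str:
    """Turn lines like {"pattern": "..."} into name="value" dictionary entries."""
    entries = []
    for i, line in enumerate(json_lines):
        pattern = json.loads(line)["pattern"]
        entries.append(f'{prefix}_{i}="{afl_escape(pattern.encode("utf-8"))}"')
    return "\n".join(entries) + "\n"


if __name__ == "__main__":
    print(to_afl_dict(['{"pattern":"%99999999s"}']), end="")
```

The resulting text can be saved to a file and passed as `afl-fuzz -x tokens.dict ...`, where `tokens.dict` is a placeholder file name.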

4. Parallel agent-based PoV generation

Spawn many lightweight agents (different models/temperatures); each reproduces the crash with gdb, proposes a minimal payload, validates it in a sandbox, and re-queues failures as new fuzz seeds.
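A minimal orchestration sketch of this fan-out follows. It is an assumption-laden outline, not a reference implementation: `propose` stands in for whatever an agent does to produce a payload (model call plus gdb session), `validate` stands in for the sandboxed reproduction check, and `AgentConfig` is a hypothetical record.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class AgentConfig:
    model: str
    temperature: float


def run_agents(
    crash_input: bytes,
    configs: list[AgentConfig],
    propose: Callable[[AgentConfig, bytes], bytes],  # hypothetical: agent suggests a minimal payload
    validate: Callable[[bytes], bool],               # hypothetical: sandboxed reproduction check
    max_workers: int = 4,
) -> tuple[Optional[bytes], list[bytes]]:
    """Return the first validated payload (in config order) and the failures to re-queue as seeds."""
    def attempt(cfg: AgentConfig) -> tuple[bytes, bool]:
        payload = propose(cfg, crash_input)
        return payload, validate(payload)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(attempt, configs))  # map preserves input order

    confirmed = next((p for p, ok in results if ok), None)
    requeue = [p for p, ok in results if not ok]
    return confirmed, requeue
```

Threads are enough here because each agent spends its time waiting on a model or a subprocess; the failed payloads are returned rather than discarded so they can be dropped into the fuzzer's input directory.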

How to CONFIRM a hit (avoid false positives / hallucinations)

Deterministic PoV: the model's claimed bug must reproduce — feed the exact input to the target under gdb/ASan and confirm the same crash PC / sanitizer message. No reproduction = not a finding.
Coverage delta: a new grammar/seed set is "working" only if edges/blocks hit actually increase; measure, don't trust the prompt.
Evidence-bound (Burp MCP): every reported web finding must cite the real request/response in Burp — the model is for analysis/reporting, not blind scanning. Re-check the raw traffic.
Treat all LLM output as untrusted hypotheses; validate before submitting (wrong patches/PoVs cost points/credibility).
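The deterministic-PoV check can be automated by comparing crash signatures across repeated runs. The sketch below assumes the target was built with AddressSanitizer, takes the input file path as its only argument, and writes the usual `ERROR: AddressSanitizer: <class>` report with `#0 ... in <function>` frames to stderr; the three-run default and the 30-second timeout are arbitrary.

```python
import re
import subprocess
from typing import Optional

ASAN_ERROR = re.compile(r"ERROR: AddressSanitizer: ([\w-]+)")
TOP_FRAME = re.compile(r"#0 0x[0-9a-fA-F]+ in (\S+)")


def crash_signature(stderr_text: str) -> Optional[tuple[str, str]]:
    """Extract (bug class, top-frame function) from an AddressSanitizer report, or None."""
    err = ASAN_ERROR.search(stderr_text)
    frame = TOP_FRAME.search(stderr_text)
    if not err or not frame:
        return None
    return err.group(1), frame.group(1)


def reproduces(target: str, pov_path: str, expected: tuple[str, str], runs: int = 3) -> bool:
    """True only if every run of the target crashes with the expected signature."""
    for _ in range(runs):
        proc = subprocess.run(
            [target, pov_path],
            capture_output=True,
            text=True,
            errors="replace",  # sanitizer output may sit next to non-UTF-8 bytes
            timeout=30,
        )
        if crash_signature(proc.stderr) != expected:
            return False
    return True
```

Comparing the bug class and top frame rather than the raw program counter keeps the check stable when address space layout randomization shifts addresses between runs.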

Workflow

Step 1: Generate and load seeds

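mkdir -p seeds
python3 gen_sqli_seeds.py > seeds.txt          # or XSS/path-traversal/binary-blob variants
split -l 1 seeds.txt seeds/seed_               # afl-fuzz -i takes a directory: one test case per file
afl-fuzz -i seeds/ -o findings/ -- ./target @@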

Step 2: Evolve a grammar against coverage

1. Prompt the model for an initial ANTLR/Peach/libFuzzer grammar.
2. Fuzz N minutes; collect edges/blocks hit.
3. Summarize uncovered functions, feed back, ask for diff/patch rules.
4. Merge, re-fuzz, repeat until Δcoverage < ε (mind the token budget).

Step 3: Add a fine-tuned custom mutator

Run static analysis -> function list + AST.
Prompt fine-tuned model for mutation-dictionary tokens per risky function (sprintf wrappers, etc.).
Wire the tokens into an AFL++ custom mutator (loaded via AFL_CUSTOM_MUTATOR_LIBRARY) or a -x dictionary.

Step 4: Scale PoV generation and triage

Static/dynamic analysis -> bug candidates (crash PC, input slice, sanitizer msg).
Orchestrator -> N agents: reproduce (gdb), propose payload, validate in sandbox, submit on success.
Failed attempts re-queue as coverage seeds (feedback loop).

Step 5: Multi-bug super-patch (optional, scoring-aware)

Here are 10 stack traces + file snippets. Identify the shared mistake and generate a unified diff
fixing all occurrences.

Interleave confirmed (PoV-validated) and speculative patches at a tuned ratio (e.g. 2 speculative : 1 confirmed).
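One way to schedule submissions at such a ratio is a small generator. This is an illustrative sketch only: the patch identifiers are placeholders, and the default of two speculative patches per confirmed one simply mirrors the example ratio above.

```python
from itertools import islice
from typing import Iterator


def interleave(
    confirmed: list[str],
    speculative: list[str],
    spec_per_confirmed: int = 2,
) -> Iterator[str]:
    """Yield each confirmed patch followed by up to `spec_per_confirmed` speculative ones."""
    spec_iter = iter(speculative)
    for patch in confirmed:
        yield patch
        yield from islice(spec_iter, spec_per_confirmed)
    yield from spec_iter  # leftovers once confirmed patches run out


if __name__ == "__main__":
    print(list(interleave(["c1", "c2"], ["s1", "s2", "s3", "s4", "s5"])))
```

Leading each group with a confirmed patch means a PoV-validated fix is never queued behind speculative ones.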

Step 6: Evidence-driven web analysis with Burp MCP

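# Install the Burp MCP Server BApp (listens on 127.0.0.1:9876), extract the proxy JAR, and point a client at the SSE endpoint.
# The BApp's own port is 9876; the 19876 used here assumes the local Caddy front end described after this block is listening on that port.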
cat > ~/.codex/config.toml <<'EOF'
[mcp_servers.burp]
command = "java"
args = ["-jar", "/absolute/path/to/mcp-proxy.jar", "--sse-url", "http://127.0.0.1:19876"]
EOF
codex      # then run /mcp to verify the Burp tools list

If the MCP handshake fails on strict Origin/header checks, front it with a local Caddy reverse proxy that pins Host/Origin to 127.0.0.1:9876 and strips User-Agent/Accept/Accept-Encoding/Connection (which trigger Burp's 403 during SSE init):

brew install caddy
caddy run --config ~/burp-mcp/Caddyfile &

Step 7: Run evidence-focused analysis prompts (burp-mcp-agents)

Prompt	Focus
passive_hunter.md	broad passive surfacing
idor_hunter.md	IDOR/BOLA/tenant drift
auth_flow_mapper.md	auth vs unauth path diff
ssrf_redirect_hunter.md	SSRF/open-redirect
logic_flaw_hunter.md	multi-step logic flaws
report_writer.md	evidence-focused reporting

Prefer local models (Ollama: deepseek-r1:14b ~16GB, gpt-oss:20b ~20GB) when traffic holds secrets; share only the minimum evidence per finding. Tag your traffic so it is auditable:

Match:   ^User-Agent: (.*)$
Replace: User-Agent: $1 BugBounty-Username
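Sharing only the minimum evidence usually means masking credentials before a request leaves the machine. The following is a rough sketch of that redaction step, not part of the Burp MCP tooling: the header list is an assumption and should be extended for the application under test.

```python
import re

# Assumed list of credential-bearing headers; extend per engagement.
SENSITIVE_HEADERS = ("authorization", "cookie", "set-cookie", "x-api-key")

_HEADER_RE = re.compile(
    r"^(?P<name>" + "|".join(re.escape(h) for h in SENSITIVE_HEADERS) + r"):[ \t]*(?P<value>[^\r\n]*)",
    re.IGNORECASE | re.MULTILINE,
)


def redact_headers(raw_http: str) -> str:
    """Replace values of sensitive headers with a fixed marker, keeping names and line endings."""
    return _HEADER_RE.sub(lambda m: f"{m.group('name')}: [REDACTED]", raw_http)


if __name__ == "__main__":
    request = (
        "GET /api/orders/42 HTTP/1.1\r\n"
        "Host: example.test\r\n"
        "Cookie: session=abc123\r\n"
        "\r\n"
    )
    print(redact_headers(request))
```

Header names are kept so the model can still reason about which authentication mechanism was in play, while the secret values never reach a cloud backend.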

Key Concepts

Concept	Description
LLM seed generation	Model emits syntax-valid, security-relevant inputs so the fuzzer reaches deep branches early
Grammar evolution	Iteratively refine an input grammar using coverage feedback (Grammar Guy pattern)
Custom mutator dict	Fine-tuned model supplies tokens (`%n`, oversized `%s`) that break memory safety faster
Agent-based PoV	Many parallel LLM agents reproduce/validate crashes; failures recycle as new seeds
Super-patch	One unified diff that fixes a root cause shared across multiple crashes
Evidence-driven review	Burp stays source of truth; the LLM reasons over real requests/responses, no blind scanning
Privacy mode	Local backends / redaction prevent leaking cookies/PII to cloud models

Tools & Systems

Tool	Purpose
AFL++ / libFuzzer / Honggfuzz	Coverage-guided fuzzers consuming LLM seeds, grammars, and custom mutators
LLM (GPT/Claude/Mixtral/Llama)	Seed/grammar generation, mutation dicts, PoV reasoning, patch synthesis
Burp MCP Server (BApp)	Exposes intercepted HTTP(S) traffic to MCP clients on `127.0.0.1:9876`
mcp-proxy.jar + Caddy	Bridge stdio↔SSE and normalize headers for the strict MCP handshake
Codex / Gemini CLI / Ollama	MCP clients/backends (cloud or local) for traffic analysis
burp-mcp-agents	Prompt pack (passive/idor/ssrf/logic/report hunters) + launcher helpers
Burp AI Agent	Couples local/cloud LLMs with passive/active analysis and 53+ MCP tools

Common Scenarios

Scenario 1: Stalled parser fuzzing

A binary protocol parser shows flat coverage. LLM-generated syntax-valid seeds plus a coverage-evolved grammar push edges from 12% upward and surface a crash in handle_upload.

Scenario 2: Crash-to-PoV at scale

Dozens of ASan crashes need PoVs under a deadline. Parallel agents reproduce each in gdb, generate minimal payloads, and validate in a sandbox, recycling misses as seeds.

Scenario 3: Passive bug bounty triage

Hundreds of Burp requests are analyzed via the idor_hunter.md and ssrf_redirect_hunter.md prompts through a local Ollama model, flagging object-ID drift backed by real request/response evidence.

Scenario 4: Sensitive-data engagement

Traffic contains session cookies/PII, so a local deepseek-r1:14b backend with STRICT privacy mode is used, sharing only minimal evidence and keeping an integrity-hashed audit log.

Output Format

## AI-Assisted Discovery Finding

**Technique**: LLM-assisted fuzzing / evidence-driven Burp MCP analysis
**Severity**: Per confirmed vulnerability (set after PoV reproduction)
**Target**: <binary/function or HTTP endpoint>

### Method
- Seeds/grammar: <prompt + coverage delta achieved, e.g. 12% -> 41% edges>
- PoV: <agent/model that reproduced; gdb crash PC + sanitizer message>
- Burp evidence: <request/response IDs cited from Burp history>

### Validation
| Check | Result |
|-------|--------|
| Deterministic reproduction | yes (ASan heap-buffer-overflow @ parse_auth) |
| Coverage increase measured | +29 percentage points of edges (12% -> 41%) |
| Evidence cited from Burp | req #482 / resp #482 |

### Recommendation
1. Fix the confirmed root cause; consider the super-patch diff if multiple crashes share it
2. Add the generated seeds/grammar to regression fuzzing CI
3. Keep cloud LLM usage in privacy/redaction mode; prefer local models for sensitive traffic; require PoV reproduction before reporting