| name | security-audit |
| description | Use when performing security vulnerability assessment (OWASP, secrets, dependencies, IaC, LLM, API, MCP/agentic) or when "thinking like a hacker" to find exploits. |
| tier | 2 |
| version | 3.6 |
Security Audit v3.6
0. Methodology — Two Layers (audit-067 C-10)
This skill runs two complementary layers; neither substitutes for the other:
- Deterministic floor (§2) — regex patterns + external scanners. Reproducible, cheap, CI-gateable (--fail-on), zero judgment. It is a floor, not the audit: a clean scan is not clearance, because regex is categorically blind to semantic classes — business-logic authz, cross-file taint flows, semantic MCP tool-description poisoning (§3 Agentic limitation note).
- LLM semantic pass (§3–§4) — long-context taint/logic review against the checklists plus threat modeling: the layer that catches what regex cannot. Frontier evidence that LLM-driven semantic review finds real vulnerabilities beyond pattern matching: DARPA AIxCC finals (2025), Google Big Sleep, Codex Security / Claude Code Security review agents (full citations: audit-067 §Bibliography).
Licensing footnote (semgrep): the open-source offering is Semgrep CE since the Dec 2024 licensing change; Opengrep (community fork, Jan 2025) is a drop-in alternative where CE constraints matter. The external-tool roster treats either as the same scanner slot.
1. Red Flags (Anti-Rationalization)
STOP and READ THIS if you are thinking:
- "I'll skip the script because I just checked the code manually" -> WRONG. Humans miss regex patterns. EXECUTE the script.
- "This is an internal tool, so AuthZ doesn't matter" -> WRONG. Zero Trust applies everywhere.
- "Dependencies are probably fine" -> WRONG. Supply chain compromise is a top attack vector (OWASP A03:2025). Audit the lock files.
- "I don't have time for a full audit" -> WRONG. Breach cleanup takes 100x longer.
- "The LLM output is safe to use directly" -> WRONG. LLM output is untrusted input. Sanitize it.
2. Automated Detection
EXECUTE the unified audit script to detect vulnerabilities:
python3 .agent/skills/security-audit/scripts/run_audit.py [project_path] \
[--scan-type all|deps|secrets|patterns|config|iac|mcp|sbom|external] \
[--fail-on critical|high|medium] [--output json|summary] \
[--no-limit] [--max-size MB]
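A minimal sketch of driving the command above from Python, for example inside a CI job. The script path and the flag values come from the usage line; the helper names `build_audit_command` and `run_audit` are hypothetical, and only the three flags shown are wired up.

```python
import subprocess
import sys
from typing import List, Optional

# Path and flag values are taken from the usage line above.
SCRIPT = ".agent/skills/security-audit/scripts/run_audit.py"
SCAN_TYPES = {"all", "deps", "secrets", "patterns", "config", "iac", "mcp", "sbom", "external"}
FAIL_LEVELS = {"critical", "high", "medium"}
OUTPUTS = {"json", "summary"}


def build_audit_command(project_path: str, scan_type: str = "all",
                        fail_on: Optional[str] = None,
                        output: str = "summary") -> List[str]:
    """Hypothetical helper: assemble the argv list for run_audit.py."""
    if scan_type not in SCAN_TYPES:
        raise ValueError(f"unknown scan type: {scan_type}")
    if output not in OUTPUTS:
        raise ValueError(f"unknown output format: {output}")
    cmd = [sys.executable, SCRIPT, project_path,
           "--scan-type", scan_type, "--output", output]
    if fail_on is not None:
        if fail_on not in FAIL_LEVELS:
            raise ValueError(f"unknown fail-on level: {fail_on}")
        cmd += ["--fail-on", fail_on]
    return cmd


def run_audit(project_path: str, **options: str) -> int:
    """Run the audit and return its exit code (non-zero blocks the pipeline)."""
    # An argv list (no shell=True) keeps project_path from being shell-interpreted.
    return subprocess.run(build_audit_command(project_path, **options), check=False).returncode
```

Calling `run_audit(".", fail_on="critical", output="json")` would then hand the exit code back to the CI step.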
- Analysis: Review the output. If tools fail or report Critical/High issues, they are BLOCKERS.
- Scope: The script checks:
- Secrets (OWASP A04:2025 Cryptographic Failures, CWE-798) — 30+ patterns including cloud, AI, SaaS keys + entropy detection
- Dependencies / Supply Chain (OWASP A03:2025, CWE-1104) — real lock files only (Pipfile.lock/poetry.lock/uv.lock/pdm.lock for Python; package-lock/yarn.lock/pnpm-lock for JS; Cargo.lock; go.sum), npm audit
- Code Patterns / Injection (OWASP A05:2025, CWE-79/89/78) — eval, XSS, SQLi, SSTI, SSRF, path traversal, prototype pollution, deserialization
- Smart Contract / Solidity — reentrancy, delegatecall, selfdestruct (EIP-6780), tx.origin, oracle manipulation, unchecked returns, unprotected initializers
- Rust — unsafe{}, transmute, mem::forget, unwrap_unchecked, weak RNG
- Go — math/rand for security, SQL concat, missing TLS, command injection
- GraphQL — introspection/playground enabled in prod, missing depth limits
- Config / Misconfiguration (OWASP A02:2025, CWE-16) — debug mode, CORS, headers
- IaC / Containers — Docker, Kubernetes, Terraform, CloudFormation patterns
- MCP / Agentic (OWASP ASI Top 10 2026) — MCP config provenance (mcp.json, .mcp.json, claude_desktop_config.json, incl. .vscode/), auto-approve keys, permission-bypass flags, unpinned npx -y/uvx servers, mcp-remote, cleartext MCP URLs, inline env secrets, shell-spawning servers, tool-description poisoning heuristics — all findings CWE+ASI tagged
- SBOM — recursive Software Bill of Materials presence check (honors SKIP_DIRS)
- External Tools (when --scan-type all or --scan-type external): Auto-runs semgrep --config auto, gitleaks (or trufflehog fallback), slither, bandit, pip-audit, npm audit, cargo audit, govulncheck, gosec, checkov, trivy if detected; snyk-agent-scan (ex-Invariant mcp-scan) when MCP config artifacts are detected — never with auto-start flags (servers stay consent-gated).
- --scan-type external runs ONLY external tools and SKIPS the in-process regex scans. Use --scan-type all (default) to run both.
- CI/CD Gate: Use --fail-on critical to exit with code 1 in CI pipelines.
- --max-size MB: default 15 MB per file. Increase for large minified bundles (vendor.js/bundle.js can be 20+ MB).
- ReDoS guard: lines longer than 4000 chars are skipped during pattern scanning (prevents catastrophic backtracking on minified code).
- Self-Exclusion: The scanner skips its own source files to prevent false positives.
- CWE Mapping: All findings include CWE identifiers for compliance integration.
- Known Limitation: The scanner uses regex-only (no AST parsing). It WILL match patterns inside comments, docstrings, and string literals. This is a deliberate trade-off: false positives on comments are preferable to false negatives on real vulnerabilities. Always manually verify findings before acting.
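The ReDoS guard and the regex-only limitation described above can be illustrated with a small sketch. The 4000-character threshold comes from this section; the `eval` pattern and the `scan_lines` function are illustrative assumptions, not the scanner's actual rules or code.

```python
import re
from typing import Iterator, Tuple

MAX_LINE_LENGTH = 4000  # threshold stated in the ReDoS guard bullet
# Illustrative pattern only; the real scanner's rules are not shown in this document.
EVAL_PATTERN = re.compile(r"\beval\s*\(")


def scan_lines(text: str) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for every line matching the pattern."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        if len(line) > MAX_LINE_LENGTH:
            continue  # skipped: overlong (likely minified) line
        if EVAL_PATTERN.search(line):
            yield lineno, line.strip()


sample = "\n".join([
    "# never call eval(user_input) here",   # comment: still matches (false positive)
    "result = eval(user_input)",            # real finding
    "x = 1; " + "a" * 5000 + " eval(y)",    # overlong line: skipped (possible false negative)
])
findings = list(scan_lines(sample))
```

Here the comment on line 1 is reported alongside the real call on line 2, while the overlong third line is never inspected. That is why findings need manual verification and why skipped content should not be read as clean.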
3. "Think Like a Hacker" (Adversarial Review)
Refuse to merge/approve until you have manually verified the code against the relevant checklist.
Smart Contracts (Solidity)
MANDATORY: Read references/checklists/solidity_security.md.
Top Checks:
- Reentrancy: Is the checks-effects-interactions pattern followed? Is nonReentrant used?
- Access Control: Who owns the contract? Are onlyOwner checks in place? Is ownership transfer two-step?
- Price Manipulation: Are spot prices used? (Use Oracles + TWAP).
- EIP-6780: selfdestruct semantics changed post-Dencun — accounted for?
- ERC-4337: Account Abstraction validation and paymaster checks.
- Fuzzing: See references/checklists/fuzzing_invariants.md.
Smart Contracts (Solana/Rust)
MANDATORY: Read references/checklists/solana_security.md.
Top Checks:
- Account Validation: Are ALL accounts checked for ownership and signer status?
- PDA bumps: Are bumps strictly validated (canonical bump)?
- Arithmetic: Is overflow_checks on? Using checked_* methods?
- Token-2022: Are transfer hooks, fees, and extensions handled correctly?
- CPI Guard: Is CPI Guard used where appropriate?
Web/API (OWASP Top 10:2025)
MANDATORY: Read references/checklists/owasp_top_10.md (2025 final taxonomy — A-numbers changed vs 2021; mapping table at file end).
Top Checks:
- Broken Access Control (A01): Can user A access user B's data? (IDOR; SSRF — absorbed into A01 in 2025: validate user-supplied URLs against an allowlist).
- Software Supply Chain (A03): Lock files committed? Versions pinned? SCA + dependency audit clean?
- Injection (A05): Are queries parameterized? Is output escaped?
- Exceptional Conditions (A10): Do security controls fail closed? No stack traces to users?
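To make the Injection (A05) check concrete, here is a minimal sketch using Python's standard `sqlite3` module. The `users` table and both function names are made up for the illustration.

```python
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # VULNERABLE: user input is concatenated into the SQL text.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Parameterized: the driver passes the value separately from the SQL text.
    return conn.execute("SELECT id, name FROM users WHERE name = ?", (name,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

payload = "x' OR '1'='1"
leaked = find_user_unsafe(conn, payload)   # the OR clause matches every row
blocked = find_user_safe(conn, payload)    # the payload is treated as a literal name
```

During review, any query built with string concatenation or formatting from request data is a candidate for the first shape and should be rewritten into the second.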
API Security (OWASP API Top 10:2023)
MANDATORY: Read references/checklists/api_security.md.
Top Checks:
- BOLA (API1): Object-level authorization on every endpoint?
- BOPLA (API3): Mass assignment prevention? Excessive data exposure?
- Rate Limiting (API4): Per-user, per-endpoint rate limits?
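The BOLA and BOPLA checks above can be sketched as follows. The `Invoice` model, the in-memory store, and the function names are hypothetical; a real service would enforce the same rules in its data-access layer.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Invoice:
    id: int
    owner_id: int
    amount: int


# Hypothetical in-memory store standing in for a database.
INVOICES = {1: Invoice(1, owner_id=10, amount=500),
            2: Invoice(2, owner_id=20, amount=900)}

# BOPLA: only these fields may be set from a client payload.
UPDATABLE_FIELDS = {"amount"}


def get_invoice(current_user_id: int, invoice_id: int) -> Invoice:
    """BOLA check: authorize against the object, not just the session."""
    invoice = INVOICES.get(invoice_id)
    if invoice is None or invoice.owner_id != current_user_id:
        # Same error for "missing" and "forbidden" avoids leaking which IDs exist.
        raise PermissionError("not found or not permitted")
    return invoice


def apply_update(invoice: Invoice, payload: Dict[str, Any]) -> Invoice:
    """Mass-assignment guard: copy allowlisted fields only."""
    for key in payload.keys() & UPDATABLE_FIELDS:
        setattr(invoice, key, payload[key])
    return invoice
```

With this shape, user 10 requesting invoice 2 gets a `PermissionError`, and a payload such as `{"amount": 1, "owner_id": 99}` changes the amount but leaves the owner untouched.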
AI/LLM Applications (OWASP LLM Top 10 v2.0)
MANDATORY: Read references/checklists/llm_security.md.
Top Checks:
- Prompt Injection (LLM01): Can user input override system prompts?
- Improper Output Handling (LLM05): Is LLM output sanitized before use in HTML/SQL/shell?
- Excessive Agency (LLM06): Does the agent have minimal permissions? Human-in-the-loop for destructive actions?
- Supply Chain (LLM03): Are models from trusted sources? Plugins verified?
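As a rough illustration of treating LLM output as untrusted input, the sketch below escapes model text before it reaches HTML and passes it to a subprocess as a single argv element instead of a shell string. Both function names are hypothetical, and the `grep` invocation is only an example sink.

```python
import html
from typing import List


def render_llm_output_as_html(llm_text: str) -> str:
    """Escape model output before embedding it in markup."""
    return "<p>" + html.escape(llm_text) + "</p>"


def build_grep_command(llm_suggested_term: str) -> List[str]:
    """Build an argv list so the term is never parsed by a shell.

    The "--" marker stops grep from reading the term as an option.
    """
    return ["grep", "-r", "--", llm_suggested_term, "."]


rendered = render_llm_output_as_html("<script>alert(1)</script>")
command = build_grep_command("needle; rm -rf /")
```

The rendered string contains `&lt;script&gt;` in place of a live tag, and the shell metacharacters in the search term stay inside one argument when the list is passed to `subprocess.run` without `shell=True`.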
Agentic / MCP (OWASP ASI Top 10 2026 + NSA MCP CSI)
MANDATORY: Read references/checklists/mcp_agentic_security.md.
Top Checks:
- Goal Hijack (ASI01): Can untrusted content (tool outputs, RAG docs, web pages) reach the agent as instructions?
- Tool Poisoning / Rug Pull (ASI01/ASI04): Are tool descriptions clean of steering language? Are definitions pinned + provenance-verified (post-approval mutation = rug pull)?
- Excessive Agency (ASI03): Auto-approve disabled for destructive tools? Least-privilege token per tool/action? No confused-deputy / token passthrough?
- Supply Chain (ASI04): Every MCP server version-pinned (no npx -y pkg / @latest), from a trusted registry?
Limitation (honest floor): the regex layer (--scan-type mcp) only catches crude markers. Semantic tool-description poisoning, rug-pull dynamics, and toxic flow composition require LLM/manual review — a clean scan is not clearance for those classes.
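A minimal sketch of the version-pinning check from the Supply Chain item, covering `npx` launchers only. It assumes the common config shape of a top-level `mcpServers` object whose entries carry `command` and `args`; the function name, the pin regex, and the package names in the sample are all illustrative, and this is not the scanner's own implementation.

```python
import json
import re
from typing import List

# Exact pin such as name@1.2.3 or @scope/name@1.2.3 (illustrative rule).
PINNED = re.compile(r"^(@[^/@\s]+/)?[^@\s]+@\d+\.\d+\.\d+[\w.+-]*$")


def unpinned_npx_servers(config_text: str) -> List[str]:
    """Return names of npx-launched servers without an exact version pin."""
    config = json.loads(config_text)
    flagged = []
    for name, server in config.get("mcpServers", {}).items():
        if server.get("command") != "npx":
            continue
        # First non-flag argument is taken to be the package spec.
        packages = [arg for arg in server.get("args", []) if not arg.startswith("-")]
        if not packages or not PINNED.match(packages[0]):
            flagged.append(name)
    return sorted(flagged)


sample_config = json.dumps({
    "mcpServers": {
        "pinned": {"command": "npx", "args": ["-y", "@example/fs-server@1.2.3", "/tmp"]},
        "floating": {"command": "npx", "args": ["-y", "example-server@latest"]},
        "bare": {"command": "npx", "args": ["-y", "example-server"]},
        "local": {"command": "python", "args": ["server.py"]},
    }
})
flagged = unpinned_npx_servers(sample_config)
```

A check like this only covers the crude marker. Whether the pinned package is trustworthy, or whether its tool descriptions changed after approval, still needs the manual or LLM review described in the limitation note.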
4. Threat Modeling
Before declaring "Secure", perform lightweight threat modeling:
MANDATORY: Read references/checklists/threat_model.md.
- STRIDE Analysis: Evaluate each component for Spoofing, Tampering, Repudiation, Information Disclosure, DoS, Elevation of Privilege.
- DREAD Scoring: Rate each threat by Damage, Reproducibility, Exploitability, Affected Users, Discoverability.
- Attack Surface Mapping:
- Entry Points: APIs, forms, file uploads, webhooks, LLM interfaces.
- Data Flows: Where does user input go? (Logs? DB? Shell? LLM prompt?).
- Assets: Secrets, PII, Money, Model weights.
- Trust Boundaries: Internet <-> DMZ <-> Internal <-> AI/Agent scope.
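DREAD scoring is commonly computed as the mean of the five ratings. The sketch below assumes a 1 to 10 scale per factor, which this document does not specify, and the two example threats and their ratings are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DreadRating:
    damage: int
    reproducibility: int
    exploitability: int
    affected_users: int
    discoverability: int

    def score(self) -> float:
        """Mean of the five factors; assumes each is rated 1-10."""
        values = (self.damage, self.reproducibility, self.exploitability,
                  self.affected_users, self.discoverability)
        for value in values:
            if not 1 <= value <= 10:
                raise ValueError(f"rating out of range: {value}")
        return sum(values) / len(values)


# Invented example threats, ranked highest score first.
threats = {
    "SQL injection in login form": DreadRating(8, 10, 7, 10, 10),
    "Verbose error page": DreadRating(3, 4, 2, 3, 5),
}
ranked = sorted(threats, key=lambda name: threats[name].score(), reverse=True)
```

For the first threat the arithmetic is (8 + 10 + 7 + 10 + 10) / 5 = 9.0, so it ranks ahead of the second at 17 / 5 = 3.4.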
5. Secret Remediation
When secrets are found:
MANDATORY: Read references/checklists/secret_rotation.md.
- Rotate immediately — the secret is compromised.
- Clean git history — BFG or git-filter-repo.
- Prevent recurrence — pre-commit hooks (gitleaks/trufflehog).
6. Reporting
- Critical: Immediate Blocker (RCE, Auth Bypass, Secrets Exposed, Prompt Injection). Fix immediately.
- High: Must fix before release (XSS, CSRF, Dep Vulns, SSRF, Mass Assignment).
- Medium: Document in Backlog (Missing headers, weak crypto, best practices).
All findings include CWE identifiers for integration with vulnerability management systems (Jira, Snyk, Sonar).
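The severity tiers above map naturally onto a CI gate. The following sketch assumes that a `--fail-on` level also fails on anything more severe, and that each finding is a dict with a `severity` key; both are assumptions, since the document only states that `--fail-on critical` exits with code 1.

```python
from typing import Dict, Iterable

# Ordering of the three tiers named in this section.
SEVERITY_RANK = {"medium": 1, "high": 2, "critical": 3}


def gate_exit_code(findings: Iterable[Dict[str, str]], fail_on: str) -> int:
    """Return 1 if any finding is at or above the fail_on tier, else 0."""
    threshold = SEVERITY_RANK[fail_on]
    for finding in findings:
        # Unknown or missing severities rank below every threshold.
        rank = SEVERITY_RANK.get(finding.get("severity", "").lower(), 0)
        if rank >= threshold:
            return 1
    return 0


example_findings = [{"severity": "High", "cwe": "CWE-79"},
                    {"severity": "medium", "cwe": "CWE-16"}]
```

With these example findings, a `critical` gate passes while `high` and `medium` gates fail.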
7. Rationalization Table
| Agent Excuse | Reality / Counter-Argument |
|---|---|
| "The script reported [OK], so it's clean" | Check skipped_files count. Silent skips = false negatives. |
| "This is a test/dev environment" | Attackers pivot from dev to prod. Zero Trust applies everywhere. |
| "Dependencies are only dev dependencies" | devDependencies run during build. Supply chain attacks don't discriminate. |
| "The flag is a false positive" | Verify manually. Never dismiss without proof. |
| "I'll fix it later" | Later = Never. Critical/High = Blocker NOW. |
| "The LLM generated this code, it's fine" | LLMs can generate vulnerable code. Treat output as untrusted. |
| "The IaC is only for staging" | Staging configs often get copy-pasted to production. Secure from day one. |
| "We don't need an SBOM" | EU Cyber Resilience Act and US EO 14028 require it. Regulators disagree. |
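The first row of the table can be turned into a mechanical check on the JSON output. The `skipped_files` name comes from the table; the rest of the report shape (a `findings` list, and `skipped_files` being either a count or a list) is an assumption, as is the function name.

```python
import json


def report_is_clean(report_json: str) -> bool:
    """Treat a report as clean only if nothing was found AND nothing was skipped."""
    report = json.loads(report_json)
    skipped = report.get("skipped_files")
    if skipped is None:
        return False  # unknown skip count is not evidence of a clean scan
    skipped_count = len(skipped) if isinstance(skipped, list) else int(skipped)
    return skipped_count == 0 and not report.get("findings")


ok_but_skipped = json.dumps({"findings": [], "skipped_files": 3})
fully_scanned = json.dumps({"findings": [], "skipped_files": 0})
```

Under these assumptions, a report with zero findings but three skipped files is not treated as clean.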