Run any Skill in Manus with one click

$pwd:

exploitability-validation

Name: Exploitability Validation
Author: gadievron

// Multi-stage pipeline for validating that vulnerability findings are real, reachable, and exploitable, preventing wasted effort on hallucinated findings, dead code paths, or findings with unrealistic preconditions.

Run Skill in Manus

$ git log --oneline --stat

stars:2,793

forks:441

updated:May 26, 2026 at 00:26

File Explorer

10 files

SKILL.md

readonly

related-skills.json

same repository

code-understanding.md

from "gadievron/raptor"

Provides adversarial code comprehension for security research, mapping architecture, tracing data flows, and hunting vulnerability variants to build ground-truth understanding before or alongside static analysis.

2026-05-302.8k

github-evidence-kit.md

from "gadievron/raptor"

Generate, export, load, and verify forensic evidence from GitHub sources. Use when creating verifiable evidence objects from GitHub API, GH Archive, Wayback Machine, local git repositories, or security vendor reports. Handles evidence storage, querying, and re-verification against original sources.

2026-05-232.8k

rr-debugger.md

from "gadievron/raptor"

Deterministic debugging with rr record-replay. Use when debugging crashes, ASAN faults, or when reverse execution is needed. Provides reverse-next, reverse-step, reverse-continue commands and crash trace extraction.

2026-05-082.8k

function-call-tracing.md

from "gadievron/raptor"

Instrument C/C++ with -finstrument-functions for execution tracing and Perfetto visualisation

2026-04-012.8k

code-coverage-with-gcov.md

from "gadievron/raptor"

Add gcov code coverage instrumentation to C/C++ projects

2026-04-012.8k

line-execution-checker.md

from "gadievron/raptor"

Check if specific lines were executed using gcov data

2026-04-012.8k

package.json

"author": "gadievron"

"repository": "gadievron/raptor"

View GitHub Repository View Creator Repositories

$ install --global

$ download --local

Run Skill in Manus

$ useful --forSOC

Information Security AnalystsComputer and Mathematical Occupations15-1212L4

name	exploitability-validation
description	Multi-stage pipeline for validating that vulnerability findings are real, reachable, and exploitable, preventing wasted effort on hallucinated findings, dead code paths, or findings with unrealistic preconditions.
user-invocable	false

Exploitability Validation Skill

A multi-stage pipeline for validating that vulnerability findings are real, reachable, and exploitable.

Purpose

Prevents wasted effort on:

Hallucinated findings (file doesn't exist, code doesn't match)
Unreachable code paths (dead code, test-only)
Findings with unrealistic preconditions

When to Use

After scanning produces findings, BEFORE exploit development:

Scanner finds potential vulnerability
This skill validates it's real and reachable
Exploit Feasibility checks binary constraints
Exploit development proceeds

[CONFIG] Configuration

models:
  native: true
  additional: false  # Set true to also run GPT, Gemini

output_when_additional:
  display: "agreement: 2/3"
  threshold: "1/3 is enough to proceed"

[EXEC] Execution Rules

Run the full pipeline end-to-end.
Solve and fix any issues you encounter, unless you failed five times in a row, or need clarification.
Run on latest thinking/reasoning model available (verify model name).
Pipeline must be deterministic - if ran again, results should be the same.
Validate after writing. Run libexec/raptor-validate-schema <type> <file> after each Write. Match the type to what you wrote:
- stage for any stage-*.json file (e.g., stage-a.json, stage-c.json, stage-f.json)
- attack-tree, attack-paths, attack-surface, hypotheses, disproven for the matching working doc
Fix any errors before proceeding to the next stage.
No finding may reach Stage D without passing through Stages B and C, even if Stage A produced a successful PoC.
Do not narrate gate compliance ("GATE-8 satisfied"), schema validation passes ("findings.json: OK"), or stage transitions ("Stage C complete") to the user. Do show substantive work: PoC test output, tool investigations (objdump, checksec), binary protections, hypothesis results, and evidence discovered. Document gate compliance in validation-report.md only. Report schema or pipeline failures immediately.
Python imports: All python3 -c snippets must start with import sys, os; sys.path.insert(0, os.environ["RAPTOR_DIR"]) before importing from packages.* or core.*.
Build directory: Stage 0 creates $OUTPUT_DIR/build/. Compile and run PoCs there, not in the target repo. 10b. Sandbox: Run ALL compilation and execution via libexec/raptor-run-sandboxed <cmd> [args]. This blocks network, restricts writes, and limits resources. Never run gcc or binaries directly.
libexec scripts: Run libexec/ scripts exactly as shown in the prompts — do not prepend export commands, do not use absolute paths, do not wrap in additional shell logic. The permission system auto-approves libexec/raptor-* commands only when run in this exact form.
Per-stage JSON files. Write your stage's output to stage-X.json (e.g., stage-a.json, stage-b.json), not to findings.json. The prep script merges stage files into findings.json automatically. Do not read or write findings.json directly. Do not use python3 -c scripts for JSON — use the Write tool.

[GATES] MUST-GATEs

Rationale: Without these gates, models sample instead of checking all code, hedge with "if" and "maybe" instead of verifying, and miss exploitable findings.

GATE-1 [ASSUME-EXPLOIT]: Your goal is to discover real exploitable vulnerabilities. If you think something isn't - don't assume. First, investigate under the assumption that it is.

GATE-2 [STRICT-SEQUENCE]: Strictly follow instructions. If you think or try something else, or a new idea comes up, present the results of that analysis separately at the end. Always display the results of the strict criteria first, and only then display the results of the additional methods, if any.

GATE-3 [CHECKLIST]: Check pipeline, update checklist, and collect evidence of compliance to present at the end that you successfully executed all actions through these gates.

GATE-4 [NO-HEDGING]: If your Chain-of-Thought or results include "if", "maybe", "uncertain", "unclear", "could potentially", "may be possible", "depending on", "in theory", "in certain circumstances", or similar - immediately verify the claim. Do not leave unverified.

GATE-5 [FULL-COVERAGE]: Test the entire code provided (file(s)/code base) against checklist.json, ensuring you checked all functions and lines of code. Do not sample, estimate, or guess.

GATE-6 [PROOF]: Always provide proof and show the vulnerable code.

GATE-7 [CONSISTENCY]: Before finalizing each finding, verify that vuln_type, severity, and status are consistent with the description and proof text. A description that explains why a bug is benign must not carry high severity.

GATE-8 [POC-EVIDENCE]: A PoC requires observable evidence: a crash, changed output, callback, file read, error message, or measurable state change. "Ran without error" is not evidence. If the expected effect is not observed, either the PoC is wrong or the bug is not triggered — investigate which.

[STYLE] Output Formatting

Status values in JSON must be snake_case:

exploitable not EXPLOITABLE or Exploitable
confirmed not CONFIRMED or Confirmed
ruled_out not RULED_OUT or Ruled Out
disproven not DISPROVEN or Disproven

RULE: Any text shown to the user (chat, tables, summaries, stage progress) MUST use Title Case, never snake_case. This applies at every stage, not just the final report. Convert on output:

poc_success → "PoC Success"
not_disproven → "Not Disproven"
buffer_overflow → "Buffer Overflow"
command_injection → "Command Injection"
confirmed_constrained → "Confirmed (Constrained)"

BAD (snake_case leaked into chat):

- FIND-001 (buffer_overflow): poc_success

GOOD:

- FIND-001 (Buffer Overflow): PoC Success

No colored circles or emojis:

Do not use 🔴/🟡/🟢 - they are perspective-dependent (red = bad for defenders, good for researchers)
Use plain text headers: ### Exploitable (7 findings) not ### 🔴 EXPLOITABLE

Hypothesis status:

Proven - hypothesis confirmed by evidence
Disproven - hypothesis refuted by evidence
Partial - some predictions confirmed, others refuted

[REMIND] Critical Reminders

Do not skip, sample, or guess - check all code against checklist.json.
Provide proof for every claim.
Actually read files - do not rely on memory.
Update docs after every action.

Stages

All stages execute in sequence. No stage may be skipped. The only exception is Stage E, which only applies to memory corruption vulnerabilities.

Each stage has up to three phases: X0 (mechanical prep), X (LLM reasoning), X1 (mechanical validation). Run X0 and X1 via Python snippets in the stage prompt. The X phases are your reasoning work.

Stage	X0 (prep)	X (reasoning)	X1 (validation)
0	-	Build inventory	-
A	Load checklist + existing findings	Vuln assessment + PoC	Dedup flag + schema check
B	Load findings + attack surface	Hypotheses, attack trees	Schema check all 5 docs
C	Checklist lookup (file+line)	Code verification	Schema check + pass/fail count
D	Test/mock pre-filter + evidence card	Ruling + CVSS vectors	Schema check + counts
E	Group by binary	Per-binary analysis + mapping	Verdict → status mapping
F	CVSS scoring + consistency checks	Self-review + corrections	-
1	-	-	Recompute CVSS, report

Notes:

Stage E only applies to memory corruption vulnerabilities. Web/injection vulns skip E.
Stage 1 recomputes CVSS scores from final vectors (after any Stage F corrections), validates schemas, and generates the report. It never changes verdicts.
Each stage writes a stage_X_summary onto each finding (carry-forward). Later stages read the finding object instead of cross-referencing multiple files.

See stage-specific files for detailed instructions.

Working Documents (Stage B)

Doc	Purpose
attack-tree.json	Knowledge graph. Source of truth.
hypotheses.json	Active hypotheses. Status: testing, confirmed, disproven.
disproven.json	Failed hypotheses. What was tried, why it failed.
attack-paths.json	Paths attempted. PoC results. PROXIMITY. Blockers.
attack-surface.json	Sources, sinks, trust boundaries.

Flow

STAGE 0:  Inventory
          │
          ▼ checklist.json
          │
STAGE A:  A0 load checklist ─► A assess+PoC ─► A1 dedup+validate
          │
          ▼ findings.json (+ origin, stage_a_summary)
          │
STAGE B:  B0 load findings ─► B hypotheses+trees ─► B1 validate 5 docs
          │
          ▼ findings.json (+ stage_b_summary), working docs
          │
STAGE C:  C0 checklist lookup ─► C verify code ─► C1 validate
          │
          ▼ findings.json (+ sanity_check, stage_c_summary)
          │
STAGE D:  D0 test filter+evidence card ─► D ruling+CVSS ─► D1 validate
          │
          ▼ findings.json (+ ruling, cvss_vector, stage_d_summary)
          │
     ┌────┴────┐
     │         │
  Memory    Web/Injection
  Corruption    │
     │          │
STAGE E:        │
  E0 group ─► E analyze ─► E1 verdict map
     │          │
     └────┬─────┘
          │
          ▼ findings.json (+ feasibility, stage_e_summary, final_status)
          │
STAGE F:  F0 CVSS scores+checks ─► F review+correct ─► findings.json
          │
          ▼ findings.json (+ stage_f_summary)
          │
STAGE 1:  Recompute CVSS, validate, report
          │
          ▼ validation-report.md

SMT Add-on (Z3): One-Gadget Constraint Satisfaction

A Z3-backed analysis that runs alongside the LLM heuristic for one-gadgets whenever the optional z3-solver package is installed. It is a soft dependency — if z3 is not importable, the add-on silently returns smt_available=False and behaviour is identical to the non-SMT path. There is no CLI flag or environment variable to enable it; installation is the sole gate.

Architecture. Shared Z3 primitives live in core/smt_solver/ (availability gate, bitvector factories, timed solver construction, two's-complement witness formatting). The domain encoding lives in packages/exploit_feasibility/smt_onegadget.py, which imports the primitives and handles one_gadget constraint parsing and feasibility queries.

Layer	Module	Responsibility
Harness	`core.smt_solver` (`availability`, `config`, `bitvec`, `session`, `witness`)	`z3_available()`, `mk_var / mk_val`, `new_solver(timeout_ms=…)`, `format_witness(model, signed=…)`, `mode_tag(…)`
Encoding	`packages/exploit_feasibility/smt_onegadget.py`	Parse one-gadget register/memory constraints, check feasibility, rank gadgets

The one-gadget encoder pins BitVec width to 64 (x86_64 registers) and treats values as unsigned, so the harness's global bv_width() / is_signed() defaults do not affect it.

One-gadget constraint satisfaction (`smt_onegadget.py`)

Invoked automatically when one_gadget offsets are found during exploit-feasibility analysis. It ranks each gadget by whether Z3 can satisfy its constraint list (rank_onegadgets(gadget_objs)), and stores the verdict for the best-ranked gadget under analysis.one_gadget_info.smt_feasibility in the exploit context.

Constraint forms currently recognised (case-insensitive; combined via || / && / AND at the disjunction/conjunction layer):

Null / non-null: rax == NULL, [rsp+0x30] == NULL, rbp is NULL, rbp is writable, address [rsp+0x50] is writable
Numeric equality / inequality: rax == 0xbeef, rsi != 0
Stack alignment (glibc 2.34+): rsp & 0xf == 0
Register / memory comparisons: r12 == rbx, rax == [rsi]

Unparseable constraints (e.g. valid argv, (s32)[rbp+0x48] == 0, brace-string indexing like {"/bin/sh"}[rsi+0x0] == NULL) land in the unknown bucket and fall back to the heuristic note field. Disjunctions with any unparseable branch are treated conservatively — the whole line goes to unknown rather than evaluating partial branches, to avoid false positives.

Results appear under analysis.one_gadget_info.smt_feasibility:

{
  "feasible": true | false | null,
  "reasoning": "...",
  "satisfied_constraints": ["[rsp+0x30] == NULL"],
  "unsatisfied_constraints": [],
  "model": {"rsp": 140737...}
}

feasible: true — Z3 found satisfying values; the one-gadget is usable under the right conditions.
feasible: false — constraints are mutually exclusive (or, when a crash state is supplied, contradicted by it); ruling out this gadget is reliable.
feasible: null — Z3 returned unknown or all constraints were unparseable; fall back to the heuristic note.

Current integration: analyze_binary() calls rank_onegadgets() without a crash state, so the verdict answers "can these constraints ever be satisfied?" — an over-approximation. The check_onegadget(gadget, crash_state=...) API accepts concrete register values (e.g. from a GDB crash dump) for a more precise per-gadget verdict when a crash state is available; the heuristic note field is always preserved alongside smt_feasibility for comparison.

Stage E: When smt_feasibility.feasible is false for the best-ranked gadget, treat it as a strong chain-break signal (stronger than the heuristic note, which remains present for comparison).

Integration with Exploit Feasibility

Stage E automatically bridges to the exploit_feasibility package for memory corruption vulnerabilities.

Automatic (via Stage E):

# Stage E handles this automatically for applicable vuln types
# See stage-e-feasibility.md for details

Manual (if needed):

from packages.exploit_feasibility import analyze_binary, format_analysis_summary

result = analyze_binary(binary_path, vuln_type='format_string')
print(format_analysis_summary(result, verbose=True))

Final Status After Stage E:

Source Status	Feasibility	Final Status
Confirmed	Likely	Exploitable
Confirmed	Difficult	Confirmed (Constrained)
Confirmed	Unlikely	Confirmed (Blocked)
Confirmed	N/A (web vuln)	Confirmed

This ensures findings are:

Real and reachable (Stages A-D)
Actually exploitable (Stage E + exploit_feasibility)

Notice

This analysis is performed for defensive purposes, in a lab environment. Full permission has been provided.

exploitability-validation

More from this repository

More from this repository

Exploitability Validation Skill

Purpose

When to Use

[CONFIG] Configuration

[EXEC] Execution Rules

[GATES] MUST-GATEs

[STYLE] Output Formatting

[REMIND] Critical Reminders

Stages

Working Documents (Stage B)

Flow

SMT Add-on (Z3): One-Gadget Constraint Satisfaction

One-gadget constraint satisfaction (smt_onegadget.py)

Integration with Exploit Feasibility

Notice

Exploitability Validation Skill

Purpose

When to Use

[CONFIG] Configuration

[EXEC] Execution Rules

[GATES] MUST-GATEs

[STYLE] Output Formatting

[REMIND] Critical Reminders

Stages

Working Documents (Stage B)

Flow

SMT Add-on (Z3): One-Gadget Constraint Satisfaction

One-gadget constraint satisfaction (smt_onegadget.py)

Integration with Exploit Feasibility

Notice

One-gadget constraint satisfaction (`smt_onegadget.py`)

One-gadget constraint satisfaction (`smt_onegadget.py`)