
dma-attack-techniques

Name: Dma Attack Techniques
Author: gmh5225

Guide for PCIe DMA threat modeling, FPGA-based memory access, and defensive implications in game security. Use this skill when researching pcileech, BAR and TLP behavior, page-table walking, IOMMU or VT-d, device impersonation, firmware mimicry, or DMA detection and mitigation in game security research.


stars:2,930

forks:420

updated: May 21, 2026 08:51


Repository: gmh5225/awesome-game-security


DMA Attack Techniques

Overview

This skill covers Direct Memory Access research from the awesome-game-security collection, focusing on FPGA-based PCIe attacks, pcileech usage, physical-memory access workflows, and the defensive limits of software anti-cheat once a hostile device can read memory below the OS.

README Coverage

Cheat > DMA
Anti Cheat > Detection:DMA
Anti Cheat > Detection: Hacked Hypervisor
Anti Cheat > Detection:Virtual Environments
Anti Cheat > Detection:HWID
Windows Security Features

Threat Model

External DMA Cheat Architecture

A modern external DMA cheat consists of three components:

1. Cheat PC — runs the cheat application, signature databases,
   aim assistance, ESP rendering, and a network/USB link to the gaming PC.

2. DMA Card — an FPGA-based PCIe endpoint installed in the gaming PC
   (typically in an M.2 NVMe slot). It exposes a memory-read/write interface to
   the cheat PC and uses Bus Master capability to issue Memory Read TLPs
   against the gaming PC's RAM.

3. Actuator (optional) — a USB HID emulator (microcontroller-based) that
   injects keyboard/mouse input on the gaming PC according to commands
   from the cheat PC, closing the loop.

The structural property that makes this threat distinctive:
no attacker code executes on the gaming PC. The DMA card performs
hardware-level transactions between the FPGA and the gaming PC's
memory controller, mediated by the chipset and (when configured) the IOMMU.
The gaming PC's OS, drivers, and anti-cheat see only a PCIe device
announcing itself through Configuration Space and performing what looks
like ordinary DMA.

Three Defense Layers

Layer              Mechanism                     What It Catches
─────────────────────────────────────────────────────────────────────────────
PCIe-layer         Inspect Config Space &        Identity mismatch — spoofed
fingerprinting     behavior at the bus level     device that doesn't match
                                                 real silicon's full signature

IOMMU              Use the IOMMU to bound        Out-of-domain DMA — device
enforcement        what physical memory the      trying to read game memory
                   device can touch              it wasn't allocated

External           TPM-anchored measured boot,   Boot-chain compromise — IOMMU
attestation        cloud-verified                or kernel itself subverted

PCIe Protocol Stack

Three Protocol Layers

Layer              Unit                Function
────────────────────────────────────────────────────────────────
Transaction        TLP                 Memory/IO/Config reads & writes,
                                       completions, messages
Data Link          DLLP                Acknowledgements, flow control
                                       credits, power management
Physical           Ordered Sets        Link training, equalization,
                                       clock recovery

A real device's behavior is shaped by all three layers.
An FPGA emulating a real device controls only the Transaction Layer in user logic;
its Data Link and Physical layers run in the FPGA's hard PCIe block, so they leak
that block's fingerprints, which BRAM-backed configuration-space emulation cannot hide.

TLP (Transaction Layer Packet) Format

Every TLP begins with a 3 DW (12-byte) or 4 DW (16-byte) header.
4 DW headers are used for 64-bit addresses and for all Message TLPs.

First DWord (DW0) encoding:
Bits       Field         Notes
[31:29]    Fmt[2:0]      Header format + data presence
[28:24]    Type[4:0]     TLP type (combined with Fmt)
[22:20]    TC[2:0]       Traffic Class (default 0)
[18]       Attr[2]       ID-Based Ordering (IDO)
[15]       TD            TLP Digest (ECRC trailer)
[14]       EP            Poisoned data
[13:12]    Attr[1:0]     Relaxed Ordering, No Snoop
[11:10]    AT[1:0]       Address Type (critical for ATS bypass)
[9:0]      Length[9:0]   Payload length in DWords (0x000 = 1024 DW = 4 KB)

Fmt[2:0] encoding:
000 = 3 DW header, no data
001 = 4 DW header, no data
010 = 3 DW header, with data
011 = 4 DW header, with data
100 = TLP Prefix

Key TLP types (Fmt + Type combinations):
Fmt  Type     TLP
000  0_0000   MRd (Memory Read, 3DW / 32-bit addr)
001  0_0000   MRd (Memory Read, 4DW / 64-bit addr)
010  0_0000   MWr (Memory Write, 3DW)
011  0_0000   MWr (Memory Write, 4DW)
000  0_0100   CfgRd0 (Config read — terminate at this device)
010  0_0100   CfgWr0
000  0_0101   CfgRd1 (Config read — forwarded by bridges)
010  0_0101   CfgWr1
000  0_1010   Cpl (Completion without data)
010  0_1010   CplD (Completion with data)
001  1_0rrr   Msg (Message, no data)
011  1_0rrr   MsgD (Message with data)
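The DW0 layout and type table above can be turned into a small decoder, for example to annotate protocol-analyzer captures. This is a minimal Python sketch that assumes DW0 is already available as a 32-bit integer; `decode_dw0` and `Dw0` are illustrative names, not part of pcileech or any analyzer's API, and only the TLP types listed above are named.

```python
from dataclasses import dataclass

# (Fmt, Type) -> mnemonic for the non-message TLP types in the table above.
TLP_TYPES = {
    (0b000, 0b00000): "MRd (3DW)",
    (0b001, 0b00000): "MRd (4DW)",
    (0b010, 0b00000): "MWr (3DW)",
    (0b011, 0b00000): "MWr (4DW)",
    (0b000, 0b00100): "CfgRd0",
    (0b010, 0b00100): "CfgWr0",
    (0b000, 0b00101): "CfgRd1",
    (0b010, 0b00101): "CfgWr1",
    (0b000, 0b01010): "Cpl",
    (0b010, 0b01010): "CplD",
}


@dataclass
class Dw0:
    fmt: int
    tlp_type: int
    tc: int
    attr: int        # bit 2 = IDO (Attr[2]), bit 1 = RO, bit 0 = NS
    td: bool
    ep: bool
    at: int          # 0 untranslated, 1 translation request, 2 translated
    length_dw: int   # a Length field of 0x000 means 1024 DW
    name: str


def decode_dw0(dw0: int) -> Dw0:
    """Split a 32-bit DW0 value into the fields listed above."""
    fmt = (dw0 >> 29) & 0x7
    tlp_type = (dw0 >> 24) & 0x1F
    tc = (dw0 >> 20) & 0x7
    attr = (((dw0 >> 18) & 0x1) << 2) | ((dw0 >> 12) & 0x3)
    td = bool((dw0 >> 15) & 0x1)
    ep = bool((dw0 >> 14) & 0x1)
    at = (dw0 >> 10) & 0x3
    length = dw0 & 0x3FF
    if fmt in (0b001, 0b011) and (tlp_type >> 3) == 0b10:
        # Type 1_0rrr: messages, where rrr selects the routing mode.
        name = "MsgD" if fmt == 0b011 else "Msg"
    else:
        name = TLP_TYPES.get((fmt, tlp_type), "unknown")
    return Dw0(fmt, tlp_type, tc, attr, td, ep, at, length or 1024, name)


if __name__ == "__main__":
    d = decode_dw0(0x60000810)   # MWr, 4DW header, AT=10, 16 DW payload
    print(d.name, d.at, d.length_dw)
```

A non-zero `tc` or an `at` of 2 on a device that should never use ATS are the kinds of fields the next subsection flags as anomalous.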

Detection-Relevant DW0 Fields

TC[2:0] — Traffic Class. Default 0; real silicon rarely uses non-zero TC.
A spoofed device generating non-zero TC is anomalous.

Attr[2:0] — RO/NS/IDO. A device emulating a NIC must follow that NIC's
typical NS/RO usage pattern; mismatches are visible.

AT[1:0] — Address Type:
  00 = Untranslated (IOMMU will translate)
  01 = Translation Request (ATS only)
  10 = Translated (device claims it has already translated via ATS)
  11 = Reserved
This field is the basis of ATS bypass attacks.

TD — TLP Digest. If set, an ECRC trailer is present.
EP — Poisoned. Indicates data is known-bad.

TLP Routing and Requester ID

Three routing modes:
- Address routing — Memory and IO TLPs, matched against bridge apertures
- ID routing — Config TLPs and Completions, by BDF
- Implicit routing — Some Messages (broadcast, terminate at root)

DW1 carries the Requester ID (16 bits = Bus:Device:Function, "BDF")
and an 8-bit Tag for matching completions to requests.

The Requester ID is the entire input to per-device security policy:
IOMMU translation lookup, ACS source validation, AER source ID,
MSI/MSI-X routing. Anything that lets a device send TLPs with a
different Requester ID fundamentally compromises isolation.

Transaction categories:
- Posted (P) — fire-and-forget (Memory Writes, Messages)
- Non-Posted (NP) — requires completion (Memory Reads, IO/Config R/W)
- Completion (Cpl/CplD) — response to Non-Posted requests

Completion Status codes:
000 = Successful Completion (SC)
001 = Unsupported Request (UR)
010 = Configuration Request Retry Status (CRS)
100 = Completer Abort (CA)

UR vs CA distinction matters for spoofing detection — real silicon
responds differently to malformed config accesses vs accesses to
unimplemented offsets. Many spoofed firmwares hard-code one or the other.

Memory Read Completion Splitting

A Memory Read request asks for at most Max_Read_Request_Size (MRRS) bytes.
The completer may return that data in several Completions, splitting only at
naturally aligned Read Completion Boundary (RCB) addresses (64 or 128 bytes).
Each Completion's payload cannot exceed Max_Payload_Size (MPS).

Each Completion carries:
- Lower Address[6:0] — lowest 7 bits of first byte address
- Byte Count[11:0] — bytes remaining (last fragment's Byte Count
  equals its own payload length)
- BCM (Byte Count Modified): set only by PCI-X completers, so typically 0
- Tag — matches originating MRd's Tag

The split pattern (fragment count, boundary positions) is a
strong fingerprint: real memory controllers produce characteristic
distributions of fragment sizes and inter-fragment gaps.
BRAM-backed emulators producing perfectly uniform 64-byte fragments
at constant cadence are anomalous.
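The splitting rules can be modelled to produce a reference fragment pattern against which captures can be compared. This is a sketch under stated assumptions: the request is DWord-aligned with full byte enables, and the modelled completer always returns the largest legal fragment, ending on an RCB boundary and carrying at most MPS bytes. Real memory controllers are free to return smaller fragments, which is what makes the observed distribution a fingerprint. `split_completions` is a hypothetical helper.

```python
from typing import List, Tuple


def split_completions(addr: int, length: int, rcb: int = 64,
                      mps: int = 256) -> List[Tuple[int, int, int]]:
    """Return (lower_address, byte_count, payload_bytes) for each Completion.

    Models a completer that always sends the largest fragment allowed.
    """
    if rcb not in (64, 128) or mps < rcb or mps % rcb:
        raise ValueError("RCB must be 64 or 128 and MPS a multiple of RCB")
    end = addr + length
    cur = addr
    out = []
    while cur < end:
        # Largest legal fragment: stop at the last RCB boundary within MPS.
        limit = (cur + mps) // rcb * rcb
        nxt = min(end, limit)
        # Lower Address = low 7 bits of the first byte; Byte Count = bytes left.
        out.append((cur & 0x7F, end - cur, nxt - cur))
        cur = nxt
    return out
```

For a 512-byte read at 0x1010 with RCB 64 and MPS 256, the model yields fragments of 240, 256, and 16 bytes, and the last Byte Count equals its own payload, as described above.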

Tag Space and Fingerprinting

- 5-bit Tag (original): 32 outstanding non-posted requests per Requester ID
- Extended Tag (PCIe 1.1+, Device Control[8]): 8-bit / 256 outstanding
- 10-Bit Tag (PCIe 4.0+, Device Control 2[12]): 1024 outstanding

Tag turnover discipline — which tags get reissued and how quickly —
reflects the device's internal request tracking pipeline.
Firmware that issues reads with no tag turnover (same tag, or monotonic
beyond negotiated limit) is observably distinct from real silicon.

MPS and MRRS as Fingerprints

Both are programmed by system software during enumeration and normally stay fixed for the session.
- Device Capabilities[2:0]: Max_Payload_Size_Supported
  (0=128, 1=256, 2=512, 3=1024, 4=2048, 5=4096 bytes)
- Device Control[7:5]: current MPS (must be <= Supported,
  set to minimum of all devices in hierarchy)
- Device Control[14:12]: Max_Read_Request_Size (same encoding)

The discriminator is donor consistency: a device claiming to be a donor
model with a known MPS ceiling, tag behavior, and negotiated profile
should show the same values as that donor under the same root-port
constraints.
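The register fields above decode mechanically, and comparing the result against a donor profile then reduces to a dictionary diff. This is a minimal sketch that assumes the Device Capabilities (+0x04) and Device Control (+0x08) values of the PCI Express Capability have already been read. The donor reference dictionary is hypothetical and would come from a genuine card measured on the same platform.

```python
def size_code(code: int) -> int:
    """Encodings 0..5 map to 128..4096 bytes; 6 and 7 are reserved."""
    if code > 5:
        raise ValueError(f"reserved size encoding {code}")
    return 128 << code


def decode_payload_profile(devcap: int, devctl: int) -> dict:
    profile = {
        "mps_supported": size_code(devcap & 0x7),        # Device Capabilities[2:0]
        "mps": size_code((devctl >> 5) & 0x7),           # Device Control[7:5]
        "mrrs": size_code((devctl >> 12) & 0x7),         # Device Control[14:12]
        "extended_tag": bool((devctl >> 8) & 0x1),       # Device Control[8]
    }
    # The programmed MPS may never exceed what the function advertises.
    profile["mps_consistent"] = profile["mps"] <= profile["mps_supported"]
    return profile


def diff_against_donor(observed: dict, donor: dict) -> list:
    """Keys whose values differ from a (hypothetical) donor reference profile."""
    return sorted(k for k in donor if observed.get(k) != donor[k])
```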

Data Link Layer

The Data Link Layer sits between the Physical and Transaction layers and provides reliable TLP delivery across a single link; DLLPs are its link-local control packets.

DLLP          Purpose
─────────────────────────────────────────
Ack           TLP received correctly
Nak           TLP received with error; sender must replay
InitFC1/2     Flow control credit initialization at link bring-up
UpdateFC      Ongoing flow control credit updates
PM_*          Power management (L1 and L2/L3 Ready entry handshakes)
Vendor        Vendor-defined

Flow control credits are per TLP category:
- PH / PD — Posted Header / Data
- NPH / NPD — Non-Posted Header / Data
- CplH / CplD — Completion Header / Data

Negotiated credit values are not generally exposed through standard
config-space registers such as Link Capabilities. They are visible in
protocol-level traces, some root-port/vendor performance counters, or
FPGA-side debug, which makes them useful for lab fingerprinting and
forensic captures but not for normal runtime config-space detection.

Physical Layer

Two details matter even without PHY-level instrumentation:

LTSSM (Link Training and Status State Machine):
- Normal bring-up path: Detect → Polling → Configuration → L0 (operational);
  other states include Recovery, L0s, L1, L2 (low-power), Hot Reset, Disabled, and Loopback
- Training outcomes are observable via the Link Status register and, on some
  platforms, root-port performance counters

Detection-relevant:
- Negotiated Link Width (Link Status[9:4]):
  Device advertising x16 but negotiating x1 is a tell
- Current Link Speed (Link Status[3:0]):
  Capability claims Gen4 but stays Gen2/Gen3 is anomalous
- Recovery cycle frequency:
  Comparative signal; materially different from donor reference is anomalous
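The width and speed checks can be expressed as a comparison between Link Capabilities (+0x0C) and Link Status (+0x12) of the PCI Express Capability. This is a sketch under two assumptions: speed codes follow the usual 1=2.5, 2=5, 3=8, 4=16, 5=32 GT/s mapping of the Supported Link Speeds Vector, and the output is only meaningful when compared with a donor reference captured on the same platform, since slot wiring, risers, and M.2 adapters also narrow links legitimately. `link_findings` is a hypothetical helper.

```python
GT_PER_S = {1: 2.5, 2: 5.0, 3: 8.0, 4: 16.0, 5: 32.0}


def link_findings(lnkcap: int, lnksta: int) -> list:
    """List differences between advertised and negotiated link parameters."""
    max_speed, max_width = lnkcap & 0xF, (lnkcap >> 4) & 0x3F   # Link Capabilities
    cur_speed, cur_width = lnksta & 0xF, (lnksta >> 4) & 0x3F   # Link Status
    findings = []
    if cur_width == 0 or cur_speed not in GT_PER_S:
        findings.append("link not trained or invalid speed code")
        return findings
    if cur_width < max_width:
        findings.append(f"width x{cur_width} below advertised x{max_width}")
    if cur_speed < max_speed:
        findings.append(f"speed {GT_PER_S[cur_speed]} GT/s below advertised "
                        f"{GT_PER_S.get(max_speed, '?')} GT/s")
    return findings
```

For a function advertising Gen4 x16 (`lnkcap = 0x104`) that trained at 5 GT/s x1 (`lnksta = 0x0012`), both the width and the speed finding are reported.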

ASPM (Active State Power Management):
- L0s and L1 are link-level low-power states
- A device claiming ASPM support in Link Capabilities but never leaving
  L0, even when ASPM is enabled in Link Control, contradicts its class

Configuration Access Mechanisms

Two mechanisms on x86:

CAM (Legacy I/O-port path):
1. CPU writes a 32-bit address to I/O port 0xCF8: enable bit (31), Bus,
   Device, Function, and DWord-aligned Register offset
2. CPU reads/writes the data at I/O port 0xCFC
- Reaches only first 256 bytes
- Still used during early BIOS/UEFI boot

ECAM (Enhanced, MMIO path):
1. Read MCFG ACPI table for segment base addresses
2. Compute: addr = base + ((bus << 20) | (dev << 15) | (func << 12) | offset)
3. OS maps physical address into kernel virtual memory
- Required for Extended Configuration Space (0x100–0xFFF)
- Where AER, DSN, LTR, VSEC, ATS, PASID, SR-IOV live
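The address arithmetic for both mechanisms is small enough to show directly. This sketch is pure arithmetic and touches no hardware; `ecam_base` stands for the segment base address read from the ACPI MCFG table, and both function names are illustrative.

```python
def cf8_address(bus: int, dev: int, func: int, offset: int) -> int:
    """Legacy CAM value for port 0xCF8 (only offsets 0x00-0xFF reachable)."""
    if not (0 <= bus < 256 and 0 <= dev < 32 and 0 <= func < 8 and 0 <= offset < 256):
        raise ValueError("out of range for CAM")
    # Bit 31 enable, bus[23:16], device[15:11], function[10:8], register[7:2].
    return 0x80000000 | (bus << 16) | (dev << 11) | (func << 8) | (offset & 0xFC)


def ecam_address(ecam_base: int, bus: int, dev: int, func: int, offset: int) -> int:
    """ECAM physical address; each function gets 4 KB, so 0x100-0xFFF is reachable."""
    if not (0 <= bus < 256 and 0 <= dev < 32 and 0 <= func < 8 and 0 <= offset < 4096):
        raise ValueError("out of range for ECAM")
    return ecam_base + ((bus << 20) | (dev << 15) | (func << 12) | offset)
```

With an assumed base of 0xE0000000, offset 0x100 of bus 1, device 0, function 0 lands at 0xE0100100; that offset is unreachable through 0xCF8/0xCFC, which is why extended capabilities require ECAM or the documented bus interfaces.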

On Windows, supported paths are:
- IRP_MN_READ_CONFIG / IRP_MN_WRITE_CONFIG
- BUS_INTERFACE_STANDARD.GetBusData / SetBusData
Production anti-cheat should use documented bus interfaces;
direct MCFG mapping is a lab-only technique.

PCIe Configuration Space

Legacy 256-Byte Header (Type 0 Endpoint)

Offset     Field                      Notes
0x00       Vendor ID (2B)             Chip manufacturer (e.g., 0x8086 Intel)
0x02       Device ID (2B)             Specific product
0x04       Command (2B)               BME (bit 2), MemSpace (bit 1), IOSpace (bit 0)
0x06       Status (2B)                Capabilities List (bit 4)
0x08       Revision ID (1B)
0x09–0x0B  Class Code (3B)            ProgIF (0x09), Subclass (0x0A), Base Class (0x0B)
0x0C       Cache Line Size (1B)
0x0D       Latency Timer (1B)
0x0E       Header Type (1B)           0x00 = endpoint, 0x01 = bridge;
                                      bit 7 (0x80) = multi-function
0x0F       BIST (1B)
0x10–0x27  BAR0–BAR5                  Memory or I/O windows
0x2C       Subsystem Vendor ID (2B)   Often distinguishes board manufacturers
0x2E       Subsystem Device ID (2B)
0x30       Expansion ROM Base (4B)
0x34       Capabilities Pointer       Offset of first capability in linked list
0x3C–0x3F  Interrupt Line / Pin,      Legacy INTx routing
           Min_Gnt / Max_Lat

BAR encoding (32-bit BAR):
bit 0:    0 = Memory BAR, 1 = I/O BAR
bits 2:1: 00 = 32-bit, 10 = 64-bit (BAR pair)
bit 3:    Prefetchable

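BAR size discovery: save the BAR, write 0xFFFFFFFF to it, read it back,
then restore it, with decoding disabled in the Command register while probing.
Address bits the device does not decode read back as 0 and the type bits are
read-only, so the remaining bits form a size mask whose two's complement is
the BAR size. Real silicon's size masks are device-specific; a spoofed BAR with
a 64 KB mask when the donor uses 4 KB is detectable in one operation.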
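The size arithmetic can be checked offline once the readback values are known. This is a minimal sketch: it shows only the decoding step, not the save/disable/restore sequence, and it assumes that I/O BARs decode 16 bits of address (the upper half may read as 0). `bar_size` is a hypothetical helper.

```python
def bar_size(low: int, high: int = 0) -> tuple:
    """Return (kind, size_in_bytes, prefetchable) from all-ones readback.

    low is the BAR's readback; high is the next BAR's readback and is only
    used when low describes a 64-bit memory BAR.
    """
    if low & 0x1:
        # I/O BAR: bits 1:0 are type bits; assumes 16-bit I/O decode.
        mask = low & 0xFFFC
        return ("io", ((~mask) & 0xFFFF) + 1 if mask else 0, False)
    prefetchable = bool(low & 0x8)
    if ((low >> 1) & 0x3) == 0b10:
        # 64-bit BAR: the pair forms one 64-bit mask.
        mask = (high << 32) | (low & 0xFFFFFFF0)
        size = ((~mask) & 0xFFFFFFFFFFFFFFFF) + 1 if mask else 0
        return ("mem64", size, prefetchable)
    mask = low & 0xFFFFFFF0
    size = ((~mask) & 0xFFFFFFFF) + 1 if mask else 0   # 0 = unimplemented BAR
    return ("mem32", size, prefetchable)
```

A readback of 0xFFFF0000 decodes to a 64 KB 32-bit window, while 0xFFFFF000 decodes to 4 KB, which is the one-read mismatch described above.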

Capabilities Chain

If Status[4] is set, 0x34 points to the first capability.
Each capability has a 2-byte header: [ID | Next].
Next is DWord-aligned in 0x40–0xFF, or 0x00 to terminate.

Common capability IDs:
ID    Capability
0x01  PCI Power Management
0x05  MSI
0x10  PCI Express
0x11  MSI-X
0x12  SATA Configuration
0x13  PCI Advanced Features
0x14  Enhanced Allocation

Detection: walk the chain, validate each capability's declared size
doesn't overlap the next, Next is DWord-aligned and within bounds,
no cycle exists. A malformed chain is itself a signal.
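The structural checks can be sketched over a 256-byte config-space snapshot. This version covers pointer alignment, the 0x40–0xFF window, and cycle detection. Per-capability size and overlap checks would need a size table per capability ID and are left out. `walk_capabilities` is an illustrative name.

```python
def walk_capabilities(cfg: bytes):
    """Return ([(offset, cap_id), ...], [problem, ...]) for the legacy chain."""
    if len(cfg) < 256:
        raise ValueError("need a 256-byte config-space snapshot")
    caps, problems = [], []
    status = cfg[0x06] | (cfg[0x07] << 8)
    if not status & 0x10:
        return caps, problems                 # Status[4] clear: no list
    seen = set()
    ptr = cfg[0x34]
    while ptr:
        if ptr & 0x3:
            problems.append(f"unaligned pointer 0x{ptr:02x}")
            ptr &= 0xFC                       # low two bits are reserved
        if ptr < 0x40:
            problems.append(f"pointer 0x{ptr:02x} outside 0x40-0xFF")
            break
        if ptr in seen:
            problems.append(f"cycle at 0x{ptr:02x}")
            break
        seen.add(ptr)
        caps.append((ptr, cfg[ptr]))          # header byte 0 = capability ID
        ptr = cfg[ptr + 1]                    # header byte 1 = Next pointer
    return caps, problems
```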

PCI Express Capability (ID 0x10)

The single most important capability for spoofing detection.

Offset  Field                    Notes
+0x02   PCIe Capabilities        Cap Version, Device/Port Type, Slot Impl
+0x04   Device Capabilities      MPS Supported, FLR, Phantom Functions
+0x08   Device Control           MPS current, MRRS, Error Enables
+0x0A   Device Status            CED, NFED, FED, URD, Transactions Pending
+0x0C   Link Capabilities        Max Link Speed/Width, ASPM, L0s/L1 latencies
+0x10   Link Control             ASPM Control, RCB, Link Disable, Retrain
+0x12   Link Status              Current Link Speed/Width, Link Training
+0x24   Device Capabilities 2    Completion Timeout Ranges, AtomicOp,
                                 OBFF, LTR mechanism
+0x28   Device Control 2         Completion Timeout Value, AtomicOp, LTR Enable
+0x2C   Link Capabilities 2      Supported Link Speeds Vector
+0x30   Link Control 2           Target Link Speed, Compliance
+0x32   Link Status 2            De-emphasis, EQ Phase status

Detection leverage per field:
- Device Type (+0x02[7:4]): must match donor's role
- MPS Supported (+0x04[2:0]): hard-IP ceiling contradicts donor
- FLR support (+0x04[28]): verify FLR changes same sticky/non-sticky
  state as claimed donor; naive firmware acknowledges FLR but continues
  unchanged, preserving impossible internal state
- Link Status (+0x12): Width/Speed are negotiated, observable, hard to
  lie about — hard IP reports what LTSSM actually achieved
- Slot Clock Config (+0x12[12]): must match real platform behavior
- Completion Timeout ranges (+0x24): selecting outside claimed ranges
  is a discriminator
- AtomicOp (+0x24[9:6]): server-class GPUs/NICs may support AtomicOps;
  FPGA hard IP almost never does, so a mismatch is detectable.

MSI and MSI-X Capabilities

MSI (ID 0x05):
Message Control bits:
  [0]    MSI Enable
  [3:1]  Multiple Message Capable (0–5, representing 1–32 vectors)
  [6:4]  Multiple Message Enable (cannot exceed Capable)
  [7]    64-bit Address Capable
  [8]    Per-Vector Masking Capable

x86 MSI Address: bits [31:20] fixed at 0xFEE (LAPIC prefix)
  [19:12] Destination ID, [3] Redirection Hint, [2] Destination Mode
Message Data: [15] Trigger Mode, [10:8] Delivery Mode, [7:0] Vector
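The Message Control, address, and data layouts above can be decoded together when auditing what a device was programmed to send. This is a sketch that assumes the plain LAPIC-format (non-remapped) address. With Interrupt Remapping enabled, the address uses a remappable format this code does not handle. `decode_msi` is a hypothetical helper; the delivery-mode names follow the x86 APIC encoding.

```python
DELIVERY_MODES = {0: "Fixed", 1: "Lowest Priority", 2: "SMI", 4: "NMI",
                  5: "INIT", 7: "ExtINT"}


def decode_msi(msg_ctl: int, addr: int, data: int) -> dict:
    capable = 1 << ((msg_ctl >> 1) & 0x7)     # Multiple Message Capable
    enabled = 1 << ((msg_ctl >> 4) & 0x7)     # Multiple Message Enable
    return {
        "enable": bool(msg_ctl & 0x1),
        "vectors_capable": capable,
        "vectors_enabled": enabled,
        "mme_exceeds_mmc": enabled > capable,  # illegal programming
        "addr64_capable": bool(msg_ctl & 0x80),
        "per_vector_mask": bool(msg_ctl & 0x100),
        "lapic_prefix_ok": ((addr >> 20) & 0xFFF) == 0xFEE,
        "dest_id": (addr >> 12) & 0xFF,
        "redirection_hint": bool(addr & 0x8),
        "dest_mode_logical": bool(addr & 0x4),
        "vector": data & 0xFF,
        "delivery_mode": DELIVERY_MODES.get((data >> 8) & 0x7, "reserved"),
        "level_triggered": bool(data & 0x8000),
    }
```

A device programmed with an SMI or NMI delivery mode, or writing outside the 0xFEE prefix, is worth flagging; the Interrupt Remapping discussion later explains why these writes matter.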

MSI-X (ID 0x11):
- Supports up to 2,048 vectors
- Table stored in BAR-mapped region (not Config Space)
- Each entry: 16 bytes (Addr Low, Addr High, Data, Vector Control)
- PBA (Pending Bit Array): bit-per-vector pending state

Naive MSI-X emulation failures:
- Ignores Vector Control Mask writes
- Sets PBA bits but never clears on unmask
- Returns hardcoded PBA values
- Doesn't retire pending interrupts when masks clear
Detection probe: mask vector → induce interrupt condition →
observe PBA bit → unmask → observe interrupt firing.
Real silicon satisfies this round trip; spoofed firmware rarely does.

AER Extended Capability (ID 0x0001)

Three error classes:
- Correctable: Receiver Error, Bad TLP, Bad DLLP, Replay Timer Timeout
- Uncorrectable Non-Fatal: Completion Timeout, Completer Abort, UR, ACS Violation
- Uncorrectable Fatal: Malformed TLP, DLL Protocol Error, Surprise Down

Uncorrectable errors have Status (sticky, W1C), Mask, and Severity registers;
correctable errors have only Status and Mask.
Header Log (16B) captures full TLP header of first logged uncorrectable error.

Detection:
- Absence of AER when donor model is known to expose it = mismatch
- Zero correctable-error count over long window when donor's silicon
  normally produces a baseline rate = anomalous
- Anomalous UR response patterns to probes of unimplemented offsets

Extended Capabilities

4-byte header at each offset:
[31:20] Next Capability Offset (0 to terminate)
[19:16] Capability Version
[15:0]  Extended Capability ID

Key Extended Capability IDs:
0x0001  AER
0x0002  Virtual Channel (VC)
0x0003  DSN (Device Serial Number, 8 bytes)
0x000B  Vendor-Specific Extended Capability (VSEC)
0x000D  ACS (Access Control Services)
0x000E  ARI
0x000F  ATS (Address Translation Services)
0x0010  SR-IOV
0x0015  Resizable BAR (RBAR)
0x0018  LTR (Latency Tolerance Reporting)
0x001B  PASID
0x001D  DPC (Downstream Port Containment)
0x001E  L1 PM Substates
0x001F  Precision Time Measurement (PTM)
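The extended list can be walked the same way as the legacy chain, starting at offset 0x100. This sketch assumes a 4 KB little-endian snapshot such as one read through ECAM. It names only the IDs in the table above, and it also pulls out the DSN value used for the collision check below. `walk_ext_caps` and `read_dsn` are illustrative names.

```python
import struct
from typing import List, Optional, Tuple

EXT_CAP_NAMES = {0x0001: "AER", 0x0002: "VC", 0x0003: "DSN", 0x000B: "VSEC",
                 0x000D: "ACS", 0x000E: "ARI", 0x000F: "ATS", 0x0010: "SR-IOV",
                 0x0015: "RBAR", 0x0018: "LTR", 0x001B: "PASID", 0x001D: "DPC",
                 0x001E: "L1 PM Substates", 0x001F: "PTM"}


def walk_ext_caps(cfg: bytes) -> List[Tuple[int, Optional[int], str]]:
    """Return (offset, capability ID, name) for each extended capability."""
    if len(cfg) < 4096:
        raise ValueError("need a 4096-byte config-space snapshot")
    found, seen, ptr = [], set(), 0x100
    while ptr:
        if ptr < 0x100 or ptr in seen:
            found.append((ptr, None, "malformed or cyclic pointer"))
            break
        seen.add(ptr)
        (hdr,) = struct.unpack_from("<I", cfg, ptr)
        if ptr == 0x100 and hdr in (0x00000000, 0xFFFFFFFF):
            break                                 # no extended capabilities
        cap_id = hdr & 0xFFFF                     # [15:0] capability ID
        found.append((ptr, cap_id, EXT_CAP_NAMES.get(cap_id, "other")))
        ptr = (hdr >> 20) & 0xFFC                 # [31:20] next, low bits reserved
    return found


def read_dsn(cfg: bytes, caps) -> Optional[int]:
    """Return the 64-bit Device Serial Number if a DSN capability exists."""
    for offset, cap_id, _ in caps:
        if cap_id == 0x0003:
            (serial,) = struct.unpack_from("<Q", cfg, offset + 4)
            return serial
    return None
```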

Detection-relevant:
- DSN: 8-byte unique serial; donor-cloned firmware can collide
  with another player's identical card
- VSEC: Xilinx PCIe IP optionally emits VSEC blocks with
  characteristic Vendor ID + VSEC ID combinations
- ATS/PASID/SR-IOV presence on consumer-class donor is
  demographically suspicious — rare outside server-class hardware

IOMMU Architecture

Translation Flow

1. Device issues Memory TLP with target IOVA.
   TLP header carries 16-bit Requester ID (BDF).
2. TLP travels upstream through switches/bridges to root complex.
3. IOMMU intercepts, uses Requester ID to look up translation context.
4. IOMMU walks device's I/O page tables: IOVA → physical address.
5. Permission bits (Read, Write) checked against access type.
6. Success: TLP forwarded with translated physical address.
7. Failure: fault logged; a read receives a UR or CA completion, while a
   posted write is dropped.

Intel VT-d Internals

Two-level table lookup:

BDF → Root Table (256 entries, 16B each, indexed by Bus)
    → Context Table (256 entries, 16B each, indexed by Dev:Func)
      → Second-Level Page Tables (3–5 levels)
        → Final 4 KB physical page

Context Entry fields:
- SLPTPTR: Second-Level Page Table Pointer
- Domain ID: 16-bit (multiple devices can share a domain)
- AW: Address Width (3/4/5-level = 39/48/57-bit IOVA)
- T: Translation Type (untranslated-only, translated-only, or both)
- P: Present
- FPD: Fault Processing Disable

Page table entries (PTE, EPT-like format):
[0]      R - Read permission
[1]      W - Write permission
[7]      PS - Page Size (1=leaf super-page, 0=next-level table)
[N-1:12] Physical address of next-level table or 4 KB page

Super-pages: level-2 leaf = 2 MB, level-3 leaf = 1 GB.
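The legacy-mode lookup above can be split into concrete table offsets and indices, which helps when reading a dumped root table or page table by hand. This is a sketch under stated assumptions: legacy (non-scalable) root and context tables, 3/4/5-level second-level tables, and 4 KB leaves (a super-page PS bit would end the walk earlier). `vtd_indices` is a hypothetical helper.

```python
def vtd_indices(bus: int, dev: int, func: int, iova: int, levels: int = 4) -> dict:
    """Compute root/context entry offsets and page-table indices for one IOVA."""
    if levels not in (3, 4, 5):
        raise ValueError("VT-d second-level tables use 3, 4 or 5 levels")
    aw_bits = {3: 39, 4: 48, 5: 57}[levels]
    if iova >> aw_bits:
        raise ValueError(f"IOVA exceeds {aw_bits}-bit address width")
    devfn = (dev << 3) | func
    # Each table holds 512 eight-byte entries, so each level consumes 9 bits.
    pt = [(iova >> (12 + 9 * lvl)) & 0x1FF for lvl in reversed(range(levels))]
    return {
        "root_entry_offset": bus * 16,        # 256 x 16-byte root entries
        "context_entry_offset": devfn * 16,   # 256 x 16-byte context entries
        "page_table_indices": pt,             # top level first
        "page_offset": iova & 0xFFF,
    }
```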

Scalable Mode (VT-d 3.0+):
Context Entry → PASID Directory → PASID Table → per-PASID
first-level page-table roots. Enables Shared Virtual Memory (SVM).
Check RTADDR_REG.TTM to determine which mode is in effect.

AMD-Vi Internals

Single-level Device Table indexed directly by BDF:

BDF → Device Table Entry (32 bytes)
    → I/O Page Tables (1–6 levels)
      → Final page

DTE encodes:
- Page Table Root Pointer
- Mode (0–6, selects paging levels)
- Domain ID (16 bits)
- IR, IW — Default Read/Write permission
- GV — Guest Valid (nested translation)
- PASID-related fields

Page sizes: 4 KB, 2 MB, 1 GB.

IOTLB and Invalidation

Translations cached in IOTLB (I/O Translation Lookaside Buffer).
When mappings change, IOTLB must be invalidated.

Two distinct caches when ATS is in use:
- IOMMU's own IOTLB
- Device-side TLB (DevTLB) caching prior translations

Full invalidation with ATS requires:
1. IOMMU invalidates own IOTLB
2. IOMMU sends ATS Invalidate Request Message to device
3. Device drops affected DevTLB entries, replies with Invalidate Completion

If step 2 or 3 is skipped, device retains stale translations
and can DMA to unmapped addresses.

VT-d invalidation granularities:
- Global: flush entire IOTLB
- Domain-Selective: flush all entries for a Domain ID
- Page-Selective: flush specific IOVA range in a domain

Strict vs lazy invalidation:
Lazy mode defers IOTLB invalidation, batching them for performance.
Opens a window where stale translations remain valid — a device
whose driver has unmapped a buffer can still DMA to the old IOVA.

Fault Recording

VT-d: Fault Recording Registers — circular array capturing
  Requester ID, faulting IOVA, fault reason, TLP type.

AMD-Vi: Event Log Buffer — producer-consumer ring buffer of
  IO_PAGE_FAULT, INVALID_DEVICE_REQUEST, ATS-related events.

Both surface faults via interrupts and event-log entries.
On Windows, some IOMMU violations are observable through WHEA/bug-check
paths and Driver Verifier DMA-violation telemetry.

Per-device fault rate is one of the most operationally useful
IOMMU-layer signals. Legitimate devices with correct drivers
rarely produce faults; sustained nonzero rate is direct evidence
of out-of-domain access attempts.
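A per-device fault rate is a simple windowed count keyed by Requester ID. This sketch assumes a hypothetical record format of (timestamp in seconds, 16-bit BDF), standing in for whatever the platform actually exposes (VT-d fault records, AMD-Vi event log entries, or OS telemetry). The default threshold is a placeholder to be tuned against a baseline of legitimate devices.

```python
from collections import Counter
from typing import Iterable, Tuple


def bdf_str(bdf: int) -> str:
    """Format a 16-bit Requester ID as bus:device.function."""
    return f"{bdf >> 8:02x}:{(bdf >> 3) & 0x1F:02x}.{bdf & 0x7}"


def noisy_requesters(records: Iterable[Tuple[float, int]], now: float,
                     window_s: float = 60.0, max_per_min: float = 1.0) -> dict:
    """Return {bdf: faults per minute} for requesters above the threshold."""
    counts = Counter(bdf for ts, bdf in records if now - window_s <= ts <= now)
    rate = {bdf: n * 60.0 / window_s for bdf, n in counts.items()}
    return {bdf_str(b): r for b, r in rate.items() if r > max_per_min}
```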

RMRR/IVMD:
ACPI DMAR table contains RMRR (Reserved Memory Region Reporting)
sub-tables declaring physical ranges devices need identity-mapped.
AMD-Vi has analogous IVMD (I/O Virtualization Memory Definition)
in the IVRS table. A defender should enumerate these and reject
configurations where suspect BDFs appear in RMRR scope or
RMRR ranges overlap game memory regions.
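The RMRR audit reduces to two checks per region: scope membership and range overlap. This sketch assumes the regions have already been parsed from the DMAR table (or converted from IVMD start/length pairs) into inclusive (base, limit, scope-BDF-set) tuples. The protected ranges and suspect BDFs are inputs the defender supplies. `audit_reserved_regions` is a hypothetical helper.

```python
from typing import List, Set, Tuple


def audit_reserved_regions(regions: List[Tuple[int, int, Set[int]]],
                           protected: List[Tuple[int, int]],
                           suspect_bdfs: Set[int]) -> List[str]:
    """Flag reserved regions that scope suspect devices or overlap protected memory."""
    findings = []
    for base, limit, scope in regions:
        if scope & suspect_bdfs:
            findings.append(f"region {base:#x}-{limit:#x} scopes suspect device(s)")
        for p_base, p_limit in protected:
            if base <= p_limit and p_base <= limit:   # inclusive-range overlap
                findings.append(f"region {base:#x}-{limit:#x} overlaps "
                                f"protected {p_base:#x}-{p_limit:#x}")
    return findings
```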

IOMMU Topology and Isolation

IOMMU Groups

Devices in the same IOMMU group may not be safely isolated from
one another. Group membership determined by:
- PCIe topology — devices behind a switch share a group
  unless the switch supports and enables ACS
- ACS state of upstream bridges
- Quirks for known-broken hardware

Linux: /sys/kernel/iommu_groups/N/devices/
Windows: equivalent constraints but no simple public group filesystem
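On Linux, the group layout can be read straight from that sysfs path, where each entry under `devices/` is named after a PCI address such as `0000:03:00.0`. This is a minimal sketch. The root directory is a parameter so the same code can run against a copied tree. `iommu_groups` and `shared_groups` are illustrative names.

```python
import os


def iommu_groups(root: str = "/sys/kernel/iommu_groups") -> dict:
    """Map group number -> sorted list of PCI addresses in that group."""
    groups = {}
    if not os.path.isdir(root):
        return groups                  # IOMMU disabled, or not Linux
    for name in os.listdir(root):
        dev_dir = os.path.join(root, name, "devices")
        if name.isdigit() and os.path.isdir(dev_dir):
            groups[int(name)] = sorted(os.listdir(dev_dir))
    return groups


def shared_groups(groups: dict, bdf: str) -> list:
    """Other devices sharing an IOMMU group with bdf (not isolated from it)."""
    return [d for devs in groups.values() if bdf in devs for d in devs if d != bdf]
```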

ACS (Access Control Services, Extended Cap ID 0x000D)

ACS is a PCIe capability that switches/root ports advertise
to declare they can enforce isolation between downstream ports.

ACS Capability / ACS Control register bits (the Capability register
advertises each feature; the Control register enables it at the same bit):
Bit  Feature                      Effect
0    Source Validation (SV)       Drop TLPs whose Requester ID bus is outside the port's range
1    Translation Blocking (TB)    Block AT=10 (Translated) TLPs
2    P2P Request Redirect (RR)    Force P2P requests upstream for IOMMU
3    P2P Completion Redirect (CR) Force P2P completions upstream
4    Upstream Forwarding (UF)     Forward upstream regardless
5    P2P Egress Control (EC)      Allow/deny P2P routing per-port
6    Direct Translated P2P (DT)   Allow P2P with translated addresses

Critical for untrusted endpoints: SV, TB, RR, and CR.
A switch missing Source Validation lets a malicious device spoof
its Requester ID, defeating per-BDF IOMMU translation.
A switch missing P2P Request Redirect allows devices on the same
switch to DMA directly to each other without IOMMU involvement.
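Verifying the bridge path then comes down to reading each bridge's 16-bit ACS Control register (ACS extended capability +0x06) and comparing it with the required set. This sketch encodes the policy stated above (SV, TB, RR, CR) rather than a universal rule. How the register values are obtained is left to the caller, and both function names are hypothetical.

```python
ACS_BITS = {0: "SV", 1: "TB", 2: "RR", 3: "CR", 4: "UF", 5: "EC", 6: "DT"}
REQUIRED = ("SV", "TB", "RR", "CR")


def missing_acs(acs_ctl: int) -> list:
    """Required ACS features not enabled in an ACS Control value."""
    enabled = {name for bit, name in ACS_BITS.items() if (acs_ctl >> bit) & 1}
    return [name for name in REQUIRED if name not in enabled]


def audit_path(bridges: dict) -> dict:
    """bridges maps a bridge name to its ACS Control value, or None when the
    bridge has no ACS capability at all. Returns only the failing bridges."""
    return {b: (list(REQUIRED) if ctl is None else missing_acs(ctl))
            for b, ctl in bridges.items() if ctl is None or missing_acs(ctl)}
```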

Peer-to-Peer DMA

Devices on the same PCIe tree can send Memory TLPs directly to
each other's BAR ranges without involving system memory.
Without ACS forcing redirection, P2P TLPs never reach the IOMMU.

Plausible P2P DMA targets for a cheat:
- GPU framebuffer — rendered game state
- Network adapter ring buffers — game traffic
- USB controller queues — input device data

Mitigation: ACS Translation Blocking + P2P Request Redirect
on every intermediate bridge. Defender must walk topology and
confirm both bits are active.

Interrupt Remapping

MSI/MSI-X interrupts are Memory Writes to 0xFEE00000–0xFEEFFFFF.
Without Interrupt Remapping (IR), any device with Bus Master enabled
can write to this range and trigger arbitrary interrupts — NMIs, SMIs,
or vectors targeting wrong CPU.

With IR enabled, the IOMMU validates MSI/MSI-X writes and uses
remapping-table state to determine the permitted destination.
VT-d specifies Interrupt Remapping alongside DMA Remapping, and AMD-Vi
provides an equivalent interrupt remapping table. Both DMA and interrupt
remapping should be mandatory in any anti-cheat threat model.

ATS, PASID, and Address Translation Trust

ATS (Address Translation Services, Extended Cap ID 0x000F)

ATS lets a device cache IOMMU translations locally:
1. Device issues Translation Request TLP (AT=01) with IOVA
2. IOMMU translates and responds with Translation Completion
   carrying physical address
3. Device caches translation in Device-side TLB (DevTLB)
4. Subsequent accesses issued with AT=10 (Translated) —
   IOMMU bypasses page-walk, trusting device's cached translation
5. On mapping changes, IOMMU sends Invalidation Request

Attack surface: malicious device claiming ATS can present
arbitrary AT=10 TLPs whose addresses were never approved
by the IOMMU. The IOMMU forwards them trusting the device's claim.

PASID (Extended Cap ID 0x001B)

Extends ATS to per-process address spaces. 20-bit PASID carried
in a TLP Prefix. IOMMU uses (Requester ID, PASID) jointly
to select translation context.

PASID enables Shared Virtual Memory (SVM) — primarily found in
datacenter NICs, AI accelerators. Presence on a consumer card
is anomalous.

ATS Trust Model and "ATS Untrusted" Mode

The fundamental trust assumption is that the device honestly reports
translations it has been granted. That assumption is unreasonable for
external Thunderbolt enclosures, FPGAs in M.2 slots, or untrusted
accelerator cards.

Modern OS/IOMMU stacks can treat endpoints as ATS-untrusted:
ATS is disabled, blocked by policy, or stripped.
Linux: pci=noats plus per-device quirks.
Windows: Kernel DMA Protection / DMAGuard matters, but don't
treat "Kernel DMA Protection: On" as proof every internal
endpoint is ATS-untrusted. Verify ATS state per endpoint.

Driver–IOMMU Contract and Bypass Catalog

Legitimate DMA Path (Windows)

1. Acquire DMA adapter: IoGetDmaAdapter / WDF wrapper
2. Allocate buffer: MmAllocateContiguousMemorySpecifyCacheNode
   or WdfCommonBufferCreate
3. Map for DMA: AllocateCommonBuffer / MapTransferEx
   - OS allocates IOVA from device's domain
   - Creates IOMMU page-table entries: [IOVA, IOVA+size) → physical pages
   - Returns IOVA to driver
4. Program device: driver writes IOVA into device's BAR registers
5. Device DMAs: TLPs arrive at IOMMU with BDF + IOVA
6. IOMMU translates: page-walk produces physical address
7. Completion and unmap: teardown IOMMU entries + IOTLB invalidation

In this model, device can DMA only to addresses the driver
explicitly mapped. Game memory is not in that range.

Six Paths to Out-of-Domain Access

1. IOMMU not active or not applied to this path
   VT-d/AMD-Vi disabled, OS not enforcing, device outside protected ports

2. Pre-boot DMA injection
   Inject before IOMMU initialized; requires firmware-level exploit

3. Identity-mapped / passthrough domains
   Legacy drivers request 1:1 mapping; modern strict-mode rejects it

4. Driver mapping over-allocation (Thunderclap class)
   OS maps full 4 KB page when buffer is smaller; adjacent kernel data exposed

5. Legitimate-path data exfiltration
   Cheat spoofed as NIC; OS network stack passes game packets through
   NIC's RX ring buffer (legitimately IOMMU-mapped). Cheat reads game data
   without leaving allowed mappings. Undetectable at IOMMU layer.

6. IOMMU page-table manipulation via kernel compromise
   BYOVD / vulnerable driver reprograms IOMMU tables.
   Requires code execution on gaming PC.

Approaches 1–3 are the foundation of most current DMA cheats.

IOMMU Bypass Catalog (16 Techniques)

#   Technique                   Mechanism                           Mitigation
─────────────────────────────────────────────────────────────────────────────────
1   IOMMU disabled              VT-d/AMD-Vi off in BIOS             Refuse misconfigured platforms
2   Pre-boot DMA                Firmware leaves injection window    UEFI updates; verify ACPI indicators
3   Identity/passthrough        1:1 IOVA-to-physical mapping        Strict-mode IOMMU policy
4   Driver over-allocation      Full 4 KB page, adjacent data       OS bounce buffers; strict mappings
5   ATS abuse                   AT=10 TLPs with arbitrary addrs     ATS Untrusted mode for non-allowlisted
6   ACS missing on bridge       P2P or spoofed Requester ID         Verify ACS state on all bridges
7   Lazy IOTLB invalidation     Stale translations valid briefly    Strict invalidation mode
8   FLR race                    FLR/Hot Reset race window           Synchronized FLR handling
9   SMM bypass                  SMM code exempt from IOMMU          Boot Guard / Platform Secure Boot
10  DMA-remapping driver bugs   Bugs in OS IOMMU manager            OS patching
11  Hypervisor escape           Compromised hypervisor              VBS / measured boot; TPM attestation
12  Interrupt injection (no IR) Write arbitrary interrupts          Mandatory IR enforcement
13  RMRR/IVMD scope abuse       Fake ACPI tables cover attacker     Measured boot; runtime RMRR audit
                                physical ranges
14  Snoop-bit manipulation      Stale cache lines visible           Strict snoop enforcement
15  PASID confusion             Misconfigured PASID Table           PASID-aware IOMMU programming
16  DMAR/IVRS spoofing          Compromised firmware, fake tables   Measured boot covering firmware

Techniques 1–6: active attack surface for current commercial DMA cheats
Techniques 7–13: academic, APT, firmware-level contexts
Techniques 14–16: largely theoretical

FPGA Hardware

Xilinx PCIe Integrated Block

Hardened IP block handling:
- Physical Layer (PHY, 8b/10b or 128b/130b, LTSSM, equalization)
- Data Link Layer (sequence numbers, replay buffer, flow control)
- Transaction Layer framing and parsing
- Subset of Configuration Space

IP core documentation:
- PG054 for 7-series
- PG156 for UltraScale Gen3
- PG213 for UltraScale+ Gen4

User logic interfaces over AXI-Stream (TX/RX) and separate
config management: cfg_mgmt_* (7-series), cfg_ext_* (UltraScale).

Detection consequences:
- Default fingerprints leak through: hard block populates Config Space
  with Xilinx-characteristic byte patterns
- 7-series firmware authors who don't understand cfg_mgmt_* leave
  subtle behavioral differences (some CfgTLPs return hard-block defaults)

FPGA Family Hierarchy

Artix-7 (consumer/mid-range, GTP transceivers, PCIe Gen2):
Chip       LUTs      BRAM(Kbit)  PCIe Hard Block
XC7A35T    20,800    1,800       Gen2 x4
XC7A50T    32,600    2,700       Gen2 x4
XC7A75T    47,200    3,780       Gen2 x4
XC7A100T   63,400    4,860       Gen2 x4
XC7A200T   134,600   13,140      Gen2 x4
(Smaller than T35 have no hard PCIe block)

Kintex-7 (high-end, GTX transceivers; hard block is Gen2 only, Gen3 needs Virtex-7 XT/HT):
XC7K70T    41,000    4,860       Gen2 x8
XC7K160T   101,400   11,700      Gen2 x8
XC7K325T   203,800   16,020      Gen2 x8
XC7K410T   254,200   28,620      Gen2 x8

Zynq UltraScale+ (ARM Cortex-A53 cores, GTH/GTY):
ZU2EG/CG   ~47,000   ~5.3M      Gen3 x4
ZU3EG/CG   ~70,000   ~7.6M      Gen3 x4
ZU4EG/EV   ~88,000   ~11.0M     Gen3 x8
ZU5EG/EV   ~117,000  ~18.0M     Gen3 x8
ZU6EG/CG   ~230,000  ~32.1M     Gen3 x16
(EV-suffixed parts add a hardened H.264/H.265 video codec unit, relevant to DMA + video-capture boards)

Resource Constraints and Capability

BRAM size bounds what fits:
  shadow config + writable overlay + BAR emulation + state machines.
  T35 (1.8 Mbit) struggles with full 4 KB shadow + 64 KB BAR + jitter buffers.
  T100 (4.86 Mbit) fits comfortably.
  Zynq ZU3 (7+ Mbit) has effectively unlimited room.

LUT count caps behavioral complexity:
  Each subsystem (MSI generator, ASPM FSM, AER counter, BAR responder)
  costs thousands of LUTs. T35 holds 1–2; T100 the full set;
  Kintex/Zynq adds runtime-reconfigurable parameter tables.

PHY transceiver family (GTP/GTX/GTH/GTY) has measurably different
signal characteristics; can sometimes be inferred from root-port
performance counters independent of firmware spoofing.

Form Factors

Form Factor           Description                  Detection
────────────────────────────────────────────────────────────────────
M.2 NGFF Key M        Internal NVMe slot           Dominant modern form;
                                                   physically invisible
M.2 + USB3 bridge     M.2 board with FT601         Gaming PC sees only M.2
PCIe x1/x4 add-in     Traditional add-in card      More physically visible
External USB3         USB3-to-PCIe (legacy)        Mostly obsolete
Combo boards          DMA + HDMI capture +         Complex device tree;
                      input injection              HDMI activity is a fingerprint

M.2 slot populations are partially auditable from software through
PCI topology, ACPI, SMBIOS, storage inventory, and vendor board databases.
SMBIOS slot records are often incomplete for M.2, so detection should
be probabilistic and board-model-aware.

pcileech Framework

Project Lineage

Four upstream repositories, plus the analysis library built inside MemProcFS:
- pcileech:       Host-side C application with attack modules
- pcileech-fpga:  FPGA firmware in Verilog/SystemVerilog, per-board variants
- MemProcFS:      Virtual filesystem mounting target memory as /proc-like tree
- LeechCore:      Low-level device abstraction library
- vmm:            Memory analysis engine (vmm.dll API), developed in the MemProcFS repository

Pipeline: FPGA → LeechCore → PCILeech attack modules / MemProcFS analysis

FPGA Firmware Architecture

Key modules:
- pcileech_pcie_a7.v / _us.v:        Top-level Artix-7 / UltraScale integration
- pcileech_pcie_tlps128_bram_rdwr.v:  128-bit TLP source/sink (AXI-Stream)
- pcileech_pcie_cfgspace_shadow.v:    Shadow config space in BRAM
- pcileech_cfgspace.coe:              Init data (stock: Xilinx 10EE:0666)
- pcileech_bar_impl_zerowrite4k.v:    Default BAR — absorbs writes, returns zero
- pcileech_bar_impl_loopaddr.v:       Alternative BAR — echoes address
- pcileech_bar_impl_none.v:           Disables BAR (returns UR)
- pcileech_pcie_cfg_a7.v:             Config management via cfg_mgmt_*
- pcileech_mux.v:                     TLP multiplexer
- pcileech_fifo.v:                    Internal staging FIFO

Two key architectural choices:
1. Shadow config is spoofable but not spoofed by default.
   .coe ships with placeholder Xilinx IDs. User must overwrite
   with real donor's dump and resynthesize.
2. BAR controller is functionally inert.
   zerowrite4k doesn't emulate device behavior.
   Active BAR probing catches stock builds in one operation.

Host-Side MemProcFS

Mounts target memory as filesystem:
M:\
├── pid\1234\
│   ├── name.txt
│   ├── modules\       ← loaded module list
│   ├── handles\
│   ├── vad\           ← virtual address descriptors
│   ├── memmap.txt
│   └── minidump\
├── sys\
├── name\game.exe\     ← lookup by process name
└── forensic\
    ├── yara\
    ├── timeline\
    └── registry\

Cheat development pattern:
1. Development phase: MemProcFS, signature search, cross-references
   → slow, broad scanning to find entity manager / player array / view matrix
2. Execution phase: custom app via vmm.dll/LeechCore,
   periodic reads of known offsets at 60–240 Hz
This split is fundamental to detection — behavioral analysis
targets the execution phase's statistical signature.

Stock Firmware Fingerprints

Vanilla pcileech-fpga build exhibits:
- VID/DID 10EE:0666 (Xilinx placeholder)
- Xilinx 7-series PCIe IP signature bytes at characteristic offsets
- DSN Extended Capability absent or default
- No AER, LTR, ARI, ATS, or SR-IOV capabilities
- BAR0 mapped (DMA window); BAR1–5 disabled or all-ones
- BAR reads return zero (zerowrite4k) or echo address (loopaddr)
- MSI capability present but no interrupts ever fire
- Config reads complete in deterministically uniform time
  (BRAM lookup with fixed pipeline depth, near-zero variance)
- LTSSM never leaves L0 after training; no ASPM transitions
- AER correctable-error count stays at zero
- Power management never leaves D0
- Class Code is a placeholder with no class-specific behavior behind it

Configuration Space Spoofing

Bridge vs Emulated Firmware

Bridge firmware:
  Patches identity fields via Vivado's PCIe IP Core GUI
  (VID, DID, Subsystem IDs, Class Code, sometimes DSN).
  Fast to produce, but 7-series hard IP generates internal capability
  blocks at characteristic offsets that retain FPGA-specific fingerprints.

Emulated (1:1) firmware:
  Implements complete shadow Configuration Space in BRAM.
  Entire 4 KB extended config space initialized from real donor device hex dump.
  When OS issues CfgRd TLP, firmware responds from BRAM.
  IP Core's default registers never appear on the bus.

  Common bugs in emulated firmware:
  - First 16 bytes still come from IP block (mux priority)
  - Type 1 config reads not intercepted
  - Capability blocks bypassed in GUI still leak defaults

Shadow Configuration Space Implementation

Requirements:
1. Intercept incoming CfgRd0/CfgWr0 TLPs
2. Decode target offset
3. Look up value in BRAM
4. Build Completion TLP with correct Completer ID, status, payload
5. Send Completion through hard IP block

4 KB coverage at 4-byte granularity = 1,024 entries × 4 bytes = 4 KB (32 Kbit),
a single 36 Kb block RAM. Well within even T35's 1,800 Kbit.
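The five steps can be modeled in software to make the header bit layout concrete. The C sketch below is a host-side model, not the pcileech-fpga HDL: it assumes a 3-DW CfgRd0 request header, ignores byte enables, and always answers with Successful Completion. `shadow_cfgrd0` and `cpl_tlp` are hypothetical names; field positions follow the PCIe base specification's Configuration Request and Completion header formats.

```c
/* Sketch only: names are hypothetical; byte enables and error statuses are ignored. */
#include <assert.h>
#include <stdint.h>

#define CFG_DWORDS 1024u   /* 4 KB config space at 4-byte granularity */

typedef struct {
    uint32_t dw0, dw1, dw2;  /* 3-DW completion header */
    uint32_t payload;        /* single DWORD of read data */
} cpl_tlp;

/* Answer a CfgRd0 (req = 3-DW request header) from the shadow array.
 * Returns the DWORD index served, or -1 if the TLP is not a CfgRd0. */
int shadow_cfgrd0(const uint32_t req[3], const uint32_t shadow[CFG_DWORDS],
                  uint16_t completer_id, cpl_tlp *out)
{
    assert(req && shadow && out);
    if ((req[0] >> 24) != 0x04u)                    /* Fmt=000b, Type=00100b: CfgRd0 */
        return -1;

    uint16_t requester_id = (uint16_t)(req[1] >> 16);
    uint8_t  tag          = (uint8_t)(req[1] >> 8);
    unsigned ext_reg      = (req[2] >> 8) & 0xFu;   /* Extended Register Number */
    unsigned reg          = (req[2] >> 2) & 0x3Fu;  /* Register Number */
    unsigned index        = (ext_reg << 6) | reg;   /* 0..1023 */

    out->dw0 = (0x4Au << 24) | 1u;                   /* CplD: Fmt=010b, Type=01010b, Length=1 */
    out->dw1 = ((uint32_t)completer_id << 16) | 4u;  /* Status=000b (SC), Byte Count=4 */
    out->dw2 = ((uint32_t)requester_id << 16) | ((uint32_t)tag << 8);  /* Lower Address=0 */
    out->payload = shadow[index];
    return (int)index;
}
```

Indexing by `(ext_reg << 6) | reg` is why exactly 1,024 DWORD entries cover the whole 4 KB space; a firmware bug that drops the Extended Register Number would alias every extended capability onto the first 256 bytes.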

Overlay RAM and Writable Register Emulation

Real devices have writable registers. Firmware that returns correct
values on reads but drops writes creates detectable inconsistency.

Detection probe:
  write Command[BME] = 1 → read Command[BME]
  write Command[BME] = 0 → read Command[BME]
  Real silicon: bit toggles. Naive shadow: bit stays at BRAM init value.

Overlay RAM merges at read time:
  response = (base_value & ~writable_mask) | (overlay_value & writable_mask)

The catch: writable mask is register-specific:
- Command Register: different reserved bits than Device Control
- MSI Address Low: bits [1:0] reserved-zero
- BAR: low bits are read-only type bits ([3:0] for memory: space, 32/64-bit,
  prefetchable; [1:0] for I/O), and address bits below the BAR size read as 0
- Status Register: W1C bits — writing 1 clears, writing 0 no change
- AER Status: W1C across the board

Naive implementations with a single global mask fail because
reserved-bit and W1C behavior diverges. Detection probes the W1C cases:
write 0x00000000 to Correctable Error Status and confirm nothing changes,
then write 1s over set bits and confirm they clear (a naive overlay
latches the written value instead).
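A software model makes the per-register masks and the probe concrete. The sketch below is hypothetical (`cfg_reg`, `reg_port`, `probe_register` and the writer callbacks are illustrative names, and the masks are whatever the donor's reference data says): `reg_write` implements the overlay merge above plus W1C clearing, and `probe_register` runs the toggle, reserved-bit, and W1C checks against any register reachable through read/write callbacks. On a live device, setting every RW bit at once has side effects, so a real probe would work bit by bit.

```c
/* Sketch only: names are hypothetical; masks must come from donor reference data. */
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t value;     /* current read-back value */
    uint32_t rw_mask;   /* plain read/write bits */
    uint32_t w1c_mask;  /* write-1-to-clear bits */
} cfg_reg;

/* RW bits take the written value (the overlay merge), W1C bits clear where
 * a 1 is written, read-only and reserved bits are left alone. */
void reg_write(cfg_reg *r, uint32_t data)
{
    uint32_t v = (r->value & ~r->rw_mask) | (data & r->rw_mask);
    v &= ~(data & r->w1c_mask);
    r->value = v;
}

/* Read/write callbacks for the register under test (device or model). */
typedef struct {
    uint32_t (*read)(void *ctx);
    void (*write)(void *ctx, uint32_t data);
    void *ctx;
} reg_port;

uint32_t model_read(void *ctx) { return ((cfg_reg *)ctx)->value; }
void model_write(void *ctx, uint32_t d) { reg_write((cfg_reg *)ctx, d); }
/* Single-global-mask overlay: every bit simply latches the written value. */
void naive_write(void *ctx, uint32_t d) { ((cfg_reg *)ctx)->value = d; }
/* Shadow without overlay: writes are silently dropped. */
void drop_write(void *ctx, uint32_t d) { (void)ctx; (void)d; }

enum { FAIL_RW_SET = 1, FAIL_RW_CLEAR = 2, FAIL_W1C_ZERO = 4,
       FAIL_RESERVED = 8, FAIL_W1C_ONE = 16 };

/* Returns 0 if the register behaves per the expected masks, else an OR of
 * FAIL_* codes. Restores the RW bits to their original state at the end. */
int probe_register(const reg_port *p, uint32_t rw_mask, uint32_t w1c_mask,
                   uint32_t reserved_mask)
{
    assert(p && p->read && p->write);
    int fail = 0;
    uint32_t before = p->read(p->ctx);

    p->write(p->ctx, rw_mask);                      /* set every RW bit */
    if ((p->read(p->ctx) & rw_mask) != rw_mask) fail |= FAIL_RW_SET;

    p->write(p->ctx, 0);                            /* clear RW; 0s must not touch W1C */
    uint32_t r = p->read(p->ctx);
    if (r & rw_mask) fail |= FAIL_RW_CLEAR;
    if ((r & w1c_mask) != (before & w1c_mask)) fail |= FAIL_W1C_ZERO;

    p->write(p->ctx, reserved_mask);                /* reserved bits must stay 0 */
    if (p->read(p->ctx) & reserved_mask) fail |= FAIL_RESERVED;

    p->write(p->ctx, w1c_mask);                     /* 1s must clear every W1C bit */
    if (p->read(p->ctx) & w1c_mask) fail |= FAIL_W1C_ONE;

    p->write(p->ctx, before & rw_mask);             /* restore RW state */
    return fail;
}
```

Applied to a Command/Status DWORD (RW mask 0x547 for the Command bits, W1C mask 0xF9000000 for the Status error bits), a correct model returns 0, `naive_write` reports the W1C and reserved-bit failures, and `drop_write` reproduces the silent-write-drop failure.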

Donor Card Extraction

Every serious emulated firmware starts with a donor — a physical
PCIe card whose complete identity is cloned. Not just VID/DID —
entire 4 KB config space, all capabilities, BAR size masks,
MSI/MSI-X table layouts, all extended capabilities, DSN.

Extraction tools:
- lspci -d [VID:DID] -vvv -xxxx (full ECAM dump)
- setpci -s <BDF> ECAP_AER+0x4.L
- Direct ECAM mapping in kernel-mode tools
- VFIO passthrough + programmatic dump

Donor selection pitfalls:
- Duplicating hardware already in gaming PC → caught instantly
- Must match FPGA's actual PCIe capability (an x16 Gen4 donor cloned onto a Gen2 x1 FPGA link fails)
- Silicon generation must align with claimed link speed
- Class-functional A/B testing detects non-functional donors

Firmware Sophistication Tiers

Tier  Description              Detection Difficulty
──────────────────────────────────────────────────────────────────
0     Stock                    Trivial — VID/DID blacklist
      Unmodified upstream;
      10EE:0666 IDs

1     Bridge                   Easy — signature residue,
      .coe replaced with       driverless device, BME clearance
      donor VID/DID only

2     Shadow config            Medium — R/W consistency probe
      Full 4 KB shadow;        catches silent write drops
      no overlay RAM

3     Overlay RAM              Medium-hard — BAR probe still fails;
      Shadow + writable layer  class-functional A/B fails
      with per-register masks

4     BAR MMIO + MSI           Hard — interrupt distribution analysis;
      Donor-class BAR memory   BAR-region content baselining
      responder + MSI generator

5     Behavioral emulation     Very hard — multi-session statistical
      Latency jitter, ASPM,    analysis required
      AER counters, donor-
      class-specific behavior

6     Private + randomized     Currently undefeated by hardware-layer
      Per-customer layout,     signatures alone
      capability shuffling,
      custom hard-IP integration

Each tier transition corresponds to a defender technique
that closed the prior tier's escape hatch.
Cost-to-implement is nonlinear: 0–2 are public/trivial,
3 needs moderate Verilog, 4–5 need deep PCIe knowledge,
6 is sustained engineering.

Donor Exhaustion

Each detected donor class becomes nonviable. Remaining viable donors
trend toward obscure industrial cards — which themselves become
a behavioral signal on consumer gaming PCs.

Donor Class            Detection Status
─────────────────────────────────────────────────────────
Wi-Fi adapters         Heavily detected; class-spoof checks ubiquitous
Wired NICs             "NIC with loaded driver but zero packets" test
NVMe SSDs              Harder; detected via missing namespace activity
Audio codecs           Lower bandwidth fits FPGA; class checks present
USB host controllers   Child-device enumeration check breaks naive clones
Capture cards          Harder (genuinely idle when no source)
Industrial / OEM SKUs  Increasingly the only viable class; itself a demographic signal
Server-class accel.    Physically implausible on consumer boards

Detection at the PCIe Layer

Configuration Integrity

- VID/DID/SVID/SDID against known-real-silicon list
- Capability-chain walk: DWord-aligned Next pointers, no overlaps, no cycles
- Signature-residue scanning: Xilinx 7-series default byte patterns at
  known relative offsets (Device Capabilities field bits, reserved bits,
  VSEC vendor IDs)
- Capability presence consistency: donor model's known caps must all be present
- BAR mask verification: write 0xFFFFFFFF, compare size mask against donor
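A minimal sketch of the capability-chain walk and BAR sizing over a 4 KB config dump, assuming the dump is already in a byte buffer (for example parsed from lspci -xxxx output). It treats a non-DWORD-aligned Next pointer as an integrity failure, where ordinary enumeration software would mask the low bits; overlap checking would additionally need per-capability lengths and is left out. Function names are hypothetical.

```c
/* Sketch only: function names are hypothetical; overlap checks are omitted. */
#include <assert.h>
#include <stdint.h>

#define CFG_SIZE 4096u

static uint16_t rd16(const uint8_t *c, unsigned off)
{
    return (uint16_t)(c[off] | (c[off + 1] << 8));
}

static uint32_t rd32(const uint8_t *c, unsigned off)
{
    return (uint32_t)c[off] | ((uint32_t)c[off + 1] << 8) |
           ((uint32_t)c[off + 2] << 16) | ((uint32_t)c[off + 3] << 24);
}

/* Legacy list (0x40-0xFF): returns the capability count, or -1 on a
 * misaligned/out-of-range Next pointer or a revisited offset (cycle). */
int walk_std_caps(const uint8_t *cfg)
{
    assert(cfg);
    if (!(rd16(cfg, 0x06) & 0x0010))       /* Status: Capabilities List bit */
        return 0;
    uint8_t seen[256] = {0};
    unsigned ptr = cfg[0x34], count = 0;   /* Capabilities Pointer */
    while (ptr != 0) {
        if (ptr < 0x40 || (ptr & 0x3) || seen[ptr])
            return -1;
        seen[ptr] = 1;
        count++;
        ptr = cfg[ptr + 1];                /* byte 1 of each header: Next pointer */
    }
    return (int)count;
}

/* Extended list (0x100-0xFFF): header is ID[15:0], Version[19:16], Next[31:20]. */
int walk_ext_caps(const uint8_t *cfg)
{
    assert(cfg);
    uint8_t seen[CFG_SIZE / 4] = {0};
    unsigned off = 0x100, count = 0;
    uint32_t hdr = rd32(cfg, off);
    if (hdr == 0 || hdr == 0xFFFFFFFFu)    /* no extended capabilities */
        return 0;
    for (;;) {
        if (off < 0x100 || (off & 0x3) || seen[off / 4])
            return -1;
        seen[off / 4] = 1;
        count++;
        unsigned next = rd32(cfg, off) >> 20;
        if (next == 0)
            break;
        off = next;
    }
    return (int)count;
}

/* Size of a 32-bit memory BAR from the value read after writing 0xFFFFFFFF. */
uint32_t bar32_size(uint32_t readback)
{
    if (readback & 0x1)                    /* I/O BAR: not handled in this sketch */
        return 0;
    uint32_t mask = readback & 0xFFFFFFF0u;  /* drop read-only type bits [3:0] */
    return mask ? ~mask + 1u : 0;
}
```

The size returned by `bar32_size` is what gets compared with the donor's BAR size; 64-bit BARs would also need the upper DWORD's read-back.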

BAR Memory Read Probing

Send Memory Read TLPs to BAR ranges, validate responses by donor class:

NIC donor BAR0: register layout with receive/transmit descriptor-ring
  registers, interrupt mask, link status. Offset 0x00 returns a vendor-specific
  pattern (on Intel e1000-family NICs, the Device Control register, CTRL).

NVMe donor BAR0: NVMe controller registers — CAP (MQES, DSTRD,
  MPSMIN/MPSMAX), VS, CC, CSTS, AQA, ASQ/ACQ, doorbells at 0x1000.

USB XHCI donor BAR0: Capability Registers (CAPLENGTH, HCSPARAMS, HCCPARAMS).

zerowrite4k returns all-zeros; loopaddr echoes address. Both are
trivially distinguishable from real content.
Tier-4 firmwares implement donor-class responders but usually only
cover registers checked at probe time, leaving others divergent.
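As an example of class-specific validation, the sketch below checks whether the first registers of a BAR0 snapshot are plausible for an NVMe donor, using the CAP (0x00) and VS (0x08) field positions from the NVMe base specification. The checks are a deliberately small, hypothetical subset; a zerowrite4k or loopaddr BAR already fails on the first DWORD.

```c
/* Sketch only: a minimal, hypothetical plausibility subset, not a full NVMe validator. */
#include <assert.h>
#include <stdint.h>

static uint32_t rd32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 if the snapshot's first 12 bytes look like NVMe controller
 * registers (CAP at 0x00, VS at 0x08), 0 otherwise. */
int nvme_bar0_plausible(const uint8_t *bar0)
{
    assert(bar0);
    uint64_t cap = (uint64_t)rd32le(bar0) | ((uint64_t)rd32le(bar0 + 4) << 32);
    uint32_t vs  = rd32le(bar0 + 0x08);

    if (cap == UINT64_MAX || vs == UINT32_MAX)          /* all-ones: nothing decoded */
        return 0;
    unsigned mqes   = (unsigned)(cap & 0xFFFF);         /* Max Queue Entries, 0-based */
    unsigned css    = (unsigned)((cap >> 37) & 0xFF);   /* Command Sets Supported */
    unsigned mpsmin = (unsigned)((cap >> 48) & 0xF);    /* Memory Page Size Minimum */
    unsigned mpsmax = (unsigned)((cap >> 52) & 0xF);    /* Memory Page Size Maximum */
    unsigned major  = vs >> 16;                         /* VS major version */

    if (mqes < 1) return 0;          /* spec minimum is 1 (a 2-entry queue) */
    if (css == 0) return 0;          /* must support at least one command set */
    if (mpsmax < mpsmin) return 0;
    if (major < 1) return 0;         /* zeroed or echoed-address VS */
    return 1;
}
```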

R/W Consistency Probing

- Command Register: toggle Memory Space, I/O Space, Bus Master Enable
- Device Control: change MPS, MRRS, Error Enables
- MSI Control: toggle Enable, change Multiple Message Enable
- Walk every W1C bit (Status, AER Status): write 1s, confirm clear
- Walk reserved bits: write 1s, confirm read-back as 0
- Per-register writable masks must match donor

Tier-2 (no overlay) fails immediately.
Tier-3 (single global mask) fails on W1C and reserved-bit cases.

LTSSM and Link-State Validation

Sample the PCI Express Capability's Link Status register over time:
- Negotiated Width (Link Status[9:4]): consistent with donor deployment
  and FPGA hard block capability
- Current Link Speed (Link Status[3:0]): track slot's actual speed
- Gen4 x8 in Link Capabilities but Gen2 x1 in Link Status, in a slot that
  supports Gen4 x8, is a contradiction visible in one read
- DLL Active (Link Status[13]): should be 1 during operation; read it at the
  root or switch port above the device (endpoints hardwire it to 0)
- Slot Clock Config (Link Status[12]): match real common-clock state
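A sketch of the single-read link check, assuming the caller already has the Link Capabilities and Link Status registers of both the endpoint and the port above it. The return codes, and treating downtraining as a softer signal than an outright contradiction, are hypothetical policy choices; speed values are compared as raw encodings, which orders them but is not a GT/s conversion.

```c
/* Sketch only: return codes and policy are hypothetical. */
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned speed;   /* Current Link Speed code, [3:0] */
    unsigned width;   /* Negotiated Link Width, [9:4] */
    int slot_clock;   /* Slot Clock Configuration, [12] */
    int dll_active;   /* Data Link Layer Link Active, [13] */
} link_status;

link_status decode_link_status(uint16_t ls)
{
    link_status s;
    s.speed      = ls & 0xFu;
    s.width      = (ls >> 4) & 0x3Fu;
    s.slot_clock = (ls >> 12) & 1;
    s.dll_active = (ls >> 13) & 1;
    return s;
}

enum { LINK_OK = 0, LINK_IMPOSSIBLE = 1, LINK_DOWNTRAINED = 2 };

/* Link Capabilities: Max Link Speed [3:0], Max Link Width [9:4],
 * Data Link Layer Link Active Reporting Capable [20]. */
int check_link(uint32_t dev_linkcap, uint16_t dev_ls,
               uint32_t port_linkcap, uint16_t port_ls)
{
    link_status d = decode_link_status(dev_ls);
    link_status p = decode_link_status(port_ls);
    unsigned dev_speed = dev_linkcap & 0xFu, dev_width = (dev_linkcap >> 4) & 0x3Fu;
    unsigned port_speed = port_linkcap & 0xFu, port_width = (port_linkcap >> 4) & 0x3Fu;
    unsigned max_speed = dev_speed < port_speed ? dev_speed : port_speed;
    unsigned max_width = dev_width < port_width ? dev_width : port_width;

    if (d.speed != p.speed || d.width != p.width)
        return LINK_IMPOSSIBLE;       /* the two ends of one link disagree */
    if (d.speed > max_speed || d.width > max_width)
        return LINK_IMPOSSIBLE;       /* trained beyond an advertised maximum */
    if (((port_linkcap >> 20) & 1) && !p.dll_active)
        return LINK_IMPOSSIBLE;       /* port says the link is not DL_Active */
    if (d.speed < max_speed || d.width < max_width)
        return LINK_DOWNTRAINED;      /* legal, but suspicious for the claimed donor */
    return LINK_OK;
}
```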

ASPM Behavioral Validation

Real devices claiming ASPM exhibit characteristic L0 ↔ L1 transitions.

Spoofed device anomalies:
- Claims ASPM capability but never transitions out of L0
- Transitions with exit latency inconsistent with claimed value
- Never reaches L1.1 / L1.2 when donor and platform both support

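DLL Active stays set through L0s/L1, so sample ASPM residency from platform
PMC counters; a DLL Active drop at the port above the device signals link-down, not ASPM.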

AER Baselining

- Departure from donor baseline: per-silicon correctable-error footprint
  should be stable. Implausibly clean (zero correctables when donor
  normally produces Bad TLP / Replay Timer Timeout) is anomalous.
- Implausible Header Log content (default/zeroed values)
- Inconsistent UR/CA responses to probes of unimplemented offsets

Completion Latency Fingerprinting

Real silicon: completion latency shaped by DRAM contention,
internal arbiters, PCIe pipeline depth → heavy-tailed distributions.

BRAM-backed emulators: fixed FPGA clock cycles + PCIe transit
→ much lower variance, even if mean is similar.

Detection signal is distribution shape, not absolute mean.

Statistical methods:
- Kolmogorov–Smirnov test: compare empirical CDFs
- Hill estimator: estimate tail index (real silicon has non-trivial tail;
  emulated firmware without stochastic jitter has no tail)
- Anderson-Darling test: sensitive to tail differences

Collect N latency samples (Memory Reads to BAR), compare against
per-donor reference distribution, flag devices deviating
beyond per-test-statistic threshold.

Tier-5 firmwares add LFSR-based jitter generators, but matching
real distribution shape (mean, variance, tail index, mode count)
requires modeling donor's DRAM access pattern.

MSI/MSI-X Behavioral Validation

A device with MSI Enable, Address/Data programmed, and attached driver
should produce interrupts:

- Zero interrupts when driver should exercise device = anomalous
- Implausibly uniform arrival times (exact 60 Hz heartbeat)
  = timer-driven generator, not event-driven
- Implausibly bursty patterns not matching donor class

Monitor via OS interrupt accounting, ETW/performance telemetry,
driver counters, kernel instrumentation.

Cheat-Phase Access Pattern Recognition

Two distinct patterns:

Development phase:
  Slow, broad scanning, signature search, MemProcFS walking.
  Rare during live competitive play.

Execution phase:
  Narrow, periodic reads (60–240 Hz) of small offset set
  (player positions, entity arrays, view matrices).

Execution phase statistical signature:
  High temporal periodicity, low address-space breadth,
  alignment to game-frame intervals.

Distinguishing features:
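- Fano factor of per-window access counts (variance / mean; near 0 for
  fixed-rate polling, about 1 for Poisson-like traffic)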
- Autocorrelation at frame intervals
- Address-space coverage entropy

Honeypot regions complement this when combined with:
  IOMMU denial/fault logging, hypervisor-managed protected pages,
  device-domain sandboxing, decoy IOVA mappings, or server-side
  behavioral traps tied to fake game objects.

Hot-Plug and Link Disruption Events

Firmware swaps and donor reconfigurations require link renegotiation.
Three observable signals:

1. AER Surprise Down: uncorrectable (fatal by default) error logged by the
   downstream port above the device when its link drops unexpectedly mid-session.

2. Slot Status hot-plug events — Presence Detect Changed,
   Data Link Layer State Changed, MRL Sensor Changed.
   Register via IoRegisterPlugPlayNotification with
   EventCategoryTargetDeviceChange or EventCategoryDeviceInterfaceChange.

3. LTSSM re-training cycles — Link Training bit toggling,
   transient Recovery state.

A device that surprise-downs then reappears with subtly different
Config Space (adjusted MSI layout, different latency profile,
different VID/DID) = fingerprint of mid-session firmware reload.

Detection at the IOMMU Layer

Fault-Rate Monitoring

Sample per-device fault counters from IOMMU fault-recording structure
(VT-d Fault Recording Registers, AMD-Vi Event Log) or from WHEA.

Pipeline:
1. Snapshot baseline at session start
2. Sample at intervals; compute per-device fault rate
3. Threshold-trigger additional PCIe probing on devices exceeding baseline
4. Aggregate evidence across signals before action
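Steps 2 and 3 reduce to a rate comparison. The sketch assumes per-device fault counters have already been collected (from VT-d fault recording, the AMD-Vi event log, or WHEA; collection code not shown) and uses a hypothetical threshold of factor × expected rate plus a floor; step 4's cross-signal aggregation is out of scope.

```c
/* Sketch only: struct, function, and threshold rule are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t baseline_faults;  /* counter value snapshotted at session start (step 1) */
    double   expected_rate;    /* expected faults/second for this device class */
} device_fault_state;

/* flagged[i] = 1 marks a device for additional PCIe probing.
 * Returns the number of flagged devices. */
size_t flag_fault_outliers(const device_fault_state *dev, const uint64_t *now,
                           size_t n, double elapsed_s, double factor,
                           double floor_rate, int *flagged)
{
    assert(dev && now && flagged && elapsed_s > 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        /* unsigned subtraction stays correct across a single counter wrap */
        double rate = (double)(now[i] - dev[i].baseline_faults) / elapsed_s;
        flagged[i] = rate > dev[i].expected_rate * factor + floor_rate;
        count += (size_t)flagged[i];
    }
    return count;
}
```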

Domain Assignment Audit

Walk IOMMU domain assignments for anomalies:
- Devices on passthrough/identity domains when strict mode is active
- Devices in unexpectedly large IOMMU groups (poor ACS topology)
- Multiple devices sharing Domain ID when they shouldn't

ACS Topology Verification

Walk PCIe bridge topology between every endpoint and root complex.
For each bridge with ACS Capability:
- Verify Source Validation (SV) enabled
- Verify Translation Blocking (TB) enabled
- Verify P2P Request Redirect (RR) and Completion Redirect (CR) enabled

Bridges without ACS at all = isolation holes by topology.
Bridges with ACS Capability but Control bits not set = misconfiguration.
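A sketch of the per-bridge check over a 4 KB config dump: find the ACS Extended Capability (ID 0x000D), then compare the ACS Capability register (+0x04) and ACS Control register (+0x06) against the four required bits. Enum and function names are hypothetical; walking the path from each endpoint to the root complex is left to the caller.

```c
/* Sketch only: names are hypothetical; topology walking is not shown. */
#include <assert.h>
#include <stdint.h>

#define ECAP_ID_ACS  0x000Du
#define ACS_SV       0x0001u  /* Source Validation */
#define ACS_TB       0x0002u  /* Translation Blocking */
#define ACS_RR       0x0004u  /* P2P Request Redirect */
#define ACS_CR       0x0008u  /* P2P Completion Redirect */
#define ACS_REQUIRED (ACS_SV | ACS_TB | ACS_RR | ACS_CR)

enum acs_verdict { ACS_OK, ACS_ABSENT, ACS_UNSUPPORTED, ACS_DISABLED };

static uint32_t rd32(const uint8_t *c, unsigned off)
{
    return (uint32_t)c[off] | ((uint32_t)c[off + 1] << 8) |
           ((uint32_t)c[off + 2] << 16) | ((uint32_t)c[off + 3] << 24);
}

/* ABSENT = isolation hole by topology; UNSUPPORTED = ACS present but a
 * required feature missing; DISABLED = supported but Control bits not set. */
enum acs_verdict check_acs(const uint8_t *cfg)
{
    assert(cfg);
    unsigned off = 0x100;
    for (int guard = 0; guard < 1024 && off >= 0x100 && off <= 0xFF8 && !(off & 3); guard++) {
        uint32_t hdr = rd32(cfg, off);
        if (hdr == 0 || hdr == 0xFFFFFFFFu)
            break;
        if ((hdr & 0xFFFFu) == ECAP_ID_ACS) {
            uint16_t cap = (uint16_t)(cfg[off + 4] | (cfg[off + 5] << 8));  /* ACS Capability */
            uint16_t ctl = (uint16_t)(cfg[off + 6] | (cfg[off + 7] << 8));  /* ACS Control */
            if ((cap & ACS_REQUIRED) != ACS_REQUIRED)
                return ACS_UNSUPPORTED;
            return (ctl & ACS_REQUIRED) == ACS_REQUIRED ? ACS_OK : ACS_DISABLED;
        }
        off = hdr >> 20;                  /* Next Capability Offset */
    }
    return ACS_ABSENT;
}
```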

IOMMU as Containment Primitive

Active containment when suspect device is identified:

1. IOMMU domain re-remapping:
   Reprogram device's domain to sandbox memory instead of revoking access.
   Cheat keeps "reading" but receives garbage data.

2. Bus Master Enable clearance:
   Clear Command[2] (Bus Master Enable). Effective for tier-0 through tier-3.
   Cheats monitoring BME can race; may need repeated clearance.

3. Downstream Port Containment (DPC):
   When DPC is enabled on root port (Extended Cap ID 0x001D),
   triggers cause port to enter Contained state — all TLPs dropped,
   completions blocked, link logically isolated.
   Enforced at upstream port, no race against firmware-side BME restore.
   Not universal on all chipsets.

4. Anti-cheat-owned device domain:
   For device owned by AC driver, allocate and map only sandbox IOVAs,
   never expose game memory.

5. Hypervisor-integrated enforcement:
   Enforce policy above guest kernel by trapping IOMMU MMIO programming.
   Requires privileged platform integration.

Hypervisor-Level Defense

EPT-Based Memory Protection

EPT translates Guest Physical Address (GPA) to Host Physical Address (HPA).
A hypervisor owning the EPT can:

- Mark game memory as read-execute-only in EPT, even if guest OS marks
  read-write. Writes cause EPT violations the hypervisor traps.
- Hide pages by clearing EPT mappings.
- Implement watchpoints on specific GPA ranges.

IOMMU blocks DMA at device-to-memory boundary;
EPT blocks CPU access at guest-to-host boundary.
A cheat combining DMA card with kernel-mode payload faces both.

VBS, HVCI, and VTL Split

VBS creates Secure Kernel (VTL 1) alongside regular kernel (VTL 0)
in a Hyper-V partition. HVCI uses VTL 1 to enforce that no executable page
in VTL 0 is simultaneously writable.

Anti-cheat interaction:
- Register VTL 1 callouts to validate guest state
- Attest against System Guard Secure Launch (DRTM) measurements
- Rely on HVCI to block BYOVD patterns

Combined VBS+HVCI+TPM+SecureBoot is assumed baseline for
serious anti-cheat threat models.

SMM Considerations

System Management Mode (Ring -2) runs in SMRAM, isolated from
OS and hypervisor. SMM handlers can read all physical memory, and as
CPU-side code they are never mediated by the IOMMU, which only governs
device-originated DMA.

A vulnerable SMI handler = path to arbitrary memory access
without IOMMU mediation.

Mitigations:
- Intel Boot Guard / AMD Platform Secure Boot (firmware signatures)
- SMI Transfer Monitor (STM): hypervisor-resident, treats SMM as
  constrained guest. Rarely implemented by board vendors.
- Runtime verification of SMM lockdown registers

PCIe IDE (Integrity and Data Encryption)

IDE adds link-level TLP protection: integrity (MAC-based, required)
and confidentiality (encryption, optional).
Incorporated into PCIe 6.0 base specification.

Valuable against: physical link interposers, malicious retimers/switches,
traffic tampering.

NOT a DMA-cheat silver bullet: cheat installed as the endpoint
still originates legitimate IDE-protected TLPs after key establishment.
IDE raises the bar for passive bus sniffing but endpoint identity,
IOMMU policy, ACS topology, ATS policy, and attestation remain required.

External Trust Anchors

TPM 2.0

Hardware (or firmware-isolated) cryptoprocessor with:
- PCRs: extend-only registers, PCR[n] = SHA256(PCR[n] || new_value)
- Persistent keys: EK (manufacturer), SRK (provisioned), user-defined
- Hierarchy: Endorsement, Storage, Platform, Null
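Unrolling the extend rule shows why a final PCR value commits to every measurement and to their order. Writing $d_k$ for the k-th extended value in the SHA-256 bank and starting from a PCR that resets to zero (PCRs 0 to 15):

```latex
\begin{aligned}
\mathrm{PCR}^{(0)} &= 0^{256} \\
\mathrm{PCR}^{(k)} &= \mathrm{SHA256}\big(\mathrm{PCR}^{(k-1)} \,\|\, d_k\big), \qquad k = 1, \dots, m \\
\mathrm{PCR}^{(m)} &= \mathrm{SHA256}\Big(\cdots \mathrm{SHA256}\big(\mathrm{SHA256}(0^{256} \,\|\, d_1) \,\|\, d_2\big) \cdots \,\|\, d_m\Big)
\end{aligned}
```

Steering $\mathrm{PCR}^{(m)}$ to a chosen value would require a SHA-256 preimage, so code running after a measurement can only append to the chain, never rewrite it.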

PCR Allocation (Measured Boot)

PCR   Measured Content
0     SRTM / Core Root of Trust — UEFI firmware code
1     Platform configuration data — firmware variables
2     Option ROM code — third-party UEFI drivers
3     Option ROM configuration and data
4     IPL / boot manager binary (e.g., bootmgfw.efi)
5     IPL configuration — GPT/partition table, boot config
6     Manufacturer-specific / state-transition events
7     Secure Boot policy (PK, KEK, db, dbx)
8–15  OS-defined (BitLocker binds to PCR[11])
16    Debug
17–22 DRTM measurements (Secure Launch)
23    Application-defined

Remote Attestation Cryptography

Trust property: compromised local kernel cannot forge PCR values.

Flow:
1. Server sends nonce
2. Client calls TPM2_Quote(AIK, PCR_selection, nonce)
   TPM computes PCR composite, builds TPMS_ATTEST, signs with AIK
3. Client sends Quote + AIK certificate chain
4. Verifier checks:
   - AIK signature valid
   - AIK certificate chains to trusted TPM manufacturer root
   - EK on known-EK list (binds AIK to real TPM)
   - Nonce matches (freshness, replay protection)
   - PCR composite matches known-good value

A rootkit loading after measured boot can still extend PCRs but cannot
reset the boot-measurement PCRs or set them back to known-good values.
A software simulator cannot produce valid Quote without TPM private key.

DRTM and Secure Launch

Dynamic Root of Trust for Measurement allows "late launch" —
trusted execution environment established after OS boot,
measurement captured into PCR[17].

Intel: GETSEC[SENTER] (TXT)
AMD: SKINIT (SVM extension)

CPU enters measured execution state, Secure Loader Block (SLB)
loaded and hashed into PCR[17], control transfers to
Measured Launch Environment (MLE).

Microsoft System Guard Secure Launch uses this to load HVCI's
hypervisor into measured state independent of SRTM chain.
Defender requests Quote including PCR[17] and matches against
known-good MLE measurement.

UEFI Pre-Boot DMA Integrity

Pre-Boot DMA Protection: firmware must isolate DMA-capable devices'
I/O buffers before ExitBootServices().

ACPI indicators:
- Intel: DMA_CTRL_PLATFORM_OPT_IN_FLAG in DMAR table flags
- AMD: DMA remap support bit in IVRS IVinfo field
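A sketch of reading the Intel indicator from raw DMAR table bytes, however they were obtained (on Linux, /sys/firmware/acpi/tables/DMAR). Offsets follow the ACPI table header and the VT-d specification's DMAR layout (Host Address Width at byte 36, Flags at byte 37); the AMD IVRS check is not covered, and the function name is hypothetical. INTR_REMAP only states that the platform supports interrupt remapping, not that the OS enabled it.

```c
/* Sketch only: hypothetical helper; AMD IVRS parsing is not covered. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DMAR_INTR_REMAP                0x01u  /* platform supports interrupt remapping */
#define DMAR_X2APIC_OPT_OUT            0x02u
#define DMAR_DMA_CTRL_PLATFORM_OPT_IN  0x04u  /* firmware DMA-protection opt-in */

/* Returns the DMAR Flags byte, or -1 if tbl is not a complete, checksummed
 * DMAR table. Layout: 36-byte ACPI header (Length at byte 4), then
 * Host Address Width at byte 36 and Flags at byte 37. */
int dmar_flags(const uint8_t *tbl, size_t len)
{
    assert(tbl);
    if (len < 38 || memcmp(tbl, "DMAR", 4) != 0)
        return -1;
    uint32_t table_len = (uint32_t)tbl[4] | ((uint32_t)tbl[5] << 8) |
                         ((uint32_t)tbl[6] << 16) | ((uint32_t)tbl[7] << 24);
    if (table_len < 38 || table_len > len)
        return -1;
    uint8_t sum = 0;                 /* all bytes of an ACPI table sum to 0 mod 256 */
    for (uint32_t i = 0; i < table_len; i++)
        sum = (uint8_t)(sum + tbl[i]);
    return sum == 0 ? tbl[37] : -1;
}
```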

Windows PCR[7] event: firmware extends EV_EFI_ACTION with
"DMA Protection Disabled" when IOMMU/Kernel DMA Protection
is disabled, providing attestation hook.

Combined picture:
PCR[0]/PCR[7] anchor firmware and DMA-protection policy,
ACPI tables describe runtime IOMMU config,
documented DMA interfaces show what OS actually remaps,
attestation ties local claims to remote-verified known-good policy.

Layered Detection Pipeline

Pre-Game Environmental Verification

- IOMMU active and applied to DMA-capable PCIe paths
- Interrupt Remapping enabled
- Secure Boot enabled
- VBS/HVCI active
- TPM 2.0 present and provisioned
- Attestation Quote validates against expected policy
- BIOS/UEFI version not in known vulnerable pre-boot DMA list
- ACS topology walk: all relevant bridges enforce SV, TB, RR, CR

PCIe Inventory Pass

- Enumerate all PCIe devices via PnP tree
- Full 4 KB config-space dump for each
- Check device problem codes (DEVPKEY_Device_ProblemCode)
- Cross-reference SMBIOS slot inventory with populated devices

Configuration Integrity Per Device

- VID/DID/SVID/SDID against known-good list
- Capability-chain walk and validation
- Signature-residue scan
- BAR mask verification
- R/W consistency probing
- Compare against per-donor reference database
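For the reference-database comparison, dumps are often stored as the hex-dump text lspci -xxx/-xxxx prints (an offset, a colon, then up to 16 two-digit hex bytes). The sketch parses such lines into a 4 KB buffer and diffs a live dump against a donor reference while skipping bytes the caller marks as runtime-variable (Command/Status, assigned BAR addresses, and so on). Function names are hypothetical, and the exact ignore set is a policy choice that must come from the donor's reference data.

```c
/* Sketch only: names are hypothetical; the ignore mask is caller policy. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parse "<hex offset>: b0 b1 ... b15" into cfg. Returns bytes stored, or -1
 * if the line is not a well-formed dump line or would overrun cfg. */
int parse_dump_line(const char *line, uint8_t *cfg, size_t cfg_len)
{
    assert(line && cfg);
    char *end;
    unsigned long off = strtoul(line, &end, 16);
    if (end == line || *end != ':')
        return -1;
    uint8_t tmp[16];
    int n = 0;
    for (const char *p = end + 1; *p && *p != '\n';) {
        if (*p == ' ') { p++; continue; }
        int hi = hexval(p[0]);
        int lo = hi >= 0 ? hexval(p[1]) : -1;
        if (lo < 0 || n == 16 || (p[2] != ' ' && p[2] != '\0' && p[2] != '\n'))
            return -1;                   /* not a two-digit hex byte token */
        tmp[n++] = (uint8_t)(hi << 4 | lo);
        p += 2;
    }
    if (n == 0 || off > cfg_len || (size_t)n > cfg_len - off)
        return -1;
    memcpy(cfg + off, tmp, (size_t)n);
    return n;
}

/* First offset where live and ref differ outside ignored bytes, or -1. */
long first_mismatch(const uint8_t *live, const uint8_t *ref,
                    const uint8_t *ignore, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (!ignore[i] && live[i] != ref[i])
            return (long)i;
    return -1;
}
```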

Behavioral Sampling During Play

- Periodic Link Status reads (LTSSM, ASPM transitions)
- AER counter snapshots
- Per-device interrupt rate and distribution
- Per-device IOMMU fault rate
- BAR-region content sampling for class consistency

Statistical Analysis Over Session

- Latency distribution comparison (KS test, Hill estimator)
- Interrupt arrival distribution
- ASPM transition rate

Cheat-Phase Detection

- Honeypot region access
- Memory access frequency / locality classifiers

Containment Before Verdict

Each detection produces evidence, not a verdict.
Verdict informed by:
- Multi-signal correlation (single signals can false-positive;
  combinations rarely do)
- Server-side aggregation across sessions
- Behavioral verification (input timing, gameplay statistics)

While verdict accumulates, containment protects the live match:
IOMMU re-remapping to sandbox, BME clearance, or EPT-level
game-process protection degrades cheat effectiveness in real time.

Realistic Limits

A firmware that:
- Clones donor byte-for-byte (full 4 KB config + all capabilities)
- Implements donor-class BAR MMIO, MSI generation, overlay RAM
- Adds completion-latency jitter matching donor distribution
- Generates plausible AER correctable-error rates
- Transitions through ASPM states like the donor
- Uses donor not present in gaming PC and not on blacklists
- Operates only within driver-mapped IOMMU domains (legitimate-path exfil)
- Avoids honeypot regions through gameplay-aware address whitelisting

...can defeat every PCIe-layer and IOMMU-layer signature in isolation.

This is why external trust anchors are required:
TPM attestation, measured boot, and server-side correlation operate
outside the "spoof a PCIe endpoint" problem.

A perfectly emulated DMA card cannot forge a TPM Quote.
But the verifier must bind Quote to allowlist/blocklist of BIOS versions,
DMA-protection events, Secure Boot state, VBS/HVCI, and IOMMU policy.

The cost of defeating all four layers simultaneously
(PCIe + IOMMU + hypervisor + attestation) exceeds typical cheat value.

Forensic Evidence Capture

What to Capture

Artifact                     Source                          Purpose
──────────────────────────────────────────────────────────────────────────────
Full 4 KB config dump        Bus interface / ECAM            Donor ID post-hoc
Capability chain walk        Parsed from config              Capability presence
PCIe link state history      Link Status over session        LTSSM anomaly proof
MSI/MSI-X arrival timeline   OS interrupt telemetry          Rate claim refutation
AER correctable counts       AER capability registers        Baseline outlier proof
IOMMU fault log entries      WHEA/ETW, Driver Verifier       DMA-violation proof
IOMMU domain assignments     IOMMU manager state walk        Passthrough anomaly
ACS bridge state             Bridge enumeration              Isolation proof
Honeypot access record       Hypervisor EPT trap log         Unauthorized read evidence
TPM PCR snapshot             TPM Quote API                   Boot-chain attestation
MCFG / DMAR / IVRS tables   ACPI subsystem                  Platform config baseline
SMBIOS slot inventory        DMI subsystem                   Slot-population audit
BIOS version + patch level   SMBIOS                          Pre-Boot DMA fix verify
Latency-distribution hists   Per-session sampling            Statistical fingerprint

Multi-Signal Correlation

Strongest evidence packages combine:
1. Hardware-layer signal (config space, BAR, link state)
2. Behavioral-layer signal (interrupt distribution, IOMMU fault rate, honeypot)
3. Temporal correlation (hardware signal preceded behavioral by plausible interval)

Three independent signals push false-positive rates below threshold
where appeals become a practical workload.
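A worked version of that claim, with hypothetical per-signal false-positive rates. If the signals are independent on clean machines (an assumption fleet data must confirm; correlated signals combine much less favorably), requiring all of them multiplies the rates.

```c
/* Sketch only: assumes independent signals; all rates are hypothetical. */
#include <assert.h>
#include <stddef.h>

/* Joint false-positive rate when every one of n signals must fire. */
double joint_fp_rate(const double *fp, size_t n)
{
    assert(fp && n > 0);
    double p = 1.0;
    for (size_t i = 0; i < n; i++) {
        assert(fp[i] >= 0.0 && fp[i] <= 1.0);
        p *= fp[i];
    }
    return p;
}

/* Expected false flags per day across a player population. */
double expected_false_flags(double joint_fp, double sessions_per_day)
{
    return joint_fp * sessions_per_day;
}
```

For instance, rates of 1e-2, 1e-3, and 5e-2 give a joint rate of 5e-7; across a hypothetical 10 million sessions per day that is about 5 false flags per day left for human review.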

PCIe Protocol Captures

A PCIe protocol analyzer (interposer) produces the strongest forensic evidence:
full TLP-level captures with byte-exact accuracy and nanosecond timestamps.

Commercial analyzers capture every TLP, DLLP, and physical-layer ordered set.
Traces can be replayed to confirm fingerprinting findings.

Cost (tens to hundreds of thousands USD) limits routine use, but for
high-profile cases (competitive integrity, tournament, prosecution),
protocol-level captures by independent labs are the unambiguous reference.

Thunderbolt / USB4 DMA

Attack Surface

- Thunderbolt 1-4 / USB4 provide direct PCIe tunneling
- Hot-plug capable: device can be attached at runtime
- Pre-boot DMA: device has memory access before OS loads
- Thunderbolt Security Levels:
  - SL0 (None): no security, legacy mode
  - SL1 (User Auth): user must approve new devices
  - SL2 (Secure Connect): previously approved UUID plus a key-based
    challenge-response
  - SL3 (No PCIe tunneling): completely disables DMA

Thunderbolt-Specific Attacks

- Thunderclap: malicious Thunderbolt peripherals bypass IOMMU
- Device re-identification: clone an approved device's UUID to bypass SL1
- OS-level Thunderbolt driver vulnerabilities
- PCIe tunneling through USB4 hubs

Defensive Measures

- Kernel DMA Protection (Windows 10 1803+): automatic IOMMU for hot-plug
- Thunderbolt firmware verification
- Platform-level: BIOS setting to disable Thunderbolt PCIe tunneling
- macOS: T2 chip enforces DMA restrictions on Thunderbolt ports

Shadow CR3 / Split TLB

Page Table Manipulation

- Maintain two sets of page tables (two CR3 values):
  - "Clean" CR3: legitimate page tables visible to anti-cheat
  - "Shadow" CR3: modified page tables with cheat-accessible mappings
- Swap CR3 before/after anti-cheat inspection windows
- Combine with EPT manipulation for hypervisor-level split

Split TLB Techniques

- Desync instruction TLB (iTLB) and data TLB (dTLB):
  - Execute code from one physical page
  - Read data from another physical page at same virtual address
- Requires precise TLB invalidation control
- Hypervisor can create EPT-based split: execute on page A,
  read on page B, at same GPA
- Anti-cheat mitigation: TLB flush + re-walk, serializing instructions

Memory Access Techniques

Physical Memory Reading

// Typical LeechCore usage (pcileech's device library); hLC comes from LcCreate
HANDLE hLC;
BYTE buffer[0x1000];                                          // one 4 KB page
BOOL ok = LcRead(hLC, physAddr, sizeof(buffer), buffer);      // physical read

Virtual Address Translation

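// Walk x86-64 4-level page tables: PML4 → PDPT → PD → PT → physical.
// ReadPhys(pa) is assumed to return the 8-byte entry at physical address pa.
#define PA_MASK     0x000FFFFFFFFFF000ULL        // address bits 51:12 (also strips CR3 flag/PCID bits)
#define IDX(va, sh) (((va) >> (sh)) & 0x1FF)     // 9-bit table index
UINT64 TranslateVA(UINT64 cr3, UINT64 va) {      // returns 0 if any level is not present
    UINT64 pml4e = ReadPhys((cr3 & PA_MASK) + IDX(va, 39) * 8);
    if (!(pml4e & 1)) return 0;                                                    // P bit clear
    UINT64 pdpte = ReadPhys((pml4e & PA_MASK) + IDX(va, 30) * 8);
    if (!(pdpte & 1)) return 0;
    if (pdpte & 0x80) return (pdpte & 0x000FFFFFC0000000ULL) + (va & 0x3FFFFFFF); // PS: 1 GB page
    UINT64 pde = ReadPhys((pdpte & PA_MASK) + IDX(va, 21) * 8);
    if (!(pde & 1)) return 0;
    if (pde & 0x80) return (pde & 0x000FFFFFFFE00000ULL) + (va & 0x1FFFFF);       // PS: 2 MB page
    UINT64 pte = ReadPhys((pde & PA_MASK) + IDX(va, 12) * 8);
    return (pte & 1) ? (pte & PA_MASK) + (va & 0xFFF) : 0;
}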

DTB (Directory Table Base) Finding

- Scan physical memory for valid CR3 values
- Look for kernel structures
- Use signature scanning
- Validate page table entries

Security Considerations

Ethical Use

- Security research only
- Authorized testing environments
- Responsible disclosure
- Legal compliance

Risk Awareness

- Physical hardware access required
- Potential system instability
- Detection by advanced anti-cheat
- Legal implications

Resource Organization

The README contains:

- pcileech and derivatives
- FPGA firmware projects
- DMA libraries
- Integration tools
- Device emulation firmware
- Anti-detection implementations

Data Source

Important: This skill provides conceptual guidance and overview information. For detailed information use the following sources:

1. Project Overview & Resource Index

Fetch the main README for the full curated list of repositories, tools, and descriptions:

https://raw.githubusercontent.com/gmh5225/awesome-game-security/refs/heads/main/README.md

The main README contains thousands of curated links organized by category. When users ask for specific tools, projects, or implementations, retrieve and reference the appropriate sections from this source.

2. Repository Code Details (Archive)

For detailed repository information (file structure, source code, implementation details), the project maintains a local archive. If a repository has been archived, always prefer fetching from the archive over cloning or browsing GitHub directly.

Archive URL format:

https://raw.githubusercontent.com/gmh5225/awesome-game-security/refs/heads/main/archive/{owner}/{repo}.txt

Examples:

https://raw.githubusercontent.com/gmh5225/awesome-game-security/refs/heads/main/archive/ufrisk/pcileech.txt
https://raw.githubusercontent.com/gmh5225/awesome-game-security/refs/heads/main/archive/000-aki-000/GameDebugMenu.txt

How to use:

1. Identify the GitHub repository the user is asking about (owner and repo name from the URL).
2. Construct the archive URL: replace {owner} with the GitHub username/org and {repo} with the repository name (no .git suffix).
3. Fetch the archive file; it contains a full code snapshot with file trees and source code generated by code2prompt.
4. If the fetch returns a 404, the repository has not been archived yet; fall back to the README or direct GitHub browsing.

3. Repository Descriptions

For a concise English summary of what a repository does, the project maintains auto-generated description files.

Description URL format:

https://raw.githubusercontent.com/gmh5225/awesome-game-security/refs/heads/main/description/{owner}/{repo}/description_en.txt

Examples:

https://raw.githubusercontent.com/gmh5225/awesome-game-security/refs/heads/main/description/00christian00/UnityDecompiled/description_en.txt
https://raw.githubusercontent.com/gmh5225/awesome-game-security/refs/heads/main/description/ufrisk/pcileech/description_en.txt

How to use:

1. Identify the GitHub repository the user is asking about (owner and repo name from the URL).
2. Construct the description URL: replace {owner} with the GitHub username/org and {repo} with the repository name.
3. Fetch the description file; it contains a short, human-readable summary of the repository's purpose and contents.
4. If the fetch returns a 404, the description has not been generated yet; fall back to the README entry or the archive.

Priority order when answering questions about a specific repository:

1. Description (quick summary): fetch first for concise context
2. Archive (full code snapshot): fetch when deeper implementation details are needed
3. README entry: fallback when neither description nor archive is available


DMA Attack Techniques

Overview

README Coverage

- Cheat > DMA
- Anti Cheat > Detection:DMA
- Anti Cheat > Detection: Hacked Hypervisor
- Anti Cheat > Detection:Virtual Environments
- Anti Cheat > Detection:HWID
- Windows Security Features

Threat Model

External DMA Cheat Architecture

A modern external DMA cheat consists of three components:

1. Cheat PC — runs the cheat application, signature databases,
   aim assistance, ESP rendering, and the USB (or network) link to the DMA card in the gaming PC.

2. DMA Card — an FPGA-based PCIe endpoint installed in the gaming PC
   (typically M.2 NVMe slot). Exposes a memory-read/write interface to
   the cheat PC. Uses Bus Master capability to issue Memory Read TLPs
   against the gaming PC's RAM.

3. Actuator (optional) — a USB HID emulator (microcontroller-based) that
   injects keyboard/mouse input on the gaming PC according to commands
   from the cheat PC, closing the loop.

The structural property that makes this threat distinctive:
no attacker code executes on the gaming PC. The DMA card performs
hardware-level transactions between the FPGA and the gaming PC's
memory controller, mediated by the chipset and (when configured) the IOMMU.
The gaming PC's OS, drivers, and anti-cheat see only a PCIe device
announcing itself through Configuration Space and performing what looks
like ordinary DMA.

Three Defense Layers

Layer              Mechanism                     What It Catches
─────────────────────────────────────────────────────────────────────────────
PCIe-layer         Inspect Config Space &        Identity mismatch: spoofed
fingerprinting     behavior at the bus level     device that doesn't match
                                                 real silicon's full signature

IOMMU              Use the IOMMU to bound        Out-of-domain DMA: device
enforcement        what physical memory the      trying to read game memory
                   device can touch              it wasn't allocated

External           TPM-anchored measured boot,   Boot-chain compromise: IOMMU
attestation        cloud-verified                or kernel itself subverted

PCIe Protocol Stack

Three Protocol Layers

Layer              Unit                Function
────────────────────────────────────────────────────────────────
Transaction        TLP                 Memory/IO/Config reads & writes,
                                       completions, messages
Data Link          DLLP                Acknowledgements, flow control
                                       credits, power management
Physical           Ordered Sets        Link training, equalization,
                                       clock recovery

A real device's behavior is shaped by all three layers.
An FPGA emulating a real device only fully controls the Transaction Layer;
the Physical and Data Link layers leak fingerprints that
BRAM-based emulation cannot fully hide.

TLP (Transaction Layer Packet) Format

Every TLP begins with a 3 DW (12-byte) or 4 DW (16-byte) header.
4 DW headers are used for 64-bit addresses and certain message types.

First DWord (DW0) encoding:
Bits       Field         Notes
[31:29]    Fmt[2:0]      Header format + data presence
[28:24]    Type[4:0]     TLP type (combined with Fmt)
[22:20]    TC[2:0]       Traffic Class (default 0)
[18]       Attr[2]       ID-Based Ordering (IDO)
[15]       TD            TLP Digest (ECRC trailer)
[14]       EP            Poisoned data
[13:12]    Attr[1:0]     Relaxed Ordering, No Snoop
[11:10]    AT[1:0]       Address Type (critical for ATS bypass)
[9:0]      Length[9:0]   Payload length in DWords (0x000 = 1024 DW = 4 KB)

Fmt[2:0] encoding:
000 = 3 DW header, no data
001 = 4 DW header, no data
010 = 3 DW header, with data
011 = 4 DW header, with data
100 = TLP Prefix

Key TLP types (Fmt + Type combinations):
Fmt  Type     TLP
000  0_0000   MRd (Memory Read, 3DW / 32-bit addr)
001  0_0000   MRd (Memory Read, 4DW / 64-bit addr)
010  0_0000   MWr (Memory Write, 3DW)
011  0_0000   MWr (Memory Write, 4DW)
000  0_0100   CfgRd0 (Config read — terminate at this device)
010  0_0100   CfgWr0
000  0_0101   CfgRd1 (Config read — forwarded by bridges)
010  0_0101   CfgWr1
000  0_1010   Cpl (Completion without data)
010  0_1010   CplD (Completion with data)
001  1_0rrr   Msg (Message, no data)
011  1_0rrr   MsgD (Message with data)

Detection-Relevant DW0 Fields

TC[2:0] — Traffic Class. Default 0; real silicon rarely uses non-zero TC.
A spoofed device generating non-zero TC is anomalous.

Attr[2:0] — RO/NS/IDO. A device emulating a NIC must follow that NIC's
typical NS/RO usage pattern; mismatches are visible.

AT[1:0] — Address Type:
  00 = Untranslated (IOMMU will translate)
  01 = Translation Request (ATS only)
  10 = Translated (device claims it has already translated via ATS)
This field is the basis of ATS bypass attacks.

TD — TLP Digest. If set, an ECRC trailer is present.
EP — Poisoned. Indicates data is known-bad.
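The DW0 bit layout, the Fmt/Type table, and the red flags above can be checked mechanically. This is a minimal sketch, assuming DW0 is already available as a 32-bit value with header byte 0 (Fmt/Type) as the most significant byte, as when assembled from an analyzer trace; `decode_dw0`, `tlp_kind`, and `dw0_is_suspicious` are hypothetical names. The suspicion rule encodes only the two DW0 flags described here (non-zero TC, AT=10), so it is meaningful only for endpoints that should not be using ATS.

```c
#include <stdint.h>
#include <assert.h>

/* Fields of the first TLP header DWord (DW0), using the bit positions above. */
typedef struct {
    uint8_t  fmt, type, tc, attr, td, ep, at;
    uint16_t len_dw;          /* Length[9:0], with 0 decoded as 1024 DW */
} TlpDw0;

typedef enum { TLP_MRD, TLP_MWR, TLP_CFGRD0, TLP_CFGWR0, TLP_CFGRD1, TLP_CFGWR1,
               TLP_CPL, TLP_CPLD, TLP_MSG, TLP_MSGD, TLP_PREFIX, TLP_OTHER } TlpKind;

TlpDw0 decode_dw0(uint32_t dw0) {
    TlpDw0 h;
    h.fmt    = (dw0 >> 29) & 0x7;
    h.type   = (dw0 >> 24) & 0x1F;
    h.tc     = (dw0 >> 20) & 0x7;
    h.attr   = (uint8_t)((((dw0 >> 18) & 0x1) << 2) | ((dw0 >> 12) & 0x3)); /* IDO, RO, NS */
    h.td     = (dw0 >> 15) & 0x1;
    h.ep     = (dw0 >> 14) & 0x1;
    h.at     = (dw0 >> 10) & 0x3;
    h.len_dw = (dw0 & 0x3FF) ? (uint16_t)(dw0 & 0x3FF) : 1024;
    return h;
}

/* Maps Fmt/Type to the TLP kinds in the table; Fmt bit 1 set means "with data". */
TlpKind tlp_kind(const TlpDw0 *h) {
    int with_data = (h->fmt & 0x2) != 0;
    if (h->fmt == 0x4)            return TLP_PREFIX;
    if (h->type == 0x00)          return with_data ? TLP_MWR : TLP_MRD;
    if (h->type == 0x04)          return with_data ? TLP_CFGWR0 : TLP_CFGRD0;
    if (h->type == 0x05)          return with_data ? TLP_CFGWR1 : TLP_CFGRD1;
    if (h->type == 0x0A)          return with_data ? TLP_CPLD : TLP_CPL;
    if ((h->type & 0x18) == 0x10) return with_data ? TLP_MSGD : TLP_MSG;  /* 1_0rrr */
    return TLP_OTHER;
}

/* DW0 red flags from this section: non-zero TC, or a Translated (AT=10b) request. */
int dw0_is_suspicious(const TlpDw0 *h) { return h->tc != 0 || h->at == 0x2; }
```

For example, `0x20000001` decodes as a 4 DW MRd of one DWord, and `0x60000810` as a 4 DW MWr with AT=10b, which the sketch flags.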

TLP Routing and Requester ID

Three routing modes:
- Address routing — Memory and IO TLPs, matched against bridge apertures
- ID routing — Config TLPs and Completions, by BDF
- Implicit routing — Some Messages (broadcast, terminate at root)

DW1 carries the Requester ID (16 bits = Bus:Device:Function, "BDF")
and an 8-bit Tag for matching completions to requests.

The Requester ID is the entire input to per-device security policy:
IOMMU translation lookup, ACS source validation, AER source ID,
MSI/MSI-X routing. Anything that lets a device send TLPs with a
different Requester ID fundamentally compromises isolation.

Transaction categories:
- Posted (P) — fire-and-forget (Memory Writes, Messages)
- Non-Posted (NP) — requires completion (Memory Reads, IO/Config R/W)
- Completion (Cpl/CplD) — response to Non-Posted requests

Completion Status codes:
000 = Successful Completion (SC)
001 = Unsupported Request (UR)
010 = Configuration Request Retry Status (CRS)
100 = Completer Abort (CA)

UR vs CA distinction matters for spoofing detection — real silicon
responds differently to malformed config accesses vs accesses to
unimplemented offsets. Many spoofed firmwares hard-code one or the other.

Memory Read Completion Splitting

A single Memory Read TLP requests at most Max_Read_Request_Size (MRRS) bytes.
The completer may return the data in several Completions, splitting only at
naturally aligned multiples of the Read Completion Boundary (RCB, 64 or 128 bytes).
No Completion payload can exceed Max_Payload_Size (MPS).

Each Completion carries:
- Lower Address[6:0]: lowest 7 bits of the address of its first byte
- Byte Count[11:0]: bytes still to be returned, including this Completion's
  payload (so the last fragment's Byte Count equals its own payload length)
- BCM: PCI-X compatibility (typically 0)
- Tag: matches the originating MRd's Tag

The split pattern (fragment count, boundary positions) is a
strong fingerprint: real memory controllers produce characteristic
distributions of fragment sizes and inter-fragment gaps.
BRAM-backed emulators producing perfectly uniform 64-byte fragments
at constant cadence are anomalous.
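To make the split rules concrete, the sketch below implements one legal strategy: make each Completion as large as MPS allows, ending on an RCB boundary, with only the last one allowed to end elsewhere. Real completers choose their own strategies, which is what makes the pattern a fingerprint. `split_read_completion` and `CplFrag` are hypothetical names, and byte enables are ignored (byte-granular reads are assumed).

```c
#include <stdint.h>
#include <assert.h>

/* One Completion of a split Memory Read response. */
typedef struct {
    uint64_t addr;        /* address of the first byte this Completion returns */
    uint32_t len;         /* payload bytes */
    uint8_t  lower_addr;  /* Lower Address[6:0] */
    uint16_t byte_count;  /* bytes remaining incl. this Completion (4096 is sent as 0 on the wire) */
} CplFrag;

/* Splits a read of `total` bytes at `addr` into Completions of at most `mps`
 * bytes, cutting only at RCB-aligned addresses (largest-fragment-first).
 * Returns the number of fragments written, or -1 if `out` is too small. */
int split_read_completion(uint64_t addr, uint32_t total, uint32_t mps,
                          uint32_t rcb, CplFrag *out, int max_out) {
    assert(rcb == 64 || rcb == 128);
    assert(mps >= rcb && (mps & (mps - 1)) == 0);
    assert(total <= 4096);                       /* MRRS never exceeds 4 KB */
    int n = 0;
    uint32_t remaining = total;
    while (remaining > 0) {
        if (n == max_out) return -1;
        uint64_t end = addr + (remaining < mps ? remaining : mps);
        if (end < addr + remaining)              /* not the last fragment */
            end &= ~(uint64_t)(rcb - 1);         /* cut on an RCB boundary */
        out[n].addr       = addr;
        out[n].len        = (uint32_t)(end - addr);
        out[n].lower_addr = (uint8_t)(addr & 0x7F);
        out[n].byte_count = (uint16_t)remaining;
        remaining -= out[n].len;
        addr = end;
        n++;
    }
    return n;
}
```

For a 300-byte read at 0x1010 with MPS = 128 and RCB = 64, this produces fragments of 112, 128, and 60 bytes with Byte Count values 300, 188, and 60.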

Tag Space and Fingerprinting

- 5-bit Tag (original): 32 outstanding non-posted requests per Requester ID
- Extended Tag (PCIe 1.1+, Device Control[8]): 8-bit / 256 outstanding
- 10-Bit Tag (PCIe 4.0+, Device Control 2[12]): 1024 outstanding

Tag turnover discipline — which tags get reissued and how quickly —
reflects the device's internal request tracking pipeline.
Firmware that reuses a Tag whose Completion is still outstanding, or issues
Tags beyond the negotiated range, is observably distinct from real silicon.

MPS and MRRS as Fingerprints

Both are programmed by system software during enumeration and normally stay fixed afterward.
- Device Capabilities[2:0]: Max_Payload_Size_Supported
  (0=128, 1=256, 2=512, 3=1024, 4=2048, 5=4096 bytes)
- Device Control[7:5]: current MPS (must be <= Supported,
  set to minimum of all devices in hierarchy)
- Device Control[14:12]: Max_Read_Request_Size (same encoding)

The discriminator is donor consistency: a device that claims to be a given
donor should report the same MPS support, tag behavior, and negotiated
profile as that donor does under the same root-port constraints. A clone
whose FPGA hard IP supports smaller payloads than the donor is exposed.
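The size encoding and the donor comparison can be sketched as follows. It assumes the Device Capabilities and Device Control values were read through a documented bus interface and that a reference profile was captured from a genuine donor card under the same root port; `size_from_enc`, `read_payload_profile`, and `payload_matches_donor` are hypothetical names.

```c
#include <stdint.h>
#include <assert.h>

/* 3-bit MPS/MRRS encoding: 0=128 B ... 5=4096 B; 6 and 7 are reserved (returned as 0). */
uint32_t size_from_enc(uint32_t enc) { return enc <= 5 ? 128u << enc : 0; }

typedef struct { uint32_t mps_supported, mps, mrrs; } PayloadProfile;

/* devcap = Device Capabilities (+0x04), devctl = Device Control (+0x08). */
PayloadProfile read_payload_profile(uint32_t devcap, uint16_t devctl) {
    PayloadProfile p;
    p.mps_supported = size_from_enc(devcap & 0x7);          /* [2:0] */
    p.mps           = size_from_enc((devctl >> 5) & 0x7);   /* [7:5] */
    p.mrrs          = size_from_enc((devctl >> 12) & 0x7);  /* [14:12] */
    return p;
}

/* Returns 1 if the observed profile is well-formed and matches the donor's
 * supported and programmed MPS. MRRS is driver-chosen, so it is not compared. */
int payload_matches_donor(PayloadProfile obs, PayloadProfile donor) {
    if (obs.mps_supported == 0 || obs.mps > obs.mps_supported) return 0;
    return obs.mps_supported == donor.mps_supported && obs.mps == donor.mps;
}
```

A card claiming a 512-byte-capable donor while its Device Capabilities report 256 bytes fails this check even if the programmed MPS happens to match.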

Data Link Layer

The Data Link Layer provides reliable TLP delivery across each link; DLLPs carry its control traffic.

DLLP          Purpose
─────────────────────────────────────────
Ack           TLP received correctly
Nak           TLP received with error; sender must replay
InitFC1/2     Flow control credit initialization at link bring-up
UpdateFC      Ongoing flow control credit updates
PM_*          Power management (L0s, L1 entry/exit)
Vendor        Vendor-defined

Flow control credits are per TLP category:
- PH / PD — Posted Header / Data
- NPH / NPD — Non-Posted Header / Data
- CplH / CplD — Completion Header / Data

Negotiated credit values are not generally exposed through the standard
Link Capabilities register. They are visible in protocol-level traces,
some root-port/vendor performance counters, or FPGA-side debug.
Useful for lab fingerprinting and forensic captures, not normal
runtime config-space detection.

Physical Layer

Two details matter even without PHY-level instrumentation:

LTSSM (Link Training and Status State Machine):
- Main path: Detect → Polling → Configuration → L0 (operational);
  other states: L0s, L1, L2 (low power), Recovery, Hot Reset, Disabled, Loopback
- Training outcome observable via the Link Status register; some root ports
  also expose recovery events through performance counters

Detection-relevant:
- Negotiated Link Width (Link Status[9:4]):
  Device advertising x16 but negotiating x1 in a slot wired for more lanes is a tell
- Current Link Speed (Link Status[3:0]):
  Capability claims Gen4 but the link stays at Gen2/Gen3 on a Gen4-capable port: anomalous
- Recovery cycle frequency:
  Comparative signal; materially different from donor reference is anomalous

ASPM (Active State Power Management):
- L0s and L1 are link-level low-power states
- A device claiming ASPM support in Link Capabilities but never
  entering L0s/L1 while ASPM is enabled in Link Control contradicts its claim
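The width/speed comparison above can be sketched in a few lines. It assumes the caller knows what the slot itself can train to (`slot_width` and `slot_speed` are hypothetical inputs from platform data), because a legitimate x16 card in an x4 M.2 slot also trains below its advertised maximum; `read_link` and `link_is_downtrained` are illustrative names.

```c
#include <stdint.h>
#include <assert.h>

/* Link Capabilities (+0x0C): Max Link Speed [3:0], Max Link Width [9:4].
 * Link Status (+0x12): Current Link Speed [3:0], Negotiated Link Width [9:4].
 * Speed values index the Supported Link Speeds Vector (1=2.5, 2=5, 3=8, 4=16 GT/s). */
typedef struct { unsigned max_speed, max_width, cur_speed, cur_width; } LinkInfo;

LinkInfo read_link(uint32_t linkcap, uint16_t linksta) {
    LinkInfo l = { linkcap & 0xF, (linkcap >> 4) & 0x3F,
                   linksta & 0xF, (linksta >> 4) & 0x3F };
    return l;
}

/* Expected result is the lower of what the device advertises and what the slot
 * supports; training below that on either axis is flagged. */
int link_is_downtrained(LinkInfo l, unsigned slot_width, unsigned slot_speed) {
    unsigned exp_width = l.max_width < slot_width ? l.max_width : slot_width;
    unsigned exp_speed = l.max_speed < slot_speed ? l.max_speed : slot_speed;
    return l.cur_width < exp_width || l.cur_speed < exp_speed;
}
```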

Configuration Access Mechanisms

Two mechanisms on x86:

CAM (Legacy I/O-port path):
1. CPU writes to I/O port 0xCF8 (Bus:Device:Function:Register)
2. CPU reads/writes at I/O port 0xCFC
- Reaches only first 256 bytes
- Still used during early BIOS/UEFI boot

ECAM (Enhanced, MMIO path):
1. Read MCFG ACPI table for segment base addresses
2. Compute: addr = base + ((bus << 20) | (dev << 15) | (func << 12) | offset)
3. OS maps physical address into kernel virtual memory
- Required for Extended Configuration Space (0x100–0xFFF)
- Where AER, DSN, LTR, VSEC, ATS, PASID, SR-IOV live

On Windows, supported paths are:
- IRP_MN_READ_CONFIG / IRP_MN_WRITE_CONFIG
- BUS_INTERFACE_STANDARD.GetBusData / SetBusData
Production anti-cheat should use documented bus interfaces;
direct MCFG mapping is a lab-only technique.
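Both address encodings can be written out directly for lab analysis. This sketch only computes addresses (it performs no port I/O and maps nothing), `base` is assumed to be the MCFG segment base from step 1, and `ecam_addr`/`cam_addr` are hypothetical helper names.

```c
#include <stdint.h>
#include <assert.h>

/* ECAM: physical address of a config register, given the MCFG segment base. */
uint64_t ecam_addr(uint64_t base, unsigned bus, unsigned dev, unsigned func, unsigned off) {
    assert(bus < 256 && dev < 32 && func < 8 && off < 4096);
    return base + (((uint64_t)bus << 20) | (dev << 15) | (func << 12) | off);
}

/* CAM: the value written to port 0xCF8 (enable bit 31, Bus[23:16], Dev[15:11],
 * Func[10:8], DWord-aligned register [7:2]); data is then at 0xCFC + (off & 3). */
uint32_t cam_addr(unsigned bus, unsigned dev, unsigned func, unsigned off) {
    assert(bus < 256 && dev < 32 && func < 8 && off < 256);
    return 0x80000000u | (bus << 16) | (dev << 11) | (func << 8) | (off & 0xFC);
}
```

For example, with a segment base of 0xE0000000, `ecam_addr(base, 1, 0, 0, 0x100)` yields 0xE0100100, the first Extended Capability header of device 01:00.0; CAM cannot reach that offset at all.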

PCIe Configuration Space

Legacy 256-Byte Header (Type 0 Endpoint)

Offset  Field                    Notes
0x00    Vendor ID (2B)           Chip manufacturer (e.g., 0x8086 Intel)
0x02    Device ID (2B)           Specific product
0x04    Command (2B)             BME (bit 2), MemSpace (bit 1), IOSpace (bit 0)
0x06    Status (2B)              Capabilities List (bit 4)
0x08    Revision ID (1B) +       Class triplet: Base / Sub / ProgIF
        Class Code (3B)
0x0C    Cache Line / Latency /   Header Type [6:0]: 0x00 = endpoint,
        Header Type / BIST       0x01 = bridge; bit 7 set = multi-function
0x10–27 BAR0–BAR5               Memory or I/O windows
0x2C    Subsystem Vendor ID      Often distinguishes board manufacturers
0x2E    Subsystem Device ID
0x30–33 Expansion ROM Base
0x34    Capabilities Pointer     Offset of first capability in linked list
0x3C    IRQ Line/Pin/Min/Max     Legacy INTx routing

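BAR encoding (32-bit BAR):
bit 0:    0 = Memory BAR, 1 = I/O BAR
bits 2:1: (memory BARs) 00 = 32-bit, 10 = 64-bit (BAR pair)
bit 3:    (memory BARs) Prefetchable

BAR size discovery: with memory decode disabled, write 0xFFFFFFFF to the BAR
and read it back. Address bits below the BAR's size read back as 0 and the
read-only type bits keep their values; with the type bits masked off, the
result is a size mask whose two's complement is the BAR size. Restore the
original value afterward.
Real silicon's size masks are device-specific; a spoofed BAR with
64 KB mask when the donor uses 4 KB is detectable in one operation.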
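The size arithmetic can be sketched as follows; it only interprets readback values and does not touch hardware. `mem_bar_size` is a hypothetical name, and I/O BARs are deliberately left out of this sketch.

```c
#include <stdint.h>
#include <assert.h>

/* Size of a memory BAR from the value read back after writing all 1s.
 * lo = readback of the BAR; hi = readback of the next BAR (used only when
 * bits 2:1 mark a 64-bit BAR pair). Returns 0 for an unimplemented BAR. */
uint64_t mem_bar_size(uint32_t lo, uint32_t hi) {
    assert((lo & 0x1) == 0);                       /* bit 0 = 0: memory BAR */
    if (((lo >> 1) & 0x3) == 0x2) {                /* bits 2:1 = 10b: 64-bit pair */
        uint64_t mask = ((uint64_t)hi << 32) | (lo & ~0xFu);
        return mask ? ~mask + 1 : 0;
    }
    uint32_t mask = lo & ~0xFu;                    /* clear type/prefetchable bits [3:0] */
    return mask ? (uint64_t)(uint32_t)(~mask + 1) : 0;
}
```

A readback of 0xFFFF0000 gives 64 KB and 0xFFFFF008 gives 4 KB (prefetchable), which is the one-operation comparison described above.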

Capabilities Chain

If Status[4] is set, 0x34 points to the first capability.
Each capability has a 2-byte header: [ID | Next].
Next is DWord-aligned in 0x40–0xFF, or 0x00 to terminate.

Common capability IDs:
ID    Capability
0x01  PCI Power Management
0x05  MSI
0x10  PCI Express
0x11  MSI-X
0x12  SATA Configuration
0x13  PCI Advanced Features
0x14  Enhanced Allocation

Detection: walk the chain and check that each Next pointer is DWord-aligned
and within 0x40–0xFF, that no cycle exists, and that each capability's implied
size (fixed by its ID and version) does not overlap the next.
no cycle exists. A malformed chain is itself a signal.
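A walker over a 256-byte Config Space snapshot can perform the pointer checks. This sketch covers range, alignment, and cycle detection only (per-capability size checks need an ID-to-length table, omitted here); `walk_cap_chain` is a hypothetical name. Note that the specification tells software to mask the two low pointer bits, so treating non-zero low bits as malformed is a deliberately strict detection choice.

```c
#include <stdint.h>
#include <assert.h>

/* Walks the legacy capability list in a 256-byte config-space snapshot.
 * Returns the number of capabilities, 0 if Status[4] is clear, or -1 if the
 * chain is malformed (pointer outside 0x40-0xFF, not DWord-aligned, or a cycle). */
int walk_cap_chain(const uint8_t cfg[256]) {
    uint16_t status = (uint16_t)(cfg[0x06] | (cfg[0x07] << 8));
    if (!(status & 0x10)) return 0;                 /* no capability list */
    uint8_t seen[256] = {0};
    int count = 0;
    uint8_t ptr = cfg[0x34];                        /* Capabilities Pointer */
    while (ptr != 0) {
        if (ptr < 0x40 || (ptr & 0x3)) return -1;   /* out of range or misaligned */
        if (seen[ptr]) return -1;                   /* revisited: cycle */
        seen[ptr] = 1;
        count++;
        ptr = cfg[ptr + 1];                         /* header is [ID][Next] */
    }
    return count;
}
```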

PCI Express Capability (ID 0x10)

The single most important capability for spoofing detection.

Offset  Field                    Notes
+0x02   PCIe Capabilities        Cap Version, Device/Port Type, Slot Impl
+0x04   Device Capabilities      MPS Supported, FLR, Phantom Functions
+0x08   Device Control           MPS current, MRRS, Error Enables
+0x0A   Device Status            CED, NFED, FED, URD, Transactions Pending
+0x0C   Link Capabilities        Max Link Speed/Width, ASPM, L0s/L1 latencies
+0x10   Link Control             ASPM Control, RCB, Link Disable, Retrain
+0x12   Link Status              Current Link Speed/Width, Link Training
+0x24   Device Capabilities 2    Completion Timeout Ranges, AtomicOp,
                                 OBFF, LTR mechanism
+0x28   Device Control 2         Completion Timeout Value, AtomicOp, LTR Enable
+0x2C   Link Capabilities 2      Supported Link Speeds Vector
+0x30   Link Control 2           Target Link Speed, Compliance
+0x32   Link Status 2            De-emphasis, EQ Phase status

Detection leverage per field:
- Device Type (+0x02[7:4]): must match donor's role
- MPS Supported (+0x04[2:0]): hard-IP ceiling contradicts donor
- FLR support (+0x04[28]): verify that FLR resets and preserves the same
  non-sticky and sticky state as on the claimed donor; naive firmware
  acknowledges FLR but keeps its internal state unchanged, which real silicon does not
- Link Status (+0x12): Width/Speed are negotiated, observable, hard to
  lie about — hard IP reports what LTSSM actually achieved
- Slot Clock Config (+0x12[12]): must match real platform behavior
- Completion Timeout ranges (+0x24[3:0]): a device that accepts a timeout value
  (Device Control 2 [3:0]) outside its advertised ranges is a discriminator
- AtomicOp (+0x24[6-9]): server-class GPUs/NICs may support; FPGA
  hard IP almost never does. Mismatch is detectable.
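A donor comparison over a few of the fields above might look like this. It assumes the donor's register values were captured from a known-genuine card and covers only Device Type, MPS Supported, FLR, and the AtomicOp bits; `PcieCapRegs` and `compare_to_donor` are hypothetical names, and behavioral checks (FLR side effects, Completion Timeout acceptance) need live probing, not register comparison.

```c
#include <stdint.h>
#include <assert.h>

/* Raw PCI Express Capability registers captured from a device. */
typedef struct {
    uint16_t pcie_caps;   /* +0x02: Device/Port Type in [7:4] */
    uint32_t dev_caps;    /* +0x04: MPS Supported [2:0], FLR [28] */
    uint32_t dev_caps2;   /* +0x24: AtomicOp support [9:6] */
} PcieCapRegs;

enum { MISMATCH_TYPE = 1, MISMATCH_MPS = 2, MISMATCH_FLR = 4, MISMATCH_ATOMIC = 8 };

/* Returns a bitmask of fields where the observed device differs from the donor. */
unsigned compare_to_donor(const PcieCapRegs *obs, const PcieCapRegs *donor) {
    unsigned m = 0;
    if (((obs->pcie_caps >> 4) & 0xF) != ((donor->pcie_caps >> 4) & 0xF)) m |= MISMATCH_TYPE;
    if ((obs->dev_caps & 0x7) != (donor->dev_caps & 0x7))                 m |= MISMATCH_MPS;
    if (((obs->dev_caps >> 28) & 1) != ((donor->dev_caps >> 28) & 1))     m |= MISMATCH_FLR;
    if (((obs->dev_caps2 >> 6) & 0xF) != ((donor->dev_caps2 >> 6) & 0xF)) m |= MISMATCH_ATOMIC;
    return m;
}
```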

MSI and MSI-X Capabilities

MSI (ID 0x05):
Message Control bits:
  [0]    MSI Enable
  [3:1]  Multiple Message Capable (encoded n = 0–5, meaning 2^n = 1–32 vectors)
  [6:4]  Multiple Message Enable (cannot exceed Capable)
  [7]    64-bit Address Capable
  [8]    Per-Vector Masking Capable

x86 MSI Address: bits [31:20] fixed at 0xFEE (LAPIC prefix)
  [19:12] Destination ID, [3] Redirection Hint, [2] Destination Mode
Message Data: [15] Trigger Mode, [10:8] Delivery Mode, [7:0] Vector

MSI-X (ID 0x11):
- Supports up to 2,048 vectors
- Table stored in BAR-mapped region (not Config Space)
- Each entry: 16 bytes (Addr Low, Addr High, Data, Vector Control)
- PBA (Pending Bit Array): bit-per-vector pending state

Naive MSI-X emulation failures:
- Ignores Vector Control Mask writes
- Sets PBA bits but never clears on unmask
- Returns hardcoded PBA values
- Doesn't retire pending interrupts when masks clear
Detection probe: mask vector → induce interrupt condition →
observe PBA bit → unmask → observe interrupt firing.
Real silicon satisfies this round trip; spoofed firmware rarely does.
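The probe needs two things: where to poke in the BAR-mapped table and PBA, and what correct behavior looks like. The sketch below computes the offsets and models the expected per-vector semantics (Vector Control bit 0 is the Mask bit) so an observed trace can be compared against it. All names are hypothetical, and the model is a simplification of the specification's rules for a single vector.

```c
#include <stdint.h>
#include <assert.h>

/* MSI-X table entry layout (16 bytes per vector) inside the table BAR. */
enum { MSIX_ADDR_LO = 0x0, MSIX_ADDR_HI = 0x4, MSIX_DATA = 0x8, MSIX_VEC_CTRL = 0xC };

uint32_t msix_entry_offset(unsigned vec, unsigned field) { return vec * 16u + field; }

/* PBA: one pending bit per vector, packed into 64-bit QWORDs. */
uint32_t pba_qword_offset(unsigned vec) { return (vec / 64) * 8u; }
unsigned pba_bit(unsigned vec)          { return vec % 64; }

/* Expected behavior for one vector: an event while masked sets Pending;
 * unmasking with Pending set delivers the message and clears Pending. */
typedef struct { int masked, pending, fired; } MsixVec;

void on_mask(MsixVec *v)  { v->masked = 1; }
void on_event(MsixVec *v) { if (v->masked) v->pending = 1; else v->fired++; }
void on_unmask(MsixVec *v) {
    v->masked = 0;
    if (v->pending) { v->pending = 0; v->fired++; }
}
```

Replaying the probe sequence (mask, event, unmask) against this model and against the device, and comparing PBA reads and delivered interrupts at each step, exposes the naive failures listed above.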

AER Extended Capability (ID 0x0001)

Three error classes:
- Correctable: Receiver Error, Bad TLP, Bad DLLP, Replay Timer Timeout
- Uncorrectable Non-Fatal: Completion Timeout, Completer Abort, UR, ACS Violation
- Uncorrectable Fatal: Malformed TLP, DLL Protocol Error, Surprise Down

Uncorrectable errors have Status, Mask, and Severity registers; correctable errors have Status and Mask only. Status bits are sticky and W1C.
Header Log (16B) captures full TLP header of first logged uncorrectable error.

Detection:
- Absence of AER when donor model is known to expose it = mismatch
- Zero correctable-error count over long window when donor's silicon
  normally produces a baseline rate = anomalous
- Anomalous UR response patterns to probes of unimplemented offsets

Extended Capabilities

4-byte header at each offset:
[31:20] Next Capability Offset (0 to terminate)
[19:16] Capability Version
[15:0]  Extended Capability ID

Key Extended Capability IDs:
0x0001  AER
0x0002  Virtual Channel (VC)
0x0003  DSN (Device Serial Number, 8 bytes)
0x000B  Vendor-Specific Extended Capability (VSEC)
0x000D  ACS (Access Control Services)
0x000E  ARI
0x000F  ATS (Address Translation Services)
0x0010  SR-IOV
0x0015  Resizable BAR (RBAR)
0x0018  LTR (Latency Tolerance Reporting)
0x001B  PASID
0x001D  DPC (Downstream Port Containment)
0x001E  L1 PM Substates
0x001F  Precision Time Measurement (PTM)

Detection-relevant:
- DSN: 8-byte serial meant to be unique; firmware that clones a donor's DSN
  collides with other players' cards cloned from the same donor
- VSEC: Xilinx PCIe IP optionally emits VSEC blocks with
  characteristic Vendor ID + VSEC ID combinations
- ATS/PASID/SR-IOV presence on consumer-class donor is
  demographically suspicious — rare outside server-class hardware

IOMMU Architecture

Translation Flow

1. Device issues Memory TLP with target IOVA.
   TLP header carries 16-bit Requester ID (BDF).
2. TLP travels upstream through switches/bridges to root complex.
3. IOMMU intercepts, uses Requester ID to look up translation context.
4. IOMMU walks device's I/O page tables: IOVA → physical address.
5. Permission bits (Read, Write) checked against access type.
6. Success: TLP forwarded with translated physical address.
7. Failure: fault logged; a blocked read gets a UR or CA completion, a blocked write is dropped.

Intel VT-d Internals

Two-level table lookup:

BDF → Root Table (256 entries, 16B each, indexed by Bus)
    → Context Table (256 entries, 16B each, indexed by Dev:Func)
      → Second-Level Page Tables (3–5 levels)
        → Final 4 KB physical page

Context Entry fields:
- SLPTPTR: Second-Level Page Table Pointer
- Domain ID: 16-bit (multiple devices can share a domain)
- AW: Address Width (3/4/5-level = 39/48/57-bit IOVA)
- TT: Translation Type (untranslated requests only; untranslated, translation,
  and translated requests with a device TLB; or pass-through)
- P: Present
- FPD: Fault Processing Disable

Page table entries (PTE, EPT-like format):
[0]      R - Read permission
[1]      W - Write permission
[7]      PS - Page Size (1=leaf super-page, 0=next-level table)
[N-1:12] Physical address of next-level table or 4 KB page

Super-pages: level-2 leaf = 2 MB, level-3 leaf = 1 GB.

Scalable Mode (VT-d 3.0+):
Context Entry → PASID Directory → PASID Table → per-PASID
first-level page-table roots. Enables Shared Virtual Memory (SVM).
Check RTADDR_REG.TTM to determine which mode is in effect.

AMD-Vi Internals

Single-level Device Table indexed directly by BDF:

BDF → Device Table Entry (32 bytes)
    → I/O Page Tables (1–6 levels)
      → Final page

DTE encodes:
- Page Table Root Pointer
- Mode (0–6, selects paging levels)
- Domain ID (16 bits)
- IR, IW — Default Read/Write permission
- GV — Guest Valid (nested translation)
- PASID-related fields

Page sizes: 4 KB, 2 MB, 1 GB.
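The table-indexing arithmetic for both IOMMUs follows from the layouts above: VT-d legacy mode splits the BDF into a bus index (root table) and a devfn index (context table), and AMD-Vi indexes its Device Table with the whole BDF. This sketch computes byte offsets only; reading the tables themselves requires access to the physical memory they live in. All names are hypothetical.

```c
#include <stdint.h>
#include <assert.h>

/* 16-bit Requester ID: Bus[15:8] Device[7:3] Function[2:0]. */
uint16_t make_bdf(unsigned bus, unsigned dev, unsigned fn) {
    assert(bus < 256 && dev < 32 && fn < 8);
    return (uint16_t)((bus << 8) | (dev << 3) | fn);
}

/* VT-d legacy mode: 16-byte root entries indexed by bus, and 16-byte
 * context entries indexed by Dev:Func within the table the root entry names. */
uint32_t vtd_root_entry_off(uint16_t bdf)    { return (bdf >> 8) * 16u; }
uint32_t vtd_context_entry_off(uint16_t bdf) { return (bdf & 0xFF) * 16u; }

/* Second-level page-table index at a given level (1 = leaf table ... 5), 9 bits per level. */
unsigned vtd_sl_index(uint64_t iova, unsigned level) {
    assert(level >= 1 && level <= 5);
    return (unsigned)(iova >> (12 + 9 * (level - 1))) & 0x1FF;
}

/* AMD-Vi: 32-byte Device Table Entries indexed directly by BDF. */
uint32_t amdvi_dte_off(uint16_t bdf) { return bdf * 32u; }
```

For device 03:1f.7 (BDF 0x03FF), the VT-d root entry sits at byte 0x30 of the root table and the context entry at 0xFF0 of its context table, while the AMD-Vi DTE sits at 0x7FE0 of the Device Table.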

IOTLB and Invalidation

Translations cached in IOTLB (I/O Translation Lookaside Buffer).
When mappings change, IOTLB must be invalidated.

Two distinct caches when ATS is in use:
- IOMMU's own IOTLB
- Device-side TLB (DevTLB) caching prior translations

Full invalidation with ATS requires:
1. IOMMU invalidates own IOTLB
2. IOMMU sends ATS Invalidate Request Message to device
3. Device drops affected DevTLB entries, replies with Invalidate Completion

If step 2 or 3 is skipped, device retains stale translations
and can DMA to unmapped addresses.

VT-d invalidation granularities:
- Global: flush entire IOTLB
- Domain-Selective: flush all entries for a Domain ID
- Page-Selective: flush specific IOVA range in a domain

Strict vs lazy invalidation:
Lazy mode defers IOTLB invalidations and batches them for performance.
This opens a window where stale translations remain valid: a device
whose driver has unmapped a buffer can still DMA to the old IOVA.

Fault Recording

VT-d: Fault Recording Registers — circular array capturing
  Requester ID, faulting IOVA, fault reason, TLP type.

AMD-Vi: Event Log Buffer — producer-consumer ring buffer of
  IO_PAGE_FAULT, INVALID_DEVICE_REQUEST, ATS-related events.

Both can signal faults via interrupts (VT-d through its fault-recording
registers, AMD-Vi through event-log entries). On Windows, some IOMMU
violations are observable through WHEA/bug-check paths and Driver Verifier
DMA-violation telemetry.

Per-device fault rate is one of the most operationally useful
IOMMU-layer signals. Legitimate devices with correct drivers
rarely produce faults; sustained nonzero rate is direct evidence
of out-of-domain access attempts.
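One way to turn "sustained nonzero rate" into a concrete check is a small per-device window history. In this sketch the number of windows (8), the window length, and the threshold are arbitrary assumptions to be tuned, and fault counts are assumed to be attributed to a BDF from the IOMMU's fault records; all names are hypothetical.

```c
#include <stdint.h>
#include <assert.h>

#define WINDOWS 8   /* number of consecutive sampling windows tracked (assumption) */

/* Per-device IOMMU fault history: fault counts for the last WINDOWS intervals. */
typedef struct { uint16_t bdf; uint32_t counts[WINDOWS]; unsigned head; } FaultHistory;

/* Records the fault count observed for the interval that just ended. */
void record_window(FaultHistory *h, uint32_t faults) {
    h->counts[h->head] = faults;
    h->head = (h->head + 1) % WINDOWS;
}

/* "Sustained" here means faults in at least `min_windows` of the tracked intervals. */
int sustained_faults(const FaultHistory *h, unsigned min_windows) {
    unsigned nonzero = 0;
    for (unsigned i = 0; i < WINDOWS; i++) nonzero += h->counts[i] != 0;
    return nonzero >= min_windows;
}
```

Counting windows with any faults, rather than summing faults, keeps a single burst (for example during driver load) from dominating the signal.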

RMRR/IVMD:
ACPI DMAR table contains RMRR (Reserved Memory Region Reporting)
sub-tables declaring physical ranges devices need identity-mapped.
AMD-Vi has analogous IVMD (I/O Virtualization Memory Definition)
in the IVRS table. A defender should enumerate these and reject
configurations where suspect BDFs appear in RMRR scope or
RMRR ranges overlap game memory regions.
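The range half of that audit is an interval-overlap test. This sketch assumes RMRR/IVMD entries have already been parsed from the ACPI tables into inclusive base/limit pairs and that the protected set is a list of physical page ranges backing game memory; checking whether a suspect BDF appears in an RMRR's device scope is a separate step not shown. All names are hypothetical.

```c
#include <stdint.h>
#include <assert.h>

/* Inclusive physical range, matching RMRR base/limit semantics. */
typedef struct { uint64_t base, limit; } PhysRange;

int ranges_overlap(PhysRange a, PhysRange b) { return a.base <= b.limit && b.base <= a.limit; }

/* Returns the index of the first RMRR overlapping any protected range, or -1. */
int audit_rmrrs(const PhysRange *rmrr, int n_rmrr, const PhysRange *prot, int n_prot) {
    for (int i = 0; i < n_rmrr; i++)
        for (int j = 0; j < n_prot; j++)
            if (ranges_overlap(rmrr[i], prot[j])) return i;
    return -1;
}
```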

IOMMU Topology and Isolation

IOMMU Groups

Devices in the same IOMMU group may not be safely isolated from
one another. Group membership determined by:
- PCIe topology — devices behind a switch share a group
  unless the switch supports and enables ACS
- ACS state of upstream bridges
- Quirks for known-broken hardware

Linux: /sys/kernel/iommu_groups/N/devices/
Windows: equivalent constraints but no simple public group filesystem

ACS (Access Control Services, Extended Cap ID 0x000D)

ACS is a PCIe capability that switches/root ports advertise
to declare they can enforce isolation between downstream ports.

The ACS Capability (+0x04) and ACS Control (+0x06) registers share this bit
layout (capability bits advertise support, control bits enable it):
Bit  Feature                       Effect
0    Source Validation (SV)        Drop TLPs with wrong Requester ID
1    Translation Blocking (TB)     Block AT=10 (Translated) TLPs
2    P2P Request Redirect (RR)     Force P2P requests upstream for IOMMU
3    P2P Completion Redirect (CR)  Force P2P completions upstream
4    Upstream Forwarding (UF)      Forward upstream regardless
5    P2P Egress Control (EC)       Allow/deny P2P routing per-port
6    Direct Translated P2P (DT)    Allow P2P with translated addresses

Critical for untrusted endpoints: SV, TB, RR, and CR.
A switch missing Source Validation lets a malicious device spoof
its Requester ID, defeating per-BDF IOMMU translation.
A switch missing P2P Request Redirect allows devices on the same
switch to DMA directly to each other without IOMMU involvement.

Peer-to-Peer DMA

Devices on the same PCIe tree can send Memory TLPs directly to
each other's BAR ranges without involving system memory.
Without ACS forcing redirection, P2P TLPs never reach the IOMMU.

Plausible P2P DMA targets for cheat:
- GPU framebuffer — rendered game state
- Network adapter ring buffers — game traffic
- USB controller queues — input device data

Mitigation: ACS Translation Blocking + P2P Request Redirect
on every intermediate bridge. Defender must walk topology and
confirm both bits are active.
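The topology walk reduces to checking the ACS Control register of every bridge between the endpoint and the root port. This sketch requires the four bits named earlier as critical (SV, TB, RR, CR), assumes the caller has already collected each bridge's ACS Control value, and treats a bridge with no ACS capability as control value 0; `acs_missing` and `path_is_isolated` are hypothetical names.

```c
#include <stdint.h>
#include <assert.h>

/* ACS Capability (+0x04) and ACS Control (+0x06) share these bit positions. */
enum { ACS_SV = 1u << 0, ACS_TB = 1u << 1, ACS_RR = 1u << 2,
       ACS_CR = 1u << 3, ACS_UF = 1u << 4, ACS_EC = 1u << 5, ACS_DT = 1u << 6 };

#define ACS_REQUIRED (ACS_SV | ACS_TB | ACS_RR | ACS_CR)

/* Returns the required bits NOT enabled on this bridge (0 = OK). */
uint16_t acs_missing(uint16_t acs_ctrl) { return (uint16_t)(ACS_REQUIRED & ~acs_ctrl); }

/* The path is isolated only if every intermediate bridge enables all required bits. */
int path_is_isolated(const uint16_t *acs_ctrl_per_bridge, int n) {
    for (int i = 0; i < n; i++)
        if (acs_missing(acs_ctrl_per_bridge[i])) return 0;
    return 1;
}
```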

Interrupt Remapping

MSI/MSI-X interrupts are Memory Writes to 0xFEE00000–0xFEEFFFFF.
Without Interrupt Remapping (IR), any device with Bus Master enabled
can write to this range and trigger arbitrary interrupts — NMIs, SMIs,
or vectors targeting wrong CPU.

With IR enabled, the IOMMU validates MSI/MSI-X writes and uses
remapping-table state to determine the permitted destination.
On Intel, IR is specified alongside DMA Remapping in the VT-d
architecture; AMD-Vi provides an equivalent interrupt remapping table.
Both DMA remapping and IR should be mandatory in any anti-cheat threat model.

ATS, PASID, and Address Translation Trust

ATS (Address Translation Services, Extended Cap ID 0x000F)

ATS lets a device cache IOMMU translations locally:
1. Device issues Translation Request TLP (AT=01) with IOVA
2. IOMMU translates and responds with Translation Completion
   carrying physical address
3. Device caches translation in Device-side TLB (DevTLB)
4. Subsequent accesses issued with AT=10 (Translated) —
   IOMMU bypasses page-walk, trusting device's cached translation
5. On mapping changes, IOMMU sends Invalidation Request

Attack surface: malicious device claiming ATS can present
arbitrary AT=10 TLPs whose addresses were never approved
by the IOMMU. The IOMMU forwards them trusting the device's claim.

PASID (Extended Cap ID 0x001B)

Extends ATS to per-process address spaces. 20-bit PASID carried
in a TLP Prefix. IOMMU uses (Requester ID, PASID) jointly
to select translation context.

PASID enables Shared Virtual Memory (SVM) — primarily found in
datacenter NICs, AI accelerators. Presence on a consumer card
is anomalous.

ATS Trust Model and "ATS Untrusted" Mode

The fundamental trust assumption: device honestly reports
translations it has been granted. Unreasonable for external
Thunderbolt enclosures, FPGAs in M.2 slots, or untrusted
accelerator cards.

Modern OS/IOMMU stacks can treat endpoints as ATS-untrusted:
ATS is disabled, blocked by policy, or stripped.
Linux: pci=noats plus per-device quirks.
Windows: Kernel DMA Protection / DMAGuard matters, but don't
treat "Kernel DMA Protection: On" as proof every internal
endpoint is ATS-untrusted. Verify ATS state per endpoint.
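Verifying ATS state per endpoint means finding the ATS Extended Capability (ID 0x000F) and reading the Enable bit (bit 15) of its ATS Control register at +0x06. This self-contained sketch works on a 4 KB Config Space snapshot assumed to come from a supported read path; `ats_state` is a hypothetical name.

```c
#include <stdint.h>
#include <assert.h>

static uint32_t rd32le(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns 1 if ATS is present and enabled, 0 if present but disabled,
 * and -1 if absent or the extended capability list is malformed. */
int ats_state(const uint8_t cfg[4096]) {
    unsigned off = 0x100;
    for (int hops = 0; hops < 960 && off >= 0x100 && off <= 0xFF8 && !(off & 3); hops++) {
        uint32_t hdr = rd32le(cfg + off);
        if (hdr == 0 || hdr == 0xFFFFFFFFu) return -1;
        if ((hdr & 0xFFFF) == 0x000F) {                          /* ATS Extended Capability */
            uint16_t ctrl = (uint16_t)(cfg[off + 6] | cfg[off + 7] << 8);
            return (ctrl >> 15) & 1;                             /* ATS Control.Enable */
        }
        off = hdr >> 20;                                         /* Next Capability Offset */
        if (off == 0) return -1;
    }
    return -1;
}
```

An endpoint that is not on an allowlist yet returns 1 here has ATS enabled regardless of what the platform-wide Kernel DMA Protection status reports.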

Driver–IOMMU Contract and Bypass Catalog

Legitimate DMA Path (Windows)

1. Acquire DMA adapter: IoGetDmaAdapter / WDF wrapper
2. Allocate buffer: MmAllocateContiguousMemorySpecifyCacheNode
   or WdfCommonBufferCreate
3. Map for DMA: AllocateCommonBuffer / MapTransferEx
   - OS allocates IOVA from device's domain
   - Creates IOMMU page-table entries: [IOVA, IOVA+size) → physical pages
   - Returns IOVA to driver
4. Program device: driver writes IOVA into device's BAR registers
5. Device DMAs: TLPs arrive at IOMMU with BDF + IOVA
6. IOMMU translates: page-walk produces physical address
7. Completion and unmap: teardown IOMMU entries + IOTLB invalidation

In this model, device can DMA only to addresses the driver
explicitly mapped. Game memory is not in that range.

Six Paths to Out-of-Domain Access

1. IOMMU not active or not applied to this path
   VT-d/AMD-Vi disabled, OS not enforcing, device outside protected ports

2. Pre-boot DMA injection
   Inject before IOMMU initialized; requires firmware-level exploit

3. Identity-mapped / passthrough domains
   Legacy drivers request 1:1 mapping; modern strict-mode rejects it

4. Driver mapping over-allocation (Thunderclap class)
   OS maps full 4 KB page when buffer is smaller; adjacent kernel data exposed

5. Legitimate-path data exfiltration
   Cheat spoofed as NIC; OS network stack passes game packets through
   NIC's RX ring buffer (legitimately IOMMU-mapped). Cheat reads game data
   without leaving allowed mappings. Undetectable at IOMMU layer.

6. IOMMU page-table manipulation via kernel compromise
   BYOVD / vulnerable driver reprograms IOMMU tables.
   Requires code execution on gaming PC.

Approaches 1–3 are the foundation of most current DMA cheats.

IOMMU Bypass Catalog (16 Techniques)

#   Technique                   Mechanism                           Mitigation
─────────────────────────────────────────────────────────────────────────────────
1   IOMMU disabled              VT-d/AMD-Vi off in BIOS             Refuse misconfigured platforms
2   Pre-boot DMA                Firmware leaves injection window     UEFI updates; verify ACPI indicators
3   Identity/passthrough        1:1 IOVA-to-physical mapping        Strict-mode IOMMU policy
4   Driver over-allocation      Full 4 KB page, adjacent data       OS bounce buffers; strict mappings
5   ATS abuse                   AT=10 TLPs with arbitrary addrs     ATS Untrusted mode for non-allowlisted
6   ACS missing on bridge       P2P or spoofed Requester ID         Verify ACS state on all bridges
7   Lazy IOTLB invalidation     Stale translations valid briefly    Strict invalidation mode
8   FLR race                    FLR/Hot Reset race window           Synchronized FLR handling
9   SMM bypass                  SMM code exempt from IOMMU          Boot Guard / Platform Secure Boot
10  DMA-remapping driver bugs   Bugs in OS IOMMU manager            OS patching
11  Hypervisor escape           Compromised hypervisor              VBS / measured boot; TPM attestation
12  Interrupt injection (no IR) Write arbitrary interrupts          Mandatory IR enforcement
13  RMRR/IVMD scope abuse       Fake ACPI tables cover attacker     Measured boot; runtime RMRR audit
                                physical ranges
14  Snoop-bit manipulation      Stale cache lines visible           Strict snoop enforcement
15  PASID confusion             Misconfigured PASID Table           PASID-aware IOMMU programming
16  DMAR/IVRS spoofing          Compromised firmware, fake tables   Measured boot covering firmware

Techniques 1–6: active attack surface for current commercial DMA cheats
Techniques 7–13: academic, APT, firmware-level contexts
Techniques 14–16: largely theoretical

FPGA Hardware

Xilinx PCIe Integrated Block

Hardened IP block handling:
- Physical Layer (PHY, 8b/10b or 128b/130b, LTSSM, equalization)
- Data Link Layer (sequence numbers, replay buffer, flow control)
- Transaction Layer framing and parsing
- Subset of Configuration Space

IP core documentation:
- PG054 for 7-series
- PG156 for UltraScale Gen3
- PG213 for UltraScale+ Gen4

User logic talks to the block over AXI-Stream (TX/RX), with a separate
configuration-management interface: cfg_mgmt_* (7-series), cfg_ext_* (UltraScale).

Detection consequences:
- Default fingerprints leak through: hard block populates Config Space
  with Xilinx-characteristic byte patterns
- 7-series firmware that mishandles cfg_mgmt_* leaves subtle behavioral
  differences (some configuration TLPs are still answered with hard-block defaults)

FPGA Family Hierarchy

Artix-7 (consumer/mid-range, GTP transceivers, PCIe Gen2):
Chip       LUTs      BRAM(Kbit)  PCIe Hard Block
XC7A35T    20,800    1,800       Gen2 x4
XC7A50T    32,600    2,700       Gen2 x4
XC7A75T    47,200    3,780       Gen2 x4
XC7A100T   63,400    4,860       Gen2 x4
XC7A200T   134,600   13,140      Gen2 x4
(Smaller than T35 have no hard PCIe block)

Kintex-7 (high-end, GTX transceivers):
XC7K70T    41,000    4,860       Gen2 x8
XC7K160T   101,400   11,700      Gen2 x8
XC7K325T   203,800   16,020      Gen2 x8
XC7K410T   254,200   28,620      Gen2 x8

Zynq UltraScale+ (ARM Cortex-A53 cores, GTH/GTY; BRAM column in Mbit):
ZU2EG/CG   ~47,000   ~5.3M      Gen3 x4
ZU3EG/CG   ~70,000   ~7.6M      Gen3 x4
ZU4EG/EV   ~88,000   ~11.0M     Gen3 x8
ZU5EG/EV   ~117,000  ~18.0M     Gen3 x8
ZU6EG/CG   ~230,000  ~32.1M     Gen3 x16
(EV-suffixed: hardened H.265 codec for DMA + video-capture boards)

Resource Constraints and Capability

BRAM size caps how much state fits:
  shadow config + writable overlay + BAR emulation + state machines.
  T35 (1.8 Mbit) struggles with full 4 KB shadow + 64 KB BAR + jitter buffers.
  T100 (4.86 Mbit) fits comfortably.
  Zynq ZU3 (7+ Mbit) has effectively unlimited room.

LUT count caps behavioral complexity:
  Each subsystem (MSI generator, ASPM FSM, AER counter, BAR responder)
  costs thousands of LUTs. T35 holds 1–2; T100 the full set;
  Kintex/Zynq adds runtime-reconfigurable parameter tables.

PHY transceiver families (GTP/GTX/GTH/GTY) have measurably different
signal characteristics; the family can sometimes be inferred from root-port
performance counters, independent of firmware spoofing.

Form Factors

Form Factor           Description                 Detection
────────────────────────────────────────────────────────────────────
M.2 NGFF Key M        Internal NVMe slot          Dominant modern form;
                                                  physically invisible
M.2 + USB3 bridge     M.2 board with FT601        Gaming PC sees only M.2
PCIe x1/x4 add-in     Traditional add-in card     More physically visible
External USB3         USB3-to-PCIe (legacy)       Mostly obsolete
Combo boards          DMA + HDMI capture +        Complex device tree;
                      input injection             HDMI activity is a fingerprint

M.2 slot populations are partially auditable from software through
PCI topology, ACPI, SMBIOS, storage inventory, and vendor board databases.
SMBIOS slot records are often incomplete for M.2, so detection should
be probabilistic and board-model-aware.

pcileech Framework

Project Lineage

Five components (four upstream repositories; vmm is built from the MemProcFS repository):
- pcileech:       Host-side C application with attack modules
- pcileech-fpga:  FPGA firmware in Verilog/SystemVerilog, per-board variants
- MemProcFS:      Virtual filesystem mounting target memory as /proc-like tree
- LeechCore:      Low-level device abstraction library
- vmm:            Memory analysis engine (vmm.dll API)

Pipeline: FPGA → LeechCore → PCILeech attack modules / MemProcFS analysis

FPGA Firmware Architecture

Key modules:
- pcileech_pcie_a7.v / _us.v:        Top-level Artix-7 / UltraScale integration
- pcileech_pcie_tlps128_bram_rdwr.v:  128-bit TLP source/sink (AXI-Stream)
- pcileech_pcie_cfgspace_shadow.v:    Shadow config space in BRAM
- pcileech_cfgspace.coe:              Init data (stock: Xilinx 10EE:0666)
- pcileech_bar_impl_zerowrite4k.v:    Default BAR — absorbs writes, returns zero
- pcileech_bar_impl_loopaddr.v:       Alternative BAR — echoes address
- pcileech_bar_impl_none.v:           Disables BAR (returns UR)
- pcileech_pcie_cfg_a7.v:             Config management via cfg_mgmt_*
- pcileech_mux.v:                     TLP multiplexer
- pcileech_fifo.v:                    Internal staging FIFO

Two key architectural choices:
1. Shadow config is spoofable but not spoofed by default.
   .coe ships with placeholder Xilinx IDs. User must overwrite
   with real donor's dump and resynthesize.
2. BAR controller is functionally inert.
   zerowrite4k doesn't emulate device behavior.
   Active BAR probing catches stock builds in one operation.

Host-Side MemProcFS

Mounts target memory as filesystem:
M:\
├── pid\1234\
│   ├── name.txt
│   ├── modules\       ← loaded module list
│   ├── handles\
│   ├── vad\           ← virtual address descriptors
│   ├── memmap.txt
│   └── minidump\
├── sys\
├── name\game.exe\     ← lookup by process name
└── forensic\
    ├── yara\
    ├── timeline\
    └── registry\

Cheat development pattern:
1. Development phase: MemProcFS, signature search, cross-references
   → slow, broad scanning to find entity manager / player array / view matrix
2. Execution phase: custom app via vmm.dll/LeechCore,
   periodic reads of known offsets at 60–240 Hz
This split is fundamental to detection — behavioral analysis
targets the execution phase's statistical signature.

Stock Firmware Fingerprints

Vanilla pcileech-fpga build exhibits:
- VID/DID 10EE:0666 (Xilinx placeholder)
- Xilinx 7-series PCIe IP signature bytes at characteristic offsets
- DSN Extended Capability absent or default
- No AER, LTR, ARI, ATS, or SR-IOV capabilities
- BAR0 mapped (DMA window); BAR1–5 disabled or all-ones
- BAR reads return zero (zerowrite4k) or echo address (loopaddr)
- MSI capability present but no interrupts ever fire
- Config reads complete in deterministically uniform time
  (BRAM lookup with fixed pipeline depth, near-zero variance)
- LTSSM never leaves L0 after training; no ASPM transitions
- AER correctable-error count stays at zero
- Power management never leaves D0
- Class Code matches donor placeholder but no class-specific behavior

Configuration Space Spoofing

Bridge vs Emulated Firmware

Bridge firmware:
  Patches identity fields via Vivado's PCIe IP Core GUI
  (VID, DID, Subsystem IDs, Class Code, sometimes DSN).
  Fast to produce, but 7-series hard IP generates internal capability
  blocks at characteristic offsets that retain FPGA-specific fingerprints.

Emulated (1:1) firmware:
  Implements complete shadow Configuration Space in BRAM.
  The entire 4 KB extended config space is initialized from a real donor device's hex dump.
  When OS issues CfgRd TLP, firmware responds from BRAM.
  IP Core's default registers never appear on the bus.

  Common bugs in emulated firmware:
  - First 16 bytes still come from IP block (mux priority)
  - Type 1 config reads not intercepted
  - Capability blocks bypassed in GUI still leak defaults

Shadow Configuration Space Implementation

Requirements:
1. Intercept incoming CfgRd0/CfgWr0 TLPs
2. Decode target offset
3. Look up value in BRAM
4. Build Completion TLP with correct Completer ID, status, payload
5. Send Completion through hard IP block

4 KB coverage at 4-byte granularity = 1,024 entries × 4 bytes = 4 KB BRAM.
Well within even T35's resources.

Overlay RAM and Writable Register Emulation

Real devices have writable registers. Firmware that returns correct
values on reads but drops writes creates detectable inconsistency.

Detection probe:
  write Command[BME] = 1 → read Command[BME]
  write Command[BME] = 0 → read Command[BME]
  Real silicon: bit toggles. Naive shadow: bit stays at BRAM init value.

Overlay RAM merges at read time:
  response = (base_value & ~writable_mask) | (overlay_value & writable_mask)

The catch: writable mask is register-specific:
- Command Register: different reserved bits than Device Control
- MSI Address Low: bits [1:0] reserved-zero
- BAR: low bits are read-only type flags ([3:0] for memory BARs: 32/64-bit,
  prefetchable; [1:0] for I/O BARs)
- Status Register: W1C bits — writing 1 clears, writing 0 no change
- AER Status: W1C across the board

Naive implementations with a single global mask fail because
reserved-bit and W1C behavior diverge. Detection therefore probes
W1C cases: write 0x00000000 to Correctable Error Status,
then write known-1 patterns, and verify read-back semantics.

Donor Card Extraction

Every serious emulated firmware starts with a donor — a physical
PCIe card whose complete identity is cloned. Not just VID/DID —
entire 4 KB config space, all capabilities, BAR size masks,
MSI/MSI-X table layouts, all extended capabilities, DSN.

Extraction tools:
- lspci -d [VID:DID] -vvv -xxxx (full ECAM dump)
- setpci -s <BDF> ECAP_AER+0x4.L
- Direct ECAM mapping in kernel-mode tools
- VFIO passthrough + programmatic dump

Donor selection pitfalls:
- Duplicating hardware already in gaming PC → caught instantly
- Must match FPGA's actual PCIe capability (x16 Gen4 on Gen2 x1 fails)
- Silicon generation must align with claimed link speed
- Class-functional A/B testing detects non-functional donors

Firmware Sophistication Tiers

Tier  Description              Detection Difficulty
──────────────────────────────────────────────────────────────────
0     Stock                    Trivial — VID/DID blacklist
      Unmodified upstream;
      10EE:0666 IDs

1     Bridge                   Easy — signature residue,
      .coe replaced with       driverless device, BME clearance
      donor VID/DID only

2     Shadow config            Medium — R/W consistency probe
      Full 4 KB shadow;        catches silent write drops
      no overlay RAM

3     Overlay RAM              Medium-hard — BAR probe still fails;
      Shadow + writable layer  class-functional A/B fails
      with per-register masks

4     BAR MMIO + MSI           Hard — interrupt distribution analysis;
      Donor-class BAR memory   BAR-region content baselining
      responder + MSI generator

5     Behavioral emulation     Very hard — multi-session statistical
      Latency jitter, ASPM,    analysis required
      AER counters, donor-
      class-specific behavior

6     Private + randomized     Currently undefeated by hardware-layer
      Per-customer layout,     signatures alone
      capability shuffling,
      custom hard-IP integration

Each tier transition corresponds to a defender technique
that closed the prior tier's escape hatch.
Cost-to-implement is nonlinear: 0–2 are public/trivial,
3 needs moderate Verilog, 4–5 need deep PCIe knowledge,
6 is sustained engineering.

Donor Exhaustion

Each detected donor class becomes nonviable. Remaining viable donors
trend toward obscure industrial cards — which themselves become
a behavioral signal on consumer gaming PCs.

Donor Class            Detection Status
─────────────────────────────────────────────────────────
Wi-Fi adapters         Heavily detected; class-spoof checks ubiquitous
Wired NICs             "NIC with loaded driver but zero packets" test
NVMe SSDs              Harder; detected via missing namespace activity
Audio codecs           Lower bandwidth fits FPGA; class checks present
USB host controllers   Child-device enumeration check breaks naive spoofs
Capture cards          Harder (genuinely idle when no source)
Industrial / OEM SKUs  Increasingly the only viable class; demographic signal
Server-class accel.    Physically implausible on consumer boards

Detection at the PCIe Layer

Configuration Integrity

- VID/DID/SVID/SDID against known-real-silicon list
- Capability-chain walk: DWord-aligned Next pointers, no overlaps, no cycles
- Signature-residue scanning: Xilinx 7-series default byte patterns at
  known relative offsets (Device Capabilities field bits, reserved bits,
  VSEC vendor IDs)
- Capability presence consistency: donor model's known caps must all be present
- BAR mask verification: write 0xFFFFFFFF, compare size mask against donor
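The capability-chain walk and the BAR mask check above can be sketched in C against a raw 256-byte config-space dump. The sketch is minimal and has these limits:
- Only the legacy list is walked. The extended list at 0x100 uses 12-bit Next pointers and needs a similar loop.
- Only 32-bit memory BARs are sized. 64-bit BARs combine two registers, and I/O BARs use bits [1:0] as flags.
- The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Walk the legacy capability list in a 256-byte config-space dump.
 * Fails on Next pointers with reserved low bits set, pointers into the
 * 64-byte header, and revisited DWORDs (cycles). Full overlap checking
 * would additionally need per-capability lengths. */
bool walk_capabilities(const uint8_t cfg[256], uint8_t ids[48], size_t *count)
{
    assert(cfg != NULL && ids != NULL && count != NULL);
    *count = 0;
    if (!(cfg[0x06] & 0x10))           /* Status[4]: Capabilities List */
        return true;                   /* no list: trivially consistent */
    bool seen[48] = { false };         /* one flag per DWORD in 0x40..0xFF */
    uint8_t ptr = cfg[0x34];           /* Capabilities Pointer */
    while (ptr != 0) {
        if ((ptr & 0x3) || ptr < 0x40)
            return false;              /* reserved bits set, or points into header */
        size_t slot = (size_t)(ptr - 0x40) >> 2;
        if (seen[slot])
            return false;              /* revisited DWORD: cycle */
        seen[slot] = true;
        ids[(*count)++] = cfg[ptr];    /* Capability ID */
        ptr = cfg[ptr + 1];            /* Next Capability Pointer */
    }
    return true;
}

/* Size of a 32-bit memory BAR from the value read back after writing
 * 0xFFFFFFFF; bits [3:0] are type flags, not address bits. */
uint32_t bar32_size(uint32_t readback)
{
    uint32_t addr_bits = readback & ~0xFu;
    return addr_bits ? ~addr_bits + 1u : 0u;     /* 0: BAR not implemented */
}
```

The returned ID list can then be compared against the donor model's known capabilities. The decoded BAR size can likewise be checked against the donor's size mask.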

BAR Memory Read Probing

Send Memory Read TLPs to BAR ranges, validate responses by donor class:

NIC donor BAR0: register layout with receive/transmit ring descriptors,
  interrupt mask, link status. Offset 0x00 returns specific bit pattern.

NVMe donor BAR0: NVMe controller registers — CAP (MQES, DSTRD,
  MPSMIN/MPSMAX), VS, CC, CSTS, AQA, ASQ/ACQ, doorbells at 0x1000.

USB XHCI donor BAR0: Capability Registers (CAPLENGTH, HCSPARAMS, HCCPARAMS).

zerowrite4k returns all-zeros; loopaddr echoes address. Both are
trivially distinguishable from real content.
Tier-4 firmwares implement donor-class responders but usually only
cover registers checked at probe time, leaving others divergent.

R/W Consistency Probing

- Command Register: toggle Memory Space, I/O Space, Bus Master Enable
- Device Control: change MPS, MRRS, Error Enables
- MSI Control: toggle Enable, change Multiple Message Enable
- Walk every W1C bit (Status, AER Status): write 1s, confirm clear
- Walk reserved bits: write 1s, confirm read-back as 0
- Per-register writable masks must match donor

Tier-2 (no overlay) fails immediately.
Tier-3 (single global mask) fails on W1C and reserved-bit cases.

LTSSM and Link-State Validation

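Sample the PCI Express Capability's Link Status register over time:
- Negotiated Width (Link Status[9:4]): consistent with donor deployment
  and FPGA hard block capability
- Current Link Speed (Link Status[3:0]): track slot's actual speed
- Gen4 x8 Link Capabilities in a port that also supports Gen4 x8, yet a
  Gen2 x1 Link Status = contradiction visible in one read
- DLL Active (Link Status[13]): should be 1 during operation; it is only
  implemented when Link Capabilities[20] (DLL Link Active Reporting Capable)
  is set, which is typical of downstream ports rather than endpoints
- Slot Clock Config (Link Status[12]): match real common-clock state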
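The Link Status fields above, plus the one-read contradiction check, can be sketched in C. The sketch is minimal and rests on two assumptions:
- The device supports every speed below its maximum, so speed encodings can be compared directly.
- The port's Link Capabilities reflect the slot's wiring.

A downtrained link is a reason to probe further, not proof of spoofing. The names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    unsigned speed;    /* Current Link Speed [3:0]: 1 = 2.5 GT/s, 2 = 5, 3 = 8, 4 = 16 */
    unsigned width;    /* Negotiated Link Width [9:4] */
    bool training;     /* Link Training [11] */
    bool slot_clock;   /* Slot Clock Configuration [12] */
    bool dll_active;   /* Data Link Layer Link Active [13] */
} link_status_t;

link_status_t decode_link_status(uint16_t lnksta)
{
    link_status_t s = {
        .speed      = lnksta & 0xF,
        .width      = (lnksta >> 4) & 0x3F,
        .training   = (lnksta >> 11) & 1,
        .slot_clock = (lnksta >> 12) & 1,
        .dll_active = (lnksta >> 13) & 1,
    };
    return s;
}

typedef enum { LINK_OK, LINK_DOWNTRAINED, LINK_IMPOSSIBLE } link_verdict_t;

/* Compare the negotiated link with Max Link Speed [3:0] and Maximum Link
 * Width [9:4] from both ends' Link Capabilities registers. */
link_verdict_t check_link(uint32_t dev_lnkcap, uint32_t port_lnkcap, uint16_t lnksta)
{
    link_status_t s = decode_link_status(lnksta);
    unsigned dev_speed  = dev_lnkcap & 0xF,  dev_width  = (dev_lnkcap >> 4) & 0x3F;
    unsigned port_speed = port_lnkcap & 0xF, port_width = (port_lnkcap >> 4) & 0x3F;
    unsigned max_speed = dev_speed < port_speed ? dev_speed : port_speed;
    unsigned max_width = dev_width < port_width ? dev_width : port_width;
    if (s.speed > max_speed || s.width > max_width)
        return LINK_IMPOSSIBLE;          /* exceeds what one end can do */
    if (s.speed < max_speed || s.width < max_width)
        return LINK_DOWNTRAINED;         /* e.g. Gen4 x8 on both ends, trained Gen2 x1 */
    return LINK_OK;
}
```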

ASPM Behavioral Validation

Real devices claiming ASPM exhibit characteristic L0 ↔ L1 transitions.

Spoofed device anomalies:
- Claims ASPM capability but never transitions out of L0
- Transitions with exit latency inconsistent with claimed value
- Never reaches L1.1 / L1.2 when donor and platform both support them

Sample Link Status "DLL Active" bit over time + PMC counters.

AER Baselining

- Departure from donor baseline: per-silicon correctable-error footprint
  should be stable. Implausibly clean (zero correctables when donor
  normally produces Bad TLP / Replay Timer Timeout) is anomalous.
- Implausible Header Log content (default/zeroed values)
- Inconsistent UR/CA responses to probes of unimplemented offsets

Completion Latency Fingerprinting

Real silicon: completion latency shaped by DRAM contention,
internal arbiters, PCIe pipeline depth → heavy-tailed distributions.

BRAM-backed emulators: fixed FPGA clock cycles + PCIe transit
→ much lower variance, even if mean is similar.

Detection signal is distribution shape, not absolute mean.

Statistical methods:
- Kolmogorov–Smirnov test: compare empirical CDFs
- Hill estimator: estimate tail index (real silicon has non-trivial tail;
  emulated firmware without stochastic jitter has no tail)
- Anderson-Darling test: sensitive to tail differences

Collect N latency samples (Memory Reads to BAR), compare against
per-donor reference distribution, flag devices deviating
beyond per-test-statistic threshold.

Tier-5 firmwares add LFSR-based jitter generators, but matching
real distribution shape (mean, variance, tail index, mode count)
requires modeling donor's DRAM access pattern.

MSI/MSI-X Behavioral Validation

A device with MSI Enable, Address/Data programmed, and attached driver
should produce interrupts:

- Zero interrupts when driver should exercise device = anomalous
- Implausibly uniform arrival times (exact 60 Hz heartbeat)
  = timer-driven generator, not event-driven
- Implausibly bursty patterns not matching donor class

Monitor via OS interrupt accounting, ETW/performance telemetry,
driver counters, kernel instrumentation.

Cheat-Phase Access Pattern Recognition

Two distinct patterns:

Development phase:
  Slow, broad scanning, signature search, MemProcFS walking.
  Rare during live competitive play.

Execution phase:
  Narrow, periodic reads (60–240 Hz) of small offset set
  (player positions, entity arrays, view matrices).

Execution phase statistical signature:
  High temporal periodicity, low address-space breadth,
  alignment to game-frame intervals.

Distinguishing features:
- Fano factor
- Autocorrelation at frame intervals
- Address-space coverage entropy

Honeypot regions complement this when combined with:
  IOMMU denial/fault logging, hypervisor-managed protected pages,
  device-domain sandboxing, decoy IOVA mappings, or server-side
  behavioral traps tied to fake game objects.

Hot-Plug and Link Disruption Events

Firmware swaps and donor reconfigurations require link renegotiation.
Three observable signals:

1. AER Surprise Down — uncorrectable fatal error logged when link
   drops unexpectedly mid-session.

2. Slot Status hot-plug events — Presence Detect Changed,
   Data Link Layer State Changed, MRL Sensor Changed.
   Register via IoRegisterPlugPlayNotification with
   EventCategoryTargetDeviceChange or EventCategoryDeviceInterfaceChange.

3. LTSSM re-training cycles — Link Training bit toggling,
   transient Recovery state.

A device that surprise-downs then reappears with subtly different
Config Space (adjusted MSI layout, different latency profile,
different VID/DID) = fingerprint of mid-session firmware reload.

Detection at the IOMMU Layer

Fault-Rate Monitoring

Sample per-device fault counters from IOMMU fault-recording structure
(VT-d Fault Recording Registers, AMD-Vi Event Log) or from WHEA.

Pipeline:
1. Snapshot baseline at session start
2. Sample at intervals; compute per-device fault rate
3. Threshold-trigger additional PCIe probing on devices exceeding baseline
4. Aggregate evidence across signals before action
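Steps 1 to 3 of this pipeline can be sketched per device in C. The sketch is minimal and has these assumptions:
- The counter source (VT-d fault records, AMD-Vi event log, WHEA) is abstracted away.
- The "normal" rate and the multiplier are inputs the text does not specify.
- A counter that goes backwards is treated as a reset and re-baselined.
- The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t last_count;    /* step 1: snapshot at session start, then last sample */
    double   normal_rate;   /* faults per second regarded as normal (assumed input) */
} fault_tracker_t;

/* Step 2: fault rate over the last interval. Step 3: return true when the
 * device exceeds factor * normal_rate and should get additional PCIe probing. */
bool fault_sample(fault_tracker_t *t, uint64_t count, double interval_s, double factor)
{
    assert(interval_s > 0.0);
    if (count < t->last_count) {                 /* counter reset or wrap: re-baseline */
        t->last_count = count;
        return false;
    }
    double rate = (double)(count - t->last_count) / interval_s;
    t->last_count = count;
    return rate > factor * t->normal_rate;
}
```

Step 4 stays outside this function: a true return only schedules more probing and feeds evidence aggregation.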

Domain Assignment Audit

Walk IOMMU domain assignments for anomalies:
- Devices on passthrough/identity domains when strict mode is active
- Devices in unexpectedly large IOMMU groups (poor ACS topology)
- Multiple devices sharing Domain ID when they shouldn't

ACS Topology Verification

Walk PCIe bridge topology between every endpoint and root complex.
For each bridge with ACS Capability:
- Verify Source Validation (SV) enabled
- Verify Translation Blocking (TB) enabled
- Verify P2P Request Redirect (RR) and Completion Redirect (CR) enabled

Bridges without ACS at all = isolation holes by topology.
Bridges with ACS Capability but Control bits not set = misconfiguration.
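The per-bridge check can be sketched in C on top of a prior extended-capability walk that locates the ACS capability (ID 0x000D). The ACS Capability register is at +0x04 and the ACS Control register at +0x06, and the two share bit positions. The PCIe specification makes some controls applicable only to certain port types. Like the text, this minimal sketch treats all four as required. The names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { ACS_SV = 1u << 0, ACS_TB = 1u << 1, ACS_RR = 1u << 2, ACS_CR = 1u << 3 };

typedef enum {
    ACS_ABSENT,     /* no ACS capability: isolation hole by topology       */
    ACS_PARTIAL,    /* capability does not implement one of SV/TB/RR/CR    */
    ACS_DISABLED,   /* implemented but Control bit clear: misconfiguration */
    ACS_OK
} acs_verdict_t;

acs_verdict_t check_acs(bool has_acs, uint16_t acs_cap, uint16_t acs_ctrl)
{
    const uint16_t wanted = ACS_SV | ACS_TB | ACS_RR | ACS_CR;
    if (!has_acs)
        return ACS_ABSENT;
    if ((acs_cap & wanted) != wanted)
        return ACS_PARTIAL;
    if ((acs_ctrl & wanted) != wanted)
        return ACS_DISABLED;
    return ACS_OK;
}
```

Run the check on every bridge between an endpoint and the root complex. Any result other than ACS_OK marks that path as not isolated.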

IOMMU as Containment Primitive

Active containment when suspect device is identified:

1. IOMMU domain re-remapping:
   Reprogram device's domain to sandbox memory instead of revoking access.
   Cheat keeps "reading" but receives garbage data.

2. Bus Master Enable clearance:
   Toggle Command[2] to 0. Effective for tier-0 through tier-3.
   Cheats monitoring BME can race; may need repeated clearance.

3. Downstream Port Containment (DPC):
   When DPC is enabled on root port (Extended Cap ID 0x001D),
   triggers cause port to enter Contained state — all TLPs dropped,
   completions blocked, link logically isolated.
   Enforced at upstream port, no race against firmware-side BME restore.
   Not universal on all chipsets.

4. Anti-cheat-owned device domain:
   For device owned by AC driver, allocate and map only sandbox IOVAs,
   never expose game memory.

5. Hypervisor-integrated enforcement:
   Enforce policy above guest kernel by trapping IOMMU MMIO programming.
   Requires privileged platform integration.

Hypervisor-Level Defense

EPT-Based Memory Protection

EPT translates Guest Physical Address (GPA) to Host Physical Address (HPA).
A hypervisor owning the EPT can:

- Mark game memory as read-execute-only in EPT, even if guest OS marks
  read-write. Writes cause EPT violations the hypervisor traps.
- Hide pages by clearing EPT mappings.
- Implement watchpoints on specific GPA ranges.

IOMMU blocks DMA at device-to-memory boundary;
EPT blocks CPU access at guest-to-host boundary.
A cheat combining DMA card with kernel-mode payload faces both.

VBS, HVCI, and VTL Split

VBS creates Secure Kernel (VTL 1) alongside regular kernel (VTL 0)
in a Hyper-V partition. HVCI uses VTL 1 to enforce no executable page
in VTL 0 is simultaneously writable.

Anti-cheat interaction:
- Register VTL 1 callouts to validate guest state
- Attest against System Guard Secure Launch (DRTM) measurements
- Rely on HVCI to block BYOVD patterns

Combined VBS+HVCI+TPM+SecureBoot is assumed baseline for
serious anti-cheat threat models.

SMM Considerations

System Management Mode (Ring -2) runs in SMRAM, isolated from
OS and hypervisor. SMM handlers can read all physical memory
and are exempt from IOMMU enforcement.

A vulnerable SMI handler = path to arbitrary memory access
without IOMMU mediation.

Mitigations:
- Intel Boot Guard / AMD Platform Secure Boot (firmware signatures)
- SMI Transfer Monitor (STM): hypervisor-resident, treats SMM as
  constrained guest. Rarely implemented by board vendors.
- Runtime verification of SMM lockdown registers

PCIe IDE (Integrity and Data Encryption)

IDE adds link-level TLP protection: integrity (MAC-based, required)
and confidentiality (encryption, optional).
Incorporated into PCIe 6.0 base specification.

Valuable against: physical link interposers, malicious retimers/switches,
traffic tampering.

NOT a DMA-cheat silver bullet: cheat installed as the endpoint
still originates legitimate IDE-protected TLPs after key establishment.
IDE raises the bar for passive bus sniffing but endpoint identity,
IOMMU policy, ACS topology, ATS policy, and attestation remain required.

External Trust Anchors

TPM 2.0

Hardware (or firmware-isolated) cryptoprocessor with:
- PCRs: extend-only registers, PCR[n] = SHA256(PCR[n] || new_value)
- Persistent keys: EK (manufacturer), SRK (provisioned), user-defined
- Hierarchy: Endorsement, Storage, Platform, Null

PCR Allocation (Measured Boot)

PCR   Measured Content
0     SRTM / Core Root of Trust — UEFI firmware code
1     Platform configuration data — firmware variables
2     Option ROM code — third-party UEFI drivers
3     Option ROM configuration and data
4     IPL / boot manager binary (e.g., bootmgfw.efi)
5     IPL configuration — GPT/partition table, boot config
6     Manufacturer-specific / state-transition events
7     Secure Boot policy (PK, KEK, db, dbx)
8–15  OS-defined (BitLocker binds to PCR[11])
16    Debug
17–22 DRTM measurements (Secure Launch)
23    Application-defined

Remote Attestation Cryptography

Trust property: compromised local kernel cannot forge PCR values.

Flow:
1. Server sends nonce
2. Client calls TPM2_Quote(AIK, PCR_selection, nonce)
   TPM computes PCR composite, builds TPMS_ATTEST, signs with AIK
3. Client sends Quote + AIK certificate chain
4. Verifier checks:
   - AIK signature valid
   - AIK certificate chains to trusted TPM manufacturer root
   - EK on known-EK list (binds AIK to real TPM)
   - Nonce matches (freshness, replay protection)
   - PCR composite matches known-good value

A rootkit loading after measured boot cannot alter PCRs.
A software simulator cannot produce valid Quote without TPM private key.

DRTM and Secure Launch

Dynamic Root of Trust for Measurement allows "late launch" —
trusted execution environment established after OS boot,
measurement captured into PCR[17].

Intel: GETSEC[SENTER] (TXT)
AMD: SKINIT (SVM extension)

CPU enters measured execution state, Secure Loader Block (SLB)
loaded and hashed into PCR[17], control transfers to
Measured Launch Environment (MLE).

Microsoft System Guard Secure Launch uses this to load HVCI's
hypervisor into measured state independent of SRTM chain.
Defender requests Quote including PCR[17] and matches against
known-good MLE measurement.

UEFI Pre-Boot DMA Integrity

Pre-Boot DMA Protection: firmware must isolate DMA-capable devices'
I/O buffers before ExitBootServices().

ACPI indicators:
- Intel: DMA_CTRL_PLATFORM_OPT_IN_FLAG in DMAR table flags
- AMD: DMA remap support bit in IVRS IVinfo field

Windows PCR[7] event: firmware extends EV_EFI_ACTION with
"DMA Protection Disabled" when IOMMU/Kernel DMA Protection
is disabled, providing attestation hook.

Combined picture:
PCR[0]/PCR[7] anchor firmware and DMA-protection policy,
ACPI tables describe runtime IOMMU config,
documented DMA interfaces show what OS actually remaps,
attestation ties local claims to remote-verified known-good policy.

Layered Detection Pipeline

Pre-Game Environmental Verification

- IOMMU active and applied to DMA-capable PCIe paths
- Interrupt Remapping enabled
- Secure Boot enabled
- VBS/HVCI active
- TPM 2.0 present and provisioned
- Attestation Quote validates against expected policy
- BIOS/UEFI version not in known vulnerable pre-boot DMA list
- ACS topology walk: all relevant bridges enforce SV, TB, RR, CR

PCIe Inventory Pass

- Enumerate all PCIe devices via PnP tree
- Full 4 KB config-space dump for each
- Check device problem codes (DEVPKEY_Device_ProblemCode)
- Cross-reference SMBIOS slot inventory with populated devices

Configuration Integrity Per Device

- VID/DID/SVID/SDID against known-good list
- Capability-chain walk and validation
- Signature-residue scan
- BAR mask verification
- R/W consistency probing
- Compare against per-donor reference database

Behavioral Sampling During Play

- Periodic Link Status reads (LTSSM, ASPM transitions)
- AER counter snapshots
- Per-device interrupt rate and distribution
- Per-device IOMMU fault rate
- BAR-region content sampling for class consistency

Statistical Analysis Over Session

- Latency distribution comparison (KS test, Hill estimator)
- Interrupt arrival distribution
- ASPM transition rate

Cheat-Phase Detection

- Honeypot region access
- Memory access frequency / locality classifiers

Containment Before Verdict

Each detection produces evidence, not a verdict.
Verdict informed by:
- Multi-signal correlation (single signals can false-positive;
  combinations rarely do)
- Server-side aggregation across sessions
- Behavioral verification (input timing, gameplay statistics)

While verdict accumulates, containment protects the live match:
IOMMU re-remapping to sandbox, BME clearance, or EPT-level
game-process protection degrades cheat effectiveness in real time.

Realistic Limits

A firmware that:
- Clones donor byte-for-byte (full 4 KB config + all capabilities)
- Implements donor-class BAR MMIO, MSI generation, overlay RAM
- Adds completion-latency jitter matching donor distribution
- Generates plausible AER correctable-error rates
- Transitions through ASPM states like the donor
- Uses donor not present in gaming PC and not on blacklists
- Operates only within driver-mapped IOMMU domains (legitimate-path exfil)
- Avoids honeypot regions through gameplay-aware address whitelisting

...can defeat every PCIe-layer and IOMMU-layer signature in isolation.

This is why external trust anchors are required:
TPM attestation, measured boot, and server-side correlation operate
outside the "spoof a PCIe endpoint" problem.

A perfectly emulated DMA card cannot forge a TPM Quote.
But the verifier must bind Quote to allowlist/blocklist of BIOS versions,
DMA-protection events, Secure Boot state, VBS/HVCI, and IOMMU policy.

The cost of defeating all four layers simultaneously
(PCIe + IOMMU + hypervisor + attestation) exceeds typical cheat value.

Forensic Evidence Capture

What to Capture

Artifact                     Source                          Purpose
──────────────────────────────────────────────────────────────────────────────
Full 4 KB config dump        Bus interface / ECAM            Donor ID post-hoc
Capability chain walk        Parsed from config              Capability presence
PCIe link state history      Link Status over session        LTSSM anomaly proof
MSI/MSI-X arrival timeline   OS interrupt telemetry          Rate claim refutation
AER correctable counts       AER capability registers        Baseline outlier proof
IOMMU fault log entries      WHEA/ETW, Driver Verifier       DMA-violation proof
IOMMU domain assignments     IOMMU manager state walk        Passthrough anomaly
ACS bridge state             Bridge enumeration              Isolation proof
Honeypot access record       Hypervisor EPT trap log         Unauthorized read evidence
TPM PCR snapshot             TPM Quote API                   Boot-chain attestation
MCFG / DMAR / IVRS tables    ACPI subsystem                  Platform config baseline
SMBIOS slot inventory        DMI subsystem                   Slot-population audit
BIOS version + patch level   SMBIOS                          Pre-Boot DMA fix verify
Latency-distribution hists   Per-session sampling            Statistical fingerprint

Multi-Signal Correlation

Strongest evidence packages combine:
1. Hardware-layer signal (config space, BAR, link state)
2. Behavioral-layer signal (interrupt distribution, IOMMU fault rate, honeypot)
3. Temporal correlation (hardware signal preceded behavioral by plausible interval)

Three independent signals push false-positive rates low enough
that the resulting appeal volume stays manageable.
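The gain from combining signals is multiplicative, but only under an independence assumption that real signals meet at best approximately. The per-signal false-positive rates below are illustrative, not measured:

```latex
P(\mathrm{FP}_1 \wedge \mathrm{FP}_2 \wedge \mathrm{FP}_3) = p_1\,p_2\,p_3,
\qquad p_1 = p_2 = p_3 = 10^{-2} \;\Rightarrow\; P = 10^{-6}
```

Across 10^6 sessions, that is on the order of one expected false flag, versus about 10^4 for any single signal at 10^-2. Correlated signals reduce this gain. An example is two probes that react to the same unusual but legitimate device.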

PCIe Protocol Captures

A PCIe protocol analyzer (interposer) produces the strongest forensic evidence:
full TLP-level captures with byte-exact accuracy and nanosecond timestamps.

Commercial analyzers capture every TLP, DLLP, and physical-layer ordered set.
Traces can be replayed to confirm fingerprinting findings.

Cost (tens to hundreds of thousands USD) limits routine use, but for
high-profile cases (competitive integrity, tournament, prosecution),
protocol-level captures by independent labs are the unambiguous reference.

Thunderbolt / USB4 DMA

Attack Surface

- Thunderbolt 1-4 / USB4 provide direct PCIe tunneling
- Hot-plug capable: device can be attached at runtime
- Pre-boot DMA: device has memory access before OS loads
- Thunderbolt Security Levels:
  - SL0 (None): no security, legacy mode
  - SL1 (User Auth): user must approve new devices
  - SL2 (Secure Connect): device must match previously approved UUID
  - SL3 (No PCIe tunneling): completely disables DMA

Thunderbolt-Specific Attacks

- Thunderclap: malicious Thunderbolt peripherals defeat IOMMU protection by abusing memory the OS legitimately maps for them
- Device re-identification: change UUID to bypass SL2
- OS-level Thunderbolt driver vulnerabilities
- PCIe tunneling through USB4 hubs

Defensive Measures

- Kernel DMA Protection (Windows 10 1803+): enforces IOMMU DMA remapping for hot-plugged external PCIe devices
- Thunderbolt firmware verification
- Platform-level: BIOS setting to disable Thunderbolt PCIe tunneling
- macOS: T2 chip enforces DMA restrictions on Thunderbolt ports

Shadow CR3 / Split TLB

Page Table Manipulation

- Maintain two sets of page tables (two CR3 values):
  - "Clean" CR3: legitimate page tables visible to anti-cheat
  - "Shadow" CR3: modified page tables with cheat-accessible mappings
- Swap CR3 before/after anti-cheat inspection windows
- Combine with EPT manipulation for hypervisor-level split

Split TLB Techniques

- Desync instruction TLB (iTLB) and data TLB (dTLB):
  - Execute code from one physical page
  - Read data from another physical page at same virtual address
- Requires precise TLB invalidation control
- Hypervisor can create EPT-based split: execute on page A,
  read on page B, at same GPA
- Anti-cheat mitigation: TLB flush + re-walk, serializing instructions

Memory Access Techniques

Physical Memory Reading

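// Physical read via LeechCore, the acquisition library pcileech is built on
#include <leechcore.h>
HANDLE hLC;                  // device handle returned by LcCreate()
QWORD  physAddr = 0x1000;    // example physical address
BYTE   buffer[0x1000];       // one 4 KiB page
BOOL   ok = LcRead(hLC, physAddr, sizeof(buffer), buffer);   // TRUE on success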

Virtual Address Translation

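// Walk x86-64 4-level paging: PML4 -> PDPT -> PD -> PT -> physical.
// ReadPhys(pa) is assumed to return the 8-byte entry at physical address pa.
#define PA_MASK     0x000FFFFFFFFFF000ULL      // bits 51:12: next table / page frame
#define IDX(va, sh) (((va) >> (sh)) & 0x1FF)   // 9-bit index for one level
UINT64 TranslateVA(UINT64 cr3, UINT64 va) {    // returns 0 if va is not mapped
    UINT64 e = ReadPhys((cr3 & PA_MASK) + IDX(va, 39) * 8);                  // PML4E
    if (!(e & 1)) return 0;                                                  // P bit clear
    e = ReadPhys((e & PA_MASK) + IDX(va, 30) * 8);                           // PDPTE
    if (!(e & 1)) return 0;
    if (e & 0x80) return (e & 0x000FFFFFC0000000ULL) | (va & 0x3FFFFFFF);    // 1 GiB page
    e = ReadPhys((e & PA_MASK) + IDX(va, 21) * 8);                           // PDE
    if (!(e & 1)) return 0;
    if (e & 0x80) return (e & 0x000FFFFFFFE00000ULL) | (va & 0x1FFFFF);      // 2 MiB page
    e = ReadPhys((e & PA_MASK) + IDX(va, 12) * 8);                           // PTE
    return (e & 1) ? (e & PA_MASK) | (va & 0xFFF) : 0;                       // 4 KiB page
}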
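The walk depends on splitting a virtual address into four 9-bit table indices and a 12-bit page offset. This small sketch does only that step, so the numbers can be checked by hand. `va_parts` and `split_va` are hypothetical names, and the layout assumes standard 4-level paging with 4 KiB pages.

```c
#include <stdint.h>

/* Hypothetical helper type: the pieces of a 4-level, 4 KiB-page translation. */
typedef struct {
    unsigned pml4, pdpt, pd, pt, offset;
} va_parts;

/* Bits 47:39, 38:30, 29:21, 20:12 index the four tables; bits 11:0 are the offset. */
va_parts split_va(uint64_t va) {
    va_parts p;
    p.pml4   = (unsigned)((va >> 39) & 0x1FF);
    p.pdpt   = (unsigned)((va >> 30) & 0x1FF);
    p.pd     = (unsigned)((va >> 21) & 0x1FF);
    p.pt     = (unsigned)((va >> 12) & 0x1FF);
    p.offset = (unsigned)(va & 0xFFF);
    return p;
}
```

For `0x00007FF612345678` this yields PML4 index 0xFF, PDPT 0x1D8, PD 0x91, PT 0x145, and offset 0x678. The first read in the walk therefore lands at `(CR3 & ~0xFFF) + 0xFF * 8`, which is offset 0x7F8 in the PML4 page.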

DTB (Directory Table Base) Finding

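- Scan physical memory for pages that look like valid PML4 tables (candidate CR3 values)
- Locate kernel structures that record each process's CR3 (on Windows, KPROCESS.DirectoryTableBase)
- Use signature scanning to find those structures in physical memory
- Validate candidates by checking page table entries and translating known virtual addresses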
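A common validation heuristic in memory-forensics tooling exploits the fact that Windows x64 maps its paging structures through one self-referencing PML4 entry in the kernel half. A candidate page whose kernel half contains exactly one present entry pointing back at the page itself is therefore a plausible DTB. The sketch below is a hypothetical helper, not the MemProcFS or Volatility implementation. The threshold of four present kernel-half entries is an arbitrary assumption, not a tuned value.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT 0x1ULL
#define PTE_PA_MASK 0x000FFFFFFFFFF000ULL   /* bits 51:12 of an entry */

/* page[] holds the 512 raw entries read from physical address page_pa. */
bool looks_like_pml4(const uint64_t page[512], uint64_t page_pa) {
    int self_refs = 0, kernel_present = 0;
    for (int i = 256; i < 512; i++) {            /* kernel half of the table */
        uint64_t e = page[i];
        if (!(e & PTE_PRESENT)) continue;
        kernel_present++;
        if ((e & PTE_PA_MASK) == (page_pa & PTE_PA_MASK))
            self_refs++;                         /* entry maps the table itself */
    }
    /* One self-map and a populated kernel half (threshold is an assumption). */
    return self_refs == 1 && kernel_present >= 4;
}
```

A scanner would call this for every 4 KiB page it reads. Each surviving candidate is then confirmed by translating a known kernel virtual address through it.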

Security Considerations

Ethical Use

- Security research only
- Authorized testing environments
- Responsible disclosure
- Legal compliance

Risk Awareness

- Physical hardware access required
- Potential system instability
- Detection by advanced anti-cheat
- Legal implications

Resource Organization

The README contains:

- pcileech and derivatives
- FPGA firmware projects
- DMA libraries
- Integration tools
- Device emulation firmware
- Anti-detection implementations

Data Source

Important: This skill provides conceptual guidance and overview information. For detailed information use the following sources:

1. Project Overview & Resource Index

Fetch the main README for the full curated list of repositories, tools, and descriptions:

https://raw.githubusercontent.com/gmh5225/awesome-game-security/refs/heads/main/README.md

2. Repository Code Details (Archive)

Archive URL format:

https://raw.githubusercontent.com/gmh5225/awesome-game-security/refs/heads/main/archive/{owner}/{repo}.txt

Examples:

https://raw.githubusercontent.com/gmh5225/awesome-game-security/refs/heads/main/archive/ufrisk/pcileech.txt
https://raw.githubusercontent.com/gmh5225/awesome-game-security/refs/heads/main/archive/000-aki-000/GameDebugMenu.txt

How to use:

1. Identify the GitHub repository the user is asking about (owner and repo name from the URL).
2. Construct the archive URL: replace {owner} with the GitHub username/org and {repo} with the repository name (no .git suffix).
3. Fetch the archive file; it contains a full code snapshot with file trees and source code generated by code2prompt.
4. If the fetch returns a 404, the repository has not been archived yet; fall back to the README or direct GitHub browsing.
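The archive-URL steps can be sketched in C. The helper name `archive_url_from_github` and the `AGS_RAW` macro are hypothetical. The sketch assumes the input is a `https://github.com/{owner}/{repo}` URL, possibly with a `.git` suffix or extra path segments. It builds only the URL; fetching it and handling a 404 are left to the caller.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define AGS_RAW "https://raw.githubusercontent.com/gmh5225/awesome-game-security/refs/heads/main"

/* "https://github.com/{owner}/{repo}[.git][/...]" -> archive URL in out.
 * Returns false if the input is not a github.com repo URL or out is too small. */
bool archive_url_from_github(const char *gh_url, char *out, size_t out_len) {
    const char *prefix = "https://github.com/";
    size_t plen = strlen(prefix);
    if (strncmp(gh_url, prefix, plen) != 0) return false;
    const char *owner = gh_url + plen;
    size_t owner_len = strcspn(owner, "/");
    if (owner_len == 0 || owner[owner_len] != '/') return false;
    const char *repo = owner + owner_len + 1;
    size_t repo_len = strcspn(repo, "/?#");          /* stop at extra path/query */
    if (repo_len >= 4 && strncmp(repo + repo_len - 4, ".git", 4) == 0)
        repo_len -= 4;                               /* drop the .git suffix */
    if (repo_len == 0) return false;
    int n = snprintf(out, out_len, "%s/archive/%.*s/%.*s.txt", AGS_RAW,
                     (int)owner_len, owner, (int)repo_len, repo);
    return n > 0 && (size_t)n < out_len;             /* false on truncation */
}
```

For `https://github.com/ufrisk/pcileech.git` this produces the pcileech archive URL listed in the examples above.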

3. Repository Descriptions

For a concise English summary of what a repository does, the project maintains auto-generated description files.

Description URL format:

https://raw.githubusercontent.com/gmh5225/awesome-game-security/refs/heads/main/description/{owner}/{repo}/description_en.txt

Examples:

https://raw.githubusercontent.com/gmh5225/awesome-game-security/refs/heads/main/description/00christian00/UnityDecompiled/description_en.txt
https://raw.githubusercontent.com/gmh5225/awesome-game-security/refs/heads/main/description/ufrisk/pcileech/description_en.txt

How to use:

1. Identify the GitHub repository the user is asking about (owner and repo name from the URL).
2. Construct the description URL: replace {owner} with the GitHub username/org and {repo} with the repository name.
3. Fetch the description file; it contains a short, human-readable summary of the repository's purpose and contents.
4. If the fetch returns a 404, the description has not been generated yet; fall back to the README entry or the archive.

Priority order when answering questions about a specific repository:

1. Description (quick summary): fetch first for concise context
2. Archive (full code snapshot): fetch when deeper implementation details are needed
3. README entry: fallback when neither description nor archive is available
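The priority order can be expressed as an ordered list of candidate URLs that a caller tries in turn, moving to the next one on a 404. This is a sketch: `candidate_urls`, `AGS_RAW`, and `URL_MAX` are hypothetical names, and the README URL is the one given under "Project Overview & Resource Index". The actual HTTP fetching is left out.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define AGS_RAW "https://raw.githubusercontent.com/gmh5225/awesome-game-security/refs/heads/main"
#define URL_MAX 256

/* Writes the URLs to try in priority order:
 * out[0] description, out[1] archive, out[2] README. */
bool candidate_urls(const char *owner, const char *repo, char out[3][URL_MAX]) {
    if (!*owner || !*repo || strchr(owner, '/') || strchr(repo, '/'))
        return false;                               /* keep the path well-formed */
    int n[3];
    n[0] = snprintf(out[0], URL_MAX, "%s/description/%s/%s/description_en.txt",
                    AGS_RAW, owner, repo);
    n[1] = snprintf(out[1], URL_MAX, "%s/archive/%s/%s.txt", AGS_RAW, owner, repo);
    n[2] = snprintf(out[2], URL_MAX, "%s/README.md", AGS_RAW);
    for (int i = 0; i < 3; i++)
        if (n[i] < 0 || n[i] >= URL_MAX) return false;  /* truncated */
    return true;
}
```

Keeping the order in one array makes the fallback rule a simple loop: fetch `out[i]`, and advance `i` only when the response is a 404.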
