detecting-data-and-model-poisoning

Updated June 22, 2026 at 17:08

Identify poisoned training data and backdoored models across the ML pipeline.

Source

mukul975

mukul975/Anthropic-Cybersecurity-Skills

name	detecting-data-and-model-poisoning
description	Identify poisoned training data and backdoored models across the ML pipeline.
domain	cybersecurity
subdomain	ai-security
tags	["ai-security","data-poisoning","model-backdoor","ml-supply-chain","adversarial-robustness-toolbox","activation-clustering","spectral-signatures","model-integrity"]
version	1.0
author	mahipal
license	Apache-2.0
nist_csf	["MEASURE-2.7"]
mitre_attack	["AML.T0020","AML.T0018"]

ID	Official Name	Relevance
AML.T0020	Poison Training Data	Injection of manipulated samples into the training corpus
AML.T0018	Backdoor ML Model	Trigger-activated hidden behavior in the trained model
AML.T0010	ML Supply Chain Compromise	Poisoned public datasets / trojaned downloaded weights
AML.T0024	Exfiltration via ML Inference API	Some poisoning aims to leak data via the model's responses

Tool	Purpose	Source
Adversarial Robustness Toolbox	Activation clustering & spectral-signature poisoning defenses	https://github.com/Trusted-AI/adversarial-robustness-toolbox
Cleanlab	Label/data-quality issue detection	https://github.com/cleanlab/cleanlab
safetensors	Safe (non-pickle) weight serialization	https://github.com/huggingface/safetensors
OWASP LLM04:2025	Data and Model Poisoning reference	https://genai.owasp.org/llmrisk/llm042025-data-and-model-poisoning/
MITRE ATLAS	AI threat technique taxonomy	https://atlas.mitre.org/

Layer	Method	Tool	Signal
Supply chain	Hash/signature + safe format	sha256/safetensors	Tampered or unsafe artifact
Data	Label issues / outliers	Cleanlab	Mislabeled / injected samples
Model	Activation clustering	ART ActivationDefence	Per-class activation split
Model	Spectral signatures	ART SpectralSignatureDefense	Outlier covariance spectrum
Model	Trigger probing	custom	High trigger attack-success-rate
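
The spectral-signatures row above flags samples whose learned representations project unusually strongly onto the top principal direction of their class. The following is a minimal pure-Python sketch of that scoring idea, not ART's SpectralSignatureDefense implementation. The function names, the `expected_poison_rate` parameter, and the 1.5 multiplier are assumptions made for illustration. In practice each feature vector would be a hidden-layer activation for one training sample, and the scoring would run one class at a time.

```python
import math
import random


def top_direction(centered, iters=100, seed=0):
    """Approximate the top eigenvector of centered^T * centered by power iteration."""
    rng = random.Random(seed)
    dim = len(centered[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for _ in range(iters):
        # proj[i] is the projection of row i onto the current direction v.
        proj = [sum(row[j] * v[j] for j in range(dim)) for row in centered]
        # w = centered^T * proj, i.e. one multiplication by the covariance matrix.
        w = [sum(proj[i] * centered[i][j] for i in range(len(centered)))
             for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # all rows identical: no informative direction exists
        v = [x / norm for x in w]
    return v


def spectral_scores(features):
    """Score each sample by its squared projection onto the top principal direction."""
    n, dim = len(features), len(features[0])
    mean = [sum(row[j] for row in features) / n for j in range(dim)]
    centered = [[row[j] - mean[j] for j in range(dim)] for row in features]
    v = top_direction(centered)
    return [sum(row[j] * v[j] for j in range(dim)) ** 2 for row in centered]


def flag_outliers(features, expected_poison_rate=0.05, multiplier=1.5):
    """Return sorted indices of the highest-scoring samples (hypothetical helper).

    The number flagged is ceil(multiplier * expected_poison_rate * n); both
    parameters are illustrative assumptions, not values taken from this skill.
    """
    scores = spectral_scores(features)
    k = min(len(scores),
            math.ceil(multiplier * expected_poison_rate * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])
```

Flagged indices are candidates for manual review rather than proof of poisoning. Deliberately flagging more samples than the expected poison rate trades some clean samples for a better chance of catching every injected one.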

Detecting Data and Model Poisoning

Overview

When to Use

Prerequisites

Objectives

MITRE ATLAS Mapping

Workflow

1. Verify data and model provenance/integrity

2. Detect label/data-quality issues with Cleanlab

3. Detect poisoned samples via ART activation clustering

4. Confirm with ART spectral signatures

5. Probe the model for backdoor triggers

6. Quarantine, retrain, and report
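
Step 5 of the workflow above can be illustrated with a small attack-success-rate (ASR) probe: stamp a candidate trigger onto samples that do not belong to the suspected target class and measure how often the model then predicts that class. This is a hedged sketch, not the skill's own procedure. `attack_success_rate`, `stamp_trigger`, and the two toy models are hypothetical names introduced here, and a real probe would wrap an actual model's predict call.

```python
from typing import Callable, Hashable, Sequence, TypeVar

Sample = TypeVar("Sample")


def attack_success_rate(
    predict: Callable[[Sample], Hashable],
    apply_trigger: Callable[[Sample], Sample],
    samples: Sequence[Sample],
    labels: Sequence[Hashable],
    target_label: Hashable,
) -> float:
    """Fraction of triggered non-target samples that the model assigns to target_label."""
    # Samples already in the target class cannot demonstrate a label flip.
    eligible = [x for x, y in zip(samples, labels) if y != target_label]
    if not eligible:
        raise ValueError("no samples outside the target class to probe")
    hits = sum(1 for x in eligible if predict(apply_trigger(x)) == target_label)
    return hits / len(eligible)


# Toy stand-ins (assumptions for illustration only).
def stamp_trigger(x):
    """Overwrite the last feature with a fixed marker value."""
    return x[:-1] + (255,)


def clean_model(x):
    """Benign classifier: ignores the last feature entirely."""
    return x[0] % 3


def backdoored_model(x):
    """Same classifier, but the marker value forces class 7."""
    return 7 if x[-1] == 255 else x[0] % 3


samples = [(i, i + 1, i + 2) for i in range(30)]
labels = [clean_model(x) for x in samples]

clean_asr = attack_success_rate(clean_model, stamp_trigger, samples, labels, 7)
backdoor_asr = attack_success_rate(backdoored_model, stamp_trigger, samples, labels, 7)
```

An ASR near zero on a clean model and near one on a suspect model, for the same trigger and target class, is the kind of signal the detection reference describes as a high trigger attack-success-rate.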

Tools and Resources

Detection Method Reference

Validation Criteria
