ワンクリックで
assessing-vector-and-embedding-weaknesses
Test vector stores for embedding inversion, cross-tenant leakage, and poisoning.
Codex または Claude でインストール この Prompt をコピーして Codex、Claude、または他のアシスタントに貼り付けると、Skill ページを確認してインストールできます。
メニュー
Test vector stores for embedding inversion, cross-tenant leakage, and poisoning.
Codex または Claude でインストール この Prompt をコピーして Codex、Claude、または他のアシスタントに貼り付けると、Skill ページを確認してインストールできます。
Extract DPAPI-protected secrets such as credentials and browser data offline and online.
Take over Active Directory user and computer accounts by writing alternate certificate keys to msDS-KeyCredentialLink (Shadow Credentials) with pyWhisker, Whisker, and Certipy, then authenticate via PKINIT.
Enumerate Entra ID with ROADrecon and acquire and exchange tokens with roadtx.
Run OAuth 2.0 device-code and illicit-consent phishing against Microsoft Entra ID to steal access and refresh tokens, bypass MFA, and pivot across Microsoft 365 services.
Run Microsoft Entra ID tenant reconnaissance, token acquisition and manipulation, and federation backdoor testing with the AADInternals PowerShell toolkit to validate identity-attack resilience.
Find over-permissive RBAC roles and service-account token abuse paths in Kubernetes using kubectl auth can-i, rbac-police, kubectl-who-can, and rakkess during authorized cluster security reviews.
| name | assessing-vector-and-embedding-weaknesses |
| description | Test vector stores for embedding inversion, cross-tenant leakage, and poisoning. |
| domain | cybersecurity |
| subdomain | ai-security |
| tags | ["ai-security","vector-database","embedding-inversion","rag-security","owasp-llm08","multi-tenant-isolation","data-poisoning","retrieval-augmented-generation"] |
| version | 1.0 |
| author | mahipal |
| license | Apache-2.0 |
| nist_csf | ["MEASURE-2.7"] |
| mitre_attack | ["AML.T0024"] |
Authorized use only: These tests interact with vector stores and embedding models in RAG systems you own or are authorized to assess. Embedding inversion and cross-tenant probing against systems you do not control may expose third-party data and is prohibited without authorization.
Retrieval-Augmented Generation (RAG) systems convert documents into embedding vectors stored in a vector database (Pinecone, Qdrant, Weaviate, Chroma, pgvector, FAISS) and retrieve the nearest vectors to ground LLM responses. OWASP LLM08:2025 Vector and Embedding Weaknesses covers the security risks unique to this layer:
The parent technique is AML.T0024 — Exfiltration via ML Inference API: an attacker uses legitimate inference/query access to exfiltrate data (source text via inversion, membership, or model extraction). This skill provides a repeatable assessment of all five weakness classes.
# Vector DB clients + embeddings + similarity tooling
python -m pip install numpy scikit-learn sentence-transformers
python -m pip install qdrant-client chromadb pinecone-client weaviate-client
# (optional) text-embedding inversion research baseline
python -m pip install vec2text
| ID | Tactic | Official Technique Name | Role in this skill |
|---|---|---|---|
| AML.T0024 | ATLAS: Exfiltration | Exfiltration via ML Inference API | Using query/embedding access to exfiltrate source data |
| AML.T0024.000 | ATLAS: Exfiltration | Infer Training Data Membership | Membership-inference probe against the corpus |
| AML.T0024.001 | ATLAS: Exfiltration | Invert ML Model | Embedding-inversion reconstruction of source text |
| AML.T0020 | ATLAS: Resource Development | Poison Training Data | Knowledge-base poisoning of the corpus |
| AML.T0051.001 | ATLAS: Initial Access | LLM Prompt Injection: Indirect | Injection payloads embedded in retrieved chunks |
Document the embedding model + dimensions, the vector store and its tenancy model, the chunking strategy, retrieval top_k and similarity metric (cosine/dot/L2), and any metadata filters applied at query time.
# Example: inspect a Qdrant collection
from qdrant_client import QdrantClient
client = QdrantClient(url="http://localhost:6333")
info = client.get_collection("docs")
print(info.config.params.vectors) # size + distance metric
print(client.count("docs")) # corpus size
Embeddings of similar text are close; an attacker with the embedding endpoint can iteratively reconstruct text whose embedding matches a target vector. Measure how much a nearest-neighbour-in-embedding-space recovers, using cosine similarity between candidate reconstructions and the target.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
model = SentenceTransformer("all-MiniLM-L6-v2")
secret = "Patient John Doe, MRN 553120, diagnosed with hypertension."
target_vec = model.encode([secret])
# Attacker has only target_vec and the embedding endpoint. Hill-climb candidate text.
candidates = [
"Patient name and medical record number with a diagnosis.",
"John Doe medical record hypertension diagnosis",
"Patient John Doe MRN diagnosed hypertension",
]
cand_vecs = model.encode(candidates)
sims = cosine_similarity(target_vec, cand_vecs)[0]
for c, s in sorted(zip(candidates, sims), key=lambda x: -x[1]):
print(f"{s:.3f} {c}")
# High similarity for a near-verbatim guess => inversion risk is real for this model.
For a research-grade reconstruction baseline, vec2text can be used against compatible embedding models to demonstrate full-text recovery.
Determine whether a specific document is in the corpus by measuring the top-1 retrieval similarity for an exact-quote query: in-corpus items return a markedly higher max similarity than out-of-corpus controls.
def membership_score(client, collection, embed, text):
vec = embed([text])[0].tolist()
hits = client.search(collection_name=collection, query_vector=vec, limit=1)
return hits[0].score if hits else 0.0
in_corpus = membership_score(client, "docs", model.encode, "<exact quote from a known chunk>")
control = membership_score(client, "docs", model.encode, "An unrelated random sentence.")
print(f"in-corpus={in_corpus:.3f} control={control:.3f} delta={in_corpus-control:.3f}")
# A large positive delta indicates the item is in the corpus (membership leak).
Confirm that tenant B cannot retrieve tenant A's chunks. Issue tenant-B-authenticated queries that should be filtered, and verify no tenant-A tenant_id appears in results.
# Query as tenant B; expect ONLY tenant_id == "B" results.
from qdrant_client.models import Filter, FieldCondition, MatchValue
vec = model.encode(["confidential salary information"])[0].tolist()
hits = client.search(
collection_name="docs",
query_vector=vec,
limit=10,
query_filter=Filter(must=[FieldCondition(key="tenant_id", match=MatchValue(value="B"))]),
)
leaked = [h for h in hits if h.payload.get("tenant_id") != "B"]
print("CROSS-TENANT LEAK" if leaked else "isolation OK", "->", len(leaked), "foreign rows")
# Critical test: repeat WITHOUT the filter to confirm the server, not the client,
# enforces isolation. If unfiltered queries return tenant A data, isolation is client-side only.
hits_nofilter = client.search(collection_name="docs", query_vector=vec, limit=10)
print("server-side isolation FAILS" if any(h.payload.get("tenant_id") != "B" for h in hits_nofilter) else "OK")
Insert a benign poisoned chunk crafted to be retrieved for many unrelated queries, then measure how often it appears in top_k.
from qdrant_client.models import PointStruct
# Benign marker payload (no real injection) to measure retrieval dominance.
poison = "POISON-CANARY. " + " ".join(
["password reset billing refund account login support error help"] * 8
)
client.upsert("docs_test", points=[
PointStruct(id=999999, vector=model.encode([poison])[0].tolist(),
payload={"tenant_id": "B", "source": "poison-test"})
])
queries = ["how do I get a refund", "reset my password", "what is the weather"]
for q in queries:
hits = client.search("docs_test", model.encode([q])[0].tolist(), limit=5)
dominated = any(h.payload.get("source") == "poison-test" for h in hits)
print(f"{'POISON in top5' if dominated else 'clean'}: {q}")
Scan retrieved chunk text for injection markers before it is concatenated into the prompt.
import re
INJECTION_PATTERNS = [
r"ignore (all|previous|the above) instructions",
r"system prompt", r"you are now", r"disregard", r"</?(system|instructions)>",
]
def chunk_is_injection(text):
low = text.lower()
return [p for p in INJECTION_PATTERNS if re.search(p, low)]
for hit in client.search("docs", model.encode(["help"])[0].tolist(), limit=10):
flags = chunk_is_injection(hit.payload.get("text", ""))
if flags:
print("INDIRECT INJECTION in chunk", hit.id, flags)
defending-llms-with-guardrails).| Tool | Purpose | Primary Source |
|---|---|---|
| OWASP LLM08 | Vector and Embedding Weaknesses guidance | https://genai.owasp.org/llmrisk/llm082025-vector-and-embedding-weaknesses/ |
| sentence-transformers | Embedding generation for testing | https://www.sbert.net/ |
| Qdrant client | Vector store + filtered search | https://qdrant.tech/documentation/ |
| Chroma / Weaviate / Pinecone | Alternative vector stores | https://docs.trychroma.com/ |
| vec2text | Embedding-inversion research baseline | https://github.com/jxmorris12/vec2text |
| MITRE ATLAS | AML.T0024 Exfiltration via ML Inference API | https://atlas.mitre.org/ |