rag-review

Stars: 31

Forks: 19

Last updated: May 25, 2026, 17:18

Source: kumaran-is/claude-code-onboarding (author: kumaran-is)

SKILL.md

name	rag-review
description	Use when reviewing or auditing a RAG pipeline end-to-end against production best practices. Covers all 8 stages — ingestion/chunking, versioning/updates, embedding/indexing, query routing, retrieval/fusion/reranking, generation/abstention, security, and eval/observability. Triggers on phrases like "review RAG pipeline", "audit RAG", "RAG E2E review", "is this RAG production ready", "check RAG implementation", "RAG pre-merge review", "review this RAG code".
allowed-tools	Read, Glob, Grep

RAG E2E Review Checklist

Iron Law

A RAG pipeline ships only when retrieval, generation, security, AND eval all pass. Three out of four is a production incident.

This skill is the single source of truth for "what does a production-grade RAG pipeline look like across all 8 stages." Reviewers cite reference files for detail — they do not restate content here.

How to use this skill

For human users: Invoke /rag-review for an E2E audit. For deeper review, dispatch @rag-implementation-reviewer or @rag-pipeline-reviewer — both auto-load this skill.

For agents consuming this skill: Walk the 8 stages in order. For every check, cite the file:line of the authoritative reference. Don't manufacture findings — if a stage is clean, say so in one line.

Severity rubric

Severity	Meaning	Verdict impact
🔴 Critical	Security incident, data integrity, prompt injection, hallucination on high-risk path	BLOCK
🟠 High	Production correctness, recall ceiling, missing audit log, missing abstention	NEEDS_REVIEW (fix before merge)
🟡 Medium	Quality + ops drift (no eval gate, no per-axis breakdown, missing cache)	NEEDS_REVIEW (recommend)
🟢 Low	Hygiene (missing comments, magic numbers, undocumented alpha)	APPROVE with note

Stage 1 — Ingestion & Chunking

Cross-refs: rag-architect/references/advanced-chunking-guide.md, rag-architect/references/table-chunking-strategy.md

#	Check	Severity if failing	Reference
1.1	Parser matches corpus type — not raw `pdf.extract_text()` for complex PDFs; Unstructured.io or LlamaParse for table-heavy docs	🟠	`advanced-chunking-guide.md §6`
1.2	Document-aware chunking on headings/sections — NOT fixed-size sliding window as the only strategy	🟠	`advanced-chunking-guide.md §1`
1.3	Chunk size calibrated by document type: prose 500–800 tokens, structured/SOP 300–600, tables = whole or row-groups	🟡	`advanced-chunking-guide.md §1` + `rag-operations-guide.md §6`
1.4	10–20% overlap on fixed-size strategies (unless chunking on natural boundaries like paragraphs)	🟡	`advanced-chunking-guide.md §1`
1.5	Stable chunk IDs: `(document_id, document_version, chunk_index)` — never auto-increment IDs	🟠	`advanced-chunking-guide.md §4`
1.6	Required metadata per chunk: `document_id`, `chunk_id`, `tenant_id`, `source_uri`, `section_title`, `page`, `chunk_index`, `embedding_model`, `chunker_version`, `parser_version`, `content_hash`, `created_at`	🟠	`advanced-chunking-guide.md §4` (16-field schema)
1.7	Tables NOT split mid-row; column headers repeated in every row-group chunk	🔴	`table-chunking-strategy.md §3`
1.8	High-stakes tables stored with multiple representations (plain text + Markdown + JSON)	🟡	`table-chunking-strategy.md §4`
1.9	OCR quality gate applied before ingestion: per-character confidence threshold checked, garbled text rejected	🟡	`advanced-chunking-guide.md §6`
1.10	Parent-child relationship stored when narrow questions need surrounding context	🟡	`advanced-chunking-guide.md §1.4`
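Checks 1.5 and 1.7 can be shown together in one sketch. The hypothetical `chunk_table` helper below does three things:

- splits a table into whole-row groups;
- repeats the column header in every chunk;
- derives a deterministic chunk ID from `(document_id, document_version, chunk_index)`.

The pipe-separated rendering, the `rows_per_chunk` default of 20, and the `doc:vN:i` ID format are illustrative assumptions. They are not taken from `table-chunking-strategy.md`.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class TableChunk:
    chunk_id: str
    chunk_index: int
    text: str
    content_hash: str


def stable_chunk_id(document_id: str, document_version: int, chunk_index: int) -> str:
    # Deterministic: the same (document, version, index) always yields the same ID,
    # so re-ingestion upserts in place instead of minting an auto-increment row.
    # The "doc:vN:i" layout is a hypothetical choice.
    return f"{document_id}:v{document_version}:{chunk_index}"


def chunk_table(document_id: str, document_version: int,
                header: list[str], rows: list[list[str]],
                rows_per_chunk: int = 20) -> list[TableChunk]:
    """Split a table into whole-row groups, repeating the header in every chunk."""
    if rows_per_chunk < 1:
        raise ValueError("rows_per_chunk must be >= 1")
    header_line = " | ".join(header)
    chunks = []
    for index, start in enumerate(range(0, len(rows), rows_per_chunk)):
        group = rows[start:start + rows_per_chunk]  # slicing whole rows never splits a row
        body = "\n".join(" | ".join(row) for row in group)
        text = f"{header_line}\n{body}"
        chunks.append(TableChunk(
            chunk_id=stable_chunk_id(document_id, document_version, index),
            chunk_index=index,
            text=text,
            content_hash=hashlib.sha256(text.encode("utf-8")).hexdigest(),
        ))
    return chunks
```

With `rows_per_chunk=2`, a three-row table yields two chunks, `fees:v3:0` and `fees:v3:1`. Both begin with the header line, so a retrieved fragment never loses its column names.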

Stage 2 — Versioning, Updates & Deletion

Cross-refs: rag-architect/references/versioning-and-freshness.md, rag-architect/references/incremental-ingestion-guide.md

#	Check	Severity if failing	Reference
2.1	Two-level hash gate implemented: document hash (fast skip) AND chunk hash (exact diff) — NOT document hash alone	🟠	`incremental-ingestion-guide.md §1`
2.2	Text normalized before hashing (whitespace, page numbers, timestamps stripped) to prevent false re-embeds from formatting noise	🟠	`incremental-ingestion-guide.md §2`
2.3	Stable chunk IDs enable clean upsert — changed chunk updates in place, does not create duplicate vector	🔴	`incremental-ingestion-guide.md §3`
2.4	Soft-delete: removed chunks set `is_active=false`; retrieval filters `WHERE is_active = true`	🔴	`incremental-ingestion-guide.md §4`
2.5	Version fields stored per chunk: `embedding_model`, `chunker_version`, `parser_version`, `normalizer_version`	🟠	`incremental-ingestion-guide.md §5`
2.6	Versioning metadata: `effective_from`, `effective_to`, `superseded_at` — default retrieval filter `superseded_at IS NULL`	🟠	`versioning-and-freshness.md §2–3`
2.7	New document version supersession runs in a single atomic transaction (marks old superseded + inserts new together)	🔴	`versioning-and-freshness.md §5`
2.8	Historical query detection implemented — temporal keywords trigger `effective_from`/`effective_to` filter instead of current-only filter	🟡	`versioning-and-freshness.md §4`
2.9	GDPR/RTBF: hard-delete across all versions (vector index + metadata + caches); audit log entry retained forever even after deletion	🔴	`versioning-and-freshness.md §8`

Stage 3 — Embedding & Indexing

Cross-refs: vector-database skill, vector-database/references/embedding-migration-guide.md, vector-database/references/ann-vs-knn.md

#	Check	Severity if failing	Reference
3.1	Embedding model pinned as a constant or env config — NEVER inline string literal at each call site	🔴	`vector-database/SKILL.md`
3.2	`embedding_model` version stored in chunk metadata so index-time and query-time models can be verified	🔴	`advanced-chunking-guide.md §4`
3.3	Query and index use the SAME distance operator (e.g., pgvector's cosine-distance operator `<=>` matches an index built with `vector_cosine_ops`)	🔴	`vector-database/references/ann-vs-knn.md`
3.4	ANN/HNSW used for production retrieval; KNN/Flat reserved for golden-set eval baseline and scoped high-risk paths	🟠	`vector-database/references/ann-vs-knn.md` Decision rule
3.5	`efSearch` / `nprobe` calibrated against KNN ground truth on golden set — never tuned by feel	🟠	`vector-database/references/ann-vs-knn.md` Tuning
3.6	Null guard: `WHERE embedding IS NOT NULL` (or equivalent) in retrieval query	🟠	`rag-pipeline-reviewer` agent checklist
3.7	Batch embedding — NOT one API call per chunk in a loop	🟡	`rag-pipeline-reviewer` agent checklist
3.8	Embedding model upgrade plan documented (dual-write or shadow index strategy before cutover)	🟡	`vector-database/references/embedding-migration-guide.md`

Stage 4 — Query Understanding & Routing

Cross-refs: rag-architect/references/query-classification-taxonomy.md, rag-architect/references/query-transformation-guide.md

#	Check	Severity if failing	Reference
4.1	Query classification covers the 9 canonical intent classes (`factual_lookup`, `policy_question`, `structured_record_lookup`, `comparison`, `summarization`, `multi_hop`, `troubleshooting`, `personalized`, `unsafe_or_disallowed`)	🟠	`query-classification-taxonomy.md §1`
4.2	Structured record lookups (exact IDs, counts, current account state) route to SQL/API — NOT RAG	🔴	`query-classification-taxonomy.md §4`
4.3	`unsafe_or_disallowed` class refuses before any retrieval takes place; attempt is audit-logged	🔴	`query-classification-taxonomy.md §1`
4.4	Router type matches corpus complexity: rule-based for v1 (< ~30 rules), a classifier once the rule set outgrows that; LLM router only for agentic flows	🟡	`query-classification-taxonomy.md §3`
4.5	Conversational follow-up queries rewritten (pronouns resolved, standalone query) before retrieval	🟠	`query-transformation-guide.md §1`
4.6	HyDE NOT deployed on factual, legal, or contractual query types without A/B measured win on golden set	🟠	`query-transformation-guide.md §5`
4.7	Sub-index routing considered when a corpus segment exceeds ~100k documents	🟡	`query-classification-taxonomy.md §4`
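A rule-based v1 router (check 4.4) that honors checks 4.2 and 4.3 might look like the sketch below. The following are all illustrative assumptions:

- the regex patterns;
- the hypothetical `Route` names;
- the use of Python's `logging` module as the audit sink.

A production router would cover all nine intent classes from the taxonomy.

```python
import logging
import re
from enum import Enum

audit_log = logging.getLogger("rag.audit")


class Route(Enum):
    REFUSE = "refuse"            # unsafe: refuse before any retrieval
    STRUCTURED = "sql_or_api"    # exact IDs, counts, current account state
    RAG = "rag"


# Ordered rules, first match wins; the unsafe check runs first.
# These patterns are hypothetical examples, not a vetted policy.
_RULES: list[tuple[re.Pattern[str], Route]] = [
    (re.compile(r"\b(bypass|disable)\b.*\b(auth|security|filter)s?\b", re.I), Route.REFUSE),
    (re.compile(r"\b(order|ticket|invoice)\s*#?\d+\b", re.I), Route.STRUCTURED),
    (re.compile(r"\bhow many\b|\bmy (current )?balance\b", re.I), Route.STRUCTURED),
]


def route(query: str, request_id: str) -> Route:
    for pattern, target in _RULES:
        if pattern.search(query):
            if target is Route.REFUSE:
                # The refusal happens before retrieval and is audit-logged (check 4.3).
                audit_log.warning("unsafe query refused request_id=%s", request_id)
            return target
    return Route.RAG
```

Keeping the unsafe rule first means no retrieval call is ever made for a disallowed query. Routing record lookups away from RAG keeps exact answers on the system of record.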

Stage 5 — Retrieval, Fusion & Reranking

Cross-refs: rag-architect/SKILL.md (Default Stack), rag-architect/references/rag-operations-guide.md

#	Check	Severity if failing	Reference
5.1	Hybrid retrieval: dense + BM25 — or documented measured justification for skipping sparse	🟠	`rag-architect/SKILL.md` Step 1
5.2	Fusion via RRF (k=60) — NOT ad-hoc weighted score blending without normalization	🟠	`rag-architect/SKILL.md` Step 1
5.3	Pre-filtered ANN (filters in vector DB call) — NEVER post-filter on top-k results	🔴	`rag-security-reviewer/SKILL.md §1`
5.4	`top_k_retrieve` >> `top_k_final`: retrieve 50–80 candidates, rerank down to 5–10 for generation	🟠	`rag-operations-guide.md §6`
5.5	Cross-encoder reranker present on production paths	🟠	`rag-architect/SKILL.md` Step 1
5.6	Environment-aware reranker: local/dev uses lighter model (Gemini Flash); staging/prod uses production-grade (Vertex AI Ranking API)	🟡	`rag-operations-guide.md §5`
5.7	Query embedding cache keys include `tenant_id` — NOT query text alone	🔴	`rag-operations-guide.md §1`
5.8	MMR or diversity filter applied during context packing to prevent near-duplicate chunks in final context	🟡	`context-packing-patterns.md §1`

Stage 6 — Generation Contract & Abstention

Cross-refs: rag-architect/references/context-packing-patterns.md, rag-architect/references/abstention-decision-framework.md, rag-architect/references/rag-response-contract.md

#	Check	Severity if failing	Reference
6.1	Strict grounding prompt: "answer ONLY using the provided context" — no room for LLM to draw on training data	🟠	`context-packing-patterns.md §3`
6.2	Structured answer contract returned (not free-form prose with embedded citations the system can't parse)	🟠	`rag-response-contract.md §1`
6.3	Per-claim citations — every factual sentence in the answer carries a `[chunk_id]` reference	🟠	`context-packing-patterns.md §3`
6.4	`schema_version` emitted on every response	🟠	`rag-response-contract.md §8`
6.5	`status` field is a value from the closed 7-value enum (`answered`/`partial`/`abstained`/`clarification_needed`/`tool_error`/`policy_blocked`/`no_retrieval`)	🔴	`rag-response-contract.md §3`
6.6	`no_retrieval` status is explicitly policy-governed — NOT used as a silent fallback that bypasses RAG	🔴	`rag-response-contract.md §3`
6.7	`confidence_level` is a band (`high`/`medium`/`low`) — NEVER raw float exposed to UI	🟠	`rag-response-contract.md §4`
6.8	Confidence derived from evidence signals (retrieval score, rerank score, citation coverage, freshness), NOT from LLM self-report	🟠	`rag-response-contract.md §4`
6.9	Faithfulness gate implemented: if faithfulness < threshold (~0.7), band clamped to `low` regardless of weighted score	🟠	`rag-response-contract.md §4`
6.10	`safety` block emitted on every response, even when it is empty; an absent block is a contract violation and is NOT equivalent to an empty `{}`	🔴	`rag-response-contract.md §6`
6.11	`answer_spans` character offsets use UTF-16 code units; encoding documented in `schema_version` notes	🟠	`rag-response-contract.md §5`
6.12	`evidence_type` enum (`direct`/`inferred`/`contextual`) present on every citation	🟡	`rag-response-contract.md §5`
6.13	Internal-only fields (`relevance_score`, `retrieval_metadata`, `trace`, `usage`, raw `chunk_id`) stripped before response reaches client	🟠	`rag-response-contract.md §2`
6.14	Abstention enforced at agent/pipeline layer via calibrated threshold checks — NOT relying solely on LLM prompt instruction	🔴	`abstention-decision-framework.md §3`
6.15	All 8 abstention triggers handled: no context, off-topic retrieval, missing clause, conflicting docs, citation unsupported, stale evidence, high-risk domain without grounding, required doc not retrieved	🟠	`abstention-decision-framework.md §3`
6.16	Refusal text uses one of the 4 named templates (missing context / weak context / conflicting context / high-risk) — NOT free LLM improvisation	🟡	`abstention-decision-framework.md §6`
6.17	`clarification_reason` field REQUIRED when `status = clarification_needed`	🟠	`rag-response-contract.md §3`
6.18	Streaming uses 6-event model (`status`/`answer_delta`/`citation`/`warning_flag`/`answer_span`/`done`); `answer_span` emitted only after generation is complete, NOT mid-stream	🟠	`rag-response-contract.md §7`
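Checks 6.5, 6.7 through 6.10, and 6.17 can be enforced in code rather than in the prompt. The sketch below does two things:

- derives a confidence band from evidence signals, with a faithfulness clamp;
- validates the response envelope before it leaves the service.

What comes from the checklist: the seven status values, the ~0.7 faithfulness threshold, and the required fields. What is hypothetical: the signal weights, the band cut-offs (0.75 and 0.5), and the dict-based response shape.

```python
from enum import Enum


class Status(Enum):
    ANSWERED = "answered"
    PARTIAL = "partial"
    ABSTAINED = "abstained"
    CLARIFICATION_NEEDED = "clarification_needed"
    TOOL_ERROR = "tool_error"
    POLICY_BLOCKED = "policy_blocked"
    NO_RETRIEVAL = "no_retrieval"


# Hypothetical weights over evidence signals, each assumed to lie in [0, 1].
WEIGHTS = {"retrieval": 0.3, "rerank": 0.3, "citation_coverage": 0.25, "freshness": 0.15}
FAITHFULNESS_FLOOR = 0.7


def confidence_band(signals: dict[str, float], faithfulness: float) -> str:
    """Map evidence signals (not LLM self-report) to a band; the raw float stays internal."""
    if faithfulness < FAITHFULNESS_FLOOR:
        return "low"  # clamp regardless of the weighted score
    score = sum(weight * signals[name] for name, weight in WEIGHTS.items())
    if score >= 0.75:
        return "high"
    return "medium" if score >= 0.5 else "low"


def validate_response(resp: dict) -> list[str]:
    """Return contract violations; an empty list means the envelope is valid."""
    errors = []
    if "schema_version" not in resp:
        errors.append("missing schema_version")
    if "safety" not in resp:
        errors.append("missing safety block")  # absent is not the same as {}
    try:
        status = Status(resp.get("status"))
    except ValueError:
        errors.append(f"status not in closed enum: {resp.get('status')!r}")
    else:
        if status is Status.CLARIFICATION_NEEDED and not resp.get("clarification_reason"):
            errors.append("clarification_reason required for clarification_needed")
    if resp.get("confidence_level") not in (None, "high", "medium", "low"):
        errors.append("confidence_level must be a band, not a raw value")
    return errors
```

Running `validate_response` at the pipeline boundary turns contract drift into a test failure instead of a silent client-side bug.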

Stage 7 — Security

This stage delegates to rag-security-reviewer/SKILL.md — do NOT duplicate it here. Load that skill in parallel when reviewing security-sensitive RAG code. The five items below are the minimum gate; the dedicated skill has full coverage.

#	Minimum gate (full coverage in rag-security-reviewer)	Severity	Reference
7.1	Pre-filtered ANN at a SINGLE chokepoint — NEVER post-filter on top-k for security enforcement	🔴	`rag-security-reviewer/SKILL.md §1–2`
7.2	Retrieved content treated as untrusted: clear delimiters in prompt, no tool-call execution allowed from retrieved text	🔴	`rag-security-reviewer/SKILL.md §3`
7.3	Per-response audit log: `request_id`, `user_id`, `tenant_id`, `filters_applied`, `chunks_used`, `model`, `prompt_version`, `latency_ms`	🔴	`rag-security-reviewer/SKILL.md §4`
7.4	PII redaction at ingest AND at response (`safety.redactions[]`)	🔴	`rag-security-reviewer/SKILL.md §5` + `rag-response-contract.md §6`
7.5	Right-to-be-forgotten tested end-to-end: vector index + BM25 index + metadata + all caches; audit log entry retained	🔴	`rag-security-reviewer/SKILL.md §6` + `versioning-and-freshness.md §8`

If ANY 🔴 item in Stage 7 fails, the review verdict is BLOCK regardless of other stages.
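Checks 7.1 and 7.3 both point at a single retrieval chokepoint. The minimal sketch below assumes a hypothetical `VectorStore` protocol whose `search` method takes a metadata filter (`where`) that the database applies inside the ANN call. It works as follows:

- Filters are built only from the authenticated request context.
- The top-k results are never post-filtered.
- Every call emits the retrieval-side subset of the check 7.3 audit fields.

The remaining fields (`model`, `prompt_version`) would be added at generation time. The `superseded_at` filter mirrors check 2.6.

```python
import json
import logging
import time
from dataclasses import dataclass
from typing import Any, Protocol

audit_log = logging.getLogger("rag.audit")


@dataclass(frozen=True)
class RequestContext:
    request_id: str
    user_id: str
    tenant_id: str  # taken from the authenticated session, never from the query text


class VectorStore(Protocol):
    # Hypothetical interface: `where` is applied by the DB before ANN scoring.
    def search(self, vector: list[float], k: int, where: dict[str, Any]) -> list[dict]: ...


def secure_retrieve(ctx: RequestContext, store: VectorStore,
                    query_vector: list[float], k: int) -> list[dict]:
    """The only function allowed to call store.search; filters go inside the DB call."""
    filters = {"tenant_id": ctx.tenant_id, "is_active": True, "superseded_at": None}
    started = time.perf_counter()
    hits = store.search(query_vector, k=k, where=filters)  # pre-filtered ANN, no post-filter
    audit_log.info(json.dumps({
        "request_id": ctx.request_id,
        "user_id": ctx.user_id,
        "tenant_id": ctx.tenant_id,
        "filters_applied": filters,
        "chunks_used": [hit["chunk_id"] for hit in hits],
        "latency_ms": round((time.perf_counter() - started) * 1000, 1),
    }))
    return hits
```

Routing every search through one function makes the tenant filter auditable in a single place. A lint rule or code-review check can then flag any other direct `search` call.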

Stage 8 — Evaluation & Observability

Cross-refs: rag-evaluator/SKILL.md, rag-architect/references/rag-operations-guide.md

#	Check	Severity if failing	Reference
8.1	Golden set exists with ≥ 50 labeled queries — or has a concrete plan with date	🔴	`rag-evaluator/SKILL.md` Golden set design
8.2	Coverage targets met: ~60% happy path, ~20% edge cases, ~20% known unanswerable	🟠	`rag-evaluator/SKILL.md` Coverage targets
8.3	CI gate configured: fails build on Recall@K, Faithfulness, or Citation Quality regression beyond tolerance	🔴	`rag-evaluator/SKILL.md` CI gate pattern
8.4	Abstention metrics tracked as two separate rates — False Abstention Rate + False Answer Rate — NOT collapsed into one metric	🟠	`rag-evaluator/SKILL.md` Answer metrics
8.5	Eval breakdown by dimension when corpus grows: doc type, recency, query style, query length, corpus segment	🟡	`rag-evaluator/SKILL.md` Eval Breakdown by Dimension
8.6	LLM-as-judge calibrated against human labels (≥ 80% agreement) before use at scale	🟠	`rag-evaluator/SKILL.md` LLM-as-judge dangers
8.7	All 5 production dashboards wired: Health (p95 latency), Quality (faithfulness, abstention rate), Cost (per-query cost, cache hit rate), Ingestion (queue depth, failed ingestions), Security (cross-tenant attempts, audit log gaps)	🟠	`rag-operations-guide.md §3`
8.8	Async eval pattern: sample 1–10% of production traffic via trace store — NOT inline in sync response path	🟡	`rag-response-contract.md §1`
8.9	Corpus hygiene checks run after bulk ingestion: near-duplicate detection (`content_hash` dedup), stale/superseded doc sweep, authority tagging for canonical versions	🟡	`rag-operations-guide.md §9`
8.10	Post-ingestion smoke test runs after every ingestion that changed ≥ 1 chunk; ingestion trace logged	🟡	`incremental-ingestion-guide.md §7`
8.11	p95 latency target defined and monitored per stage: query embedding ≤ 50ms, ANN retrieval ≤ 200ms, reranker ≤ 250ms, LLM first token ≤ 500ms, total ≤ 2s	🟡	`rag-operations-guide.md §4`
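Checks 8.3 and 8.4 reduce to a few lines of arithmetic. The sketch below rests on these assumptions:

- False Abstention Rate is computed over answerable queries, and False Answer Rate over unanswerable ones. This is a common but assumed choice of denominators.
- The CI gate fails when any tracked metric drops more than a tolerance below its stored baseline.
- The metric names, the `EvalResult` record shape, and the 0.02 tolerance are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    answerable: bool   # golden-set label
    abstained: bool    # what the pipeline actually did


def abstention_rates(results: list[EvalResult]) -> dict[str, float]:
    """Report the two failure modes separately instead of one blended score."""
    answerable = [r for r in results if r.answerable]
    unanswerable = [r for r in results if not r.answerable]
    false_abstention = (sum(r.abstained for r in answerable) / len(answerable)
                        if answerable else 0.0)
    false_answer = (sum(not r.abstained for r in unanswerable) / len(unanswerable)
                    if unanswerable else 0.0)
    return {"false_abstention_rate": false_abstention, "false_answer_rate": false_answer}


def ci_gate(baseline: dict[str, float], current: dict[str, float],
            tolerance: float = 0.02) -> list[str]:
    """Return regressions (higher is better for every gated metric); non-empty fails the build."""
    failures = []
    for metric, base in baseline.items():
        value = current.get(metric)
        if value is None:
            failures.append(f"{metric}: missing from current run")
        elif value < base - tolerance:
            failures.append(f"{metric}: {value:.3f} < baseline {base:.3f} - {tolerance}")
    return failures
```

A CI job would run the golden set, call `ci_gate` with the stored baseline, and exit non-zero whenever the returned list is non-empty.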

Verdict format

Output verdict as the final line of the review report:

VERDICT: [APPROVE | NEEDS_REVIEW | BLOCK] — CRITICAL: N | HIGH: N | MEDIUM: N | LOW: N

Rules:

BLOCK — any 🔴 Critical finding. Must fix before merge.
NEEDS_REVIEW — zero Critical, but ≥ 1 High or more than 3 Medium. Recommend fix.
APPROVE — zero Critical, zero High, ≤ 3 Medium. Ship with notes on Lows.
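The verdict rules map directly onto severity counts. In the minimal sketch below, `verdict_line` is a hypothetical helper. It treats more than three Medium findings as NEEDS_REVIEW, the complement of the APPROVE rule (≤ 3 Medium). Any Critical finding, including a failed 🔴 item in Stage 7, produces BLOCK.

```python
from collections import Counter

SEVERITIES = ("CRITICAL", "HIGH", "MEDIUM", "LOW")


def verdict_line(findings: list[str]) -> str:
    """Build the final VERDICT line from a list of finding severities."""
    counts = Counter(findings)
    unknown = set(counts) - set(SEVERITIES)
    if unknown:
        raise ValueError(f"unknown severities: {sorted(unknown)}")
    if counts["CRITICAL"]:
        verdict = "BLOCK"
    elif counts["HIGH"] or counts["MEDIUM"] > 3:
        verdict = "NEEDS_REVIEW"
    else:
        verdict = "APPROVE"
    tally = " | ".join(f"{s}: {counts[s]}" for s in SEVERITIES)
    # \u2014 reproduces the separator used in the documented verdict format.
    return f"VERDICT: {verdict} \u2014 {tally}"
```

Because the line is generated from counts, the tally and the verdict cannot disagree.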

When NOT to use this skill

Situation	Use instead
Pure security review only	`rag-security-reviewer/SKILL.md` directly (more depth)
Designing a NEW RAG pipeline	`rag-architect/SKILL.md`
Diagnosing one failed production query	`rag-debugger/SKILL.md` 9-layer chain via `/rag-debug`
Running the eval suite to detect regressions	Dispatch `@rag-eval-runner` agent
Debugging why corpus-scale accuracy dropped	`rag-evaluator/SKILL.md` Eval Breakdown by Dimension

References table

Stage	Authoritative references
1 — Ingestion & chunking	`rag-architect/references/advanced-chunking-guide.md`, `rag-architect/references/table-chunking-strategy.md`
2 — Versioning / updates / deletion	`rag-architect/references/versioning-and-freshness.md`, `rag-architect/references/incremental-ingestion-guide.md`
3 — Embedding & indexing	`vector-database/SKILL.md`, `vector-database/references/ann-vs-knn.md`, `vector-database/references/embedding-migration-guide.md`
4 — Query understanding & routing	`rag-architect/references/query-classification-taxonomy.md`, `rag-architect/references/query-transformation-guide.md`
5 — Retrieval, fusion & reranking	`rag-architect/SKILL.md` (Default Stack), `rag-architect/references/rag-operations-guide.md`
6 — Generation contract & abstention	`rag-architect/references/context-packing-patterns.md`, `rag-architect/references/abstention-decision-framework.md`, `rag-architect/references/rag-response-contract.md`
7 — Security	`rag-security-reviewer/SKILL.md` (dedicated)
8 — Eval & observability	`rag-evaluator/SKILL.md`, `rag-architect/references/rag-operations-guide.md`

Maintenance note

When a new reference file is added to any sibling skill (rag-architect, rag-evaluator, rag-security-reviewer, vector-database, rag-debugger), update the corresponding stage table here in the same PR. This skill is the single point of update; the consuming agents (rag-implementation-reviewer, rag-pipeline-reviewer) and the /rag-review command inherit changes automatically through their `skills:` frontmatter field.