
rag-architect


Updated: May 25, 2026, 17:23

Use when designing or modifying a RAG (Retrieval-Augmented Generation) pipeline — picking chunking strategy, embedding models, retrieval methods, reranking, fusion (RRF), abstention, multi-tenancy patterns, or evaluating whether RAG is the right tool at all. Triggers on phrases like "design a RAG system", "build retrieval", "RAG architecture", "should I use RAG", "chunking strategy", "hybrid retrieval", "reranker choice", "vector database", "embedding model selection".


Source: kumaran-is/claude-code-onboarding




RAG Architect

You are helping design a production RAG system. Apply these decision frameworks. Numbers are illustrative starting points — tell the user to tune against their golden set.

Step 0: Is RAG even the right tool?

Before designing retrieval, confirm the user needs RAG. Use this decision table:

User need	Best mechanism	Why
Policy / procedure explanation	RAG	Source-grounded text
Order status, account balance, inventory	SQL / API	Exact transactional state
Vendor availability, fresh data	API / DB	Real-time state
Compare policies	RAG + query decomposition	Multiple unstructured sources
Calculate / sum / convert	Tool / function call	Deterministic math
Book / order / take action	Workflow / tool call	Action, not retrieval
Personalized account question	SQL + RAG	Facts from DB, framing from docs

Do NOT use RAG for: exact transactional data, deterministic business rules, sub-second-fresh data, calculations, a single source of truth that already lives in a database, or cases where no reliable source documents exist.

Most real systems are hybrid: SQL/API for facts, RAG for explanation. The LLM (with tool use) orchestrates.
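The decision table above can be sketched as a lookup from a classified need to one or more mechanisms. This is a minimal illustration only: the need labels, the `Mechanism` enum, and `route` are hypothetical names, and classifying the user's request into one of these labels is assumed to happen elsewhere.

```python
from enum import Enum
from typing import Dict, List


class Mechanism(Enum):
    RAG = "rag"
    SQL_API = "sql_api"
    TOOL = "tool"
    WORKFLOW = "workflow"


# Hypothetical need labels, one per row of the decision table.
ROUTES: Dict[str, List[Mechanism]] = {
    "policy_explanation": [Mechanism.RAG],
    "transactional_state": [Mechanism.SQL_API],   # order status, balance, inventory
    "fresh_data": [Mechanism.SQL_API],            # vendor availability
    "compare_policies": [Mechanism.RAG],          # combined with query decomposition
    "calculation": [Mechanism.TOOL],
    "take_action": [Mechanism.WORKFLOW],
    "personalized_account": [Mechanism.SQL_API, Mechanism.RAG],  # facts, then framing
}


def route(need: str) -> List[Mechanism]:
    """Return the mechanisms for a classified need; fail loudly on unknown labels."""
    if need not in ROUTES:
        raise ValueError(f"unclassified need: {need!r}")
    return ROUTES[need]
```

For a hybrid case, `route("personalized_account")` returns the database lookup first and RAG second, matching the "facts from DB, framing from docs" row.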

Step 1: The Default Stack

If the user wants a recommended starting point, this is it. Reasons and alternatives are in the playbook.

Layer	Default
Parsing	Unstructured.io for mixed formats; LlamaParse or Marker for table-heavy PDFs
Chunking	Document-aware (Markdown/PDF headers) + parent-child
Chunk size	500–1000 tokens, 10–20% overlap, sized to the embedding model's input limit
Embeddings	Dense general-purpose (text-embedding-3-large, BGE-M3)
Sparse retrieval	BM25 — include by default for text corpora
Retrieval	Hybrid: dense top-50 + BM25 top-50 → RRF (k=60)
Filtering	Pre-filtered ANN at a single chokepoint (never post-filter for security)
Reranking	Cross-encoder on top 50–100 → keep top 5–10 (local/dev: use smaller cross-encoder or skip; staging/prod: use full cross-encoder or Vertex AI Ranking API)
Abstention	Calibrated reranker top-1 threshold
Generation	Strict grounding prompt; structured output with citations
Ingestion updates	Content-hash diff; versioned chunk IDs
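The Retrieval row fuses the dense and BM25 result lists with Reciprocal Rank Fusion. A minimal sketch follows, assuming each retriever returns an ordered list of document IDs; the function name `rrf_fuse` and the tie-breaking by ID are illustrative choices, not something the playbook specifies.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60, top_n: int = 50) -> List[str]:
    """Fuse ranked ID lists: each list adds 1 / (k + rank) to a document's score."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]


dense = ["a", "b", "c"]    # hypothetical dense top-k
sparse = ["b", "d", "a"]   # hypothetical BM25 top-k
fused = rrf_fuse([dense, sparse])
print(fused)  # ['b', 'a', 'd', 'c']
```

RRF uses only ranks, so the dense and BM25 scores never need to be normalized onto a common scale. The fused list is what would be handed to the reranker.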

Step 2: Chunking decision tree

Q1: Embedding model input limit ≤512 tokens?
  YES → chunk: 256–400 tokens, overlap 40–80 → go to Q3
  NO  → continue Q2

Q2: Documents primarily short (FAQs, tickets)?
  YES → keep natural unit, overlap 0, skip parent-child
  NO  → chunk: 500–1000 tokens, overlap 50–150 → continue Q3

Q3: Narrow questions need surrounding context?
  YES → PARENT-CHILD: children 200–400 tok, parents 1500–3000 tok
  NO  → flat chunks; consider neighbor expansion at retrieval

Q4: Chunks lose meaning standalone?
  YES → CONTEXTUAL RETRIEVAL (LLM prepends 1–2 sentence context)
  NO  → skip

Q5: Topic shifts without headings?
  YES → add SEMANTIC chunking
  NO  → document-aware on headings is enough
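
The tree above can be encoded as a small function so the choices are explicit and testable. This is a sketch under assumptions: `CorpusProfile`, `ChunkPlan`, and `plan_chunking` are hypothetical names, the token ranges are copied from the tree, and Q4 and Q5 are assumed to apply on every path (the tree does not say whether the short-document branch skips them).

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Range = Tuple[int, int]  # (min_tokens, max_tokens)


@dataclass
class CorpusProfile:
    model_input_limit: int
    mostly_short_docs: bool
    needs_surrounding_context: bool
    chunks_lose_meaning_alone: bool
    topic_shifts_without_headings: bool


@dataclass
class ChunkPlan:
    chunk_tokens: Optional[Range]  # None means "keep the natural unit"
    overlap_tokens: Range
    parent_child: bool = False
    contextual_retrieval: bool = False
    semantic_chunking: bool = False


def plan_chunking(c: CorpusProfile) -> ChunkPlan:
    if c.model_input_limit <= 512:          # Q1: small-context embedding model
        plan, ask_q3 = ChunkPlan((256, 400), (40, 80)), True
    elif c.mostly_short_docs:               # Q2: FAQs, tickets
        plan, ask_q3 = ChunkPlan(None, (0, 0)), False
    else:
        plan, ask_q3 = ChunkPlan((500, 1000), (50, 150)), True
    if ask_q3 and c.needs_surrounding_context:
        plan.parent_child = True            # Q3: children 200-400, parents 1500-3000
    plan.contextual_retrieval = c.chunks_lose_meaning_alone    # Q4
    plan.semantic_chunking = c.topic_shifts_without_headings   # Q5
    return plan
```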

Step 3: Embedding selection

Need	Choice
General docs	Dense (text-embedding-3-large, BGE-M3)
Specialized vocab (medical/legal)	Test domain-specific against golden set first
Multilingual	BGE-M3, multilingual-e5
Code	Voyage Code, Jina Code
Images / diagrams	CLIP, SigLIP
Cost-sensitive / scale	Matryoshka embeddings — truncate to 128–256 dim for first stage
Hit quality ceiling	ColBERT (late-interaction) — operationally heavier
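
For the Matryoshka row, truncation means keeping the leading dimensions of a vector and re-normalizing it before cosine or dot-product search. The sketch below uses plain Python lists for clarity; `truncate_embedding` is a hypothetical helper, and the approach is only meaningful for models trained so that a prefix of the vector is itself a usable embedding.

```python
import math
from typing import List, Sequence


def truncate_embedding(vec: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` components and rescale the result to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError(f"dim must be in 1..{len(vec)}, got {dim}")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


full = [3.0, 4.0, 12.0]            # toy 3-dimensional "embedding"
short = truncate_embedding(full, 2)
print(short)                        # first two components, rescaled to unit length
```

A common pattern is to search the truncated vectors first and rescore the shortlist with the full-length vectors, which is why the table limits truncation to the first stage.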

Step 4: Multi-tenancy

Pattern	When
Shared index + tenant filter (chokepoint)	SMB SaaS, many small tenants
Namespaced collections	Mid-size tenants, balance of isolation and cost
Per-tenant indexes	Regulated industries, data residency, large tenants

Always: pre-filtered ANN, enforced at a single chokepoint. Never rely on the LLM for access control.
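
The chokepoint idea can be illustrated with an in-memory retriever whose only search method requires a tenant ID and narrows the candidate set before ranking. This is a sketch: the brute-force scan stands in for a real ANN index (where the filter would be applied during graph traversal), and `Chunk` and `TenantScopedRetriever` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    tenant_id: str
    vector: Tuple[float, ...]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class TenantScopedRetriever:
    """Single chokepoint: there is no search path that skips the tenant filter."""

    def __init__(self, chunks: Iterable[Chunk]) -> None:
        self._chunks = list(chunks)

    def search(self, tenant_id: str, query_vec: Sequence[float], top_k: int = 5) -> List[Chunk]:
        if not tenant_id:
            raise PermissionError("tenant_id is required for every search")
        # Pre-filter: restrict candidates BEFORE ranking, so another tenant's
        # chunks can never occupy (or leak through) a top-k slot.
        candidates = [c for c in self._chunks if c.tenant_id == tenant_id]
        ranked = sorted(candidates, key=lambda c: dot(c.vector, query_vec), reverse=True)
        return ranked[:top_k]
```

With post-filtering, a query whose nearest neighbors all belong to other tenants would return too few results or none. Filtering first keeps top-k full and keeps other tenants' data out of the ranking step entirely.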

Step 5: Hard rules (non-negotiable)

Pre-filter, don't post-filter. Apply security filters during ANN traversal, not after top-k.
Include both dense and sparse retrieval for text corpora (exceptions: pure image search, tiny FAQ corpora, structured-only data).
Stable, versioned chunk IDs: (document_id, document_version, chunk_index).
Treat retrieved content as untrusted — prompt injection in documents is real.
Build a starter golden set before tuning — 50 labeled queries minimum.
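
The versioned chunk ID rule, combined with the content-hash diff from the Default Stack, might look like the sketch below. The ID string format, the integer version, the whitespace normalization, and the helper names are all assumptions made for illustration; the playbook's own ingestion guide may define them differently.

```python
import hashlib
from typing import Dict, List, NamedTuple


class ChunkId(NamedTuple):
    document_id: str
    document_version: int
    chunk_index: int

    def __str__(self) -> str:
        return f"{self.document_id}:v{self.document_version}:{self.chunk_index}"


def content_hash(text: str) -> str:
    """Hash after collapsing whitespace so cosmetic edits do not force a re-embed."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def chunks_to_reembed(old_hashes: Dict[int, str], new_chunks: List[str]) -> List[int]:
    """Return indexes of chunks whose content is new or changed since the last ingest."""
    return [
        i for i, text in enumerate(new_chunks)
        if old_hashes.get(i) != content_hash(text)
    ]


previous = {0: content_hash("Refunds take 5 days."), 1: content_hash("Contact support.")}
current = ["Refunds  take 5 days.", "Contact billing support.", "New section."]
print(chunks_to_reembed(previous, current))  # [1, 2]
print(ChunkId("policy-42", 3, 1))            # policy-42:v3:1
```

Chunk 0 differs only in whitespace, so it is skipped. Only the changed and newly added chunks would be re-embedded and written under the new document version.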

What to defer to v2

ColBERT, SPLADE, Matryoshka truncation, Graph RAG, agentic RAG, HyDE, multi-query, query decomposition, LLM-as-judge eval, contextual retrieval, agentic chunking, Self-RAG, CRAG, Learning-to-Rank, multimodal retrieval.

Ship the default stack, run a month, identify your real top three failure modes, then pick v2 techniques that address them.

When to recommend Google ADK / agentic patterns

ADK agentic patterns fit when:

Multi-step workflows requiring tool use + retrieval planning
Query decomposition across multiple data sources (RAG + SQL + API)
Complex orchestration where the LLM decides when/where to retrieve

Skip agentic patterns when sub-second latency is required or the query is single-shot.

How to apply

Ask what they're building and the corpus characteristics.
Check Step 0 — is RAG the right tool? If part of the answer is structured, recommend hybrid (RAG + SQL/API/tool).
Walk through the Default Stack; deviate only when the user has a measured reason.
Apply the chunking decision tree.
End with the five hard rules and what to defer.

Reference the full playbook at docs/production-rag-playbook.md for details. Cite section numbers when relevant (§19.5 for RAG-vs-tools, §10.2 for chunking tree, §17 for multi-tenancy, §40 for failure debugging).

Reference Files

File	Contents	When to Load
`references/advanced-chunking-guide.md`	§9 chunking techniques (late chunking, contextual retrieval, quality scoring, adaptive by type), §6 OCR quality gates, §12 full metadata schema (16 fields), §5.3 ingestion update strategies	Designing chunking strategy, debugging parser quality, setting up metadata schema
`references/query-classification-taxonomy.md`	§18.1 nine query intent classes, §19 routing decision table (rule-based → classifier → LLM router), §19.5 when not to use RAG	Designing query understanding layer, building a query router
`references/context-packing-patterns.md`	§31 eight packing techniques (MMR, parent expansion, neighbor, token budget, recency, authority), §32.1 structured answer contract JSON schema, §30 abstention calibration	Building the generation stage, designing the answer contract, tuning abstention
`references/rag-operations-guide.md`	§36.3 six cache types with tenant-scoped key requirements, §36.1 cost hierarchy percentages, §36.4 five observability dashboards with alert thresholds, §36.2 optimization techniques	Operating production RAG, setting up monitoring, cost optimization
`references/query-transformation-guide.md`	Query rewriting (conversational context), expansion (domain synonyms), multi-query retrieval, splitting/decomposition, HyDE decision tree with risk rules	Designing v2 query understanding layer; fixing measured recall gaps by query type
`references/table-chunking-strategy.md`	Why text chunking destroys tables; row-group chunking with header repeat; multiple representations (plain, Markdown, JSON); metadata schema for tables; parser selection	Chunking tabular data (fee schedules, comparison tables, SLA grids, structured docs)
`references/versioning-and-freshness.md`	Versioning model (effective_from/to, superseded_at); default current-only filter; historical query detection; atomic version ingestion transaction; GDPR deletion across versions	Corpora with evolving content — policies, contracts, pricing, regulations
`references/abstention-decision-framework.md`	Business risk table, behavioral taxonomy (hallucination vs abstention vs clarification vs grounded answer), 8 abstention triggers, decision matrix (context quality × risk → action), 4 message templates, accuracy grading (Good/Bad/Dangerous), false abstention vs false answer priority	Designing abstention behavior, reviewing generation layer, auditing why system over-refuses or under-refuses
`references/incremental-ingestion-guide.md`	Two-level hash gate (doc hash fast-skip + chunk hash exact diff), normalize-before-hash implementation, chunk-level diff algorithm with full Python, hard/soft delete decision table, `is_active` retrieval filter rule, version fields that force full re-embed, 7-row failure modes table, post-ingestion smoke test pattern	Implementing or debugging the document ingestion pipeline; deciding when to re-embed vs skip
`references/rag-response-contract.md`	Three-tier payload separation (user-facing / internal trace / async eval), 7-value status enum (including `no_retrieval` trap), confidence derivation formula with two-track split (retrieval vs answer), faithfulness hard gate, 6 warning flags, span-level grounding with UTF-16 offsets, evidence type enum, quote_policy enum, safety/redaction block (required even when empty), SSE/NDJSON streaming event model, schema versioning rules, domain extensions pattern	Designing or reviewing the RAG API response contract; deciding what fields go to the browser vs logs