---
name: implementing-rag
description: Set up RAG pipelines with document chunking, embedding generation, and retrieval strategies using LlamaIndex. Use when building new RAG systems, choosing chunking approaches, selecting embedding models, or implementing vector/hybrid retrieval for src/ or src-iLand/ pipelines.
---
# Implementing RAG

Quick-start guide for building production-grade RAG pipelines with LlamaIndex. This skill helps you set up the foundational components: document processing, embedding generation, and retrieval.
## Quick Start: Basic RAG Pipeline

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.openai import OpenAIEmbedding

# Configure chunking
Settings.chunk_size = 1024
Settings.chunk_overlap = 20
# Configure embeddings
Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    embed_batch_size=100,
)
# Load and index documents
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
# Create query engine
query_engine = index.as_query_engine(similarity_top_k=5)
response = query_engine.query("Your question here")
```
## Multilingual RAG (Thai)

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.cohere import CohereEmbedding

# Use multilingual embeddings
embed_model = CohereEmbedding(
    model_name="embed-multilingual-v3.0",
    api_key="YOUR_COHERE_API_KEY",
)
# Configure chunking for Thai text
node_parser = SentenceSplitter(
    chunk_size=1024,
    chunk_overlap=50,
)
# Load Thai documents
documents = SimpleDirectoryReader("./data").load_data()
# Create index with multilingual support
index = VectorStoreIndex.from_documents(
    documents,
    embed_model=embed_model,
    transformations=[node_parser],
)
query_engine = index.as_query_engine(similarity_top_k=5)
```
## Hybrid Retrieval (Vector + BM25)

```python
import Stemmer  # from the PyStemmer package

from llama_index.core import VectorStoreIndex
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import QueryFusionRetriever
from llama_index.retrievers.bm25 import BM25Retriever  # pip install llama-index-retrievers-bm25

# Create vector index
index = VectorStoreIndex.from_documents(documents)
vector_retriever = index.as_retriever(similarity_top_k=10)
# Create BM25 retriever
bm25_retriever = BM25Retriever.from_defaults(
    docstore=index.docstore,
    similarity_top_k=10,
    stemmer=Stemmer.Stemmer("english"),
)
# Combine with query fusion
hybrid_retriever = QueryFusionRetriever(
    [vector_retriever, bm25_retriever],
    similarity_top_k=5,
    mode="reciprocal_rerank",
    use_async=True,
)
# Use in query engine
query_engine = RetrieverQueryEngine.from_args(hybrid_retriever)
```
## Local Embeddings (HuggingFace + ONNX)

```python
from llama_index.core import Settings, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Use local model with ONNX acceleration (3-7x faster)
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-base-en-v1.5",
    backend="onnx",  # Requires: pip install optimum[onnxruntime]
)
# Rest of the pipeline remains the same
index = VectorStoreIndex.from_documents(documents)
```
## Codebase Pipelines

### src/ Pipeline

- Document Preprocessing (`src/02_prep_doc_for_embedding.py`)
- Batch Embeddings (`src/09_enhanced_batch_embeddings.py`): set `embed_batch_size` to 100 for faster processing (sketch after this list)
- Basic Retrieval (`src/10_basic_query_engine.py`): tune `similarity_top_k` based on chunk size

### src-iLand/ Pipeline (Thai Land Deeds)

- Data Processing (`src-iLand/data_processing/`): reduce `top_k` for better precision
- Embeddings (`src-iLand/docs_embedding/`)
- Retrieval (`src-iLand/retrieval/retrievers/`)
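
The batch-embedding tip above comes down to batching API calls. A minimal sketch of the idea, assuming OpenAI embeddings; the text list and print are illustrative, not the contents of `src/09_enhanced_batch_embeddings.py`:

```python
from llama_index.embeddings.openai import OpenAIEmbedding

# Illustrative only; the real script is src/09_enhanced_batch_embeddings.py.
embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    embed_batch_size=100,  # 100 texts per API request
)

texts = ["chunk one", "chunk two", "chunk three"]  # stand-ins for node texts
# get_text_embedding_batch() embeds the whole list in batched requests
vectors = embed_model.get_text_embedding_batch(texts, show_progress=True)
print(len(vectors), len(vectors[0]))  # 3 vectors, model-dependent dimension
```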
## Reference Files

Load these reference files when you need comprehensive details:

- `reference-chunking.md`: Complete chunking strategies guide
- `reference-embeddings.md`: Embedding model selection and optimization
- `reference-retrieval-basics.md`: Core retrieval patterns
## Workflows

### Workflow 1: Build a Basic RAG System

**Step 1:** Choose a chunking strategy based on document type (see `reference-chunking.md` for detailed guidance)

**Step 2:** Select an embedding model (see `reference-embeddings.md` for model comparison)

**Step 3:** Implement basic vector retrieval

**Step 4:** Optimize chunk size and `top_k` (sweep sketch below)

**Step 5:** (Optional) Upgrade to hybrid search
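
Step 4 is an empirical sweep rather than a formula. A minimal sketch, assuming `documents` is already loaded and a default embedding model is configured (e.g. the OpenAI setup from the Quick Start); the grid values and test query are placeholders:

```python
from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

# Grid over chunk sizes and top_k; inspect scores (or use the
# evaluating-rag skill) to pick the best combination.
for chunk_size in (512, 1024, 2048):
    splitter = SentenceSplitter(chunk_size=chunk_size, chunk_overlap=50)
    index = VectorStoreIndex.from_documents(documents, transformations=[splitter])
    for top_k in (3, 5, 10):
        retriever = index.as_retriever(similarity_top_k=top_k)
        nodes = retriever.retrieve("Your representative question here")
        print(chunk_size, top_k, [round(n.score or 0.0, 3) for n in nodes])
```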
### Workflow 2: Switch to Local Embeddings

**Step 1:** Choose a local model (see `reference-embeddings.md`)

**Step 2:** Install the optimization backend:

```bash
pip install optimum[onnxruntime]
```

**Step 3:** Update the embedding configuration:

```python
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-base-en-v1.5",
    backend="onnx",
)
```

**Step 4:** Re-index all documents

**Step 5:** Validate retrieval quality (smoke-test sketch below)
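
Steps 4 and 5 together, as a smoke test. A minimal sketch, assuming the corpus lives in `./data` (as in the examples above); `smoke_queries` is a hypothetical stand-in for your own known-answer test set:

```python
from llama_index.core import SimpleDirectoryReader, Settings, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-base-en-v1.5",
    backend="onnx",
)

# Re-index: vectors from different models live in different spaces,
# so every document must be re-embedded under the new model.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Validate: known queries should still surface the expected sources.
smoke_queries = ["Your known-answer question 1", "Your known-answer question 2"]
retriever = index.as_retriever(similarity_top_k=5)
for q in smoke_queries:
    nodes = retriever.retrieve(q)
    print(q, "->", [n.node.metadata.get("file_name") for n in nodes])
```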
### Workflow 3: Thai / Multilingual RAG

**Step 1:** Use a multilingual embedding model:

```python
embed_model = CohereEmbedding(
    model_name="embed-multilingual-v3.0"
)
```

**Step 2:** Configure appropriate chunking

**Step 3:** Implement hybrid search

**Step 4:** Add metadata filtering (sketch after this list)

**Step 5:** Test with Thai queries
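
For Step 4, LlamaIndex retrievers accept metadata filters at query time. A minimal sketch reusing the `index` built in Step 1's pipeline; the `province` key is a hypothetical land-deed field, not a schema this repo defines:

```python
from llama_index.core.vector_stores import (
    FilterOperator,
    MetadataFilter,
    MetadataFilters,
)

# Only consider chunks whose metadata matches the filter before ranking.
filters = MetadataFilters(
    filters=[
        MetadataFilter(key="province", value="Bangkok", operator=FilterOperator.EQ),
    ]
)
retriever = index.as_retriever(similarity_top_k=5, filters=filters)
nodes = retriever.retrieve("Thai-language question about a land deed")
```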
## Guidelines

**Critical Requirements:**

- Set `similarity_top_k` explicitly on every retriever

**Best Practices:**

- Load documents in parallel (`num_workers=10`) for a 13x speedup

**Performance Tips:**

- Use `embed_batch_size=100` for API calls
- `SimpleDirectoryReader(...).load_data(num_workers=10)` (combined sketch below)
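
A minimal sketch combining both tips; `./data` and the model name come from the examples above, the rest is assumed boilerplate:

```python
from llama_index.core import SimpleDirectoryReader, Settings
from llama_index.embeddings.openai import OpenAIEmbedding

# Parallel file loading: load_data() fans out across worker processes.
documents = SimpleDirectoryReader("./data").load_data(num_workers=10)

# Batched embedding requests: fewer API round trips per index build.
Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    embed_batch_size=100,
)
```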
## Next Steps

After implementing basic RAG:

- Use the `optimizing-rag` skill for reranking, caching, and production deployment
- Use the `evaluating-rag` skill to measure hit rate, MRR, and compare strategies