// Index local folders and query them using RAG (Retrieval Augmented Generation). Supports PDF, DOCX, PPTX, XLSX, images with OCR, and text files.
| Field | Value |
|---|---|
| name | local-rag |
| description | Index local folders and query them using RAG (Retrieval Augmented Generation). Supports PDF, DOCX, PPTX, XLSX, images with OCR, and text files. |
Index and semantically search your local documents using embeddings and ChromaDB.
| Command | Action |
|---|---|
| `update rag from [path]` | Index files in the specified folder |
| `query rag [question]` | Search the indexed documents |
| `search documents [query]` | Alternative search command |
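
The assistant normally recognizes these phrases itself, but the table's phrase-to-action mapping can be sketched as a tiny router. Everything here is hypothetical: the regular expressions, the `route` function, and the assumption that `search documents` is handled by the same `query_rag` tool as `query rag`.

```python
import re
from typing import Optional

# Hypothetical patterns for the trigger phrases in the command table.
COMMANDS = [
    (re.compile(r"^update rag from\s+(?P<arg>.+)$", re.IGNORECASE), "update_rag"),
    (re.compile(r"^query rag\s+(?P<arg>.+)$", re.IGNORECASE), "query_rag"),
    # Assumption: the alternative search phrase maps to the same tool.
    (re.compile(r"^search documents\s+(?P<arg>.+)$", re.IGNORECASE), "query_rag"),
]


def route(message: str) -> Optional[tuple[str, str]]:
    """Return (tool_name, argument) for a recognized command, or None."""
    text = message.strip()
    for pattern, tool in COMMANDS:
        match = pattern.match(text)
        if match:
            return tool, match.group("arg").strip()
    return None


print(route("update rag from ~/Documents/research"))  # ('update_rag', '~/Documents/research')
```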
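Supported file types:
- PDF (`.pdf`): native text extraction with OCR fallback
- Word (`.docx`): full document text extraction
- PowerPoint (`.pptx`): slide text extraction
- Excel (`.xlsx`): cell content extraction
- Markdown (`.md`)
- Plain text (`.txt`)
- JSON and YAML (`.json`, `.yaml`, `.yml`)
- Python (`.py`)
- JavaScript (`.js`)
- TypeScript (`.ts`)
- HTML and CSS (`.html`, `.css`)
- C and C++ (`.c`, `.cpp`, `.h`)
- Shell scripts (`.sh`)
- PNG (`.png`)
- JPEG (`.jpg`, `.jpeg`)
- TIFF (`.tiff`)
- WebP (`.webp`)

`update_rag`: Index a directory of documents.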
Parameters:
- `path`: Absolute path to the directory to index

Example:
update rag from ~/Documents/research
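Behavior:
- Scans the folder for supported file types
- On later runs, re-indexes only files that changed since the last index (tracked in `state/ingest_state.json`)
- Splits files into chunks and stores their embeddings in ChromaDB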
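
A minimal sketch of the folder scan, assuming files are selected purely by extension (the set below copies the supported file types listed above). `find_indexable_files` is a hypothetical helper, not the API of `local_rag/indexer.py`, and it does not apply any ignore rules the real indexer may have.

```python
from pathlib import Path

# Extensions copied from the supported file types list.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".pptx", ".xlsx",
    ".md", ".txt", ".json", ".yaml", ".yml",
    ".py", ".js", ".ts", ".html", ".css",
    ".c", ".cpp", ".h", ".sh",
    ".png", ".jpg", ".jpeg", ".tiff", ".webp",
}


def find_indexable_files(root: str) -> list[Path]:
    """Recursively collect files whose (case-insensitive) extension is supported.

    Hypothetical helper for illustration only.
    """
    base = Path(root).expanduser()  # allow paths such as ~/Documents/research
    return sorted(
        p for p in base.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

For example, `find_indexable_files("~/Documents/research")` returns a sorted list of `Path` objects for every supported file under that folder.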
`query_rag`: Search the indexed knowledge base.
Parameters:
- `query`: Natural language question or search terms
- `k`: Number of results (default: 5)

Example:
query rag what are the key findings about climate change?
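Output: the top `k` matching chunks, each with its source file name, a relevance score, and a text excerpt.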
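
Conceptually, `query_rag` embeds the question and returns the `k` chunks whose embeddings are closest to it. The brute-force sketch below shows that ranking with toy 2-D vectors, assuming cosine similarity as the score; the `(source, text, embedding)` tuple layout and the function names are hypothetical. A vector database such as ChromaDB answers the same question with an approximate nearest-neighbor index rather than a linear scan.

```python
import math
from typing import Sequence

# Hypothetical chunk layout: (source file, chunk text, embedding vector).
Chunk = tuple[str, str, Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], chunks: list[Chunk], k: int = 5) -> list[tuple[float, str, str]]:
    """Return (score, source, text) for the k chunks most similar to query_vec."""
    scored = [(cosine(query_vec, emb), src, text) for src, text, emb in chunks]
    scored.sort(key=lambda item: item[0], reverse=True)  # highest similarity first
    return scored[:k]


chunks = [
    ("a.pdf", "alpha", [1.0, 0.0]),
    ("b.md", "beta", [0.0, 1.0]),
    ("c.txt", "gamma", [0.7, 0.7]),
]
results = top_k([1.0, 0.0], chunks, k=2)
```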
Storage layout:
~/MyDrive/claude-skills-data/local-rag/
├── vectordb/ # Vector database storage
│   ├── chroma.sqlite3 # ChromaDB database
│   └── [uuid]/ # Collection data
└── state/
    ├── ingest_state.json # File tracking state
    └── bm25_index.json # BM25 keyword index
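
The `state/` directory keeps a BM25 keyword index next to the vector store; keyword scoring typically helps exact terms such as names, identifiers, and acronyms match even when embeddings miss them. The sketch below implements Okapi BM25 with whitespace tokenization and the commonly used defaults `k1 = 1.5`, `b = 0.75`. local-rag's actual tokenizer, parameters, and the format of `bm25_index.json` are not documented here, so treat all of these as assumptions.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs if n_docs else 0.0
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # IDF variant with +1 inside the log so it never goes negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            freq = tf[term]
            # Term-frequency saturation (k1) and document-length normalization (b).
            denom = freq + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * freq * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = [
    "adam optimizer learning rate",
    "backpropagation training algorithm",
    "adam adam momentum",
]
scores = bm25_scores("adam", docs)
```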
| Variable | Default | Description |
|---|---|---|
| `OCR_ENABLED` | `true` | Enable OCR for images and scanned PDFs |
| `OCR_MAX_PAGES` | `120` | Maximum number of pages to OCR per PDF |
| `OCR_PAGE_DPI` | `200` | DPI used when rendering PDF pages to images for OCR |
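
These variables could be read into a small settings object. The project lists `pydantic-settings` as a dependency and presumably uses it for this; the standard-library version below is only an illustration. The `pages_to_ocr` helper and its rule (OCR only when native PDF text extraction yields fewer than a hypothetical `min_chars` characters, capped at `OCR_MAX_PAGES`) are assumptions about how the OCR fallback might be gated.

```python
import os
from dataclasses import dataclass


def _env_bool(name: str, default: bool) -> bool:
    """Treat "1", "true", "yes", "on" (any case) as True; anything else as False."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class OcrSettings:
    enabled: bool
    max_pages: int
    page_dpi: int

    @classmethod
    def from_env(cls) -> "OcrSettings":
        # Defaults mirror the configuration table.
        return cls(
            enabled=_env_bool("OCR_ENABLED", True),
            max_pages=int(os.environ.get("OCR_MAX_PAGES", "120")),
            page_dpi=int(os.environ.get("OCR_PAGE_DPI", "200")),
        )


def pages_to_ocr(settings: OcrSettings, page_count: int, native_text: str, min_chars: int = 50) -> int:
    """Number of PDF pages to OCR: 0 if OCR is off or native text looks sufficient.

    min_chars is a hypothetical threshold, not a documented setting.
    """
    if not settings.enabled or len(native_text.strip()) >= min_chars:
        return 0
    return min(page_count, settings.max_pages)
```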
Embeddings are generated with the `sentence-transformers/all-MiniLM-L6-v2` model, which maps text to 384-dimensional vectors.
User: update rag from ~/Documents/research-papers
Claude: Scanning ~/Documents/research-papers...
Found 47 PDF files, 12 markdown files
Indexing complete. Processed 59 files (1,247 chunks).
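
The run above turns 59 files into 1,247 chunks. `chunking.py` is only described as holding chunking strategies, so the method and sizes here are assumptions: a fixed-size character window with overlap, a common baseline in which each chunk repeats the last `overlap` characters of the previous one, so text near a boundary appears in both neighboring chunks. `chunk_text` is a hypothetical name.

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` characters that overlap by `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):  # this window already reaches the end
            break
    return chunks


parts = chunk_text("abcdefghij", size=4, overlap=1)
```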
User: query rag neural network training techniques
Claude: Found 5 relevant results:
1. deep-learning-survey.pdf (score: 0.89)
"...backpropagation remains the primary training algorithm..."
2. optimization-methods.md (score: 0.84)
"...Adam optimizer combines momentum with adaptive learning..."
User: update rag from ~/Documents/research-papers
Claude: Scanning for changes...
3 files modified since last index
Indexing complete. Processed 3 files.
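
The second run re-indexes only the three modified files, and `state/ingest_state.json` is described as file tracking state. Its real contents are not documented; the sketch below assumes it maps each file path to a SHA-256 hash of the file's contents. `load_state`, `save_state`, and `changed_files` are hypothetical names.

```python
import hashlib
import json
from pathlib import Path


def load_state(state_path: Path) -> dict[str, str]:
    """Read the saved {path: sha256} map, or start empty on the first run."""
    return json.loads(state_path.read_text()) if state_path.exists() else {}


def save_state(state_path: Path, state: dict[str, str]) -> None:
    """Persist the {path: sha256} map, creating the state directory if needed."""
    state_path.parent.mkdir(parents=True, exist_ok=True)
    state_path.write_text(json.dumps(state, indent=2))


def changed_files(files: list[Path], previous: dict[str, str]) -> tuple[list[Path], dict[str, str]]:
    """Return files that are new or whose contents differ, plus the fresh hash map."""
    current = {str(p): hashlib.sha256(p.read_bytes()).hexdigest() for p in files}
    changed = [p for p in files if previous.get(str(p)) != current[str(p)]]
    return changed, current

# Typical flow (hypothetical): changed, current = changed_files(files, load_state(path));
# index only `changed`; call save_state(path, current) after indexing succeeds.
```

Hashing contents means a file whose timestamp changes without its contents changing is not re-indexed; comparing modification times would be cheaper but less precise. Saving the state only after indexing succeeds keeps a failed run from marking files as done.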
Python dependencies:
chromadb>=0.4.0
sentence-transformers>=2.2.0
rapidfuzz>=3.0.0
python-dotenv>=1.0.0
pydantic-settings>=2.0.0
pypdf>=3.0.0
pdf2image>=1.16.0
Pillow>=9.0.0
python-docx>=0.8.11
python-pptx>=0.6.21
openpyxl>=3.0.0
ocrmypdf>=15.0.0
pytesseract>=0.3.10
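
To see which of these distributions are installed in the current environment, the standard library's `importlib.metadata.version` can be queried per name. This sketch only reports presence and the installed version string; it does not check the minimum versions above (comparing versions correctly would need a parser such as `packaging.version`). `REQUIRED` and `installed_versions` are illustrative names.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Distribution names from the dependency list.
REQUIRED = [
    "chromadb", "sentence-transformers", "rapidfuzz", "python-dotenv",
    "pydantic-settings", "pypdf", "pdf2image", "Pillow", "python-docx",
    "python-pptx", "openpyxl", "ocrmypdf", "pytesseract",
]


def installed_versions(names: list[str]) -> dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    result: dict[str, Optional[str]] = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


missing = [name for name, ver in installed_versions(REQUIRED).items() if ver is None]
```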
Install on macOS:
brew install poppler tesseract tesseract-lang antiword
Install on Ubuntu:
sudo apt install poppler-utils tesseract-ocr tesseract-ocr-heb antiword
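
Some Python packages call these system tools at run time: `pdf2image` runs poppler's `pdftoppm` and `pdfinfo`, and `pytesseract` runs the `tesseract` executable. A quick check that the executables are on `PATH`, using the hypothetical helper `missing_tools`:

```python
import shutil

# Executables provided by the system packages above (poppler supplies pdftoppm and pdfinfo).
TOOLS = ["pdftoppm", "pdfinfo", "tesseract", "antiword"]


def missing_tools(tools: list[str]) -> list[str]:
    """Return the executables that shutil.which cannot find on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


not_found = missing_tools(TOOLS)
```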
Project layout:
local-rag/
├── SKILL.md # This file
├── AI_GUIDE.md # AI assistant guide
├── README.md # Quick intro
├── CHANGELOG.md # Version history
├── version.yaml # Version info
├── pyproject.toml # Installable package metadata
├── local_rag/
│   ├── cli.py # Unified CLI entrypoint
│   ├── indexer.py # Document indexing
│   ├── query.py # Search functionality
│   ├── visualize.py # Chunk visualization
│   ├── vectorstore.py # Vector DB abstraction
│   ├── search.py # Hybrid search logic
│   ├── chunking.py # Chunking strategies
│   └── ingest/
│       ├── extractor.py # File content extraction
│       ├── ocr.py # OCR processing
│       └── utils.py # Utilities
├── tests/ # Test suite
└── docs/ # Additional documentation
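
`search.py` is described as hybrid search logic, and the state directory stores a BM25 index alongside the vector database, which suggests keyword and semantic results are combined. The combination method is not documented. Reciprocal rank fusion (RRF), shown below as one common option, sums `1 / (k + rank)` for each chunk across the ranked lists, with `k = 60` as the conventional constant. The function name and chunk IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of chunk IDs: each list adds 1 / (k + rank) to a chunk's score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


semantic = ["c1", "c2", "c3"]  # e.g. ranked by embedding similarity
keyword = ["c3", "c1", "c4"]   # e.g. ranked by BM25
fused = reciprocal_rank_fusion([semantic, keyword])
```

Because RRF uses only ranks, it avoids normalizing cosine similarities and BM25 scores, which live on different scales.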