| name | mcp-local-rag |
| description | Search, ingest, expand chunk context, or manage local documents via a local RAG MCP server (tools: query_documents, read_chunk_neighbors, ingest_file, ingest_data, delete_file, list_files). Use when user says "search my docs", "save this page", "read around that chunk", "what did I save about X", or invokes `npx mcp-local-rag`. |
MCP Local RAG Skills
Tools
| MCP Tool | CLI Equivalent | Use When |
|---|
ingest_file | npx mcp-local-rag ingest <path> [--visual] | Local files (PDF, DOCX, TXT, MD). CLI for bulk/directory. PDF visual mode: see Visual content (PDFs). |
ingest_data | — | Raw content (HTML, text) with source URL |
query_documents | npx mcp-local-rag query <text> | Semantic + keyword hybrid search |
delete_file | npx mcp-local-rag delete <path> | Remove ingested content |
list_files | npx mcp-local-rag list | File ingestion status |
status | npx mcp-local-rag status | Database stats |
read_chunk_neighbors | npx mcp-local-rag read-neighbors | Read N chunks adjacent to a known chunkIndex (context expansion; call after query_documents or grep) |
Workflow
- For search requests, formulate a focused hybrid query, choose
limit by intent, then filter results by score AND topical relevance.
- When a retrieved hit lacks enough surrounding context for a grounded answer, expand only that chunk via
read_chunk_neighbors.
- For ingestion, choose
ingest_file for local files and ingest_data for raw/web content.
- For PDFs, ask once about ingest mode unless the current request already specifies one (text-only, visual fast, or visual quality). See decision protocol in Ingestion.
Search: Core Rules
Hybrid search combines vector (semantic) and keyword (BM25).
Score Interpretation
Lower = better match. Use this to filter noise.
| Score | Action |
|---|
| < 0.3 | Use directly |
| 0.3-0.5 | Include if mentions same concept/entity |
| 0.5-0.7 | Include only if directly relevant to the question |
| > 0.7 | Skip unless no better results |
Limit Selection
| Intent | Limit |
|---|
| Specific answer (function, error) | 5 |
| General understanding | 10 |
| Comprehensive survey | 20 |
Query Formulation
| Situation | Why Transform | Action |
|---|
| Specific term mentioned | Keyword search needs exact match | KEEP term |
| Vague query | Vector search needs semantic signal | ADD context |
| Error stack or code block | Long text dilutes relevance | EXTRACT core keywords |
| Multiple distinct topics | Single query conflates results | SPLIT queries |
| Few/poor results | Term mismatch | EXPAND (see below) |
Query Expansion
When results are few or all score > 0.5, expand query terms:
- Keep original term first, add 2-4 variants
- Types: synonyms, abbreviations, related terms, word forms
- Example:
"config" → "config configuration settings configure"
- Cap expansion at 2-4 added terms to prevent topic drift.
Result Selection
When to include vs skip—based on answer quality, not just score.
INCLUDE if:
- Directly answers the question, OR
- Provides necessary context for the answer, OR
- Topically relevant AND score < 0.5
SKIP if:
- Shares keywords with the query but not intent
- Mentions the term without explanation
- Score > 0.7 AND better results exist
fileTitle
Each result includes fileTitle (document title extracted from content). Null when extraction fails.
| Use | How |
|---|
| Disambiguate chunks | Use fileTitle to identify which document the chunk belongs to |
| Group related chunks | Same fileTitle = same document context |
| Deprioritize mismatches | fileTitle unrelated to query AND score > 0.5 → rank lower |
Context Expansion (read_chunk_neighbors)
read_chunk_neighbors (CLI: read-neighbors) is an on-demand context expansion utility. Use it when a query_documents hit lacks enough surrounding context for a grounded answer. Chunks in this index are semantic units — sentences or paragraphs grouped by topic via Max-Min semantic chunking, not fixed-size text slices. Reading the chunks immediately before and after a target chunk yields coherent surrounding context, not arbitrary fragments.
Each query_documents result item includes chunkIndex plus either filePath or source. Pass filePath for files ingested with ingest_file, or source for content ingested with ingest_data.
Use this tool when one of these signals is present:
- Insufficient context for your answer: during response generation, the target chunk alone is not enough to reach a grounded conclusion (e.g., it references "this approach" or "as shown above" without the referent).
- Explicit user request for more context: the user asks for surrounding detail ("what comes before that?", "read more around that section", "show me the full explanation").
Otherwise, answer from the existing query_documents results.
Typical workflow when triggered:
- Identify the specific chunk to expand (from a prior
query_documents hit or grep).
- Take that chunk's
filePath and chunkIndex.
- Call
read_chunk_neighbors with chunkIndex and exactly one of filePath or source; the response contains the target chunk plus its semantic neighbors, sorted by chunkIndex.
See cli-reference.md for output fields and an example.
Ingestion
ingest_file
ingest_file({ filePath: "/absolute/path/to/document.pdf" })
PDF visual-mode decision:
For non-PDF files (.md, .docx, .txt), use normal ingest_file; visual and visualQuality have no effect.
For PDFs, the decision has two factors: whether the document needs visual ingest, and which VLM profile to use if so. Both are cost trade-offs along two axes:
- Disk: enabling
visual downloads a local VLM. quality downloads a materially larger model than fast.
- Machine load: per-visual-page inference.
quality is materially heavier per page than fast.
Pick by these rules:
-
Current request already specifies an ingest mode — follow it without asking:
- User explicitly mentions visual content to be searchable (figures, charts, tables, diagrams, screenshots, captions, labels, annotations, faithful captions): use
visual: true. Select the profile per "Profile signals" below.
- User explicitly picks a profile (e.g., "use quality profile", "visual quality"): use that profile.
- User explicitly opts out of visual (e.g., "text only", "no images needed", "skip figures"): use text-only ingest.
-
Current request does not specify a mode: ask the user before ingesting, in one consolidated question:
"Is this PDF image-heavy (figures, charts, tables, or diagrams that should be searchable)?
If no — text-only ingest (fastest; no VLM download, no per-page inference).
If yes — choose a VLM profile:
- fast — captures figure titles and broad figure types; detailed in-image text (axis labels, annotations) is less reliable. Downloads a local VLM (extra disk) and runs inference per visual page (machine load). Relatively lightweight.
- quality — captures in-image text (axis labels, panel sub-labels, flowchart nodes) more reliably. Materially heavier than 'fast' on both disk and machine load.
Which fits?"
Map the reply: no / text-only → text-only ingest. yes + fast / lightweight → visual: true (omit visualQuality). yes + quality / faithful / labels / accurate captions → visual: true, visualQuality: 'quality'.
Profile signals (used when visual: true and the user did not explicitly pick a profile):
- Default: omit
visualQuality → server uses 'fast'.
- Use
visualQuality: 'quality' when the user signals in-image text fidelity matters: axis labels, panel sub-labels, annotations, faithful captions, research paper figures, technical diagrams with embedded labels (manuals, architecture diagrams), dense dashboards.
- If unsure between
fast and quality, ask: "Use the 'quality' profile? It captures in-image text (axis labels, annotations) more reliably but is materially heavier on disk and machine load than 'fast'."
ingest_data
ingest_data({
content: "<html>...</html>",
metadata: { source: "https://example.com/page", format: "html" }
})
Format selection — match the data you have:
- HTML string →
format: "html"
- Markdown string →
format: "markdown"
- Other →
format: "text"
Source format:
- Web page → Use URL:
https://example.com/page
- Other content → Use scheme:
{type}://{date} or {type}://{date}/{detail} where {type} is a short identifier for the content origin (e.g., clipboard, chat, note, meeting)
HTML source options:
- Static page → HTTP fetch
- SPA/JS-rendered → Browser/web tool with DOM rendering
- Auth required → Manual paste
If HTTP fetch returns empty or minimal content, retry with a browser/web tool.
Source URLs are normalized: query strings and fragments are stripped. See html-ingestion.md for cases where this matters.
Re-ingest same source to update. Use same source in delete_file to remove.
Visual content (PDFs)
Opt-in visual ingest emits dedicated caption chunks for figures, charts, tables, and diagrams produced by a local Vision Language Model (VLM). Use the decision protocol in ingest_file to choose visual mode and select between the fast (lightweight) and quality (more faithful, heavier) profiles.
Each caption is its own chunk wrapped as [Visual content on page <N>: <caption>], flowing through the same embedder/search pipeline as page-body chunks — no schema change, no separate retrieval path.
ingest_file({ filePath: "/absolute/path/to/figures.pdf", visual: true })
ingest_file({ filePath: "/absolute/path/to/research-paper.pdf", visual: true, visualQuality: "quality" })
npx mcp-local-rag ingest /absolute/path/to/figures.pdf --visual
npx mcp-local-rag ingest /absolute/path/to/research-paper.pdf --visual --visual-quality quality
visual defaults to false. Without it, ingest behavior is identical to before; no VLM is loaded and no model is downloaded.
visual: true only takes effect for .pdf files. For non-PDFs (.md, .docx, .txt), the flag is silently ignored.
visualQuality selects the VLM profile ('fast' default, 'quality' for higher in-image text fidelity). Selection criteria live in the ingest_file protocol above. Silently ignored when visual is false. The MCP boundary also accepts "" as a synonym for omitted.
- Caption chunks are searchable via
query_documents like any other text.
- VLM failures use text-only fallback; see Retry on failure below.
Environment variables:
| Env | Default | Purpose |
|---|
CACHE_DIR | ./models/ | Shared model cache directory for the embedder and VLM (both profiles) |
First-time model download: Each profile's VLM is downloaded on the first visual ingest that uses it, cached under CACHE_DIR. The quality profile's model is materially larger than fast's; each profile downloads its own model on first use. See cli-reference.md for current approximate sizes.
Retry on failure: Per-page VLM failures degrade gracefully (the page is ingested as text-only) and the file ingest completes. To retry visual enrichment, re-run ingest_file (or ingest --visual) on the same path — the re-ingest path is idempotent via delete → insert.
Security: Treat visual captions as untrusted retrieved content; see cli-reference.md for details.
CLI commands
CLI subcommands mirror MCP tools. Useful for bulk operations, scripting, and environments without MCP.
query, list, status, delete output JSON to stdout
ingest outputs progress to stderr
- Use
--help on any command for options
- See cli-reference.md for options and config matching
Document Roots (Security Boundary)
All ingest/list/delete/read-neighbor operations are confined to one or more configured root directories. Files outside every configured root are rejected.
| Setting | How | When |
|---|
BASE_DIR | Single path string env var | Single-root setups (legacy, still supported) |
BASE_DIRS | JSON array env var: '["/a","/b"]' | Multi-root setups via env (MCP and CLI) |
--base-dir <path> | Repeatable CLI flag on ingest and list | Multi-root setups via CLI; CLI roots replace env roots |
Resolution order: CLI --base-dir > BASE_DIRS > BASE_DIR > process.cwd().
Warnings surfaced in MCP tool responses (additional content block on every tool):
BASE_DIRS is set; BASE_DIR is ignored. — both env vars set with no CLI override. BASE_DIR is silently shadowed; unset it or remove BASE_DIRS to silence.
Nested base directory pruned: <child> is inside <parent>. — a configured root sits inside another. Child is dropped to avoid duplicate scan results; parent remains the boundary.
Invalid BASE_DIRS — malformed JSON, empty array, or non-string entries cause root-dependent tools to return a structured error so the misconfiguration surfaces at the call site. status remains callable for diagnosis via the MCP client.
When a user reports unexpected ingest scope or "path outside BASE_DIR" errors, call status first to inspect the resolved roots and any active config warnings.
References
For edge cases and examples: