name	mcp-semantic-search
description	Intent-based code discovery for CLI AI agents using semantic search MCP tools. Use when finding code by what it does (not what it's called), exploring unfamiliar areas, or understanding feature implementations. Mandatory for code discovery tasks when you have MCP access.
allowed-tools	["Grep","Read","Glob"]
version	1.0.0

MCP Semantic Search - Intent-Based Code Discovery

Semantic code search lets CLI AI agents explore a codebase with natural language queries instead of keyword searches. It is available only to CLI AI agents with MCP (Model Context Protocol) support.

1. 🎯 WHEN TO USE

Navigation Guide

This file (SKILL.md): Core workflow and essential guidance

Reference Files (detailed documentation):

tool_comparison.md – Semantic vs Grep vs Glob decision framework
architecture.md – System components and data flow
query_patterns.md – Effective query writing guide

Assets (examples):

query_examples.md – Categorized example queries

Primary Use Cases

Use this skill when:

Exploring unfamiliar code
- You don't know where functionality lives
- You need to understand how features work
- You're new to the codebase
Finding by behavior/intent
- "Find code that validates email addresses"
- "Show me where we handle form submissions"
- "Locate animation initialization logic"
Understanding patterns
- "How do we use Motion.dev library?"
- "Find all modal implementations"
- "Show me cookie consent patterns"
Discovering cross-file relationships
- "How does navigation interact with page transitions?"
- "What code depends on the video player?"
- "Find related components across files"
Code discovery tasks for CLI AI agents
- Any task requiring intent-based code search
- When grep/glob don't provide enough context
- When you know what code does, not where it is

When NOT to Use

Use different tools instead:

Known exact file paths → Use Read tool

❌ search_codebase("Find hero_video.js content")
✅ Read("src/hero/hero_video.js")

Specific symbol searches → Use Grep tool

❌ search_codebase("Find all calls to initVideoPlayer")
✅ Grep("initVideoPlayer", output_mode="content")

Simple keyword searches → Use Grep tool

❌ search_codebase("Find all TODO comments")
✅ Grep("TODO:", output_mode="content")

File structure exploration → Use Glob tool

❌ search_codebase("Show me all JavaScript files")
✅ Glob("**/*.js")

IDE integrations → NOT SUPPORTED
- This skill is ONLY for CLI AI agents
- IDE autocomplete (e.g. GitHub Copilot in VS Code) uses a different system
- IDE-embedded chat is out of scope (no MCP support as of 2025)

Activation Triggers

Activate this skill when the user asks:

"Find code that handles [feature/behavior]"
"Where do we implement [functionality]?"
"Show me how [feature] works"
"How do we handle [behavior]?"
"What code [performs action]?"
"Find [pattern] implementation"
"Show me [component/module] code"

Do NOT activate for:

Known file paths
Exact symbol/function name searches
File pattern matching requests
IDE autocomplete questions

2. 🗂️ REFERENCES

Core Framework

Document	Purpose	Key Insight
MCP Semantic Search - Intent-Based Code Discovery	Enable CLI AI agents to search codebases by intent using natural language queries	Finds code by what it does, not what it's called

Bundled Resources

Document	Purpose	Key Insight
references/tool_comparison.md	Decision framework for semantic search vs grep vs glob	When to use each tool based on knowledge and intent
references/architecture.md	System architecture and data flow	Two-component system (Indexer + MCP Server) sharing a SQLite vector DB
references/query_patterns.md	Effective query writing guide	Describe behavior in natural language for best results
assets/query_examples.md	Categorized example queries	9 categories of real-world query patterns

Smart Routing Diagram

User Request
    ↓
[Detect Intent]
    ↓
Know exact file path? ──── YES ──→ [Use Read tool] ──→ DONE
    │
    NO
    ↓
Know what code does? ──── YES ──→ [Use search_codebase] ──→ Parse Results
    │                                     ↓
    NO                                Return ranked code snippets
    ↓                                     ↓
Know exact symbol? ──── YES ──→ [Use Grep tool] ──→ DONE
    │
    NO
    ↓
Exploring file structure? ──── YES ──→ [Use Glob tool] ──→ DONE
    │
    NO
    ↓
[Default: Use search_codebase]
    ↓
COMPLETE
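
The routing diagram above can be read as a short chain of checks. The following is a minimal sketch of that order in Python. The `Request` class and its field names are hypothetical and exist only for illustration. The tool names come from this document.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical summary of what the agent already knows about a request."""
    knows_file_path: bool = False
    knows_behavior: bool = False
    knows_exact_symbol: bool = False
    exploring_structure: bool = False


def route(request: Request) -> str:
    """Return the tool to use, checking conditions in the diagram's order."""
    if request.knows_file_path:
        return "Read"
    if request.knows_behavior:
        return "search_codebase"
    if request.knows_exact_symbol:
        return "Grep"
    if request.exploring_structure:
        return "Glob"
    # Nothing matched: the diagram defaults to semantic search.
    return "search_codebase"
```

Because a known file path is checked first, it takes priority even when the behavior is also known, which matches the "NEVER use for known file paths" rule.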

3. 🛠️ HOW IT WORKS

Tool Overview

Three semantic search MCP tools available:

search_codebase - Search current project semantically
- Primary tool for code discovery
- Finds code by intent and behavior
- Returns ranked code snippets with file paths
search_commits - Search git commit history
- Understanding why code was changed
- Finding when features were added
- Locating bug fixes
search_other_workspace - Search other indexed projects
- Finding similar patterns in other codebases
- Reusing code from other projects
- Cross-project comparisons

Basic Usage Pattern

Query structure - describe what code does:

// Good: Natural language, behavior-focused
search_codebase("Find code that validates email addresses in contact forms")

// Good: Question format
search_codebase("How do we handle page transitions?")

// Good: Feature discovery
search_codebase("Find cookie consent implementation")

// Bad: Grep syntax
search_codebase("grep validateEmail")  // ❌ Use grep tool instead

// Bad: Known file path
search_codebase("Show me hero_video.js")  // ❌ Use Read tool instead

Example 1: Feature Discovery

Goal: Find email validation logic

// Step 1: Use semantic search
search_codebase("Find code that validates email addresses in contact forms")

// Expected results:
// - src/form/form_validation.js (ranked #1)
// - src/utils/email_validator.js (ranked #2)
// - Code snippets with validation logic

// Step 2: Read full context
Read("src/form/form_validation.js")

// Step 3: Analyze, then apply changes with Edit(...) or Write(...)

Why it works: The query describes both the behavior (validates email) and the context (contact forms), which lets semantic search find the relevant code.

Example 2: Understanding Relationships

Goal: Find what code depends on video player

// Use relationship query
search_codebase("What code depends on the video player?")

// Expected results:
// - src/components/hero_section.js (uses video player)
// - src/animations/hero_animations.js (triggers on video events)
// - Code snippets showing imports and usage

// Follow up: Read specific files
Read("src/components/hero_section.js")

Why it works: The query matches code that imports or references the video player, so semantic search can surface related code across files.

Query Best Practices

Do:

✅ Use natural language
✅ Describe what code does (behavior)
✅ Add context ("in forms", "for video player")
✅ Ask about relationships ("What code depends on...")
✅ Be specific about intent

Don't:

❌ Use grep/find syntax
❌ Search for exact symbols (use Grep instead)
❌ Request known file paths (use Read instead)
❌ Be too generic ("Find code")
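
The Do/Don't lists above could be applied as a quick pre-flight check before a query is sent. The sketch below is one way to do that in Python. The function name `lint_query` and the regular expressions are illustrative assumptions, not part of the semantic-search MCP server, and the heuristics are deliberately simple.

```python
import re

# Hypothetical heuristics; these patterns are illustrative, not exhaustive.
SHELL_COMMAND = re.compile(r"^\s*(grep|egrep|rg)\b", re.IGNORECASE)
SHELL_FLAG = re.compile(r"(^|\s)-{1,2}[A-Za-z]")
FILE_NAME = re.compile(r"\b[\w./-]+\.(js|ts|css|html|md|py)\b", re.IGNORECASE)
GENERIC = {"find code", "show code", "search code"}


def lint_query(query: str) -> list[str]:
    """Return warnings for a semantic search query (empty list if none)."""
    warnings = []
    text = query.strip()
    if SHELL_COMMAND.search(text) or SHELL_FLAG.search(text):
        warnings.append("looks like grep/find syntax; use Grep instead")
    if FILE_NAME.search(text):
        warnings.append("mentions a file name; use Read for known paths")
    if text.lower().rstrip(".?!") in GENERIC or len(text.split()) < 3:
        warnings.append("too generic; describe the behavior and add context")
    return warnings
```

A query such as "Find code that validates email addresses in contact forms" passes without warnings, while "grep validateEmail" and "Show me hero_video.js" are flagged.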

For more query patterns, see: query_patterns.md

Trust the Judge Model

Results are reranked for relevance:

Top results are usually most relevant
Judge model (voyage-3) understands intent
If results seem off, rephrase query more specifically
Add context: "in [component]" or "for [feature]"

4. 📋 RULES

✅ ALWAYS Rules

ALWAYS use for intent-based discovery
- When you know what code does, not where it is
- Exploring unfamiliar codebase areas
- Understanding feature implementations
ALWAYS use natural language
- Describe behavior in conversational tone
- "Find code that validates email addresses"
- NOT grep syntax or code symbols
ALWAYS provide context in queries
- Include "in [component]" or "for [feature]"
- Improves result relevance significantly
- "Find validation in contact forms" beats "Find validation"
ALWAYS combine with Read tool
- Semantic search discovers files
- Read tool provides full context
- Workflow: search_codebase → Read → Edit
ALWAYS check for MCP availability
- This skill requires MCP access
- Only works for CLI AI agents
- Verify semantic-search MCP server is running

❌ NEVER Rules

NEVER use for known file paths
- If you know the path, use Read tool
- Faster, no API latency
- Example: Read("src/hero/hero_video.js")
NEVER use for exact symbol searches
- If you know the symbol name, use Grep
- More precise for literal text matching
- Example: Grep("initVideoPlayer", output_mode="content")
NEVER use grep/find syntax
- Semantic search uses natural language
- NOT command-line syntax
- "Find code that..." NOT "grep pattern"
NEVER skip validation of MCP access
- Verify you have MCP support
- Only CLI AI agents can use this
- IDE integrations use different systems
NEVER use for file structure exploration
- Use Glob for file pattern matching
- Glob is faster for file navigation
- Example: Glob("**/*.js")

⚠️ ESCALATE IF

ESCALATE IF MCP server unavailable
- Inform user of missing dependency
- Suggest fallback to Grep/Glob tools
- Provide setup guide reference
ESCALATE IF results consistently irrelevant
- Results are still irrelevant after 2-3 rephrased queries
- May indicate indexing issue
- Ask user to verify codebase is indexed
ESCALATE IF uncertain about tool selection
- If confidence < 80% on semantic vs grep vs glob
- Ask user for clarification
- Provide tool comparison context
ESCALATE IF IDE integration requested
- This skill does NOT work with IDE autocomplete
- Clarify scope: CLI AI agents only
- Explain system separation
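
Two of the escalation rules above (server unavailable, results still irrelevant after a few rephrases) can be sketched as a small retry loop. This is a sketch under assumptions: `search` stands in for the `search_codebase` tool, `is_relevant` is a hypothetical relevance judgment, and the returned dictionary shape is invented for illustration.

```python
from typing import Callable, Optional, Sequence

MAX_ATTEMPTS = 3  # the rules above allow 2-3 rephrases before escalating


def search_with_escalation(
    queries: Sequence[str],
    search: Optional[Callable[[str], list]],
    is_relevant: Callable[[list], bool],
) -> dict:
    """Try successive rephrasings and report when escalation is needed.

    `search` is None when the MCP server is unavailable (an assumption of
    this sketch, not how the real tool reports availability).
    """
    if search is None:
        return {
            "status": "escalate",
            "reason": "MCP server unavailable; fall back to Grep/Glob",
        }
    for attempt, query in enumerate(queries[:MAX_ATTEMPTS], start=1):
        results = search(query)
        if is_relevant(results):
            return {"status": "ok", "query": query,
                    "attempts": attempt, "results": results}
    return {
        "status": "escalate",
        "reason": "results still irrelevant; ask the user to verify the index",
    }
```

The caller supplies the rephrasings in order, typically from least to most specific, so that added context such as "in contact forms" is tried before giving up.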

5. 🎓 SUCCESS CRITERIA

Task complete when:

✅ Found relevant code by intent/behavior
✅ Used correct tool (semantic vs grep vs glob)
✅ Provided natural language query (not grep syntax)
✅ Combined with Read tool for full context
✅ Avoided using semantic search for known paths
✅ Added context to query when needed ("in forms", "for feature")
✅ Trusted judge model reranking (top results checked first)

6. 🔗 INTEGRATION POINTS

MCP Dependency

Required MCP tools:

search_codebase - Semantic code search
search_commits - Semantic commit history search
search_other_workspace - Cross-project search

MCP server: semantic-search (Python)

Availability: CLI AI agents only (Claude Code AI, GitHub Copilot CLI, Opencode, Kilo CLI)

NOT available: IDE integrations (GitHub Copilot in VS Code/IDEs)

Pairs With

Read tool:

Semantic search discovers files
Read provides full file context
Workflow: search_codebase → Read → Edit

Grep tool:

Semantic search for discovery
Grep for specific symbol usage
Workflow: search_codebase → Grep("symbol")

Glob tool:

Glob for file structure
Semantic search for understanding
Workflow: Glob("**/*.js") → search_codebase("How does [component] work?")
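
For the search_codebase → Read workflow, the ranked results need to be turned into a short list of files to open. The sketch below assumes each result is a dictionary with a `file_path` key. That shape is hypothetical, since the actual response format of `search_codebase` is not specified here.

```python
def files_to_read(results: list[dict], limit: int = 3) -> list[str]:
    """Return up to `limit` distinct file paths, preserving rank order."""
    paths: list[str] = []
    for hit in results:
        if len(paths) >= limit:
            break
        path = hit["file_path"]  # hypothetical field name
        if path not in paths:
            paths.append(path)
    return paths


# Hypothetical results, shaped like the ranked output in Example 1.
ranked = [
    {"file_path": "src/form/form_validation.js", "rank": 1},
    {"file_path": "src/utils/email_validator.js", "rank": 2},
    {"file_path": "src/form/form_validation.js", "rank": 3},
]
for path in files_to_read(ranked):
    print(f'Read("{path}")')
```

Deduplicating by path avoids reading the same file twice when several snippets from one file are ranked highly.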

Related Skills

mcp-code-mode:

Use for calling semantic search MCP tools programmatically
Enables complex workflows combining semantic search with other MCP tools
See .claude/skills/mcp-code-mode/SKILL.md for TypeScript execution patterns

External Dependencies

Indexer: codebase-index-cli (Node.js)

Creates vector embeddings from code
Watches files for real-time updates
Stores in .codebase/vectors.db

Vector Database: SQLite (.codebase/vectors.db)

1024-dimensional vectors
Real-time file watching
Project-specific index
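
To make the storage model concrete, here is a sketch of how a 1024-dimensional vector could be kept in a SQLite BLOB column. The actual schema of .codebase/vectors.db is not documented here, so the table name, column names, float32 encoding, and the placeholder snippet are all assumptions for illustration.

```python
import sqlite3
from array import array

DIMENSIONS = 1024  # vector size stated in this document


def pack(vector: list[float]) -> bytes:
    """Encode a vector as float32 bytes (an assumed encoding)."""
    if len(vector) != DIMENSIONS:
        raise ValueError(f"expected {DIMENSIONS} values, got {len(vector)}")
    return array("f", vector).tobytes()


def unpack(blob: bytes) -> list[float]:
    """Decode float32 bytes back into a list of floats."""
    values = array("f")
    values.frombytes(blob)
    return values.tolist()


# In-memory database with a hypothetical table layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE code_blocks ("
    "id INTEGER PRIMARY KEY, file_path TEXT NOT NULL, "
    "snippet TEXT NOT NULL, embedding BLOB NOT NULL)"
)
conn.execute(
    "INSERT INTO code_blocks (file_path, snippet, embedding) VALUES (?, ?, ?)",
    ("src/form/form_validation.js", "placeholder snippet", pack([0.5] * DIMENSIONS)),
)
row = conn.execute("SELECT file_path, embedding FROM code_blocks").fetchone()
print(row[0], len(unpack(row[1])))
```

Storing each code block as one row keeps the index project-specific and easy to update when a watched file changes.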

Voyage AI API:

voyage-code-3 model (embeddings)
voyage-3 model (judge/reranking)
API key required

Project Indexing Requirements

Must be indexed first:

Run codesql index to create .codebase/vectors.db
Indexer watches files for automatic updates
See specs/025-semantic-search-setup/README.md for setup

Current anobel.com index:

249 files indexed
496 code blocks
Languages: JavaScript, CSS, HTML, Markdown

Scope and Compatibility

✅ Works with (CLI AI agents):

Claude Code AI
GitHub Copilot CLI
Opencode
Kilo CLI
Any MCP-compatible CLI AI agent

❌ Does NOT work with (IDE integrations):

GitHub Copilot in VS Code/IDEs
GitHub Copilot Chat in IDE
Any IDE-embedded autocomplete systems

Reason: Different systems - semantic search is for CLI AI agents helping you via chat, not autocomplete while typing.

7. 📚 QUICK REFERENCE

Common Query Patterns

// Feature discovery
search_codebase("Find cookie consent implementation")

// Behavior search
search_codebase("Find code that validates email addresses")

// Relationship discovery
search_codebase("What code depends on Motion.dev library?")

// Problem solving
search_codebase("How do we prevent duplicate form submissions?")

// Commit history
search_commits("Find commits related to contact form")

Decision Helper

Ask yourself:

Do I know the exact file path?
- YES → Use Read(path)
- NO → Continue
Do I know what the code does?
- YES → Use search_codebase("what it does")
- NO → Continue
Am I searching for exact text/symbol?
- YES → Use Grep("symbol")
- NO → Continue
Am I exploring file structure?
- YES → Use Glob("**/*.js")
- NO → Default to search_codebase()

Tool Selection Quick Reference

Scenario	Use This Tool	Example
Know file path	`Read()`	`Read("src/hero/hero_video.js")`
Find by behavior	`search_codebase()`	`search_codebase("Find video playback code")`
Find function calls	`Grep()`	`Grep("initVideoPlayer", output_mode="content")`
Find all files of type	`Glob()`	`Glob("**/*.js")`
Understand feature	`search_codebase()`	`search_codebase("How do we handle forms?")`

Key Principles

Describe behavior, not symbols - "what it does" not "what it's called"
Use natural language - Conversational queries work best
Add context - "in forms", "for feature" improves relevance
Trust the judge - Top results are reranked for relevance
Combine with Read - Discover with search, understand with Read

8. 📖 ADDITIONAL RESOURCES

External Documentation

Indexer repository: https://github.com/dudufcb1/codebase-index-cli
MCP server repository: https://github.com/dudufcb1/semantic-search
Voyage AI documentation: https://docs.voyageai.com/
Setup guide: specs/025-semantic-search-setup/README.md

Architecture Overview

Two-component system:

codebase-index-cli (Node.js) - Creates and maintains vector embeddings
semantic-search MCP (Python) - Provides search tools to CLI AI agents

Data flow:

Indexer watches files → Parses code → Sends to Voyage AI → Stores vectors in SQLite
AI sends query → MCP converts to vector → Searches database → Reranks with judge → Returns results
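
The query side of the data flow (query to vector, vector search, ranked results) can be illustrated with a toy example. In this sketch, `toy_embed` is a hashed bag-of-words stand-in for the Voyage AI embedding call, the index is an in-memory dictionary rather than SQLite, and the judge reranking step is omitted. None of this reflects the real server's internals.

```python
import math
import zlib

DIMENSIONS = 256  # toy size; the real index uses 1024-dimensional vectors


def toy_embed(text: str) -> list[float]:
    """Stand-in for an embedding model: hashed bag-of-words, unit length."""
    vector = [0.0] * DIMENSIONS
    for word in text.lower().split():
        vector[zlib.crc32(word.encode()) % DIMENSIONS] += 1.0
    norm = math.sqrt(sum(v * v for v in vector)) or 1.0
    return [v / norm for v in vector]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; a plain dot product because inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))


def search(query: str, index: dict[str, list[float]], top_k: int = 2) -> list[str]:
    """Embed the query, score every indexed block, return the best paths."""
    query_vector = toy_embed(query)
    scored = [(cosine(query_vector, vec), path) for path, vec in index.items()]
    scored.sort(reverse=True)
    return [path for _, path in scored[:top_k]]


# Hypothetical index: file path -> embedding of a description of its code.
index = {
    "src/form/form_validation.js": toy_embed("validate email address contact form input"),
    "src/hero/hero_video.js": toy_embed("play pause hero video player"),
}
print(search("validate email address", index, top_k=1))
```

A real embedding model places semantically similar text close together even without shared words, which this word-overlap stand-in cannot do. The ranking mechanics are the same.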

For detailed architecture, see: architecture.md

Performance Characteristics

Metric	Value	Notes
Search latency	~200-400ms	End-to-end including reranking
Indexed files	249 files	anobel.com project
Code blocks	496 chunks	Semantic units
Vector dimensions	1024	Voyage AI embeddings

Core principle: Semantic search is for understanding, not navigation. Use it to find what you don't know exists. When you know the file path, use Read. When you know the exact text, use Grep. When you know the intent but not the location, use semantic search.
