| name | semantic-code-search |
| description | Expert guidance on using the claude-context MCP for semantic code search. Provides best practices for indexing large codebases, formulating effective search queries, optimizing performance, and integrating vector-based code retrieval into investigation workflows. Use when working with large codebases, optimizing token usage, or when grep/ripgrep searches are insufficient. |
| allowed-tools | Task |
This Skill provides comprehensive guidance on leveraging the claude-context MCP server for efficient, semantic code search across large codebases using hybrid vector retrieval (BM25 + dense embeddings).
Claude should invoke this Skill when:
Use Claude-Context When:
- Codebase is large (10k+ lines)
- Need to find functionality by concept ("authentication logic", "payment processing")
- Working with an unfamiliar codebase
- Token budget is limited
- Need to search across multiple languages/frameworks
- grep returns hundreds of matches and you need the most relevant ones
- Investigation requires understanding semantic relationships
DON'T Use Claude-Context When:
- Searching for exact string matches (use grep/ripgrep instead)
- Codebase is small (<5k lines); the indexing overhead is not worth it
- Looking for specific file names (use find/glob instead)
- Searching within 2-3 known files (use the Read tool instead)
- Need regex pattern matching (use grep/ripgrep instead)
- Time-sensitive quick lookup (indexing takes time)
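The two checklists above can be read as a small decision procedure. The sketch below is one possible encoding, not part of the claude-context MCP: the `SearchTask` shape and the `chooseSearchTool` name are hypothetical, and falling back to grep for small codebases is an assumption (the checklist only says semantic search is not worth the overhead there).

```typescript
// Hypothetical description of a search task; all field names are illustrative.
interface SearchTask {
  codebaseLines: number;   // approximate size of the codebase
  exactString: boolean;    // looking for a literal string match
  needsRegex: boolean;     // regex pattern matching is required
  fileNameLookup: boolean; // looking for a file by name, not by content
  knownFileCount: number;  // how many target files are already known (0 if none)
}

type SearchTool = "claude-context" | "grep/ripgrep" | "find/glob" | "Read";

// Rules out the cheap exact tools first and falls back to semantic search
// only when none of them fits.
function chooseSearchTool(task: SearchTask): SearchTool {
  if (task.fileNameLookup) return "find/glob";
  if (task.exactString || task.needsRegex) return "grep/ripgrep";
  if (task.knownFileCount >= 1 && task.knownFileCount <= 3) return "Read";
  // Assumption: below 5k lines, plain grep is used instead of indexing.
  if (task.codebaseLines < 5000) return "grep/ripgrep";
  return "claude-context";
}
```

The order of the checks matters: a regex lookup in a very large codebase still goes to grep, because the pattern requirement outweighs the size criterion.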
Standard Indexing (Recommended):
mcp__claude-context__index_codebase with:
{
path: "/absolute/path/to/project",
splitter: "ast", // Syntax-aware with automatic fallback
force: false // Don't re-index if already indexed
}
Why AST Splitter?
When to Use LangChain Splitter:
mcp__claude-context__index_codebase with:
{
path: "/absolute/path/to/project",
splitter: "langchain", // Character-based splitting
force: false
}
Use LangChain when:
Include Additional Extensions:
mcp__claude-context__index_codebase with:
{
path: "/absolute/path/to/project",
splitter: "ast",
customExtensions: [".vue", ".svelte", ".astro", ".prisma", ".proto"]
}
Common Custom Extensions by Framework:
- Vue: [".vue"]
- Svelte: [".svelte"]
- Astro: [".astro"]
- Prisma: [".prisma"]
- GraphQL: [".graphql", ".gql"]
- Protocol Buffers: [".proto"]
- Terraform: [".tf", ".tfvars"]
Default Ignored (Automatic):
- node_modules/, dist/, build/, .git/
- vendor/, target/, __pycache__/
Add Custom Ignore Patterns:
mcp__claude-context__index_codebase with:
{
path: "/absolute/path/to/project",
splitter: "ast",
ignorePatterns: [
"generated/**", // Generated code
"*.min.js", // Minified files
"*.bundle.js", // Bundled files
"test-data/**", // Large test fixtures
"docs/api/**", // Auto-generated docs
".storybook/**", // Storybook config
"*.lock", // Lock files
"static/vendor/**" // Third-party static files
]
}
When to Use ignorePatterns:
IMPORTANT: Only use ignorePatterns when the user explicitly requests custom filtering. Don't add it by default.
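One way to keep the indexing defaults consistent is to build the arguments in a single place. The sketch below mirrors the parameters shown in the examples (`path`, `splitter`, `force`, `customExtensions`, `ignorePatterns`); the `buildIndexArgs` helper and the `IndexOptions` type are hypothetical, and the absolute-path check assumes POSIX-style paths.

```typescript
type Splitter = "ast" | "langchain";

// Mirrors the parameters used in the indexing examples of this document.
interface IndexCodebaseArgs {
  path: string;
  splitter: Splitter;
  force: boolean;
  customExtensions?: string[];
  ignorePatterns?: string[];
}

// Hypothetical caller-side options; not part of the MCP tool itself.
interface IndexOptions {
  splitter?: Splitter;
  customExtensions?: string[];
  userIgnorePatterns?: string[]; // set only when the user asked for filtering
}

function buildIndexArgs(path: string, options: IndexOptions = {}): IndexCodebaseArgs {
  // Assumption: POSIX-style absolute paths such as "/absolute/path/to/project".
  if (!path.startsWith("/")) {
    throw new Error(`Expected an absolute path, got "${path}"`);
  }
  const args: IndexCodebaseArgs = {
    path,
    splitter: options.splitter ?? "ast", // AST is the recommended default
    force: false,                        // never overwrite an index implicitly
  };
  if (options.customExtensions && options.customExtensions.length > 0) {
    args.customExtensions = [...options.customExtensions];
  }
  // ignorePatterns is omitted unless the user supplied patterns explicitly.
  if (options.userIgnorePatterns && options.userIgnorePatterns.length > 0) {
    args.ignorePatterns = [...options.userIgnorePatterns];
  }
  return args;
}
```

Calling `buildIndexArgs("/project")` yields an object with no `ignorePatterns` key at all, which matches the rule that custom filtering is opt-in.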
When to Force Re-Index:
mcp__claude-context__index_codebase with:
{
path: "/absolute/path/to/project",
splitter: "ast",
force: true // Overwrites existing index
}
Use force: true when:
Conflict Handling: If indexing is attempted on an already indexed path, ALWAYS:
- force: true if user confirms
Check Status:
mcp__claude-context__get_indexing_status with:
{
path: "/absolute/path/to/project"
}
Status Indicators:
- Indexing... (45%) - Still processing
- Indexed: 1,234 chunks from 567 files - Complete
- Not indexed - Never indexed or cleared
Best Practice: For large codebases (100k+ lines), check status every 30 seconds to provide user updates.
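The status indicators above can be turned into structured values before deciding whether to keep polling. This is a minimal sketch that assumes the status text looks exactly like the three examples; the real tool output may differ, and `parseIndexingStatus` and `shouldKeepPolling` are hypothetical helper names.

```typescript
type IndexingStatus =
  | { state: "indexing"; percent: number }
  | { state: "indexed"; chunks: number; files: number }
  | { state: "not_indexed" }
  | { state: "unknown"; raw: string };

// Interval suggested for large codebases (100k+ lines).
const STATUS_POLL_INTERVAL_MS = 30000;

// Converts "1,234" into 1234; an absent capture group becomes NaN.
function toInt(text: string | undefined): number {
  return text === undefined ? NaN : Number(text.replace(/,/g, ""));
}

// Assumption: the status text follows the three example formats shown above.
function parseIndexingStatus(text: string): IndexingStatus {
  const progress = /Indexing\.\.\.\s*\((\d+)%\)/.exec(text);
  if (progress) return { state: "indexing", percent: toInt(progress[1]) };

  const done = /Indexed:\s*([\d,]+) chunks from ([\d,]+) files/.exec(text);
  if (done) return { state: "indexed", chunks: toInt(done[1]), files: toInt(done[2]) };

  if (/Not indexed/i.test(text)) return { state: "not_indexed" };
  return { state: "unknown", raw: text };
}

// Polling is only useful while indexing is still in progress.
function shouldKeepPolling(status: IndexingStatus): boolean {
  return status.state === "indexing";
}
```

An unrecognized message maps to `unknown` rather than throwing, so a format change in the tool degrades to "stop polling and show the raw text" instead of a crash.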
Concept-Based Queries (Best for Claude-Context):
// GOOD - Semantic concepts
search_code with query: "user authentication login flow with JWT tokens"
search_code with query: "database connection pooling initialization"
search_code with query: "error handling middleware for HTTP requests"
search_code with query: "WebSocket connection establishment and message handling"
search_code with query: "payment processing with Stripe integration"
Why These Work:
Keyword Queries (Better for grep):
// OKAY - Works but not optimal
search_code with query: "authenticateUser function"
search_code with query: "UserRepository class"
Why Less Optimal:
Avoid These:
// BAD - Too generic
search_code with query: "user"
search_code with query: "function"
// BAD - Too specific/technical
search_code with query: "express.Router().post('/api/users')"
search_code with query: "class UserService extends BaseService implements IUserService"
// BAD - Regex patterns (use grep instead)
search_code with query: "func.*Handler|HandlerFunc"
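The three kinds of query to avoid can be caught with a rough pre-check before calling `search_code`. The heuristics below are assumptions chosen to match the examples in this section, not rules from the claude-context MCP, and `lintQuery` is a hypothetical helper; a natural-language query that happens to contain a word like "extends" would be flagged incorrectly.

```typescript
type QueryIssue = "too-generic" | "too-code-specific" | "regex-pattern";

// Rough heuristics only; they are tuned to the example queries above.
function lintQuery(query: string): QueryIssue[] {
  const text = query.trim();
  const issues: QueryIssue[] = [];

  // A single bare word such as "user" or "function" carries no concept.
  if (/^[A-Za-z]+$/.test(text)) issues.push("too-generic");

  // Call syntax, braces, or declaration keywords suggest a pasted code line.
  if (/[(){};]|=>|\b(extends|implements)\b/.test(text)) issues.push("too-code-specific");

  // Wildcards or alternation belong in grep, not in a semantic query.
  if (/\.\*|\.\+|\w\|\w/.test(text)) issues.push("regex-pattern");

  return issues;
}
```

An empty result means the query passed these checks; it does not guarantee good search results, only that none of the three anti-patterns was detected.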
Finding Authentication/Authorization:
"user login authentication with password validation and session creation"
"JWT token generation and validation middleware"
"OAuth2 authentication flow with Google provider"
"role-based access control permission checking"
"API key authentication verification"
Finding Database Operations:
"user data persistence save to database"
"SQL query execution with prepared statements"
"MongoDB collection find and update operations"
"database transaction commit and rollback handling"
"ORM model definition for user entity"
Finding API Endpoints:
"HTTP POST endpoint for creating new users"
"GraphQL resolver for user queries and mutations"
"REST API handler for updating user profile"
"WebSocket event handler for chat messages"
Finding Business Logic:
"shopping cart calculation with tax and discounts"
"email notification sending after user registration"
"file upload processing with virus scanning"
"report generation with PDF export"
Finding Configuration:
"environment variable configuration loading"
"database connection string setup"
"API rate limiting configuration"
"CORS policy definition for cross-origin requests"
Finding Error Handling:
"global error handler for uncaught exceptions"
"validation error formatting for API responses"
"retry logic for failed HTTP requests"
"logging critical errors to monitoring service"
Filter by File Type:
// Only search TypeScript files
search_code with:
{
path: "/absolute/path/to/project",
query: "user authentication",
extensionFilter: [".ts", ".tsx"]
}
// Only search Go files
search_code with:
{
path: "/absolute/path/to/project",
query: "HTTP handler implementation",
extensionFilter: [".go"]
}
// Search configs only
search_code with:
{
path: "/absolute/path/to/project",
query: "database connection settings",
extensionFilter: [".json", ".yaml", ".env"]
}
When to Use Extension Filters:
Default Limit:
search_code with:
{
path: "/absolute/path/to/project",
query: "authentication logic",
limit: 10 // Default: 10 results
}
Adjust Based on Use Case:
// Quick overview - fewest results
limit: 5
// Standard investigation - balanced
limit: 10 // Recommended default
// Comprehensive search - more results
limit: 20
// Exhaustive - find everything
limit: 50 // Maximum allowed
Guideline:
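The limits listed above can be kept in one lookup so that every search uses a named depth instead of a magic number. The values (5, 10, 20, 50) and the maximum of 50 come from this section; the `SearchDepth` names, `buildSearchArgs`, and `clampLimit` are hypothetical.

```typescript
type SearchDepth = "quick" | "standard" | "comprehensive" | "exhaustive";

const MAX_LIMIT = 50; // maximum allowed, per the list above

const LIMIT_BY_DEPTH: Record<SearchDepth, number> = {
  quick: 5,
  standard: 10, // recommended default
  comprehensive: 20,
  exhaustive: MAX_LIMIT,
};

// Mirrors the search_code parameters used in this document.
interface SearchCodeArgs {
  path: string;
  query: string;
  limit: number;
  extensionFilter?: string[];
}

function buildSearchArgs(
  path: string,
  query: string,
  depth: SearchDepth = "standard",
  extensionFilter?: string[],
): SearchCodeArgs {
  const args: SearchCodeArgs = { path, query, limit: LIMIT_BY_DEPTH[depth] };
  if (extensionFilter && extensionFilter.length > 0) {
    args.extensionFilter = [...extensionFilter];
  }
  return args;
}

// Keeps a caller-supplied number inside the 1..50 range.
function clampLimit(requested: number): number {
  if (!Number.isFinite(requested)) return LIMIT_BY_DEPTH.standard;
  return Math.min(MAX_LIMIT, Math.max(1, Math.floor(requested)));
}
```

For example, `buildSearchArgs("/absolute/path/to/project", "authentication logic")` produces a request with `limit: 10` and no `extensionFilter` key.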
Technique 1: Targeted Searches vs Full Directory Reads
// WASTEFUL - Loads entire directory into context
Read with path: "/project/src/**/*.ts"
// EFFICIENT - Returns only relevant snippets
search_code with:
{
query: "user authentication flow",
extensionFilter: [".ts"],
limit: 10
}
Token Savings:
Technique 2: Iterative Refinement
// First search - broad
search_code with query: "user authentication"
// Returns 10 results, review them
// Second search - refined based on findings
search_code with query: "JWT token generation in authentication service"
// Returns more specific results
Why This Works:
Technique 3: Combine with Targeted Reads
// 1. Semantic search to find relevant files
search_code with query: "payment processing logic"
// Returns: src/services/paymentService.ts:45-89
// 2. Read specific file for full context
Read with path: "/project/src/services/paymentService.ts"
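The hand-off from search results to targeted reads can be sketched as below. It assumes result locations are formatted as `file:start-end`, as in the `src/services/paymentService.ts:45-89` example; `parseLocation` and `filesToRead` are hypothetical helpers, and the actual result format of `search_code` may differ.

```typescript
interface CodeLocation {
  file: string;
  startLine: number;
  endLine: number;
}

// Assumption: locations look like "src/services/paymentService.ts:45-89".
function parseLocation(ref: string): CodeLocation | null {
  const match = /^(.+):(\d+)-(\d+)$/.exec(ref.trim());
  if (!match) return null;
  return {
    file: match[1] ?? "",
    startLine: Number(match[2]),
    endLine: Number(match[3]),
  };
}

// Returns each matched file once, as an absolute path, in first-seen order.
function filesToRead(refs: string[], projectRoot: string): string[] {
  const root = projectRoot.replace(/\/+$/, ""); // drop trailing slashes
  const seen = new Set<string>();
  const files: string[] = [];
  for (const ref of refs) {
    const loc = parseLocation(ref);
    if (loc === null || seen.has(loc.file)) continue;
    seen.add(loc.file);
    files.push(`${root}/${loc.file}`);
  }
  return files;
}
```

Deduplicating by file keeps the number of Read calls equal to the number of distinct files, even when several snippets come from the same file.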
Workflow:
Optimize Indexing Time:
Index Once, Search Many
Use Appropriate Splitter
Strategic Ignore Patterns
Incremental Approach
- Index src/, lib/, api/ separately
Indexing Time Expectations:
| Codebase Size | Splitter | Expected Time |
|---|---|---|
| 10k lines | AST | 30-60 sec |
| 50k lines | AST | 2-5 min |
| 100k lines | AST | 5-10 min |
| 500k lines | AST | 20-30 min |
| 10k lines | LangChain | 15-30 sec |
| 100k lines | LangChain | 2-4 min |
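The table can be used as a lookup to tell the user roughly how long indexing will take. The sketch below copies the table values and treats each row as an upper bound for its size bucket, which is an assumption (the table lists point sizes, not ranges); `estimateIndexingTime` is a hypothetical helper.

```typescript
type SplitterKind = "ast" | "langchain";

interface TimeRange {
  minSeconds: number;
  maxSeconds: number;
}

interface TimingRow {
  maxLines: number;
  range: TimeRange;
}

// Values copied from the table above, converted to seconds.
const INDEXING_TIMES: Record<SplitterKind, TimingRow[]> = {
  ast: [
    { maxLines: 10000, range: { minSeconds: 30, maxSeconds: 60 } },
    { maxLines: 50000, range: { minSeconds: 120, maxSeconds: 300 } },
    { maxLines: 100000, range: { minSeconds: 300, maxSeconds: 600 } },
    { maxLines: 500000, range: { minSeconds: 1200, maxSeconds: 1800 } },
  ],
  langchain: [
    { maxLines: 10000, range: { minSeconds: 15, maxSeconds: 30 } },
    { maxLines: 100000, range: { minSeconds: 120, maxSeconds: 240 } },
  ],
};

// Assumption: a codebase falls into the first row whose size it does not exceed.
// Returns null when the codebase is larger than anything in the table.
function estimateIndexingTime(lines: number, splitter: SplitterKind): TimeRange | null {
  const row = INDEXING_TIMES[splitter].find((candidate) => lines <= candidate.maxLines);
  return row ? row.range : null;
}
```

Returning `null` beyond the largest row avoids extrapolating from data the table does not contain.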
Recommended Workflow:
# When user asks: "How does authentication work?"
## Step 1: Index (if not already indexed)
mcp__claude-context__index_codebase
## Step 2: Semantic Search
search_code with query: "user authentication login flow"
search_code with query: "password validation and hashing"
search_code with query: "session token generation and storage"
## Step 3: Launch Codebase-Detective
Task tool with subagent_type: "code-analysis:detective"
Provide detective with:
- Search results (file locations)
- User's question
- Specific files to investigate
## Step 4: Deep Dive
Detective uses semantic search results as starting points
Reads specific files
Traces code flow
Provides comprehensive analysis
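Step 3 of the workflow above hands the search results to the detective agent. The sketch below shows one way to assemble that hand-off. The `subagent_type` value comes from this document; the `description` and `prompt` fields and the `buildDetectiveTask` helper are assumptions about how the Task tool is called, so the shape may need adjusting.

```typescript
// Assumed shape of a Task tool call; only subagent_type is taken from this document.
interface DetectiveTask {
  subagent_type: "code-analysis:detective";
  description: string;
  prompt: string;
}

// Bundles the user's question, the search results, and the files to investigate.
function buildDetectiveTask(question: string, locations: string[]): DetectiveTask {
  // Derive the distinct files by stripping a trailing ":start-end" or ":line".
  const files: string[] = [];
  for (const loc of locations) {
    const file = loc.replace(/:\d+(-\d+)?$/, "");
    if (files.indexOf(file) === -1) files.push(file);
  }

  const prompt = [
    `User question: ${question}`,
    "",
    "Semantic search results (starting points):",
    ...locations.map((loc) => `- ${loc}`),
    "",
    "Files to investigate:",
    ...files.map((file) => `- ${file}`),
  ].join("\n");

  return {
    subagent_type: "code-analysis:detective",
    description: "Investigate semantic search findings",
    prompt,
  };
}
```

Passing locations rather than file contents keeps the hand-off small; the detective reads only the files it decides it needs.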
Why This Workflow?
For Complex Investigations:
// 1. Semantic search for general area
search_code with query: "HTTP request middleware authentication"
// Results: 10 files in middleware/
// 2. Grep for specific patterns in those files
Grep with pattern: "req\.user|req\.auth" in middleware/
// 3. Read exact implementations
Read specific files identified above
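Step 2 of this pattern narrows the semantically matched files with an exact pattern. The sketch below imitates that step on in-memory file contents so it stays self-contained; in practice the Grep tool does this work. `grepInFiles` and the `contents` map are hypothetical, and the pattern is the one from the example above.

```typescript
interface PatternHit {
  file: string;
  line: number; // 1-based line number
  text: string;
}

// Scans only the files that semantic search already surfaced.
// `contents` maps a file path to its text (hypothetical input shape).
function grepInFiles(contents: Record<string, string>, pattern: string): PatternHit[] {
  const regex = new RegExp(pattern); // no "g" flag, so test() keeps no state
  const hits: PatternHit[] = [];
  for (const file of Object.keys(contents)) {
    const lines = (contents[file] ?? "").split("\n");
    lines.forEach((text, index) => {
      if (regex.test(text)) {
        hits.push({ file, line: index + 1, text: text.trim() });
      }
    });
  }
  return hits;
}

// Example: the pattern from step 2, applied to two middleware files.
const exampleHits = grepInFiles(
  {
    "middleware/auth.ts": "const a = 1;\nif (req.user) { next(); }\n",
    "middleware/log.ts": "console.log(req.url);",
  },
  "req\\.user|req\\.auth",
);
```

Here `exampleHits` contains a single entry for line 2 of `middleware/auth.ts`, which is the file to read in step 3.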
When to Use This Pattern:
Problem: "Indexing stuck at 0%"
Solutions:
Problem: "Indexing failed halfway through"
Solutions:
- clear_index
- force: true
Problem: "Already indexed but want to update"
Solution:
- force: true if confirmed
Problem: "Search returns irrelevant results"
Solutions:
Problem: "Search misses relevant code"
Solutions:
Problem: "Too many results, all seem relevant"
Solutions:
Problem: "Indexing takes too long"
Solutions:
Problem: "Search is slow"
Solutions:
Problem: "Using too many tokens"
Solutions:
User: "I'm new to this project, help me understand the architecture"
## Workflow:
1. Index the codebase
mcp__claude-context__index_codebase with path: "/project"
2. Search for entry points
search_code with query: "application startup initialization main function"
3. Search for architecture patterns
search_code with query: "dependency injection container service registration"
search_code with query: "routing configuration API endpoint definitions"
search_code with query: "database connection setup and migrations"
4. Search for domain models
search_code with query: "core business entities data models"
5. Launch codebase-detective with findings
Task tool with all search results as context
6. Provide architecture overview to user
User: "Users can't reset their passwords, investigate"
## Workflow:
1. Ensure codebase is indexed
get_indexing_status with path: "/project"
2. Search for password reset functionality
search_code with query: "password reset request token generation email"
search_code with query: "password reset verification token validation"
search_code with query: "update user password after reset"
3. Find related error handling
search_code with query: "password reset error handling validation"
4. Narrow down to specific files
extensionFilter: [".ts", ".tsx"] to focus on TypeScript
5. Read specific implementations
Read files identified in search
6. Identify bug and propose fix
7. Search for tests
search_code with query: "password reset test cases" to find where to add tests
User: "Add two-factor authentication to login"
## Workflow:
1. Index codebase (if needed)
2. Find existing authentication
search_code with query: "user login authentication password verification"
3. Find similar security features
search_code with query: "token generation validation security verification"
4. Find where to integrate
search_code with query: "login flow user session creation after authentication"
5. Find database models
search_code with query: "user model schema database table"
6. Find configuration patterns
search_code with query: "feature flags configuration settings"
7. Launch codebase-detective with context
Provide all search results to guide implementation
8. Implement 2FA based on existing patterns
User: "Audit the codebase for security issues"
## Workflow:
1. Index entire codebase
2. Search for authentication weaknesses
search_code with query: "password storage hashing bcrypt authentication"
search_code with query: "SQL query construction user input database"
3. Search for authorization issues
search_code with query: "access control permission checking authorization"
search_code with query: "API endpoint authentication middleware protection"
4. Search for input validation
search_code with query: "user input validation sanitization XSS prevention"
search_code with query: "file upload handling validation security"
5. Search for sensitive data handling
search_code with query: "environment variables secrets API keys configuration"
search_code with query: "logging sensitive data personal information"
6. Launch codebase-detective for deep analysis
Investigate each suspicious finding
7. Generate security report
User: "Plan migration from Express to Fastify"
## Workflow:
1. Index codebase
2. Find all Express usage
search_code with query: "Express router middleware application setup", extensionFilter: [".ts", ".js"], limit: 50
3. Find route definitions
search_code with query: "HTTP route handlers GET POST PUT DELETE endpoints"
4. Find middleware usage
search_code with query: "middleware authentication error handling CORS"
5. Find specific Express features
search_code with query: "express static file serving"
search_code with query: "express session management"
search_code with query: "express body parser request parsing"
6. Document all findings
Create migration checklist with file locations
7. Estimate effort
Count occurrences, identify complex migrations
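Steps 6 and 7 of the migration workflow amount to grouping findings by file and counting them. The sketch below is one way to do that; the `Finding` shape (a file plus the Express feature found there) and `buildMigrationChecklist` are hypothetical, since the document does not define how findings are recorded.

```typescript
// Hypothetical record of one search hit: where it was found and which feature it uses.
interface Finding {
  file: string;
  feature: string; // e.g. "router", "session", "static"
}

interface ChecklistEntry {
  file: string;
  features: string[]; // distinct features used in this file
  occurrences: number; // total hits in this file
}

// Groups findings by file and sorts the busiest files first.
function buildMigrationChecklist(findings: Finding[]): ChecklistEntry[] {
  const byFile = new Map<string, ChecklistEntry>();
  for (const finding of findings) {
    let entry = byFile.get(finding.file);
    if (!entry) {
      entry = { file: finding.file, features: [], occurrences: 0 };
      byFile.set(finding.file, entry);
    }
    entry.occurrences += 1;
    if (entry.features.indexOf(finding.feature) === -1) {
      entry.features.push(finding.feature);
    }
  }

  const entries: ChecklistEntry[] = [];
  byFile.forEach((entry) => entries.push(entry));
  // Most occurrences first; ties are broken alphabetically by file path.
  return entries.sort(
    (a, b) => b.occurrences - a.occurrences || a.file.localeCompare(b.file),
  );
}
```

Files with many occurrences or several distinct features are the likely candidates for the "complex migrations" mentioned in step 7.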
DO:
DON'T:
DO:
DON'T:
DO:
DON'T:
DO:
DON'T:
This Skill works seamlessly with:
Codebase-Detective Agent (plugins/code-analysis/agents/codebase-detective.md)
Deep Analysis Skill (plugins/code-analysis/skills/deep-analysis/SKILL.md)
Analyze Command (plugins/code-analysis/commands/analyze.md)
This Skill is successful when:
Before completing a semantic search workflow, ensure:
Maintained by: Jack Rudenko @ MadAppGang
Plugin: code-analysis v1.0.0
Last Updated: November 5, 2024