kortix-semantic-search
// Full semantic search over Desktop files, agent memory, and knowledge. Use when the agent needs to find relevant files or search knowledge semantically.
| name | kortix-semantic-search |
| description | Full semantic search over Desktop files, agent memory, and knowledge. Use when the agent needs to find relevant files or search knowledge semantically. |
You have a full semantic search engine running on this machine, powered by lss (Local Semantic Search). It indexes text files, code, PDFs, DOCX, XLSX, PPTX, HTML, EML, JSON, and CSV: roughly 80 known file extensions in all.
A background file-watcher daemon (lss-sync) detects file changes in real time via FSEvents/inotify and re-indexes within seconds.
lss combines BM25 full-text search (keyword matching with custom re-scoring) and embedding similarity (semantic meaning via OpenAI or local fastembed) using Reciprocal Rank Fusion. Queries can be natural language — you don't need exact keywords.
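To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion over two ranked lists. It is an illustration of the general technique, not lss's actual code: the constant `k=60` is the value commonly used for RRF and is an assumption here, and the file names are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one scored list.

    Each document earns 1 / (k + rank) from every list it appears in,
    where rank starts at 1. k=60 is a conventional default (assumed).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rankings from a keyword ranker and an embedding ranker
bm25_ranking = ["auth.py", "login.py", "README.md"]
embedding_ranking = ["login.py", "session.py", "auth.py"]

fused = reciprocal_rank_fusion([bm25_ranking, embedding_ranking])
```

A document that ranks reasonably well in both lists (here `login.py`) can outrank one that is first in only one list, which is why a query does not need to share exact keywords with the file it finds.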
The database lives at /config/.lss/lss.db.
| Source | Path | Content |
|---|---|---|
| Desktop files | /config/Desktop | Code, docs, PDFs, DOCX, XLSX, PPTX, HTML, JSON, CSV — all text-like files recursively |
| Agent memory | /config/workspace/.kortix/ | mem/*.md (observations), journal/*.md, memory/*.md, knowledge/*.md |
Indexed formats: ~80 known extensions (code, markup, config, documents). Unknown extensions are skipped. .gitignore patterns are respected.
# Search EVERYTHING indexed
lss "your natural language query" -p /config/Desktop -k 10 --json
# Search only agent memory + knowledge
lss "user deployment preferences" -p /config/workspace/.kortix/ -k 5 --json
# Search a specific project directory
lss "database migration strategy" -p /config/Desktop/myproject/ -k 5 --json
# Search without triggering re-indexing (faster, uses existing index)
lss "query" -p /config/Desktop --no-index -k 10 --json
# Filter by file type
lss "auth logic" -p /config/Desktop -e .py -e .ts -k 10 --json
# Exclude file types
lss "config" -p /config/Desktop -E .json -E .yaml -k 10 --json
# Exclude content patterns
lss "user data" -p /config/Desktop -x '\d{4}-\d{2}-\d{2}' -k 10 --json
# Force re-index a path immediately
lss index /config/Desktop/important-file.md
# List all indexed files
lss ls
# Check index stats and configuration
lss status
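When calling these commands from a script, it can help to assemble the argument list in one place. The sketch below uses only flags that appear in the examples above; the helper names `build_lss_command` and `run_lss` are hypothetical, and it assumes `lss` is on `PATH` and writes nothing but JSON to stdout when `--json` is given.

```python
import json
import subprocess


def build_lss_command(query, path, k=10, no_index=False):
    """Return an argv list for a scoped, JSON-producing lss search."""
    cmd = ["lss", query, "-p", path, "-k", str(k), "--json"]
    if no_index:
        # Skip re-indexing and use the existing index
        cmd.append("--no-index")
    return cmd


def run_lss(query, path, k=10, no_index=False):
    """Run lss and parse its stdout as JSON (assumes lss is installed)."""
    cmd = build_lss_command(query, path, k, no_index)
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)
```

Passing the query as a single list element avoids shell quoting problems with natural-language queries that contain spaces or quotes.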
Narrow results at query time without re-indexing.
| Flag | Meaning | Applied | Example |
|---|---|---|---|
| -e EXT / --ext | Include only these extensions (repeatable) | In SQL, pre-scoring | -e .py -e .ts |
| -E EXT / --exclude-ext | Exclude these extensions (repeatable) | In SQL, pre-scoring | -E .json -E .yaml |
| -x REGEX / --exclude-pattern | Exclude chunks matching regex (repeatable) | Post-scoring | -x 'test_' -x 'TODO' |
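The "Applied" column matters because extension filters shrink the candidate set before anything is scored, while regex exclusions drop chunks after ranking. The following is a rough in-memory model of that ordering, not lss's implementation; the chunk dictionaries, `score_fn`, and the function name are all hypothetical.

```python
import os
import re


def filter_and_rank(chunks, score_fn, include_exts=(), exclude_exts=(),
                    exclude_patterns=()):
    """Model of pre-scoring extension filters and post-scoring regex filters."""
    # Pre-scoring: keep only chunks whose file extension passes the filters
    candidates = []
    for chunk in chunks:
        ext = os.path.splitext(chunk["file_path"])[1]
        if include_exts and ext not in include_exts:
            continue
        if ext in exclude_exts:
            continue
        candidates.append(chunk)

    # Scoring: rank whatever survived the extension filters
    ranked = sorted(candidates, key=score_fn, reverse=True)

    # Post-scoring: drop ranked chunks whose text matches any exclusion regex
    compiled = [re.compile(pattern) for pattern in exclude_patterns]
    return [c for c in ranked if not any(rx.search(c["text"]) for rx in compiled)]


chunks = [
    {"file_path": "a.py", "text": "def login(): ..."},
    {"file_path": "test_a.py", "text": "def test_login(): ..."},
    {"file_path": "conf.json", "text": '{"login": true}'},
]
kept = filter_and_rank(
    chunks,
    score_fn=lambda c: c["text"].count("login"),
    include_exts=(".py",),
    exclude_patterns=("test_",),
)
```

In this model the JSON file never reaches scoring, whereas the test chunk is scored and then discarded, so heavy use of regex exclusions can leave fewer results than the requested count.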
Narrow first, then broaden. Start with tight extension filters, then remove them if results are insufficient:
# 1. Try narrow: only Python source files
lss "authentication flow" -p /config/Desktop/project -e .py -k 10 --json
# 2. If too few results, broaden to all code
lss "authentication flow" -p /config/Desktop/project -e .py -e .ts -e .go -e .rs -k 10 --json
# 3. If still insufficient, search everything
lss "authentication flow" -p /config/Desktop/project -k 10 --json
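The three steps above can be automated as a loop over progressively wider extension tiers. This is a sketch under stated assumptions: `search` is a hypothetical callable that runs one lss query with the given extensions and returns a list of hits, and `min_hits=3` is an arbitrary threshold for "enough results".

```python
# Extension tiers from narrow to broad; an empty tuple means "no -e filter"
DEFAULT_TIERS = [
    (".py",),
    (".py", ".ts", ".go", ".rs"),
    (),
]


def narrow_then_broaden(search, query, path, ext_tiers=DEFAULT_TIERS, min_hits=3):
    """Return (extensions_used, hits) from the first tier with enough hits.

    If no tier reaches min_hits, the result of the broadest tier is returned.
    """
    exts, hits = (), []
    for exts in ext_tiers:
        hits = search(query, path, exts)
        if len(hits) >= min_hits:
            break
    return exts, hits
```

Stopping at the first tier that returns enough hits keeps results focused on the file types most likely to hold the implementation.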
Exclude noise, not signal:
# Exclude generated/test code when looking for implementations
lss "rate limiting" -e .py -x "test_" -x "mock_" -x "fixture" --json -k 10
# Exclude logs and config when looking for code
lss "database connection" -E .log -E .yaml -E .json -E .toml --json -k 10
# Exclude dates/timestamps from data searches
lss "customer report" -x '\d{4}-\d{2}-\d{2}' --json -k 10
Always use --json for programmatic parsing.
lss "query" -p /config/Desktop --json -k 10
Returns an array of result objects (one per query), each containing the query string and its hits:
[
  {
    "query": "authentication flow",
    "hits": [
      {
        "file_path": "/config/Desktop/project/auth.py",
        "score": 0.0345,
        "snippet": "def authenticate(user, password):\n \"\"\"Authenticate user with JWT...",
        "rank_stage": "S3_MMR",
        "indexed_at": 1738900000.0
      }
    ]
  }
]
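The output shape shown above can be consumed with a few lines of Python. This is a minimal sketch: the function name `top_hits` and the `min_score` cutoff are illustrative, and the sample string mirrors the example output rather than a real run.

```python
import json


def top_hits(raw_json, min_score=0.0):
    """Flatten lss --json output into (query, file_path, score) tuples."""
    rows = []
    for result in json.loads(raw_json):
        for hit in result.get("hits", []):
            if hit["score"] >= min_score:
                rows.append((result["query"], hit["file_path"], hit["score"]))
    return rows


# Sample payload modelled on the example output
sample = json.dumps([
    {
        "query": "authentication flow",
        "hits": [
            {
                "file_path": "/config/Desktop/project/auth.py",
                "score": 0.0345,
                "snippet": "def authenticate(user, password):",
                "rank_stage": "S3_MMR",
                "indexed_at": 1738900000.0,
            }
        ],
    }
])

rows = top_hits(sample)
```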
Key fields:
- file_path: Full path to the source file
- score: Relevance score (higher is better)
- snippet: Best-matching text excerpt (~280 chars)
- rank_stage: S1=BM25 only, S3=fusion, S3_MMR=fusion+diversity

For complex questions, decompose into multiple specific queries:
# BAD: single vague query
lss "how does the system work" --json -k 10
# GOOD: decomposed into specific queries
lss "system architecture overview" "API endpoint design" "database schema" "authentication flow" --json -k 5
Or use a query file:
echo "system architecture overview" > /tmp/queries.txt
echo "API endpoint design" >> /tmp/queries.txt
echo "database schema" >> /tmp/queries.txt
lss -Q /tmp/queries.txt -p /config/Desktop/project --json -k 5
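After a multi-query search, the per-query results often point at the same files. One way to combine them, sketched below, is to keep the best-scoring hit per file. The function name `merge_hits` is hypothetical, and comparing scores across different queries is only a rough heuristic, since the source does not say the scores are calibrated between queries.

```python
def merge_hits(results):
    """Collapse per-query results into one list, one entry per file.

    For each file, the highest-scoring hit is kept and tagged with the
    query that produced it. Input follows the --json output structure.
    """
    best = {}
    for result in results:
        for hit in result["hits"]:
            path = hit["file_path"]
            if path not in best or hit["score"] > best[path]["score"]:
                best[path] = {**hit, "query": result["query"]}
    # Highest score first
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)


# Hypothetical results for two of the decomposed queries
results = [
    {"query": "database schema", "hits": [
        {"file_path": "/p/models.py", "score": 0.031},
        {"file_path": "/p/api.py", "score": 0.016},
    ]},
    {"query": "API endpoint design", "hits": [
        {"file_path": "/p/api.py", "score": 0.033},
    ]},
]
merged = merge_hits(results)
```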
| Use lss | Use grep |
|---|---|
| Conceptual queries ("how to handle errors") | Exact string ("ERROR_CODE_429") |
| Fuzzy matching ("something like the email template") | Specific variable name (userSessionToken) |
| Cross-file discovery ("files about API design") | Known file + line search |
| Natural language ("what's the deploy process") | Regex pattern matching |
For large codebases, use an iterative approach:
# Step 1: Broad discovery — find relevant areas
lss "payment processing" -p /config/Desktop/project -k 20 --json
# Step 2: Narrow by extension — focus on implementation
lss "payment processing" -p /config/Desktop/project -e .py -k 10 --json
# Step 3: Narrow by path — focus on specific module
lss "payment processing" -p /config/Desktop/project/src/payments/ -k 10 --json
# Step 4: Read the actual files for full context
cat /config/Desktop/project/src/payments/processor.py
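Moving from step 1 to step 3 requires deciding which module to narrow into. A simple heuristic, sketched here with a hypothetical helper name, is to count how many broad-discovery hits fall in each directory and scope the next search to the busiest one. The hit paths below are made up for illustration.

```python
import os
from collections import Counter


def busiest_directories(hits, top_n=3):
    """Count hits per parent directory and return the most common ones."""
    counts = Counter(os.path.dirname(hit["file_path"]) for hit in hits)
    return counts.most_common(top_n)


# Hypothetical hits from a broad "payment processing" search
hits = [
    {"file_path": "/config/Desktop/project/src/payments/processor.py"},
    {"file_path": "/config/Desktop/project/src/payments/refunds.py"},
    {"file_path": "/config/Desktop/project/docs/billing.md"},
]
ranked_dirs = busiest_directories(hits)
```

The first entry of `ranked_dirs` is a reasonable candidate for the `-p` argument of the next, narrower search.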
# Index a directory (auto-triggered on first search)
lss index /config/Desktop/project/
# Index a single file
lss index /config/Desktop/important.pdf
# The daemon handles real-time updates automatically
# Use manual indexing only for immediate needs
# Sweep stale entries (files deleted from disk)
lss sweep --retention-days 90
# Clear all embeddings (forces re-embedding on next search)
lss sweep --clear-embeddings 0
# Full reset
lss sweep --clear-all
- Always pass the --json flag when searching programmatically. Parse the JSON output.
- Always pass -p <path> to scope searches. Never search without a path scope.
- Use -k to control result count. -k 5 for focused, -k 20 for broad exploration.
- Use -e to target file types before broadening.
- Use lss index for immediate needs. The daemon handles the rest.