pgvector-hybrid-search
// Combine pgvector semantic search with PostgreSQL full-text search using Reciprocal Rank Fusion or cross-encoder re-ranking.
| name | pgvector-hybrid-search |
| description | Combine pgvector semantic search with PostgreSQL full-text search using Reciprocal Rank Fusion or cross-encoder re-ranking. |
| tech_stack | ["postgresql"] |
| language | ["python","sql"] |
| capability | ["search-engine","rag"] |
| version | pgvector v0.8.2 |
| collected_at | "2025-07-17T00:00:00.000Z" |
Source: https://github.com/pgvector/pgvector/blob/master/README.md, https://github.com/pgvector/pgvector-python/blob/master/examples/hybrid_search/
Combine pgvector semantic (vector) search with PostgreSQL full-text keyword search to produce a single ranked result set. Two fusion strategies are supported: Reciprocal Rank Fusion (RRF) in pure SQL, and cross-encoder re-ranking in Python.
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE documents (
    id bigserial PRIMARY KEY,
    content text,
    embedding vector(384)
);
-- Full-text search index
CREATE INDEX ON documents USING GIN (to_tsvector('english', content));
-- Optional: vector index for ANN (adds speed at cost of some recall)
-- CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
Best when latency matters and you don't need per-pair relevance scoring. Each document's RRF score is the sum of 1 / (k + rank) over the result lists it appears in, where k is a smoothing constant, typically 60.
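Before reading the SQL, it can help to see the fusion arithmetic on its own. A minimal pure-Python sketch of the same scoring (the function name and sample ids are illustrative, not part of pgvector):

```python
def rrf_fuse(ranked_lists, k=60):
    """Reciprocal Rank Fusion: each document scores sum(1 / (k + rank))."""
    scores = {}
    for ids in ranked_lists:
        for rank, doc_id in enumerate(ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest combined score first, mirroring ORDER BY score DESC
    return sorted(scores, key=scores.get, reverse=True)

semantic = [3, 1, 2]  # ids ordered by vector distance
keyword = [2, 3, 5]   # ids ordered by ts_rank_cd
print(rrf_fuse([semantic, keyword]))  # → [3, 2, 1, 5]
```

Documents found by both searches accumulate two reciprocal terms, which is why they outrank documents found by only one; k damps the influence of any single high rank.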
WITH semantic_search AS (
    SELECT id, RANK() OVER (ORDER BY embedding <=> %(embedding)s) AS rank
    FROM documents
    ORDER BY embedding <=> %(embedding)s
    LIMIT 20
),
keyword_search AS (
    SELECT id, RANK() OVER (ORDER BY ts_rank_cd(to_tsvector('english', content), query) DESC) AS rank
    FROM documents, plainto_tsquery('english', %(query)s) query
    WHERE to_tsvector('english', content) @@ query
    ORDER BY ts_rank_cd(to_tsvector('english', content), query) DESC
    LIMIT 20
)
SELECT
    COALESCE(s.id, k.id) AS id,
    COALESCE(1.0 / (%(k)s + s.rank), 0.0) +
    COALESCE(1.0 / (%(k)s + k.rank), 0.0) AS score
FROM semantic_search s
FULL OUTER JOIN keyword_search k ON s.id = k.id
ORDER BY score DESC
LIMIT 5;
From Python with psycopg:
from pgvector.psycopg import register_vector
import psycopg
from sentence_transformers import SentenceTransformer
conn = psycopg.connect(dbname='pgvector_example', autocommit=True)
conn.execute('CREATE EXTENSION IF NOT EXISTS vector')
register_vector(conn)
model = SentenceTransformer('multi-qa-MiniLM-L6-cos-v1')
query = 'growling bear'
embedding = model.encode(query)
# rrf_sql holds the RRF query shown above, with %(embedding)s, %(query)s and %(k)s placeholders
results = conn.execute(rrf_sql, {
    'query': query, 'embedding': embedding, 'k': 60
}).fetchall()
Best when relevance quality is critical and you can afford model inference. Runs semantic and keyword searches in parallel, deduplicates, then scores each (query, document) pair with a cross-encoder.
import asyncio
import itertools
from pgvector.psycopg import register_vector_async
import psycopg
from sentence_transformers import CrossEncoder, SentenceTransformer
async def semantic_search(conn, query):
    model = SentenceTransformer('multi-qa-MiniLM-L6-cos-v1')
    embedding = model.encode(query)
    async with conn.cursor() as cur:
        await cur.execute(
            'SELECT id, content FROM documents ORDER BY embedding <=> %s LIMIT 20',
            (embedding,))
        return await cur.fetchall()

async def keyword_search(conn, query):
    async with conn.cursor() as cur:
        await cur.execute(
            "SELECT id, content FROM documents, plainto_tsquery('english', %s) q "
            "WHERE to_tsvector('english', content) @@ q "
            "ORDER BY ts_rank_cd(to_tsvector('english', content), q) DESC LIMIT 20",
            (query,))
        return await cur.fetchall()

def rerank(query, results):
    # Convert to a list so the deduplicated rows are iterated in one fixed order
    results = list(set(itertools.chain(*results)))  # deduplicate (id, content) rows
    encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')
    scores = encoder.predict([(query, item[1]) for item in results])
    return [v for _, v in sorted(zip(scores, results), reverse=True)]

async def hybrid_search(conn, query):
    results = await asyncio.gather(
        semantic_search(conn, query),
        keyword_search(conn, query))
    return rerank(query, results)
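The dedup-then-sort logic in rerank is independent of the model, so it can be exercised with a stub scorer in place of the CrossEncoder. A sketch with a made-up word-overlap scorer (all names and sample rows here are illustrative):

```python
import itertools

def rerank_with(score_fn, query, results):
    # Same shape as rerank() above, but the scorer is injected
    rows = list(set(itertools.chain(*results)))  # deduplicate (id, content) rows
    scores = [score_fn(query, content) for _, content in rows]
    return [row for _, row in sorted(zip(scores, rows), reverse=True)]

def overlap(q, d):
    # Stub scorer: number of words shared between query and document
    return len(set(q.split()) & set(d.split()))

semantic = [(1, 'bear growls'), (2, 'bear sleeps')]
keyword = [(2, 'bear sleeps'), (3, 'fish swims')]
ranked = rerank_with(overlap, 'bear growls', [semantic, keyword])
print(ranked)  # (1, 'bear growls') first; duplicate id 2 appears only once
```

Injecting the scorer this way also makes the fusion step unit-testable without downloading a model.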
| Component | Role |
|---|---|
| to_tsvector('english', content) | Tokenizes text for full-text search |
| plainto_tsquery('english', query) | Converts raw query to tsquery (ignores operators) |
| phraseto_tsquery('english', query) | Alternative: preserves word order |
| ts_rank_cd(tsvector, tsquery) | Ranks by cover density (preferred for hybrid) |
| RANK() OVER (ORDER BY ...) | Produces ordinal ranks for RRF |
| FULL OUTER JOIN | Merges semantic + keyword sides in RRF |
| CrossEncoder.predict(pairs) | Scores (query, doc) pairs for re-ranking |
- k=60 is a heuristic; tune it for your data distribution
- Each candidate list is capped at LIMIT 20; results beyond that window are lost. Increase the limit for higher recall at some cost in speed
- cross-encoder/ms-marco-MiniLM-L-6-v2 is lightweight; production may need larger models (Cohere, Jina, Voyage rerankers)
- tsvector defaults to English stemming; other languages need dictionary configuration
- Pass raw user input through plainto_tsquery or phraseto_tsquery; literal query strings won't match
- Cosine distance (<=>) and ts_rank_cd produce incomparable scores, hence RRF or cross-encoder fusion is required
- The documents table can be defined with SQLAlchemy models using a VECTOR column; use raw SQL or text() for the full-text GIN index and the RRF query
- If the HNSW index is enabled, set hnsw.ef_search higher for better recall under the LIMIT
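The incomparable-scores caveat can be checked numerically: rank-based fusion is invariant to the scale of the raw scores, which is what makes it safe to combine a bounded cosine measure with the unbounded ts_rank_cd. A small sketch with made-up ids and scores:

```python
def ranks(scores_by_id):
    """Map id -> ordinal rank (1 = best) from raw scores, higher is better."""
    ordered = sorted(scores_by_id, key=scores_by_id.get, reverse=True)
    return {doc_id: r for r, doc_id in enumerate(ordered, start=1)}

def rrf(rank_maps, k=60):
    ids = set().union(*rank_maps)
    return sorted(ids, key=lambda i: -sum(1.0 / (k + m[i]) for m in rank_maps if i in m))

cosine_sim = {1: 0.92, 2: 0.88, 3: 0.75}  # similarity scale, bounded in [0, 1]
ts_scores = {2: 14.2, 3: 9.1, 4: 2.5}     # ts_rank_cd scale, unbounded
fused = rrf([ranks(cosine_sim), ranks(ts_scores)])

# Rescaling one side changes raw scores but not ranks, so the fused order is stable
scaled = {i: s * 1000 for i, s in ts_scores.items()}
assert rrf([ranks(cosine_sim), ranks(scaled)]) == fused
print(fused)  # id 2 leads: it ranks well on both sides
```

Any score-level fusion (e.g. summing raw values) would flip with such a rescale, which is exactly why the raw distances and ts_rank_cd values must not be added directly.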