| name | qdrant-vector-search |
| description | High-performance vector similarity search engine for RAG and semantic search. Use when building production RAG systems requiring fast nearest neighbor search, hybrid search with filtering, or scalable vector storage with Rust-powered performance. |
| version | 1.0.0 |
| author | Orchestra Research |
| license | MIT |
| tags | ["RAG","Vector Search","Qdrant","Semantic Search","Embeddings","Similarity Search","HNSW","Production","Distributed"] |
| dependencies | ["qdrant-client>=1.12.0"] |
Qdrant - Vector Similarity Search Engine
High-performance vector database written in Rust for production RAG and semantic search.
When to use Qdrant
Use Qdrant when you:
- Are building production RAG systems that require low latency
- Need hybrid search (vectors + metadata filtering)
- Require horizontal scaling with sharding/replication
- Want on-premise deployment with full data control
- Need multi-vector storage per record (dense + sparse)
- Are building real-time recommendation systems
Key features:
- Rust-powered: Memory-safe, high performance
- Rich filtering: Filter by any payload field during search
- Multiple vectors: Dense, sparse, multi-dense per point
- Quantization: Scalar, product, binary for memory efficiency
- Distributed: Raft consensus, sharding, replication
- REST + gRPC: Both APIs with full feature parity
Use alternatives instead:
- Chroma: Simpler setup, embedded use cases
- FAISS: Maximum raw speed, research/batch processing
- Pinecone: Fully managed, zero ops preferred
- Weaviate: GraphQL preference, built-in vectorizers
Quick start
Installation
# Install the Python client
pip install qdrant-client
# Run the Qdrant server (REST on port 6333, gRPC on 6334)
docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant
# Or run with persistent local storage
docker run -p 6333:6333 -p 6334:6334 \
    -v $(pwd)/qdrant_storage:/qdrant/storage \
    qdrant/qdrant
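To verify the server is reachable, a minimal Python check (assumes the container above is running on localhost):
from qdrant_client import QdrantClient
# Connect to the local server and list existing collections
client = QdrantClient(url="http://localhost:6333")
print(client.get_collections())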
Basic usage
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
client = QdrantClient(host="localhost", port=6333)
client.create_collection(
collection_name="documents",
vectors_config=VectorParams(size=384, distance=Distance.COSINE)
)
client.upsert(
collection_name="documents",
points=[
PointStruct(
id=1,
vector=[0.1, 0.2, ...],
payload={"title": "Doc 1", "category": "tech"}
),
PointStruct(
id=2,
vector=[0.3, 0.4, ...],
payload={"title": "Doc 2", "category": "science"}
)
]
)
results = client.search(
collection_name="documents",
query_vector=[0.15, 0.25, ...],
query_filter={
"must": [{"key": "category", "match": {"value": "tech"}}]
},
limit=10
)
for point in results:
print(f"ID: {point.id}, Score: {point.score}, Payload: {point.payload}")
Core concepts
Points - Basic data unit
from qdrant_client.models import PointStruct
point = PointStruct(
id=123,
vector=[0.1, 0.2, 0.3, ...],
payload={
"title": "Document title",
"category": "tech",
"timestamp": 1699900000,
"tags": ["python", "ml"]
}
)
client.upsert(
collection_name="documents",
points=[point1, point2, point3],
wait=True
)
Collections - Vector containers
from qdrant_client.models import VectorParams, Distance, HnswConfigDiff
client.create_collection(
collection_name="documents",
vectors_config=VectorParams(
size=384,
distance=Distance.COSINE
),
hnsw_config=HnswConfigDiff(
m=16,
ef_construct=100,
full_scan_threshold=10000
),
on_disk_payload=True
)
info = client.get_collection("documents")
print(f"Points: {info.points_count}, Vectors: {info.vectors_count}")
Distance metrics
| Metric | Use Case | Range |
|---|---|---|
| COSINE | Text embeddings, normalized vectors | 0 to 2 |
| EUCLID | Spatial data, image features | 0 to ∞ |
| DOT | Recommendations, unnormalized | -∞ to ∞ |
| MANHATTAN | Sparse features, discrete data | 0 to ∞ |
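The metric is fixed when the collection is created; a minimal sketch choosing DOT for unnormalized recommendation embeddings (collection name and vector size are illustrative):
from qdrant_client.models import VectorParams, Distance
# Dot product suits unnormalized embeddings such as recommendation model outputs
client.create_collection(
    collection_name="recommendations",
    vectors_config=VectorParams(size=256, distance=Distance.DOT)
)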
Search operations
Basic search
results = client.search(
collection_name="documents",
query_vector=[0.1, 0.2, ...],
limit=10,
with_payload=True,
with_vectors=False
)
Filtered search
from qdrant_client.models import Filter, FieldCondition, MatchValue, Range
results = client.search(
collection_name="documents",
query_vector=query_embedding,
query_filter=Filter(
must=[
FieldCondition(key="category", match=MatchValue(value="tech")),
FieldCondition(key="timestamp", range=Range(gte=1699000000))
],
must_not=[
FieldCondition(key="status", match=MatchValue(value="archived"))
]
),
limit=10
)
results = client.search(
collection_name="documents",
query_vector=query_embedding,
query_filter={
"must": [
{"key": "category", "match": {"value": "tech"}},
{"key": "price", "range": {"gte": 10, "lte": 100}}
]
},
limit=10
)
Batch search
from qdrant_client.models import SearchRequest
results = client.search_batch(
collection_name="documents",
requests=[
SearchRequest(vector=[0.1, ...], limit=5),
SearchRequest(vector=[0.2, ...], limit=5, filter={"must": [...]}),
SearchRequest(vector=[0.3, ...], limit=10)
]
)
RAG integration
With sentence-transformers
from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance, PointStruct
encoder = SentenceTransformer("all-MiniLM-L6-v2")
client = QdrantClient(host="localhost", port=6333)
client.create_collection(
collection_name="knowledge_base",
vectors_config=VectorParams(size=384, distance=Distance.COSINE)
)
documents = [
{"id": 1, "text": "Python is a programming language", "source": "wiki"},
{"id": 2, "text": "Machine learning uses algorithms", "source": "textbook"},
]
points = [
PointStruct(
id=doc["id"],
vector=encoder.encode(doc["text"]).tolist(),
payload={"text": doc["text"], "source": doc["source"]}
)
for doc in documents
]
client.upsert(collection_name="knowledge_base", points=points)
def retrieve(query: str, top_k: int = 5) -> list[dict]:
query_vector = encoder.encode(query).tolist()
results = client.search(
collection_name="knowledge_base",
query_vector=query_vector,
limit=top_k
)
return [{"text": r.payload["text"], "score": r.score} for r in results]
# retrieve() returns a list of dicts; join the text fields into a single context string
context_docs = retrieve("What is Python?")
context = "\n".join(doc["text"] for doc in context_docs)
prompt = f"Context: {context}\n\nQuestion: What is Python?"
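The prompt can then be sent to any chat-completion API; a sketch using the OpenAI Python SDK as an assumed example (model name is illustrative):
from openai import OpenAI
llm = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
response = llm.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": prompt}]
)
print(response.choices[0].message.content)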
With LangChain
from langchain_community.vectorstores import Qdrant
from langchain_community.embeddings import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = Qdrant.from_documents(documents, embeddings, url="http://localhost:6333", collection_name="docs")
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
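The retriever is then queried through the standard runnable interface:
# Fetch the top-5 most similar documents for a query
docs = retriever.invoke("What is vector search?")
for doc in docs:
    print(doc.page_content)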
With LlamaIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core import VectorStoreIndex, StorageContext
vector_store = QdrantVectorStore(client=client, collection_name="llama_docs")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
query_engine = index.as_query_engine()
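Queries then go through the query engine, which retrieves from Qdrant and synthesizes an answer:
response = query_engine.query("What is Python?")
print(response)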
Multi-vector support
Named vectors (different embedding models)
from qdrant_client.models import VectorParams, Distance
client.create_collection(
collection_name="hybrid_search",
    vectors_config={
        # Both entries here are dense named vectors; native sparse vectors are
        # configured separately (see the sparse vectors section below)
        "dense": VectorParams(size=384, distance=Distance.COSINE),
        "sparse": VectorParams(size=30000, distance=Distance.DOT)
    }
)
client.upsert(
collection_name="hybrid_search",
points=[
PointStruct(
id=1,
vector={
"dense": dense_embedding,
"sparse": sparse_embedding
},
payload={"text": "document text"}
)
]
)
results = client.search(
collection_name="hybrid_search",
query_vector=("dense", query_dense),
limit=10
)
Sparse vectors (BM25, SPLADE)
from qdrant_client.models import SparseVectorParams, SparseIndexParams, SparseVector
client.create_collection(
collection_name="sparse_search",
vectors_config={},
sparse_vectors_config={"text": SparseVectorParams(index=SparseIndexParams(on_disk=False))}
)
client.upsert(
    collection_name="sparse_search",
    points=[
        PointStruct(
            id=1,
            vector={"text": SparseVector(indices=[1, 5, 100], values=[0.5, 0.8, 0.2])},
            payload={"text": "document"}
        )
    ]
)
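Dense and sparse results can be combined server-side with the Query API; a minimal sketch, assuming a collection configured with a dense vector named "dense" and a sparse vector named "text", and query_dense from earlier:
from qdrant_client import models
# Prefetch candidates from each vector space, then fuse with Reciprocal Rank Fusion
results = client.query_points(
    collection_name="hybrid_search",
    prefetch=[
        models.Prefetch(query=query_dense, using="dense", limit=20),
        models.Prefetch(
            query=models.SparseVector(indices=[1, 5, 100], values=[0.5, 0.8, 0.2]),
            using="text",
            limit=20
        )
    ],
    query=models.FusionQuery(fusion=models.Fusion.RRF),
    limit=10
)
print(results.points)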
Quantization (memory optimization)
from qdrant_client.models import ScalarQuantization, ScalarQuantizationConfig, ScalarType
client.create_collection(
collection_name="quantized",
vectors_config=VectorParams(size=384, distance=Distance.COSINE),
quantization_config=ScalarQuantization(
scalar=ScalarQuantizationConfig(
type=ScalarType.INT8,
quantile=0.99,
always_ram=True
)
)
)
from qdrant_client.models import SearchParams, QuantizationSearchParams
results = client.search(
    collection_name="quantized",
    query_vector=query,
    # Rescore with original vectors after the quantized pre-selection
    search_params=SearchParams(quantization=QuantizationSearchParams(rescore=True)),
    limit=10
)
Payload indexing
from qdrant_client.models import PayloadSchemaType
client.create_payload_index(
collection_name="documents",
field_name="category",
field_schema=PayloadSchemaType.KEYWORD
)
client.create_payload_index(
collection_name="documents",
field_name="timestamp",
field_schema=PayloadSchemaType.INTEGER
)
Production deployment
Qdrant Cloud
from qdrant_client import QdrantClient
client = QdrantClient(
url="https://your-cluster.cloud.qdrant.io",
api_key="your-api-key"
)
Performance tuning
from qdrant_client.models import HnswConfigDiff, OptimizersConfigDiff
# Denser HNSW graph: better recall at the cost of slower indexing
client.update_collection(
    collection_name="documents",
    hnsw_config=HnswConfigDiff(ef_construct=200, m=32)
)
# Adjust when the optimizer builds the HNSW index for new data
client.update_collection(
    collection_name="documents",
    optimizers_config=OptimizersConfigDiff(indexing_threshold=20000)
)
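Search-time recall/latency can also be traded off per query (query_embedding is assumed from earlier examples):
from qdrant_client.models import SearchParams
# Higher hnsw_ef explores more of the HNSW graph: better recall, higher latency
results = client.search(
    collection_name="documents",
    query_vector=query_embedding,
    search_params=SearchParams(hnsw_ef=128),
    limit=10
)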
Best practices
- Batch operations - Use batch upsert/search for efficiency (see the chunked-upsert sketch after this list)
- Payload indexing - Index fields used in filters
- Quantization - Enable for large collections (>1M vectors)
- Sharding - Use for collections >10M vectors
- On-disk storage - Enable on_disk_payload for large payloads
- Connection pooling - Reuse client instances
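A chunked-upsert sketch for the batching point above (chunk size is an assumption to tune for your payload sizes):
def upsert_in_chunks(client, collection_name, points, chunk_size=256):
    # Send points in fixed-size chunks instead of one huge request
    for start in range(0, len(points), chunk_size):
        client.upsert(
            collection_name=collection_name,
            points=points[start:start + chunk_size],
            wait=True
        )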
Common issues
Slow search with filters:
client.create_payload_index(
collection_name="docs",
field_name="category",
field_schema=PayloadSchemaType.KEYWORD
)
Out of memory:
client.create_collection(
collection_name="large_collection",
vectors_config=VectorParams(size=384, distance=Distance.COSINE),
quantization_config=ScalarQuantization(...),
on_disk_payload=True
)
Connection issues:
client = QdrantClient(
host="localhost",
port=6333,
timeout=30,
prefer_grpc=True
)