| name | pinecone |
| description | Managed vector database for production AI applications. Fully managed, auto-scaling, with hybrid search (dense + sparse), metadata filtering, and namespaces. Low latency (<100ms p95). Use for production RAG, recommendation systems, or semantic search at scale. Best for serverless, managed infrastructure. |
| version | 1.0.0 |
| author | Orchestra Research |
| license | MIT |
| dependencies | ["pinecone-client"] |
| platforms | ["linux","macos","windows"] |
| metadata | {"hermes":{"tags":["RAG","Pinecone","Vector Database","Managed Service","Serverless","Hybrid Search","Production","Auto-Scaling","Low Latency","Recommendations"]}} |
Pinecone - Managed Vector Database
The vector database for production AI applications.
When to use Pinecone
Use when:
- Need managed, serverless vector database
- Production RAG applications
- Auto-scaling required
- Low latency critical (<100ms)
- Don't want to manage infrastructure
- Need hybrid search (dense + sparse vectors)
Metrics:
- Fully managed SaaS
- Auto-scales to billions of vectors
- p95 latency <100ms
- 99.9% uptime SLA
Use alternatives instead:
- Chroma: Self-hosted, open-source
- FAISS: Offline, pure similarity search
- Weaviate: Self-hosted with more features
Quick start
Installation
pip install pinecone-client
Basic usage
from pinecone import Pinecone, ServerlessSpec
pc = Pinecone(api_key="your-api-key")
pc.create_index(
name="my-index",
dimension=1536,
metric="cosine",
spec=ServerlessSpec(cloud="aws", region="us-east-1")
)
index = pc.Index("my-index")
index.upsert(vectors=[
{"id": "vec1", "values": [0.1, 0.2, ...], "metadata": {"category": "A"}},
{"id": "vec2", "values": [0.3, 0.4, ...], "metadata": {"category": "B"}}
])
results = index.query(
vector=[0.1, 0.2, ...],
top_k=5,
include_metadata=True
)
print(results["matches"])
Core operations
Create index
pc.create_index(
name="my-index",
dimension=1536,
metric="cosine",
spec=ServerlessSpec(
cloud="aws",
region="us-east-1"
)
)
from pinecone import PodSpec
pc.create_index(
name="my-index",
dimension=1536,
metric="cosine",
spec=PodSpec(
environment="us-east1-gcp",
pod_type="p1.x1"
)
)
Upsert vectors
index.upsert(vectors=[
{
"id": "doc1",
"values": [0.1, 0.2, ...],
"metadata": {
"text": "Document content",
"category": "tutorial",
"timestamp": "2025-01-01"
}
}
])
vectors = [
{"id": f"vec{i}", "values": embedding, "metadata": metadata}
for i, (embedding, metadata) in enumerate(zip(embeddings, metadatas))
]
index.upsert(vectors=vectors, batch_size=100)
Query vectors
results = index.query(
vector=[0.1, 0.2, ...],
top_k=10,
include_metadata=True,
include_values=False
)
results = index.query(
vector=[0.1, 0.2, ...],
top_k=5,
filter={"category": {"$eq": "tutorial"}}
)
results = index.query(
vector=[0.1, 0.2, ...],
top_k=5,
namespace="production"
)
for match in results["matches"]:
print(f"ID: {match['id']}")
print(f"Score: {match['score']}")
print(f"Metadata: {match['metadata']}")
Metadata filtering
filter = {"category": "tutorial"}
filter = {"price": {"$gte": 100}}
filter = {
"$and": [
{"category": "tutorial"},
{"difficulty": {"$lte": 3}}
]
}
filter = {"tags": {"$in": ["python", "ml"]}}
Namespaces
index.upsert(
vectors=[{"id": "vec1", "values": [...]}],
namespace="user-123"
)
results = index.query(
vector=[...],
namespace="user-123",
top_k=5
)
stats = index.describe_index_stats()
print(stats['namespaces'])
Hybrid search (dense + sparse)
index.upsert(vectors=[
{
"id": "doc1",
"values": [0.1, 0.2, ...],
"sparse_values": {
"indices": [10, 45, 123],
"values": [0.5, 0.3, 0.8]
},
"metadata": {"text": "..."}
}
])
results = index.query(
vector=[0.1, 0.2, ...],
sparse_vector={
"indices": [10, 45],
"values": [0.5, 0.3]
},
top_k=5,
alpha=0.5
)
LangChain integration
from langchain_pinecone import PineconeVectorStore
from langchain_openai import OpenAIEmbeddings
vectorstore = PineconeVectorStore.from_documents(
documents=docs,
embedding=OpenAIEmbeddings(),
index_name="my-index"
)
results = vectorstore.similarity_search("query", k=5)
results = vectorstore.similarity_search(
"query",
k=5,
filter={"category": "tutorial"}
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 10})
LlamaIndex integration
from llama_index.vector_stores.pinecone import PineconeVectorStore
pc = Pinecone(api_key="your-key")
pinecone_index = pc.Index("my-index")
vector_store = PineconeVectorStore(pinecone_index=pinecone_index)
from llama_index.core import StorageContext, VectorStoreIndex
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
Index management
indexes = pc.list_indexes()
index_info = pc.describe_index("my-index")
print(index_info)
stats = index.describe_index_stats()
print(f"Total vectors: {stats['total_vector_count']}")
print(f"Namespaces: {stats['namespaces']}")
pc.delete_index("my-index")
Delete vectors
index.delete(ids=["vec1", "vec2"])
index.delete(filter={"category": "old"})
index.delete(delete_all=True, namespace="test")
index.delete(delete_all=True)
Best practices
- Use serverless - Auto-scaling, cost-effective
- Batch upserts - More efficient (100-200 per batch)
- Add metadata - Enable filtering
- Use namespaces - Isolate data by user/tenant
- Monitor usage - Check Pinecone dashboard
- Optimize filters - Index frequently filtered fields
- Test with free tier - 1 index, 100K vectors free
- Use hybrid search - Better quality
- Set appropriate dimensions - Match embedding model
- Regular backups - Export important data
Performance
| Operation | Latency | Notes |
|---|
| Upsert | ~50-100ms | Per batch |
| Query (p50) | ~50ms | Depends on index size |
| Query (p95) | ~100ms | SLA target |
| Metadata filter | ~+10-20ms | Additional overhead |
Pricing (as of 2025)
Serverless:
- $0.096 per million read units
- $0.06 per million write units
- $0.06 per GB storage/month
Free tier:
- 1 serverless index
- 100K vectors (1536 dimensions)
- Great for prototyping
Resources