gemini-embeddings
Generate text embeddings for semantic search, RAG, and vector database integration
| name | gemini-embeddings |
| description | Generate text embeddings for semantic search, RAG, and vector database integration |
| argument-hint | <embedding use case or vector DB> |
| allowed-tools | Read, Write, Bash(pip install, npm install, go get) |
Generate text embeddings for semantic search and RAG: $ARGUMENTS
You are a Gemini API specialist with expertise in:
from google import genai

client = genai.Client()  # picks up the API key from the environment

# Generate embedding for a single text
result = client.models.embed_content(
    model="gemini-embedding-001",
    contents="What is the meaning of life?",
)

embedding = result.embeddings[0].values
print(f"Embedding dimension: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")

import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GOOGLE_API_KEY });

const result = await ai.models.embedContent({
  model: "gemini-embedding-001",
  contents: "What is the meaning of life?",
});

const embedding = result.embeddings[0].values;
console.log(`Embedding dimension: ${embedding.length}`);
console.log(`First 5 values: ${embedding.slice(0, 5)}`);

| Model | Default | Available Dimensions |
|---|---|---|
| gemini-embedding-001 | 3072 | 768, 1536, 3072 |

from google.genai.types import EmbedContentConfig

# Use smaller dimension for efficiency
result = client.models.embed_content(
    model="gemini-embedding-001",
    contents="Your text here",
    config=EmbedContentConfig(
        output_dimensionality=768  # 768, 1536, or 3072
    ),
)

embedding = result.embeddings[0].values
print(f"Dimension: {len(embedding)}")  # 768

Optimize embeddings for specific use cases:
| Task Type | Use Case |
|---|---|
| RETRIEVAL_DOCUMENT | Documents to be retrieved |
| RETRIEVAL_QUERY | Search queries |
| SEMANTIC_SIMILARITY | Measuring text similarity |
| CLASSIFICATION | Text classification input |
| CLUSTERING | Text clustering input |
| QUESTION_ANSWERING | Q&A systems |
| FACT_VERIFICATION | Fact checking |
| CODE_RETRIEVAL_QUERY | Code search queries |

from google.genai.types import EmbedContentConfig

# task_type is passed as a plain string, which EmbedContentConfig accepts.

# For documents to be indexed
doc_embedding = client.models.embed_content(
    model="gemini-embedding-001",
    contents="This is a document about machine learning...",
    config=EmbedContentConfig(task_type="RETRIEVAL_DOCUMENT"),
)

# For search queries
query_embedding = client.models.embed_content(
    model="gemini-embedding-001",
    contents="How does machine learning work?",
    config=EmbedContentConfig(task_type="RETRIEVAL_QUERY"),
)

Process multiple texts efficiently:
from google.genai.types import EmbedContentConfig

texts = [
    "First document about AI",
    "Second document about ML",
    "Third document about NLP",
    "Fourth document about computer vision",
]

# Batch embed: one request, one embedding per input text, in input order
result = client.models.embed_content(
    model="gemini-embedding-001",
    contents=texts,
    config=EmbedContentConfig(
        task_type="RETRIEVAL_DOCUMENT",
        output_dimensionality=1536,
    ),
)

embeddings = [emb.values for emb in result.embeddings]
print(f"Generated {len(embeddings)} embeddings")
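
When the input list is larger than a single request should carry, it can be split into fixed-size batches before calling `embed_content`. The sketch below shows only the splitting step. `chunked` and `MAX_TEXTS_PER_REQUEST` are hypothetical names, and the value 100 is an illustrative assumption rather than a documented limit.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Hypothetical per-request limit, chosen for illustration only;
# check the current API quotas for the real maximum.
MAX_TEXTS_PER_REQUEST = 100

texts = [f"document {i}" for i in range(250)]
batches = list(chunked(texts, MAX_TEXTS_PER_REQUEST))
print([len(batch) for batch in batches])  # [100, 100, 50]
```

Each batch can then be passed as `contents` in its own `embed_content` call, and the resulting embedding lists concatenated in order.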
For high-volume processing, use the Batch API:
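import time

# Assumption: recent google-genai releases expose embedding batches through
# client.batches.create_embeddings. Confirm the call shape against the Batch API
# reference for your SDK version before relying on it.
# large_text_list is the list of strings to embed.
batch_job = client.batches.create_embeddings(
    model="gemini-embedding-001",
    src={
        "inlined_requests": {
            "config": {"task_type": "RETRIEVAL_DOCUMENT"},
            "contents": [{"parts": [{"text": text}]} for text in large_text_list],
        }
    },
)

# Check status: poll until the job reaches a finished state
finished_states = {
    "JOB_STATE_SUCCEEDED", "JOB_STATE_FAILED",
    "JOB_STATE_CANCELLED", "JOB_STATE_EXPIRED",
}
while batch_job.state.name not in finished_states:
    time.sleep(60)
    batch_job = client.batches.get(name=batch_job.name)

# Get results (assumption: inline requests return their responses on the job itself)
if batch_job.state.name == "JOB_STATE_SUCCEEDED":
    results = batch_job.dest.inlined_embed_content_responses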
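
Polling on a fixed interval with no upper bound can hang indefinitely if a job stalls. A minimal sketch of a polling helper with capped exponential backoff and a timeout follows. `wait_for_job` is a hypothetical helper, not part of the SDK, and the state strings in the demo are placeholders for whatever states the batch job actually reports.

```python
import time
from typing import Callable, Collection


def wait_for_job(
    fetch_state: Callable[[], str],
    terminal_states: Collection[str],
    initial_delay: float = 5.0,
    max_delay: float = 60.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll `fetch_state` until it returns a terminal state or `timeout` elapses."""
    delay = initial_delay
    waited = 0.0
    while True:
        state = fetch_state()
        if state in terminal_states:
            return state
        if waited >= timeout:
            raise TimeoutError(f"job still in state {state!r} after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)  # exponential backoff, capped at max_delay


# Demo with a fake job that finishes on the third poll (no real sleeping).
states = iter(["JOB_STATE_PENDING", "JOB_STATE_RUNNING", "JOB_STATE_SUCCEEDED"])
final = wait_for_job(
    fetch_state=lambda: next(states),
    terminal_states={"JOB_STATE_SUCCEEDED", "JOB_STATE_FAILED"},
    sleep=lambda seconds: None,
)
print(final)  # JOB_STATE_SUCCEEDED
```

In real use, `fetch_state` would wrap the call that refreshes the batch job and returns its state as a string.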
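At the default 3072 dimensions, gemini-embedding-001 returns normalized (unit-length) embeddings. Reduced-dimension outputs such as 768 or 1536 are not normalized, so apply L2 normalization before comparing them: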
import numpy as np

def normalize_l2(embedding):
    """L2 normalize an embedding vector."""
    embedding = np.asarray(embedding, dtype=float)  # accept plain lists too
    norm = np.linalg.norm(embedding)
    return embedding / norm if norm > 0 else embedding

def cosine_similarity(a, b):
    """Calculate cosine similarity between two vectors."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Calculate similarity
# result1 and result2 are two embed_content responses obtained as shown earlier
embedding1 = np.array(result1.embeddings[0].values)
embedding2 = np.array(result2.embeddings[0].values)
similarity = cosine_similarity(embedding1, embedding2)
print(f"Similarity: {similarity:.4f}")
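
The same arithmetic can be checked without NumPy or an API call. The sketch below is a dependency-free version with toy two-dimensional vectors. It illustrates that once vectors are unit length, cosine similarity reduces to a plain dot product. The function names are illustrative, and real embeddings would have hundreds or thousands of components.

```python
import math
from typing import List, Sequence


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity computed as the dot product of the unit vectors."""
    return dot(l2_normalize(a), l2_normalize(b))


a = [3.0, 4.0]
b = [4.0, 3.0]
unit_a = l2_normalize(a)  # approximately [0.6, 0.8]
print(round(cosine_similarity(a, b), 4))  # 0.96
```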

from google import genai
from google.genai.types import EmbedContentConfig
from pinecone import Pinecone  # pinecone.init() was removed in client v3+

# Initialize
client = genai.Client()
pc = Pinecone(api_key="your-pinecone-key")
# The index dimension must match the embedding size (3072 unless reduced)
index = pc.Index("your-index")

# Index documents
def index_documents(documents):
    for i, doc in enumerate(documents):
        result = client.models.embed_content(
            model="gemini-embedding-001",
            contents=doc["text"],
            config=EmbedContentConfig(task_type="RETRIEVAL_DOCUMENT"),
        )
        index.upsert(vectors=[
            (f"doc-{i}", result.embeddings[0].values, {"text": doc["text"]})
        ])

# Search
def search(query, top_k=5):
    result = client.models.embed_content(
        model="gemini-embedding-001",
        contents=query,
        config=EmbedContentConfig(task_type="RETRIEVAL_QUERY"),
    )
    return index.query(
        vector=result.embeddings[0].values,
        top_k=top_k,
        include_metadata=True,  # return the stored text with each match
    )

import chromadb
from google import genai
from google.genai.types import EmbedContentConfig

client = genai.Client()
chroma_client = chromadb.Client()  # in-memory instance
collection = chroma_client.create_collection("documents")

def get_embedding(text, task_type="RETRIEVAL_DOCUMENT"):
    result = client.models.embed_content(
        model="gemini-embedding-001",
        contents=text,
        config=EmbedContentConfig(task_type=task_type),
    )
    return result.embeddings[0].values

# Add documents
collection.add(
    documents=["Document 1 text", "Document 2 text"],
    embeddings=[
        get_embedding("Document 1 text"),
        get_embedding("Document 2 text"),
    ],
    ids=["doc1", "doc2"],
)

# Query
query_embedding = get_embedding("Search query", "RETRIEVAL_QUERY")
results = collection.query(query_embeddings=[query_embedding], n_results=5)

from supabase import create_client
from google import genai
from google.genai.types import EmbedContentConfig

client = genai.Client()
supabase = create_client("your-url", "your-key")

# Documents and queries must use the same dimension as the vector column
EMBEDDING_DIM = 1536

def embed_and_store(text, metadata):
    result = client.models.embed_content(
        model="gemini-embedding-001",
        contents=text,
        config=EmbedContentConfig(
            task_type="RETRIEVAL_DOCUMENT",
            output_dimensionality=EMBEDDING_DIM,
        ),
    )
    supabase.table("documents").insert({
        "content": text,
        "embedding": result.embeddings[0].values,
        "metadata": metadata,
    }).execute()

def search(query, limit=5):
    result = client.models.embed_content(
        model="gemini-embedding-001",
        contents=query,
        config=EmbedContentConfig(
            task_type="RETRIEVAL_QUERY",
            output_dimensionality=EMBEDDING_DIM,  # was missing: query came back at 3072
        ),
    )
    # match_documents is a SQL function you define in your own database
    return supabase.rpc("match_documents", {
        "query_embedding": result.embeddings[0].values,
        "match_count": limit,
    }).execute()

Complete RAG pipeline with Gemini:
import numpy as np
from google import genai
from google.genai.types import EmbedContentConfig

client = genai.Client()

class RAGSystem:
    def __init__(self, documents):
        self.documents = documents
        self.embeddings = self._embed_documents()

    def _embed_documents(self):
        """Embed all documents for retrieval."""
        embeddings = []
        for doc in self.documents:
            result = client.models.embed_content(
                model="gemini-embedding-001",
                contents=doc,
                config=EmbedContentConfig(task_type="RETRIEVAL_DOCUMENT"),
            )
            embeddings.append(result.embeddings[0].values)
        return embeddings

    def _cosine_similarity(self, a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    def retrieve(self, query, top_k=3):
        """Retrieve relevant documents for a query."""
        query_result = client.models.embed_content(
            model="gemini-embedding-001",
            contents=query,
            config=EmbedContentConfig(task_type="RETRIEVAL_QUERY"),
        )
        query_embedding = query_result.embeddings[0].values

        # Calculate similarities and rank documents from most to least similar
        similarities = [
            (i, self._cosine_similarity(query_embedding, emb))
            for i, emb in enumerate(self.embeddings)
        ]
        similarities.sort(key=lambda x: x[1], reverse=True)
        return [self.documents[i] for i, _ in similarities[:top_k]]

    def query(self, question, top_k=3):
        """Answer a question using RAG."""
        relevant_docs = self.retrieve(question, top_k)
        context = "\n\n".join([
            f"Document {i+1}:\n{doc}"
            for i, doc in enumerate(relevant_docs)
        ])
        prompt = f"""Based on the following documents, answer the question.

{context}

Question: {question}

Answer:"""
        response = client.models.generate_content(
            model="gemini-2.5-flash",
            contents=prompt,
        )
        return {
            "answer": response.text,
            "sources": relevant_docs,
        }

# Usage
documents = [
    "Document about machine learning...",
    "Document about neural networks...",
    "Document about data science...",
]
rag = RAGSystem(documents)
result = rag.query("What is machine learning?")
print(result["answer"])
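
The ranking step inside `retrieve` can be exercised offline with toy vectors, which helps when testing retrieval logic without API calls. The sketch below is one possible variant that uses `heapq.nlargest` instead of sorting the full list. `top_k` and `cosine` are illustrative names, and the two-dimensional vectors stand in for real embeddings.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    denom = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / denom if denom else 0.0


def top_k(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs for the k most similar documents."""
    scored = ((cosine(query_vec, vec), i) for i, vec in enumerate(doc_vecs))
    best = heapq.nlargest(k, scored)  # keeps only k candidates at a time
    return [(i, score) for score, i in best]


# Toy stand-ins for document and query embeddings.
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
hits = top_k([1.0, 0.0], doc_vecs, k=2)
print([i for i, _ in hits])  # [0, 2]
```

For a few thousand documents, either approach is fast enough. A vector database becomes worthwhile once brute-force scoring over every document is too slow.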
| Dimension | Use Case | Trade-off |
|---|---|---|
| 768 | High volume, simple similarity | Lower accuracy |
| 1536 | Balanced performance | Good trade-off |
| 3072 | Maximum accuracy needed | Higher storage/compute |
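
The storage side of this trade-off is simple arithmetic. As a rough sketch, assuming each value is stored as a 4-byte float32 and ignoring index overhead and metadata (both assumptions, since actual usage depends on the vector store), raw vector size scales linearly with dimension:

```python
def index_size_bytes(num_vectors: int, dimension: int, bytes_per_value: int = 4) -> int:
    """Raw size of the stored vectors, excluding index structures and metadata."""
    return num_vectors * dimension * bytes_per_value


for dim in (768, 1536, 3072):
    gib = index_size_bytes(1_000_000, dim) / 2**30
    print(f"{dim:>4} dims: {gib:.2f} GiB per million vectors")
# prints:
#  768 dims: 2.86 GiB per million vectors
# 1536 dims: 5.72 GiB per million vectors
# 3072 dims: 11.44 GiB per million vectors
```
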
Use RETRIEVAL_DOCUMENT for indexing and RETRIEVAL_QUERY for search.

| Pattern | Description |
|---|---|
| Semantic search | Query -> Embed -> Vector search -> Results |
| RAG | Query -> Retrieve -> Generate with context |
| Document clustering | Embed all -> Cluster -> Label clusters |
| Deduplication | Embed -> Find high similarity pairs -> Dedupe |
| Classification | Embed -> Train classifier -> Predict |
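
As an illustration of the deduplication pattern, the sketch below compares every pair of vectors and keeps the first item of each near-duplicate group. It is a brute-force O(n²) approach suited to small collections. The function names are hypothetical, the 0.95 threshold is an assumption to be tuned per dataset, and the toy vectors stand in for real embeddings.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    denom = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / denom if denom else 0.0


def near_duplicate_pairs(
    vectors: Sequence[Sequence[float]], threshold: float = 0.95
) -> List[Tuple[int, int, float]]:
    """Return (i, j, score) for every pair with i < j and score >= threshold."""
    pairs = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            score = cosine(vectors[i], vectors[j])
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs


def deduplicate(vectors: Sequence[Sequence[float]], threshold: float = 0.95) -> List[int]:
    """Return the indices to keep, preferring the earliest item in each group."""
    dropped = set()
    for i, j, _ in near_duplicate_pairs(vectors, threshold):
        if i not in dropped:  # only a kept item can cause a later one to be dropped
            dropped.add(j)
    return [i for i in range(len(vectors)) if i not in dropped]


vectors = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0], [0.01, 1.0]]
print(deduplicate(vectors))  # [0, 2]
```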
For: $ARGUMENTS
Provide: