بنقرة واحدة
chat-memory
// Use when user needs long-term memory for chatbots. Triggers on: chat memory, conversation history, long-term memory, chatbot memory, memory retrieval, persistent memory, remember conversations.
// Use when user needs long-term memory for chatbots. Triggers on: chat memory, conversation history, long-term memory, chatbot memory, memory retrieval, persistent memory, remember conversations.
| name | chat-memory |
| description | Use when user needs long-term memory for chatbots. Triggers on: chat memory, conversation history, long-term memory, chatbot memory, memory retrieval, persistent memory, remember conversations. |
Implement long-term memory for chatbots — remember and retrieve relevant conversation history across sessions.
Activate this skill when:
Do NOT activate when:
rag-toolkit:ragrec-system"What should the chatbot remember?"
A) User preferences (favorite topics, communication style)
B) Conversation context (discussed topics, mentioned names)
C) Factual information (user-provided facts)
D) All of the above
Which types matter most?
"How long should memories last?"
| Type | Retention | Decay |
|---|---|---|
| Preferences | Permanent | None |
| Recent context | Days-weeks | 5% per day |
| Old context | Compressed | Summarized |
"Based on your requirements:
Proceed? (yes / adjust [what])"
Think of chat memory like human memory:
┌─────────────────────────────────────────────────────────┐
│ Chat Memory System │
│ │
│ User Message: "How's my Python project going?" │
│ │ │
│ ┌─────────────────┼─────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Recent │ │ Memory │ │ Memory │ │
│ │ Context │ │ Retrieval │ │ Retrieval │ │
│ │(session)│ │ (semantic) │ │ (keyword) │ │
│ └────┬────┘ └──────┬──────┘ └──────┬──────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Combined Context │ │
│ │ │ │
│ │ Recent: "We discussed Python yesterday" │ │
│ │ Memory: "User started Python project 2 weeks │ │
│ │ ago, learning Flask for web app" │ │
│ │ Memory: "User mentioned deadline is next month" │ │
│ └──────────────────────┬───────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ LLM Response │ │
│ │ "Based on our previous conversations, I know │ │
│ │ you're building a Flask web app. Since your │ │
│ │ deadline is next month, let me help you..." │ │
│ └──────────────────────────────────────────────────┘ │
│ │
│ [Store: "User asked about Python project progress"] │
└─────────────────────────────────────────────────────────┘
| Aspect | Chat Memory | RAG |
|---|---|---|
| Source | Past conversations | Documents |
| Updates | Every conversation | Batch indexing |
| Personal | Per-user | Shared |
| Decay | Time-based | Usually none |
from pymilvus import MilvusClient, DataType
from openai import OpenAI
import time
import uuid
class ChatMemory:
def __init__(self, uri: str = "./milvus.db"):
self.client = MilvusClient(uri=uri)
self.openai = OpenAI()
self.collection_name = "chat_memory"
self._init_collection()
def _embed(self, text: str) -> list:
"""Generate embedding"""
response = self.openai.embeddings.create(
model="text-embedding-3-small",
input=[text]
)
return response.data[0].embedding
def _init_collection(self):
if self.client.has_collection(self.collection_name):
return
schema = self.client.create_schema()
schema.add_field("id", DataType.VARCHAR, is_primary=True, max_length=64)
schema.add_field("user_id", DataType.VARCHAR, max_length=64)
schema.add_field("role", DataType.VARCHAR, max_length=16) # user/assistant
schema.add_field("content", DataType.VARCHAR, max_length=65535)
schema.add_field("timestamp", DataType.INT64)
schema.add_field("session_id", DataType.VARCHAR, max_length=64)
schema.add_field("importance", DataType.FLOAT) # 0-1, for prioritization
schema.add_field("embedding", DataType.FLOAT_VECTOR, dim=1536)
index_params = self.client.prepare_index_params()
index_params.add_index(field_name="embedding", index_type="AUTOINDEX", metric_type="COSINE")
index_params.add_index(field_name="user_id", index_type="TRIE")
index_params.add_index(field_name="timestamp", index_type="STL_SORT")
self.client.create_collection(
collection_name=self.collection_name,
schema=schema,
index_params=index_params
)
def store_message(self, user_id: str, role: str, content: str,
session_id: str = "", importance: float = 0.5):
"""Store a message in memory"""
embedding = self._embed(content)
self.client.insert(
collection_name=self.collection_name,
data=[{
"id": str(uuid.uuid4()),
"user_id": user_id,
"role": role,
"content": content,
"timestamp": int(time.time()),
"session_id": session_id,
"importance": importance,
"embedding": embedding
}]
)
def retrieve_relevant(self, user_id: str, query: str, limit: int = 10,
apply_decay: bool = True) -> list:
"""Retrieve relevant memories for a query"""
embedding = self._embed(query)
results = self.client.search(
collection_name=self.collection_name,
data=[embedding],
filter=f'user_id == "{user_id}"',
limit=limit * 2, # Get extra for decay filtering
output_fields=["role", "content", "timestamp", "importance"]
)
memories = []
for hit in results[0]:
memory = {
"role": hit["entity"]["role"],
"content": hit["entity"]["content"],
"timestamp": hit["entity"]["timestamp"],
"importance": hit["entity"]["importance"],
"similarity": hit["distance"]
}
# Apply time decay
if apply_decay:
days_ago = (time.time() - memory["timestamp"]) / 86400
decay = 0.95 ** days_ago # 5% decay per day
memory["final_score"] = memory["similarity"] * decay * (0.5 + memory["importance"] * 0.5)
else:
memory["final_score"] = memory["similarity"]
memories.append(memory)
# Sort by final score
memories.sort(key=lambda x: x["final_score"], reverse=True)
return memories[:limit]
def get_recent(self, user_id: str, session_id: str = "", limit: int = 10) -> list:
"""Get recent messages (for current session context)"""
filter_expr = f'user_id == "{user_id}"'
if session_id:
filter_expr += f' and session_id == "{session_id}"'
results = self.client.query(
collection_name=self.collection_name,
filter=filter_expr,
output_fields=["role", "content", "timestamp"],
limit=limit
)
results.sort(key=lambda x: x["timestamp"])
return results
def chat(self, user_id: str, message: str, session_id: str = "") -> str:
"""Chat with memory-enhanced response"""
# Build context
messages = [{
"role": "system",
"content": """You are a helpful assistant with long-term memory.
You can recall previous conversations and use them to provide personalized responses.
When relevant, reference past discussions naturally."""
}]
# Retrieve relevant memories
memories = self.retrieve_relevant(user_id, message, limit=5)
if memories:
memory_text = "\n".join([
f"[Past - {m['role']}]: {m['content']}"
for m in memories
])
messages.append({
"role": "system",
"content": f"Relevant past conversations:\n{memory_text}"
})
# Add recent session context
recent = self.get_recent(user_id, session_id, limit=5)
for msg in recent:
messages.append({"role": msg["role"], "content": msg["content"]})
# Add current message
messages.append({"role": "user", "content": message})
# Generate response
response = self.openai.chat.completions.create(
model="gpt-4o",
messages=messages,
temperature=0.7
)
assistant_message = response.choices[0].message.content
# Store conversation
self.store_message(user_id, "user", message, session_id)
self.store_message(user_id, "assistant", assistant_message, session_id)
return assistant_message
def mark_important(self, user_id: str, content_snippet: str):
"""Mark a memory as important (no decay)"""
# Find the memory
results = self.client.query(
collection_name=self.collection_name,
filter=f'user_id == "{user_id}"',
output_fields=["id", "content", "importance"],
limit=100
)
for r in results:
if content_snippet in r["content"]:
self.client.upsert(
collection_name=self.collection_name,
data=[{"id": r["id"], "importance": 1.0}]
)
return True
return False
# Usage
memory = ChatMemory()
user_id = "user001"
session_id = "session_" + str(int(time.time()))
# Day 1 conversation
response = memory.chat(user_id, "I'm learning Python for a web project", session_id)
print(f"Bot: {response}")
response = memory.chat(user_id, "Any Flask tutorials you recommend?", session_id)
print(f"Bot: {response}")
# Days later, new session
new_session = "session_" + str(int(time.time()))
response = memory.chat(user_id, "How should I continue my learning?", new_session)
# Bot will recall the Python/Flask context
print(f"Bot: {response}")
# Combine recent context (window) with relevant memories (retrieval)
recent_messages = get_recent(limit=5) # Last 5 messages
relevant_memories = retrieve_relevant(query, limit=5) # Top 5 relevant
context = recent_messages + relevant_memories
def compress_old_memories(self, user_id: str, days_threshold: int = 30):
"""Summarize and compress old memories"""
cutoff = int(time.time()) - (days_threshold * 86400)
old_memories = self.client.query(
collection_name=self.collection_name,
filter=f'user_id == "{user_id}" and timestamp < {cutoff}',
output_fields=["content"],
limit=100
)
if len(old_memories) > 10:
# Summarize with LLM
content = "\n".join([m["content"] for m in old_memories])
summary = self.llm.summarize(content)
# Store summary, delete originals
self.store_message(user_id, "summary", summary, importance=0.8)
self.delete_old_memories(user_id, cutoff)
def detect_importance(self, content: str) -> float:
"""Automatically detect if content is important"""
# Keywords indicating important info
important_keywords = ["always", "never", "prefer", "hate", "love",
"my name", "birthday", "deadline", "important"]
content_lower = content.lower()
matches = sum(1 for kw in important_keywords if kw in content_lower)
return min(0.5 + (matches * 0.1), 1.0)
Problem: Bot mentions unrelated past conversations
Fix: Increase similarity threshold
memories = [m for m in memories if m["similarity"] > 0.7]
Problem: Too many memories in context, confuses LLM
Fix: Limit memories, summarize if needed
memories = retrieve_relevant(query, limit=3) # Only top 3
Problem: Memory from one user leaks to another
Fix: Always filter by user_id
filter=f'user_id == "{user_id}"' # ALWAYS include
Problem: Bot keeps mentioning outdated information
Fix: Apply time decay
decay = 0.95 ** days_ago
final_score = similarity * decay
| Need | Upgrade To |
|---|---|
| Search documents | rag-toolkit:rag |
| Multi-user shared knowledge | Combine with RAG |
| Real-time streaming | Add message queue |
| Complex memory graphs | Consider Neo4j |
rag-toolkit:ragcore:embeddingverticals/Use when user needs to group similar items together. Triggers on: clustering, group similar, topic modeling, user segmentation, categorization, automatic classification, unsupervised grouping.
Use when user needs to find duplicate or similar content. Triggers on: duplicate, deduplication, plagiarism detection, similar content, near-duplicate, similarity detection, content dedup, find copies.
Use when user wants to build image search or similar image finding. Triggers on: image search, similar image, visual search, image retrieval, CLIP, reverse image search, image matching, find similar photos.
Use when user needs RAG on documents with images and text. Triggers on: multimodal RAG, image-text mixed, document with images, PDF with charts, visual RAG, visual Q&A, documents with figures.
Use when user needs to search images using natural language descriptions. Triggers on: text to image, describe and find, natural language image search, image caption search, find image by description, describe to find.
Use when user needs to search video content by text or image. Triggers on: video search, video retrieval, video clips, meeting recordings, tutorial videos, surveillance playback, find moment in video.