knowledge-base-cache
| name | knowledge-base-cache |
| description | Use when managing large knowledge bases, reducing API costs, or implementing multi-tier caching for frequent queries |
| version | 1 |
A layered knowledge base system with hot/cold/warm cache tiers and an intelligent Working Memory layer for context management. It reduces API costs through multi-tier caching while allowing the knowledge base to scale well beyond a single context window.
Use this skill when:
Do NOT use when:
Create a structured knowledge repository with layered architecture (hot/cold/warm) and intelligent context management.
┌──────────────────────────────────────────────────────────────┐
│                      Application Layer                       │
│                         Agent Core                           │
└───────────────────────────┬──────────────────────────────────┘
                            │
┌───────────────────────────┼──────────────────────────────────┐
│                   Working Memory Layer                       │
│  • Context Assembly            • Token Budget Management     │
│  • Multi-Source Coordination   • LRU Cache                   │
└──────────────┬───────────────────────────────────────────────┘
               │ Standard Interface: KnowledgeSource
     ┌─────────┼─────────┐
     ▼         ▼         ▼ (Reserved)
 ┌───────┐ ┌───────┐ ┌───────┐
 │  Hot  │ │ Cold  │ │ Warm  │
 │ Cache │ │Storage│ │Vector │
 │ Layer │ │ Layer │ │ Layer │
 └───┬───┘ └───┬───┘ └───┬───┘
     │         │         │
  Context  Repository  Vector DB
   Cache     Files     (Future)
| Tier | Technology | Use Case | Status |
|---|---|---|---|
| 🔥 Hot | Context Cache (API) | Full document retrieval, 90% cost savings | ✅ Available |
| ❄️ Cold | Repository Files | Keyword search, browsing, discovery | ✅ Available |
| 🌡️ Warm | Vector DB | Semantic search, precise Q&A | 🔮 Planned |
Layered Knowledge Storage
repository/
├── core/                      # Core components
│   ├── __init__.py            # Standard interfaces
│   └── working_memory.py      # Working Memory layer
├── adapters/                  # Layer adapters
│   ├── __init__.py
│   ├── hot_cache_adapter.py
│   ├── cold_storage_adapter.py
│   └── warm_cache_adapter.py  # (reserved)
├── index.json                 # Knowledge index
├── cache-state.json           # Cache status
├── skills/                    # Skill knowledge
├── docs/                      # Document knowledge
└── scripts/
    ├── cache_manager.py       # Cache management
    └── cache_helper.py        # Helper utilities
Working Memory Layer
Context Caching (Hot Layer)
File-Based Storage (Cold Layer)
Auto-Refresh
# The repository structure is already created
# If not, run:
python scripts/init_knowledge_base.py
Add markdown files to appropriate directories:
repository/skills/ - Skill documentation
repository/docs/ - General documentation
repository/projects/ - Project-specific knowledge
cd repository
# Initialize index
python scripts/cache_manager.py init
# Build hot cache (Context Caching)
python scripts/cache_manager.py build
# Test the system
python test_phase1.py
Modern Approach (Recommended):
from repository.core.working_memory import WorkingMemoryManager
# Initialize once
wm = WorkingMemoryManager({
'max_tokens': 6000,
'allocation': {
'system_prompt': 0.15, # 15%
'conversation': 0.25, # 25%
'retrieved_knowledge': 0.60 # 60%
}
})
# Use in conversations
context = wm.query(
user_query="How do I deploy?",
system_prompt="You are an assistant...",
conversation=history_messages
)
Legacy Approach:
from scripts.cache_helper import get_cache_headers, load_knowledge_context
# Get cache headers for API calls
headers = get_cache_headers()
# Load knowledge context
context = load_knowledge_context()
# Add cron job for daily refresh
# Configure in your agent's cron system
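A crontab entry for the daily refresh might look like this (the workspace path and log file are placeholders; adjust them to your installation):

```shell
# Hypothetical crontab entry: refresh expired hot caches every day at 03:00.
# Replace /path/to/workspace with your actual workspace root.
0 3 * * * cd /path/to/workspace/repository && python scripts/cache_manager.py refresh >> cache_refresh.log 2>&1
```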
Purpose: Store frequently accessed complete documents
When to Use:
Implementation: adapters/hot_cache_adapter.py
from adapters.hot_cache_adapter import HotCacheAdapter
from core import RetrievalQuery
hot = HotCacheAdapter()
result = hot.retrieve(RetrievalQuery(
query="Docker deployment",
context_budget=2000,
top_k=3
))
Purpose: Keyword-based file retrieval with excerpt generation
When to Use:
Implementation: adapters/cold_storage_adapter.py
from adapters.cold_storage_adapter import ColdStorageAdapter
from core import RetrievalQuery
cold = ColdStorageAdapter()
result = cold.retrieve(RetrievalQuery(
query="Docker deployment",
context_budget=2000,
top_k=5
))
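In practice the two layers complement each other: the hot layer is preferred while its context cache is live, with the cold layer as a no-API-cost fallback. A self-contained sketch of that routing (stub classes stand in for HotCacheAdapter and ColdStorageAdapter; the routing logic itself is an assumption, not code from this repository):

```python
from dataclasses import dataclass

# Stubs standing in for the real adapters, so the routing idea is runnable.
@dataclass
class RetrievalQuery:
    query: str
    context_budget: int = 2000
    top_k: int = 3

class StubHot:
    def is_available(self):
        return False                 # pretend the context cache has expired
    def retrieve(self, q):
        return f"[hot] {q.query}"

class StubCold:
    def retrieve(self, q):
        return f"[cold] {q.query}"

def retrieve_with_fallback(hot, cold, text, budget=2000):
    """Prefer the hot layer when its cache is live, else fall back to cold."""
    q = RetrievalQuery(query=text, context_budget=budget)
    if hot.is_available():
        return hot.retrieve(q)
    return cold.retrieve(q)

print(retrieve_with_fallback(StubHot(), StubCold(), "Docker deployment"))
# → [cold] Docker deployment
```

With the real adapters, the same function works unchanged because both expose retrieve() through the shared KnowledgeSource interface.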
Purpose: Semantic search with vector embeddings
When to Use:
Implementation: Reserved interface in adapters/warm_cache_adapter.py
Default allocation (customizable):
| Component | Percentage | Tokens (6K total) |
|---|---|---|
| System Prompt | 15% | 900 |
| Conversation | 25% | 1,500 |
| Retrieved Knowledge | 60% | 3,600 |
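The token counts in the table are simply the proportional split of the 6,000-token budget, which can be checked in one line:

```python
# Verify the default allocation against a 6,000-token budget.
MAX_TOKENS = 6000
allocation = {
    "system_prompt": 0.15,
    "conversation": 0.25,
    "retrieved_knowledge": 0.60,
}
budgets = {name: round(MAX_TOKENS * share) for name, share in allocation.items()}
print(budgets)
# → {'system_prompt': 900, 'conversation': 1500, 'retrieved_knowledge': 3600}
```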
from repository.core.working_memory import WorkingMemoryManager
from repository.core import MemoryAllocation
wm = WorkingMemoryManager({
'max_tokens': 8000, # Total context window
'lru_cache_size': 10, # LRU cache size
'allocation': {
'system_prompt': 0.20, # 20%
'conversation': 0.20, # 20%
'retrieved_knowledge': 0.60 # 60%
},
'repo_path': 'repository' # Repository path
})
| Command | Description |
|---|---|
| cache_manager.py init | Scan repository and update index |
| cache_manager.py build | Create/update hot caches |
| cache_manager.py status | Show cache status |
| cache_manager.py refresh | Refresh expired caches |
| cache_manager.py stats | Show statistics |
# Run Phase 1 integration tests
cd repository
python test_phase1.py
# Test individual layers
python -c "from adapters.hot_cache_adapter import HotCacheAdapter; print(HotCacheAdapter().get_stats())"
python -c "from adapters.cold_storage_adapter import ColdStorageAdapter; print(ColdStorageAdapter().get_stats())"
| Metric | Without Cache | With Cache | Savings |
|---|---|---|---|
| Cost per 1000 queries | ~¥150 | ~¥15 | 90% |
| First token latency | ~30s | ~5s | 83% |
| Monthly cost (daily 50 queries) | ~¥450 | ~¥45 | ¥405 |
| Metric | Value |
|---|---|
| API Cost | ¥0 (no API calls) |
| Latency | ~10-50ms (local files) |
| Best For | Browsing, discovery, keyword search |
| Metric | Value |
|---|---|
| Context Assembly | Automatic |
| Token Budget | Enforced |
| Multi-Source | Hot + Cold (+ Warm in future) |
| LRU Cache | Reduces repeated queries |
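The LRU behaviour can be illustrated with a minimal sketch. This is hypothetical, not the implementation inside working_memory.py, but it shows why repeated queries skip re-assembly:

```python
from collections import OrderedDict

# Minimal sketch of the kind of LRU query cache Working Memory could use.
class QueryLRUCache:
    def __init__(self, max_size: int = 10):
        self.max_size = max_size
        self._store = OrderedDict()

    def get(self, query: str):
        if query not in self._store:
            return None
        self._store.move_to_end(query)       # mark as most recently used
        return self._store[query]

    def put(self, query: str, context: str) -> None:
        self._store[query] = context
        self._store.move_to_end(query)
        if len(self._store) > self.max_size:
            self._store.popitem(last=False)  # evict least recently used

cache = QueryLRUCache(max_size=2)
cache.put("a", "ctx-a")
cache.put("b", "ctx-b")
cache.get("a")            # touch "a", so "b" becomes the eviction candidate
cache.put("c", "ctx-c")   # exceeds max_size, evicting "b"
print(cache.get("b"))
# → None
```

A hit returns the previously assembled context without touching the hot or cold layers, which is where the "reduces repeated queries" benefit comes from.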
# Check if caches are active
python scripts/cache_manager.py status
# Rebuild if needed
python scripts/cache_manager.py build
# Verify hot layer
python -c "from adapters.hot_cache_adapter import HotCacheAdapter; print(HotCacheAdapter().is_available())"
# Debug: Check registered sources
from repository.core.working_memory import WorkingMemoryManager
wm = WorkingMemoryManager()
print(wm.get_stats())
# Debug: Test individual layers
from adapters.hot_cache_adapter import HotCacheAdapter
from adapters.cold_storage_adapter import ColdStorageAdapter
from core import RetrievalQuery
hot = HotCacheAdapter()
cold = ColdStorageAdapter()
query = RetrievalQuery(query="test", context_budget=2000)
print("Hot:", hot.retrieve(query))
print("Cold:", cold.retrieve(query))
Ensure API key is set in environment or config for hot layer. Cold layer works without API keys.
All paths in generated files are relative (workspace-relative) for portability.
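For example, with pathlib (file names here are illustrative, not actual index entries):

```python
from pathlib import Path

# Illustrative: index entries should store workspace-relative paths so the
# repository can be moved or checked out elsewhere without breaking lookups.
workspace = Path("/home/user/workspace")                       # hypothetical root
doc = workspace / "repository" / "docs" / "deploy-guide.md"    # hypothetical file
rel = doc.relative_to(workspace)
print(rel.as_posix())
# → repository/docs/deploy-guide.md
```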
If you were using the old cache system:
cache_helper.py functions remain unchanged
Migrate to WorkingMemoryManager for better control