| Field | Value |
|---|---|
| name | knowledge-base-cache |
| description | Use when managing large knowledge bases, reducing API costs, or implementing multi-tier caching for frequent queries |
| version | 1 |
# Knowledge Base Cache Skill

## Overview

A layered knowledge base system with hot, cold, and warm cache tiers, plus a Working Memory layer that assembles context under a token budget. Multi-tier caching reduces API costs. Because each query loads only the relevant slice of the knowledge base, the repository can grow far beyond a single context window.

## When to Use
Use this skill when:
- Managing large knowledge bases that exceed context window limits
- Reducing API costs for frequent knowledge queries
- Implementing multi-tier caching (hot/cold/warm) for knowledge retrieval
- Needing intelligent context assembly with token budget management
- Requiring automatic caching, with semantic (vector) retrieval to follow once the Warm layer ships
Do NOT use when:
- The knowledge base is small enough to fit in a single context window
- Queries are one-off, so caching overhead exceeds the savings
- Only basic file storage is needed, without caching tiers
The skill sets up a structured knowledge repository with a layered (hot/cold/warm) architecture and intelligent context management.
## Architecture Overview

```
┌─────────────────────────────────────────────────────────────┐
│                      Application Layer                      │
│                         Agent Core                          │
└─────────────────────────────┬───────────────────────────────┘
                              │
┌─────────────────────────────▼───────────────────────────────┐
│                    Working Memory Layer                     │
│  • Context Assembly            • Token Budget Management    │
│  • Multi-Source Coordination   • LRU Cache                  │
└──────────────┬──────────────────────────────────────────────┘
               │ Standard Interface: KnowledgeSource
     ┌─────────┼─────────┐
     ▼         ▼         ▼ (Reserved)
 ┌───────┐ ┌───────┐ ┌───────┐
 │  Hot  │ │ Cold  │ │ Warm  │
 │ Cache │ │Storage│ │Vector │
 │ Layer │ │ Layer │ │ Layer │
 └───┬───┘ └───┬───┘ └───┬───┘
     │         │         │
  Context  Repository  Vector DB
   Cache     Files     (Future)
```
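In the diagram, every layer sits behind a `KnowledgeSource` interface. The sketch below is one hypothetical reading of that contract, not the contents of `core/__init__.py`. The method names `retrieve`, `is_available`, and `get_stats` are the adapter methods this skill calls elsewhere, and the `RetrievalQuery` fields mirror the calls shown later. `RetrievalResult`, its fields, the default values, and the toy `EchoSource` are assumptions added for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class RetrievalQuery:
    # Field names mirror the RetrievalQuery(...) calls in this skill; defaults are assumptions.
    query: str
    context_budget: int = 2000
    top_k: int = 5


@dataclass
class RetrievalResult:
    # Hypothetical result shape: one chunk of text per hit, with its source and score.
    source: str
    content: str
    score: float
    metadata: dict = field(default_factory=dict)


class KnowledgeSource(ABC):
    """Hypothetical sketch of the interface each layer adapter implements."""

    @abstractmethod
    def retrieve(self, query: RetrievalQuery) -> list[RetrievalResult]:
        """Return up to query.top_k results that fit within query.context_budget."""

    @abstractmethod
    def is_available(self) -> bool:
        """Report whether this layer can currently serve queries."""

    @abstractmethod
    def get_stats(self) -> dict:
        """Return layer-specific statistics for diagnostics."""


class EchoSource(KnowledgeSource):
    """Toy adapter that only shows how a layer plugs into the interface."""

    def retrieve(self, query: RetrievalQuery) -> list[RetrievalResult]:
        hits = [RetrievalResult(source="echo", content=query.query, score=1.0)]
        return hits[: query.top_k]

    def is_available(self) -> bool:
        return True

    def get_stats(self) -> dict:
        return {"layer": "echo"}
```

Coding the Working Memory layer against an abstract base class means the reserved Warm adapter can be added later without touching the layers above it.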
## Three-Tier Architecture

| Tier | Technology | Use Case | Status |
|---|---|---|---|
| 🔥 Hot | Context Cache (API) | Full document retrieval, 90% cost savings | ✅ Available |
| ❄️ Cold | Repository Files | Keyword search, browsing, discovery | ✅ Available |
| 🌡️ Warm | Vector DB | Semantic search, precise Q&A | 🔮 Planned |
## What This Skill Does

- **Layered Knowledge Storage**

  ```
  repository/
  ├── core/                       # Core components
  │   ├── __init__.py             # Standard interfaces
  │   └── working_memory.py       # Working Memory layer
  ├── adapters/                   # Layer adapters
  │   ├── __init__.py
  │   ├── hot_cache_adapter.py
  │   ├── cold_storage_adapter.py
  │   └── warm_cache_adapter.py   # (reserved)
  ├── index.json                  # Knowledge index
  ├── cache-state.json            # Cache status
  ├── skills/                     # Skill knowledge
  ├── docs/                       # Document knowledge
  └── scripts/
      ├── cache_manager.py        # Cache management
      └── cache_helper.py         # Helper utilities
  ```

- **Working Memory Layer**
  - Unified interface for all knowledge sources
  - Automatic context assembly with token budgeting
  - LRU cache for repeated queries
  - Cross-tier result ranking

- **Context Caching (Hot Layer)**
  - Full-document caching via the Context Cache API
  - ~90% lower cost per query
  - ~83% lower first-token latency

- **File-Based Storage (Cold Layer)**
  - Keyword-based retrieval
  - Excerpt generation
  - No API costs

- **Auto-Refresh**
  - Configures a cron job for daily refresh
  - Keeps caches fresh without manual intervention
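The Working Memory layer's LRU cache lets it skip context assembly for a query it has just answered. Below is a minimal sketch built on `collections.OrderedDict`. It assumes results are keyed by the raw query string and that the capacity plays the role of the `lru_cache_size` option. The class name `QueryLRUCache` is hypothetical.

```python
from collections import OrderedDict


class QueryLRUCache:
    """Hypothetical LRU cache mapping a query string to its assembled context."""

    def __init__(self, max_size: int = 10):
        self.max_size = max_size
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, query: str):
        if query not in self._items:
            return None
        self._items.move_to_end(query)  # mark as most recently used
        return self._items[query]

    def put(self, query: str, context: str) -> None:
        self._items[query] = context
        self._items.move_to_end(query)
        if len(self._items) > self.max_size:
            self._items.popitem(last=False)  # evict the least recently used entry
```

Because the key is the exact string, paraphrased questions miss the cache. Normalizing the query first (for example, lowercasing and collapsing whitespace) is a cheap improvement.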
## Quick Start

### Step 1: Initialize Repository

```bash
python scripts/init_knowledge_base.py
```
### Step 2: Add Knowledge

Add markdown files to the appropriate directories:

- `repository/skills/` - Skill documentation
- `repository/docs/` - General documentation
- `repository/projects/` - Project-specific knowledge
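### Step 3: Build Cache

```bash
cd repository
python scripts/cache_manager.py init    # scan repository and update index
python scripts/cache_manager.py build   # create/update hot caches
python test_phase1.py                   # run the Phase 1 test script
```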
### Step 4: Use in Your Agent

**Modern Approach (Recommended):**

```python
from repository.core.working_memory import WorkingMemoryManager

# 6,000-token window split across system prompt, conversation, and retrieved knowledge
wm = WorkingMemoryManager({
    'max_tokens': 6000,
    'allocation': {
        'system_prompt': 0.15,
        'conversation': 0.25,
        'retrieved_knowledge': 0.60
    }
})

# history_messages: your agent's prior conversation turns
context = wm.query(
    user_query="How do I deploy?",
    system_prompt="You are an assistant...",
    conversation=history_messages
)
```
**Legacy Approach:**

```python
from scripts.cache_helper import get_cache_headers, load_knowledge_context

headers = get_cache_headers()
context = load_knowledge_context()
```
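### Step 5: Configure Auto-Refresh

Schedule `python scripts/cache_manager.py refresh` (run from `repository/`) to run once a day, for example with cron. Expired caches are then rebuilt without manual intervention.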
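If you prefer to set up the schedule by hand, the helper below prints a crontab entry for the existing `refresh` command. It assumes `scripts/cache_manager.py` lives inside `repository/` (as in Step 3) and that `python` on cron's `PATH` is the right interpreter. The function name `build_refresh_cron_line` and the 03:00 default are illustrative.

```python
import os
import shlex


def build_refresh_cron_line(repo_dir: str, hour: int = 3, minute: int = 0,
                            python: str = "python") -> str:
    """Return a crontab entry that runs `cache_manager.py refresh` once a day.

    Hypothetical helper; the default schedule (03:00) is an assumption.
    """
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    # Cron jobs start in the user's home directory, so use an absolute, shell-quoted path.
    repo = shlex.quote(os.path.abspath(repo_dir))
    return (f"{minute} {hour} * * * cd {repo} && "
            f"{python} scripts/cache_manager.py refresh")


if __name__ == "__main__":
    print(build_refresh_cron_line("repository"))
```

Paste the printed line into `crontab -e`. Cron also runs jobs with a minimal environment, so if the hot layer reads its API key from an environment variable, make sure that variable is available to the job too.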
## Layer Details

### 🔥 Hot Cache Layer

**Purpose:** Store frequently accessed complete documents

**When to Use:**
- Reading full skill documentation
- API reference lookup
- Deployment guides

**Implementation:** `adapters/hot_cache_adapter.py`

```python
# Run from inside repository/ so `adapters` and `core` are importable
from adapters.hot_cache_adapter import HotCacheAdapter
from core import RetrievalQuery

hot = HotCacheAdapter()
result = hot.retrieve(RetrievalQuery(
    query="Docker deployment",
    context_budget=2000,  # token budget for the returned content
    top_k=3               # maximum number of documents
))
```
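A hot-cache hit is a complete document, so the layer can only honor a `context_budget` by choosing which whole documents to return. The sketch below shows one plausible policy: greedy selection by score, with documents that would overflow the budget skipped rather than truncated. It assumes documents arrive as `(score, path, text)` tuples and estimates ~4 characters per token. `HotCacheAdapter` may rank and budget differently, and `select_full_documents` is a hypothetical name.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text (assumption).
    return max(1, len(text) // 4)


def select_full_documents(docs, context_budget, top_k, count_tokens=estimate_tokens):
    """Pick up to top_k whole documents, best score first, that fit the budget.

    docs: list of (score, path, text) tuples. A document that does not fit is
    skipped rather than truncated, because hot-cache hits are returned whole.
    Returns (selected paths, tokens used).
    """
    chosen, used = [], 0
    for score, path, text in sorted(docs, key=lambda d: d[0], reverse=True):
        if len(chosen) == top_k:
            break
        cost = count_tokens(text)
        if used + cost <= context_budget:
            chosen.append(path)
            used += cost
    return chosen, used
```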
### ❄️ Cold Storage Layer

**Purpose:** Keyword-based file retrieval with excerpt generation

**When to Use:**
- Browsing the knowledge base
- Finding relevant files
- Low-cost retrieval

**Implementation:** `adapters/cold_storage_adapter.py`

```python
# Run from inside repository/ so `adapters` and `core` are importable
from adapters.cold_storage_adapter import ColdStorageAdapter
from core import RetrievalQuery

cold = ColdStorageAdapter()
result = cold.retrieve(RetrievalQuery(
    query="Docker deployment",
    context_budget=2000,  # token budget for the returned excerpts
    top_k=5               # maximum number of files
))
```
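Keyword retrieval with excerpt generation can be as simple as counting query-term occurrences and slicing the text around the first match. The sketch below is a hypothetical illustration of that idea, not the `ColdStorageAdapter` implementation. The in-memory `files` mapping, the three-character minimum term length, and the 80-character excerpt window are assumptions.

```python
import re


def keyword_search(files, query, top_k=5, window=80):
    """Rank files by query-term hits and return an excerpt around the first hit.

    files: dict mapping path -> text. Returns (score, path, excerpt) tuples.
    """
    # Ignore very short words such as "a" or "to" (threshold is an assumption).
    terms = [t for t in re.findall(r"\w+", query.lower()) if len(t) > 2]
    results = []
    for path, text in files.items():
        lower = text.lower()
        score = sum(lower.count(t) for t in terms)
        if score == 0:
            continue
        # Offsets from the lowercased text are reused on the original (fine for ASCII).
        first = min(lower.find(t) for t in terms if t in lower)
        start = max(0, first - window)
        excerpt = text[start:first + window].strip()
        results.append((score, path, excerpt))
    results.sort(key=lambda r: (-r[0], r[1]))  # highest score first, then by path
    return results[:top_k]
```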
### 🌡️ Warm Cache Layer (Planned)

**Purpose:** Semantic search with vector embeddings

**When to Use:**
- Precise Q&A
- Semantic similarity matching
- Large knowledge bases

**Implementation:** Reserved interface in `adapters/warm_cache_adapter.py`
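The Warm layer is still planned, so nothing below exists in this skill yet. The sketch illustrates what "semantic similarity matching" usually means: ranking documents by cosine similarity between a query embedding and precomputed document embeddings. The toy two-dimensional vectors in the usage below stand in for the output of a real embedding model, and the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_top_k(query_vec, index, top_k=3):
    """index: list of (doc_id, embedding). Returns the top_k (doc_id, similarity) pairs."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    index = [("deploy", [1.0, 0.0]), ("billing", [0.0, 1.0]), ("ci", [0.7, 0.7])]
    print(semantic_top_k([1.0, 0.1], index, top_k=2))
```

A vector database replaces the linear scan with an approximate nearest-neighbour index, but the ranking criterion is the same.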
## Working Memory Configuration

### Token Budget Allocation

Default allocation (customizable):

| Component | Percentage | Tokens (of 6,000) |
|---|---|---|
| System Prompt | 15% | 900 |
| Conversation | 25% | 1,500 |
| Retrieved Knowledge | 60% | 3,600 |
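Each entry in the token column is the component's fraction multiplied by `max_tokens`. The small helper below makes the conversion explicit. It assumes allocations are fractions that must sum to 1.0, which both examples in this skill satisfy. `split_budget` is a hypothetical name, not part of `WorkingMemoryManager`.

```python
def split_budget(max_tokens: int, allocation: dict) -> dict:
    """Convert fractional allocations into integer token budgets."""
    total = sum(allocation.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"allocation fractions must sum to 1.0, got {total}")
    # round() guards against float artifacts such as 6000 * 0.15 = 899.999...
    return {name: round(max_tokens * share) for name, share in allocation.items()}


if __name__ == "__main__":
    default = {"system_prompt": 0.15, "conversation": 0.25, "retrieved_knowledge": 0.60}
    print(split_budget(6000, default))
```

For fractions that do not divide `max_tokens` evenly, rounding can leave the parts a token off the total. A fuller implementation might assign any remainder to retrieved knowledge.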
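### Configuration Options

```python
from repository.core.working_memory import WorkingMemoryManager
from repository.core import MemoryAllocation

wm = WorkingMemoryManager({
    'max_tokens': 8000,          # total context budget
    'lru_cache_size': 10,        # entries kept in the query LRU cache
    'allocation': {              # fractions of max_tokens; expected to sum to 1.0
        'system_prompt': 0.20,
        'conversation': 0.20,
        'retrieved_knowledge': 0.60
    },
    'repo_path': 'repository'    # path to the knowledge repository root
})
```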
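With `max_tokens: 8000` and `conversation: 0.20`, conversation history gets 1,600 tokens. A common way to enforce such a slice, assumed here rather than taken from `WorkingMemoryManager`, is to keep the newest turns and drop older ones once adding another would exceed the budget. The sketch assumes messages are `{"role", "content"}` dicts ordered oldest first and uses a rough ~4 characters per token estimate.

```python
def estimate_tokens(text: str) -> int:
    # Rough ~4 characters/token heuristic (assumption); swap in a real tokenizer.
    return max(1, len(text) // 4)


def trim_conversation(messages, budget_tokens):
    """Keep the most recent messages whose combined estimate fits the budget.

    messages: list of {"role": ..., "content": ...} dicts, oldest first.
    Stops at the first message that does not fit, so the kept history stays contiguous.
    """
    kept, used = [], 0
    for msg in reversed(messages):
        cost = estimate_tokens(msg["content"])
        if used + cost > budget_tokens:
            break
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

Stopping at the first message that does not fit, rather than skipping it, prevents gaps in the dialogue that could confuse the model.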
## Cache Management Commands

| Command | Description |
|---|---|
| `cache_manager.py init` | Scan repository and update index |
| `cache_manager.py build` | Create/update hot caches |
| `cache_manager.py status` | Show cache status |
| `cache_manager.py refresh` | Refresh expired caches |
| `cache_manager.py stats` | Show statistics |
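To make the `init` step concrete, here is a hypothetical sketch that indexes markdown files under `skills/`, `docs/`, and `projects/` and writes `index.json`. The index schema (category, size, and SHA-256 hash per relative path) is an assumption, and the real `cache_manager.py` format may differ. Storing repository-relative paths keeps the file portable, and the content hash would let a later `build` or `refresh` detect changed documents.

```python
import hashlib
import json
from pathlib import Path


def scan_repository(repo: Path, categories=("skills", "docs", "projects")) -> dict:
    """Build a simple index of markdown files (hypothetical index.json schema)."""
    entries = {}
    for category in categories:
        root = repo / category
        if not root.is_dir():
            continue
        for path in sorted(root.rglob("*.md")):
            data = path.read_bytes()
            entries[path.relative_to(repo).as_posix()] = {
                "category": category,
                "bytes": len(data),
                "sha256": hashlib.sha256(data).hexdigest(),  # detects changed files
            }
    return {"files": entries}


def write_index(repo: Path) -> dict:
    """Scan the repository and write the result to repo/index.json."""
    index = scan_repository(repo)
    (repo / "index.json").write_text(json.dumps(index, indent=2), encoding="utf-8")
    return index
```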
## Testing Commands

```bash
cd repository
python test_phase1.py
# Per-layer statistics (run from inside repository/)
python -c "from adapters.hot_cache_adapter import HotCacheAdapter; print(HotCacheAdapter().get_stats())"
python -c "from adapters.cold_storage_adapter import ColdStorageAdapter; print(ColdStorageAdapter().get_stats())"
```
## Cost Benefits

### Hot Layer (Context Cache)

| Metric | Without Cache | With Cache | Savings |
|---|---|---|---|
| Cost per 1,000 queries | ~¥150 | ~¥15 | 90% |
| First-token latency | ~30 s | ~5 s | 83% |
| Monthly cost (50 queries/day, 30 days) | ~¥225 | ~¥22.5 | ~¥202.5 |
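Each row follows from the per-1,000-query prices by simple arithmetic. At those prices, 50 queries/day over 30 days is 1,500 queries: about ¥225 uncached versus ¥22.5 cached. At 100 queries/day, the figures double to about ¥450 versus ¥45. The sketch below reproduces these numbers. It assumes spend scales linearly with query count and ignores any separate cache storage or write charges your provider may apply.

```python
def monthly_cost(cost_per_1000: float, queries_per_day: int, days: int = 30) -> float:
    """Monthly spend given a price per 1,000 queries (same currency in and out)."""
    return cost_per_1000 * queries_per_day * days / 1000


def percent_saved(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return 100 * (before - after) / before


if __name__ == "__main__":
    print(monthly_cost(150, 50), monthly_cost(15, 50))  # uncached vs cached, 50/day
    print(percent_saved(150, 15), percent_saved(30, 5))  # cost and latency savings
```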
### Cold Layer (File Storage)

| Metric | Value |
|---|---|
| API Cost | ¥0 (no API calls) |
| Latency | ~10-50 ms (local files) |
| Best For | Browsing, discovery, keyword search |
### Working Memory Layer

| Metric | Value |
|---|---|
| Context Assembly | Automatic |
| Token Budget | Enforced |
| Multi-Source | Hot + Cold (+ Warm in future) |
| LRU Cache | Reduces repeated queries |
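Multi-source coordination needs a way to merge Hot and Cold hits into one ranked list. In the hypothetical sketch below, only the highest-scoring hit is kept for each source file, tiers can optionally be weighted, and results are admitted in score order while they fit the token budget. The tuple formats, the `tier_weight` knob, and the ~4 characters per token estimate are assumptions, not the Working Memory layer's actual ranking.

```python
def merge_results(results_by_tier, budget_tokens, tier_weight=None):
    """Merge per-tier hits into one ranked list, deduplicated by source, within budget.

    results_by_tier: {"hot": [(source, text, score), ...], "cold": [...]}
    tier_weight: optional per-tier score multiplier (hypothetical tuning knob).
    Returns (source, tier, weighted_score) tuples in rank order.
    """
    tier_weight = tier_weight or {}
    best = {}  # source -> (weighted score, tier, text)
    for tier, hits in results_by_tier.items():
        weight = tier_weight.get(tier, 1.0)
        for source, text, score in hits:
            weighted = score * weight
            if source not in best or weighted > best[source][0]:
                best[source] = (weighted, tier, text)
    ranked = sorted(best.items(), key=lambda kv: kv[1][0], reverse=True)
    merged, used = [], 0
    for source, (score, tier, text) in ranked:
        cost = max(1, len(text) // 4)  # rough token estimate (assumption)
        if used + cost > budget_tokens:
            continue  # a smaller, lower-ranked hit may still fit
        merged.append((source, tier, score))
        used += cost
    return merged
```

Skipping an oversized hit instead of stopping lets smaller, lower-ranked results use the remaining budget.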
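## Troubleshooting

### Cache Not Working

```bash
cd repository
python scripts/cache_manager.py status   # check cache status
python scripts/cache_manager.py build    # rebuild hot caches
# Check whether the hot layer is reachable
python -c "from adapters.hot_cache_adapter import HotCacheAdapter; print(HotCacheAdapter().is_available())"
```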
### Working Memory Not Finding Knowledge

```python
# 1) From the workspace root: inspect what Working Memory sees
from repository.core.working_memory import WorkingMemoryManager

wm = WorkingMemoryManager()
print(wm.get_stats())

# 2) From inside repository/: query each layer directly to find the one returning nothing
from adapters.hot_cache_adapter import HotCacheAdapter
from adapters.cold_storage_adapter import ColdStorageAdapter
from core import RetrievalQuery

hot = HotCacheAdapter()
cold = ColdStorageAdapter()
query = RetrievalQuery(query="test", context_budget=2000)
print("Hot:", hot.retrieve(query))
print("Cold:", cold.retrieve(query))
```
### API Key Issues

The hot layer needs an API key, set either in the environment or in the config. The cold layer works without API keys.
### Path Issues

All paths in generated files are workspace-relative, so the repository can be moved or shared without edits.
## Migration from v1

If you were using the old cache system:
- **Old way still works:** `cache_helper.py` functions are unchanged.
- **New way recommended:** use `WorkingMemoryManager` for control over token budgets and multi-source retrieval.
- **Same repository structure:** no migration needed.
## References
- Context Caching documentation
- Component architecture design