knowledge-base-cache
Use when managing large knowledge bases, reducing API costs, or implementing multi-tier caching for frequent queries
| name | knowledge-base-cache |
| description | Use when managing large knowledge bases, reducing API costs, or implementing multi-tier caching for frequent queries |
| version | 1 |
A layered knowledge base system with hot, cold, and warm cache tiers and a Working Memory layer for context management. Multi-tier caching reduces API costs while scaling to large knowledge bases.
Use this skill when:
Do NOT use when:
Create a structured knowledge repository with layered architecture (hot/cold/warm) and intelligent context management.
┌─────────────────────────────────────────────────────────────┐
│                      Application Layer                      │
│                         Agent Core                          │
└──────────────────────────┬──────────────────────────────────┘
                           │
┌──────────────────────────▼──────────────────────────────────┐
│                    Working Memory Layer                     │
│  • Context Assembly          • Token Budget Management      │
│  • Multi-Source Coordination • LRU Cache                    │
└─────────────┬───────────────────────────────────────────────┘
              │  Standard Interface KnowledgeSource
    ┌─────────┼─────────┐
    ▼         ▼         ▼ (Reserved)
┌───────┐ ┌───────┐ ┌───────┐
│  Hot  │ │ Cold  │ │ Warm  │
│ Cache │ │Storage│ │Vector │
│ Layer │ │ Layer │ │ Layer │
└───┬───┘ └───┬───┘ └───┬───┘
    │         │         │
 Context Repository Vector DB
  Cache     Files   (Future)
| Tier | Technology | Use Case | Status |
|---|---|---|---|
| 🔥 Hot | Context Cache (API) | Full document retrieval, 90% cost savings | ✅ Available |
| ❄️ Cold | Repository Files | Keyword search, browsing, discovery | ✅ Available |
| 🌡️ Warm | Vector DB | Semantic search, precise Q&A | 🔮 Planned |
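The diagram names a standard `KnowledgeSource` interface shared by every tier, and the adapter examples in this document call `retrieve`, `is_available`, and `get_stats`. A minimal sketch of what such an interface could look like follows. The query fields `query`, `context_budget`, and `top_k` come from the usage shown in this document; their defaults, the `RetrievalResult` type, and the `InMemorySource` toy tier are assumptions made for illustration, not the skill's actual code.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class RetrievalQuery:
    query: str
    context_budget: int = 2000  # assumed default token budget
    top_k: int = 3              # assumed default result count


@dataclass
class RetrievalResult:  # hypothetical result container
    source: str
    documents: list = field(default_factory=list)


class KnowledgeSource(ABC):
    """Assumed shape of the interface each tier adapter implements."""

    @abstractmethod
    def is_available(self) -> bool: ...

    @abstractmethod
    def retrieve(self, query: RetrievalQuery) -> RetrievalResult: ...

    @abstractmethod
    def get_stats(self) -> dict: ...


class InMemorySource(KnowledgeSource):
    """Toy tier that exists only to exercise the interface."""

    def __init__(self, docs: dict):
        self.docs = docs

    def is_available(self) -> bool:
        return bool(self.docs)

    def retrieve(self, query: RetrievalQuery) -> RetrievalResult:
        words = query.query.lower().split()
        # Keep documents that mention at least one query word.
        hits = [text for text in self.docs.values()
                if any(w in text.lower() for w in words)]
        return RetrievalResult(source="memory", documents=hits[:query.top_k])

    def get_stats(self) -> dict:
        return {"documents": len(self.docs)}
```

Because every tier answers the same three calls, the Working Memory layer can treat hot, cold, and a future warm source interchangeably.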
Layered Knowledge Storage
repository/
├── core/                       # Core components
│   ├── __init__.py             # Standard interfaces
│   └── working_memory.py       # Working Memory layer
├── adapters/                   # Layer adapters
│   ├── __init__.py
│   ├── hot_cache_adapter.py
│   ├── cold_storage_adapter.py
│   └── warm_cache_adapter.py   (reserved)
├── index.json                  # Knowledge index
├── cache-state.json            # Cache status
├── skills/                     # Skill knowledge
├── docs/                       # Document knowledge
└── scripts/
    ├── cache_manager.py        # Cache management
    └── cache_helper.py         # Helper utilities
Working Memory Layer
Context Caching (Hot Layer)
File-Based Storage (Cold Layer)
Auto-Refresh
# The repository structure is already created
# If not, run:
python scripts/init_knowledge_base.py
Add markdown files to appropriate directories:
- repository/skills/ - Skill documentation
- repository/docs/ - General documentation
- repository/projects/ - Project-specific knowledge
cd repository
# Initialize index
python scripts/cache_manager.py init
# Build hot cache (Context Caching)
python scripts/cache_manager.py build
# Test the system
python test_phase1.py
Modern Approach (Recommended):
from repository.core.working_memory import WorkingMemoryManager
# Initialize once
wm = WorkingMemoryManager({
    'max_tokens': 6000,
    'allocation': {
        'system_prompt': 0.15,        # 15%
        'conversation': 0.25,         # 25%
        'retrieved_knowledge': 0.60   # 60%
    }
})
# Use in conversations
context = wm.query(
    user_query="How do I deploy?",
    system_prompt="You are an assistant...",
    conversation=history_messages
)
Legacy Approach:
from scripts.cache_helper import get_cache_headers, load_knowledge_context
# Get cache headers for API calls
headers = get_cache_headers()
# Load knowledge context
context = load_knowledge_context()
# Add cron job for daily refresh
# Configure in your agent's cron system
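A scheduled refresh comes down to comparing each cache entry's age against a time-to-live. The sketch below shows that check in isolation. The `created_at` and `ttl_seconds` field names are hypothetical; the real schema of `cache-state.json` is not shown in this document.

```python
import time
from typing import Optional


def expired_entries(state: dict, now: Optional[float] = None) -> list:
    """Return the names of cache entries whose TTL has elapsed.

    `state` maps an entry name to a dict with "created_at" (epoch seconds)
    and "ttl_seconds". Both field names are assumptions for illustration.
    """
    now = time.time() if now is None else now
    return sorted(
        name for name, entry in state.items()
        if now - entry["created_at"] >= entry["ttl_seconds"]
    )
```

A daily job could call a helper like this and rebuild only the entries it returns, instead of rebuilding every cache.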
Purpose: Store frequently accessed complete documents
When to Use:
Implementation: adapters/hot_cache_adapter.py
from adapters.hot_cache_adapter import HotCacheAdapter
from core import RetrievalQuery
hot = HotCacheAdapter()
result = hot.retrieve(RetrievalQuery(
    query="Docker deployment",
    context_budget=2000,
    top_k=3
))
Purpose: Keyword-based file retrieval with excerpt generation
When to Use:
Implementation: adapters/cold_storage_adapter.py
from adapters.cold_storage_adapter import ColdStorageAdapter
from core import RetrievalQuery
cold = ColdStorageAdapter()
result = cold.retrieve(RetrievalQuery(
    query="Docker deployment",
    context_budget=2000,
    top_k=5
))
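To make "keyword-based file retrieval with excerpt generation" concrete, here is a minimal standalone sketch that ranks markdown files by keyword hits and returns a short excerpt around the first match. It is not the code in `cold_storage_adapter.py`; the function name `keyword_search`, the scoring rule, and the excerpt width are assumptions.

```python
from pathlib import Path


def keyword_search(root: Path, query: str, top_k: int = 5, width: int = 80):
    """Rank markdown files under `root` by keyword hits and return excerpts.

    Returns a list of (score, relative_path, excerpt) tuples.
    """
    terms = [t.lower() for t in query.split()]
    scored = []
    for path in sorted(root.rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        lowered = text.lower()
        # Score is the total number of occurrences of all query terms.
        score = sum(lowered.count(t) for t in terms)
        if score == 0:
            continue
        # Excerpt: a window of text around the earliest matching term.
        first = min(i for i in (lowered.find(t) for t in terms) if i >= 0)
        start = max(0, first - width // 2)
        excerpt = text[start:start + width].strip()
        scored.append((score, path.relative_to(root).as_posix(), excerpt))
    # Highest score first; ties are broken by path for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:top_k]
```

Everything here runs against local files, which is why the cold tier needs no API key.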
Purpose: Semantic search with vector embeddings
When to Use:
Implementation: Reserved interface in adapters/warm_cache_adapter.py
Default allocation (customizable):
| Component | Percentage | Tokens (6K total) |
|---|---|---|
| System Prompt | 15% | 900 |
| Conversation | 25% | 1,500 |
| Retrieved Knowledge | 60% | 3,600 |
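The token column is the total budget multiplied by each share. A small sketch of that arithmetic follows; `split_budget` is a hypothetical helper, and whether the real manager validates that the shares sum to 1.0 is an assumption.

```python
def split_budget(max_tokens: int, allocation: dict) -> dict:
    """Turn fractional allocations into whole-token budgets."""
    total = sum(allocation.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"allocation must sum to 1.0, got {total}")
    return {name: int(round(max_tokens * share))
            for name, share in allocation.items()}


budgets = split_budget(6000, {
    "system_prompt": 0.15,
    "conversation": 0.25,
    "retrieved_knowledge": 0.60,
})
print(budgets)
# {'system_prompt': 900, 'conversation': 1500, 'retrieved_knowledge': 3600}
```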
from repository.core.working_memory import WorkingMemoryManager
from repository.core import MemoryAllocation
wm = WorkingMemoryManager({
    'max_tokens': 8000,               # Total context window
    'lru_cache_size': 10,             # LRU cache size
    'allocation': {
        'system_prompt': 0.20,        # 20%
        'conversation': 0.20,         # 20%
        'retrieved_knowledge': 0.60   # 60%
    },
    'repo_path': 'repository'         # Repository path
})
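Once the retrieved-knowledge share is fixed, context assembly has to keep retrieved documents within it. The sketch below shows one simple policy: keep documents in ranked order until the budget runs out. Both helper names and the four-characters-per-token estimate are assumptions; the actual manager may count tokens and trim differently.

```python
def estimate_tokens(text: str) -> int:
    """Crude estimate of about four characters per token (an assumption)."""
    return max(1, len(text) // 4)


def fit_to_budget(documents: list, budget: int) -> list:
    """Keep documents in order until the token budget is exhausted."""
    kept, used = [], 0
    for doc in documents:
        cost = estimate_tokens(doc)
        if used + cost > budget:
            break  # the next document would exceed the budget
        kept.append(doc)
        used += cost
    return kept
```

With `max_tokens` at 8000 and a 60% share, `budget` here would be 4800.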
| Command | Description |
|---|---|
| cache_manager.py init | Scan repository and update index |
| cache_manager.py build | Create/update hot caches |
| cache_manager.py status | Show cache status |
| cache_manager.py refresh | Refresh expired caches |
| cache_manager.py stats | Show statistics |
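As an illustration of what "scan repository and update index" can involve, the sketch below walks a directory for markdown files and records a size and content hash for each. The index layout (`version`, `documents`, `bytes`, `sha256`) is hypothetical and is not the actual `index.json` schema.

```python
import hashlib
import json
from pathlib import Path


def build_index(root: Path) -> dict:
    """Scan `root` for markdown files and describe each one."""
    entries = {}
    for path in sorted(root.rglob("*.md")):
        data = path.read_bytes()
        entries[path.relative_to(root).as_posix()] = {
            "bytes": len(data),
            # A content hash lets a later run detect changed files.
            "sha256": hashlib.sha256(data).hexdigest(),
        }
    return {"version": 1, "documents": entries}


def write_index(root: Path) -> Path:
    """Write the index next to the content as index.json."""
    target = root / "index.json"
    target.write_text(json.dumps(build_index(root), indent=2), encoding="utf-8")
    return target
```

Storing relative paths keeps the index portable across workspaces.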
# Run Phase 1 integration tests
cd repository
python test_phase1.py
# Test individual layers
python -c "from adapters.hot_cache_adapter import HotCacheAdapter; print(HotCacheAdapter().get_stats())"
python -c "from adapters.cold_storage_adapter import ColdStorageAdapter; print(ColdStorageAdapter().get_stats())"
| Metric | Without Cache | With Cache | Savings |
|---|---|---|---|
| Cost per 1000 queries | ~¥150 | ~¥15 | 90% |
| First token latency | ~30s | ~5s | 83% |
| Monthly cost (daily 50 queries) | ~¥450 | ~¥45 | ¥405 |
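The figures in this table can be related with two lines of arithmetic. The sketch below is a hypothetical helper, not part of the skill, and it assumes a 30-day month. At the listed per-1000-query rates, ¥450 per month corresponds to roughly 3,000 queries, while 50 queries a day over 30 days works out to about ¥225 without caching and ¥22.50 with it. The 90% ratio is the same in either case.

```python
def monthly_cost(queries_per_day: float, cost_per_1000: float, days: int = 30) -> float:
    """Monthly spend in yen for a query volume and a per-1000-query rate."""
    return queries_per_day * days * cost_per_1000 / 1000


def savings_ratio(without_cache: float, with_cache: float) -> float:
    """Fraction saved by caching, for example 0.9 for 90%."""
    return (without_cache - with_cache) / without_cache


print(monthly_cost(50, 150))   # 225.0
print(monthly_cost(100, 150))  # 450.0
```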
| Metric | Value |
|---|---|
| API Cost | ¥0 (no API calls) |
| Latency | ~10-50ms (local files) |
| Best For | Browsing, discovery, keyword search |
| Metric | Value |
|---|---|
| Context Assembly | Automatic |
| Token Budget | Enforced |
| Multi-Source | Hot + Cold (+ Warm in future) |
| LRU Cache | Reduces repeated queries |
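The LRU cache row refers to keeping recent query results so that a repeated query skips retrieval. A minimal sketch of a least-recently-used cache built on `collections.OrderedDict` follows. The class is illustrative only, and its default capacity mirrors the `lru_cache_size` of 10 used in the configuration example in this document.

```python
from collections import OrderedDict


class LRUCache:
    """Keep the most recently used query results, up to `capacity` entries."""

    def __init__(self, capacity: int = 10):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key: str):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, value) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used
```

Keying the cache on the normalized query text is one option; a stricter key would also include the token budget so that results assembled for different budgets do not collide.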
# Check if caches are active
python scripts/cache_manager.py status
# Rebuild if needed
python scripts/cache_manager.py build
# Verify hot layer
python -c "from adapters.hot_cache_adapter import HotCacheAdapter; print(HotCacheAdapter().is_available())"
# Debug: Check registered sources
from repository.core.working_memory import WorkingMemoryManager
wm = WorkingMemoryManager()
print(wm.get_stats())
# Debug: Test individual layers
from adapters.hot_cache_adapter import HotCacheAdapter
from adapters.cold_storage_adapter import ColdStorageAdapter
from core import RetrievalQuery
hot = HotCacheAdapter()
cold = ColdStorageAdapter()
query = RetrievalQuery(query="test", context_budget=2000)
print("Hot:", hot.retrieve(query))
print("Cold:", cold.retrieve(query))
Ensure the API key is set in the environment or config for the hot layer. The cold layer works without API keys.
All paths in generated files are relative (workspace-relative) for portability.
If you were using the old cache system:
- The cache_helper.py functions are unchanged.
- Use WorkingMemoryManager for better control.