| name | prompt-cache-agent-harness |
| description | Plan and inspect prompt-cache behavior for long-running Claude agent loops. Use when a user wants to split stable tool, system, and history context into cacheable layers, compare captured cache metadata, estimate cost impact from supplied pricing inputs, or keep durable memory outside the cached prefix. |
Prompt Cache Agent Harness
For the student-facing explanation of why this package exists, read
README.md first. This file is the invocation contract for Codex.
Overview
Use this skill when the user is designing or debugging a long-running Claude
agent workflow where prompt caching is supposed to help. The goal is not to
call the Anthropic API directly from the skill. The goal is to keep the harness
local and inspectable:
- plan a stable prompt spine
- keep tools, system instructions, and long-lived history in explicit layers
- keep durable memory and user-specific recall outside the cached prefix
- read captured run artifacts after the agent loop executes
- produce a small report on cache reads, cache writes, latency, and optional
token-cost estimates
When To Use
Use this skill for requests such as:
- design a prompt-cache layout for a persistent Claude agent
- separate cacheable static context from dynamic memory or retrieval facts
- compare cache reads and writes across cold and warm Claude runs
- estimate input-cost changes from supplied Anthropic pricing values
- explain why a cached prefix is being invalidated
Do not use this skill to promise provider-side savings without run artifacts.
If usage metadata is missing, say that and limit the answer to prompt-layout
review.
Prompt Layout Rules
Use a layered prompt spine:
- tools: stable tool manifest, schemas, and capability boundaries
- system: stable role, policies, and operating instructions
- reference: stable project or domain context that changes rarely
- session-summary: compressed prior state, only when it is intended to stay
  stable for many turns
- dynamic-memory: retrieved memories, current task facts, and user-specific
  details that should not invalidate the cached prefix
- turn-input: the current user request and latest volatile state
Prefer cache breakpoints after stable layers, not after dynamic memory. When
the provider supports explicit cache controls, apply them only to stable
content that can be reused safely.
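The layering above can be sketched as a small local planner. This is a minimal illustration under stated assumptions, not the harness's implementation: build_prompt_spine and the "layer" key are hypothetical names, and the result is a planning structure for inspection rather than a request payload. The cache_control marker mirrors the {"type": "ephemeral"} shape used for cache breakpoints in Anthropic's Messages API; nothing here calls a provider.

```python
import hashlib
import json

STABLE_LAYERS = ("tools", "system", "reference", "session-summary")
VOLATILE_LAYERS = ("dynamic-memory", "turn-input")


def build_prompt_spine(layers):
    """Order layers stable-first and mark one cache breakpoint.

    `layers` maps a layer name to its text. Hypothetical helper: it only
    builds a local, inspectable structure.
    """
    blocks = []
    for name in STABLE_LAYERS + VOLATILE_LAYERS:
        if name in layers:
            blocks.append({"layer": name, "type": "text", "text": layers[name]})

    stable = [b for b in blocks if b["layer"] in STABLE_LAYERS]
    if stable:
        # The breakpoint goes on the last stable block, never on dynamic memory.
        stable[-1]["cache_control"] = {"type": "ephemeral"}

    # Hash only the stable text so volatile layers cannot change the digest.
    digest = hashlib.sha256(
        json.dumps([b["text"] for b in stable]).encode("utf-8")
    ).hexdigest()
    return blocks, digest
```

Two calls that differ only in dynamic-memory or turn-input return the same digest, which is the property a stable_layer_hash is meant to capture.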
Expected Run Artifact
The helper reads one or more JSON or JSONL artifacts with fields like:
{
  "label": "warm-run",
  "latency_ms": 2800,
  "input_tokens": 18000,
  "cache_creation_input_tokens": 1200,
  "cache_read_input_tokens": 14500,
  "output_tokens": 900,
  "stable_layer_hash": "tools-system-reference-v1",
  "dynamic_memory_hash": "retrieval-42",
  "notes": ["user-specific recall was appended after the cache boundary"]
}
Useful aliases are accepted for common captured metadata:
- prompt_tokens for input_tokens
- cached_tokens for cache_read_input_tokens
- cache_write_tokens for cache_creation_input_tokens
- prefix_hash for stable_layer_hash
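One way to picture the alias handling is a rename pass applied to each record before reporting. The sketch below is an assumption about behavior, not a copy of scripts/prompt_cache_report.py: normalize_record and parse_jsonl are hypothetical names, and the rule that a canonical field wins over its alias is a choice made here for illustration. Fields captured under a different naming scheme may also follow different counting conventions, which this sketch does not reconcile.

```python
import json

# Alias -> canonical field, taken from the list above.
ALIASES = {
    "prompt_tokens": "input_tokens",
    "cached_tokens": "cache_read_input_tokens",
    "cache_write_tokens": "cache_creation_input_tokens",
    "prefix_hash": "stable_layer_hash",
}


def normalize_record(record):
    """Return a copy of `record` with aliases renamed to canonical fields.

    Assumption: a canonical field that is already present wins over its alias.
    """
    normalized = {k: v for k, v in record.items() if k not in ALIASES}
    for alias, canonical in ALIASES.items():
        if alias in record and canonical not in normalized:
            normalized[canonical] = record[alias]
    return normalized


def parse_jsonl(text):
    """Parse JSONL text into normalized records, skipping blank lines."""
    return [
        normalize_record(json.loads(line))
        for line in text.splitlines()
        if line.strip()
    ]
```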
Local State And Outputs
Keep raw run artifacts outside the repo by default:
~/.codex/state/prompt-cache-agent-harness/
  inputs/
  reports/
Do not commit transcripts, API responses, secrets, or provider logs unless the
user explicitly curates a redacted example for publication.
Commands
Preview the helper:
python3 scripts/prompt_cache_report.py --help
Generate a Markdown report without pricing:
python3 scripts/prompt_cache_report.py \
  --input ~/.codex/state/prompt-cache-agent-harness/inputs/run-pair.jsonl \
  --output ~/.codex/state/prompt-cache-agent-harness/reports/cache-report.md
Generate a report with user-supplied pricing values:
python3 scripts/prompt_cache_report.py \
  --input ~/.codex/state/prompt-cache-agent-harness/inputs/run-pair.jsonl \
  --base-input-usd-per-mtok 3.00 \
  --cache-write-usd-per-mtok 3.75 \
  --cache-hit-usd-per-mtok 0.30 \
  --output ~/.codex/state/prompt-cache-agent-harness/reports/cache-report.md
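The arithmetic behind a priced report can be sketched as below. This is an illustration, not the helper's actual code: estimate_input_cost is a hypothetical name, and it assumes input_tokens excludes the tokens counted under the two cache fields. If the artifact was captured with a convention where input_tokens already includes cached tokens, the totals would need adjusting.

```python
def estimate_input_cost(record, base, cache_write, cache_hit):
    """Estimate input-side USD cost for one run artifact.

    Prices are USD per million tokens. Assumption: `input_tokens` does not
    overlap with the cache-creation or cache-read counts.
    """
    uncached = record.get("input_tokens", 0)
    written = record.get("cache_creation_input_tokens", 0)
    read = record.get("cache_read_input_tokens", 0)

    with_cache = (
        uncached * base + written * cache_write + read * cache_hit
    ) / 1_000_000
    # Counterfactual: every input token billed at the base rate.
    without_cache = (uncached + written + read) * base / 1_000_000
    return {
        "with_cache_usd": with_cache,
        "without_cache_usd": without_cache,
        "saved_usd": without_cache - with_cache,
    }
```

With the sample artifact's token counts and the pricing values from the command above, this yields roughly 0.063 USD with caching against roughly 0.101 USD without, under the stated assumption.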
Interpretation Rules
- Treat changes to stable_layer_hash as the strongest cache-break signal.
- Treat changes to dynamic_memory_hash as expected only when dynamic memory is
  placed after the cache boundary.
- High cache writes with low cache reads usually mean the workflow is paying
  setup cost without warm reuse.
- A high share of cache-read tokens on the warm run indicates the stable prefix
  is being reused.
- Keep durable memory separate from prompt caching; durable memory decides what
to recall, while prompt caching rewards exact reusable prompt prefixes.
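The rules above can be applied to a cold and warm pair roughly as follows. This is a sketch under assumptions: interpret_pair is a hypothetical name, and the cutoff used here (warm-run writes exceeding warm-run reads) is an illustrative threshold rather than one defined by the harness. A hash missing from both records is treated as unchanged.

```python
def interpret_pair(cold, warm):
    """Return plain-language findings for a cold/warm pair of run artifacts."""
    findings = []

    if cold.get("stable_layer_hash") != warm.get("stable_layer_hash"):
        findings.append("stable layer hash changed: likely cache break")

    if cold.get("dynamic_memory_hash") != warm.get("dynamic_memory_hash"):
        findings.append(
            "dynamic memory changed: expected only if it sits after the cache boundary"
        )

    writes = warm.get("cache_creation_input_tokens", 0)
    reads = warm.get("cache_read_input_tokens", 0)
    if writes > reads:
        # Illustrative threshold, not a provider-defined rule.
        findings.append("warm run writes more than it reads: setup cost without reuse")
    elif reads > 0:
        findings.append("warm run reads from cache: stable prefix appears reused")

    return findings
```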
Safety Boundaries
- Do not store sensitive conversation history in the cached prefix by default.
- Do not ask the user to expose Anthropic API keys to generate a report.
- Do not persist cost or trace reports in the repo unless the user asks.
- Do not turn a prompt-cache harness into an uncontrolled autonomous agent
runner.
Response Pattern
When reporting back, include:
- which layers should be cacheable
- cold versus warm cache-read share
- cache-write tokens versus cache-read tokens
- whether the stable layer hash changed
- whether dynamic memory appears to be outside the cached prefix
- cost estimates only when pricing values were supplied
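For the cache-read share line of a report, a minimal sketch could look like this. The names cache_read_share and share_line are hypothetical, and the denominator assumes the three input-side counts do not overlap; adjust it if the captured metadata uses a different convention.

```python
def cache_read_share(record):
    """Fraction of input-side tokens that were served from cache."""
    uncached = record.get("input_tokens", 0)
    written = record.get("cache_creation_input_tokens", 0)
    read = record.get("cache_read_input_tokens", 0)
    total = uncached + written + read
    return read / total if total else 0.0


def share_line(record):
    """Format one report line, e.g. for a cold or a warm run."""
    label = record.get("label", "unlabeled")
    return f"{label}: {cache_read_share(record):.1%} of input tokens read from cache"
```

Printing share_line for the cold run next to the warm run gives the side-by-side comparison the report calls for.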