analyze-results
// Analyze sweep results after completion. Computes accuracy, CFR, signal statistics, and generates comparison tables. Use after a sweep completes to understand the data.
Search past decisions, failures, and experiment logs for relevant context before starting a task. Use this before any significant implementation or experiment to avoid repeating mistakes.
Compare results across models (Qwen vs Llama) at matching compression configurations. Generates side-by-side tables and identifies cross-model patterns.
After completing a significant task or experiment, extract lessons learned and update the project knowledge base. Captures what worked, what failed, and what to remember for next time.
Prepare and run a KV-cache compression sweep. Loads sweep configuration, validates prerequisites, and provides the exact commands needed. Use before starting any GPU experiment.
| name | analyze-results |
| description | Analyze sweep results after completion. Computes accuracy, CFR, signal statistics, and generates comparison tables. Use after a sweep completes to understand the data. |
| argument-hint | ["num-prompts"] |
| disable-model-invocation | true |
| allowed-tools | Read, Bash, Grep, Glob |
Run a structured analysis of sweep results.
Verify data completeness:
uv run kvguard verify --output-dir results --num-prompts ${0:-500}
Run built-in analysis:
uv run kvguard analyze --output-dir results --num-prompts ${0:-500}
Generate comparison tables by reading result files:
For each result file in results/, extract from the JSON:
- summary.accuracy: correct answer rate
- summary.cfr: catastrophic failure rate (looping + non-termination)

Key comparisons to produce:
Table 1: Accuracy × Compression Budget
| Model | Press | Ratio | Accuracy | CFR | Non-term | Looping |
|---|---|---|---|---|---|---|
Table 2: Signal Statistics
| Model | Press | Ratio | Mean Entropy | Max Entropy | Min Top-1 Prob |
|---|---|---|---|---|---|
Table 3: Model Comparison (same press × ratio)
| Press | Ratio | Qwen Acc | Llama Acc | Qwen CFR | Llama CFR |
|---|---|---|---|---|---|
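The extraction and table-building steps above can be sketched in Python. This is a minimal illustration, not the project's own tooling: only `summary.accuracy` and `summary.cfr` come from the instructions, while the top-level `model`, `press`, and `ratio` keys and the `results/*.json` file layout are assumptions that should be checked against the real result schema. The non-termination and looping columns are left out because their field names are not given here.

```python
import json
from pathlib import Path


def row_from_result(data):
    """Build one table row from a parsed result file, or None if it has no summary."""
    if not isinstance(data, dict) or not isinstance(data.get("summary"), dict):
        return None
    summary = data["summary"]
    return {
        # Hypothetical keys: adjust to the actual result schema.
        "model": data.get("model", "?"),
        "press": data.get("press", "?"),
        "ratio": data.get("ratio", "?"),
        # Field names taken from the instructions above.
        "accuracy": summary.get("accuracy"),
        "cfr": summary.get("cfr"),
    }


def load_rows(results_dir="results"):
    """Read every *.json file in results_dir (assumed layout) and keep usable rows."""
    rows = []
    for path in sorted(Path(results_dir).glob("*.json")):
        row = row_from_result(json.loads(path.read_text()))
        if row is not None:
            rows.append(row)
    return rows


def fmt(value):
    """Render a rate with three decimals, or 'n/a' when the field is missing."""
    return f"{value:.3f}" if isinstance(value, (int, float)) else "n/a"


def to_markdown(rows):
    """Render rows as a Markdown table with a separator cell for every column."""
    lines = [
        "| Model | Press | Ratio | Accuracy | CFR |",
        "|---|---|---|---|---|",
    ]
    for r in rows:
        lines.append(
            f"| {r['model']} | {r['press']} | {r['ratio']} "
            f"| {fmt(r['accuracy'])} | {fmt(r['cfr'])} |"
        )
    return "\n".join(lines)
```

Calling `print(to_markdown(load_rows("results")))` would emit a reduced form of Table 1. Keeping `row_from_result` separate from file access makes the schema assumptions easy to adjust in one place.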
Flag anomalies:
Present findings as structured Markdown tables. Highlight the "cliff point" where accuracy drops sharply for each model × compressor combination.
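One way to make the "cliff point" concrete is sketched below. The text does not define it precisely, so this assumes a working definition: the compression ratio at which accuracy falls the most relative to the previous (lower) ratio, reported only when that fall is at least `min_drop`. Both the function name `find_cliff` and the 0.10 default threshold are hypothetical choices, and the sketch assumes a larger ratio means more aggressive compression.

```python
def find_cliff(points, min_drop=0.10):
    """Return (ratio, drop) for the steepest accuracy fall, or None.

    points: iterable of (ratio, accuracy) pairs for one model/compressor pair.
    min_drop: hypothetical threshold below which no cliff is reported.
    """
    ordered = sorted(points)  # ascending ratio, assumed to mean more compression
    best = None
    for (_, prev_acc), (ratio, acc) in zip(ordered, ordered[1:]):
        drop = prev_acc - acc
        if drop >= min_drop and (best is None or drop > best[1]):
            best = (ratio, drop)
    return best
```

Run it once per model × compressor combination and mark the returned ratio in the corresponding table row. A `None` result means accuracy degraded gradually, or not at all, under the chosen threshold.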