compare-models
Compare results across models (Qwen vs Llama) at matching compression configurations. Generates side-by-side tables and identifies cross-model patterns.
Search past decisions, failures, and experiment logs for relevant context before starting a task. Use this before any significant implementation or experiment to avoid repeating mistakes.
Analyze sweep results after completion. Computes accuracy, CFR, signal statistics, and generates comparison tables. Use after a sweep completes to understand the data.
After completing a significant task or experiment, extract lessons learned and update the project knowledge base. Captures what worked, what failed, and what to remember for next time.
Prepare and run a KV-cache compression sweep. Loads sweep configuration, validates prerequisites, and provides the exact commands needed. Use before starting any GPU experiment.
| Field | Value |
|---|---|
| name | compare-models |
| description | Compare results across models (Qwen vs Llama) at matching compression configurations. Generates side-by-side tables and identifies cross-model patterns. |
| argument-hint | ["num-prompts"] |
| allowed-tools | Read, Grep, Glob, Bash |
Generate a comprehensive cross-model comparison for all matching compression configurations.
Find matching result files for both models:
```bash
find results -name "*Qwen*_${0:-500}p.json" | sort
find results -name "*Llama*_${0:-500}p.json" | sort
```
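The two `find` calls list each model's files separately, but the comparison only makes sense for configurations that exist for both models. A minimal sketch of that pairing step follows, in POSIX sh plus awk. It assumes a hypothetical naming scheme `<press>_<ratio>_<model>_<N>p.json` in which the model name starts with `Qwen` or `Llama`, so everything before `_Qwen` or `_Llama` serves as the press × ratio key. `pair_results` is an illustrative helper, not part of the skill; its default of 500 prompts mirrors the `${0:-500}` default above.

```bash
# Sketch only: pair Qwen and Llama result files that share a press x ratio config.
# HYPOTHETICAL naming scheme: <press>_<ratio>_<model>_<N>p.json, where the model
# name starts with "Qwen" or "Llama" (e.g. snapkv_0.5_Qwen2.5-7B_500p.json).
# pair_results is an illustrative helper, not part of the skill.
# Usage: pair_results <results_dir> [num_prompts]
pair_results() {
  find "$1" -name "*_${2:-500}p.json" | awk '
    {
      # Take the file name (last path component) and key it by the prefix
      # that comes before the model name.
      n = split($0, parts, "/"); base = parts[n]
      if ((i = index(base, "_Qwen")) > 0)       qwen[substr(base, 1, i - 1)] = $0
      else if ((i = index(base, "_Llama")) > 0) llama[substr(base, 1, i - 1)] = $0
    }
    END {
      # Emit key<TAB>qwen_file<TAB>llama_file only for configs both models have.
      for (k in qwen) if (k in llama) printf "%s\t%s\t%s\n", k, qwen[k], llama[k]
    }
  ' | sort
}
```

For example, `pair_results results 500` prints one tab-separated line per shared configuration. Configurations present for only one model are dropped silently, so it is worth comparing the line count against the two `find` listings before building tables.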
For each press × ratio, load both models' results and extract:
Generate comparison tables:
Table: Accuracy Comparison
| Press | Ratio | Qwen 7B | Llama 8B | Delta |
|---|---|---|---|---|
Table: CFR Comparison
| Press | Ratio | Qwen CFR | Llama CFR | Delta |
|---|---|---|---|---|
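The Accuracy and CFR tables have the same shape: two per-model values and their difference. The sketch below renders either table from tab-separated rows of press, ratio, Qwen value, and Llama value. Producing those rows from the result JSON depends on a schema not shown here. The helper name `delta_table` and the sign convention (Delta = Llama value minus Qwen value) are assumptions; swap `$4 - $3` for `$3 - $4` if the opposite convention is wanted.

```bash
# Sketch only: render a cross-model markdown table with a Delta column.
# Input on stdin (HYPOTHETICAL layout): press<TAB>ratio<TAB>qwen_value<TAB>llama_value
# ASSUMED sign convention: Delta = Llama value minus Qwen value.
# delta_table is an illustrative helper, not part of the skill.
# Usage: delta_table "Qwen 7B" "Llama 8B" < accuracy.tsv
delta_table() {
  printf '| Press | Ratio | %s | %s | Delta |\n' "$1" "$2"
  printf '|---|---|---|---|---|\n'
  # Lines without exactly four tab-separated fields are skipped.
  awk -F'\t' 'NF == 4 {
    printf "| %s | %s | %.2f | %.2f | %+.2f |\n", $1, $2, $3, $4, $4 - $3
  }'
}
```

The same function covers the CFR table by passing `"Qwen CFR" "Llama CFR"` as the column labels.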
Table: Failure Mode Comparison
| Press | Ratio | Qwen Loop% | Llama Loop% | Qwen NT% | Llama NT% |
|---|---|---|---|---|---|
Identify patterns:
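One mechanical first pass is to flag the configurations where the two models disagree by more than a chosen margin, then read the flagged rows for trends. The sketch below takes tab-separated rows of press, ratio, Qwen value, and Llama value (a hypothetical layout; deriving it from the result JSON depends on the schema). The helper `flag_divergent` and its absolute-difference threshold are illustrative assumptions, not the skill's own criterion.

```bash
# Sketch only: flag press x ratio configs where the two models diverge.
# Input on stdin (HYPOTHETICAL layout): press<TAB>ratio<TAB>qwen_value<TAB>llama_value
# HYPOTHETICAL criterion: |llama - qwen| >= threshold (same units as the values).
# Output: press<TAB>ratio<TAB>model with the larger value ("tie" if equal).
# flag_divergent is an illustrative helper, not part of the skill.
# Usage: flag_divergent 5 < accuracy.tsv
flag_divergent() {
  awk -F'\t' -v t="$1" 'NF == 4 {
    d = $4 - $3
    if (d < 0) d = -d
    if (d >= t) {
      larger = ($4 > $3) ? "Llama" : (($4 < $3) ? "Qwen" : "tie")
      printf "%s\t%s\t%s\n", $1, $2, larger
    }
  }'
}
```

The output reports which model has the larger value, not which is better, since for a failure-style metric a larger value may be worse. When picking a threshold, note that if accuracy is reported as a percentage over the default 500 prompts, one percentage point corresponds to five prompts.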
Cross-model transfer implications: