sweep
| name | sweep |
| description | Prepare and run a KV-cache compression sweep. Loads sweep configuration, validates prerequisites, and provides the exact commands needed. Use before starting any GPU experiment. |
Before running a sweep, verify prerequisites and provide the correct commands.
1. Check environment:
   - `uv run python -c "import torch; print(f'CUDA: {torch.cuda.is_available()}, MPS: {torch.backends.mps.is_available()}')"` to verify GPU access
   - `make check` to ensure code is clean
2. Check existing results:
   - `find results -name "*_$1p.json" 2>/dev/null | wc -l` to count finished result files
   - `find results -name "*_$1p.json" -exec basename {} \;` to list them by name
   - `find results -name "*.ckpt.jsonl" -exec wc -l {} \;` to see how many lines each checkpoint holds
3. Validate model access:
   - `ls ~/.cache/huggingface/hub/models--$(echo "$0" | sed 's|/|--|g')/ 2>/dev/null` to confirm the model is in the local Hugging Face cache
   - `bash scripts/download_models.sh` to download any missing models
4. Provide the sweep command:
uv run kvguard sweep \
--num-prompts ${1:-500} \
--model "${0:-Qwen/Qwen2.5-7B-Instruct}" \
--output-dir results \
--max-new-tokens 512
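The result and model-cache checks that precede the sweep command can also be scripted. The following is a minimal Python sketch, not part of the project: the function names are hypothetical, and it assumes the default Hugging Face cache location (`~/.cache/huggingface/hub`), which environment variables may override.

```python
from pathlib import Path

# Assumption: default hub cache location; HF_HOME or similar may relocate it.
DEFAULT_HUB = Path.home() / ".cache" / "huggingface" / "hub"


def hf_cache_dir(model_id: str, hub: Path = DEFAULT_HUB) -> Path:
    """Map 'org/name' to the hub cache folder 'models--org--name'."""
    return hub / ("models--" + model_id.replace("/", "--"))


def count_results(results_dir: Path, num_prompts: int) -> int:
    """Count result files named '*_{num_prompts}p.json' anywhere under results_dir."""
    if not results_dir.is_dir():
        return 0
    return sum(1 for _ in results_dir.rglob(f"*_{num_prompts}p.json"))


def checkpoint_progress(results_dir: Path) -> dict[str, int]:
    """Return the number of lines in each '*.ckpt.jsonl' file under results_dir."""
    progress: dict[str, int] = {}
    if not results_dir.is_dir():
        return progress
    for ckpt in sorted(results_dir.rglob("*.ckpt.jsonl")):
        with ckpt.open() as fh:
            progress[str(ckpt.relative_to(results_dir))] = sum(1 for _ in fh)
    return progress
```

For example, `hf_cache_dir("Qwen/Qwen2.5-7B-Instruct").is_dir()` reports whether that model is already cached, and `count_results(Path("results"), 500)` mirrors the first `find` command for a 500-prompt sweep.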
For the full Phase 2 pipeline (both models + train + eval):
nohup bash scripts/run_phase2.sh &
# Monitor with: bash scripts/check_status.sh
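If the pipeline is launched from a Python driver rather than an interactive shell, a rough equivalent of `nohup ... &` can be sketched with the standard `subprocess` module. This is an illustration under assumptions, not the project's launcher: `launch_detached` and the log file name are hypothetical, and `start_new_session=True` is POSIX-only.

```python
import subprocess


def launch_detached(cmd: list[str], log_path: str) -> subprocess.Popen:
    """Start cmd in its own session, appending stdout and stderr to log_path."""
    log = open(log_path, "ab")
    try:
        return subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # detach from the launching terminal's session
        )
    finally:
        # The child holds its own copy of the descriptor, so the parent can close it.
        log.close()


# Hypothetical usage (log name is an assumption):
# proc = launch_detached(["bash", "scripts/run_phase2.sh"], "phase2.log")
# print(proc.pid)
```

Unlike `nohup`, which writes to `nohup.out` by default, this sketch makes the log destination explicit, which can be convenient when several runs share a directory.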
16 configs per model:
Expected output: results/{press}/{ModelShort}_{ratio}_{num_prompts}p.json
Checkpoint files: results/{press}/{ModelShort}_{ratio}.ckpt.jsonl (auto-resume on restart)
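The two naming patterns above can be turned into a small status check for a single configuration. This is a sketch under assumptions: the helper names are hypothetical, and because the document does not say how `{ModelShort}` or `{ratio}` are derived or formatted, both are passed in as ready-made strings.

```python
from pathlib import Path


def result_path(press: str, model_short: str, ratio: str,
                num_prompts: int, root: str = "results") -> Path:
    """results/{press}/{ModelShort}_{ratio}_{num_prompts}p.json"""
    return Path(root) / press / f"{model_short}_{ratio}_{num_prompts}p.json"


def checkpoint_path(press: str, model_short: str, ratio: str,
                    root: str = "results") -> Path:
    """results/{press}/{ModelShort}_{ratio}.ckpt.jsonl"""
    return Path(root) / press / f"{model_short}_{ratio}.ckpt.jsonl"


def run_status(press: str, model_short: str, ratio: str,
               num_prompts: int, root: str = "results") -> str:
    """Classify one config as 'done', 'resumable', or 'new' based on files on disk."""
    if result_path(press, model_short, ratio, num_prompts, root).exists():
        return "done"
    if checkpoint_path(press, model_short, ratio, root).exists():
        return "resumable"  # a restart would pick up from this checkpoint
    return "new"
```

Calling `run_status` for each configuration before a restart gives a quick picture of which runs would resume from a checkpoint and which would start from scratch.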
Search past decisions, failures, and experiment logs for relevant context before starting a task. Use this before any significant implementation or experiment to avoid repeating mistakes.
Analyze sweep results after completion. Computes accuracy, CFR, signal statistics, and generates comparison tables. Use after a sweep completes to understand the data.
Compare results across models (Qwen vs Llama) at matching compression configurations. Generates side-by-side tables and identifies cross-model patterns.
After completing a significant task or experiment, extract lessons learned and update the project knowledge base. Captures what worked, what failed, and what to remember for next time.