archive-run
// Archive old completed benchmark runs to save disk space and speed up scans. Triggers on archive runs, clean up runs, disk space, old runs.

- Audit benchmark suites against the ABC framework (Task/Outcome/Reporting validity). Checks instruction quality, verifier correctness, and reproducibility. Triggers on benchmark audit, audit benchmark, abc audit, task validity.
- Verify infrastructure readiness before launching benchmark runs: tokens, Docker, disk, credentials. Triggers on check infra, infrastructure check, ready to run, pre-run check.
- Compare benchmark results across agent configurations (baseline, SG_full). Shows where configs diverge. Triggers on compare configs, config comparison, which config wins, MCP impact.
- Token and cost analysis per run, suite, and config. Shows the most expensive tasks and compares cost across configs. Triggers on cost report, how much did it cost, token usage, spending.
- Generate the aggregate CSB evaluation report from completed Harbor runs. Triggers on generate report, eval report, ccb report, benchmark report.
- Compute information retrieval quality metrics (precision, recall, MRR, nDCG, MAP) comparing file retrieval across baseline and MCP configs against ground truth. Triggers on ir analysis, retrieval metrics, file recall, ground truth, search quality.


| Field | Value |
|---|---|
| name | archive-run |
| description | Archive old completed benchmark runs to save disk space and speed up scans. Triggers on archive runs, clean up runs, disk space, old runs. |
| user-invocable | true |

Move old, completed run directories into `runs/official/archive/` to save disk space and speed up scans.
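
Start with a dry run from the repository root:

```bash
cd ~/CodeScaleBench && python3 scripts/archive_run.py --older-than 7
```

Without `--execute`, the script only reports. Show the user the list of directories that would be archived, with each one's age, size, and result count.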
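
To illustrate what a dry run of this kind involves, here is a minimal Python sketch of age-based candidate selection. It is not the code in `scripts/archive_run.py`. The helper names (`find_archive_candidates`, `Candidate`, `dir_size`), reading the `--older-than` value as days, using the directory mtime as a run's age, and treating a run as completed when it holds at least one `result.json` are all assumptions made for illustration.

```python
import time
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86400


@dataclass
class Candidate:
    path: Path
    age_days: float
    size_bytes: int
    result_count: int


def dir_size(path: Path) -> int:
    """Total size in bytes of all regular files below path."""
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())


def find_archive_candidates(runs_root: Path, older_than_days: float,
                            now: Optional[float] = None) -> List[Candidate]:
    """Hypothetical dry-run selection; nothing is moved or deleted.

    Assumptions (not taken from archive_run.py): each direct subdirectory of
    runs_root other than "archive" is one run, a run's age is its directory
    mtime, and a run counts as completed if it holds at least one result.json.
    """
    now = time.time() if now is None else now
    candidates = []
    for run_dir in sorted(runs_root.iterdir()):
        if not run_dir.is_dir() or run_dir.name == "archive":
            continue
        age_days = (now - run_dir.stat().st_mtime) / SECONDS_PER_DAY
        if age_days <= older_than_days:
            continue  # too recent to archive
        result_count = sum(1 for _ in run_dir.rglob("result.json"))
        if result_count == 0:
            continue  # no results: treat as incomplete and leave it alone
        candidates.append(Candidate(run_dir, age_days, dir_size(run_dir), result_count))
    return candidates


if __name__ == "__main__":
    runs_root = Path("runs/official")
    if runs_root.is_dir():
        for c in find_archive_candidates(runs_root, older_than_days=7):
            print(f"{c.path.name}\t{c.age_days:.1f} days\t"
                  f"{c.size_bytes / 1e6:.1f} MB\t{c.result_count} results")
```

Passing `now` explicitly keeps the selection deterministic, which makes a sketch like this easy to test against directories with fixed timestamps.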

To actually move the listed directories into the archive, add `--execute`:

```bash
python3 scripts/archive_run.py --older-than 7 --execute
```

To archive one specific run by its directory name:

```bash
python3 scripts/archive_run.py --run-dir pytorch_opus_20260203_160607 --execute
```

Add `--compress` to compress runs as they are archived:

```bash
python3 scripts/archive_run.py --older-than 7 --execute --compress
```
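
How `--compress` packages a run is not specified here. As a hypothetical sketch, one way to archive a single run, optionally as a gzip-compressed tarball, uses only the standard library's `shutil`. The function name `move_to_archive`, the `.tar.gz` format, and the archive layout are assumptions for illustration, not the script's documented behavior.

```python
import shutil
from pathlib import Path


def move_to_archive(run_dir: Path, archive_root: Path, compress: bool = False) -> Path:
    """Hypothetical sketch: move one run directory into archive_root.

    With compress=True the run is packed into <name>.tar.gz and the original
    directory is deleted; otherwise the directory itself is moved. The layout
    and the gzip-tar format are assumptions, not archive_run.py's behavior.
    """
    archive_root.mkdir(parents=True, exist_ok=True)
    dest = archive_root / (run_dir.name + (".tar.gz" if compress else ""))
    if dest.exists():
        raise FileExistsError(f"refusing to overwrite {dest}")
    if compress:
        # make_archive appends ".tar.gz" to base_name and returns the file path;
        # root_dir/base_dir make the tarball's entries start with the run's name.
        archive_file = shutil.make_archive(
            str(archive_root / run_dir.name), "gztar",
            root_dir=str(run_dir.parent), base_dir=run_dir.name,
        )
        shutil.rmtree(run_dir)  # delete the original only once the tarball exists
        return Path(archive_file)
    return Path(shutil.move(str(run_dir), str(dest)))
```

Deleting the original only after `make_archive` returns means a failed compression leaves the run directory in place.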

To list runs that have already been archived:

```bash
python3 scripts/archive_run.py --list-archived
```

For machine-readable output, pass `--format json`:

```bash
python3 scripts/archive_run.py --older-than 7 --format json
```