check-infra
Verify infrastructure readiness before launching benchmark runs: tokens, Docker, disk, credentials. Triggers on check infra, infrastructure check, ready to run, pre-run check.
Archive old completed benchmark runs to save disk space and speed up scans. Triggers on archive runs, clean up runs, disk space, old runs.
Audit benchmark suites against ABC framework (Task/Outcome/Reporting validity). Checks instruction quality, verifier correctness, reproducibility. Triggers on benchmark audit, audit benchmark, abc audit, task validity.
Compare benchmark results across agent configurations (baseline, SG_full). Show where configs diverge. Triggers on compare configs, config comparison, which config wins, MCP impact.
Token and cost analysis per run, suite, and config. Shows most expensive tasks and config cost comparison. Triggers on cost report, how much did it cost, token usage, spending.
Generate the aggregate CSB evaluation report from completed Harbor runs. Triggers on generate report, eval report, ccb report, benchmark report.
Compute information retrieval quality metrics (precision, recall, MRR, nDCG, MAP) comparing file retrieval across baseline and MCP configs against ground truth. Triggers on ir analysis, retrieval metrics, file recall, ground truth, search quality.
| name | check-infra |
| description | Verify infrastructure readiness before launching benchmark runs — tokens, Docker, disk, credentials. Triggers on check infra, infrastructure check, ready to run, pre-run check. |
| user-invocable | true |
Verify all infrastructure prerequisites before launching a benchmark run.
- `.env.local` (project root) has `ANTHROPIC_API_KEY` and `SOURCEGRAPH_ACCESS_TOKEN`
- `harbor` CLI is installed
- `runs/official/` directory exists

`cd ~/CodeScaleBench && python3 scripts/check_infra.py`
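For orientation, the three prerequisites listed above can be sketched as a small standalone check. This is not the contents of `scripts/check_infra.py`, which is not shown here; the function names `env_file_keys` and `check_prerequisites` are hypothetical, and the sketch assumes `.env.local` uses plain `NAME=value` or `export NAME=value` lines.

```python
import shutil
from pathlib import Path

# Key names taken from the prerequisite list above.
REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "SOURCEGRAPH_ACCESS_TOKEN")


def env_file_keys(path):
    """Return the variable names defined in a shell-style env file.

    Assumption: one `NAME=value` or `export NAME=value` entry per line.
    """
    keys = set()
    if not path.is_file():
        return keys
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        name, sep, _value = line.partition("=")
        if sep:
            keys.add(name.strip())
    return keys


def check_prerequisites(root):
    """Return {check name: passed} for the three listed prerequisites."""
    defined = env_file_keys(root / ".env.local")
    return {
        ".env.local keys": all(k in defined for k in REQUIRED_KEYS),
        "harbor CLI": shutil.which("harbor") is not None,
        "runs/official/": (root / "runs" / "official").is_dir(),
    }


if __name__ == "__main__":
    # Project location taken from the command above.
    results = check_prerequisites(Path.home() / "CodeScaleBench")
    for name, ok in results.items():
        print(f"{'OK  ' if ok else 'FAIL'}  {name}")
```

The sketch only confirms that the keys are defined in the file; it does not validate the token values themselves.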
Show the table output directly — it's already formatted with color-coded status.
If any items are marked FAIL, provide the specific fix:
- `source configs/_common.sh && refresh_claude_token`
- `source configs/_common.sh && setup_multi_accounts && ensure_fresh_token_all`
- `.env.local` (project root) with required exports
- `sudo systemctl start docker`
- `python3 scripts/archive_run.py --older-than 7 --execute`
- `python3 scripts/check_infra.py --format json`
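To decide whether the Docker or disk fixes above apply, the two conditions can be probed directly. The following is a minimal sketch, not the logic of `scripts/check_infra.py`: the helper names and the `MIN_FREE_GB` threshold are hypothetical, and it assumes that `docker info` exits non-zero when the daemon is unreachable.

```python
import shutil
import subprocess

MIN_FREE_GB = 20.0  # hypothetical threshold; choose one that fits your runs


def docker_is_running(timeout=10.0):
    """Return True if the `docker` CLI exists and `docker info` succeeds."""
    docker = shutil.which("docker")
    if docker is None:
        return False  # CLI not on PATH
    try:
        result = subprocess.run(
            [docker, "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def free_disk_gb(path="."):
    """Return free space, in GiB, on the filesystem containing `path`."""
    return shutil.disk_usage(path).free / 1024**3


if __name__ == "__main__":
    if not docker_is_running():
        print("FAIL docker: try `sudo systemctl start docker`")
    if free_disk_gb() < MIN_FREE_GB:
        print("FAIL disk: consider archiving old runs")
```

Keeping each probe as a function that returns a plain value makes it easy to pair a failing check with its fix.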