cost-report

Name: Cost Report
Author: sourcegraph

// Token and cost analysis per run, suite, and config. Shows most expensive tasks and config cost comparison. Triggers on cost report, how much did it cost, token usage, spending.



stars:25

forks:3

updated: March 17, 2026, 01:39

SKILL.md


name: cost-report
description: Token and cost analysis per run, suite, and config. Shows most expensive tasks and config cost comparison. Triggers on cost report, how much did it cost, token usage, spending.
user-invocable: true

Cost Report

Analyze token usage and estimated cost across benchmark runs.

Steps

1. Run cost analysis

cd ~/CodeScaleBench && python3 scripts/cost_report.py

2. Present findings

The table output shows:

Total cost, tokens, and wall-clock hours
Per suite/config breakdown with average cost per task
Config cost comparison (is SG_full significantly more expensive than baseline?)
Top 10 most expensive individual tasks
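
The breakdown listed above can be sketched in a few lines of Python. This is only an illustration of the kind of aggregation involved, not the actual logic of scripts/cost_report.py: the record fields (task, suite, config, cost_usd, tokens), the sample values, and the helper names summarize and cost_ratio are all hypothetical. The ratio shown is a plain comparison of averages, not a statistical significance test.

```python
from collections import defaultdict

# Hypothetical per-task records; the real script's data model may differ.
RECORDS = [
    {"task": "t1", "suite": "csb_sdlc_pytorch", "config": "baseline",
     "cost_usd": 0.5, "tokens": 120_000},
    {"task": "t1", "suite": "csb_sdlc_pytorch", "config": "sourcegraph_full",
     "cost_usd": 0.75, "tokens": 150_000},
    {"task": "t2", "suite": "csb_sdlc_pytorch", "config": "baseline",
     "cost_usd": 0.25, "tokens": 60_000},
    {"task": "t2", "suite": "csb_sdlc_pytorch", "config": "sourcegraph_full",
     "cost_usd": 0.375, "tokens": 90_000},
]


def summarize(records, top_n=10):
    """Aggregate totals, per (suite, config) averages, and the costliest tasks."""
    groups = defaultdict(list)
    for r in records:
        groups[(r["suite"], r["config"])].append(r["cost_usd"])
    return {
        "total_cost": sum(r["cost_usd"] for r in records),
        "total_tokens": sum(r["tokens"] for r in records),
        "avg_cost_per_task": {k: sum(v) / len(v) for k, v in groups.items()},
        "top_tasks": sorted(records, key=lambda r: r["cost_usd"], reverse=True)[:top_n],
    }


def cost_ratio(summary, suite, config, baseline="baseline"):
    """Average cost per task of `config` relative to `baseline` within one suite."""
    avg = summary["avg_cost_per_task"]
    return avg[(suite, config)] / avg[(suite, baseline)]


if __name__ == "__main__":
    s = summarize(RECORDS)
    print(s["total_cost"], s["total_tokens"])
    print(cost_ratio(s, "csb_sdlc_pytorch", "sourcegraph_full"))
```

A ratio well above 1 for a config would be the cue to look at its entries in the top-tasks list before drawing conclusions.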

Variants

Filter to one suite

python3 scripts/cost_report.py --suite csb_sdlc_pytorch

Filter to one config

python3 scripts/cost_report.py --config sourcegraph_full

JSON output

python3 scripts/cost_report.py --format json
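
For scripted use, the variants above can be wrapped in a small helper. The sketch below uses only the flags shown in this document (--suite, --config, --format json). It assumes, without confirmation from the source, that the flags can be combined and that the JSON variant prints a single JSON document to stdout. The helper names build_command and run_report are hypothetical.

```python
import json
import subprocess
from pathlib import Path

# Checkout location used in step 1 of this skill.
REPO = Path.home() / "CodeScaleBench"


def build_command(suite=None, config=None, json_output=False):
    """Build the argv list for cost_report.py from the documented flags."""
    cmd = ["python3", "scripts/cost_report.py"]
    if suite:
        cmd += ["--suite", suite]
    if config:
        cmd += ["--config", config]
    if json_output:
        cmd += ["--format", "json"]
    return cmd


def run_report(suite=None, config=None):
    """Run the script with JSON output and parse stdout.

    Assumption: stdout contains exactly one JSON document.
    """
    proc = subprocess.run(
        build_command(suite=suite, config=config, json_output=True),
        cwd=REPO,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script exits non-zero
    )
    return json.loads(proc.stdout)
```

Calling build_command(suite="csb_sdlc_pytorch", json_output=True) yields the argv for a suite-filtered JSON report. The structure of the parsed result is not described in this document, so it should be inspected before relying on any particular key.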

related-skills.json

same repository

archive-run.md

from "sourcegraph/CodeScaleBench"

Archive old completed benchmark runs to save disk space and speed up scans. Triggers on archive runs, clean up runs, disk space, old runs.

2026-03-17, stars: 25

benchmark-audit.md

from "sourcegraph/CodeScaleBench"

Audit benchmark suites against ABC framework (Task/Outcome/Reporting validity). Checks instruction quality, verifier correctness, reproducibility. Triggers on benchmark audit, audit benchmark, abc audit, task validity.

2026-03-17, stars: 25

check-infra.md

from "sourcegraph/CodeScaleBench"

Verify infrastructure readiness before launching benchmark runs — tokens, Docker, disk, credentials. Triggers on check infra, infrastructure check, ready to run, pre-run check.

2026-03-17, stars: 25

compare-configs.md

from "sourcegraph/CodeScaleBench"

Compare benchmark results across agent configurations (baseline, SG_full). Show where configs diverge. Triggers on compare configs, config comparison, which config wins, MCP impact.

2026-03-17, stars: 25

generate-report.md

from "sourcegraph/CodeScaleBench"

Generate the aggregate CSB evaluation report from completed Harbor runs. Triggers on generate report, eval report, ccb report, benchmark report.

2026-03-17, stars: 25

ir-analysis.md

from "sourcegraph/CodeScaleBench"

Compute information retrieval quality metrics (precision, recall, MRR, nDCG, MAP) comparing file retrieval across baseline and MCP configs against ground truth. Triggers on ir analysis, retrieval metrics, file recall, ground truth, search quality.

2026-03-17, stars: 25
