
generate-report

Name: Generate Report
Author: sourcegraph

// Generate the aggregate CSB evaluation report from completed Harbor runs. Triggers on generate report, eval report, ccb report, benchmark report.


stars: 25

forks: 3

updated: 17 March 2026 at 01:39

SKILL.md


name	generate-report
description	Generate the aggregate CSB evaluation report from completed Harbor runs. Triggers on generate report, eval report, ccb report, benchmark report.
user-invocable	true

Generate CSB Evaluation Report

Generate the aggregate CodeScaleBench evaluation report from completed Harbor runs in runs/official/.

What This Does

Runs scripts/generate_eval_report.py, which:

- Discovers all completed runs in the official runs directory
- Extracts metrics from each task's result.json (and fallback sources)
- Enriches with selection metadata (SDLC phase, language, MCP benefit score)
- Filters to canonical selected tasks
- Produces three output files
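The enrich and filter steps above can be sketched as follows. This is only an illustration, not the code in scripts/generate_eval_report.py: the record field `task_id`, the metadata keys, and the shape of the selection mapping are all assumptions, since the source does not describe the script's internals or the layout of the selected-tasks file.

```python
from typing import Optional


def enrich_and_filter(
    records: list[dict],
    selection: Optional[dict[str, dict]],
) -> list[dict]:
    """Keep only selected tasks and merge their selection metadata.

    `records` are per-task metric dicts with a hypothetical "task_id" key.
    `selection` maps task_id -> metadata (hypothetical keys such as
    "sdlc_phase", "language", "mcp_benefit_score"). Passing None skips
    filtering and returns every discovered task unchanged.
    """
    if selection is None:
        return [dict(record) for record in records]

    enriched = []
    for record in records:
        metadata = selection.get(record["task_id"])
        if metadata is None:
            continue  # not a canonical selected task
        enriched.append({**record, **metadata})
    return enriched
```

Treating a missing selection as "keep everything" mirrors the documented behaviour of passing an empty --selected-tasks value, though how the script implements that is not shown in the source.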

Output Files (in `./eval_reports/`)

- eval_report.json: full structured data (all tasks, metrics, configs)
- REPORT.md: human-readable markdown with tables:
  - Run Inventory
  - Aggregate Performance (mean reward, pass rate by config)
  - Per-Benchmark Breakdown (reward matrix: benchmark x config)
  - Efficiency (tokens, wall clock, cost)
  - Tool Utilization (MCP vs local tool calls)
  - SWE-Bench Pro Partial Scores
  - Performance by SDLC Phase
  - Performance by Language
  - Performance by MCP Benefit Score
- CSV files: one per table for downstream analysis
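To make the Aggregate Performance table concrete, here is a minimal sketch of computing mean reward and pass rate per config. The record keys (`config`, `reward`, `passed`) are assumptions for illustration; the actual schema of eval_report.json is not documented here, and how the script decides that a task passed is not stated.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_config(tasks: list[dict]) -> dict[str, dict]:
    """Group task records by config and compute summary statistics.

    Each record is assumed to carry hypothetical keys:
    "config" (str), "reward" (float), "passed" (bool).
    """
    grouped: dict[str, list[dict]] = defaultdict(list)
    for task in tasks:
        grouped[task["config"]].append(task)

    return {
        config: {
            "n_tasks": len(rows),
            "mean_reward": mean(row["reward"] for row in rows),
            "pass_rate": sum(1 for row in rows if row["passed"]) / len(rows),
        }
        for config, rows in grouped.items()
    }


# Made-up records for illustration only, not real benchmark results.
example_tasks = [
    {"config": "baseline", "reward": 1.0, "passed": True},
    {"config": "baseline", "reward": 0.0, "passed": False},
    {"config": "SG_full", "reward": 1.0, "passed": True},
    {"config": "SG_full", "reward": 0.5, "passed": False},
]
summary = aggregate_by_config(example_tasks)
```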

Steps

First, show the user what runs are available:

echo "=== Completed runs ===" && \
ls runs/official/ 2>/dev/null && \
echo "" && \
echo "=== Task counts per run ===" && \
for run in runs/official/*/; do \
    count=$(find "$run" -name "result.json" \( -path "*/instance_*" -o -path "*__*" \) 2>/dev/null | wc -l); \
    echo "  $(basename "$run"): $count tasks with results"; \
done
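The same per-run count can be sketched in Python, which may be easier to reuse from other tooling. The function names are hypothetical. Unlike find, which tests the path as given on the command line, this sketch matches the patterns against the path starting at the run directory name.

```python
from fnmatch import fnmatchcase
from pathlib import Path

# The two path patterns used by the shell loop above.
TASK_PATTERNS = ("*/instance_*", "*__*")


def count_task_results(run_dir: Path) -> int:
    """Count result.json files under run_dir whose path matches a task pattern."""
    count = 0
    for path in run_dir.rglob("result.json"):
        # Match against "<run name>/.../result.json" with forward slashes.
        relative = path.relative_to(run_dir.parent).as_posix()
        if any(fnmatchcase(relative, pattern) for pattern in TASK_PATTERNS):
            count += 1
    return count


def summarize_runs(runs_root: Path) -> dict[str, int]:
    """Map each run directory name to its number of task results."""
    if not runs_root.is_dir():
        return {}
    return {
        run_dir.name: count_task_results(run_dir)
        for run_dir in sorted(runs_root.iterdir())
        if run_dir.is_dir()
    }
```

Calling `summarize_runs(Path("runs/official"))` from the repository root would give the counts that the shell loop prints.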

Generate the report:

cd ~/CodeScaleBench && \
python3 scripts/generate_eval_report.py \
    --runs-dir runs/official/ \
    --output-dir ./eval_reports/ \
    --selected-tasks ./configs/selected_benchmark_tasks.json

Display the REPORT.md summary to the user:

cat ./eval_reports/REPORT.md

Let the user know where to find the full data:

Report files written to ./eval_reports/:
  - REPORT.md (summary tables)
  - eval_report.json (full structured data)
  - *.csv (per-table CSV files)

Options

If the user asks for a report on a subset of runs or a specific directory, pass --runs-dir accordingly:

python3 scripts/generate_eval_report.py \
    --runs-dir /path/to/specific/runs/ \
    --output-dir ./eval_reports/

To skip CSV generation:

python3 scripts/generate_eval_report.py --no-csv

To skip task selection filtering (include ALL discovered tasks, not just canonical):

python3 scripts/generate_eval_report.py --selected-tasks ""
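The option combinations above can be assembled programmatically. The sketch below is a hypothetical helper (`build_report_command` is not part of the repository) that uses only the flags shown in this document and builds the argument list without running anything.

```python
import shlex
from typing import Optional

SCRIPT = "scripts/generate_eval_report.py"


def build_report_command(
    runs_dir: Optional[str] = None,
    output_dir: Optional[str] = None,
    selected_tasks: Optional[str] = None,
    no_csv: bool = False,
) -> list[str]:
    """Build the argv list for the report script from the documented flags."""
    cmd = ["python3", SCRIPT]
    if runs_dir is not None:
        cmd += ["--runs-dir", runs_dir]
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    if selected_tasks is not None:
        # An empty string is passed through as-is to skip task filtering.
        cmd += ["--selected-tasks", selected_tasks]
    if no_csv:
        cmd.append("--no-csv")
    return cmd


if __name__ == "__main__":
    # Print a shell-quoted preview of the command for a custom runs directory.
    print(shlex.join(build_report_command(
        runs_dir="/path/to/specific/runs/",
        output_dir="./eval_reports/",
    )))
```

Returning a list rather than a single string keeps the empty --selected-tasks value intact when the result is handed to `subprocess.run`.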

related-skills.json

Same repository

archive-run.md

from "sourcegraph/CodeScaleBench"

Archive old completed benchmark runs to save disk space and speed up scans. Triggers on archive runs, clean up runs, disk space, old runs.

2026-03-17, 25 stars

benchmark-audit.md

from "sourcegraph/CodeScaleBench"

Audit benchmark suites against ABC framework (Task/Outcome/Reporting validity). Checks instruction quality, verifier correctness, reproducibility. Triggers on benchmark audit, audit benchmark, abc audit, task validity.

2026-03-17, 25 stars

check-infra.md

from "sourcegraph/CodeScaleBench"

Verify infrastructure readiness before launching benchmark runs — tokens, Docker, disk, credentials. Triggers on check infra, infrastructure check, ready to run, pre-run check.

2026-03-17, 25 stars

compare-configs.md

from "sourcegraph/CodeScaleBench"

Compare benchmark results across agent configurations (baseline, SG_full). Show where configs diverge. Triggers on compare configs, config comparison, which config wins, MCP impact.

2026-03-17, 25 stars

cost-report.md

from "sourcegraph/CodeScaleBench"

Token and cost analysis per run, suite, and config. Shows most expensive tasks and config cost comparison. Triggers on cost report, how much did it cost, token usage, spending.

2026-03-17, 25 stars

ir-analysis.md

from "sourcegraph/CodeScaleBench"

Compute information retrieval quality metrics (precision, recall, MRR, nDCG, MAP) comparing file retrieval across baseline and MCP configs against ground truth. Triggers on ir analysis, retrieval metrics, file recall, ground truth, search quality.

2026-03-17, 25 stars

package.json

"author": "sourcegraph"

"repository": "sourcegraph/CodeScaleBench"


