Run any skill in Manus with one click

$pwd:

sv-report

Name: Sv Report
Author: intertwine

// Generate SV-Bench metrics reports (summary.json + report.md) for E1/E2 runs, validate metrics contracts, and produce comparison-friendly artifacts from outputs/evals/.


$ git log --oneline --stat

stars:3

forks:0

updated: February 4, 2026, 02:05

SKILL.md


name	sv-report
description	Generate SV-Bench metrics reports (summary.json + report.md) for E1/E2 runs, validate metrics contracts, and produce comparison-friendly artifacts from outputs/evals/.
metadata	{"author":"security-verifiers","version":"1.0"}

SV-Bench Reporting (E1/E2)

Generate the WP1 report artifacts for evaluation runs:

- summary.json (schema: bench/schemas/summary.schema.json)
- report.md (human-readable)
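Because summary.json is tied to a JSON Schema, a quick local sanity check can be useful when inspecting runs by hand. The minimal sketch below checks only top-level required keys. It assumes the schema declares a top-level "required" array, which this page does not confirm, and the function names are hypothetical. A full JSON Schema validator (for example the third-party jsonschema package) would also check types and nested structure.

```python
import json
from pathlib import Path
from typing import Any


def missing_required_keys(summary: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    # Only the schema's top-level "required" array is checked; types, nested
    # objects, and formats are ignored (a real validator would cover those).
    return [key for key in schema.get("required", []) if key not in summary]


def check_summary_file(summary_path: Path, schema_path: Path) -> list[str]:
    # Load both JSON documents from disk and report any missing required keys.
    summary = json.loads(summary_path.read_text(encoding="utf-8"))
    schema = json.loads(schema_path.read_text(encoding="utf-8"))
    return missing_required_keys(summary, schema)
```

An empty list means every top-level required key is present; it does not mean the file fully conforms to the schema.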

This skill is for report generation/validation, not running new evals (use sv-eval for that).

Prereqs

Use the repo venv (.venv/) or your preferred runner.
If you want to avoid any network calls during report generation, set:
- WEAVE_DISABLED=true

Per-run report (single directory)

WEAVE_DISABLED=true .venv/bin/svbench_report --env e1 --input outputs/evals/sv-env-network-logs--gpt-5-mini/<run_id> --strict
WEAVE_DISABLED=true .venv/bin/svbench_report --env e2 --input outputs/evals/sv-env-config-verification--gpt-5-mini/<run_id> --strict
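For scripting per-run reports from Python, a minimal sketch might wrap the documented command with subprocess. It uses only the flags shown above (--env, --input, --strict) plus the WEAVE_DISABLED=true setting from Prereqs; the function names are hypothetical, and the .venv/bin path assumes you run from the repo root.

```python
import os
import subprocess
from pathlib import Path


def svbench_report_command(env: str, run_dir: Path, strict: bool = True) -> list[str]:
    # Mirrors the documented invocation: --env e1|e2, --input <run dir>, optional --strict.
    if env not in ("e1", "e2"):
        raise ValueError(f"unknown env {env!r}; expected 'e1' or 'e2'")
    cmd = [".venv/bin/svbench_report", "--env", env, "--input", str(run_dir)]
    if strict:
        cmd.append("--strict")
    return cmd


def offline_env() -> dict[str, str]:
    # Copy the current environment and disable Weave to avoid network calls.
    env = dict(os.environ)
    env["WEAVE_DISABLED"] = "true"
    return env


def run_report(env: str, run_dir: Path) -> None:
    # check=True raises CalledProcessError if svbench_report exits non-zero.
    subprocess.run(svbench_report_command(env, run_dir), env=offline_env(), check=True)
```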

Outputs are written into the same run directory:

outputs/evals/.../<run_id>/summary.json
outputs/evals/.../<run_id>/report.md
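To confirm that a run directory actually received both artifacts (for example after a batch run), a small hypothetical helper can check for the two file names listed above:

```python
from pathlib import Path

# File names the report step writes into each run directory.
REPORT_FILES = ("summary.json", "report.md")


def missing_report_files(run_dir: Path) -> list[str]:
    # Return the expected report files that are not present in run_dir.
    return [name for name in REPORT_FILES if not (run_dir / name).is_file()]
```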

Batch-generate reports for many runs

Generate reports for all non-archived runs under outputs/evals/:

.venv/bin/python scripts/generate_svbench_reports.py

Only E1:

.venv/bin/python scripts/generate_svbench_reports.py --env e1 --strict

Only E2:

.venv/bin/python scripts/generate_svbench_reports.py --env e2 --strict

Specific run IDs:

.venv/bin/python scripts/generate_svbench_reports.py --run-ids d4e7f897 cb97305e
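The selection options above can be pictured as a directory walk over outputs/evals/<env-dir>/<run_id>. This is a sketch under assumptions, not the script's implementation: the e1/e2 prefixes come from the example paths in the per-run commands, find_runs is a hypothetical name, and treating any directory whose name contains "archive" as archived is a guess, since this page does not say how archived runs are marked.

```python
from pathlib import Path
from typing import Iterable, Optional

# Env-directory prefixes taken from the example paths in the per-run commands.
ENV_PREFIXES = {
    "e1": "sv-env-network-logs",
    "e2": "sv-env-config-verification",
}


def _is_archived(path: Path) -> bool:
    # Hypothetical rule: a directory counts as archived if its name mentions "archive".
    return "archive" in path.name.lower()


def _subdirs(path: Path) -> list[Path]:
    # Sorted immediate subdirectories, so results are deterministic.
    return sorted(p for p in path.iterdir() if p.is_dir())


def find_runs(
    evals_root: Path,
    env: Optional[str] = None,
    run_ids: Optional[Iterable[str]] = None,
) -> list[Path]:
    # env=None means every env directory; otherwise only the matching prefix.
    prefix = ENV_PREFIXES[env] if env is not None else ""
    wanted = set(run_ids) if run_ids is not None else None
    runs = []
    for env_dir in _subdirs(evals_root):
        if _is_archived(env_dir) or not env_dir.name.startswith(prefix):
            continue
        for run_dir in _subdirs(env_dir):
            if _is_archived(run_dir):
                continue
            if wanted is None or run_dir.name in wanted:
                runs.append(run_dir)
    return runs
```

Roughly, find_runs(Path("outputs/evals"), env="e1") mirrors the "Only E1" selection, and run_ids=["d4e7f897", "cb97305e"] mirrors the last command.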

Comparison reports (across runs)

The Make targets produce comparison-friendly JSON across runs:

make report-network-logs
make report-config-verification

These are intended for quick comparisons and dashboards. The contract-grade per-run artifacts are generated via bench.report / svbench_report.
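This page does not describe the Make targets' output format. If you only need an ad-hoc comparison built from the per-run summary.json files, a sketch like the following collects each run's top-level numeric fields into one row per run. It assumes nothing about which metric names summary.json contains; nested metrics are skipped, and the helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any


def numeric_fields(summary: dict[str, Any]) -> dict[str, float]:
    # Keep top-level int/float values (bool is excluded, since it subclasses int).
    return {
        key: float(value)
        for key, value in summary.items()
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    }


def comparison_rows(run_dirs: list[Path]) -> list[dict[str, Any]]:
    # One row per run: the run id plus whatever numeric top-level fields exist.
    rows = []
    for run_dir in run_dirs:
        summary = json.loads((run_dir / "summary.json").read_text(encoding="utf-8"))
        rows.append({"run_id": run_dir.name, **numeric_fields(summary)})
    return rows
```

Rows with differing keys can be written out with csv.DictWriter by passing the union of all keys as fieldnames.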

related-skills.json

same repository

sv-eval.md

from "intertwine/security-verifiers"

Run and analyze Security Verifiers evaluations. Use when asked to evaluate models on E1 (network-logs) or E2 (config-verification), generate metrics reports, compare model performance, or analyze eval results.

2026-02-04 · stars:3

sv-deploy.md

from "intertwine/security-verifiers"

Deploy Security Verifiers environments and packages. Use when asked to deploy to Prime Intellect Environments Hub, publish to PyPI, bump versions, build wheels, or manage releases.

2026-01-24 · stars:3

sv-data.md

from "intertwine/security-verifiers"

Build and manage Security Verifiers datasets. Use when asked to build E1 or E2 datasets, create test fixtures, validate data, or manage dataset files for network-logs or config-verification environments.

2026-01-24 · stars:3

sv-dev.md

from "intertwine/security-verifiers"

Development workflow for Security Verifiers. Use when asked to run tests, lint code, format files, set up the development environment, or perform CI checks on the codebase.

2026-01-24 · stars:3

sv-hf.md

from "intertwine/security-verifiers"

Manage HuggingFace datasets for Security Verifiers. Use when asked to push datasets to HuggingFace, manage metadata, configure gated access, or set up user HF repositories for E1/E2 datasets.

2026-01-24 · stars:3

package.json

"author": "intertwine"

"repository": "intertwine/security-verifiers"


$ useful --forSOC

Software Quality Assurance Analysts and Testers · Computer and Mathematical Occupations · 15-1253 · L4

