
report

Stars: 19

Forks: 0

Last updated: April 27, 2026 at 14:26

Display a terminal scorecard of benchmark results showing pass rates, scores by difficulty, and per-test breakdowns. Use when the user asks about benchmark results, scores, or wants to see how their SDK performed.


Source

PSPDFKit-labs

PSPDFKit-labs/agentic-usability


Related occupations (SOC)

Based on the SOC occupational classification

Software Quality Assurance Analysts and Testers · Computer and Mathematical Occupations · SOC 15-1253

SKILL.md


name	report
description	Display a terminal scorecard of benchmark results showing pass rates, scores by difficulty, and per-test breakdowns. Use when the user asks about benchmark results, scores, or wants to see how their SDK performed.
argument-hint	[project-directory] [--json] [--run runId]
allowed-tools	Bash(agentic-usability *) Read Glob

Benchmark Report

Display the benchmark scorecard for the pipeline.

agentic-usability report -p $ARGUMENTS

Options

--json: Output raw structured JSON instead of the colored table
--run <runId>: Show results for a specific run (default: latest)
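
As a minimal sketch, the options above can be assembled into an argument list programmatically. The helper name `build_report_command` and its parameters are hypothetical; only the `report` subcommand and the `-p`, `--json`, and `--run` flags come from this page.

```python
from typing import List, Optional


def build_report_command(project_dir: str, as_json: bool = False,
                         run_id: Optional[str] = None) -> List[str]:
    """Assemble the argv for `agentic-usability report` (hypothetical helper)."""
    cmd = ["agentic-usability", "report", "-p", project_dir]
    if as_json:
        # Raw structured JSON instead of the colored table
        cmd.append("--json")
    if run_id is not None:
        # Omit --run to let the CLI fall back to the latest run
        cmd.extend(["--run", run_id])
    return cmd
```

The resulting list can be handed to `subprocess.run` without shell quoting concerns.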

Where Results Live

results/<runId>/
  report.json                        # Aggregate scorecard
  <targetName>/<testId>/
    judge.json                       # Per-test judge scores
    generated-solution.json          # Agent's solution
    agent-notes.md                   # Agent's working notes
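
The layout above maps directly onto path construction. The following sketch assumes the tree is exactly as shown; the function name `result_paths` and the dictionary keys are illustrative, not part of the tool.

```python
from pathlib import Path
from typing import Dict


def result_paths(results_root: str, run_id: str,
                 target_name: str, test_id: str) -> Dict[str, Path]:
    """Return the expected file locations for one test case in one run."""
    run_dir = Path(results_root) / run_id
    test_dir = run_dir / target_name / test_id
    return {
        "report": run_dir / "report.json",                # aggregate scorecard
        "judge": test_dir / "judge.json",                 # per-test judge scores
        "solution": test_dir / "generated-solution.json", # agent's solution
        "notes": test_dir / "agent-notes.md",             # agent's working notes
    }
```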

Scoring Dimensions

Dimension	Range	What it measures
`apiDiscovery`	0-100	Found correct SDK endpoints/methods?
`callCorrectness`	0-100	API calls constructed correctly?
`completeness`	0-100	All requirements handled?
`functionalCorrectness`	0-100	Code runs and produces correct output?
`overallVerdict`	boolean	Solution works? (pass/fail)

The report aggregates these across all test cases and breaks them down by difficulty (easy/medium/hard).
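
To illustrate what such an aggregation could look like, here is a rough sketch. It assumes each per-test record carries the five dimensions under the names in the table plus a `difficulty` field, and that numeric dimensions are averaged; the record shape, the `passRate` and `tests` keys, and the use of a plain mean are all assumptions, not the tool's documented behavior.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Mapping

SCORE_DIMENSIONS = ("apiDiscovery", "callCorrectness",
                    "completeness", "functionalCorrectness")


def summarize_by_difficulty(records: Iterable[Mapping]) -> Dict[str, dict]:
    """Group records by difficulty, average each 0-100 dimension, compute pass rate."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["difficulty"]].append(rec)  # assumed field name

    summary = {}
    for difficulty, recs in groups.items():
        entry = {dim: mean(r[dim] for r in recs) for dim in SCORE_DIMENSIONS}
        # overallVerdict is a boolean, so the pass rate is the share of True values
        entry["passRate"] = sum(1 for r in recs if r["overallVerdict"]) / len(recs)
        entry["tests"] = len(recs)
        summary[difficulty] = entry
    return summary
```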

Finding Runs

Runs are stored as subdirectories in results/ containing run.json:

{ "id": "run-2026-04-25T10-30-00-000Z", "createdAt": "...", "targets": [...], "testCount": 15, "label": "..." }

To list all runs, look for results/*/run.json files.
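
A minimal sketch of that lookup follows. It assumes every `run.json` has an `id` field shaped like the example above, so that sorting by `id` also sorts chronologically; the function name `list_runs` is hypothetical, and whether the CLI picks its "latest" run the same way is not stated here.

```python
import json
from pathlib import Path
from typing import List


def list_runs(results_root: str) -> List[dict]:
    """Load every results/*/run.json and return the runs sorted by id."""
    runs = []
    for run_file in Path(results_root).glob("*/run.json"):
        with run_file.open(encoding="utf-8") as fh:
            runs.append(json.load(fh))
    # Assumption: ids embed a sortable timestamp, so the last entry is the newest
    return sorted(runs, key=lambda run: run["id"])
```

With this ordering, `list_runs("results")[-1]` would be the most recent run, provided the directory contains at least one.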

Present the results to the user. If they want deeper analysis, suggest using the insights skill.

For detailed file inventory, see pipeline-guide.md.

More from this repository

init

PSPDFKit-labs/agentic-usability

Initialize a new agentic-usability benchmark pipeline project. Use when setting up a new SDK benchmark, creating a config.json, or starting a new evaluation project.

2026-05-14 · 19 stars

sandbox

PSPDFKit-labs/agentic-usability

Launch an interactive shell inside a microsandbox for debugging. Supports bare mode, executor setup, or judge setup with optional test case scaffolding.

2026-05-14 · 19 stars

eval

PSPDFKit-labs/agentic-usability

Run the full evaluation pipeline (execute, judge, report) for an SDK usability benchmark. Use when running a complete benchmark end-to-end, resuming an interrupted pipeline, or checking pipeline status.

2026-04-27 · 19 stars

execute

PSPDFKit-labs/agentic-usability

Execute benchmark test cases in sandboxed environments with AI agents. Spins up microsandbox containers for each test case and extracts solutions.

2026-04-27 · 19 stars

export

PSPDFKit-labs/agentic-usability

Export a benchmark pipeline as a zip file for sharing or archiving. Excludes cache and large snapshots.

2026-04-27 · 19 stars

generate

PSPDFKit-labs/agentic-usability

Generate SDK usability test cases by exploring source code. Use when creating benchmark test suites, generating test cases for an SDK, or when the user wants to create evaluation scenarios.

2026-04-27 · 19 stars
