insights

Stars: 19

Forks: 0

Updated: April 27, 2026 at 14:26

Analyze benchmark results and identify SDK improvement areas. Use when reviewing evaluation results, finding failure patterns, identifying documentation gaps, or understanding API design issues.

Source

PSPDFKit-labs

PSPDFKit-labs/agentic-usability

Related occupations (SOC)

Based on the SOC occupational classification

Software Developers · Computer and Mathematical Occupations · SOC 15-1252

SKILL.md

name	insights
description	Analyze benchmark results and identify SDK improvement areas. Use when reviewing evaluation results, finding failure patterns, identifying documentation gaps, or understanding API design issues.
argument-hint	[project-directory]
context	fork
allowed-tools	Read Glob Grep

SDK Usability Insights

You are acting as an SDK usability analyst. Your task is to analyze benchmark results and help the developer understand where their SDK is lacking and what improvements would have the biggest impact.

Files Available for Deep Dives

Results are at results/<runId>/<target>/<testId>/:

File	Content
`judge.json`	Scores (0-100 each): apiDiscovery, callCorrectness, completeness, functionalCorrectness; also overallVerdict and notes
`generated-solution.json`	Agent's solution `[{path, content}]`
`agent-notes.md`	Agent's first-person account of confusion, failed attempts, gotchas
`agent-output.log`	Raw agent stdout/stderr
`agent-session.jsonl`	Full agent conversation log
`agent-egress.log.json`	Network traffic (what URLs the agent accessed)
`judge-session.jsonl`	Judge conversation log
`judge-egress.log.json`	Judge network traffic
`workspace-snapshot.tar.gz`	Full sandbox state
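
As a starting point for a deep dive, the directory layout above can be walked to collect every `judge.json` together with the run, target, and test it belongs to. This is a minimal sketch, not part of the agentic-usability tool: the helper name `load_judge_results` is hypothetical, and it assumes only the `results/<runId>/<target>/<testId>/` layout described above and that `judge.json` parses as JSON.

```python
import json
from pathlib import Path


def load_judge_results(results_root):
    """Collect judge.json files laid out as <runId>/<target>/<testId>/judge.json.

    Hypothetical helper: it relies only on the directory layout described
    above and makes no assumption about which keys judge.json contains.
    """
    root = Path(results_root)
    rows = []
    for judge_path in sorted(root.glob("*/*/*/judge.json")):
        # The three path components above the file identify the test run.
        run_id, target, test_id = judge_path.relative_to(root).parts[:3]
        with judge_path.open(encoding="utf-8") as fh:
            judge = json.load(fh)
        rows.append(
            {"runId": run_id, "target": target, "testId": test_id, "judge": judge}
        )
    return rows
```

Calling `load_judge_results("results")` from the project root would return one dictionary per test directory that contains a `judge.json`; directories without one are skipped.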

The test suite with reference solutions is at suite.json in the project root.
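
When comparing an agent's output against a reference solution, it can help to unpack `generated-solution.json` back into real files so they can be diffed or opened in an editor. The sketch below assumes only the `[{path, content}]` shape listed in the table above; the helper name `unpack_solution` and the output directory are hypothetical, and the layout of `suite.json` is not assumed at all.

```python
import json
from pathlib import Path


def unpack_solution(solution_json, out_dir):
    """Write each {path, content} entry of generated-solution.json under out_dir.

    Hypothetical helper: assumes the file is a JSON list of objects with
    string fields "path" and "content", as described in the table above.
    """
    out_root = Path(out_dir).resolve()
    entries = json.loads(Path(solution_json).read_text(encoding="utf-8"))
    written = []
    for entry in entries:
        dest = (out_root / entry["path"]).resolve()
        # Refuse absolute paths or "../" tricks that would escape out_dir.
        if out_root not in dest.parents:
            raise ValueError(f"path escapes output directory: {entry['path']}")
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_text(entry["content"], encoding="utf-8")
        written.append(dest)
    return written
```

The path check is a precaution: the solution was produced by an agent in a sandbox, so its paths are treated as untrusted input here.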

Scoring Context

Score bands (0-100):
0-20: Fundamentally wrong
21-40: Major issues
41-60: Notable mistakes
61-80: Minor issues
81-100: Excellent

overallVerdict can be true even when apiDiscovery is low, because the agent took a different but working approach.
DNF (did not finish) entries have all-zero scores.
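
The scoring rules above can be sketched as two small helpers: one that maps a score to its band, and one that flags likely DNF entries. This is an illustrative sketch, not the tool's own logic. It assumes the four score names appear as top-level numeric fields in `judge.json`, and it treats any all-zero entry as a DNF, which is the converse of the rule stated above and could misclassify a completed run that genuinely scored zero everywhere.

```python
# Assumption: these four names are top-level numeric fields in judge.json.
SCORE_KEYS = ("apiDiscovery", "callCorrectness", "completeness", "functionalCorrectness")

# Inclusive upper bound of each band, taken from the scoring context above.
BANDS = [
    (20, "Fundamentally wrong"),
    (40, "Major issues"),
    (60, "Notable mistakes"),
    (80, "Minor issues"),
    (100, "Excellent"),
]


def score_band(score):
    """Return the band label for a score in the range 0-100."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    for upper, label in BANDS:
        if score <= upper:
            return label


def looks_like_dnf(judge):
    """Heuristic: treat an entry as DNF when all four scores are zero."""
    return all(judge.get(key, 0) == 0 for key in SCORE_KEYS)


def weakest_dimension(judge):
    """Return the name of the lowest-scoring dimension (first one on ties)."""
    return min(SCORE_KEYS, key=lambda key: judge.get(key, 0))
```

Filtering out entries where `looks_like_dnf` is true before averaging keeps unfinished runs from dragging down every dimension equally and hiding which one is actually weakest.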

The Full Analyst Prompt

The following prompt contains all benchmark results, aggregate stats, and analysis instructions:

!agentic-usability insights --prompt-only -p $ARGUMENTS

More from this repository

init

PSPDFKit-labs/agentic-usability

Initialize a new agentic-usability benchmark pipeline project. Use when setting up a new SDK benchmark, creating a config.json, or starting a new evaluation project.

2026-05-14 · Stars: 19

sandbox

PSPDFKit-labs/agentic-usability

Launch an interactive shell inside a microsandbox for debugging. Supports bare mode, executor setup, or judge setup with optional test case scaffolding.

2026-05-14 · Stars: 19

eval

PSPDFKit-labs/agentic-usability

Run the full evaluation pipeline (execute, judge, report) for an SDK usability benchmark. Use when running a complete benchmark end-to-end, resuming an interrupted pipeline, or checking pipeline status.

2026-04-27 · Stars: 19

execute

PSPDFKit-labs/agentic-usability

Execute benchmark test cases in sandboxed environments with AI agents. Spins up microsandbox containers for each test case and extracts solutions.

2026-04-27 · Stars: 19

export

PSPDFKit-labs/agentic-usability

Export a benchmark pipeline as a zip file for sharing or archiving. Excludes cache and large snapshots.

2026-04-27 · Stars: 19

generate

PSPDFKit-labs/agentic-usability

Generate SDK usability test cases by exploring source code. Use when creating benchmark test suites, generating test cases for an SDK, or when the user wants to create evaluation scenarios.

2026-04-27 · Stars: 19
