GitHub repository

agentic-usability

agentic-usability collects 10 skills from PSPDFKit-labs, with repository-level occupation coverage and an on-site detail page for each skill.

Skills collected: 10
Stars: 19
Updated: 2026-05-14
Forks: 0
Occupation coverage: 3 occupation categories · 100% classified

Skills in this repository

init
Software Developers

Initialize a new agentic-usability benchmark pipeline project. Use when setting up a new SDK benchmark, creating a config.json, or starting a new evaluation project.

2026-05-14
sandbox
Network and Computer Systems Administrators

Launch an interactive shell inside a microsandbox for debugging. Supports bare mode, executor setup, or judge setup with optional test case scaffolding.

2026-05-14
eval
Software Quality Assurance Analysts and Testers

Run the full evaluation pipeline (execute, judge, report) for an SDK usability benchmark. Use when running a complete benchmark end-to-end, resuming an interrupted pipeline, or checking pipeline status.

2026-04-27
execute
Software Quality Assurance Analysts and Testers

Execute benchmark test cases in sandboxed environments with AI agents. Spins up microsandbox containers for each test case and extracts solutions.

2026-04-27
export
Software Developers

Export a benchmark pipeline as a zip file for sharing or archiving. Excludes cache and large snapshots.

2026-04-27
generate
Software Quality Assurance Analysts and Testers

Generate SDK usability test cases by exploring source code. Use when creating benchmark test suites, generating test cases for an SDK, or when the user wants to create evaluation scenarios.

2026-04-27
insights
Software Developers

Analyze benchmark results and identify SDK improvement areas. Use when reviewing evaluation results, finding failure patterns, identifying documentation gaps, or understanding API design issues.

2026-04-27
inspect
Software Developers

Open the web UI to visually inspect, edit, and run the benchmark pipeline. Use when the user wants a visual interface for their pipeline.

2026-04-27
judge
Software Quality Assurance Analysts and Testers

Have an LLM judge compare reference and generated solutions, scoring on API discovery, correctness, completeness, and functional correctness.

2026-04-27
report
Software Quality Assurance Analysts and Testers

Display a terminal scorecard of benchmark results showing pass rates, scores by difficulty, and per-test breakdowns. Use when the user asks about benchmark results, scores, or wants to see how their SDK performed.

2026-04-27