Skip to main content
Manus에서 모든 스킬 실행
원클릭으로
GitHub 저장소

agentic-usability

agentic-usability에는 PSPDFKit-labs에서 수집한 skills 10개가 있으며, 저장소 수준 직업 범위와 사이트 내 skill 상세 페이지를 제공합니다.

수집된 skills
10
Stars
19
업데이트
2026-05-14
Forks
0
직업 범위
직업 카테고리 3개 · 100% 분류됨
저장소 탐색

이 저장소의 skills

init
소프트웨어 개발자

Initialize a new agentic-usability benchmark pipeline project. Use when setting up a new SDK benchmark, creating a config.json, or starting a new evaluation project.

2026-05-14
sandbox
네트워크·컴퓨터 시스템 관리자

Launch an interactive shell inside a microsandbox for debugging. Supports bare mode, executor setup, or judge setup with optional test case scaffolding.

2026-05-14
eval
소프트웨어 품질 보증 분석가·테스터

Run the full evaluation pipeline (execute, judge, report) for an SDK usability benchmark. Use when running a complete benchmark end-to-end, resuming an interrupted pipeline, or checking pipeline status.

2026-04-27
execute
소프트웨어 품질 보증 분석가·테스터

Execute benchmark test cases in sandboxed environments with AI agents. Spins up microsandbox containers for each test case and extracts solutions.

2026-04-27
export
소프트웨어 개발자

Export a benchmark pipeline as a zip file for sharing or archiving. Excludes cache and large snapshots.

2026-04-27
generate
소프트웨어 품질 보증 분석가·테스터

Generate SDK usability test cases by exploring source code. Use when creating benchmark test suites, generating test cases for an SDK, or when the user wants to create evaluation scenarios.

2026-04-27
insights
소프트웨어 개발자

Analyze benchmark results and identify SDK improvement areas. Use when reviewing evaluation results, finding failure patterns, identifying documentation gaps, or understanding API design issues.

2026-04-27
inspect
소프트웨어 개발자

Open the web UI to visually inspect, edit, and run the benchmark pipeline. Use when the user wants a visual interface for their pipeline.

2026-04-27
judge
소프트웨어 품질 보증 분석가·테스터

Have an LLM judge compare reference and generated solutions, scoring on API discovery, correctness, completeness, and functional correctness.

2026-04-27
report
소프트웨어 품질 보증 분석가·테스터

Display a terminal scorecard of benchmark results showing pass rates, scores by difficulty, and per-test breakdowns. Use when the user asks about benchmark results, scores, or wants to see how their SDK performed.

2026-04-27