GitHub repository

agentic-usability

agentic-usability contains 10 bundled skills from PSPDFKit-labs, with repository-level occupational coverage and a dedicated page for each skill on this site.

Bundled skills: 10
Stars: 19
Updated: 2026-05-14
Forks: 0
Occupational coverage: 3 occupational categories · 100% classified

Skills in this repository

init
Software Developers

Initialize a new agentic-usability benchmark pipeline project. Use when setting up a new SDK benchmark, creating a config.json, or starting a new evaluation project.

2026-05-14
sandbox
Network and Computer Systems Administrators

Launch an interactive shell inside a microsandbox for debugging. Supports bare mode, executor setup, or judge setup with optional test case scaffolding.

2026-05-14
eval
Software Quality Assurance Analysts and Testers

Run the full evaluation pipeline (execute, judge, report) for an SDK usability benchmark. Use when running a complete benchmark end-to-end, resuming an interrupted pipeline, or checking pipeline status.

2026-04-27
execute
Software Quality Assurance Analysts and Testers

Execute benchmark test cases in sandboxed environments with AI agents. Spins up microsandbox containers for each test case and extracts solutions.

2026-04-27
export
Software Developers

Export a benchmark pipeline as a zip file for sharing or archiving. Excludes cache and large snapshots.

2026-04-27
generate
Software Quality Assurance Analysts and Testers

Generate SDK usability test cases by exploring source code. Use when creating benchmark test suites, generating test cases for an SDK, or when the user wants to create evaluation scenarios.

2026-04-27
insights
Software Developers

Analyze benchmark results and identify SDK improvement areas. Use when reviewing evaluation results, finding failure patterns, identifying documentation gaps, or understanding API design issues.

2026-04-27
inspect
Software Developers

Open the web UI to visually inspect, edit, and run the benchmark pipeline. Use when the user wants a visual interface for their pipeline.

2026-04-27
judge
Software Quality Assurance Analysts and Testers

Have an LLM judge compare reference and generated solutions, scoring on API discovery, correctness, completeness, and functional correctness.

2026-04-27
report
Software Quality Assurance Analysts and Testers

Display a terminal scorecard of benchmark results showing pass rates, scores by difficulty, and per-test breakdowns. Use when the user asks about benchmark results, scores, or wants to see how their SDK performed.

2026-04-27