
agent-evaluation

Tests and benchmarks LLM agents, covering behavioral testing, capability assessment, reliability metrics, and production monitoring. Use when evaluating agent quality, designing eval suites, building regression tests, or measuring real-world reliability beyond benchmark scores.

Stars: 11
Forks: 1
Updated: May 25, 2026 at 13:43
SKILL.md