agent-evaluation

Tests and benchmarks LLM agents, covering behavioral testing, capability assessment, reliability metrics, and production monitoring. Use when evaluating agent quality, designing eval suites, building regression tests, or measuring real-world reliability beyond benchmark scores.

Stars: 11
Forks: 1
Updated: May 25, 2026 at 13:43
SKILL.md