agent-evaluation

Testing and benchmarking LLM agents: behavioral testing, capability assessment, reliability metrics, and production monitoring. Even top agents score below 50% on real-world benchmarks.

Install command
npx skills add https://github.com/sickn33/antigravity-awesome-skills --skill agent-evaluation

Copy and paste this command into Claude Code to install the skill

Stars: 39,610
Forks: 6,420
Updated: April 13, 2026 at 22:14