ai-agent-evaluation

Stars: 161
Forks: 16
Updated: April 14, 2026 at 07:59

Comprehensive evaluation patterns for AI agents, including multi-turn conversation testing, LLM-as-judge frameworks, benchmark suites, regression detection, and systematic eval pipelines for measuring agent quality and safety.

Installation

Install with Codex or Claude: copy this prompt, paste it into Codex, Claude, or another assistant, and let it review the skill page and install it for you.

SKILL.md