
ai-agent-evaluation

// Comprehensive evaluation patterns for AI agents including multi-turn conversation testing, LLM-as-judge frameworks, benchmark suites, regression detection, and systematic eval pipelines for measuring agent quality and safety.

$ git log --oneline --stat
stars: 127
forks: 9
updated: April 14, 2026 at 07:59
SKILL.md