
evaluation-testing

Use this skill to design and execute evaluation frameworks for LLM agents, implement trajectory testing, deploy LLM-as-judge patterns, build automated eval pipelines, and integrate agent testing into CI/CD workflows. This skill enforces: structured behavioral assertions, trajectory-vs-outcome evaluation matrices, verifier agent topologies, regression detection baselines, hallucination scoring engines, and benchmark dataset lifecycle management. Do NOT use for: unit testing traditional software, load/performance testing infrastructure, or model fine-tuning data preparation.

Stars: 7
Forks: 0
Last updated: June 5, 2026 at 09:02
File explorer
9 files
SKILL.md