
agentic-eval-first-development

// Architect, execute, and iterate on AI evaluations using the Data-Task-Score framework. Treats evals as the modern, quantifiable version of a PRD. Use when the user asks to "build an eval," "improve model quality," "test an agent workflow," "quantify product intuition," "move beyond vibe checks," "measure AI output," "score LLM responses," "benchmark a prompt," or "set up evaluation infrastructure." Also triggers on phrases like "how do I know if this is working," "is the model getting better," or "eval-driven development."
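The Data-Task-Score framing can be sketched as three pluggable parts. This is a minimal illustration under stated assumptions, not the skill's own implementation. It assumes that Data is a list of input/expected pairs, Task is the function under test (a stand-in for a model or agent call), and Score is a set of scoring functions averaged over the dataset. `Case`, `run_eval`, `exact_match`, and `toy_task` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Case:
    """One Data row (hypothetical schema): an input and the expected output."""
    input: str
    expected: str


def exact_match(output: str, expected: str) -> float:
    """A simple Score: 1.0 if output equals expected after trimming whitespace, else 0.0."""
    return 1.0 if output.strip() == expected.strip() else 0.0


def run_eval(
    data: List[Case],
    task: Callable[[str], str],
    scorers: Dict[str, Callable[[str, str], float]],
) -> Dict[str, float]:
    """Run the Task on every Data row, apply each Score, and return per-scorer means."""
    per_case = []
    for case in data:
        output = task(case.input)  # Task: the system under test
        per_case.append({name: fn(output, case.expected) for name, fn in scorers.items()})
    return {name: sum(r[name] for r in per_case) / len(per_case) for name in scorers}


def toy_task(text: str) -> str:
    """Hypothetical Task standing in for an LLM call: it just upper-cases the input."""
    return text.upper()


data = [
    Case("hello", "HELLO"),
    Case("eval", "EVAL"),
    Case("vibe", "check"),  # deliberately failing case
]
summary = run_eval(data, toy_task, {"exact_match": exact_match})
print(summary)
```

Keeping Data, Task, and Score separate means a prompt or model change only swaps `task`, so the same dataset and scorers give a comparable number across iterations.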

stars: 4
forks: 1
updated: May 2, 2026, 17:28
Files: 4
SKILL.md (read-only)