
eval-driven-dev

Improve an AI application with evaluation-driven development: define eval criteria, instrument the application, build golden datasets, observe and evaluate application runs, analyze results, and produce a concrete action plan for improvements. ALWAYS USE THIS SKILL when the user asks to set up QA, add tests, add evals, evaluate, benchmark, fix wrong behaviors, improve quality, or do quality assurance for any Python project that calls an LLM.
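As a rough illustration of the workflow described above, here is a minimal sketch of an eval loop over a golden dataset. It is not taken from the skill itself: `GoldenCase`, `fake_app`, and `run_evals` are hypothetical names, the substring check is only one possible eval criterion, and `fake_app` returns canned answers where a real application would call an LLM.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GoldenCase:
    """One golden dataset entry: a prompt and a substring the answer must contain."""
    prompt: str
    expected_substring: str


def fake_app(prompt: str) -> str:
    """Hypothetical stand-in for the application under test (no real LLM call)."""
    canned = {
        "What is the capital of France?": "The capital of France is Paris.",
        "What is 2 + 2?": "2 + 2 equals 5.",  # deliberately wrong, to show a failure
    }
    return canned.get(prompt, "I do not know.")


def run_evals(app: Callable[[str], str], dataset: list[GoldenCase]) -> dict:
    """Run every golden case through the app and record pass/fail per case."""
    results = []
    for case in dataset:
        output = app(case.prompt)
        # Eval criterion (an assumption): case-insensitive substring match.
        passed = case.expected_substring.lower() in output.lower()
        results.append({"prompt": case.prompt, "output": output, "passed": passed})
    pass_rate = sum(r["passed"] for r in results) / len(results) if results else 0.0
    failures = [r for r in results if not r["passed"]]
    return {"pass_rate": pass_rate, "failures": failures}


GOLDEN = [
    GoldenCase("What is the capital of France?", "Paris"),
    GoldenCase("What is 2 + 2?", "4"),
]

report = run_evals(fake_app, GOLDEN)
print(f"pass rate: {report['pass_rate']:.0%}")
for failure in report["failures"]:
    print("needs attention:", failure["prompt"], "->", failure["output"])
```

The list of failures is the raw material for an action plan: each failing case points at a specific behavior to fix and then re-check against the same dataset.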


Install command
npx skills add https://github.com/github/awesome-copilot --skill eval-driven-dev

Copy and paste this command into Claude Code to install the skill

Stars: 34,384
Forks: 4,211
Updated: April 28, 2026 at 01:27
Files: 19 (including SKILL.md)