eval-harness

// Formal evaluation framework for eval-driven development (EDD). Define pass/fail criteria, measure reliability with pass@k metrics, create regression test suites, and benchmark performance. TRIGGER when: the user wants to define success criteria for code or set up evaluation benchmarks, or when running Phase 3 of the AI engineering pipeline (after blueprint and ADR, before self-improve).
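The description names pass@k without defining it, and the skill's own formula is not shown here. As a minimal sketch, assuming the skill follows the commonly used unbiased estimator (run a task n times, count c passing runs, and estimate the probability that at least one of k sampled runs passes), the metric could be computed as below. The function name `pass_at_k` and its parameters are illustrative and are not taken from the skill itself.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n total runs, of which c passed.

    Assumption: this is the standard unbiased estimator,
    1 - C(n - c, k) / C(n, k); the skill may define it differently.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failing runs: every k-sized sample contains a pass.
    if n - c < k:
        return 1.0
    # Probability that a random k-sized sample is all failures,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical numbers: 10 runs of one eval, 3 of them passed.
    for k in (1, 2, 5):
        print(f"pass@{k} = {pass_at_k(10, 3, k):.3f}")
```

With k = 1 the estimate reduces to the plain pass rate c / n, which makes it a convenient sanity check when wiring the metric into a regression suite.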

$ git log --oneline --stat
stars:5
forks:0
updated:May 6, 2026 at 10:26
SKILL.md