evaluation-harness

Builds repeatable evaluation systems with golden datasets, scoring rubrics, pass/fail thresholds, and regression reports. Use for "LLM evaluation", "testing AI systems", "quality assurance", or "model benchmarking".
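
The four pieces named above (golden dataset, scoring rubric, pass/fail threshold, regression report) can be sketched in a few lines of Python. This is only an illustrative sketch, not the skill's actual implementation: `GoldenCase`, `exact_match`, `run_eval`, `regression_report`, the 0.8 threshold, and the toy systems are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class GoldenCase:
    """One entry of a golden dataset: an input and its reference answer."""
    case_id: str
    prompt: str
    expected: str


def exact_match(expected: str, actual: str) -> float:
    """Toy rubric: 1.0 if the answers match after normalization, else 0.0."""
    return 1.0 if expected.strip().lower() == actual.strip().lower() else 0.0


def run_eval(
    cases: List[GoldenCase],
    system: Callable[[str], str],
    scorer: Callable[[str, str], float] = exact_match,
    threshold: float = 0.8,  # assumed pass/fail cutoff on the mean score
) -> Dict:
    """Score every golden case and apply the pass/fail threshold."""
    if not cases:
        raise ValueError("golden dataset is empty")
    scores = {c.case_id: scorer(c.expected, system(c.prompt)) for c in cases}
    mean = sum(scores.values()) / len(scores)
    return {"scores": scores, "mean": mean, "passed": mean >= threshold}


def regression_report(baseline: Dict, current: Dict) -> List[str]:
    """List the case ids whose score dropped relative to the baseline run."""
    return sorted(
        cid
        for cid, score in current["scores"].items()
        if score < baseline["scores"].get(cid, 0.0)
    )


# Hypothetical golden dataset and two stand-in "systems" under test.
GOLDEN = [
    GoldenCase("capital-fr", "Capital of France?", "Paris"),
    GoldenCase("sum", "2 + 2 = ?", "4"),
]


def baseline_system(prompt: str) -> str:
    return {"Capital of France?": "Paris", "2 + 2 = ?": "4"}.get(prompt, "")


def candidate_system(prompt: str) -> str:
    return {"Capital of France?": "paris ", "2 + 2 = ?": "5"}.get(prompt, "")


baseline_run = run_eval(GOLDEN, baseline_system)
candidate_run = run_eval(GOLDEN, candidate_system)
regressions = regression_report(baseline_run, candidate_run)
```

Here the candidate run scores 0.5 on average, falls below the assumed threshold, and the report flags the `sum` case as a regression. A real harness would likely replace `exact_match` with a richer rubric and persist each run so reports can be compared over time.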


Install command
npx skills add https://github.com/patricio0312rev/skillset --skill evaluation-harness

Copy and paste this command into Claude Code to install the skill

Stars: 5
Forks: 0
Updated: December 31, 2025 at 05:05