evaluation-harness

Builds repeatable evaluation systems with golden datasets, scoring rubrics, pass/fail thresholds, and regression reports. Use for "LLM evaluation", "testing AI systems", "quality assurance", or "model benchmarking".
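As a rough illustration of how the pieces named above fit together, here is a minimal sketch and not the skill's actual implementation. Every name in it (`GoldenCase`, `exact_match`, `run_eval`, `regression_report`) and the 0.8 default threshold are hypothetical, chosen only for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class GoldenCase:
    """One entry of a golden dataset: an input and its reference answer."""
    prompt: str
    expected: str


def exact_match(output: str, expected: str) -> float:
    """Toy scoring rubric: 1.0 if the normalized strings match, else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_eval(
    system: Callable[[str], str],
    cases: List[GoldenCase],
    scorer: Callable[[str, str], float] = exact_match,
    threshold: float = 0.8,  # assumed pass/fail cutoff, not from the source
) -> Dict[str, object]:
    """Score every golden case and compare the mean score to the threshold."""
    if not cases:
        raise ValueError("golden dataset is empty")
    scores = [scorer(system(case.prompt), case.expected) for case in cases]
    mean_score = sum(scores) / len(scores)
    return {"scores": scores, "mean_score": mean_score, "passed": mean_score >= threshold}


def regression_report(baseline: Dict[str, object], current: Dict[str, object]) -> List[int]:
    """Return the indices of cases whose score dropped since the baseline run."""
    pairs = zip(baseline["scores"], current["scores"])
    return [i for i, (old, new) in enumerate(pairs) if new < old]
```

In this sketch, `system` stands for any callable that maps a prompt to an output string. Running `run_eval` on two versions of a system and passing both results to `regression_report` shows which golden cases got worse.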

Install command
npx skills add https://github.com/patricio0312rev/skillset --skill evaluation-harness

Copy this command and paste it into Claude Code to install the skill.

Stars: 5
Forks: 0
Updated: 2025-12-31 05:05
SKILL.md