evaluation-harness

Overview

Builds repeatable evaluation systems with golden datasets, scoring rubrics, pass/fail thresholds, and regression reports. Use for "LLM evaluation", "testing AI systems", "quality assurance", or "model benchmarking".
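
To make the terms in the description concrete, here is a minimal sketch of how a golden dataset, a scoring rubric, a pass/fail threshold, and a regression report might fit together. It is not taken from the skill itself: `GoldenCase`, `exact_match`, `run_eval`, `regression_report`, and the 0.8 default threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class GoldenCase:
    """One entry of a golden dataset (hypothetical structure)."""
    case_id: str
    prompt: str
    expected: str


def exact_match(output: str, expected: str) -> float:
    """Simple rubric: 1.0 if the normalized strings match, else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_eval(
    cases: List[GoldenCase],
    system: Callable[[str], str],
    rubric: Callable[[str, str], float] = exact_match,
    threshold: float = 0.8,  # assumed pass/fail cutoff on the mean score
) -> Dict:
    """Score every golden case and compare the mean to the threshold."""
    if not cases:
        raise ValueError("golden dataset is empty")
    scores = {c.case_id: rubric(system(c.prompt), c.expected) for c in cases}
    mean = sum(scores.values()) / len(scores)
    return {"scores": scores, "mean": mean, "passed": mean >= threshold}


def regression_report(baseline: Dict, current: Dict) -> List[str]:
    """Return the ids of cases whose score dropped versus the baseline run."""
    return sorted(
        cid
        for cid, score in current["scores"].items()
        if score < baseline["scores"].get(cid, 0.0)
    )
```

In this sketch, `system` stands in for whatever is under test (for example, a function that calls a model), and the rubric can be swapped for any function that returns a score between 0 and 1. Running `run_eval` on two versions of the system and passing both results to `regression_report` lists the cases that got worse.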

Install command
npx skills add https://github.com/patricio0312rev/skillset --skill evaluation-harness

Copy and paste this command into Claude Code to install the skill.

Stars: 5
Forks: 0
Updated: December 31, 2025, 05:05
SKILL.md