llm-evals-audit

// Use this skill when a developer wants to check whether their existing evaluations are trustworthy and well-targeted. Triggers on: "audit my evals", "are my evals any good", "review my evaluation setup", "check my LLM judges", "are my evaluations reliable", "something feels off with my eval scores", "inherited an eval system", "my evals are passing but the product feels broken", "post-build eval check", "validate my eval pipeline". Inspects existing eval artifacts — judge prompts, annotation data, issue reports, alignment scores — and produces a prioritized findings report with a concrete fix for each problem. Do NOT use this to build new evals from scratch — use llm-evals-checklist first, then llm-judge-creator.

$ git log --oneline --stat
stars:14
forks:2
updated:April 23, 2026 at 14:31
SKILL.md