Skip to main content
Exécutez n'importe quel Skill dans Manus
en un clic

llm-judge-alignment

// Use this skill when a developer wants to validate how well their LLM judge aligns with human judgment. Triggers on: "validate my LLM judge", "check if my judge is accurate", "my judge scores don't match human ratings", "calibrate my evaluator", "how reliable is my judge", "measure judge alignment", "test my eval", "check my judge against human labels", "is my judge any good", "validate my evaluator", "my judge is too strict", "my judge keeps missing failures". Takes a judge prompt and human-labeled examples, measures pass agreement rate and failure catch rate, identifies directional bias (too lenient or too strict), and walks the dev through targeted fixes until alignment meets a reliable threshold.

$ git log --oneline --stat
stars:14
forks:2
updated:23 avril 2026 à 14:31
SKILL.md
readonly