
llm-judge-alignment

Use this skill when a developer wants to validate how well their LLM judge aligns with human judgment. Triggers on: "validate my LLM judge", "check if my judge is accurate", "my judge scores don't match human ratings", "calibrate my evaluator", "how reliable is my judge", "measure judge alignment", "test my eval", "check my judge against human labels", "is my judge any good", "validate my evaluator", "my judge is too strict", "my judge keeps missing failures". The skill takes a judge prompt and human-labeled examples, measures the pass agreement rate and the failure catch rate, identifies directional bias (too lenient or too strict), and walks the developer through targeted fixes until alignment meets a reliable threshold.
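The two metrics and the bias check can be sketched in a few lines of Python. This is only an illustration, not the skill's actual implementation: the description does not define the metrics, so the sketch assumes that pass agreement rate means the share of human-passed examples the judge also passes, and that failure catch rate means the share of human-failed examples the judge also fails. The names `AlignmentReport` and `measure_alignment` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AlignmentReport:
    pass_agreement_rate: float  # judge passes / human passes (assumed definition)
    failure_catch_rate: float   # judge fails / human fails (assumed definition)
    bias: str                   # "too lenient", "too strict", or "balanced"


def measure_alignment(human: list[bool], judge: list[bool]) -> AlignmentReport:
    """Compare judge verdicts to human labels (True = pass, False = fail)."""
    if len(human) != len(judge):
        raise ValueError("human and judge label lists must have equal length")

    # Judge verdicts, split by what the human decided.
    on_human_pass = [j for h, j in zip(human, judge) if h]
    on_human_fail = [j for h, j in zip(human, judge) if not h]
    if not on_human_pass or not on_human_fail:
        raise ValueError("need at least one human pass and one human fail")

    pass_agreement = sum(on_human_pass) / len(on_human_pass)
    failure_catch = sum(not j for j in on_human_fail) / len(on_human_fail)

    # Directional bias: compare the two kinds of disagreement.
    missed_failures = sum(on_human_fail)                    # judge passed, human failed
    false_alarms = len(on_human_pass) - sum(on_human_pass)  # judge failed, human passed
    if missed_failures > false_alarms:
        bias = "too lenient"
    elif false_alarms > missed_failures:
        bias = "too strict"
    else:
        bias = "balanced"

    return AlignmentReport(pass_agreement, failure_catch, bias)


# Illustrative labels only, not real evaluation data.
human_labels = [True, True, True, True, False, False, False, False]
judge_labels = [True, True, True, False, True, True, False, False]
report = measure_alignment(human_labels, judge_labels)
print(report)
```

In this toy input the judge misses two of the four human-labeled failures and wrongly fails one of the four human-labeled passes, so the sketch reports a lenient bias. Reporting the two rates separately, rather than a single accuracy figure, keeps a judge that passes nearly everything from looking reliable on a dataset dominated by passing examples.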

stars:14
forks:2
updated:April 23, 2026 at 14:31