
openjudge

// Build custom LLM evaluation pipelines using the OpenJudge framework. Covers selecting and configuring graders (LLM-based, function-based, agentic), running batch evaluations with GradingRunner, combining scores with aggregators, applying evaluation strategies (voting, average), auto-generating graders from data, and analyzing results (pairwise win rates, statistics, validation metrics). Use when the user wants to evaluate LLM outputs, compare multiple models, design scoring criteria, or build an automated evaluation system.

$ git log --oneline --stat
stars:619
forks:52
updated:March 10, 2026 09:06
File explorer
5 files
SKILL.md
readonly