
ml-model-eval-benchmark

Stars: 1

Forks: 1

Updated: March 13, 2026 at 04:04

Compare model candidates using weighted metrics and deterministic ranking outputs. Use for benchmark leaderboards and model promotion decisions.


Source

LJT-520

LJT-520/openClaw-backup


Related occupations (SOC)

Based on SOC occupation classification

Data Scientists · Computer and Mathematical Occupations · SOC 15-2051

SKILL.md


name: ml-model-eval-benchmark
description: Compare model candidates using weighted metrics and deterministic ranking outputs. Use for benchmark leaderboards and model promotion decisions.

ML Model Eval Benchmark

Overview

Produce consistent model ranking outputs from metric-weighted evaluation inputs.

Workflow

Define the metric weights and the accepted range for each metric.
Ingest the model metrics for each candidate.
Compute a weighted score for each candidate and rank the candidates by it.
Export the leaderboard and a promotion recommendation.
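
The four workflow steps can be sketched in Python as follows. This is a minimal illustration, not the contents of scripts/benchmark_models.py: the names `WEIGHTS`, `RANGES`, `weighted_score`, and `rank_candidates`, the min-max normalization, the tie-break on model name, the sample metric values, and the "promote the top-ranked model" rule are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Step 1 (hypothetical values): metric weights and accepted ranges.
WEIGHTS: Dict[str, float] = {"accuracy": 0.6, "f1": 0.4}
RANGES: Dict[str, Tuple[float, float]] = {"accuracy": (0.0, 1.0), "f1": (0.0, 1.0)}


def weighted_score(metrics: Dict[str, float]) -> float:
    """Normalize each metric over its accepted range, then take the weighted mean."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        low, high = RANGES[name]
        value = metrics[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside the accepted range [{low}, {high}]")
        total += weight * (value - low) / (high - low)
    return total / sum(WEIGHTS.values())


def rank_candidates(candidates: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Return (model, score) pairs, best first; ties break on model name."""
    # Rounding keeps floating-point noise from reordering tied candidates.
    scored = [(model, round(weighted_score(m), 6)) for model, m in candidates.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Step 2 (hypothetical values): metrics for each candidate.
candidates = {
    "model_b": {"accuracy": 0.90, "f1": 0.80},
    "model_a": {"accuracy": 0.90, "f1": 0.80},
    "model_c": {"accuracy": 0.70, "f1": 0.95},
}

leaderboard = rank_candidates(candidates)  # Step 3
for rank, (model, score) in enumerate(leaderboard, start=1):  # Step 4
    print(f"{rank}. {model} score={score:.4f}")
print(f"Recommend promoting: {leaderboard[0][0]}")
```

Sorting on the pair (negative score, model name) makes the output independent of input order, which is one simple way to get a deterministic ranking; references/benchmarking-guide.md may prescribe a different tie-break.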

Use Bundled Resources

Run scripts/benchmark_models.py to generate benchmark outputs.
Read references/benchmarking-guide.md for weighting and tie-break guidance.

Guardrails

Keep metric names and scales consistent across candidates.
Record weighting assumptions in the output.
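
Both guardrails can be partly automated. The sketch below is one possible approach, assuming candidates arrive as a mapping from model name to metric dictionary; `check_consistent_metrics` and `build_report` are hypothetical helpers, not functions from the bundled script. It only catches mismatched metric names. A scale mismatch (for example, accuracy reported as 0.91 for one model and 91 for another) still needs a range check or manual review.

```python
import json
from typing import Dict


def check_consistent_metrics(candidates: Dict[str, Dict[str, float]]) -> None:
    """Raise if the candidates do not all report exactly the same metric names."""
    expected = None
    for model in sorted(candidates):
        names = set(candidates[model])
        if expected is None:
            expected = names
        elif names != expected:
            diff = sorted(names ^ expected)
            raise ValueError(f"{model} has mismatched metrics: {diff}")


def build_report(candidates: Dict[str, Dict[str, float]], weights: Dict[str, float]) -> dict:
    """Bundle the weighting assumptions with the inputs they were applied to."""
    check_consistent_metrics(candidates)
    return {"weights": weights, "candidates": candidates}


# Hypothetical inputs for illustration only.
report = build_report(
    {"model_a": {"accuracy": 0.9, "f1": 0.8}, "model_b": {"accuracy": 0.7, "f1": 0.95}},
    {"accuracy": 0.6, "f1": 0.4},
)
print(json.dumps(report, indent=2, sort_keys=True))
```

Writing the weights into the same artifact as the results means a leaderboard can be re-derived and audited later without hunting for the configuration that produced it.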
