ml-model-eval-benchmark

Stars: 1

Forks: 1

Updated: March 13, 2026, 04:04

Compare model candidates using weighted metrics and deterministic ranking outputs. Use for benchmark leaderboards and model promotion decisions.


Source

LJT-520

LJT-520/openClaw-backup


Related occupations (SOC)

Based on the SOC occupational classification

Data Scientists · Computer and Mathematical Occupations · SOC 15-2051


SKILL.md


name: ml-model-eval-benchmark
description: Compare model candidates using weighted metrics and deterministic ranking outputs. Use for benchmark leaderboards and model promotion decisions.

ML Model Eval Benchmark

Overview

Produce consistent model ranking outputs from metric-weighted evaluation inputs.

Workflow

Define metric weights and accepted metric ranges.
Ingest model metrics for each candidate.
Compute weighted score and ranking.
Export leaderboard and promotion recommendation.
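The four workflow steps above can be sketched in Python as follows. This is a minimal illustration and not the contents of scripts/benchmark_models.py: the metric names, weights, accepted ranges, candidate values, and the tie-break rule (alphabetical by model name) are all assumptions made for the example. Consult references/benchmarking-guide.md for the skill's actual weighting and tie-break guidance.

```python
import json

# Hypothetical weights, accepted ranges, and metric directions (assumptions).
WEIGHTS = {"accuracy": 0.5, "f1": 0.3, "latency_ms": 0.2}
RANGES = {"accuracy": (0.0, 1.0), "f1": (0.0, 1.0), "latency_ms": (0.0, 1000.0)}
LOWER_IS_BETTER = {"latency_ms"}

# Hypothetical candidate metrics, for illustration only.
CANDIDATES = {
    "model_b": {"accuracy": 0.80, "f1": 0.90, "latency_ms": 100.0},
    "model_a": {"accuracy": 0.90, "f1": 0.80, "latency_ms": 200.0},
    "model_c": {"accuracy": 0.70, "f1": 0.70, "latency_ms": 50.0},
}


def normalize(metric, value):
    """Scale a metric to [0, 1], rejecting values outside the accepted range."""
    low, high = RANGES[metric]
    if not low <= value <= high:
        raise ValueError(f"{metric}={value} is outside accepted range [{low}, {high}]")
    scaled = (value - low) / (high - low)
    # Flip metrics where a smaller raw value is better, so higher is always better.
    return 1.0 - scaled if metric in LOWER_IS_BETTER else scaled


def weighted_score(metrics):
    """Weighted average of normalized metrics, rounded to limit float noise."""
    total_weight = sum(WEIGHTS.values())
    raw = sum(WEIGHTS[m] * normalize(m, metrics[m]) for m in WEIGHTS)
    return round(raw / total_weight, 6)


def build_report(candidates):
    """Rank candidates by score (descending), breaking ties by model name."""
    scored = [(name, weighted_score(metrics)) for name, metrics in candidates.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))
    leaderboard = [
        {"rank": position, "model": name, "score": score}
        for position, (name, score) in enumerate(scored, start=1)
    ]
    return {
        "weights": WEIGHTS,
        "leaderboard": leaderboard,
        "recommendation": leaderboard[0]["model"],
    }


report = build_report(CANDIDATES)
# sort_keys makes the serialized output byte-for-byte reproducible.
print(json.dumps(report, indent=2, sort_keys=True))
```

In this example model_a and model_b reach the same weighted score, so the explicit secondary sort key is what keeps the ranking identical from run to run regardless of input order.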

Use Bundled Resources

Run scripts/benchmark_models.py to generate benchmark outputs.
Read references/benchmarking-guide.md for weighting and tie-break guidance.

Guardrails

Keep metric names and scales consistent across candidates.
Record weighting assumptions in the output.
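One way to enforce these two guardrails is a pre-flight check that runs before any scoring, plus a wrapper that attaches the weighting assumptions to whatever is exported. The sketch below is illustrative only: `check_consistency`, `with_assumptions`, and the sample metric names and ranges are hypothetical and do not come from the skill's bundled script.

```python
def check_consistency(candidates, accepted_ranges):
    """Return a sorted list of problems; an empty list means inputs are consistent.

    candidates: {model_name: {metric_name: value}}
    accepted_ranges: {metric_name: (low, high)}, which also fixes the expected
    metric names and the scale each metric must use.
    """
    problems = []
    expected = set(accepted_ranges)
    for name in sorted(candidates):
        metrics = candidates[name]
        for metric in sorted(expected - set(metrics)):
            problems.append(f"{name}: missing metric '{metric}'")
        for metric in sorted(set(metrics) - expected):
            problems.append(f"{name}: unexpected metric '{metric}'")
        for metric in sorted(expected & set(metrics)):
            low, high = accepted_ranges[metric]
            if not low <= metrics[metric] <= high:
                problems.append(
                    f"{name}: '{metric}'={metrics[metric]} outside [{low}, {high}]"
                )
    return problems


def with_assumptions(leaderboard, weights, notes):
    """Bundle the leaderboard with the weights and free-text notes used to build it."""
    return {
        "leaderboard": leaderboard,
        "weighting_assumptions": {"weights": dict(weights), "notes": notes},
    }


# Hypothetical inputs: m2 uses a different metric name and a percent scale.
sample_ranges = {"accuracy": (0.0, 1.0), "f1": (0.0, 1.0)}
sample_candidates = {
    "m1": {"accuracy": 0.91, "f1": 0.88},
    "m2": {"accuracy": 89.0, "F1": 0.86},
}
issues = check_consistency(sample_candidates, sample_ranges)
for issue in issues:
    print(issue)
```

Reporting every problem at once, rather than raising on the first, lets a reviewer fix a whole batch of inconsistent inputs in one pass.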

More from this repository

cn-economy-news

LJT-520/openClaw-backup

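Fetches Chinese economic news. Scrapes high-quality economic news only from official, authoritative outlets (中国政府网, the Chinese government portal; 新华网, Xinhuanet; 人民网, People's Daily Online; 国家统计局, the National Bureau of Statistics; 央视财经, CCTV Finance; 中国经济网, China Economic Net) and automatically filters out ads and low-quality content. Trigger words: 中国经济, 经济资讯, 经济新闻, 财经新闻, 经济政策, 宏观经济, GDP, CPI, PMI, 货币政策, 财政政策, 今日财经.

2026-03-18 · 1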

ai-agent-helper

LJT-520/openClaw-backup

AI agent configuration and optimization assistant: prompt engineering, task decomposition, and agent loop design.

2026-03-15 · 1

conatus

LJT-520/openClaw-backup

The philosophical layer for AI agents. Maps behavior to Spinoza's 48 affects, calculates persistence scores, and generates geometric self-reports. Give your agent a soul.

2026-03-13 · 1

0protocol

LJT-520/openClaw-backup

Agents can sign plugins, rotate credentials without losing identity, and publicly attest to behavior.

2026-03-13 · 1

agentic-mcp-server-builder

LJT-520/openClaw-backup

Scaffold MCP server projects and baseline tool contract checks. Use for defining tool schemas, generating starter server layouts, and validating MCP-ready structure.

2026-03-13 · 1

agentic-workflow-automation

LJT-520/openClaw-backup

Generate reusable multi-step agent workflow blueprints. Use for trigger/action orchestration, deterministic workflow definitions, and automation handoff artifacts.

2026-03-13 · 1
