GitHub repository

agentic-bench

agentic-bench collects 4 skills from nyosegawa, with repository-level occupation coverage and an on-site detail page for each skill.

Skills collected: 4
Stars: 5
Updated: 2026-02-21
Forks: 0
Occupation coverage: 1 occupation category · 100% classified
Browse the repository

Skills in this repository

eval-reporter
Data Scientist

Generate HTML reports and structured metrics from model evaluation results. Creates publication-ready reports with embedded outputs (images, audio, charts) and metrics.json for cross-model comparison. Use when generating reports, writing metrics, creating evaluation summaries, or formatting benchmark results. Triggers on "generate report", "write metrics", "create report", "evaluation summary", "benchmark results", "format results".

2026-02-21
gpu-runner
Data Scientist

Execute model inference on GPU cloud providers. Handles code generation, deployment, execution, and result collection across HF Inference API/Endpoints, Colab, Modal, beam.cloud, Vast.ai, and RunPod. Use when running models on GPU, deploying to cloud, executing notebooks, or troubleshooting GPU execution failures. Triggers on "run on GPU", "execute model", "deploy to modal", "colab notebook", "beam deploy", "HF inference", "HF endpoints", "vast", "runpod".

2026-02-21
model-researcher
Data Scientist

Investigate model specifications, requirements, and evaluation strategy. Use when researching a model before benchmarking: reading HuggingFace model cards, estimating VRAM requirements, selecting GPU providers, and determining evaluation approach. Triggers on "model research", "investigate model", "model info", "VRAM estimate", "which provider", "model card".

2026-02-21
agentic-bench
Data Scientist

Autonomous model validation and benchmarking. Investigates any ML model (LLM, image gen, TTS, time series, etc.), runs it on GPU cloud, evaluates quality and performance, and generates HTML reports. Use when user asks to verify, benchmark, evaluate, or test a model. Triggers on "verify model", "benchmark", "evaluate model", "test model", "run benchmark", "model evaluation", "モデルを検証", "ベンチマーク", "モデルを試して".

2026-02-21