agentic-bench

agentic-bench contains 4 skills collected from nyosegawa. This page provides the repository-level occupation coverage and an on-site detail page for each skill.

Updated: 2026-02-21

Occupation coverage

Data Scientist

1 occupation category · 100% classified

Skills in this repository

Skill · Occupation category · Description · Updated

eval-reporter

Data Scientist

Generate HTML reports and structured metrics from model evaluation results. Creates publication-ready reports with embedded outputs (images, audio, charts) and metrics.json for cross-model comparison. Use when generating reports, writing metrics, creating evaluation summaries, or formatting benchmark results. Triggers on "generate report", "write metrics", "create report", "evaluation summary", "benchmark results", "format results".

2026-02-21
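
The eval-reporter description mentions a metrics.json file used for cross-model comparison. The following is a minimal sketch of what writing and comparing such files could look like. The schema (the `model` and `metrics` keys), the function names `write_metrics` and `compare`, and the metric name in the usage note are all hypothetical; they are not taken from the skill itself.

```python
import json
from pathlib import Path


def write_metrics(path, model, metrics):
    """Write one model's metrics to a JSON file (hypothetical schema)."""
    payload = {"model": model, "metrics": metrics}
    text = json.dumps(payload, indent=2, sort_keys=True)
    Path(path).write_text(text, encoding="utf-8")
    return payload


def compare(paths, metric):
    """Collect a single metric across several metrics files.

    Returns a dict mapping model name to the metric value,
    or None when a file does not report that metric.
    """
    result = {}
    for p in paths:
        data = json.loads(Path(p).read_text(encoding="utf-8"))
        result[data["model"]] = data["metrics"].get(metric)
    return result
```

Under these assumptions, calling `write_metrics` once per evaluated model and then `compare(paths, "latency_s")` would gather one value per model for a side-by-side view.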

gpu-runner

Data Scientist

Execute model inference on GPU cloud providers. Handles code generation, deployment, execution, and result collection across HF Inference API/Endpoints, Colab, Modal, beam.cloud, Vast.ai, and RunPod. Use when running models on GPU, deploying to cloud, executing notebooks, or troubleshooting GPU execution failures. Triggers on "run on GPU", "execute model", "deploy to modal", "colab notebook", "beam deploy", "HF inference", "HF endpoints", "vast", "runpod".

2026-02-21

model-researcher

Data Scientist

Investigate model specifications, requirements, and evaluation strategy. Use when researching a model before benchmarking: reading HuggingFace model cards, estimating VRAM requirements, selecting GPU providers, and determining evaluation approach. Triggers on "model research", "investigate model", "model info", "VRAM estimate", "which provider", "model card".

2026-02-21
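
The model-researcher description lists VRAM estimation as one of its tasks. A common rule of thumb, sketched below, multiplies the parameter count by the bytes per parameter for the chosen precision and adds a margin for activations and caches. The function name `estimate_vram_gb`, the dtype table, and the 20% default overhead are assumptions for illustration, not the skill's actual method.

```python
# Approximate storage cost per parameter for common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1, "int4": 0.5}


def estimate_vram_gb(num_params, dtype="fp16", overhead=0.2):
    """Rough inference VRAM estimate in gigabytes (10^9 bytes).

    num_params: total parameter count of the model.
    dtype:      precision the weights are loaded in.
    overhead:   assumed fractional margin for activations and caches.
    """
    if dtype not in BYTES_PER_PARAM:
        raise ValueError(f"unknown dtype: {dtype}")
    weights_bytes = num_params * BYTES_PER_PARAM[dtype]
    return weights_bytes * (1 + overhead) / 1e9
```

For a 7-billion-parameter model in fp16, this gives roughly 16.8 GB under the assumed 20% margin. Real usage depends on batch size, sequence length, and the inference runtime, so the result is only a starting point for choosing a GPU.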

agentic-bench

Data Scientist

Autonomous model validation and benchmarking. Investigates any ML model (LLM, image gen, TTS, time series, etc.), runs it on GPU cloud, evaluates quality and performance, and generates HTML reports. Use when user asks to verify, benchmark, evaluate, or test a model. Triggers on "verify model", "benchmark", "evaluate model", "test model", "run benchmark", "model evaluation", "モデルを検証", "ベンチマーク", "モデルを試して".

2026-02-21