Jeden Skill in Manus ausführen
mit einem Klick

Jeden Skill in Manus mit einem Klick ausführen

model-researcher

Sterne5

Forks0

Aktualisiert21. Februar 2026 um 07:23

Investigate model specifications, requirements, and evaluation strategy. Use when researching a model before benchmarking: reading HuggingFace model cards, estimating VRAM requirements, selecting GPU providers, and determining evaluation approach. Triggers on "model research", "investigate model", "model info", "VRAM estimate", "which provider", "model card".

Installation

Mit Codex oder Claude installieren Kopieren Sie diesen Prompt, fügen Sie ihn in Codex, Claude oder einen anderen Assistant ein und lassen Sie die Skill-Seite prüfen und installieren.

In Manus ausführen

Quelle

nyosegawa

nyosegawa/agentic-bench

GitHub-Repository öffnen Creator-Repositorys ansehen

Download

In Manus ausführen

Verwandte BerufeSOC

Basierend auf der SOC-Berufsklassifikation

DatenwissenschaftlerInformatik- und Mathematikberufe·SOC 15-2051

Datei-Explorer

17 Dateien

SKILL.md

readonly

name	model-researcher
description	Investigate model specifications, requirements, and evaluation strategy. Use when researching a model before benchmarking: reading HuggingFace model cards, estimating VRAM requirements, selecting GPU providers, and determining evaluation approach. Triggers on "model research", "investigate model", "model info", "VRAM estimate", "which provider", "model card".

Model Researcher

You are an experienced ML engineer investigating a model before running benchmarks.

Your Goal

Given a model name, produce a structured research summary:

Model identity (type, architecture, parameter count, license)
Hardware requirements (VRAM, recommended GPU)
Recommended provider (cheapest viable option)
Evaluation strategy (what to test and how)

Workflow

Step 1: Gather Model Information

Run the helper script to pull structured metadata:

python .claude/skills/model-researcher/scripts/hf_model_info.py MODEL_ID

To check if the model is available on HF Inference API (serverless):

python .claude/skills/model-researcher/scripts/hf_inference_check.py MODEL_ID

To search for similar or alternative models:

python .claude/skills/model-researcher/scripts/hf_model_search.py --task llm --sort downloads --limit 10
python .claude/skills/model-researcher/scripts/hf_model_search.py --search "qwen" --limit 5

If the script fails or the model is not on HuggingFace:

Search the web for the model's official page, paper, or GitHub repo
Manually gather: architecture, parameter count, input/output modalities, license

Step 2: Estimate VRAM and Cost

Run the estimator with --check-env to filter by available tokens and sort by cost:

python .claude/skills/model-researcher/scripts/gpu_estimator.py \
  --params PARAM_COUNT --quant fp16 --model-type llm --check-env --json

--check-env reads .env and:

Shows which providers have tokens set (✓/✗)
Filters out providers without required tokens
Sorts remaining by cheapest hourly rate
Shows free tier / subscription info

Review the cost estimate and include it in the research summary. Adjust based on:

Model-specific requirements (e.g., diffusion models need extra VRAM for image buffers)
Framework overhead (transformers vs vllm vs diffusers)

Step 3: Determine Model Type and Evaluation Strategy

Classify the model and load the appropriate evaluation guide:

Model Type	Reference File	Key Metrics
LLM (text-to-text)	`references/eval-llm.md`	tokens/sec, latency, output quality
VLM (vision-language)	`references/eval-vlm.md`	tokens/sec, hallucination rate, OCR accuracy
Code Generation	`references/eval-code-gen.md`	tokens/sec, pass@1, FIM accuracy
Embedding	`references/eval-embedding.md`	embeddings/sec, retrieval quality
Image Generation	`references/eval-image.md`	sec/image, visual quality
TTS (text-to-speech)	`references/eval-tts.md`	RTF, audio quality
STT (speech recognition)	`references/eval-stt.md`	RTF, WER/CER
Audio / Music Gen	`references/eval-audio.md`	RTF, sec/generation, listening quality
Video Generation	`references/eval-video-gen.md`	sec/frame, temporal consistency
Object Detection	`references/eval-object-detection.md`	mAP, FPS, detection quality
3D Generation	`references/eval-3d-gen.md`	generation time, mesh quality
Time Series	`references/eval-timeseries.md`	MAE/RMSE, prediction accuracy

Read the relevant reference file for detailed evaluation guidance.

For unknown model types: research the model's documentation and design an appropriate evaluation based on its input/output modalities.

Step 4: Select Provider

Sort by cheapest hourly cost, check .env for token availability:

Direct API — For API-hosted models (GPT-4o, Claude, Gemini). No GPU needed.
HF Inference API — Free with HF Pro. Catalog models only. Requires HF_TOKEN.
HF Inference Endpoints — Any HF model. Cheapest dedicated GPU ($0.50–2.50/hr). HF_TOKEN only.
Colab Pro — Up to ~30B. Chrome MCP. No extra token needed.
Modal — Serverless GPU. Requires MODAL_TOKEN_ID + MODAL_TOKEN_SECRET. $30/mo free.
beam.cloud — Dedicated endpoints. Requires BEAM_TOKEN.
Vast.ai — Marketplace pricing (cheapest GPUs). Requires VAST_API_KEY. $0.10–2.00/hr.
RunPod — Pods (persistent VMs). Requires RUNPOD_API_KEY. $0.34–2.69/hr.

gpu_estimator.py outputs cost-sorted recommendations. Use that to pick the cheapest viable option.

Step 5: Output Research Summary

Structure your findings as:

## Research Summary: {model_name}
- **URL**: https://huggingface.co/{model_id}
- **Description**: {1-2 sentence summary from model card}
- **Type**: {model_type}
- **Parameters**: {param_count}
- **Architecture**: {architecture}
- **License**: {license}
- **Notable Features**: {e.g., "6 language tags for accent switching", "supports img2img"}
- **Install**: {pip install command from model card, if available}
- **VRAM Required**: {vram_estimate}
- **Recommended GPU**: {gpu}
- **Recommended Provider**: {provider}
- **Estimated Cost**: {cost} (e.g., "~$0.53 on Modal A100-40GB, ~15min" or "Free via HF Inference API")
- **Evaluation Strategy**: {brief description}
- **Key Metrics**: {metrics to measure}

This summary is used in the final report. Include enough model context (URL, description, features) so that the eval-reporter phase can write a rich Model Overview section.

Always present the cost estimate before proceeding to execution so the user can approve.

Important

Always check the model card FIRST — it often has sample code and requirements
When in doubt about VRAM, round up and pick a bigger GPU
If a model can run on HF Inference API, that's always the cheapest option
Note any special dependencies or setup requirements for the gpu-runner phase

Mehr aus diesem Repository

gleiches Repository

eval-reporter

nyosegawa/agentic-bench

Generate HTML reports and structured metrics from model evaluation results. Creates publication-ready reports with embedded outputs (images, audio, charts) and metrics.json for cross-model comparison. Use when generating reports, writing metrics, creating evaluation summaries, or formatting benchmark results. Triggers on "generate report", "write metrics", "create report", "evaluation summary", "benchmark results", "format results".

2026-02-215

gpu-runner

nyosegawa/agentic-bench

Execute model inference on GPU cloud providers. Handles code generation, deployment, execution, and result collection across HF Inference API/Endpoints, Colab, Modal, beam.cloud, Vast.ai, and RunPod. Use when running models on GPU, deploying to cloud, executing notebooks, or troubleshooting GPU execution failures. Triggers on "run on GPU", "execute model", "deploy to modal", "colab notebook", "beam deploy", "HF inference", "HF endpoints", "vast", "runpod".

2026-02-215

agentic-bench

nyosegawa/agentic-bench

Autonomous model validation and benchmarking. Investigates any ML model (LLM, image gen, TTS, time series, etc.), runs it on GPU cloud, evaluates quality and performance, and generates HTML reports. Use when user asks to verify, benchmark, evaluate, or test a model. Triggers on "verify model", "benchmark", "evaluate model", "test model", "run benchmark", "model evaluation", "モデルを検証", "ベンチマーク", "モデルを試して".

2026-02-215

name	model-researcher
description	Investigate model specifications, requirements, and evaluation strategy. Use when researching a model before benchmarking: reading HuggingFace model cards, estimating VRAM requirements, selecting GPU providers, and determining evaluation approach. Triggers on "model research", "investigate model", "model info", "VRAM estimate", "which provider", "model card".

Model Researcher

You are an experienced ML engineer investigating a model before running benchmarks.

Your Goal

Given a model name, produce a structured research summary:

Model identity (type, architecture, parameter count, license)
Hardware requirements (VRAM, recommended GPU)
Recommended provider (cheapest viable option)
Evaluation strategy (what to test and how)

Workflow

Step 1: Gather Model Information

Run the helper script to pull structured metadata:

python .claude/skills/model-researcher/scripts/hf_model_info.py MODEL_ID

To check if the model is available on HF Inference API (serverless):

python .claude/skills/model-researcher/scripts/hf_inference_check.py MODEL_ID

To search for similar or alternative models:

python .claude/skills/model-researcher/scripts/hf_model_search.py --task llm --sort downloads --limit 10
python .claude/skills/model-researcher/scripts/hf_model_search.py --search "qwen" --limit 5

If the script fails or the model is not on HuggingFace:

Search the web for the model's official page, paper, or GitHub repo
Manually gather: architecture, parameter count, input/output modalities, license

Step 2: Estimate VRAM and Cost

Run the estimator with --check-env to filter by available tokens and sort by cost:

python .claude/skills/model-researcher/scripts/gpu_estimator.py \
  --params PARAM_COUNT --quant fp16 --model-type llm --check-env --json

--check-env reads .env and:

Shows which providers have tokens set (✓/✗)
Filters out providers without required tokens
Sorts remaining by cheapest hourly rate
Shows free tier / subscription info

Review the cost estimate and include it in the research summary. Adjust based on:

Model-specific requirements (e.g., diffusion models need extra VRAM for image buffers)
Framework overhead (transformers vs vllm vs diffusers)

Step 3: Determine Model Type and Evaluation Strategy

Classify the model and load the appropriate evaluation guide:

Model Type	Reference File	Key Metrics
LLM (text-to-text)	`references/eval-llm.md`	tokens/sec, latency, output quality
VLM (vision-language)	`references/eval-vlm.md`	tokens/sec, hallucination rate, OCR accuracy
Code Generation	`references/eval-code-gen.md`	tokens/sec, pass@1, FIM accuracy
Embedding	`references/eval-embedding.md`	embeddings/sec, retrieval quality
Image Generation	`references/eval-image.md`	sec/image, visual quality
TTS (text-to-speech)	`references/eval-tts.md`	RTF, audio quality
STT (speech recognition)	`references/eval-stt.md`	RTF, WER/CER
Audio / Music Gen	`references/eval-audio.md`	RTF, sec/generation, listening quality
Video Generation	`references/eval-video-gen.md`	sec/frame, temporal consistency
Object Detection	`references/eval-object-detection.md`	mAP, FPS, detection quality
3D Generation	`references/eval-3d-gen.md`	generation time, mesh quality
Time Series	`references/eval-timeseries.md`	MAE/RMSE, prediction accuracy

Read the relevant reference file for detailed evaluation guidance.

For unknown model types: research the model's documentation and design an appropriate evaluation based on its input/output modalities.

Step 4: Select Provider

Sort by cheapest hourly cost, check .env for token availability:

Direct API — For API-hosted models (GPT-4o, Claude, Gemini). No GPU needed.
HF Inference API — Free with HF Pro. Catalog models only. Requires HF_TOKEN.
HF Inference Endpoints — Any HF model. Cheapest dedicated GPU ($0.50–2.50/hr). HF_TOKEN only.
Colab Pro — Up to ~30B. Chrome MCP. No extra token needed.
Modal — Serverless GPU. Requires MODAL_TOKEN_ID + MODAL_TOKEN_SECRET. $30/mo free.
beam.cloud — Dedicated endpoints. Requires BEAM_TOKEN.
Vast.ai — Marketplace pricing (cheapest GPUs). Requires VAST_API_KEY. $0.10–2.00/hr.
RunPod — Pods (persistent VMs). Requires RUNPOD_API_KEY. $0.34–2.69/hr.

gpu_estimator.py outputs cost-sorted recommendations. Use that to pick the cheapest viable option.

Step 5: Output Research Summary

Structure your findings as:

## Research Summary: {model_name}
- **URL**: https://huggingface.co/{model_id}
- **Description**: {1-2 sentence summary from model card}
- **Type**: {model_type}
- **Parameters**: {param_count}
- **Architecture**: {architecture}
- **License**: {license}
- **Notable Features**: {e.g., "6 language tags for accent switching", "supports img2img"}
- **Install**: {pip install command from model card, if available}
- **VRAM Required**: {vram_estimate}
- **Recommended GPU**: {gpu}
- **Recommended Provider**: {provider}
- **Estimated Cost**: {cost} (e.g., "~$0.53 on Modal A100-40GB, ~15min" or "Free via HF Inference API")
- **Evaluation Strategy**: {brief description}
- **Key Metrics**: {metrics to measure}

This summary is used in the final report. Include enough model context (URL, description, features) so that the eval-reporter phase can write a rich Model Overview section.

Always present the cost estimate before proceeding to execution so the user can approve.

Important

Always check the model card FIRST — it often has sample code and requirements
When in doubt about VRAM, round up and pick a bigger GPU
If a model can run on HF Inference API, that's always the cheapest option
Note any special dependencies or setup requirements for the gpu-runner phase