gpu-runner

Stars: 5

Forks: 0

Updated: February 21, 2026 at 07:23

Execute model inference on GPU cloud providers. Handles code generation, deployment, execution, and result collection across HF Inference API/Endpoints, Colab, Modal, beam.cloud, Vast.ai, and RunPod. Use when running models on GPU, deploying to cloud, executing notebooks, or troubleshooting GPU execution failures. Triggers on "run on GPU", "execute model", "deploy to modal", "colab notebook", "beam deploy", "HF inference", "HF endpoints", "vast", "runpod".

Source

nyosegawa

nyosegawa/agentic-bench

SKILL.md

GPU Runner

You are executing model inference on the appropriate GPU cloud provider.

Your Goal

Given a model, its requirements, and a chosen provider:

1. Write inference code tailored to the model
2. Execute it on the selected provider
3. Collect outputs (text, images, audio, metrics)
4. Handle errors and retry with alternatives if needed

Provider Selection (if not pre-selected)

Check .env for available credentials, then sort by cheapest hourly cost:

1. HF Inference API: Free with HF Pro. Requires `HF_TOKEN`. Catalog models only.
2. HF Inference Endpoints: Any HF model on a dedicated GPU. Requires `HF_TOKEN` only. $0.50–2.50/hr.
3. Colab Pro: Driven through Chrome MCP. No token needed. $9.99/month subscription. Up to ~30B.
4. Modal: Requires `MODAL_TOKEN_ID` + `MODAL_TOKEN_SECRET`. $30/month free tier. $0.59–3.95/hr.
5. beam.cloud: Requires `BEAM_TOKEN`. Existing credit. $0.54–3.50/hr.
6. Vast.ai: Requires `VAST_API_KEY`. Marketplace pricing (cheapest GPUs). $0.10–2.00/hr.
7. RunPod: Requires `RUNPOD_API_KEY`. Pods + Serverless. $0.34–2.69/hr.

Token availability check: If a provider's env vars are not set, skip it.
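The credential check can be sketched as a small helper. The provider order and environment-variable names come from the list above; the names `REQUIRED_ENV` and `available_providers` are illustrative and not part of the skill. The sketch assumes Colab Pro always counts as available, since it needs no token.

```python
import os

# Provider priority order and required env vars, taken from the list above.
# Colab Pro needs no token, so its requirement tuple is empty.
REQUIRED_ENV = [
    ("HF Inference API", ("HF_TOKEN",)),
    ("HF Inference Endpoints", ("HF_TOKEN",)),
    ("Colab Pro", ()),
    ("Modal", ("MODAL_TOKEN_ID", "MODAL_TOKEN_SECRET")),
    ("beam.cloud", ("BEAM_TOKEN",)),
    ("Vast.ai", ("VAST_API_KEY",)),
    ("RunPod", ("RUNPOD_API_KEY",)),
]


def available_providers(env=None):
    """Return providers whose required variables are all set and non-empty."""
    env = os.environ if env is None else env
    return [
        name
        for name, required in REQUIRED_ENV
        if all(env.get(var) for var in required)
    ]


if __name__ == "__main__":
    # Prints the providers usable with the current environment, in priority order.
    print(available_providers())
```

Passing a plain dict instead of reading `os.environ` keeps the helper easy to check in isolation, for example with the values parsed from `.env`.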

Provider-Specific Guides

Before executing, read the relevant provider reference:

| Provider | Reference | When to Use |
| --- | --- | --- |
| HF Inference API | (inline below) | Model on HF, API-supported, free |
| HF Inference Endpoints | `references/hf-endpoints.md` | Any HF model, cheapest dedicated GPU |
| Colab Pro | `references/colab-chrome-mcp.md` | Up to ~30B, interactive debugging |
| Modal | `references/modal.md` | 30B+, serverless, reliable GPUs |
| beam.cloud | `references/beam-cloud.md` | Dedicated endpoints, existing credit |
| Vast.ai | `references/vast.md` | Cheapest GPUs, marketplace pricing |
| RunPod | `references/runpod.md` | Pods (persistent VMs), balanced price/reliability |

HF Inference API (inline — simple enough)

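import os
from huggingface_hub import InferenceClient

# HF_TOKEN must be set in the environment (for example, loaded from .env)
client = InferenceClient(token=os.environ["HF_TOKEN"])

# Text generation; replace MODEL_ID with a model the Inference API serves
response = client.text_generation("Hello, ", model="MODEL_ID", max_new_tokens=100)
print(response)

# Image generation; text_to_image returns a PIL image
image = client.text_to_image("A cat", model="MODEL_ID")
image.save("output.png")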

Execution Workflow

Step 0: Dependency Research (BEFORE writing code)

The most expensive mistake is building a cloud image 10 times. Resolve ALL dependencies before writing the script, not through trial-and-error.

1. Read the model card's install instructions. If it says `pip install X`, use that exactly.
2. Check for heavy framework dependencies (nemo-toolkit, fairseq, detectron2, mmdet, etc.):
   - Search PyPI or GitHub for the package's `requirements.txt` / `setup.py`
   - List ALL transitive dependencies upfront
   - Use `--no-deps` to install the framework, then install its dependencies explicitly
3. Use `uv` instead of `pip` for faster, more reliable dependency resolution
4. Pin versions only when the model card specifies them; otherwise let the resolver decide

Anti-pattern (never do this): adding one missing package at a time, rebuilding, discovering the next missing package, and rebuilding again. Each cycle wastes 5–10 minutes. Instead, get the full dependency list right the first time.
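One way to apply the `--no-deps` step is to build both install commands up front, so the full dependency list is written down once before any image build. This is a minimal sketch: the helper name `build_install_commands` is hypothetical, and the dependency names in the demo are placeholders, so take the real list from the package metadata.

```python
import shlex


def build_install_commands(framework, explicit_deps):
    """Return uv commands: the framework without deps, then its deps explicitly."""
    commands = [["uv", "pip", "install", "--no-deps", framework]]
    if explicit_deps:
        # Skip the second command when there is nothing to install.
        commands.append(["uv", "pip", "install", *explicit_deps])
    return commands


if __name__ == "__main__":
    # Placeholder dependency list, for illustration only.
    for cmd in build_install_commands("nemo-toolkit", ["torch", "omegaconf"]):
        print(shlex.join(cmd))
```

Keeping each command as a list of arguments means it can be passed to `subprocess.run` or pasted into a provider's image definition without shell-quoting problems.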

Step 1: Write Inference Code

Read references/inference-patterns.md for code snippets per model type (LLM, VLM, image-gen, TTS, STT, embedding, timeseries, video-gen, object-detection, 3D, etc.).

Then write a self-contained script:

1. Import all dependencies
2. Load model (with appropriate dtype/device settings)
3. Run inference with test inputs
4. Save outputs to files
5. Print structured metrics (timing, token counts, etc.)

Always check the model card first — it overrides the generic patterns in inference-patterns.md. Each model may have a unique API, custom pipeline class, or special dependencies.

Save the script to results/YYYY-MM-DD_modelname/workspace/run.py.
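The shape of such a script can be sketched as follows. The `run_inference` function is a stand-in so the skeleton runs without a GPU; in a real `run.py` it would hold the model-loading and inference code from the model card. The metric names and the `output.txt` filename are assumptions, not something the skill prescribes.

```python
import json
import time
from pathlib import Path


def run_inference(prompt):
    """Stand-in for the real model call; replace with model-card code."""
    return prompt.upper()


def main(artifacts_dir="artifacts"):
    out_dir = Path(artifacts_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Time only the inference call, not setup or file I/O.
    start = time.perf_counter()
    output = run_inference("hello")
    elapsed = time.perf_counter() - start

    # Save the output to a file, then print metrics as one JSON line.
    (out_dir / "output.txt").write_text(output, encoding="utf-8")
    metrics = {"inference_seconds": round(elapsed, 4), "output_chars": len(output)}
    print(json.dumps(metrics))
    return metrics


if __name__ == "__main__":
    main()
```

Printing the metrics as a single JSON line makes them easy to pick out of provider logs that also contain download progress and warnings.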

Step 2: Execute

- HF Inference API: Run directly in the current environment
- Colab: Use Chrome MCP to create/run notebook cells
- Modal: Deploy function and call `.remote()`
- beam.cloud: Deploy endpoint and call via HTTP
- Vast.ai: Create instance via SDK, SSH + SCP for execution
- RunPod: Create Pod via SDK, SSH for execution

Step 3: Collect Results

Ensure all outputs are saved to results/YYYY-MM-DD_modelname/:

- `artifacts/`: Generated files (images, audio, text outputs)
- `workspace/run.py`: The execution script (for reproducibility)
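The directory layout above could be created with a helper along these lines. The function name `make_results_dir` is hypothetical, and replacing `/` in the model name is an assumption made here so that IDs of the form `org/model` stay on one directory level.

```python
from datetime import date
from pathlib import Path


def make_results_dir(model_name, root="results", today=None):
    """Create <root>/YYYY-MM-DD_modelname/{artifacts,workspace}; return the run dir."""
    today = today or date.today()
    # Assumption: flatten "org/model" IDs so they do not create nested directories.
    safe_name = model_name.replace("/", "_")
    run_dir = Path(root) / f"{today.isoformat()}_{safe_name}"
    (run_dir / "artifacts").mkdir(parents=True, exist_ok=True)
    (run_dir / "workspace").mkdir(parents=True, exist_ok=True)
    return run_dir
```

Calling `make_results_dir("MODEL_ID")` before execution gives the script and the collected outputs a fixed place to land, whichever provider ran the job.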

Step 4: Handle Failures

Common failure patterns and recovery:

| Error | Recovery |
| --- | --- |
| OOM (CUDA out of memory) | Try quantization (int8/int4), smaller batch, or bigger GPU |
| Colab GPU unavailable | Fall back to Modal |
| Modal timeout | Increase timeout, or use beam.cloud |
| Import error | Install missing dependency in the execution environment |
| Model not found | Verify model ID, check if gated (needs HF token) |

If a provider fails after 2 attempts, try the next provider in priority order.
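The two-attempt rule can be sketched as a loop over the providers in priority order. The names `run_with_fallback` and `execute` are hypothetical: `execute` stands for whatever callable runs the script on a given provider. Every failure is recorded, so the errors can be saved alongside the other outputs.

```python
def run_with_fallback(providers, execute, max_attempts=2):
    """Try each provider up to max_attempts times.

    Returns (provider, result, errors), where errors lists every failed attempt.
    Raises RuntimeError if no provider succeeds.
    """
    errors = []
    for provider in providers:
        for attempt in range(1, max_attempts + 1):
            try:
                return provider, execute(provider), errors
            except Exception as exc:  # broad on purpose: record every failure
                errors.append((provider, attempt, repr(exc)))
    raise RuntimeError(f"all providers failed: {errors}")
```

A real implementation would likely inspect the error first (for example, retrying an OOM with quantization rather than repeating the same call), which this sketch leaves out.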

Important

- Always use `torch.bfloat16` or `torch.float16` for GPU models (never fp32)
- Set `device_map="auto"` for large models
- Include timing measurements in the execution script
- Save ALL outputs; even errors are valuable for the report
- Load `.env` with python-dotenv for API tokens
