Skip to main content
Run any Skill in Manus
with one click

debug-failing-gpu

Stars173
Forks55
UpdatedJune 15, 2026 at 15:05

Recover from GPU-busy / GPU-unavailable failures. Use when a command (pytest, python, a TLX/Triton kernel run, a benchmark) fails with errors indicating the GPU is busy, out of memory, or unavailable — e.g. "CUDA error: out of memory", "all CUDA-capable devices are busy or unavailable", "CUDA-capable device(s) is/are busy or unavailable", "RuntimeError: No CUDA GPUs are available", "device-side assert", or a hang on the first CUDA call. Runs find_working_gpu.sh to locate a healthy GPU and re-runs the failed command pinned to it via CUDA_VISIBLE_DEVICES.

Installation

Install with Codex or Claude Copy this prompt, paste it into Codex, Claude, or another assistant, and let it review the skill page and install it for you.

SKILL.md
readonly