
gke-ai-troubleshooting-tpu-connection-failure-vbar-oom

// Diagnose and prevent `vbar_control_agent` segfaults and out-of-memory (OOM) errors caused by race conditions between TPU device resets and frequent metrics collection (e.g., every 3 seconds). Use when TPU slice initialization fails or `vbar_control_agent` crashes on TPU v6e nodes.
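The race described above arises when a metrics poller reads device state while a reset is rewriting it. The sketch below illustrates one generic way to avoid that class of race: serialize resets and metrics reads behind a single lock, and have the poller skip a cycle rather than block or read a half-reset device. Everything here is hypothetical (`HypotheticalTpuDevice`, `reset`, `read_metrics`, the timings); it is not the `vbar_control_agent` implementation or any GKE API, only a minimal Python model of the pattern, assuming a single process and a threading-based poller.

```python
import threading
import time


class HypotheticalTpuDevice:
    """Stand-in for a TPU device handle. NOT a real vbar_control_agent API."""

    def __init__(self):
        self._lock = threading.Lock()  # guards all device state
        self._resetting = False
        self.generation = 0  # incremented on each completed reset
        self.skipped_polls = 0

    def reset(self, duration_s=0.05):
        # Hold the lock for the whole reset so no reader sees partial state.
        with self._lock:
            self._resetting = True
            time.sleep(duration_s)  # simulate reset work
            self.generation += 1
            self._resetting = False

    def read_metrics(self):
        # Non-blocking acquire: if a reset holds the lock, skip this poll
        # instead of touching the device mid-reset.
        if not self._lock.acquire(blocking=False):
            self.skipped_polls += 1
            return None
        try:
            assert not self._resetting  # invariant: never read during a reset
            return {"generation": self.generation}
        finally:
            self._lock.release()


def poll(device, interval_s, count, results):
    # Hypothetical metrics loop; a real collector's interval may differ.
    for _ in range(count):
        results.append(device.read_metrics())
        time.sleep(interval_s)


if __name__ == "__main__":
    dev = HypotheticalTpuDevice()
    results = []
    poller = threading.Thread(target=poll, args=(dev, 0.01, 20, results))
    poller.start()
    dev.reset(0.05)  # reset runs concurrently with polling
    poller.join()
    print(f"polls={len(results)} skipped={dev.skipped_polls} generation={dev.generation}")
```

Skipping a poll trades a gap in metrics for safety; lowering the polling frequency reduces how often such collisions occur in the first place, but only the lock removes the race.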

updated: May 4, 2026 at 14:12