$pwd:

gke-ai-troubleshooting-tpu-connection-failure-vbar-oom

// Diagnose and prevent `vbar_control_agent` segfaults and out-of-memory (OOM) crashes caused by race conditions during TPU device resets and frequent metrics collection (e.g. every 3 seconds). Use when TPU slice initialization fails or `vbar_control_agent` crashes on TPU v6e nodes.

$ git log --oneline --stat
stars:155
forks:73
updated:May 4, 2026, 14:12
File explorer
7 files
SKILL.md
readonly