Diagnose Relax training launch scripts for misconfigured flags that hurt performance (time/MFU) or waste GPU memory (cards needed). Use when user asks to review/audit/check a training script, mentions "perf doctor", suspects a config is slow or OOM-prone, or wants a sanity check before launching. Produces a two-section markdown report (Performance + Memory) with cited flags, severity, and concrete fixes.
Develop and debug the Relax reinforcement learning project. Use this skill whenever modifying code in the relax/ directory, or running remote training jobs on a Ray cluster for validation. Also use it when the user mentions training, debugging training runs, submitting Ray jobs, or fixing training errors.
Connect to a remote Ray cluster head node via SSH (paramiko) to execute commands, check cluster status, inspect logs, and debug training jobs. Use this skill when the user asks to SSH into a remote machine, check Ray cluster status, or run remote commands on the Ray head node.
Expert code review of current git changes with a senior engineer lens. Detects SOLID violations, security risks, Python anti-patterns, and ML/distributed training issues. Tailored for the Relax reinforcement learning framework.
Guide for creating Claude Code skills following Anthropic's official best practices. Use when user wants to create a new skill, build a skill, write SKILL.md, update an existing skill, or needs skill creation guidelines. Provides structure, frontmatter fields, naming conventions, and new features like dynamic context injection and subagent execution.
自动排查 Ray 调度的分布式训练任务 hang 问题。使用当训练任务无响应、资源利用率异常、任务长时间无进度时。自动收集集群状态、任务调用栈、Actor 状态,分析阻塞链条并定位根因。
Write and maintain bilingual (English + Chinese) documentation for the Relax project. Use when user asks to create, update, or translate documentation pages. Ensures format correctness (VitePress, sidebar config, bilingual parity) and content correctness (matches actual codebase, no fabricated features).
Creates git commits following Conventional Commits format with type/scope/subject and detailed markdown body. Use when user wants to commit changes, create commit, save work, or stage and commit. Enforces project-specific conventions from CLAUDE.md. Each change type gets its own markdown heading (# emoji + type), with detailed item lists under each.