GitHub creator profile

ROCm

Browse 59 collected skills from 10 GitHub repositories, grouped by repository, with approximate occupation coverage.

Collected skills: 59
Repositories: 10
Occupation areas: 2
Updated: 2026-05-29
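Occupation coverage
The major occupation groups this creator mainly covers.
The first 8 repositories are shown here; the full repository list continues below.
Repository browser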

Repositories and representative skills

#001
rocm-systems
21 skills | 388253 | updated 2026-05-29
36% of this creator's skills
brainstorming
Project Management Specialist

Use before any creative work — designing a new amdsmi_* API, adding a CLI command, building a feature, or modifying behavior. Explores intent, requirements, and design before any code is written.

2026-05-29
dispatching-parallel-agents
Software Developer

Use when facing 2+ independent problems with no shared state — e.g., unrelated test failures in different subsystems, multiple independent bug investigations, parallel research tasks. Dispatch one focused subagent per domain instead of investigating sequentially.

2026-05-29
executing-plans
Software Developer

Use when a written implementation plan exists at docs/dev/plans/ and you need to execute it task-by-task in the current session with verification at each step.

2026-05-29
restructure-commits
Software Developer

Use when finishing an amd-smi development branch — consolidating commits into logical groups with clean messages AND deciding how to integrate the work (merge to develop, push and open PR, keep as-is, or discard). Covers commit restructuring plus the merge/PR/cleanup workflow.

2026-05-29
systematic-debugging
Software Developer

Use when encountering any bug, test failure, build failure, or unexpected behavior in amd-smi — before proposing any fix. Enforces root-cause investigation before symptom patching.

2026-05-29
test-driven-development
Software Developer

Use when implementing any amd-smi feature, bug fix, or behavior change — before writing implementation code. Enforces strict RED-GREEN-REFACTOR: failing test first, watch it fail, minimal code to pass, refactor.

2026-05-29
using-git-worktrees
Software Developer

Use when starting amd-smi feature work, executing an implementation plan, or reviewing a PR that needs an isolated workspace away from the main checkout. Sets up a worktree following the rocm-systems-pr<PR#> convention.

2026-05-29
writing-plans
Software Developer

Use when an approved spec exists and you need a bite-sized, file-level implementation plan before any code is written. Produces a plan ready for executing-plans or subagent dispatch.

2026-05-29
Showing the top 8 of 21 collected skills in this repository.
#002
FlyDSL
16 skills | 19256 | updated 2026-05-29
27% of this creator's skills
format-code
Software Developer

Format and clean up changed files before committing, matching the project's CI style gate. Formats Python with black + ruff and C/C++ with clang-format using the repository's .clang-format. Use when the user says "format code", "clean up code", "lint", "format before commit", "/format-code", or mentions black, ruff, clang-format, or CI style failures while tidying their working tree.

2026-05-29
capture-kernel-trace
Software Developer

Capture GPU kernel ATT (Advanced Thread Trace) via rocprofv3 on a remote Docker container or locally. Discovers kernel names, configures input.yaml with the target kernel_include_regex, runs rocprofv3 -i input.yaml with FLYDSL_DEBUG_ENABLE_DEBUG_INFO=1, and downloads the latest ui_output_agent_* directory for analysis. Usage: /capture-kernel-trace <test_script.py> [kernel_name_pattern]

2026-05-29
kernel-trace-analysis
Software Developer

Profile GPU kernels using rocprofv3 to collect ATT instruction-level traces, then analyze the trace data using hotspot_analyzer.py to identify top-K stall hotspots (VMEM-load, VMEM-wait, LDS/SMEM-wait, barrier, MFMA stalls) mapped back to source lines, and produce an actionable optimization plan. Usage: /kernel-trace-analysis <cmd> Can also analyze an existing dispatch dir directly: /kernel-trace-analysis --dir <path>

2026-05-29
lds-optimization
Software Developer

Optimize LDS (Local Data Share / shared memory) access patterns in FlyDSL GPU kernels. Diagnose bank conflicts and high lgkmcnt stalls from ATT trace data, then apply swizzle or padding layouts to eliminate conflicts. Also increase the distance between LDS write and subsequent LDS read to hide LDS latency. LDS read preceded by write always requires a sync (s_waitcnt lgkmcnt or s_barrier). Use when trace analysis shows ds_read/ds_write/lgkmcnt as a bottleneck. Usage: /lds-optimization

2026-05-29
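The padding idea in the lds-optimization description can be illustrated with a small bank-mapping model. This is only a sketch under stated assumptions: 32 banks, each one 4-byte word wide, with thread t reading word t * row_stride_words (a column read of a row-major tile). The function name and the bank count are illustrative and are not taken from FlyDSL or from any specific GPU.

```python
from collections import Counter


def conflict_degree(row_stride_words, num_threads=32, num_banks=32):
    """Worst-case number of threads that land on the same bank.

    Assumed model (hypothetical, for illustration only): thread t reads
    the word at offset t * row_stride_words, and a word at offset w lives
    in bank w % num_banks.
    """
    banks = Counter(
        (t * row_stride_words) % num_banks for t in range(num_threads)
    )
    return max(banks.values())


# Unpadded tile: a row stride equal to the bank count puts every thread
# on bank 0.
unpadded = conflict_degree(32)

# Padding each row by one word spreads the threads across all banks.
padded = conflict_degree(33)
```

In this model the unpadded stride serializes all 32 accesses on one bank, while one word of padding per row removes the conflict entirely. A swizzled layout reaches the same result without spending extra LDS space.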
prefetch-data-load
Software Developer

Apply prefetch optimization to FlyDSL kernel loops: pre-load the first iteration's data before the loop, issue async loads for the next iteration inside the loop body, and swap buffers at the loop tail via runtime loop-carried values. This overlaps data load latency with compute instructions. Use when a kernel has a loop where buffer_load feeds into MFMA/compute and load latency is exposed. Usage: /prefetch-data-load

2026-05-29
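The loop shape described for prefetch-data-load (load the first tile before the loop, issue the next load inside the body, swap at the tail) can be sketched in plain Python. This is not FlyDSL code: `load_tile` and `compute` are hypothetical stand-ins for a buffer_load and an MFMA/compute stage, and a single worker thread plays the role of the asynchronous load.

```python
from concurrent.futures import ThreadPoolExecutor


def load_tile(tiles, i):
    """Hypothetical stand-in for loading tile i from global memory."""
    return list(tiles[i])


def compute(tile):
    """Hypothetical stand-in for the compute stage (sum of squares)."""
    return sum(x * x for x in tile)


def prefetched_sum(tiles):
    """Process all tiles while overlapping the next load with compute."""
    if not tiles:
        return 0
    total = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Prologue: load the first iteration's data before the loop.
        current = load_tile(tiles, 0)
        for i in range(len(tiles)):
            # Issue the load for the next iteration, if there is one.
            pending = (
                pool.submit(load_tile, tiles, i + 1)
                if i + 1 < len(tiles)
                else None
            )
            # Compute on the current buffer while the load is in flight.
            total += compute(current)
            # Loop tail: swap buffers so the prefetched tile becomes current.
            if pending is not None:
                current = pending.result()
    return total
```

The result is the same as a sequential loop; only the ordering changes, so that the load latency for tile i + 1 is hidden behind the compute on tile i.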
flydsl-kernel-authoring
Software Developer

Comprehensive reference for authoring FlyDSL GPU kernels on AMD GPUs. Covers the layout algebra, tiled copy/MMA, buffer ops, loop-carried range loops, SmemAllocator, autotuning, and common patterns. Use when writing, reviewing, or understanding FlyDSL kernel code.

2026-05-27
build-rocm-image
Network and Computer Systems Administrator

Connect to a remote host via SSH and build a Docker image with rocprofv3, aiter, and FlyDSL. Use when user wants to build/rebuild the ROCm development image on a remote host. Usage: /build-rocm-image <hostname>

2026-05-20
oob-detection
Information Security Analyst

Detect out-of-bounds memory accesses in CPU or GPU code using static interval analysis and runtime assertions/printfs. Use when investigating OOB, buffer overrun, invalid memory access, HIP/ROCm illegal address, CUDA illegal memory access, silent tensor corruption, or suspicious buffer_load/store address arithmetic.

2026-05-20
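As a rough illustration of the static interval analysis mentioned in the oob-detection description, the sketch below propagates inclusive integer ranges through an index expression and checks the result against a buffer size. The `Interval` class and `may_be_out_of_bounds` helper are hypothetical names introduced for this example; they do not come from the skill itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Inclusive integer range [lo, hi] that a value may take."""

    lo: int
    hi: int

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def scale(self, k):
        """Multiply the range by a constant, handling negative k."""
        a, b = self.lo * k, self.hi * k
        return Interval(min(a, b), max(a, b))


def may_be_out_of_bounds(index, size):
    """True if any value in the index range falls outside [0, size)."""
    return index.lo < 0 or index.hi >= size


# index = row * 16 + col for an 8 x 16 tile stored in 128 elements.
row = Interval(0, 7)
safe_index = row.scale(16) + Interval(0, 15)   # range [0, 127]
risky_index = row.scale(16) + Interval(0, 16)  # range [0, 128]
```

Interval analysis over-approximates, so a flagged access is a candidate to confirm with a runtime assertion rather than a proven bug.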
Showing the top 8 of 16 collected skills in this repository.
#003
pytorch
6 skills | 25382 | updated 2026-03-12
10% of this creator's skills
pr-review
Software Quality Assurance Analyst and Tester

Review PyTorch pull requests for code quality, test coverage, security, and backward compatibility. Use when reviewing PRs, when asked to review code changes, or when the user mentions "review PR", "code review", or "check this PR".

2026-03-12
triaging-issues
Software Developer

Triages GitHub issues by routing to oncall teams, applying labels, and closing questions. Use when processing new PyTorch issues or when asked to triage an issue.

2026-03-10
pt2-bug-basher
Software Developer

Debug PyTorch 2 compiler stack failures including Dynamo graph breaks, Inductor codegen errors, AOTAutograd crashes, and accuracy mismatches. Use when encountering torch.compile errors, BackendCompilerFailed exceptions, recompilation issues, Triton kernel failures, FX graph problems, or when the user mentions debugging PT2, Dynamo, Inductor, or compiled model issues.

2026-03-05
document-public-apis
Software Developer

Document undocumented public APIs in PyTorch by removing functions from coverage_ignore_functions and coverage_ignore_classes in docs/source/conf.py, running Sphinx coverage, and adding the appropriate autodoc directives to the correct .md or .rst doc files. Use when a user asks to remove functions from conf.py ignore lists.

2026-02-24
pyrefly-type-coverage
Software Developer

Migrate a file to use stricter Pyrefly type checking with annotations required for all functions, classes, and attributes.

2026-02-04
metal-kernel
Software Developer

Write Metal/MPS kernels for PyTorch operators. Use when adding MPS device support to operators, implementing Metal shaders, or porting CUDA kernels to Apple Silicon. Covers native_functions.yaml dispatch, host-side operators, and Metal kernel implementation.

2026-01-27
#004
ATOM
5 skills | 9964 | updated 2026-05-23
8.5% of this creator's skills
atom-patterns
Software Developer

Coding patterns and architecture index for the ATOM LLM inference engine

2026-05-23
capture-trace
Network and Computer Systems Administrator

Capture a PyTorch profiler / kineto trace from a running ATOM server for a short benchmark window. Use when the user asks for "a trace", "profiler trace", "GPU trace", or "抓 trace" for performance investigation — what kernels ran, what's on the critical path, what's slow. Do NOT use for crashes (use debug-agent-locate-kernel) or numerical bugs (use dump-bisect-debug).

2026-05-23
debug-agent-locate-kernel
Network and Computer Systems Administrator

Identify which GPU kernel is faulting/hanging in ATOM via rocm-debug-agent (for faults/asserts) or rocgdb (for silent livelocks). debug-agent dumps wave registers + faulting PC + (with --save-code-objects) disassembled code object on memory faults / ASSERT_TRAP. rocgdb attaches to a live process and lists in-flight `info dispatches` + HSA `info queues` — works when the kernel isn't faulting but just stuck (e.g. atomic-counter deadlock). Use when: server crashes with "Memory access fault by GPU node-N", server hangs with GPU at 100% but no token output, kernel asserting `s_trap`, or `HIP_LAUNCH_BLOCKING=1` makes a hang vanish. Do NOT use for: numerical bugs (use dump-bisect-debug), compile errors, OOM.

2026-05-23
dump-bisect-debug
Data Scientist

Locate forward numerical bugs by dumping intermediate tensors from a target implementation and a known-good reference, then bisecting layer by layer. Also covers batch-invariance bisect (the same token at any batch position should produce a bitwise-identical output, per DeepSeek V4 paper §3.3). Use when "the output is wrong but I don't know where" — model produces gibberish, degenerates, or picks the wrong token, but code review reveals nothing.

2026-05-23
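The layer-by-layer bisect in the dump-bisect-debug description can be sketched as a binary search over dumped tensors. This is an illustration only, under two assumptions: each dump is a flat list of floats indexed by layer, and once the target diverges from the reference it stays diverged in every later layer. The function names and the tolerance are hypothetical, not part of ATOM.

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two dumps."""
    return max(abs(x - y) for x, y in zip(a, b))


def first_divergent_layer(ref_dumps, tgt_dumps, tol=1e-3):
    """Index of the first layer whose output differs by more than tol.

    Returns None when every layer matches. Relies on the assumption that
    divergence, once introduced, persists in all later layers.
    """
    lo, hi = 0, len(ref_dumps)  # answer lies in [lo, hi]; hi means "none"
    while lo < hi:
        mid = (lo + hi) // 2
        if max_abs_diff(ref_dumps[mid], tgt_dumps[mid]) > tol:
            hi = mid          # divergence is at mid or earlier
        else:
            lo = mid + 1      # layers up to mid still match
    return lo if lo < len(ref_dumps) else None


reference = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
target = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.5], [7.0, 9.0]]
suspect = first_divergent_layer(reference, target)
```

Here the search reports layer 2 after two comparisons. If a bug can be masked by a later layer, a linear scan from layer 0 is the safer choice.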
run-atom-workload
Software Developer

Run any ATOM workload — accuracy eval (GSM8K via lm_eval), performance benchmark, concurrency sweep, offline simple_inference, or fault repro under rocm-debug-agent. Use when the user asks to "test accuracy", "测精度", "跑 GSM8K", "跑 benchmark", "test performance", "run sweep", "repro the fault", "测一下 MTP1 精度", "跑 simple_inference" — anything that drives an ATOM workload. Encodes the canonical flow (stop → start → workload-in-shell-bg → wait_infer_drain → stop) and the model-family env vars. Same pattern works for both server-based workloads (lm_eval / benchmark client) and offline simple_inference. Do NOT use for profiling traces (use capture-trace).

2026-05-23
#005
aiter
2 skills | 450331 | updated 2026-05-27
3.4% of this creator's skills
#006
rocm-libraries
2 skills | 354298 | updated 2026-05-28
3.4% of this creator's skills
#007
TransformerEngine
2 skills | 6929 | updated 2026-04-24
3.4% of this creator's skills
#008
xla
2 skills | 98 | updated 2026-03-25
3.4% of this creator's skills
#009
repo-digest
2 skills | 60 | updated 2026-05-05
3.4% of this creator's skills
Showing 10 of 10 repositories
All repositories shown
ROCm GitHub Skills | SkillsMP