Reviews Pull Requests or local diffs with an 8-agent fan-out covering static analysis, dead code, code smells + quality (naming, complexity, single-responsibility, magic numbers), language rules (C++/Python/CMake), architecture, simplification, performance (hot-path classification, allocations, locks, I/O), and undefined behaviour (signed overflow, lifetime, strict aliasing, data races, sanitizer coverage; C/C++/unsafe-Rust only). Use when the user asks to "review this PR", "review the diff", "audit this branch", "/pr-review", or when staging changes before push.

2026-05-25

pr-review-interactive

Software Quality Assurance Analysts and Testers

Walk through a PR review interactively, one finding at a time. Generate review via pr-review, then for each issue present analysis + proposed inline comment, let user accept/edit/skip, accumulate into a PENDING GitHub review, submit at end.

2026-05-25

analyze-pr-comments

Software Quality Assurance Analysts and Testers

Use when user wants to list, analyze, review, or summarize GitHub PR comments on a pull request number or URL

2026-05-25

architecture-tradeoff

Software Developers

Use when architectural decisions involve competing quality attributes (performance vs modifiability, availability vs consistency, security vs usability), when the user says "tradeoff analysis", "ATAM", "quality attributes", "what are we giving up", or when a design choice affects multiple non-functional requirements in tension.

2026-05-25

programming-cpp-constexpr

Software Developers

Use when moving computation from runtime to compile time — precomputed lookup tables, compile-time constants, type-based template dispatch, or zero-cost conditional branching with if constexpr; use when a function or value could be evaluated before the binary runs

2026-05-25

Showing the top 8 of 50 collected skills in this repository.

#002

rocm-systems

23 skills 439322 updated 2026-07-21

18% of this creator's skills

skill

Occupation category

Description

Updated

amdsmi-changelog-automation

Software Developers

Check and generate changelog entries for amd-smi. Use when: reviewing PRs for changelog updates, generating release notes, checking CHANGELOG.md compliance.

2026-07-21

amdsmi-commit-and-pr-conventions

Software Developers

Use when writing or restructuring git commits or opening/updating a pull request for amd-smi — composing commit titles, commit message bodies, PR titles, or PR descriptions. Defines the Conventional Commits `type(amdsmi):` title convention enforced by the Systems PR Bot, the rocm-systems PR template sections, the unit-test and JIRA/ISSUE-reference gates, brevity caps, and the rule that JIRA tickets appear only in the PR JIRA ID section, never in code comments or commit bodies.

2026-07-09

amdsmi-restructure-commits

Software Developers

Use when finishing an amd-smi development branch — consolidating commits into logical groups with clean messages AND deciding how to integrate the work (merge to develop, push and open PR, keep as-is, or discard). Covers commit restructuring plus the merge/PR/cleanup workflow.

2026-07-09

writing-plans

Software Developers

Use when an approved spec exists and you need a bite-sized, file-level implementation plan before any code is written. Produces a plan ready for executing-plans or subagent dispatch.

2026-07-09

rdc-build-install

Software Developers

Build and install RDC from source. Use when: building locally, installing before tests, pre-review build verification, build + install + verify. Requires GRPC_ROOT to be set.

2026-06-24

feature-unit-testing

Software Quality Assurance Analysts and Testers

Use when writing, planning, or improving unit tests for low-level transport or systems code — especially when reasoning about branch coverage, test gaps, identifying which uncovered paths are worth pursuing, or deciding when a feature's test suite is ready to merge.

2026-06-19

amdsmi-agent-handoff

Computer Occupations, All Other

Use when handing work from one amd-smi agent to another (planning→development, planning→review, development→review) or compacting a long session into a fresh one. Produces a compact handoff doc referencing artifacts by path instead of duplicating them.

2026-06-17

amdsmi-interrogate

Software Developers

Use whenever amd-smi work begins from a design — a Confluence Feature Design, a Jira/SWDEV ticket, a driver hand-off, or a user description. Reconciles the design against what the code actually does, questions it adversarially, and produces an approved spec. This is the single front door to any feature or behavior change.

2026-06-17

Showing the top 8 of 23 collected skills in this repository.

#003

FlyDSL

15 skills 23992 updated 2026-07-22

12% of this creator's skills

skill

Occupation category

Description

Updated

kernel-code-cleanup

Software Developers

Modernize FlyDSL kernels: replace raw MLIR dialects (arith, scf, vector, llvm, memref, math), ArithValue, redundant fx.* wrapping, fx.Index, buffer_ops, SmemPtr/SmemAllocator, per-tile *_atom_call, and raw rocdl.mfma_* with the current fx.* surface (fx types, Python control flow, make_buffer_tensor, SharedAllocator, fx.copy/fx.gemm, local @flyc.jit if/else). Also trims comments and dead code and applies the _run_compiled fast launch path. Use when reviewing, cleaning, or migrating existing kernels.

2026-07-22

add-target-atom-op

Software Developers

Add a new target-specific Mma / Copy Op type to any FlyDSL backend dialect (`lib/Dialect/Fly<TARGET>/<SUBTARGET>/` + `include/flydsl/Dialect/Fly<TARGET>/IR/`). Explains the `MmaOp`-type / `CopyOp`-type design (each type plugs into the generic `!fly.mma_atom<...>` / `!fly.copy_atom<...>` wrapper through `Fly_MmaOpTypeInterface` / `Fly_CopyOpTypeInterface`), the stateful-vs-stateless variants (`Fly_StatefulOpTypeInterface`), and the required `emitAtomCall` / `emitAtomCallSSA` lowering contract to the backend dialect (LLVM/ROCDL/NVVM/SPIR-V/...). Use when adding a new tensor-core / matrix instruction (MFMA, WMMA, HMMA, WGMMA, ...), a new buffer / shared-memory / global copy atom, a new stateful copy (e.g. per-atom offset or descriptor), or bringing up a new backend dialect (`FlyPTX`, `FlyCPU`, etc.). The current reference implementation is `FlyROCDL` with `CDNA3` MFMA, `CDNA3` BufferCopy, `CDNA4` LDS-read-transpose, `GFX1250` WMMA / MX-scaled WMMA (stateful scale) / N-D TDM copy (stateful descriptor); treat thes

2026-07-12

flydsl-kernel-authoring

Software Developers

Comprehensive reference for authoring FlyDSL GPU kernels on AMD GPUs. Covers the layout algebra, tiled copy/MMA, buffer ops, loop-carried range loops, SharedAllocator (LDS), autotuning, and common patterns. Use when writing, reviewing, or understanding FlyDSL kernel code.

2026-07-12

flydsl-tile-programming

Software Developers

Guided step-by-step wizard for producing a new FlyDSL GPU kernel from a requirement: classify the kernel type, pick a skeleton, fill in compute, add control flow / sync / LDS, then test on GPU. Use when the user wants to WRITE a new kernel, port a Triton kernel to FlyDSL, or learn tile programming by following a procedure. For API/layout-algebra lookups, per-op reference tables, and troubleshooting, use the flydsl-kernel-authoring skill instead.

2026-07-12

debug-flydsl-kernel

Software Developers

Debug FlyDSL GPU kernels that produce NaN, inf, wrong results, or crash. Covers cache invalidation, tracing pitfalls (runtime conditionals, range vs range_constexpr), loop-carried state packing, buffer_load addressing, MFMA operand layout verification, LDS bank conflict diagnosis, and systematic error isolation (all-1s test, single-partition test, host-side tensor inspection). Use when a FlyDSL kernel produces incorrect output or compilation errors. Usage: /debug-flydsl-kernel

2026-07-10

gemm-optimization

Software Developers

Comprehensive guide to optimizing GEMM (General Matrix Multiply) kernels in FlyDSL on AMD CDNA GPUs. Covers tiling strategy, LDS ping-pong double-buffer, XOR bank-conflict swizzle, A/B data prefetch pipeline, 2-stage software pipelining, MFMA instruction scheduling (hot_loop_scheduler), epilogue strategies (direct store vs CShuffle), TFLOPS/bandwidth calculation, main-loop instruction count analysis, and bottleneck identification from ATT traces. Based on the production preshuffle_gemm kernel. Usage: /gemm-optimization

2026-07-10

lds-optimization

Software Developers

Optimize LDS (Local Data Share / shared memory) access patterns in FlyDSL GPU kernels. Diagnose bank conflicts and high lgkmcnt stalls from ATT trace data, then apply swizzle or padding layouts to eliminate conflicts. Also increase the distance between LDS write and subsequent LDS read to hide LDS latency. LDS read preceded by write always requires a sync (s_waitcnt lgkmcnt or s_barrier). Use when trace analysis shows ds_read/ds_write/lgkmcnt as a bottleneck. Usage: /lds-optimization

2026-07-10

oob-detection

Software Developers

Detect out-of-bounds memory accesses in CPU or GPU code using static interval analysis and runtime assertions/printfs. Use when investigating OOB, buffer overrun, invalid memory access, HIP/ROCm illegal address, CUDA illegal memory access, silent tensor corruption, or suspicious buffer_load/store address arithmetic.

2026-07-10

Showing the top 8 of 15 collected skills in this repository.

#004

rocm-libraries

11 skills 390346 updated 2026-07-15

8.7% of this creator's skills

skill

Occupation category

Description

Updated

hipdnn-superbuild

Software Developers

Build hipDNN with providers via the repository superbuild. Faster than standalone since providers build alongside hipDNN in a single CMake invocation. On Windows, auto-runs the wheel-based ROCm setup if not already prepared.

2026-07-15

hipdnn-superbuild-test

Software Quality Assurance Analysts and Testers

Run tests against an existing hipDNN superbuild. Supports per-component selection (hipdnn, miopen-provider, hipblaslt-provider, hip-kernel-provider, integration-tests), unit/integration/external-integration scope, and gtest filtering. Reproduces the cross-provider external-integration-check suite. Handles Windows DLL PATH automatically.

2026-07-15

pr-summary

Software Developers

Draft or revise pull request titles and bodies with a concise standard format: summary, risk assessment, testing summary, testing checklist, and technical changes. Use when preparing a new draft or ready-for-review PR, opening a PR from a branch, updating an existing draft/open PR, reopening a PR, or when the user provides a GitHub PR URL/branch and asks for PR summary, risk, testing, or description text. New PRs should be draft by default unless the user explicitly asks to open them ready for review.

2026-06-26

hipblaslt-pr-quality

Software Quality Assurance Analysts and Testers

hipBLASLt supplements to the ROCm PR quality base skill. Use for hipBLASLt PR author, review, or pre-merge gating (target branch develop; product paths under projects/hipblaslt/**, including tensilelite/). Adds and tightens base rules; never relaxes a base MUST.

2026-06-26

rfc-backlog

Software Developers

Turn a technical RFC into an actionable backlog of JIRA stories and tasks, grounded in the codebase the RFC affects. Derives unified, independently-shippable work items (each with a user-story sentence, acceptance criteria, functional and non-functional requirements), presents them for review, then — unless run as a dry run — collects component/epic/label defaults and creates the tickets in JIRA. Use when you have an approved or near-approved RFC and need to plan the implementation as trackable work.

2026-06-08

hipdnn-review

Software Quality Assurance Analysts and Testers

Review a hipDNN pull request, branch, or local diff for correctness, API compatibility, provider behavior, resource management, code reuse, and testing coverage/quality. Uses local source/worktrees for cross-reference by default. Use when asked to review hipDNN code, review a PR, or assess whether a change is ready to merge.

2026-06-05

rfc-review-compatibility

Software Developers

Deep-dive review of a technical RFC focused on change-management concerns — API/ABI/wire/on-disk compatibility, version skew, migration tooling, rollout mechanics, reversibility, and consumer impact. Produces a written review report. Use when an RFC touches a stable surface (public API, plugin ABI, on-disk or wire format), or as a follow-up pass after `/rfc-review` flags compatibility as a deeper concern.

2026-06-05

rfc-review-ops

Software Developers

Deep operational-concerns review of a technical RFC — build system impact, packaging, CI, observability, performance-in-production, deployment, capacity, and failure modes. Use as a focused follow-up to /rfc-review when the operational lens deserves more than the umbrella's one-paragraph pass, or standalone when the RFC is primarily ops-shaped (new dependency, build/CI change, perf-sensitive subsystem). Not a substitute for /review on implementation PRs.

2026-06-05

Showing the top 8 of 11 collected skills in this repository.

#005

ATOM

6 skills 141100 updated 2026-07-19

4.7% of this creator's skills

skill

Occupation category

Description

Updated

capture-trace

Software Developers

Capture a PyTorch profiler / kineto trace from a running ATOM server for a short benchmark window. Use when the user asks for "a trace", "profiler trace", "GPU trace", or "抓 trace" for performance investigation — what kernels ran, what's on the critical path, what's slow. Do NOT use for crashes (use debug-agent-locate-kernel) or numerical bugs (use dump-bisect-debug).

2026-07-19

run-atom-workload

Software Developers

Run any ATOM workload — accuracy eval (GSM8K via lm_eval), performance benchmark, concurrency sweep, offline simple_inference, or fault repro under rocm-debug-agent. Use when the user asks to "test accuracy", "测精度", "跑 GSM8K", "跑 benchmark", "test performance", "run sweep", "repro the fault", "测一下 MTP1 精度", "跑 simple_inference" — anything that drives an ATOM workload. Encodes the canonical flow (stop → start → workload-in-bg → wait_infer_drain → stop) and the model-family env vars. Same pattern works for both server-based workloads (lm_eval / benchmark client) and offline simple_inference. Do NOT use for profiling traces (use capture-trace).

2026-07-18

review-pr

Software Quality Assurance Analysts and Testers

AI code review for ATOM PRs. ATOM consumes aiter kernels and integrates with vLLM/SGLang plugins. Reviews check perf claims, aiter cross-repo deps, model coverage, dispatch correctness, and AI-generated code patterns. Invoke with a PR number.

2026-07-15

debug-agent-locate-kernel

Software Developers

Identify which GPU kernel is faulting/hanging in ATOM via rocm-debug-agent (for faults/asserts) or rocgdb (for silent livelocks). debug-agent dumps wave registers + faulting PC + (with --save-code-objects) disassembled code object on memory faults / ASSERT_TRAP. rocgdb attaches to a live process and lists in-flight `info dispatches` + HSA `info queues` — works when the kernel isn't faulting but just stuck (e.g. atomic-counter deadlock). Use when: server crashes with "Memory access fault by GPU node-N", server hangs with GPU at 100% but no token output, kernel asserting `s_trap`, or `HIP_LAUNCH_BLOCKING=1` makes a hang vanish. Do NOT use for: numerical bugs (use dump-bisect-debug), compile errors, OOM.

2026-06-06

atom-patterns

Software Developers

Coding patterns and architecture index for the ATOM LLM inference engine

2026-05-23

dump-bisect-debug

Software Developers

Locate forward numerical bugs by dumping intermediate tensors from a target implementation and a known-good reference, then bisecting layer by layer. Also covers batch-invariance bisect (the same token at any batch position should produce a bitwise-identical output, per DeepSeek V4 paper §3.3). Use when "the output is wrong but I don't know where" — model produces gibberish, degenerates, or picks the wrong token, but code review reveals nothing.

2026-05-23

#006

aiter

5 skills 497426 updated 2026-07-15

3.9% of this creator's skills

skill

Occupation category

Description

Updated

review-pr

Software Quality Assurance Analysts and Testers

AI code review for aiter PRs. Catches perf regressions, silent correctness bugs, dispatch gate holes, and AI-generated code patterns. Invoke with a PR number; works through fetch → semantic understanding → rule checklist → verdict. Add new rules here as patterns emerge from real reviews.

2026-07-15

aiter-config-shape

Software Developers

How to add/upload tuned config CSVs under aiter/configs (incl. model_configs/) without introducing duplicate shapes, and how to find & resolve duplicate-shape collisions. Use whenever adding a model's tuned config, merging/uploading config CSVs, editing anything under aiter/configs/**, or when a run hits "duplicate shape entries during merge".

2026-07-07

aiter-op-test

Software Quality Assurance Analysts and Testers

Standard structure for aiter op_tests under op_tests/test_*.py — @benchmark + run_perftest candidate loop, a torch reference, a final markdown summary table, a __main__ guard so the module is importable, and faithful reproduction of the real model call (output buffer, layout, shapes). Use whenever writing, rewriting, or extending any aiter unit/perf test, or adding model-derived shapes (e.g. DeepSeek-V4) to an existing one.

2026-06-29

opus-module-build-optimization

Software Developers

Module-level JIT build-wall optimization for opus-based aiter modules. Use when an aiter JIT module's first-call build wall is a user-visible bottleneck or when adding a new module.

2026-05-27

opus-kernel-best-practice

Software Developers

Compile-time optimization guidance for HIP/C++ kernels using opus.hpp. Use when writing or reviewing OPUS kernels, analyzing compile time, reducing template instantiation overhead, or optimizing hipcc build performance.

2026-04-13

#007

omnistat

3 skills 255 updated 2026-07-08

2.4% of this creator's skills

skill

Occupation category

Description

Updated

job-analysis

Software Developers

Analyze an HPC job from an Omnistat database using hypothesis-driven exploration, driven by the omnistat-inspect tool. Use this to diagnose why a job behaved as it did — performance bottlenecks, hardware issues, anomalies, or comparing a degraded job against a healthy baseline. For a plain factual snapshot without investigation, use job-report instead.

2026-07-08

job-report

Software Developers

Generate a factual summary report card for an HPC job from an Omnistat database using the single-shot omnistat-inspect JSON command. Use this for a quick, comprehensive snapshot of what a job did (stats, energy, health, data quality) without diagnosing why. For root-cause investigation, performance debugging, or comparing jobs, use job-analysis instead.

2026-07-08

open-database

Network and Computer Systems Administrators

Open and explore an Omnistat database using VictoriaMetrics. Use this when the user wants to view, query, or visualize collected Omnistat telemetry data from a user-mode job.

2026-07-08

#008

TheRock

2 skills 1.2k 290 updated 2026-06-25

1.6% of this creator's skills

skill

Occupation category

Description

Updated

rocm-pr-quality

Software Quality Assurance Analysts and Testers

Help an engineer author, review, or pre-merge-gate a pull request to a ROCm library so it is traceable, tested, and safe to merge. Use when preparing a PR, reviewing a PR, or deciding whether an approved PR is safe to merge right now, or when the user provides a GitHub PR URL or branch and asks for help with PR quality, description, testing, review, or merge-readiness. Library-agnostic base; component overlays extend it.

2026-06-25

therock-pr-quality

Software Quality Assurance Analysts and Testers

TheRock build-repo supplements to the ROCm PR quality base skill. Use for TheRock PR author, review, or pre-merge gating where the change touches the superbuild, submodules/patches, artifact descriptors, or reusable CI workflows. Adds and tightens base rules; never relaxes a base MUST.

2026-06-25

#009

rocMLIR

2 skills 18556 updated 2026-06-05

1.6% of this creator's skills

skill

Occupation category

Description

Updated

review-rocmlir-pr

Software Quality Assurance Analysts and Testers

Review a rocMLIR pull request with deep expertise in MLIR/LLVM coding standards, the Rock dialect, MIGraphX integration, kernel codegen for AMD GPUs, lit/E2E testing, and the rocMLIR CMake build. Use when asked to review a rocMLIR PR or check a rocMLIR change. Read-only; never posts to GitHub.

2026-06-05

update-pr-review

Software Quality Assurance Analysts and Testers

Reconcile fresh review findings against existing inline comment threads on a PR. Iterates over previous Claude root comments first so that fixed issues are correctly resolved, then handles still-present issues, then identifies genuinely new findings. Never posts the same issue twice.

2026-05-29

#010

TransformerEngine

2 skills 7234 updated 2026-04-24

1.6% of this creator's skills

skill

Occupation category

Description

Updated

Software Quality Assurance Analysts and Testers

Verify AMD copyright header compliance on files modified or introduced by ROCm. Checks presence, format, and year correctness. Use whenever reviewing a PR on the ROCm TransformerEngine fork, or when asked to audit copyright headers.

2026-04-24

review-pr

Software Quality Assurance Analysts and Testers

Deep code review of a branch as a PR against dev. Focus on intent, correctness, reuse, and test semantics.

2026-04-24

#011

xla

2 skills 1110 updated 2026-06-19

1.6% of this creator's skills

skill

Occupation category

Description

Updated

review-xla-pr

Software Quality Assurance Analysts and Testers

Review an XLA pull request with deep expertise in HLO optimizations, GPU backend, Triton codegen, autotuner, and AMD/ROCm parity. Use when asked to review an XLA PR or check an XLA change.

2026-06-19

update-pr-review

Software Quality Assurance Analysts and Testers

Given fresh review findings (from a prior review skill) and a PR number, fetch previous Claude inline comments, cross-reference findings, and update the PR in a thread-aware way. Resolves addressed issues, replies to active threads, and posts only genuinely new findings as new inline comments. Never posts the same issue twice.

2026-03-02

#012

repo-digest

2 skills 60 updated 2026-05-05

1.6% of this creator's skills

skill

Occupation category

Description

Updated

download-artifacts

Network and Computer Systems Administrators

Use this skill when the user wants to download GitHub Actions artifacts from repo-digest workflows. Triggers when the user mentions "download artifact", "get the digest", "fetch artifact", "latest digest", or references workflows like "triton-daily-digest", "triton-weekly-digest", "xla-daily-digest", "llvm-weekly-digest", "maxtext-daily-digest", "maxtext-weekly-digest", "jax-daily-digest", or "jax-weekly-digest".

Showing 12 of 16 repositories