aiter-reflection
This skill should be used when optimizing AMD GPU kernels on MI300 using the aiter project, including running op tests, benchmarking, iterating on kernel changes, and recording results in the kernel experiment database.
This skill should be used when reasoning about GPU architecture fundamentals to guide kernel optimization choices such as memory hierarchy usage, execution model mapping, block sizing, and latency-aware tuning across HIP, Triton, and PyTorch.
This skill should be used when writing or tuning HIP kernels on AMD/NVIDIA GPUs, covering memory coalescing, shared-memory tiling, bank conflict avoidance, warp primitives, occupancy, vectorization, async ops, loop unrolling, and profiling.
This skill should be used when optimizing kernels in this repo and needing to consult past optimization experiments, or when recording the current optimization iteration back into the kernel experiment database.
MI300/CDNA3 architecture guide for HIP/Triton optimization—MFMA variants, dual register files, data formats, sparsity, LDS/GWS, and best practices.
CDNA3/MI300 HIP programming insights—chiplet/cache model, Infinity Cache, memory coherency, matrix cores, sparsity, and best practices.
MI300 HIP programming differences vs NVIDIA—wavefront vs warp, memory hierarchy, MFMA usage, occupancy, and profiling pitfalls.
| name | aiter-reflection |
| description | This skill should be used when optimizing AMD GPU kernels on MI300 using the aiter project, including running op tests, benchmarking, iterating on kernel changes, and recording results in the kernel experiment database. |
Optimize AMD MI300 GPU kernels for correctness and performance using the aiter workflow, then record each iteration to the kernel experiment database.
Use `rocm-smi` or `ps aux | grep python` to identify GPU tasks.
Use the `.venv` Python environment.
Run `rocprof-compute` at least once to deepen bottleneck analysis.
Use `kernel-exp-history` to review related optimization history and extract ideas.
`python -m pip install -e . --no-build-isolation --no-deps --force-reinstall`
`rm -f aiter/jit/*.so && rm -rf aiter/jit/build ~/.aiter`
Document the results:
Use kernel-exp-history to store the results in the kernel experiment database.
Verify result quality: if the results show an unexpected regression, investigate before recording.
Restore the repo code to the main branch state after finishing the iteration.
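
The "verify result quality" step can be made concrete with a small gate that compares a candidate measurement against the baseline before anything is recorded. The following is a minimal sketch, not part of aiter or kernel-exp-history: the names `BenchResult`, `regression_ratio`, and `ready_to_record`, and the 5% default tolerance, are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchResult:
    """One benchmark measurement (hypothetical structure, not an aiter type)."""
    label: str
    latency_us: float


def regression_ratio(baseline: BenchResult, candidate: BenchResult) -> float:
    """Return candidate latency relative to baseline; above 1.0 means slower."""
    if baseline.latency_us <= 0:
        raise ValueError("baseline latency must be positive")
    return candidate.latency_us / baseline.latency_us


def ready_to_record(
    baseline: BenchResult,
    candidate: BenchResult,
    tolerance: float = 0.05,  # assumed noise allowance, tune per benchmark
) -> bool:
    """True when the candidate is not slower than baseline beyond the tolerance."""
    return regression_ratio(baseline, candidate) <= 1.0 + tolerance


if __name__ == "__main__":
    main_branch = BenchResult("main", 100.0)
    iteration = BenchResult("iteration", 120.0)
    if ready_to_record(main_branch, iteration):
        print("OK to record this iteration")
    else:
        print("Unexpected regression: investigate before recording")
```

A result that fails the gate is not discarded; it only signals that the measurement should be rechecked (for example, for other GPU tasks running at the same time) before it is stored.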