Name: Hipfire Kernel Atlas
Author: Kaden-Schutt

Use Kernel Atlas to collect phase-aware hipfire measurements and render ISA Fit View visualizations for AMD GPU kernels, quant formats, and architectures. Use when a user asks how MQ/HFQ/HFP/Q8 quants occupy hardware, asks for an ASCII ISA visualization, wants to compare gfx1010/gfx1030/gfx11/gfx12 kernel fit, or wants an agent-readable "left on table" summary from Atlas rows.

stars:411

forks:44

updated: May 18, 2026 at 23:49

SKILL.md

hipfire-kernel-atlas

Use this skill when the task is to explain or visualize how a hipfire quant format and kernel use an AMD GPU ISA target. The primary tool is scripts/kernel_atlas.py; this skill is a thin agent wrapper around that CLI.

Core Workflow

1. Collect or locate Atlas rows
   - Prefer existing JSONL under .codeinsight+research/kernel-atlas/runs/.
   - For AR prefill/decode, collect with collect-ar.
   - Use --profile-prefill / --profile-decode for AR rows when the user wants the ISA view scoped to runtime-hot kernels and tagged by op role.
   - For speculative decode, collect with collect-dflash.
   - Keep raw run data in .codeinsight+research/; it is ignored and may be private.
2. Attach ISA metadata
   - Use --isa-file for one known HSACO/code object.
   - Use --isa-dir .hipfire_kernels/<arch> plus --isa-filter for a bounded set.
   - Prefer --isa-output <path>.json so multiple rows reference one manifest.
3. Attach dispatch/source provenance
   - Use --dispatch-provenance when rows have profiled kernel names.
   - Prefer --dispatch-output <path>.json so multiple rows reference one manifest.
   - Treat dispatch references as evidence to inspect, not proof of a unique runtime branch.
   - Prefer rows with a known arch; source ranking is target-arch-aware when arch-specific kernel files exist.
4. Render the ISA Fit View
   - Use .agents/skills/hipfire-kernel-atlas/render-fit.sh.
   - If a row has artifacts.profile_kernels, the view joins profiled kernel names to ISA object kernel names/symbols and summarizes only matched objects.
   - If a row has dispatch provenance, the view prints hot-kernel op/source/dispatch attribution.
   - Report the visual plus a short readout of likely limit and left on table.
5. Ask Atlas for candidate experiments
   - Use python3 scripts/kernel_atlas.py suggest --row ... --isa ... --dispatch ....
   - Prefer --format markdown for humans and JSON for automation.
   - Let suggest auto-load default history from .codeinsight+research/kernel-atlas/tasks/; use --history only for extra history paths.
   - Treat suggestions as an experiment queue, not as predicted wins.
   - Each suggestion should name the lever type, hot kernel, files, risk, rationale, and eval contract.
6. Create an optimization task
   - Use python3 scripts/kernel_atlas.py task to turn a row into task.json and TASK.md.
   - Include --allowed-file for every path an agent may edit.
   - Include correctness commands for DFlash or risky runtime changes.
   - Generated tasks strip known profiling/instrumentation env from eval and preserve the original row env as baseline.row_env.
7. Evaluate a candidate
   - Use python3 scripts/kernel_atlas.py eval --task ... --runs 5 --warmup-runs 1 --output-dir ....
   - Use --refresh-baseline first to write baseline.json; use --baseline <baseline.json> for candidate comparisons.
   - Report result.json status, selected metric median, speedup, stability, and any failed command output tail.
   - Treat the local ledger.jsonl as experiment lineage, not a public benchmark.
   - If status is needs_baseline, do not claim a speedup; refresh or provide a clean baseline first.
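
Before rendering, it can help to confirm that a located row actually carries profiled kernels. The following is a minimal sketch, not part of kernel_atlas.py: it assumes each JSONL line is one JSON object and that artifacts.profile_kernels is a nested key (artifacts, then profile_kernels). The helper names parse_rows, load_row, and has_profiled_kernels are hypothetical.

```python
import json
from pathlib import Path


def parse_rows(jsonl_text):
    """Parse JSONL text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


def load_row(jsonl_path, row_index=0):
    """Read one row by position, mirroring the --row / --row-index pair."""
    return parse_rows(Path(jsonl_path).read_text(encoding="utf-8"))[row_index]


def has_profiled_kernels(row):
    """True when the row has a non-empty artifacts.profile_kernels entry.

    Assumption: the dotted name maps to row["artifacts"]["profile_kernels"].
    """
    artifacts = row.get("artifacts") or {}
    return bool(artifacts.get("profile_kernels"))
```

A row that fails this check was probably collected without --profile-prefill / --profile-decode, so the fit view would not be scoped to runtime-hot kernels.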

Commands

Render an existing row:

.agents/skills/hipfire-kernel-atlas/render-fit.sh \
  --row .codeinsight+research/kernel-atlas/runs/atlas.jsonl \
  --row-index 0 \
  --isa .codeinsight+research/kernel-atlas/runs/isa.json
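
The workflow says the view joins profiled kernel names to ISA object kernel names and summarizes only matched objects. The sketch below illustrates that kind of join under loud assumptions: the manifest layout (a list of objects with "path" and "kernels" keys) is hypothetical, and exact string matching stands in for whatever matching the real renderer performs.

```python
def matched_objects(profiled_names, isa_objects):
    """Map each ISA object path to the profiled kernels it contains.

    Assumptions: isa_objects is a list of dicts with hypothetical keys
    "path" and "kernels"; names are compared by exact equality.
    """
    wanted = set(profiled_names)
    matches = {}
    for obj in isa_objects:
        hits = sorted(wanted.intersection(obj.get("kernels", [])))
        if hits:  # objects with no profiled kernel are left out of the summary
            matches[obj["path"]] = hits
    return matches


manifest = [
    {"path": "gemv_hfq4g256.hsaco", "kernels": ["gemv_hfq4g256"]},
    {"path": "unused.hsaco", "kernels": ["some_other_kernel"]},
]
print(matched_objects(["gemv_hfq4g256"], manifest))
```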

Collect a small AR smoke with ISA:

python3 scripts/kernel_atlas.py collect-ar \
  --model ~/.hipfire/models/qwen3.5-0.8b.mq4 \
  --workload qwen3.5-0.8b \
  --model-size 0.8b \
  --quant mq4 \
  --prefill 32 \
  --gen 5 \
  --kv-mode asym3 \
  --profile-prefill \
  --profile-decode \
  --isa-dir .hipfire_kernels/gfx1030 \
  --isa-filter 'gemm_hfq4g256|gemv_hfq4g256' \
  --isa-output .codeinsight+research/kernel-atlas/runs/isa-gfx1030.json \
  --dispatch-provenance \
  --dispatch-output .codeinsight+research/kernel-atlas/runs/dispatch-gfx1030.json \
  --output .codeinsight+research/kernel-atlas/runs/atlas-gfx1030.jsonl
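
To compare kernel fit across architectures, the same collection can be repeated per arch. The sketch below only builds and prints the command lines; it does not run them. It reuses the flags shown above and the per-arch file naming pattern from these examples. Whether the same --isa-filter and a populated .hipfire_kernels/<arch> directory exist for every arch is an assumption, and collect_ar_argv is a hypothetical helper.

```python
import os
import shlex

RUNS_DIR = ".codeinsight+research/kernel-atlas/runs"


def collect_ar_argv(arch, isa_filter, model="~/.hipfire/models/qwen3.5-0.8b.mq4"):
    """Build a collect-ar argument list for one arch (nothing is executed)."""
    return [
        "python3", "scripts/kernel_atlas.py", "collect-ar",
        "--model", os.path.expanduser(model),  # no shell here, so expand ~ ourselves
        "--workload", "qwen3.5-0.8b",
        "--model-size", "0.8b",
        "--quant", "mq4",
        "--prefill", "32",
        "--gen", "5",
        "--kv-mode", "asym3",
        "--profile-prefill",
        "--profile-decode",
        "--isa-dir", f".hipfire_kernels/{arch}",
        "--isa-filter", isa_filter,
        "--isa-output", f"{RUNS_DIR}/isa-{arch}.json",
        "--dispatch-provenance",
        "--dispatch-output", f"{RUNS_DIR}/dispatch-{arch}.json",
        "--output", f"{RUNS_DIR}/atlas-{arch}.jsonl",
    ]


for arch in ("gfx1030", "gfx1201"):
    print(shlex.join(collect_ar_argv(arch, "gemm_hfq4g256|gemv_hfq4g256")))
```

Passing the list to subprocess.run instead of printing it avoids shell quoting issues with the | in the filter pattern.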

Suggest candidate experiments from a profiled row:

python3 scripts/kernel_atlas.py suggest \
  --row .codeinsight+research/kernel-atlas/runs/atlas-gfx1201.jsonl \
  --row-index 1 \
  --isa .codeinsight+research/kernel-atlas/runs/isa-gfx1201.json \
  --dispatch .codeinsight+research/kernel-atlas/runs/dispatch-gfx1201.json \
  --format markdown
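
When consuming suggestions as JSON for automation, a small completeness check can flag entries that omit one of the six items the workflow expects. This is a sketch only: the key names below (lever_type, hot_kernel, files, risk, rationale, eval_contract) are hypothetical, since the actual JSON schema emitted by suggest is not shown here.

```python
# Hypothetical key names; adjust to the real `suggest` JSON output.
REQUIRED_FIELDS = (
    "lever_type",
    "hot_kernel",
    "files",
    "risk",
    "rationale",
    "eval_contract",
)


def missing_fields(suggestion):
    """Return the required keys that are absent or empty in one suggestion."""
    return [name for name in REQUIRED_FIELDS if not suggestion.get(name)]


example = {
    "lever_type": "multi-row",
    "hot_kernel": "gemv_hfq4g256",
    "files": ["kernels/src/gemv_hfq4g256_multirow.hip"],
    "risk": "medium",
    "rationale": "",
}
print(missing_fields(example))
```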

Create a bounded task from a profiled row:

python3 scripts/kernel_atlas.py task \
  --row .codeinsight+research/kernel-atlas/runs/atlas-gfx1201.jsonl \
  --row-index 1 \
  --isa .codeinsight+research/kernel-atlas/runs/isa-gfx1201.json \
  --dispatch .codeinsight+research/kernel-atlas/runs/dispatch-gfx1201.json \
  --allowed-file kernels/src/gemv_hfq4g256_multirow.hip \
  --output-dir .codeinsight+research/kernel-atlas/tasks/gfx1201-gemv-r4

Create a PyTorch-shape task for non-Qwen work:

python3 scripts/kernel_atlas.py task-pytorch \
  --name llama-rmsnorm-shape \
  --op rmsnorm \
  --input-shape 1,2048,4096 \
  --dtype float16 \
  --eval-command 'python3 bench_rmsnorm.py' \
  --allowed-file kernels/src/rmsnorm_candidate.hip \
  --output-dir .codeinsight+research/kernel-atlas/tasks/llama-rmsnorm-shape

Refresh a stable baseline and then evaluate a candidate:

python3 scripts/kernel_atlas.py eval \
  --task .codeinsight+research/kernel-atlas/tasks/gfx1201-gemv-r4/task.json \
  --runs 5 \
  --warmup-runs 1 \
  --refresh-baseline \
  --output-dir .codeinsight+research/kernel-atlas/tasks/gfx1201-gemv-r4/eval-baseline

python3 scripts/kernel_atlas.py eval \
  --task .codeinsight+research/kernel-atlas/tasks/gfx1201-gemv-r4/task.json \
  --baseline .codeinsight+research/kernel-atlas/tasks/gfx1201-gemv-r4/eval-baseline/baseline.json \
  --runs 5 \
  --warmup-runs 1 \
  --output-dir .codeinsight+research/kernel-atlas/tasks/gfx1201-gemv-r4/eval-001
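
The readout from an eval pair boils down to a median-over-runs comparison. The sketch below shows one common way to compute it; how kernel_atlas.py itself defines speedup and stability is not stated here, so treat both functions and the sample numbers as illustrative assumptions.

```python
from statistics import median


def speedup(baseline, candidate, higher_is_better=True):
    """Ratio of medians, oriented so that values above 1.0 mean the candidate is better."""
    base, cand = median(baseline), median(candidate)
    return cand / base if higher_is_better else base / cand


def relative_spread(samples):
    """(max - min) / median: a crude, hypothetical stand-in for a stability check."""
    return (max(samples) - min(samples)) / median(samples)


# Made-up tok/s samples for five runs each; not measured data.
baseline_tok_s = [100.0, 102.0, 98.0, 101.0, 99.0]
candidate_tok_s = [110.0, 108.0, 112.0, 109.0, 111.0]
print(f"speedup: {speedup(baseline_tok_s, candidate_tok_s):.3f}x")
print(f"baseline spread: {relative_spread(baseline_tok_s):.1%}")
```

For latency-style metrics, pass higher_is_better=False so that a faster candidate still yields a ratio above 1.0.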

Interpretation Rules

- Treat the view as ISA fit, not full hardware occupancy. True occupancy also needs counters, wave residency, clocks, cache behavior, and launch overlap.
- If matrix units are available but observed matrix ops are zero, ask whether the workload phase should route through WMMA/MFMA or whether it is a decode GEMV path where memory/launch dominates.
- If VGPR/SGPR/spills are high, prioritize register pressure and spill removal before claiming a bandwidth win.
- If the row is DFlash, do not treat tok/s alone as correctness evidence. Run the DFlash coherence gate before claiming a spec-decode improvement.
- If eval reports unstable, do not claim a win or regression; tighten the run shape or rerun after DPM/thermal state settles.
- For PyTorch-shape tasks, treat the eval command as the source of truth until Atlas has a real PyTorch profiler/extractor producer.
- If the worktree is dirty, cite the row's provenance.diff_md5 and avoid comparing it as a shipped baseline.
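
Several of these rules are gates on what may be claimed, which an agent could encode as a pre-report check. The sketch below is one possible encoding: needs_baseline comes from the workflow above, while treating unstable as a status value and the keys "mode" and "coherence_gate_passed" are assumptions made for illustration.

```python
def claim_blockers(result, row):
    """List reasons to withhold a speedup claim; an empty list means none found."""
    blockers = []
    status = result.get("status")
    if status == "needs_baseline":
        blockers.append("no clean baseline: refresh or provide one first")
    elif status == "unstable":  # assumption: instability is reported via status
        blockers.append("unstable eval: tighten the run shape or rerun")
    # Hypothetical fields: "mode" on the row, "coherence_gate_passed" on the result.
    if row.get("mode") == "dflash" and not result.get("coherence_gate_passed"):
        blockers.append("DFlash row without a passing coherence gate")
    return blockers


print(claim_blockers({"status": "needs_baseline"}, {"mode": "ar"}))
```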

Good Agent Output

Include:

- the rendered ASCII fit view, or the most relevant section of it
- the row path and ISA manifest path
- arch, quant, phase, and shape bucket
- runtime metric used for the readout
- one concise interpretation of likely limit and left on table

Avoid:

- calling the heuristic a roofline model
- claiming a perf win from smoke runs
- mixing rows from different prompts or dirty binaries without saying so
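
The include list can be treated as a checklist that refuses to render an incomplete report. The sketch below is a hypothetical container, not an Atlas data structure; the class name FitReadout, its field names, and the sample values are made up for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class FitReadout:
    """Hypothetical container for the items a good readout should include."""
    fit_view: str
    row_path: str
    isa_manifest_path: str
    arch: str
    quant: str
    phase: str
    shape_bucket: str
    metric: str
    interpretation: str

    def render(self):
        """Return the readout text, or raise if any item is blank."""
        empty = [f.name for f in fields(self) if not getattr(self, f.name).strip()]
        if empty:
            raise ValueError("missing readout fields: " + ", ".join(empty))
        return "\n".join([
            self.fit_view,
            f"row: {self.row_path}",
            f"isa: {self.isa_manifest_path}",
            f"arch={self.arch} quant={self.quant} "
            f"phase={self.phase} shape={self.shape_bucket}",
            f"metric: {self.metric}",
            f"readout: {self.interpretation}",
        ])
```

Raising on a blank field keeps the agent from silently omitting, for example, the metric used for the readout.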

related-skills.json

same repository

serve-restart.md

from "Kaden-Schutt/hipfire"

Cleanly stop, free the port, and restart `hipfire serve`. Use when serve "Failed to start (port in use)", a stale daemon holds VRAM, an os-error-2/JSON-parse pre-warm crash left a zombie singleton, or you just want a guaranteed-fresh daemon. Kills both bun CLI serve and the spawned target/release/examples/daemon, reaps stale ~/.hipfire/daemon.pid + serve.pid + GPU lock, fuser-frees the port, then relaunches.

2026-05-25 · 411 stars

astrea.md

from "Kaden-Schutt/hipfire"

Use for hipfire quant calibration, imatrix-driven experiments, KLD/PPL quality evaluation, k-map/format selection, MQ/HFQ/HFP/MFP tradeoff work, ParoQuant-style weight transform planning, and KV policy planning. Use when deciding whether a calibrated model candidate should be promoted, rejected, packaged, or sent through Atlas for AR/DFlash perf validation.

2026-05-18 · 411 stars

hipfire-arch-port.md

from "Kaden-Schutt/hipfire"

Port hipfire compute kernels to a new RDNA / CDNA architecture (gfx1201/gfx1200/gfx94x/gfx1150/etc.). Use when adding support for a new GPU arch, fixing arch-specific kernel codegen failures (e.g. "Cannot select intrinsic %llvm.amdgcn.wmma..."), or refactoring dispatch.rs's arch-conditional branches. Captures the WMMA operand-shape matrix, builtin name table per arch, dispatch routing convention, validation procedure (channel-test / coherence-gate / speed-gate), contributor onboarding workflow, and known correctness traps. Triggers on phrases like "port to gfx12", "9070 XT support", "R9700 support", "WMMA gfx12", "Cannot select intrinsic wmma", "amdgcn.wmma", "new arch port", "cross-arch kernel".

2026-05-18 · 411 stars

hipfire-autoheal.md

from "Kaden-Schutt/hipfire"

Triage and repair hipfire runtime failures such as daemon hangs, stale serve.pid, port 11435 conflicts, ROCm include-path problems, missing precompiled kernels, VRAM OOM, kernel JIT failures, and multi-turn recall regressions. Use after diagnostics identify a likely runtime issue or when the user asks to fix a broken hipfire serve/run flow.

2026-05-18 · 411 stars

hipfire-diag.md

from "Kaden-Schutt/hipfire"

Run and interpret hipfire GPU diagnostics for ROCm/HIP bring-up, missing kernels, test_kernels failures, inference smoke failures, and install/runtime environment problems. Use when a user asks to diagnose hipfire, check GPU readiness, run baseline tests, or explain diagnostic output.

2026-05-18 · 411 stars

hipfire-kernel-tuning.md

from "Kaden-Schutt/hipfire"

Optimize hipfire HIP/compute kernels — pick a tuning lever (multi-row, K-tile depth, prefetch, wave-size port, WMMA/MFMA, fused projections, ISA flags) and validate the win across the supported RDNA arch matrix. Use when you've identified a hot kernel, want to land a real perf win, and need to NOT regress on archs you don't have hardware for. Codifies the methodology from this repo's actual perf history — wave64 CDNA3 port (commit 4105035, 2× decode), nontemporal-load revert (34eb024, -13% caught only by clean-baseline bisect), gfx12 WMMA port (PR

2026-05-18 · 411 stars

package.json

"author": "Kaden-Schutt"

"repository": "Kaden-Schutt/hipfire"
