
Name: Atom Patterns
Author: ROCm

// Coding patterns and architecture index for the ATOM LLM inference engine


updated: May 23, 2026 16:05

SKILL.md


name	atom-patterns
description	Coding patterns and architecture index for the ATOM LLM inference engine
version	1.2.0
scope	ATOM repo on AMD ROCm
last_updated	"2026-05-20T00:00:00.000Z"

ATOM Patterns

Code Architecture

atom/
├── config.py                  # Config, QuantizationConfig, HF config loading
├── entrypoints/               # Server entry (openai_server.py)
├── examples/                  # simple_inference.py (offline smoke test)
├── model_engine/              # Core engine pipeline
│   ├── llm_engine.py          # Top-level engine
│   ├── engine_core.py         # Per-DP-rank loop
│   ├── scheduler.py           # Batch scheduling
│   └── model_runner.py        # Forward pass, CUDAGraph, KV cache binding
├── model_loader/
│   └── loader.py              # Weight loading (safetensors, FP8/FP4, WeightsMapper)
├── model_ops/                 # AITER kernel wrappers
│   ├── linear.py              # LinearBase, ColumnParallel, RowParallel
│   ├── moe.py                 # FusedMoE, Mxfp4MoEMethod, weight_loader
│   ├── fused_moe_triton.py    # Triton matmul_ogs MoE path
│   ├── attention_mla.py       # MLA attention (DeepSeek)
│   ├── attention_mha.py       # Standard MHA attention
│   └── paged_attention.py     # Paged attention backend
├── models/                    # Model implementations (one file per family;
│                              # `*_mtp.py` for MTP / speculative variants —
│                              # run `ls atom/models/` for the current set)
├── spec_decode/
│   └── eagle.py               # MTP proposer (speculative decoding)
├── plugin/                    # vLLM/SGLang plugin adapters
└── utils/
    ├── envs.py                # All ATOM_* env var definitions
    └── forward_context.py     # Module-level forward context

Model Implementation Pattern

Adding a New Model

Every model class follows this contract:

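import torch.nn as nn

from atom.config import Config


class NewModelForCausalLM(nn.Module):
    # Weight loading config (class-level), read by the weight loader
    packed_modules_mapping = { ... }
    weights_mapping = { ... }

    def __init__(self, config: Config, prefix: str = ""):
        super().__init__()  # required before registering submodules
        ...  # build embeddings, decoder layers and self.lm_head here

    def forward(self, input_ids, positions, intermediate_tensors=None, inputs_embeds=None):
        hidden_states = ...  # embed input_ids (or use inputs_embeds), run decoder layers
        return hidden_states  # or logits

    def compute_logits(self, hidden_states):
        return self.lm_head(hidden_states)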

Registration in model_runner.py:

support_model_arch_dict = {
    # HuggingFace architecture name -> (module under atom/models/, class name)
    "NewModelForCausalLM": ("new_model", "NewModelForCausalLM"),
}

Model Reuse Relationships

One model file often serves multiple HuggingFace model_types when their architecture is identical or a strict subset. The mapping lives in atom/model_engine/model_runner.py:support_model_arch_dict — that's the authoritative source. Common patterns:

A shared base file covers a family of closely-related releases (e.g. DeepSeek V3 / V3.2 / GLM-5 all dispatch to one deepseek_v2.py).
A standalone file when the architecture diverges enough that sharing causes branching everywhere (e.g. DeepSeek V4's hot-cold sparse attention + FP4 MoE).
An _mtp.py companion when the spec-decode head differs from the base model.
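A minimal sketch of how a (module_name, class_name) entry can be turned into a class, assuming the runner imports atom.models.<module_name> and looks the class up by name. The helper resolve_model_class is hypothetical; the real lookup lives in model_runner.py and may differ. It also shows why several architecture keys can share one file: they simply name the same module.

```python
import importlib
from typing import Dict, Tuple


def resolve_model_class(
    arch: str,
    arch_dict: Dict[str, Tuple[str, str]],
    package: str = "atom.models",  # assumed package holding one file per family
) -> type:
    """Map an HF architecture name to a class via its (module_name, class_name) entry."""
    try:
        module_name, class_name = arch_dict[arch]
    except KeyError:
        raise ValueError(f"unsupported architecture: {arch!r}") from None
    # Several architectures may name the same module; import_module caches it.
    module = importlib.import_module(f"{package}.{module_name}")
    return getattr(module, class_name)
```

For a quick check without ATOM installed, passing package="collections" with the entries ("abc", "Mapping") and ("abc", "Sequence") resolves two different keys against the same stdlib module.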

TP Parallel Linear Pattern

ColumnParallelLinear: shards output dim, no all-reduce needed
RowParallelLinear: shards input dim, all-reduce on output (reduce_results=True)
ReplicatedLinear: full copy on each rank (gates, small projections)
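The pure-Python simulation below illustrates why a ColumnParallelLinear feeding a RowParallelLinear needs only one all-reduce. It is a sketch, not ATOM's LinearBase API: the helper names are hypothetical, TP ranks run sequentially in one process, and the all-reduce is a plain elementwise sum. An elementwise activation between the two layers would also run locally on each rank's slice; it is omitted here.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain (rows x k) @ (k x cols) matrix product."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def split_cols(w: Matrix, n: int) -> List[Matrix]:
    """Shard a weight along its output (column) dim, as ColumnParallelLinear does."""
    step = len(w[0]) // n
    return [[row[r * step:(r + 1) * step] for row in w] for r in range(n)]


def split_rows(w: Matrix, n: int) -> List[Matrix]:
    """Shard a weight along its input (row) dim, as RowParallelLinear does."""
    step = len(w) // n
    return [w[r * step:(r + 1) * step] for r in range(n)]


def all_reduce(parts: List[Matrix]) -> Matrix:
    """Simulated all-reduce: elementwise sum of every rank's partial result."""
    return [[sum(p[i][j] for p in parts) for j in range(len(parts[0][0]))]
            for i in range(len(parts[0]))]


def tp_mlp(x: Matrix, w_up: Matrix, w_down: Matrix, tp: int) -> Matrix:
    """Column-parallel up-projection feeding row-parallel down-projection."""
    up_shards = split_cols(w_up, tp)      # each rank: full input, slice of outputs
    down_shards = split_rows(w_down, tp)  # each rank: the matching slice of inputs
    partials = []
    for rank in range(tp):
        h = matmul(x, up_shards[rank])                 # local, no communication
        partials.append(matmul(h, down_shards[rank]))  # partial sum of the output
    return all_reduce(partials)  # the single reduce (reduce_results=True)
```

The column shard's output slice is exactly the input slice the row shard expects, so no gather is needed between the two layers.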

MoE pattern: FusedMoE and shared_experts both use reduce_results=False; the parent module adds their outputs and performs a single all-reduce.
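A small sketch of this MoE reduction, assuming each TP rank holds partial (unreduced) outputs from both the routed experts and shared_experts. Because all-reduce is a sum, adding the two partials locally and reducing once matches reducing each branch separately, while issuing one collective instead of two. The function names are hypothetical.

```python
from typing import List

Vec = List[float]


def all_reduce(per_rank: List[Vec]) -> Vec:
    """Simulated all-reduce: elementwise sum of every rank's partial vector."""
    return [sum(vals) for vals in zip(*per_rank)]


def moe_block(routed_partials: List[Vec], shared_partials: List[Vec]) -> Vec:
    """Both branches ran with reduce_results=False, so each rank holds partial sums."""
    combined = [
        [r + s for r, s in zip(routed, shared)]  # local add on each rank
        for routed, shared in zip(routed_partials, shared_partials)
    ]
    return all_reduce(combined)  # the parent's single collective
```

With floating-point tensors the two orderings agree up to summation-order rounding rather than bitwise.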

Workflows

Adding a Model (file co-change pattern)

atom/models/new_model.py — Model implementation
atom/model_engine/model_runner.py — Register in support_model_arch_dict
atom/config.py — Add to _CONFIG_REGISTRY if config schema differs
.github/benchmark/models_accuracy.json — CI accuracy test entry
recipes/ — Usage recipe
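A hypothetical pre-merge helper that checks a change set against the co-change list above. It is an illustration, not part of ATOM's CI: it treats any edit under atom/models/ as a model addition, so it is meant for model-adding PRs, and it skips atom/config.py because that file only changes when the config schema differs.

```python
from typing import Iterable, List

# Companion paths from the co-change list. atom/config.py is omitted because it
# is only required when the config schema differs.
COMPANIONS = [
    "atom/model_engine/model_runner.py",
    ".github/benchmark/models_accuracy.json",
    "recipes/",  # trailing slash: any file under this directory counts
]


def missing_companions(changed_paths: Iterable[str]) -> List[str]:
    """Return companion paths absent from a change set that touches atom/models/."""
    paths = list(changed_paths)
    touches_model = any(
        p.startswith("atom/models/") and p.endswith(".py") for p in paths
    )
    if not touches_model:
        return []

    def present(companion: str) -> bool:
        if companion.endswith("/"):
            return any(p.startswith(companion) for p in paths)
        return companion in paths

    return [c for c in COMPANIONS if not present(c)]
```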

Bug Fix Workflow

Identify bug via activation dump / per-layer comparison
Fix in model file
grep for the same pattern across the codebase (fix-then-sweep)
Verify with simple_inference.py smoke test
Run lm_eval for accuracy regression
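A sketch of the per-layer comparison step, assuming both dumps have already been flattened into ordered {layer_name: values} dicts; the dump format and the default tolerances are assumptions, and real dumps would be tensors. It returns the first layer where the target departs from the reference, which is where bisection should start.

```python
import math
from typing import Dict, List, Optional


def first_divergent_layer(
    reference: Dict[str, List[float]],
    target: Dict[str, List[float]],
    rel_tol: float = 1e-3,
    abs_tol: float = 1e-5,
) -> Optional[str]:
    """Walk layers in dump order and return the first one that disagrees."""
    for name, ref_vals in reference.items():  # dicts keep insertion (layer) order
        tgt_vals = target.get(name)
        if tgt_vals is None or len(tgt_vals) != len(ref_vals):
            return name  # a missing or reshaped activation counts as divergence
        for r, t in zip(ref_vals, tgt_vals):
            # math.isclose is False for NaN, so NaNs are reported too
            if not math.isclose(r, t, rel_tol=rel_tol, abs_tol=abs_tol):
                return name
    return None  # every compared layer matches within tolerance
```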

FP8/FP4 Weight Loading

Checkpoint weights: weight (FP8/FP4 packed) + weight.scale (E8M0 block scale)
ATOM renames .scale → .weight_scale_inv → .weight_scale (auto-rename in loader)
process_weights_after_loading() hook: shuffle weights for CK kernel layout
FP4 expert weights: Mxfp4MoEMethod.create_weights() + mxf4_merged_weight_loader()
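To make "FP4 packed + E8M0 block scale" concrete, here is a pure-Python dequantization sketch based on the OCP Microscaling (MX) format: E2M1 values packed two per byte, with one E8M0 power-of-two scale per 32-element block. It does not reproduce ATOM's loader or the CK shuffle done in process_weights_after_loading(), covers FP4 only, and assumes the low nibble holds the earlier element.

```python
from typing import List

# FP4 E2M1 magnitudes indexed by the low 3 bits (OCP MX format); bit 3 is the sign.
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
MX_BLOCK_SIZE = 32  # elements that share one E8M0 scale


def e8m0_to_float(code: int) -> float:
    """E8M0 is exponent-only: value = 2 ** (code - 127); 0xFF encodes NaN."""
    if code == 0xFF:
        return float("nan")
    return 2.0 ** (code - 127)


def fp4_to_float(nibble: int) -> float:
    magnitude = E2M1_VALUES[nibble & 0x7]
    return -magnitude if nibble & 0x8 else magnitude


def dequant_mxfp4(packed: bytes, scales: bytes) -> List[float]:
    """Unpack two FP4 values per byte and apply each block's E8M0 scale."""
    out: List[float] = []
    for byte in packed:
        # ASSUMPTION: the low nibble holds the earlier element.
        for nibble in (byte & 0x0F, byte >> 4):
            scale = e8m0_to_float(scales[len(out) // MX_BLOCK_SIZE])
            out.append(fp4_to_float(nibble) * scale)
    return out
```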

Debug Instrumentation Rules

NEVER modify @support_torch_compile decorated models (breaks Dynamo)
Put debug code in forward() (has @torch.inference_mode()), NOT in run_model()
Gate debug prints with env vars (e.g., ATOM_V4_DIAG=1)
Use --level 0 --enforce-eager to disable both torch.compile and CUDAGraph

Testing Patterns

Test location: tests/ directory at repo root
Framework: pytest
No GPU needed: tests mock AITER and torch.cuda
Naming: test_<module>.py (e.g., test_scheduler.py, test_block_manager.py)
Smoke test: python -m atom.examples.simple_inference --model <path> --kv_cache_dtype fp8
Accuracy: lm_eval with gsm8k (the CI threshold is not the actual baseline; models_accuracy.json records them separately)
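A minimal pytest-style sketch of the "no GPU needed" pattern: install a unittest.mock.MagicMock as the aiter module in sys.modules so that imports succeed on a CPU-only machine. How ATOM's own tests stub AITER and torch.cuda is not shown in this doc; the kernel name below is hypothetical, and patching torch.cuda (for example torch.cuda.is_available) would follow the same approach but needs torch installed.

```python
import sys
from contextlib import contextmanager
from unittest import mock


@contextmanager
def stub_module(name: str):
    """Temporarily install a MagicMock as `name` in sys.modules."""
    stub = mock.MagicMock(name=name)
    with mock.patch.dict(sys.modules, {name: stub}):
        yield stub  # sys.modules is restored when the block exits


def test_kernel_call_reaches_stub():
    with stub_module("aiter") as fake_aiter:
        import aiter  # resolves to the stub, so no GPU or AITER install is needed

        aiter.some_kernel(1, 2)  # hypothetical kernel name
        fake_aiter.some_kernel.assert_called_once_with(1, 2)
```

In a real test, the ATOM module under test would be imported inside the with block so that its own import of aiter picks up the stub.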

Environment Variables

Authoritative list: atom/utils/envs.py (all ATOM_* defined as lazy lambdas). To learn what an env var does, grep the file for its name; the lambda and the comments at its call sites describe the behavior. Required per-model env vars are listed in .github/benchmark/models.json and .github/benchmark/models_accuracy.json.
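A sketch of the lazy-lambda pattern, assuming a dict of name-to-lambda getters as the description above suggests; ATOM's actual variables, defaults and accessor live in atom/utils/envs.py. ATOM_V4_DIAG is the debug gate mentioned under Debug Instrumentation Rules, but its default and parsing here are assumptions, and get_env and maybe_diag are hypothetical helpers.

```python
import os
from typing import Any, Callable, Dict

# Hypothetical registry in the lazy-lambda style; the default "0" is assumed.
environment_variables: Dict[str, Callable[[], Any]] = {
    "ATOM_V4_DIAG": lambda: os.getenv("ATOM_V4_DIAG", "0") == "1",
}


def get_env(name: str) -> Any:
    """Evaluate the lambda on every access so later os.environ changes are seen."""
    try:
        getter = environment_variables[name]
    except KeyError:
        raise AttributeError(f"unknown ATOM env var: {name!r}") from None
    return getter()


def maybe_diag(msg: str) -> None:
    """Debug print gated by an env var, per the debug instrumentation rules."""
    if get_env("ATOM_V4_DIAG"):
        print(f"[ATOM_V4_DIAG] {msg}")
```

Evaluating lazily means a test or launcher can set the variable after import and still have it take effect.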

CI/CD

Accuracy tests: .github/benchmark/models_accuracy.json (model matrix, thresholds, baselines).
Benchmark: .github/benchmark/models.json (server args, bench args, runner pinning).
Dashboard: .github/dashboard/index.html (gh-pages).
Workflows + action versions: .github/workflows/*.yaml is the source of truth (action versions pinned per workflow).
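A sketch of the threshold-versus-baseline distinction: the CI gate compares a score against the threshold, while the baseline is only used to report drift. The JSON field names below are hypothetical and are not the real models_accuracy.json schema.

```python
import json
from typing import Any, Dict

# Hypothetical entry; field names are assumptions, not the real schema.
EXAMPLE_ENTRY = '{"model": "example/model", "threshold": 0.90, "baseline": 0.94}'


def check_accuracy(entry: Dict[str, Any], score: float) -> Dict[str, Any]:
    """Gate on the CI threshold; report drift against the measured baseline."""
    return {
        "passed": score >= entry["threshold"],
        "delta_vs_baseline": round(score - entry["baseline"], 4),
    }
```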

related-skills.json

Same repository

capture-trace.md

from "ROCm/ATOM"

Capture a PyTorch profiler / kineto trace from a running ATOM server for a short benchmark window. Use when the user asks for "a trace", "profiler trace", "GPU trace", or "抓 trace" for performance investigation — what kernels ran, what's on the critical path, what's slow. Do NOT use for crashes (use debug-agent-locate-kernel) or numerical bugs (use dump-bisect-debug).

2026-05-23

debug-agent-locate-kernel.md

from "ROCm/ATOM"

Identify which GPU kernel is faulting/hanging in ATOM via rocm-debug-agent (for faults/asserts) or rocgdb (for silent livelocks). debug-agent dumps wave registers + faulting PC + (with --save-code-objects) disassembled code object on memory faults / ASSERT_TRAP. rocgdb attaches to a live process and lists in-flight `info dispatches` + HSA `info queues` — works when the kernel isn't faulting but just stuck (e.g. atomic-counter deadlock). Use when: server crashes with "Memory access fault by GPU node-N", server hangs with GPU at 100% but no token output, kernel asserting `s_trap`, or `HIP_LAUNCH_BLOCKING=1` makes a hang vanish. Do NOT use for: numerical bugs (use dump-bisect-debug), compile errors, OOM.

2026-05-23

dump-bisect-debug.md

from "ROCm/ATOM"

Locate forward numerical bugs by dumping intermediate tensors from a target implementation and a known-good reference, then bisecting layer by layer. Also covers batch-invariance bisect (the same token at any batch position should produce a bitwise-identical output, per DeepSeek V4 paper §3.3). Use when "the output is wrong but I don't know where" — model produces gibberish, degenerates, or picks the wrong token, but code review reveals nothing.

2026-05-23

run-atom-workload.md

from "ROCm/ATOM"

Run any ATOM workload — accuracy eval (GSM8K via lm_eval), performance benchmark, concurrency sweep, offline simple_inference, or fault repro under rocm-debug-agent. Use when the user asks to "test accuracy", "测精度", "跑 GSM8K", "跑 benchmark", "test performance", "run sweep", "repro the fault", "测一下 MTP1 精度", "跑 simple_inference" — anything that drives an ATOM workload. Encodes the canonical flow (stop → start → workload-in-shell-bg → wait_infer_drain → stop) and the model-family env vars. Same pattern works for both server-based workloads (lm_eval / benchmark client) and offline simple_inference. Do NOT use for profiling traces (use capture-trace).

2026-05-23

package.json

"author": "ROCm"

"repository": "ROCm/ATOM"


