hf-architecture-tikz

Draw Sebastian-Raschka-gallery-style TikZ architecture diagrams for any HuggingFace decoder-only LLM, with per-block parameter formulas and concrete numbers. Supports MHA, GQA, MLA, DeepSeek-V4-Flash (Hyper-Connections + Sparse Attention with learned indexer), dense and MoE FFNs (incl. hash routing), and MTP heads. Use when the user asks to visualize / diagram / illustrate a transformer or LLM architecture (DeepSeek, Qwen, Llama, Mistral, gpt-oss, etc.), wants a Raschka-style figure, or wants a TikZ/LaTeX rendering of an HF model.

Install command

npx skills add https://github.com/yzlnew/infra-skills --skill hf-architecture-tikz

Copy and paste this command into Claude Code to install the skill

Source: yzlnew/infra-skills
Stars: 128
Forks: 9
Updated: May 22, 2026 at 02:27

HF Architecture → TikZ

Generate a publication-quality vertical architecture diagram (in the style of Sebastian Raschka's LLM Architecture Gallery) for any HuggingFace decoder-only LLM. The diagram annotates every sub-block with its parameter-count formula and the concrete number for the loaded config.

When to use

- "Draw the architecture of <HF repo>."
- "Visualize how <model> is structured" / "make a diagram of <model> like Raschka's gallery."
- "I want a TikZ figure of <model> for a paper / blog post."
- The user mentions DeepSeek-V4-Flash, mHC / Hyper-Connections, MLA, MoE, sparse attention, or MTP, and asks for a figure.

If the user just wants memory / parallelism numbers, prefer megatron-memory-estimator instead.

Quick start

cd hf-architecture-tikz/

# 1. Pull config from HF + emit normalized arch.json
uv run python scripts/extract_arch.py deepseek-ai/DeepSeek-V4-Flash \
    --output examples/deepseek-v4-flash/arch.json

# 2. Render TikZ from arch.json
uv run python scripts/render_tikz.py \
    examples/deepseek-v4-flash/arch.json \
    --output examples/deepseek-v4-flash/deepseek-v4-flash.tex

# 3. Compile to PNG
bash scripts/compile.sh examples/deepseek-v4-flash/deepseek-v4-flash.tex

For a model with custom code (e.g. brand-new architectures), pass --trust-remote-code. For a local config:

uv run python scripts/extract_arch.py /path/to/config.json --output arch.json

Workflow

1. Acquire config. extract_arch.py tries transformers.AutoConfig first; if the installed transformers doesn't recognize the model_type (e.g. deepseek_v4 introduces hc_mult, compress_ratios), it falls back to raw JSON via huggingface_hub.hf_hub_download. Local file paths bypass the network.
2. Detect architecture family. Pure config-field rules; see references/architecture_families.md. The script labels the model with a family tag (mha, gqa, mla, dsv4) plus orthogonal flags (MoE, hash routing, shared experts, MTP, tied LM head, first_k_dense_replace).
3. Compute parameter counts. Closed-form formulas keyed by family; see references/param_formulas.md. The script (not Claude) does the arithmetic and emits arch.json with one entry per architectural unit, each carrying name, family, shape_in, shape_out, formula_symbolic, formula_concrete, param_count.
4. Assemble TikZ. render_tikz.py reads arch.json plus templates/anthropic.tex.j2 (a Jinja2 template; all block macros are inlined for shared coordinate-space layout). The repeated transformer block is drawn once with a × N layers annotation; per-layer-varying behavior (V4-Flash compress_ratios, hash vs score routing) appears as a small pattern strip beneath the block.
5. Compile. bash scripts/compile.sh out.tex runs xelatex ×2 (TikZ fit/positioning needs a second pass), then pdftocairo -png -r 300 -singlefile. Falls back to pdflatex if XeTeX is unavailable.
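The config-acquisition step can be sketched roughly as follows. This is a minimal illustration, not the actual contents of extract_arch.py: the function name `load_config` is hypothetical, and the broad `except` is a simplification. It relies only on `AutoConfig.from_pretrained`, `PretrainedConfig.to_dict`, and `hf_hub_download`, imported lazily so the local-file branch runs without either library installed.

```python
import json
from pathlib import Path


def load_config(source: str, trust_remote_code: bool = False) -> dict:
    """Return a model config as a plain dict (hypothetical helper).

    `source` is either a path to a local config.json or an HF repo id.
    """
    path = Path(source)
    if path.is_file():
        # Local config.json: read it directly, no network access.
        return json.loads(path.read_text(encoding="utf-8"))

    try:
        # Lazy import so the local-file branch works without transformers.
        from transformers import AutoConfig

        cfg = AutoConfig.from_pretrained(source, trust_remote_code=trust_remote_code)
        return cfg.to_dict()
    except Exception:
        # Any failure (unknown model_type, unexpected fields, missing
        # transformers) falls through to the raw config.json from the Hub.
        from huggingface_hub import hf_hub_download

        raw_path = hf_hub_download(repo_id=source, filename="config.json")
        return json.loads(Path(raw_path).read_text(encoding="utf-8"))
```

Note that `trust_remote_code` defaults to `False` in this sketch, in line with the rule that the flag is never enabled silently.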

Architecture family detection

Detection rules live in references/architecture_families.md. Summary:

| Family | Detector | Examples |
|---|---|---|
| `dsv4` | `model_type == "deepseek_v4"` or presence of `hc_mult` + `compress_ratios` + `index_n_heads` | DeepSeek-V4-Flash |
| `mla` | `q_lora_rank` + `kv_lora_rank` + `qk_nope_head_dim` + `qk_rope_head_dim` + `v_head_dim` | DeepSeek-V2/V3 |
| `gqa` | `num_key_value_heads < num_attention_heads` | Llama-3, Qwen3, Mistral |
| `mha` | otherwise | GPT-2, OPT |

Orthogonal flags: MoE (n_routed_experts/num_local_experts), hash routing (num_hash_layers > 0), shared experts (n_shared_experts > 0), MTP head (num_nextn_predict_layers > 0), tied LM head (tie_word_embeddings), dense-prefix layers (first_k_dense_replace > 0).
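The detection rules above can be expressed as a small pure function over the config dict. The sketch below is an illustration under stated assumptions, not the code in extract_arch.py: the names `detect_family` and `detect_flags` are hypothetical, the rules are assumed to be checked in table order (dsv4, mla, gqa, then mha), and "presence" is read as "the key exists in the config".

```python
MLA_KEYS = (
    "q_lora_rank",
    "kv_lora_rank",
    "qk_nope_head_dim",
    "qk_rope_head_dim",
    "v_head_dim",
)
DSV4_KEYS = ("hc_mult", "compress_ratios", "index_n_heads")


def detect_family(cfg: dict) -> str:
    """Map a config dict to one of: dsv4, mla, gqa, mha."""
    if cfg.get("model_type") == "deepseek_v4" or all(k in cfg for k in DSV4_KEYS):
        return "dsv4"
    if all(k in cfg for k in MLA_KEYS):
        return "mla"
    heads = cfg.get("num_attention_heads")
    # Configs without num_key_value_heads are treated as plain MHA.
    kv_heads = cfg.get("num_key_value_heads", heads)
    if heads is not None and kv_heads is not None and kv_heads < heads:
        return "gqa"
    return "mha"


def detect_flags(cfg: dict) -> dict:
    """Orthogonal flags; missing or null fields count as disabled."""
    return {
        "moe": bool(cfg.get("n_routed_experts") or cfg.get("num_local_experts")),
        "hash_routing": (cfg.get("num_hash_layers") or 0) > 0,
        "shared_experts": (cfg.get("n_shared_experts") or 0) > 0,
        "mtp": (cfg.get("num_nextn_predict_layers") or 0) > 0,
        "tied_lm_head": bool(cfg.get("tie_word_embeddings", False)),
        "dense_prefix": (cfg.get("first_k_dense_replace") or 0) > 0,
    }
```

For example, a config with 32 attention heads and 8 key/value heads is tagged `gqa`, while one that omits `num_key_value_heads` is tagged `mha`.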

Parameter formulas

Full table in references/param_formulas.md. Notation: d is the hidden size, Hkv the number of KV heads, dh the head dimension, f the FFN intermediate size, E the number of routed experts, and Es the number of shared experts. Attention, per family: MHA 4·d²; GQA 2·d² + 2·d·Hkv·dh; MLA six projections; DSv4 wq_a + q_norm + wq_b + wkv + kv_norm + wo_a + wo_b + attn_sink (+ Compressor + Indexer). FFN: SwiGLU 3·d·f. Standard MoE = E routed experts (each 3·d·f) + router d·E + Es shared experts. Hash MoE replaces the router with a vocab×topk token→expert table.
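The simpler closed forms can be checked with a few lines of arithmetic. The sketch below covers only MHA, GQA, SwiGLU, and standard MoE, with d the hidden size, Hkv the number of KV heads, dh the head dimension, and f the FFN intermediate size. The function names are hypothetical. The formulas assume bias-free projections, exclude norm weights, assume H·dh = d for the query and output projections, and assume each shared expert has the same 3·d·f shape as a routed expert. The dimensions in the demo are illustrative only.

```python
def mha_attention_params(d: int) -> int:
    """Q, K, V, O projections, each d x d."""
    return 4 * d * d


def gqa_attention_params(d: int, n_kv_heads: int, head_dim: int) -> int:
    """Q and O are d x d; K and V are d x (n_kv_heads * head_dim)."""
    return 2 * d * d + 2 * d * n_kv_heads * head_dim


def swiglu_params(d: int, f: int) -> int:
    """Gate, up, and down projections."""
    return 3 * d * f


def moe_params(d: int, f: int, n_routed: int, n_shared: int = 0) -> int:
    """Routed experts + linear router + shared experts (same shape assumed)."""
    experts = n_routed * swiglu_params(d, f)
    router = d * n_routed
    shared = n_shared * swiglu_params(d, f)
    return experts + router + shared


if __name__ == "__main__":
    # Illustrative dimensions: d=4096, 8 KV heads of size 128, f=14336.
    print(gqa_attention_params(4096, 8, 128))  # 41943040
    print(swiglu_params(4096, 14336))          # 176160768
```

As a sanity check, GQA with as many KV heads as query heads reduces to the MHA count: `gqa_attention_params(4096, 32, 128)` equals `mha_attention_params(4096)`.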

Worked example: DeepSeek-V4-Flash

The example under examples/deepseek-v4-flash/ covers the most architecturally novel components in the supported set:

- Hyper-Connections (mHC): four parallel hidden-state copies, with Sinkhorn-balanced reduction (hc_sinkhorn_iters=20) before each sublayer and weighted expansion + cross-copy mixing after. Drawn as a fan-in / fan-out inside each block.
- Sparse Attention: Q-LoRA (d → q_lora_rank → H·dh), KV projection (d → dh, Hkv=1), per-layer Compressor (overlap pooling for compress_ratio=4, block pooling for compress_ratio=128), learned Indexer for compress_ratio=4 layers (top-k selection over compressed KV with index_topk=512), sliding window of 128, grouped O-LoRA (o_groups=8, o_lora_rank=1024).
- MoE with hash routing: first 3 layers use a learned tid2eid table (vocab × topk); remaining 40 layers use sqrtsoftplus scoring + top-6 routing.
- MTP head: one MTPBlock (= e_proj + h_proj + their RMSNorms + a full Block) for multi-token prediction.
- Compress-ratios pattern strip: drawn beneath the block to make the per-layer alternation [0, 0, 4, 128, 4, 128, …, 4, 0] visible.
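The per-layer variation that the pattern strip visualizes can be summarized with a run-length encoding. The sketch below is illustrative only: the helper names are hypothetical, the labels for ratios 4 and 128 come from the description above, and the `demo` list is a short made-up sequence, not the model's actual compress_ratios.

```python
from itertools import groupby

# Labels taken from the description above; other ratios are reported as-is.
COMPRESSOR_BY_RATIO = {
    4: "overlap pooling + indexer",
    128: "block pooling",
}


def routing_per_layer(num_layers: int, num_hash_layers: int) -> list:
    """Hash routing for the first num_hash_layers layers, score routing after."""
    return ["hash" if i < num_hash_layers else "score" for i in range(num_layers)]


def run_length(values: list) -> list:
    """Collapse consecutive equal values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def describe_ratio(ratio: int) -> str:
    """Human-readable label for one layer's compress ratio."""
    return COMPRESSOR_BY_RATIO.get(ratio, f"ratio {ratio}")


if __name__ == "__main__":
    # 3 hash-routed layers followed by 40 score-routed layers.
    print(run_length(routing_per_layer(43, 3)))  # [('hash', 3), ('score', 40)]

    demo = [0, 4, 128, 4, 0]  # made-up sequence for illustration
    print([describe_ratio(r) for r in demo])
```

A renderer could map each `(value, count)` pair to one colored segment of the strip, which keeps the figure compact even when a pattern repeats over dozens of layers.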

Customization

- Palette. Reuses the warm-pastel palette from tikz-flowchart/themes/anthropic.md (lavender = attention, mint = norm, teal = projection, cream = router/MoE infra, amber = experts, peach = embedding/output).
- Detail level. The default is full expansion (every sub-block separately). To collapse sub-blocks, edit the dsv4 branch of templates/anthropic.tex.j2 and replace the inner attention expansion with a single rounded card.
- Other models. The non-dsv4 branch of templates/anthropic.tex.j2 covers mha / gqa / mla (with optional MoE FFN) as a simpler vertical stack. The renderer dispatches based on the family flag emitted by extract_arch.py.

Troubleshooting

- AutoConfig raises on unknown fields. Expected for very new model types. The loader catches and falls back to raw JSON automatically. If both fail, pass a local config.json path.
- mbridge is unavailable / unsupported model. Not required: we use transformers + raw JSON. mbridge is referenced only for cross-checking V3/Qwen counts.
- trust_remote_code warnings. extract_arch.py does not enable this flag silently. Pass --trust-remote-code only if the user explicitly requests it.
- Tied embeddings double-counting. When tie_word_embeddings=True, the embedding-table contribution is folded into the LM head and not counted twice.
- Tall PNG. Full expansion + side annotations + MTP branch typically renders to 4–6k pixels tall. Use --no-mtp (renderer flag) to suppress the MTP branch if you need a shorter figure.
- xelatex not installed. The compile script falls back to pdflatex automatically. Font macros are guarded with \IfFontExistsTF.
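The engine fallback can be pictured as follows. This is a Python sketch of the idea, not a transcription of scripts/compile.sh: the function name and the `which` parameter are hypothetical (the latter exists only so the choice of engine can be exercised without TeX installed), and `-interaction=nonstopmode` is an assumed convenience flag.

```python
import shutil
from pathlib import Path
from typing import Callable, Optional


def build_compile_commands(
    tex_path: str,
    which: Callable[[str], Optional[str]] = shutil.which,
    dpi: int = 300,
) -> list:
    """Return the commands to run, in order, from the .tex file's directory."""
    tex = Path(tex_path)
    # Prefer XeTeX; fall back to pdfLaTeX when xelatex is not on PATH.
    engine = "xelatex" if which("xelatex") else "pdflatex"
    latex = [engine, "-interaction=nonstopmode", tex.name]
    pdf_name = tex.with_suffix(".pdf").name
    # -singlefile writes <stem>.png instead of numbered page files.
    to_png = ["pdftocairo", "-png", "-r", str(dpi), "-singlefile", pdf_name, tex.stem]
    # The LaTeX engine runs twice, then the PDF is rasterized.
    return [latex, latex, to_png]
```

Each returned command could then be passed to `subprocess.run(cmd, cwd=Path(tex_path).parent, check=True)` so that the first failing step stops the pipeline.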

Dependencies

Python: transformers, huggingface_hub, jinja2. Run via uv run. System: xelatex (preferred) or pdflatex; pdftocairo (from poppler).