clone-repos
| name | clone-repos |
| description | Clone SGLang, FlashInfer, sgl-cookbook, and flashinfer-trace repositories to tmp/. Use when setting up the project, preparing for kernel extraction, or when the user needs the source repositories. |
Clone SGLang, FlashInfer, sgl-cookbook, and flashinfer-trace repositories to the tmp/ directory.
This skill sets up the required repositories for kernel extraction, testing, and workload collection workflows. It:
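- Clones the repositories into the tmp/ directory (if not already present), with all submodules for SGLang and FlashInfer
- Checks out the main branch by default (or specified branch)

Repositories:

- SGLang: https://github.com/sgl-project/sglang.git, cloned to tmp/sglang
- FlashInfer: https://github.com/flashinfer-ai/flashinfer.git, cloned to tmp/flashinfer
- sgl-cookbook: https://github.com/sgl-project/sgl-cookbook.git, cloned to tmp/sgl-cookbook
- flashinfer-trace: https://huggingface.co/datasets/flashinfer-ai/flashinfer-trace, cloned to tmp/flashinfer-trace

# Clone all repos to ./tmp directory, update if exists, and install from source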
/clone-repos
# Clone specific branches
/clone-repos --sglang-branch v0.4.0 --flashinfer-branch v0.2.0
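How the command-line flags end up in the branch variables used by the implementation steps is not spelled out. The following is a minimal sketch of one way to do it, not the skill's actual parser. The function name `parse_branch_args` is hypothetical, and `--cookbook-branch` is an assumed flag name chosen by analogy with the two flags shown in the usage.

```shell
# Hypothetical helper: map /clone-repos flags onto the branch variables.
# --cookbook-branch is an assumed flag name (not shown in the usage above).
parse_branch_args() {
  # Defaults match the documented default branch.
  sglang_branch="main"
  flashinfer_branch="main"
  cookbook_branch="main"
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --sglang-branch|--flashinfer-branch|--cookbook-branch)
        if [ "$#" -lt 2 ]; then
          echo "Missing value for $1" >&2
          return 1
        fi
        case "$1" in
          --sglang-branch)     sglang_branch="$2" ;;
          --flashinfer-branch) flashinfer_branch="$2" ;;
          --cookbook-branch)   cookbook_branch="$2" ;;
        esac
        shift 2
        ;;
      *)
        echo "Unknown argument: $1" >&2
        return 1
        ;;
    esac
  done
}

# Example: parse_branch_args --sglang-branch v0.4.0 --flashinfer-branch v0.2.0
```

Any flag that is not passed keeps its default of main, so the later steps can rely on all three variables being set.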
Parameters:

- sglang_branch (optional): SGLang branch to check out (default: "main")
- flashinfer_branch (optional): FlashInfer branch to check out (default: "main")
- cookbook_branch (optional): sgl-cookbook branch to check out (default: "main")

When executing this skill:
Create tmp directory if needed:
mkdir -p tmp
Handle SGLang repository:
# Check if repo exists
if [ -d "tmp/sglang/.git" ]; then
echo "SGLang exists, pulling latest changes..."
(cd tmp/sglang && git fetch origin && git checkout "${sglang_branch:-main}" && git reset --hard "origin/${sglang_branch:-main}" && git submodule update --init --recursive)
else
echo "Cloning SGLang with submodules..."
git clone --recurse-submodules https://github.com/sgl-project/sglang.git tmp/sglang
(cd tmp/sglang && git checkout "${sglang_branch:-main}" && git submodule update --init --recursive)
fi
Note: Using (cd ...) subshell syntax ensures directory changes are isolated and don't affect subsequent commands.
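A quick way to see the isolation in practice is the small sketch below. It uses `/` as an arbitrary target directory, and the variable names `before` and `after` are only for illustration.

```shell
# Record the caller's directory, change directory inside a subshell,
# then confirm the caller's directory is unchanged afterwards.
before="$PWD"
(cd / && echo "inside subshell: $PWD")   # prints "inside subshell: /"
after="$PWD"
echo "after subshell:  $after"

if [ "$before" = "$after" ]; then
  echo "working directory unchanged"
fi
```

Without the parentheses, the `cd` would persist and every later relative path such as tmp/flashinfer would resolve against the wrong directory.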
Handle FlashInfer repository:
# Check if repo exists
if [ -d "tmp/flashinfer/.git" ]; then
echo "FlashInfer exists, pulling latest changes..."
(cd tmp/flashinfer && git fetch origin && git checkout "${flashinfer_branch:-main}" && git reset --hard "origin/${flashinfer_branch:-main}" && git submodule update --init --recursive)
else
echo "Cloning FlashInfer with submodules..."
git clone --recurse-submodules https://github.com/flashinfer-ai/flashinfer.git tmp/flashinfer
(cd tmp/flashinfer && git checkout "${flashinfer_branch:-main}" && git submodule update --init --recursive)
fi
Note: Using (cd ...) subshell syntax ensures directory changes are isolated and don't affect subsequent commands.
Handle sgl-cookbook repository:
# Check if repo exists
if [ -d "tmp/sgl-cookbook/.git" ]; then
echo "sgl-cookbook exists, pulling latest changes..."
(cd tmp/sgl-cookbook && git fetch origin && git checkout "${cookbook_branch:-main}" && git reset --hard "origin/${cookbook_branch:-main}")
else
echo "Cloning sgl-cookbook..."
git clone https://github.com/sgl-project/sgl-cookbook.git tmp/sgl-cookbook
(cd tmp/sgl-cookbook && git checkout "${cookbook_branch:-main}")
fi
Note: sgl-cookbook doesn't require submodules or installation. It contains serving configuration files only.
Handle flashinfer-trace repository:
# Check if repo exists
if [ -d "tmp/flashinfer-trace/.git" ]; then
echo "flashinfer-trace exists, pulling latest changes..."
(cd tmp/flashinfer-trace && git fetch origin && git checkout main && git reset --hard origin/main)
else
echo "Cloning flashinfer-trace..."
git clone https://huggingface.co/datasets/flashinfer-ai/flashinfer-trace tmp/flashinfer-trace
fi
Note: flashinfer-trace is a HuggingFace dataset repo (not GitHub). It contains kernel definitions, workloads, and blob safetensors. All workload collection writes to this directory.
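The four blocks above share one clone-or-update pattern, so they could be folded into a single function. The sketch below is one possible shape for that, not part of the skill itself: the name `sync_repo` and its argument order are hypothetical. It also covers a case the update path above does not handle, namely a tag such as v0.4.0, which has no `origin/<name>` remote-tracking ref to reset to.

```shell
# Hypothetical helper: clone a repository, or update an existing clone.
# Usage: sync_repo <url> <dest> [ref] [submodules: yes|no]
sync_repo() {
  url="$1"
  dest="$2"
  ref="${3:-main}"
  submodules="${4:-no}"

  if [ -d "$dest/.git" ]; then
    echo "$dest exists, pulling latest changes..."
    (cd "$dest" && git fetch origin --tags && git checkout "$ref") || return 1
    # Only branches have an origin/<ref> to reset to; a tag is already pinned.
    if (cd "$dest" && git show-ref --verify --quiet "refs/remotes/origin/$ref"); then
      (cd "$dest" && git reset --hard "origin/$ref") || return 1
    fi
  else
    echo "Cloning $url..."
    git clone "$url" "$dest" || return 1
    (cd "$dest" && git checkout "$ref") || return 1
  fi

  # Submodules are synced after the checkout so they match the chosen ref.
  if [ "$submodules" = "yes" ]; then
    (cd "$dest" && git submodule update --init --recursive) || return 1
  fi
}

# Possible calls, mirroring the four steps above:
# sync_repo https://github.com/sgl-project/sglang.git tmp/sglang "${sglang_branch:-main}" yes
# sync_repo https://github.com/flashinfer-ai/flashinfer.git tmp/flashinfer "${flashinfer_branch:-main}" yes
# sync_repo https://github.com/sgl-project/sgl-cookbook.git tmp/sgl-cookbook "${cookbook_branch:-main}"
# sync_repo https://huggingface.co/datasets/flashinfer-ai/flashinfer-trace tmp/flashinfer-trace main
```

Like the original blocks, the update path uses `git reset --hard`, so any local edits on the checked-out branch are discarded.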
Install packages from source:
# Upgrade pip once
pip install --upgrade pip
# Install FlashInfer (pyproject.toml in repo root)
(cd tmp/flashinfer && python -m pip install --no-build-isolation -e . -v)
# Install SGLang (pyproject.toml in python/ subdirectory)
(cd tmp/sglang && pip install -e "python")
Note: Subshell syntax (cd ... && command) keeps the working directory unchanged.
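The install commands assume that git, python, and pip are already on the PATH. A pre-flight check along the following lines can fail early with a clear message. This is only a sketch, and `require_cmds` is a hypothetical helper name rather than something the skill defines.

```shell
# Hypothetical pre-flight check: report every missing command-line tool.
# Returns 0 if all commands are found, 1 otherwise.
require_cmds() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "Missing required command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Possible use before the install step:
# require_cmds git python pip || exit 1
```

Checking all tools before reporting means a single run lists everything that needs installing, rather than stopping at the first gap.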
Verify installations:
# Test imports
python -c "import sglang; print(f'SGLang: {sglang.__version__}')"
python -c "import flashinfer; print(f'FlashInfer: {flashinfer.__version__}')"
# Verify directory structure
ls tmp/sglang/python/sglang/srt/models/
ls tmp/flashinfer/flashinfer/
ls tmp/flashinfer/tests/
ls tmp/sgl-cookbook/
ls tmp/flashinfer-trace/definitions/
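The `ls` calls above print directory contents but leave it to the reader to spot a missing path. As a sketch of a stricter check, the hypothetical helper `check_layout` below returns a non-zero status when any expected directory is absent. The paths in the example call are the same ones listed above.

```shell
# Hypothetical helper: verify that each expected directory exists under a root.
# Usage: check_layout <root> <relative-dir>...
check_layout() {
  root="$1"
  shift
  status=0
  for rel in "$@"; do
    if [ -d "$root/$rel" ]; then
      echo "ok       $root/$rel"
    else
      echo "MISSING  $root/$rel" >&2
      status=1
    fi
  done
  return "$status"
}

# Possible call for the layout this skill creates:
# check_layout tmp sglang/python/sglang/srt/models flashinfer/flashinfer \
#   flashinfer/tests sgl-cookbook flashinfer-trace/definitions
```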
flashinfer-bench/
└── tmp/                                  # Cloned repositories (auto-updated)
    ├── sglang/                           # SGLang repository (installed in current env)
    │   └── python/sglang/srt/
    │       ├── models/                   # Model implementations
    │       │   ├── llama.py
    │       │   ├── deepseek_v3.py
    │       │   ├── qwen2_moe.py
    │       │   └── ...
    │       └── layers/                   # Layer implementations
    │           ├── attention/
    │           ├── moe/
    │           └── layernorm.py
    ├── flashinfer/                       # FlashInfer repository (installed in current env)
    │   ├── flashinfer/                   # Python package in root (not python/ subdir!)
    │   │   ├── attention.py
    │   │   ├── norm.py
    │   │   ├── moe.py
    │   │   └── ...
    │   ├── tests/                        # Reference tests with vanilla implementations
    │   ├── csrc/                         # CUDA source files
    │   └── include/                      # C++ headers with kernel implementations
    ├── sgl-cookbook/                     # Serving configuration repository (NOT installed)
    └── flashinfer-trace/                 # HuggingFace dataset clone: single source of truth
        │                                 # for definitions, ref tests, baselines, workloads,
        │                                 # blobs, and eval traces. All trace edits commit here.
        ├── definitions/{op_type}/        # Kernel definition JSONs
        ├── tests/references/             # Reference tests (pytest)
        ├── solutions/baseline/           # FlashInfer-wrapper baseline solutions
        ├── workloads/{op_type}/          # Sanitized workload JSONLs
        ├── blob/workloads/{op_type}/     # Safetensors blobs referenced by JSONLs
        └── traces/{op_type}/             # Eval traces (one entry per workload)
git submodule update --init --recursive

This skill provides the foundation for:

- /extract-kernel-definitions: tmp/flashinfer-trace/definitions/ (the HuggingFace dataset clone)
- /add-reference-tests: tmp/flashinfer-trace/tests/references/
- Workload collection: uses tmp/flashinfer-trace (HuggingFace dataset clone) as the target for workload JSONL + safetensors blobs, then submits a PR

Example workflow:
# Step 1: Clone SGLang, FlashInfer, sgl-cookbook, and flashinfer-trace repositories
/clone-repos
# Step 2: Extract kernel definitions from a model (uses sgl-cookbook for TP/EP configs)
/extract-kernel-definitions --model-name deepseek_v3
# Step 3: Add reference tests
/add-reference-tests --op-type mla_paged
# Or run the full end-to-end pipeline
/onboard-model --model-name qwen3-235b-a22b
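- SGLang and FlashInfer are installed in editable mode (pip install -e) for development
- The FlashInfer Python package lives in tmp/flashinfer/flashinfer/ (not in a python/ subdirectory)

Update this file when changing repository URLs, directory structure, or adding new repositories.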