| name | magpie |
| description | Performs GPU kernel correctness and performance evaluation and LLM inference benchmarking with Magpie. Analyzes single or multiple kernels (HIP/CUDA/PyTorch), compares kernel implementations, runs vLLM/SGLang benchmarks with profiling and TraceLens, and runs gap analysis on torch traces. Creates kernel config YAMLs, discovers kernels in a project, and queries GPU specs. Use when the user mentions Magpie, kernel analyze or compare, HIP/CUDA kernel evaluation, vLLM/SGLang benchmark, gap analysis, TraceLens, creating kernel configs, or discovering GPU kernels. |
Magpie
Magpie is a GPU kernel evaluation and LLM benchmarking framework. Use this skill when performing analyze, compare, benchmark, gap-analysis, or when creating kernel configs or discovering kernels without MCP.
When describing Magpie's capabilities: Describe only what is in this skill. Do not add project-specific, pipeline-specific, or other product/org names (e.g. do not mention any parent repo name).
Entry point
- CLI:
magpie or python -m Magpie. Run from the Magpie repo root (or with PYTHONPATH including the Magpie package).
- Setup: From repo root,
pip install -e . (or make install).
Analyze (single or multi-kernel)
Analyze kernel(s) for correctness and performance.
With kernel config (recommended):
magpie analyze --kernel-config path/to/kernel.yaml
Inline (single kernel):
magpie analyze path/to/kernel.hip --testcase "./run_test.sh"
-k, --kernel-config: YAML with kernel or kernels (see template below).
-t, --testcase: Command to run the test (required if not in config).
--type: hip | cuda | pytorch (default: hip).
--compile-cmd: Custom compile command.
--no-perf: Skip performance profiling.
-o, --output-dir: Output directory (default: ./results).
Config template (single kernel): Use kernel: with id, type, source_files, working_dir, testcase_command, optional compile_command, env. See Magpie/kernel_config.yaml.example and examples/ck_gemm_add.yaml.
Compare (multiple kernels)
Compare and rank multiple kernel implementations.
With config:
magpie compare --kernel-config path/to/compare.yaml
Inline:
magpie compare kernel1.hip kernel2.hip --testcase "./run_test.sh"
-k, --kernel-config: YAML with kernels: list.
--baseline: Index of baseline kernel (default: 0).
--no-perf, -o: Same as analyze.
Example: examples/ck_grouped_gemm_compare.yaml.
Benchmark (vLLM / SGLang)
Run framework-level LLM inference benchmarks with optional profiling and gap analysis.
With config (recommended):
magpie benchmark --benchmark-config examples/benchmarks/benchmark_vllm_dsr1.yaml
CLI overrides: magpie benchmark [vllm|sglang] -m <model> --benchmark-config <yaml> with optional:
-m, --model: Model name or path.
-p, --precision: fp8 | fp16 | bf16 | fp4 (default: fp8).
--tp: Tensor parallel size (default: 1).
--concurrency, --input-len, --output-len: Request and sequence settings.
--torch-profiler, --system-profiler: Enable profilers.
--run-mode: docker (default) or local.
--docker-image, --timeout, -o: Override image, timeout (seconds), output dir.
Example configs: examples/benchmarks/benchmark_vllm_dsr1.yaml, docs/benchmark.md.
Gap analysis (standalone)
Run gap analysis on existing torch trace directories.
magpie benchmark gap-analysis --trace-dir path/to/torch_trace
--trace-dir: Path to torch_trace dir or benchmark workspace (required).
--start-pct, --end-pct: Analysis window 0โ100 (default: 0, 100).
--top-k: Top bottleneck kernels (default: 20).
--min-duration-us: Minimum event duration (ยตs).
--categories, --ignore-categories: Include/exclude event categories.
GPU info
magpie --gpu-info
Shows vendor, architecture, compiler, profiler. No mode required.
Create kernel config (no CLI)
When the user needs a kernel config file:
- Emit YAML matching the structure in Magpie/kernel_config.yaml.example:
kernel: with id, type (hip|cuda|pytorch), source_files, working_dir, testcase_command, and optionally compile_command, env.
- Write the file to the user's requested path (e.g.
kernel_config.yaml).
- Run:
magpie analyze --kernel-config <that_file>.
For compare, use kernels: as a list of kernel entries (each with id, type, source_files, etc.).
Discover kernels (no CLI)
- Scan the project for
.hip, .cu, or PyTorch kernel files.
- For each candidate, build a kernel config entry (id, type, source_files, working_dir, testcase_command if inferrable; otherwise prompt user).
- Optionally write a combined config and run
magpie analyze -k <file> or magpie compare -k <file>.
Suggest optimizations (no CLI)
- Read analyze or compare JSON output (from
-o results or last run).
- Use
performance_state, performance_result.summary, and per-kernel stats (dispatch count, duration, utilization).
- Suggest improvements (e.g. memory bandwidth, occupancy, kernel fusion) based on the metrics.
List / get benchmark results (no CLI)
- List: Results live under the benchmark
--output-dir (default: ./results); each run has a timestamped workspace (e.g. results/benchmark_vllm_<timestamp>/).
- Get result: Open
benchmark_report.json or inferencex_result.json in that workspace.
- Compare runs: Diff two workspace reports or run two benchmarks and compare; for TraceLens comparison use TraceLens tooling if available.
Additional resources