liger-kernel-perf

Name: Liger Kernel Perf
Author: linkedin

Description: Optimizes the performance of existing Liger Kernel Triton kernels. Profiles kernels, diagnoses bottlenecks (memory-bound vs compute-bound), generates multiple optimization variants with benchmarking, and applies the best variant while maintaining correctness. Supports GPU architecture-specific optimization (Ampere, Hopper, Blackwell). Use when a user asks to optimize, speed up, tune, profile, or reduce memory of an existing Liger kernel.

stars: 6,394
forks: 535
updated: 15 May 2026, 19:59

Liger Kernel Perf

Optimizes existing Liger Kernel Triton kernels through a 3-stage pipeline: Profile, Optimize, Finalize. Supports interactive mode (human checkpoints between stages) and autonomous mode (runs end-to-end). NVIDIA GPUs only.

Mode Detection

Interactive mode (default): the pipeline pauses at a human checkpoint after each stage.
Autonomous mode: triggered when the user says something like "just optimize it", "run without asking me", or "optimize autonomously". All stages run end-to-end and the user sees only the final report.
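
As a rough illustration of the mode rule above, the sketch below treats the three quoted phrases as substring triggers. The helper name `detect_mode` and the case-insensitive substring matching are assumptions made for this example; the skill itself may interpret the request more loosely.

```python
# Trigger phrases copied from the Mode Detection list; the matching rule is an assumption.
AUTONOMOUS_PHRASES = (
    "just optimize it",
    "run without asking me",
    "optimize autonomously",
)


def detect_mode(user_request: str) -> str:
    """Return "autonomous" if a trigger phrase appears, otherwise the default "interactive"."""
    text = user_request.lower()
    if any(phrase in text for phrase in AUTONOMOUS_PHRASES):
        return "autonomous"
    return "interactive"
```

For example, `detect_mode("Just optimize it, I trust you")` selects autonomous mode, while a plain "profile rms_norm" request stays interactive.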

Input Parsing

Extract from the user's request:

| Field | Description | Default |
| --- | --- | --- |
| `target_kernel` | Which kernel to optimize (e.g., "rms_norm", "cross_entropy") | Required |
| `optimization_goal` | speed / memory / balanced | balanced |
| `scope` | Specific pass (forward/backward), input regime, or general | general |
| `target_gpu` | Ampere / Hopper / Blackwell / auto-detect | auto-detect |
| `autonomy` | interactive / autonomous | interactive |
| `max_variants` | Max optimization variants to try | 8 |
| `target_metric` | Optional concrete target (e.g., "forward under 0.3ms at hidden_size=4096") | none |
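
The fields and defaults in the table could be held in a small record like the one below. This is only a sketch: the class name `PerfRequest` and the validation in `__post_init__` are hypothetical, while the field names, allowed values, and defaults come from the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PerfRequest:
    """Hypothetical container for the parsed request; defaults mirror the table."""

    target_kernel: str                     # required, e.g. "rms_norm"
    optimization_goal: str = "balanced"    # speed / memory / balanced
    scope: str = "general"                 # a specific pass, an input regime, or general
    target_gpu: str = "auto-detect"        # Ampere / Hopper / Blackwell / auto-detect
    autonomy: str = "interactive"          # interactive / autonomous
    max_variants: int = 8
    target_metric: Optional[str] = None    # e.g. "forward under 0.3ms at hidden_size=4096"

    def __post_init__(self) -> None:
        # Validation rules are an assumption; the skill only lists the allowed values.
        if not self.target_kernel:
            raise ValueError("target_kernel is required")
        if self.optimization_goal not in {"speed", "memory", "balanced"}:
            raise ValueError(f"unknown optimization_goal: {self.optimization_goal}")
        if self.autonomy not in {"interactive", "autonomous"}:
            raise ValueError(f"unknown autonomy: {self.autonomy}")
        if self.max_variants < 1:
            raise ValueError("max_variants must be at least 1")
```

A request that names only the kernel, such as `PerfRequest(target_kernel="rms_norm")`, picks up every default in the table.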

Pre-Flight Validation

Before starting the pipeline, validate:

Kernel file exists: src/liger_kernel/ops/{kernel}.py
Benchmark script exists: benchmark/scripts/benchmark_{kernel}.py
Test file exists: test/transformers/test_{kernel}.py
GPU is available and CUDA works
Project is installed in dev mode (pip install -e ".[dev]")

If any validation fails, report clearly and stop.
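
The file and GPU checks above might be scripted roughly as follows. The function names are hypothetical, and the sketch assumes it is given the repository root; the three paths are the ones listed above, and `torch.cuda.is_available()` is used as a stand-in for "GPU is available and CUDA works".

```python
from pathlib import Path
from typing import List


def preflight_paths(repo_root: Path, kernel: str) -> List[str]:
    """Return one message per missing file; an empty list means the file checks passed."""
    required = {
        "kernel file": repo_root / "src" / "liger_kernel" / "ops" / f"{kernel}.py",
        "benchmark script": repo_root / "benchmark" / "scripts" / f"benchmark_{kernel}.py",
        "test file": repo_root / "test" / "transformers" / f"test_{kernel}.py",
    }
    return [
        f"missing {label}: {path}"
        for label, path in required.items()
        if not path.is_file()
    ]


def cuda_available() -> bool:
    """Best-effort GPU check; reports False when PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()
```

If `preflight_paths` returns any messages, or `cuda_available()` is false, the skill would report them and stop before Stage 1.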

Pipeline

Stage 1: Profile

Follow the Profiler workflow in profiler.md. If the host runtime supports parallel subagents, this stage may be delegated to one; otherwise execute the workflow directly.

This stage:

Creates the workspace directory optimization/{kernel}/
Copies the original kernel as a snapshot
Runs baseline benchmarks using the existing benchmark script
Detects GPU architecture (or uses user-specified target)
Optionally runs NVIDIA Nsight Compute (NCU) profiling, if the ncu CLI is available
Analyzes the kernel code (tier classification, patterns, optimization opportunities)
Classifies the bottleneck: memory-bound vs compute-bound
Produces an optimization profile with a recommended strategy order
Saves profile to optimization/{kernel}/profile.md
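
The actual diagnosis procedure lives in profiler.md and is not reproduced here. One common way to separate memory-bound from compute-bound kernels is the roofline model: compare the kernel's arithmetic intensity (FLOPs per byte moved) with the GPU's ridge point (peak FLOP/s divided by peak memory bandwidth). The sketch below assumes that approach; the function name and the hardware figures in the usage note are illustrative placeholders, not measured values.

```python
def classify_bottleneck(
    flops: float,
    bytes_moved: float,
    peak_flops_per_s: float,
    peak_bytes_per_s: float,
) -> str:
    """Roofline-style classification; all four inputs must be positive."""
    if min(flops, bytes_moved, peak_flops_per_s, peak_bytes_per_s) <= 0:
        raise ValueError("all inputs must be positive")
    intensity = flops / bytes_moved               # FLOPs per byte for this kernel
    ridge = peak_flops_per_s / peak_bytes_per_s   # FLOPs per byte where the roofline bends
    return "memory-bound" if intensity < ridge else "compute-bound"
```

With placeholder peaks of 300e12 FLOP/s and 2e12 B/s the ridge point is 150 FLOPs per byte, so an elementwise kernel that performs about one FLOP per byte moved lands well on the memory-bound side.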

Human checkpoint (interactive mode): Present the optimization profile with bottleneck diagnosis and proposed strategy order. Confirm before proceeding.

Stage 2: Optimize

Follow the Optimizer workflow in optimizer.md.

This stage runs an autonomous optimization loop:

1. Read the optimization profile and original kernel
2. Always try parameter tuning first (BLOCK_SIZE, num_warps, num_stages manual sweep -- NOT @triton.autotune)
3. Then apply diagnosis-driven techniques from optimization-strategies.md
4. For each variant:
   a. Generate the variant code → optimization/{kernel}/{kernel}_vN.py
   b. Write the variant lab notebook → optimization/{kernel}/{kernel}_vN_notes.md
   c. Run quick smoke test (single shape, float32, forward+backward) → discard on failure
   d. Run the full existing benchmark script → optimization/{kernel}/benchmarks/vN_results.csv
   e. Check guardrails (no catastrophic regressions)
   f. Update the variant notes with actual results
5. Read all prior variant notes before generating the next variant
6. Stop when: budget exhausted, 2 consecutive variants with <1% improvement, or target metric met
7. Produce a comparison table of ALL variants
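
Two pieces of this loop lend themselves to a short sketch: enumerating the manual parameter sweep and deciding when to stop. Both helpers below are hypothetical. The sketch assumes that "budget" means `max_variants`, and that each improvement is measured in percent against the best result seen before that variant; optimizer.md may define these differently.

```python
import itertools
from typing import Dict, Iterator, List, Optional, Sequence


def sweep_configs(
    block_sizes: Sequence[int],
    num_warps: Sequence[int],
    num_stages: Sequence[int],
) -> Iterator[Dict[str, int]]:
    """Yield every (BLOCK_SIZE, num_warps, num_stages) combination for a manual sweep."""
    for bs, w, s in itertools.product(block_sizes, num_warps, num_stages):
        yield {"BLOCK_SIZE": bs, "num_warps": w, "num_stages": s}


def should_stop(
    improvements_pct: List[float],
    max_variants: int = 8,
    target_met: bool = False,
    min_gain_pct: float = 1.0,
) -> Optional[str]:
    """Return the reason to stop, or None to keep generating variants."""
    if target_met:
        return "target_met"
    if len(improvements_pct) >= max_variants:
        return "budget_exhausted"
    # Plateau: the last two variants each gained less than min_gain_pct.
    if len(improvements_pct) >= 2 and all(g < min_gain_pct for g in improvements_pct[-2:]):
        return "plateau"
    return None
```

The candidate values passed to `sweep_configs` are up to the caller; a grid of two values per parameter, for instance, yields eight configurations to benchmark by hand.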

Human checkpoint (interactive mode): Present the comparison table across all variants. The user approves the winner; in autonomous mode the skill picks the best variant itself.

Stage 3: Finalize

Follow the Finalizer workflow in finalizer.md.

This stage:

Applies the winning variant in-place to src/liger_kernel/ops/{kernel}.py
Runs the full test suite: python -m pytest test/transformers/test_{kernel}.py -xvs (hard gate)
Runs checkstyle: make checkstyle (auto-fix with ruff check . --fix && ruff format .)
Generates 3-way comparison plots (original liger vs optimized liger vs huggingface baseline) using benchmarks_visualizer.py
Generates the final optimization report → optimization/{kernel}/report.md
Creates a PR with only the kernel code changes (no plots or optimization workspace files)
Presents the before/after summary with plots
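
The test gate and checkstyle steps in this list can be read as a small control flow: tests must pass, and a checkstyle failure gets one auto-fix attempt before it counts as a failure. The sketch below assumes that reading. The function `finalize_gate` and its injectable `run` callback are hypothetical; the commands themselves are the ones quoted above.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], int]  # takes a command, returns its exit code


def _run(cmd: List[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(cmd).returncode


def finalize_gate(kernel: str, run: Runner = _run) -> str:
    """Run the hard test gate, then checkstyle with a single auto-fix retry."""
    test_cmd = ["python", "-m", "pytest", f"test/transformers/test_{kernel}.py", "-xvs"]
    if run(test_cmd) != 0:
        return "tests_failed"  # hard gate: report the failure and stop
    if run(["make", "checkstyle"]) == 0:
        return "ok"
    # Checkstyle failed: auto-fix with ruff, then retry exactly once.
    run(["ruff", "check", ".", "--fix"])
    run(["ruff", "format", "."])
    return "ok" if run(["make", "checkstyle"]) == 0 else "checkstyle_failed"
```

Passing a fake `run` callback makes the flow easy to exercise without touching a real checkout.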

Human checkpoint (interactive mode): Present the final report with before/after numbers, comparison plots, and test results.

Guardrails

These apply to EVERY variant, regardless of mode:

| Guardrail | Threshold | Action |
| --- | --- | --- |
| Non-target metric regression | >5% worse | Reject variant |
| Cross-pass regression | >10% worse on one pass to marginally improve the other | Reject variant |
| Smoke test failure | Any correctness failure | Discard variant immediately |
| Full test suite failure | Any | Do NOT apply winner, report failure, stop |
| Checkstyle failure | Any | Auto-fix with ruff, retry once |
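
The two numeric guardrails can be expressed as a percentage check against the baseline. The sketch below is one possible reading, not the skill's own implementation: it assumes every metric is lower-is-better, that the 5% limit applies to metrics outside the optimization target, and that the 10% limit applies to forward/backward timings that are themselves targets. The metric names are illustrative.

```python
from typing import Dict, List, Sequence, Set

NON_TARGET_LIMIT_PCT = 5.0
CROSS_PASS_LIMIT_PCT = 10.0


def regression_pct(baseline: float, variant: float) -> float:
    """Positive means the variant is worse, since metrics are lower-is-better."""
    return (variant - baseline) / baseline * 100.0


def guardrail_violations(
    baseline: Dict[str, float],
    variant: Dict[str, float],
    target_metrics: Set[str],
    pass_metrics: Sequence[str] = ("forward_ms", "backward_ms"),
) -> List[str]:
    """Return one message per violated guardrail; an empty list means the variant passes."""
    violations = []
    for name, base in baseline.items():
        reg = regression_pct(base, variant[name])
        if name not in target_metrics and reg > NON_TARGET_LIMIT_PCT:
            violations.append(f"non-target regression: {name} +{reg:.1f}%")
        elif name in pass_metrics and reg > CROSS_PASS_LIMIT_PCT:
            violations.append(f"cross-pass regression: {name} +{reg:.1f}%")
    return violations
```

A variant that speeds up the forward pass but uses 10% more memory would be rejected under the first rule when memory is not the target.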

Reference Files

profiler.md -- Profiler Agent specification
optimizer.md -- Optimizer Agent specification
finalizer.md -- Finalizer Agent specification
optimization-strategies.md -- Catalog of optimization techniques
templates/optimization-profile.md -- Profiling output format (cross-stage contract)
templates/variant-notes.md -- Per-variant lab notebook format

related-skills.json

same repository

liger-autopatch.md

from "linkedin/Liger-Kernel"

Adds Liger Kernel support for a new HuggingFace Transformers model, or modifies existing monkey-patching. Generates lce_forward, monkey-patch function, tests, and README entry. Use when adding a new model to Liger Kernel, when a user asks to patch an unsupported model, when extending MODEL_TYPE_TO_APPLY_LIGER_FN, or when modifying/updating/fixing an existing monkey-patch (e.g., adding a new kernel to an already-supported model, fixing instance patching, updating a patch for upstream HF changes).

2026-05-15 · 6.4k

liger-kernel-dev.md

from "linkedin/Liger-Kernel"

Develops production-ready Triton kernels for Liger Kernel. Creates new kernels from PyTorch operations (local files, URLs, code snippets, or natural language) with ops, module wrappers, functional APIs, unit tests, benchmarks, and plots. Also modifies existing Liger kernels. Use when adding a new Triton kernel, converting a PyTorch operation to Triton, or updating an existing Liger kernel.

2026-05-15 · 6.4k

package.json

"author": "linkedin"

"repository": "linkedin/Liger-Kernel"
