autows-docs

Name: Autows Docs
Author: facebookexperimental

Consult and maintain AutoWS documentation. Use BEFORE exploring AutoWS source code: when investigating, planning, or modifying files under WarpSpecialization/, partition scheduling, warp_specialize ops, WSCodePartition, WSDataPartition, WSTaskPartition, WSMemoryPlanner, or related passes. Also use AFTER making non-trivial changes to AutoWS code to keep docs in sync.

stars:170

forks:51

updated:April 25, 2026 at 05:07

SKILL.md

AutoWS Documentation

AutoWS has comprehensive design docs that live alongside the source code at:

third_party/nvidia/hopper/lib/Transforms/WarpSpecialization/docs/

CRITICAL: Read docs BEFORE reading source

When investigating or planning changes to AutoWS code, always read the relevant docs first before exploring the source files. The docs explain the design intent, invariants, and relationships between passes — information that is difficult to reconstruct from code alone. Reading docs first will:

- Give you the correct mental model before diving into implementation details
- Identify which files are relevant so you search less
- Surface invariants and edge cases that aren't obvious from code

How to find the right doc

Use the file map below to match your task to the relevant doc(s):

| If you're working on... | Read this doc first |
| --- | --- |
| Overall pipeline, pass ordering | `docs/Overview.md` |
| Task ID assignment (Hopper) | `docs/TaskPartitionAndPropagation.md` |
| Splitting ops across warp groups | `docs/DataPartition.md` |
| Channel insertion, async copies, barriers | `docs/CodePartition.md` |
| Code specialization / cloning into regions | `docs/CodeSpecialization.md` |
| SMEM/TMEM allocation, multi-buffering | `docs/BufferAllocation.md`, `docs/AccumulationCounters.md`, `docs/SmemAllocationDesign.md` |
| Memory planner liveness analysis | `docs/MemoryPlannerVisualization.md` |
| Memory lowering (global/shared/tensor) | `docs/MemoryLowering.md` |
| Token/barrier lowering to hardware | `docs/TokenBarrierLowering.md` |
| Ping-pong scheduling | `docs/PingPongScheduling.md` |
| Barrier fusion/merging | `docs/BarrierFusion.md` |
| Operand D / accumulator handling | `docs/OperandDHandling.md` |
| Reuse groups for buffer sharing | `docs/ReuseGroups.md` |
| TMEM allocation heuristics | `docs/TMEMAllocationHeuristics.md` |
| Utility functions | `docs/Utilities.md` |
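
The file map can also be treated as a simple keyword lookup. The following is a minimal sketch, not part of the skill itself: the keyword strings and the `docs_for` helper are hypothetical, and only a subset of the rows is included. The doc file names and the docs directory are taken from this page.

```python
# Hypothetical helper mirroring part of the file map; keywords are illustrative.
DOCS_DIR = "third_party/nvidia/hopper/lib/Transforms/WarpSpecialization/docs"

FILE_MAP = {
    "pipeline": ["Overview.md"],
    "task id": ["TaskPartitionAndPropagation.md"],
    "channel": ["CodePartition.md"],
    "multi-buffering": [
        "BufferAllocation.md",
        "AccumulationCounters.md",
        "SmemAllocationDesign.md",
    ],
    "ping-pong": ["PingPongScheduling.md"],
    "barrier fusion": ["BarrierFusion.md"],
    "reuse group": ["ReuseGroups.md"],
}


def docs_for(task):
    """Return doc paths whose keyword occurs in the task description."""
    task = task.lower()
    hits = []
    for keyword, docs in FILE_MAP.items():
        if keyword in task:
            for name in docs:
                path = f"{DOCS_DIR}/{name}"
                if path not in hits:  # keep order, avoid duplicates
                    hits.append(path)
    return hits
```

An empty result corresponds to the "no doc matches" case, where `docs/Overview.md` is the starting point.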

Workflow

1. Read the matching doc(s) from the table above.
2. Then explore source files, guided by what the docs describe.
3. If no doc matches your task, read docs/Overview.md for the pipeline context and file map, then proceed to source.
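
The ordering in this workflow can be sketched as a small function. This is an illustration only: `reading_plan` is a hypothetical name, and the source file names passed to it are placeholders supplied by the caller.

```python
OVERVIEW = "docs/Overview.md"


def reading_plan(matched_docs, source_files):
    """Order the reading list: docs first, then source.

    Falls back to Overview.md when no doc matched the task.
    """
    docs = list(matched_docs) if matched_docs else [OVERVIEW]
    return [("doc", d) for d in docs] + [("source", s) for s in source_files]
```

For example, `reading_plan([], ["some_pass.cpp"])` places `docs/Overview.md` ahead of the (placeholder) source file.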

CRITICAL: Update docs AFTER non-trivial code changes

When you make changes to AutoWS code that go beyond a simple bug fix, you must update the corresponding documentation. Specifically, update docs when:

- Adding a new pass or file: Add an entry to docs/Overview.md (file map and pipeline diagram) and create a new doc if the pass is substantial.
- Changing pass behavior or invariants: Update the doc that describes that pass to reflect the new behavior.
- Adding or changing data structures: Update the doc that references those structures.
- Changing the pipeline order: Update docs/Overview.md.
- Adding new concepts or terminology: Document them in the relevant doc or create a new one if no existing doc fits.

Do NOT update docs for:

- Pure bug fixes that don't change documented behavior
- Code style / refactoring that preserves semantics
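
The update rules above amount to a small decision table. The sketch below encodes them under stated assumptions: the `Change` enum, the `docs_to_update` function, and the `area_doc` and `substantial` parameters are all hypothetical names introduced for illustration, and classifying a change still requires human judgment.

```python
from enum import Enum, auto


class Change(Enum):
    NEW_PASS_OR_FILE = auto()
    PASS_BEHAVIOR_OR_INVARIANT = auto()
    DATA_STRUCTURE = auto()
    PIPELINE_ORDER = auto()
    NEW_CONCEPT = auto()
    PURE_BUG_FIX = auto()
    STYLE_OR_REFACTOR = auto()


# Changes that do not require a doc update.
SKIP = {Change.PURE_BUG_FIX, Change.STYLE_OR_REFACTOR}


def docs_to_update(change, area_doc, substantial=False):
    """Return the docs to touch for a change.

    area_doc is the doc covering the affected area (caller-supplied).
    """
    if change in SKIP:
        return []
    if change is Change.PIPELINE_ORDER:
        return ["docs/Overview.md"]
    if change is Change.NEW_PASS_OR_FILE:
        # Overview.md always; a dedicated doc only for a substantial pass.
        return ["docs/Overview.md"] + ([area_doc] if substantial else [])
    return [area_doc]
```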

Doc conventions

- Docs live in third_party/nvidia/hopper/lib/Transforms/WarpSpecialization/docs/
- Each doc covers one logical area (one pass or closely related group of passes)
- Docs should explain why, not just what: design rationale matters
- Include the file(s) the doc covers at the top
- Use code snippets or IR examples to illustrate transformations
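
Two of these conventions can be checked mechanically. The following is a rough sketch under assumptions: `check_doc` is a hypothetical helper, the source-file extensions it looks for (`.cpp`, `.h`, `.td`) are a guess at what a covered file looks like, and "near the top" is arbitrarily taken to mean the first ten lines. It cannot judge whether a doc explains the design rationale.

```python
import re

# A Markdown code fence, built indirectly so this snippet nests cleanly.
FENCE = "`" * 3


def check_doc(text, head_lines=10):
    """Return a list of convention problems found in a doc's text."""
    problems = []
    head = "\n".join(text.splitlines()[:head_lines])
    # Heuristic: a covered source file should be named near the top.
    if not re.search(r"\b[\w/]+\.(cpp|h|td)\b", head):
        problems.append("no covered source file named near the top")
    # Heuristic: at least one fenced code or IR snippet somewhere in the doc.
    if FENCE not in text:
        problems.append("no code or IR snippet")
    return problems
```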

related-skills.json

same repository

tlx-api-reference.md

from "facebookexperimental/triton"

TLX DSL API reference for low-level GPU primitives. Use when writing or modifying TLX kernel code that uses barriers (mbarrier, named barriers), memory allocation (local_alloc, SMEM, TMEM), TMA operations, warp specialization (async_tasks, async_task), CLC (cluster launch control), or wgmma instructions. Covers Hopper and Blackwell hardware differences.

2026-05-29 (stars: 170)

proxy-fence-insertion.md

from "facebookexperimental/triton"

Use when working on fence-related compiler passes, TMA store lowering, proxy fence insertion, investigating missing or spurious fences, or debugging correctness issues in TLX kernels that use tlx.async_descriptor_store or MMA operations.

2026-05-22 (stars: 170)

autows-testing.md

from "facebookexperimental/triton"

Run autoWS (automatic warp specialization) correctness tests. Use when working on autoWS compiler code — files under WarpSpecialization/, partition scheduling, warp_specialize ops, WSCodePartition, WSDataPartition, WSTaskPartition, WSMemoryPlanner, or related passes. Do NOT use TLX correctness tests (third_party/tlx/tutorials/testing/test_correctness.py) for autoWS work — those test manual warp specialization via TLX, not the automatic compiler pipeline.

2026-05-21 (stars: 170)

tma-illegal-instruction.md

from "facebookexperimental/triton"

Diagnose CUDA "illegal instruction" errors and kernel crashes in Triton kernels where the reported source line is a TMA load or store (`make_tensor_descriptor`, `TensorDescriptor`, `descriptor.load`, `descriptor.store`, `tl.async_descriptor_load`, async TMA copies). Use when the user reports CUDA error 716, "an illegal instruction was encountered", a segfault inside a TMA op, a kernel hang followed by an illegal instruction trap, or a crash that only fires on the first or last tile of a launch. Covers the pattern where a TMA store/load is issued at an offset entirely past a tensor's shape: TMA does NOT silently mask out-of-bounds tile accesses; it traps. The root cause is almost never a "missing in-kernel mask"; it is commonly a structural launcher / tile-mapping bug.

2026-04-23 (stars: 170)

barrier-visualization.md

from "facebookexperimental/triton"

Produce a structured barrier report for AutoWS (automatic warp specialization) IR. Use when the user wants to visualize, audit, or debug barrier usage across warp-specialized partitions, or when debugging a GPU kernel hang (deadlock). For hangs, first dump IR using the ir-debugging skill, then run this barrier analysis to identify mismatched arrive/wait counts, missing backward barriers, or other synchronization issues that cause deadlocks. Covers mbarriers, named barriers, tcgen05 commit, TMA-implicit arrives, Aref-based synchronization, and producer/consumer barrier patterns.

2026-04-13 (stars: 170)

ir-debugging.md

from "facebookexperimental/triton"

Debug Triton compilation by dumping IR at each stage (TTIR, TTGIR, LLVM, PTX). Use when investigating compilation failures, kernel performance, register spills, or when user asks to inspect IR output. Covers TRITON_KERNEL_DUMP, MLIR_ENABLE_DUMP, LLVM_IR_ENABLE_DUMP, TRITON_DUMP_PTXAS_LOG, and related env vars.

2026-02-12 (stars: 170)

package.json

"author": "facebookexperimental"

"repository": "facebookexperimental/triton"
