running-with-buck

Stars: 173

Forks: 55

Updated: June 16, 2026 at 18:08

How to build and run GPU targets under Buck in fbcode. Use when invoking buck2 run / buck2 build for any GPU benchmark, test, or kernel — selecting the GPU architecture and CUDA version, using @mode/opt and the beta Triton modifier, passing environment variables through, and running from the right directory. Covers the general requirements plus the B200/GB200 (b200a, CUDA >= 12.8) and GB300 (b300a, CUDA >= 13.0) hardware requirements.

Source

facebookexperimental/triton

SKILL.md

name: running-with-buck

Running GPU Targets with Buck

This skill covers the mechanics of building and running GPU targets under Buck in fbcode. It is target-agnostic: substitute your own <buck target> and program arguments.

General requirements

Run from fbsource/fbcode. cd to fbsource/fbcode before invoking Buck. The @mode/opt flags (and other @mode/... files) only resolve when Buck is run from there.
Use @mode/opt. It provides the core GPU build configuration. Some tritonbench targets still pass -c fbcode.enable_gpu_sections=true explicitly; add that flag if a target's GPU sections fail to build.
Build against the beta Triton with -m ovr_config//triton:beta. This directory is the beta Triton compiler. Pass this modifier so the target builds and runs against the beta Triton in this tree rather than the default/stable Triton — without it, changes made here are not exercised.
Select the GPU architecture with -c fbcode.nvcc_arch=<arch> and, where required, the CUDA version with -c fbcode.platform010_cuda_version=<ver> (see Hardware requirements).
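
As a minimal sketch of the first requirement, a wrapper script could refuse to invoke Buck from the wrong directory. The function name in_fbcode_root is hypothetical, and the check assumes the checkout directory is literally named fbsource; adjust the pattern if your checkout differs.

```shell
# Hypothetical guard: succeeds only when the given directory (default: the
# current one) is the fbcode root of a checkout named "fbsource".
in_fbcode_root() {
  case "${1:-$PWD}" in
    */fbsource/fbcode) return 0 ;;
    *)
      echo "run Buck from fbsource/fbcode (got: ${1:-$PWD})" >&2
      return 1
      ;;
  esac
}
```

Calling in_fbcode_root before buck2 lets a script stop early with a clear message instead of failing on an unresolved @mode/opt.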

Environment variables placed before buck2 run are forwarded to the launched process:

<ENV VARS> buck2 run @mode/opt -m ovr_config//triton:beta -c fbcode.nvcc_arch=<arch> [-c fbcode.platform010_cuda_version=<ver>] \
  <buck target> -- <program args>

buck2 build vs buck2 run. Use buck2 build <target> to compile only (e.g. to surface a compile failure without executing); use buck2 run <target> -- <args> to build and run. Program arguments go after --.
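
The rules above can be sketched as a small helper that assembles the command line and prints it rather than running it. The name buck_gpu_cmd and the target //some:target are hypothetical placeholders; the flags are the ones listed under General requirements. The output is meant for inspection only, since joining arguments into one string loses quoting for arguments that contain spaces.

```shell
# Hypothetical helper that prints (does not execute) a buck2 command line.
# Usage: buck_gpu_cmd <run|build> <arch> <cuda version or ""> <target> [program args...]
# The body runs in a subshell so its variables do not leak into the caller.
buck_gpu_cmd() (
  sub=$1; arch=$2; cuda=$3; target=$4
  shift 4
  cmd="buck2 $sub @mode/opt -m ovr_config//triton:beta -c fbcode.nvcc_arch=$arch"
  # The CUDA version flag is optional; an empty third argument omits it.
  if [ -n "$cuda" ]; then
    cmd="$cmd -c fbcode.platform010_cuda_version=$cuda"
  fi
  cmd="$cmd $target"
  # Program arguments apply to `run` only and always follow `--`.
  if [ "$sub" = run ] && [ "$#" -gt 0 ]; then
    cmd="$cmd -- $*"
  fi
  printf '%s\n' "$cmd"
)

buck_gpu_cmd run h100a "" //some:target --flag 1
buck_gpu_cmd build b300a 13.0 //some:target
```

The first call prints a run command with the program arguments after --, and the second prints a build command with the CUDA version flag and no program arguments.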

Hardware requirements

Pick the arch (and CUDA version) for the single GPU you are targeting:

| Hardware | `fbcode.nvcc_arch` | `fbcode.platform010_cuda_version` |
| --- | --- | --- |
| Hopper (H100) | `h100a` | (default) |
| Blackwell B200 / GB200 | `b200a` | `>= 12.8` |
| Blackwell GB300 | `b300a` | `>= 13.0` |

Notes:

B200 / GB200 require CUDA >= 12.8. Existing tritonbench targets pin 12.8; use the version your build expects.
GB300 requires arch b300a and CUDA >= 13.0.
Set the CUDA version explicitly whenever a minimum applies, since the platform default may be older than the arch requires.
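
The table and notes can be captured in a lookup plus a version check, shown here as a sketch. The function names and the lowercase hardware keys (h100, b200, gb200, gb300) are hypothetical; the arch names and minimum versions are the ones listed above. The comparison assumes versions are written as major.minor.

```shell
# Prints "<arch> <minimum CUDA version>" for a hardware key. Hopper prints
# only the arch because it uses the platform default CUDA version.
gpu_arch_for() {
  case "$1" in
    h100)       echo "h100a" ;;
    b200|gb200) echo "b200a 12.8" ;;
    gb300)      echo "b300a 13.0" ;;
    *)          echo "unknown hardware: $1" >&2; return 1 ;;
  esac
}

# Succeeds when version $1 >= version $2. Both must be "major.minor";
# the parts are compared as integers, not as strings.
cuda_at_least() {
  have_major=${1%%.*}; have_minor=${1#*.}
  need_major=${2%%.*}; need_minor=${2#*.}
  [ "$have_major" -gt "$need_major" ] ||
    { [ "$have_major" -eq "$need_major" ] && [ "$have_minor" -ge "$need_minor" ]; }
}

gpu_arch_for gb300
cuda_at_least 12.8 13.0 || echo "CUDA 12.8 is too old for b300a"
```

A script could run this check before invoking Buck, so that a CUDA version below the arch minimum is reported up front.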

Examples

Blackwell GB300 (b300a, CUDA 13.0), from fbsource/fbcode:

buck2 run @mode/opt -m ovr_config//triton:beta \
  -c fbcode.nvcc_arch=b300a \
  -c fbcode.platform010_cuda_version=13.0 \
  <buck target> -- <program args>

Blackwell B200 / GB200 (b200a, CUDA >= 12.8):

buck2 run @mode/opt -m ovr_config//triton:beta \
  -c fbcode.nvcc_arch=b200a \
  -c fbcode.platform010_cuda_version=12.8 \
  <buck target> -- <program args>

Hopper (h100a):

buck2 run @mode/opt -m ovr_config//triton:beta \
  -c fbcode.nvcc_arch=h100a \
  <buck target> -- <program args>

With env vars forwarded (e.g. enabling a feature for the run):

SOME_ENV=1 buck2 run @mode/opt -m ovr_config//triton:beta -c fbcode.nvcc_arch=b300a \
  -c fbcode.platform010_cuda_version=13.0 \
  <buck target> -- <program args>
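
The prefix form relies on standard shell behavior: an assignment written before a command is placed in that command's environment only. The sketch below demonstrates just the shell side, with sh -c standing in for the launched program; that buck2 run passes the variable on to the target is as stated above, not something this snippet shows.

```shell
unset SOME_ENV  # start from a clean slate for the demonstration

# The assignment prefix puts SOME_ENV into the environment of this one
# command only; `sh -c` stands in for the launched program.
SOME_ENV=1 sh -c 'echo "child sees SOME_ENV=${SOME_ENV:-unset}"'

# The calling shell itself is unchanged.
echo "parent sees SOME_ENV=${SOME_ENV:-unset}"
```

Because the variable is scoped to one command, consecutive runs with different settings do not affect each other.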

Compile only (surface a build/compile failure without running):

buck2 build @mode/opt -m ovr_config//triton:beta -c fbcode.nvcc_arch=b300a \
  -c fbcode.platform010_cuda_version=13.0 \
  <buck target>
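
One way to combine the two subcommands is to build first and run only if the build succeeds, so a compile failure is reported on its own. This is a sketch under assumptions: build_then_run and the BUCK override variable are hypothetical, and the flags are fixed to the GB300 values from the example above.

```shell
# Hypothetical wrapper: compile first, then run only when the build succeeds.
# BUCK is a hypothetical override so the sketch can be dry-run (BUCK=echo).
build_then_run() {
  buck=${BUCK:-buck2}
  target=$1
  shift
  flags="@mode/opt -m ovr_config//triton:beta -c fbcode.nvcc_arch=b300a -c fbcode.platform010_cuda_version=13.0"
  # $flags is intentionally unquoted so it splits into separate arguments.
  if ! "$buck" build $flags "$target"; then
    echo "build failed; skipping run" >&2
    return 1
  fi
  # Remaining arguments are passed to the program after `--`.
  "$buck" run $flags "$target" -- "$@"
}
```

Setting BUCK=echo before calling build_then_run prints the two commands instead of executing them, which is a quick way to check the flags.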
