Skip to main content
在 Manus 中运行任何 Skill
一键导入
$pwd:
facebookexperimental
GitHub 创作者资料

facebookexperimental

按仓库查看 1 个 GitHub 仓库中的 8 个已收集 skills,并展示近似职业覆盖。

已收集 skills
8
仓库
1
职业领域
1
更新
2026-05-29
职业覆盖
该创作者主要覆盖的职业大类。
仓库分布

Skills 分布在哪些仓库

按已收集 skill 数展示主要仓库,并显示它们在该创作者目录中的占比和职业覆盖。

仓库浏览

仓库与代表性 skills

#001
triton
8 个 skills17051更新于 2026-05-29
占该创作者 100%
tlx-api-reference
软件开发工程师

TLX DSL API reference for low-level GPU primitives. Use when writing or modifying TLX kernel code that uses barriers (mbarrier, named barriers), memory allocation (local_alloc, SMEM, TMEM), TMA operations, warp specialization (async_tasks, async_task), CLC (cluster launch control), or wgmma instructions. Covers Hopper and Blackwell hardware differences.

2026-05-29
proxy-fence-insertion
软件开发工程师

Use when working on fence-related compiler passes, TMA store lowering, proxy fence insertion, investigating missing or spurious fences, or debugging correctness issues in TLX kernels that use tlx.async_descriptor_store or MMA operations.

2026-05-22
autows-testing
软件质量保证分析师与测试员

Run autoWS (automatic warp specialization) correctness tests. Use when working on autoWS compiler code — files under WarpSpecialization/, partition scheduling, warp_specialize ops, WSCodePartition, WSDataPartition, WSTaskPartition, WSMemoryPlanner, or related passes. Do NOT use TLX correctness tests (third_party/tlx/tutorials/testing/test_correctness.py) for autoWS work — those test manual warp specialization via TLX, not the automatic compiler pipeline.

2026-05-21
autows-docs
软件开发工程师

Consult and maintain AutoWS documentation. Use BEFORE exploring AutoWS source code — when investigating, planning, or modifying files under WarpSpecialization/, partition scheduling, warp_specialize ops, WSCodePartition, WSDataPartition, WSTaskPartition, WSMemoryPlanner, or related passes. Also use AFTER making non-trivial changes to AutoWS code to keep docs in sync.

2026-04-25
tma-illegal-instruction
软件开发工程师

Diagnose CUDA "illegal instruction" / kernel crashes on Triton kernels that reference to TMA loads or stores (`make_tensor_descriptor`, `TensorDescriptor`, `descriptor.load`, `descriptor.store`, `tl.async_descriptor_load`, async TMA copies) as the source code line. Use when the user reports CUDA error 716, "an illegal instruction was encountered", segfault inside a TMA op, kernel hang followed by an illegal instruction trap, or a crash that only fires on the first or last tile of a launch. Covers the pattern where a TMA store/load is issued at an offset entirely past a tensor's shape — TMA does NOT silently mask out-of-bounds tile accesses; it traps. The root cause is almost never "missing in-kernel mask" — it is commonly a structural launcher / tile-mapping bug.

2026-04-23
barrier-visualization
软件开发工程师

Produce a structured barrier report for AutoWS (automatic warp specialization) IR. Use when the user wants to visualize, audit, or debug barrier usage across warp-specialized partitions, or when debugging a GPU kernel hang (deadlock). For hangs, first dump IR using the ir-debugging skill, then run this barrier analysis to identify mismatched arrive/wait counts, missing backward barriers, or other synchronization issues that cause deadlocks. Covers mbarriers, named barriers, tcgen05 commit, TMA-implicit arrives, Aref-based synchronization, and producer/consumer barrier patterns.

2026-04-13
ir-debugging
计算机程序员

Debug Triton compilation by dumping IR at each stage (TTIR, TTGIR, LLVM, PTX). Use when investigating compilation failures, kernel performance, register spills, or when user asks to inspect IR output. Covers TRITON_KERNEL_DUMP, MLIR_ENABLE_DUMP, LLVM_IR_ENABLE_DUMP, TRITON_DUMP_PTXAS_LOG, and related env vars.

2026-02-12
kernel-perf-testing
计算机程序员

Run TLX kernel performance benchmarks on Hopper and Blackwell GPUs. Use when user asks to benchmark, profile, or measure performance of any TLX kernel (GEMM, Flash Attention variants). Handles GPU selection, denoise wrapping, and version flags. Never run unless explicitly asked.

2026-02-12
已展示 1 / 1 个仓库
已展示全部仓库