GitHub creator profile

facebookexperimental

Repository-level view of 8 collected skills across 1 GitHub repositories, including approximate occupation coverage.

skills collected: 8
repositories: 1
occupation fields: 1
updated: 2026-05-29
occupation focus
Major fields detected across this creator.
repository map

Where the skills live

Top repositories by collected skill count, with their share of this creator catalog and occupation spread.

repository explorer

Repositories and representative skills

#001
triton
8 skills · 17051 · updated 2026-05-29
100% of creator
tlx-api-reference
Software developer

TLX DSL API reference for low-level GPU primitives. Use when writing or modifying TLX kernel code that uses barriers (mbarrier, named barriers), memory allocation (local_alloc, SMEM, TMEM), TMA operations, warp specialization (async_tasks, async_task), CLC (cluster launch control), or wgmma instructions. Covers Hopper and Blackwell hardware differences.

2026-05-29
proxy-fence-insertion
Software developer

Use when working on fence-related compiler passes, TMA store lowering, proxy fence insertion, investigating missing or spurious fences, or debugging correctness issues in TLX kernels that use tlx.async_descriptor_store or MMA operations.

2026-05-22
autows-testing
Software quality assurance analyst / tester

Run autoWS (automatic warp specialization) correctness tests. Use when working on autoWS compiler code — files under WarpSpecialization/, partition scheduling, warp_specialize ops, WSCodePartition, WSDataPartition, WSTaskPartition, WSMemoryPlanner, or related passes. Do NOT use TLX correctness tests (third_party/tlx/tutorials/testing/test_correctness.py) for autoWS work — those test manual warp specialization via TLX, not the automatic compiler pipeline.

2026-05-21
autows-docs
Software developer

Consult and maintain AutoWS documentation. Use BEFORE exploring AutoWS source code — when investigating, planning, or modifying files under WarpSpecialization/, partition scheduling, warp_specialize ops, WSCodePartition, WSDataPartition, WSTaskPartition, WSMemoryPlanner, or related passes. Also use AFTER making non-trivial changes to AutoWS code to keep docs in sync.

2026-04-25
tma-illegal-instruction
Software developer

Diagnose CUDA "illegal instruction" errors and kernel crashes in Triton kernels where the reported source line is a TMA load or store (`make_tensor_descriptor`, `TensorDescriptor`, `descriptor.load`, `descriptor.store`, `tl.async_descriptor_load`, async TMA copies). Use when the user reports CUDA error 716, "an illegal instruction was encountered", a segfault inside a TMA op, a kernel hang followed by an illegal instruction trap, or a crash that only fires on the first or last tile of a launch. Covers the pattern where a TMA store/load is issued at an offset entirely past a tensor's shape: TMA does NOT silently mask out-of-bounds tile accesses; it traps. The root cause is almost never a "missing in-kernel mask"; it is commonly a structural launcher / tile-mapping bug.

2026-04-23
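
The launcher / tile-mapping failure mode named in the tma-illegal-instruction entry can be illustrated with a host-side check. This is a minimal sketch in plain Python, not code from the triton repository: `find_out_of_bounds_tiles` and its parameters are hypothetical names, and it assumes a 1D launch in which program `pid` handles the tile starting at `pid * block`.

```python
def cdiv(n: int, d: int) -> int:
    """Ceiling division, the usual way to size a tile grid."""
    return (n + d - 1) // d


def find_out_of_bounds_tiles(extent: int, block: int, num_programs: int) -> list[int]:
    """Return program ids whose tile starts at or past `extent`.

    Hypothetical helper: assumes program `pid` covers
    [pid * block, (pid + 1) * block). A tile that only partially
    overhangs the extent is not reported; only tiles that lie
    entirely past the end are.
    """
    return [pid for pid in range(num_programs) if pid * block >= extent]


extent, block = 1000, 128

# A grid sized with cdiv has no tile that lies entirely out of bounds.
good_grid = cdiv(extent, block)
assert find_out_of_bounds_tiles(extent, block, good_grid) == []

# An over-sized grid (here padded to 12 programs) does.
bad_grid = 12
print(find_out_of_bounds_tiles(extent, block, bad_grid))  # [8, 9, 10, 11]
```

A check like this belongs in the launcher rather than the kernel, which matches the entry's point that the fix is usually structural rather than an added in-kernel mask.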
barrier-visualization
Software developer

Produce a structured barrier report for AutoWS (automatic warp specialization) IR. Use when the user wants to visualize, audit, or debug barrier usage across warp-specialized partitions, or when debugging a GPU kernel hang (deadlock). For hangs, first dump IR using the ir-debugging skill, then run this barrier analysis to identify mismatched arrive/wait counts, missing backward barriers, or other synchronization issues that cause deadlocks. Covers mbarriers, named barriers, tcgen05 commit, TMA-implicit arrives, Aref-based synchronization, and producer/consumer barrier patterns.

2026-04-13
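
To make "mismatched arrive/wait counts" from the barrier-visualization entry concrete, here is a toy model in plain Python. It is a sketch under stated assumptions, not the skill's actual analysis: it does not parse Triton IR, `find_starved_barriers` is a hypothetical name, and it assumes each wait consumes one completed phase and each phase needs a fixed number of arrivals.

```python
from collections import Counter


def find_starved_barriers(expected_arrivals, events):
    """Flag barriers that receive too few arrivals for their waits.

    expected_arrivals: {barrier name: arrivals needed to complete one phase}
    events: iterable of (barrier name, "arrive" | "wait") pairs
    Returns {barrier name: (arrivals seen, arrivals needed)} for every
    barrier where the arrivals seen cannot satisfy all the waits.
    """
    arrives, waits = Counter(), Counter()
    for barrier, kind in events:
        if kind == "arrive":
            arrives[barrier] += 1
        elif kind == "wait":
            waits[barrier] += 1
        else:
            raise ValueError(f"unknown event kind: {kind!r}")

    starved = {}
    for barrier, per_phase in expected_arrivals.items():
        needed = per_phase * waits[barrier]
        if arrives[barrier] < needed:
            starved[barrier] = (arrives[barrier], needed)
    return starved


# Hypothetical producer/consumer pair: "empty" expects two arrivals
# per phase but only one partition ever arrives on it.
expected = {"full": 1, "empty": 2}
events = [
    ("full", "arrive"), ("full", "wait"),
    ("empty", "arrive"), ("empty", "wait"),
]
print(find_starved_barriers(expected, events))  # {'empty': (1, 2)}
```

In this model a reported barrier corresponds to a wait that can never complete, which is the kind of imbalance that shows up as a kernel hang.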
ir-debugging
Computer programmer

Debug Triton compilation by dumping IR at each stage (TTIR, TTGIR, LLVM, PTX). Use when investigating compilation failures, kernel performance, register spills, or when user asks to inspect IR output. Covers TRITON_KERNEL_DUMP, MLIR_ENABLE_DUMP, LLVM_IR_ENABLE_DUMP, TRITON_DUMP_PTXAS_LOG, and related env vars.

2026-02-12
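
The environment variables listed in the ir-debugging entry are typically set before the Python process that compiles the kernel starts. The sketch below builds such an environment in plain Python. `ir_dump_env` is a hypothetical helper, the value `"1"` and the `TRITON_DUMP_DIR` variable are assumptions taken from upstream Triton's documented debugging options rather than from this listing, and the skill itself may recommend different settings.

```python
import os


def ir_dump_env(dump_dir, base_env=None):
    """Return a copy of the environment with IR dumping switched on.

    Hypothetical helper. The variable names come from the skill
    description, except TRITON_DUMP_DIR, which is assumed from
    upstream Triton's documentation.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.update({
        "TRITON_KERNEL_DUMP": "1",    # dump IR for each compiled kernel
        "MLIR_ENABLE_DUMP": "1",      # dump IR around MLIR passes
        "LLVM_IR_ENABLE_DUMP": "1",   # dump IR around LLVM passes
        "TRITON_DUMP_DIR": dump_dir,  # assumed: where kernel dumps are written
    })
    return env


env = ir_dump_env("/tmp/triton_dump", base_env={})
for name in sorted(env):
    print(f"{name}={env[name]}")
```

The resulting mapping can be passed as `env=` to `subprocess.run` when launching the script that triggers compilation, so the dump settings do not leak into the parent process.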
kernel-perf-testing
Computer programmer

Run TLX kernel performance benchmarks on Hopper and Blackwell GPUs. Use when user asks to benchmark, profile, or measure performance of any TLX kernel (GEMM, Flash Attention variants). Handles GPU selection, denoise wrapping, and version flags. Never run unless explicitly asked.

2026-02-12