webgpu-impl-compute-usecases

Name: Webgpu Impl Compute Usecases
Author: Impertio-Studio

// Use when building WebGPU compute workloads: image processing, particle systems, physics simulation, or reduction and prefix-sum. Prevents data races and stale-read bugs in multi-pass compute pipelines. Covers image processing, particle systems, physics simulation, reduction and scan patterns, and workgroup-shared-memory tiling. Keywords: compute use case, image processing, blur, particle system, physics simulation, reduction, prefix sum, scan, workgroup shared memory, storage texture, ping-pong, how do I do GPU compute, GPGPU.


updated:May 20, 2026 at 00:41

package.json

"author": "Impertio-Studio"

"repository": "Impertio-Studio/WebGPU-Claude-Skill-Package"


name	webgpu-impl-compute-usecases
description	Use when building WebGPU compute workloads: image processing, particle systems, physics simulation, or reduction and prefix-sum. Prevents data races and stale-read bugs in multi-pass compute pipelines. Covers image processing, particle systems, physics simulation, reduction and scan patterns, and workgroup-shared-memory tiling. Keywords: compute use case, image processing, blur, particle system, physics simulation, reduction, prefix sum, scan, workgroup shared memory, storage texture, ping-pong, how do I do GPU compute, GPGPU.
license	MIT
compatibility	Designed for Claude Code. Requires WebGPU 1.0-stable.
metadata	{"author":"OpenAEC-Foundation","version":"1.0"}

WebGPU Compute Use Cases

Map four common GPGPU workloads (image processing, particle systems, physics simulation, and reduction or prefix-sum) onto the WebGPU compute pipeline with the correct resources, buffering, and synchronization.

Quick Reference

WebGPU 1.0-stable (Chrome 113+, Safari 26+, Firefox 141+).

Use case	Input resource	Output resource	Buffering	Synchronization
Image processing	Sampled or storage texture	Storage texture (`STORAGE_BINDING`)	Separate input and output textures (ping-pong for multi-pass)	`workgroupBarrier` after tile load
Particle system	Storage buffer (`STORAGE`)	Same storage buffer	Single buffer, integrate in place	None within a workgroup if no shared state
Physics simulation	Storage buffer A (`read`)	Storage buffer B (`read_write`)	Double-buffer, swap each step	One dispatch per step; `workgroupBarrier` / `storageBarrier` sync only within a workgroup
Reduction / scan	Storage buffer	Partials buffer, then final	Multi-pass, one buffer per level	`workgroupBarrier` between tree steps

Rules that ALWAYS hold:

A storage texture used as a compute output MUST be created with GPUTextureUsage.STORAGE_BINDING and a storage-capable format (rgba8unorm, rgba16float, r32float, and similar). NEVER use rgba8unorm-srgb as a storage texture format.
Writes to a WGSL var<workgroup> array and the reads of those values by other invocations MUST be separated by workgroupBarrier().
Physics state MUST be double-buffered. NEVER read a neighbour's state from the same buffer that the pass writes.
dispatchWorkgroups(x, y, z) arguments are workgroup COUNTS, not invocation counts. For N items at @workgroup_size(64), dispatch Math.ceil(N / 64).
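Reading a compute-written storage buffer on the CPU requires copying it into a COPY_DST | MAP_READ staging buffer and awaiting mapAsync() on that staging buffer. The promise resolves only after the previously submitted work that uses the buffer has completed. NEVER call getMappedRange() before that promise resolves. Awaiting device.queue.onSubmittedWorkDone() first is allowed but adds nothing for correctness.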
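The workgroup-count rule above can be captured in a small helper. This is a minimal sketch: `workgroupCount` and `workgroupCount2D` are hypothetical names introduced here, not part of the WebGPU API.

```javascript
// Hypothetical helpers: convert item counts into workgroup counts.
function workgroupCount(itemCount, workgroupSize) {
  if (!Number.isInteger(itemCount) || itemCount < 0) {
    throw new RangeError("itemCount must be a non-negative integer");
  }
  if (!Number.isInteger(workgroupSize) || workgroupSize <= 0) {
    throw new RangeError("workgroupSize must be a positive integer");
  }
  // Round up so a partially filled last workgroup is still dispatched.
  return Math.ceil(itemCount / workgroupSize);
}

// 2D variant for image kernels, e.g. @workgroup_size(8, 8).
function workgroupCount2D(width, height, sizeX, sizeY) {
  return [workgroupCount(width, sizeX), workgroupCount(height, sizeY)];
}
```

A compute pass would then call something like `pass.dispatchWorkgroups(workgroupCount(count, 64))`. Because the count is rounded up, the shader still needs its own bounds check for the surplus invocations.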

Decision Tree

What is the compute workload?
├─ Per-pixel image transform (blur, convolution, color grading)
│  └─ Input texture + output STORAGE texture. One workgroup per pixel tile.
│     Cache the tile + halo in var<workgroup>, workgroupBarrier, then write.
│
├─ Many independent agents updated each frame (particles)
│  └─ Particle state in one STORAGE buffer. Compute pass integrates
│     position += velocity * dt. Render pass draws the SAME buffer via
│     @builtin(instance_index). GPU-decided count → dispatchWorkgroupsIndirect.
│
├─ Agents that read each other's state (physics, n-body, cloth)
│  └─ Double-buffer: read state A, write state B, swap buffers each step.
│     A single buffer creates a read-write hazard and nondeterministic results.
│
└─ Collapse an array to one value, or compute a running total (scan)
   └─ Multi-pass tree reduction. Each workgroup reduces a chunk in
      var<workgroup>, writes one partial. A second dispatch reduces the
      partials. Chrome 134+ subgroup builtins accelerate the inner step.

The compute pipeline object and the WGSL @compute shader mechanics are NOT taught here. See webgpu-syntax-compute-pipeline and webgpu-wgsl-compute-shaders.

Core Patterns

Pattern 1: ALWAYS output image results to a storage texture

For per-pixel image processing, bind the source as a sampled or storage texture and write the result into a SEPARATE storage texture. NEVER write into the same texture you are reading, because read-write ordering across invocations is undefined.

@group(0) @binding(0) var src: texture_2d<f32>;
@group(0) @binding(1) var dst: texture_storage_2d<rgba8unorm, write>;

@compute @workgroup_size(8, 8)
fn main(@builtin(global_invocation_id) gid: vec3u) {
  let dims = textureDimensions(src);
  if (gid.x >= dims.x || gid.y >= dims.y) { return; } // ALWAYS bounds-check
  let c = textureLoad(src, vec2i(gid.xy), 0);
  textureStore(dst, vec2i(gid.xy), c);
}

The output texture descriptor MUST include STORAGE_BINDING and a storage-capable format. The bind group layout entry uses storageTexture: { access: "write-only", format: "rgba8unorm" }.
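On the host side, the descriptor and layout entry described above might be built as follows. This is a sketch under stated assumptions: both function names are hypothetical, the format allowlist is deliberately non-exhaustive, and TEXTURE_BINDING is added only on the assumption that a later pass samples the result.

```javascript
// Assumption: a small, non-exhaustive allowlist of storage-capable formats.
const STORAGE_FORMATS = new Set(["rgba8unorm", "rgba16float", "r32float", "rgba32float"]);

// Hypothetical helper: descriptor for a compute output texture.
function storageOutputTextureDescriptor(width, height, format = "rgba8unorm") {
  if (format.endsWith("-srgb") || !STORAGE_FORMATS.has(format)) {
    throw new TypeError(`${format} is not in the storage-capable allowlist`);
  }
  return {
    label: "image-processing output",
    size: [width, height],
    format,
    // STORAGE_BINDING is mandatory; TEXTURE_BINDING is an assumption so a
    // later pass can sample the result.
    usage: GPUTextureUsage.STORAGE_BINDING | GPUTextureUsage.TEXTURE_BINDING,
  };
}

// Hypothetical helper: matching bind group layout entry for the compute stage.
function storageOutputLayoutEntry(binding, format = "rgba8unorm") {
  return {
    binding,
    visibility: GPUShaderStage.COMPUTE,
    storageTexture: { access: "write-only", format },
  };
}
```

The descriptor is meant to be passed to `device.createTexture(...)`, and the layout entry to the `entries` array of `device.createBindGroupLayout(...)`. The format in the layout entry must match the format in the WGSL `texture_storage_2d` declaration.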

Pattern 2: ALWAYS cache a tile in var<workgroup> for neighbour-reading kernels

A blur or convolution reads neighbouring pixels. Caching the tile plus a halo in var<workgroup> shared memory cuts texture reads from kernelSize per pixel to roughly one (slightly more, because the halo texels are loaded too). ALWAYS place a workgroupBarrier() between the load phase and the compute phase.

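@group(0) @binding(0) var src: texture_2d<f32>;
@group(0) @binding(1) var dst: texture_storage_2d<rgba8unorm, write>;
var<workgroup> tile: array<vec4f, 100>; // 8x8 tile + 1px halo = 10x10

@compute @workgroup_size(8, 8)
fn blur(@builtin(local_invocation_id) lid: vec3u,
        @builtin(local_invocation_index) li: u32,
        @builtin(workgroup_id) wid: vec3u) {
  let maxCoord = vec2i(textureDimensions(src)) - 1;
  let origin = vec2i(wid.xy) * 8 - 1; // top-left texel of the tile, halo included
  for (var i = li; i < 100u; i += 64u) { // 64 invocations cooperatively load 100 texels
    let p = origin + vec2i(i32(i % 10u), i32(i / 10u));
    tile[i] = textureLoad(src, clamp(p, vec2i(0), maxCoord), 0); // clamp at image edges
  }
  workgroupBarrier(); // MANDATORY: all loads complete before any read
  // now read tile[(lid.y + dy) * 10u + lid.x + dx] with dx, dy in 0..2 for a 3x3 kernel; race-free
}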
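One way to fill a 10x10 tile with only 64 invocations is a strided loop, in which each invocation loads slot `li`, then `li + 64`, and so on. The index arithmetic is easy to get wrong, so it can help to check it on the CPU first. The sketch below is a plain JavaScript model, not GPU code; `tileLoadPlan` is a hypothetical name.

```javascript
// CPU model of a strided cooperative load into var<workgroup> memory.
// Returns, for each local invocation index, the tile slots it would load.
function tileLoadPlan(tile = 8, halo = 1) {
  const side = tile + 2 * halo;    // 10 for an 8x8 tile with a 1px halo
  const slots = side * side;       // 100 entries in the shared array
  const invocations = tile * tile; // 64 invocations per workgroup
  const plan = [];
  for (let li = 0; li < invocations; li++) {
    const mine = [];
    for (let i = li; i < slots; i += invocations) {
      mine.push({
        slot: i,
        dx: (i % side) - halo,           // texel offset from the tile's first interior column
        dy: Math.floor(i / side) - halo, // texel offset from the tile's first interior row
      });
    }
    plan.push(mine);
  }
  return plan;
}
```

With the default sizes, invocations 0 to 35 load two texels each and the rest load one, which covers all 100 slots exactly once. Only after every invocation has passed the barrier is it safe to read a neighbour's slot.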

Pattern 3: ALWAYS integrate particles in place, draw the same buffer

Particle state lives in ONE storage buffer with STORAGE usage. The compute pass integrates each particle independently. The render pass draws that exact buffer via @builtin(instance_index). NEVER copy the buffer between the two passes.

const particles = device.createBuffer({
  size: count * 32, // e.g. pos:vec3 + pad + vel:vec3 + pad
  usage: GPUBufferUsage.STORAGE | GPUBufferUsage.VERTEX | GPUBufferUsage.COPY_DST,
});
const enc = device.createCommandEncoder();
const cp = enc.beginComputePass();
cp.setPipeline(integratePipeline);
cp.setBindGroup(0, simBindGroup);
cp.dispatchWorkgroups(Math.ceil(count / 64));
cp.end();
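const rp = enc.beginRenderPass(renderPassDesc);
rp.setPipeline(drawPipeline);
rp.setBindGroup(0, drawBindGroup); // assumed name: exposes `particles` to the vertex stage as read-only storage
rp.draw(6, count); // 6 vertices per quad, instanceCount = particle count
rp.end();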
device.queue.submit([enc.finish()]); // encoder orders compute before render

The command encoder establishes ordering between the compute and render pass within one queue.submit. NO barrier or readback is needed between them.
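The 32-byte stride used for the particle buffer follows from WGSL alignment: each vec3f is aligned to 16 bytes, so position and velocity each occupy four floats. Initial data can be packed on the CPU to match. The sketch below assumes that layout; `packParticles` and `integrateReference` are hypothetical helpers, and the reference integrator exists only to check GPU output against.

```javascript
const FLOATS_PER_PARTICLE = 8; // pos.xyz, pad, vel.xyz, pad => 32 bytes

// Pack [{ pos: [x, y, z], vel: [x, y, z] }, ...] into the GPU layout.
function packParticles(particles) {
  const data = new Float32Array(particles.length * FLOATS_PER_PARTICLE);
  particles.forEach((p, i) => {
    data.set(p.pos, i * FLOATS_PER_PARTICLE);     // floats 0..2, float 3 is padding
    data.set(p.vel, i * FLOATS_PER_PARTICLE + 4); // floats 4..6, float 7 is padding
  });
  return data;
}

// CPU reference for position += velocity * dt, leaving the input untouched.
function integrateReference(data, dt) {
  const out = data.slice();
  for (let base = 0; base < out.length; base += FLOATS_PER_PARTICLE) {
    for (let axis = 0; axis < 3; axis++) {
      out[base + axis] += out[base + 4 + axis] * dt;
    }
  }
  return out;
}
```

The packed array can be uploaded with `device.queue.writeBuffer(particles, 0, packParticles(initial))`, which is why the buffer carries COPY_DST usage.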

Pattern 4: ALWAYS double-buffer physics state

When an invocation reads a neighbour's state, read from buffer A and write to buffer B, then swap the bindings for the next step. NEVER mutate one buffer in place when invocations read each other.

let read = stateBufferA, write = stateBufferB;
function step() {
  const enc = device.createCommandEncoder();
  const cp = enc.beginComputePass();
  cp.setPipeline(physicsPipeline);
  cp.setBindGroup(0, makeBindGroup(read, write)); // read A, write B
  cp.dispatchWorkgroups(Math.ceil(count / 64));
  cp.end();
  device.queue.submit([enc.finish()]);
  [read, write] = [write, read]; // swap for next step
}

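WGSL declares the bindings as var<storage, read> and var<storage, read_write>. Both workgroupBarrier() and storageBarrier() synchronize only the invocations of ONE workgroup (for workgroup memory and storage memory respectively). Nothing synchronizes different workgroups inside a dispatch, which is why state shared across workgroups needs a second buffer and a separate dispatch per step.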
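Creating a bind group on every step works, but the two directions of the swap can also be built once and alternated. The sketch below assumes binding 0 is the read-only state and binding 1 is the read_write state; `createPingPongBindGroups` and `bindGroupForStep` are hypothetical helpers, not part of the source's `makeBindGroup`.

```javascript
// Build both swap directions up front: [A -> B, B -> A].
function createPingPongBindGroups(device, layout, bufferA, bufferB) {
  const make = (src, dst, label) =>
    device.createBindGroup({
      label,
      layout,
      entries: [
        { binding: 0, resource: { buffer: src } }, // var<storage, read>
        { binding: 1, resource: { buffer: dst } }, // var<storage, read_write>
      ],
    });
  return [
    make(bufferA, bufferB, "physics A->B"),
    make(bufferB, bufferA, "physics B->A"),
  ];
}

// Even steps read A and write B; odd steps read B and write A.
function bindGroupForStep(bindGroups, stepIndex) {
  return bindGroups[stepIndex % 2];
}
```

In the step function this would replace the per-step bind group creation with `cp.setBindGroup(0, bindGroupForStep(groups, stepIndex))`. After step k the newest state is in buffer B when k is even and in buffer A when k is odd.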

Pattern 5: ALWAYS reduce in multiple passes with a partials buffer

A reduction or prefix-sum over a large array does NOT fit in one workgroup. Each workgroup reduces a chunk in var<workgroup> memory and writes ONE partial; a second dispatch reduces the partials. ALWAYS call workgroupBarrier() between the steps of the in-workgroup tree.

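// Binding indices are illustrative; match them to your bind group layout.
@group(0) @binding(0) var<storage, read> input: array<f32>;
@group(0) @binding(1) var<storage, read_write> partials: array<f32>;
var<workgroup> scratch: array<f32, 64>;

@compute @workgroup_size(64)
fn reduce(@builtin(local_invocation_id) lid: vec3u,
          @builtin(workgroup_id) wid: vec3u) {
  let i = wid.x * 64u + lid.x;
  var v = 0.0; // additive identity pads the last, partially filled chunk
  if (i < arrayLength(&input)) { v = input[i]; }
  scratch[lid.x] = v;
  workgroupBarrier();
  for (var s = 32u; s > 0u; s = s >> 1u) {
    if (lid.x < s) { scratch[lid.x] += scratch[lid.x + s]; }
    workgroupBarrier(); // MANDATORY every tree step
  }
  if (lid.x == 0u) { partials[wid.x] = scratch[0]; }
}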
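How many dispatches a reduction needs follows directly from the chunk size: each pass shrinks the array by a factor of the workgroup size until one value remains. The sketch below plans the passes and models the in-workgroup tree on the CPU. Both function names are hypothetical, and `treeReduce` assumes the chunk length is a power of two.

```javascript
// Workgroups to dispatch per pass; each pass produces that many partials.
function reductionPassSizes(itemCount, workgroupSize = 64) {
  const passes = [];
  let remaining = itemCount;
  while (remaining > 1) {
    remaining = Math.ceil(remaining / workgroupSize);
    passes.push(remaining);
  }
  return passes;
}

// CPU model of the tree inside one workgroup (f32 storage, like the shader).
function treeReduce(chunk) {
  const scratch = Float32Array.from(chunk);
  for (let s = scratch.length >> 1; s > 0; s >>= 1) {
    for (let lid = 0; lid < s; lid++) {
      scratch[lid] += scratch[lid + s];
    }
    // On the GPU, workgroupBarrier() separates each iteration of the outer loop.
  }
  return scratch[0];
}
```

For one million items at a chunk size of 64, the plan is four passes of 15625, 245, 4, and 1 workgroups. The partials buffer for each level needs one f32 per workgroup of that level.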

Pattern 6: ALWAYS feature-detect subgroup builtins

subgroupAdd, subgroupInclusiveAdd, and subgroupExclusiveAdd accelerate reduction and scan but require the subgroups feature (Chrome 134+). NEVER emit them unconditionally.

const adapter = await navigator.gpu.requestAdapter();
if (!adapter) throw new Error("No WebGPU adapter available"); // requestAdapter() can resolve to null
const hasSubgroups = adapter.features.has("subgroups");
const device = await adapter.requestDevice({
  requiredFeatures: hasSubgroups ? ["subgroups"] : [],
});
// Select the subgroup-accelerated WGSL only when hasSubgroups is true.

A subgroup-accelerated shader MUST begin with enable subgroups; and that shader MUST only be compiled when the feature was granted.
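The selection step can be kept in two small pure functions so that it is testable without a GPU. This is a sketch: `pickOptionalFeatures` and `selectReduceShader` are hypothetical names, and `sources` is an assumed object holding two WGSL strings that you supply.

```javascript
// Keep only the optional features the adapter actually offers.
function pickOptionalFeatures(adapterFeatures, wanted) {
  return wanted.filter((name) => adapterFeatures.has(name));
}

// Choose the WGSL source based on what the DEVICE was granted.
function selectReduceShader(deviceFeatures, sources) {
  if (!deviceFeatures.has("subgroups")) {
    return sources.portable;
  }
  if (!sources.subgroup.trimStart().startsWith("enable subgroups;")) {
    throw new Error("subgroup shader must begin with `enable subgroups;`");
  }
  return sources.subgroup;
}
```

A caller would pass `pickOptionalFeatures(adapter.features, ["subgroups"])` as `requiredFeatures`, then hand `selectReduceShader(device.features, sources)` to `device.createShaderModule({ code })`. Checking the device rather than the adapter guards against a feature that was available but never requested.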

Common Anti-Patterns

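Reading back a compute-written buffer without a staging copy and an awaited mapAsync. WHY it fails: a STORAGE buffer cannot carry MAP_READ, and getMappedRange() is only valid once the mapAsync() promise has resolved; calling it earlier throws instead of returning data. The encoder orders passes on the GPU timeline, while the CPU only learns of completion through a promise. Fix: copy into a COPY_DST | MAP_READ staging buffer in the same submission, then await staging.mapAsync(GPUMapMode.READ), which resolves after the submitted work that uses the buffer has finished. An extra await device.queue.onSubmittedWorkDone() is harmless but not required.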
Mutating one physics state buffer in place. WHY it fails: invocation order across a dispatch is unspecified. If particle i reads particle j's position while particle j is being written, the result depends on scheduling and is nondeterministic. Fix: double-buffer, read A and write B, swap each step.
Omitting workgroupBarrier() in a tiled kernel or tree reduction. WHY it fails: var<workgroup> memory is shared, but invocations run concurrently. Reading a slot another invocation has not written yet is a data race that yields garbage. Fix: place workgroupBarrier() between every write phase and the read phase that follows it.

Critical Warnings

NEVER assume subgroup builtins exist. subgroupAdd / subgroupInclusiveAdd / subgroupExclusiveAdd require the subgroups feature and an enable subgroups; directive. Compiling them without the feature is a shader-creation error.
NEVER use an -srgb format for a storage texture. Storage textures require a non-srgb storage-capable format.
NEVER call workgroupBarrier() inside divergent (non-uniform) control flow. All invocations of the workgroup MUST reach the same barrier; WGSL's uniformity analysis rejects a barrier inside an if that varies per invocation as a shader-creation error.
NEVER pass invocation counts to dispatchWorkgroups. The arguments are workgroup counts. For N items at @workgroup_size(64), dispatch Math.ceil(N / 64).
NEVER read and write the same storage texture in one compute pass. Use two textures or ping-pong.
NEVER map a STORAGE-usage buffer directly. Copy it into a separate COPY_DST | MAP_READ staging buffer with copyBufferToBuffer first.
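The staging-buffer readback from the last warning can be sketched as three small steps. The function names are hypothetical, and the sketch assumes the source storage buffer was created with COPY_SRC in addition to STORAGE, which copyBufferToBuffer requires of its source.

```javascript
// Step 1: a mappable staging buffer. MAP_READ may only be combined with COPY_DST.
function createReadbackStaging(device, byteLength) {
  return device.createBuffer({
    label: "readback staging",
    size: byteLength,
    usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ,
  });
}

// Step 2: record the copy in the SAME encoder, after the compute pass has ended.
function encodeReadbackCopy(encoder, storageBuffer, staging, byteLength) {
  if (byteLength % 4 !== 0) {
    throw new RangeError("copy size must be a multiple of 4 bytes");
  }
  encoder.copyBufferToBuffer(storageBuffer, 0, staging, 0, byteLength);
}

// Step 3: after queue.submit, wait for the map and copy the data out.
async function readMappedFloats(staging) {
  await staging.mapAsync(GPUMapMode.READ);
  const floats = new Float32Array(staging.getMappedRange().slice(0)); // copy before unmap
  staging.unmap();
  return floats;
}
```

The call order is create, encode, `device.queue.submit([encoder.finish()])`, then `await readMappedFloats(staging)`. The mapped ArrayBuffer is detached by `unmap()`, which is why the data is copied first.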

Reference Files

references/methods.md : per-use-case API and resource recipe for image processing, particle systems, physics simulation, and reduction or scan, including buffer usage flags, texture formats, bind group layout entries, and dispatch sizing.
references/examples.md : verified working code for an image-processing compute pass, a particle update pass, and a two-level reduction.
references/anti-patterns.md : mistakes with WHY-it-fails analysis.

Related skills: webgpu-syntax-compute-pipeline (creating the compute pipeline and compute pass), webgpu-wgsl-compute-shaders (@compute, @workgroup_size, builtins, barriers), webgpu-impl-instancing-indirect (dispatchWorkgroupsIndirect, drawing a particle buffer instanced), webgpu-impl-buffer-upload (uploading initial particle and input data).
