com um clique
mobile-shader-optimization
// GPU architecture, precision types, fillrate, overdraw, baked lighting, and LOD optimization for Unity mobile/WebGL shaders
// GPU architecture, precision types, fillrate, overdraw, baked lighting, and LOD optimization for Unity mobile/WebGL shaders
Generate AI videos with Luma Dream Machine via AceDataCloud API. Use when creating videos from text prompts, generating videos from reference images, extending existing videos, or any video generation task with Luma. Supports text-to-video, image-to-video, and video extension.
Single-pass post-processing, URP Renderer Features, and mobile-safe screen effects for Unity
Node budgets, custom lighting, sub-graphs, precision control, and mobile workflows for Unity Shader Graph
Channel packing, variant reduction, shader_feature vs multi_compile, and build size optimization for Unity
Mobile-optimized water shaders with depth coloring, foam, Gerstner waves, refraction, and caustics for Unity URP
WebGL 1.0/2.0 shader restrictions, variant stripping, loop/indexing rules, and browser deployment workarounds for Unity
| name | mobile-shader-optimization |
| description | GPU architecture, precision types, fillrate, overdraw, baked lighting, and LOD optimization for Unity mobile/WebGL shaders |
Use this skill when the user is writing, reviewing, or optimizing Unity shaders for mobile platforms (Android, iOS) or any GPU-constrained environment (Meta Quest VR, low-end devices). Triggers include: mentions of "mobile", "Android", "iOS", "Mali", "Adreno", "PowerVR", "performance", "optimization", "fillrate", "overdraw", "bandwidth", "frame budget", "half precision", "LOD", "baked lighting", or any shader work targeting sub-60fps scenarios. Also trigger when the user is profiling shaders, asking about GPU architecture, or comparing Simple Lit vs Complex Lit in URP.
These rules must always be followed when writing mobile shaders:
Use half precision by default. Only use float for world-space positions, UV coordinates requiring high precision, and depth calculations. On Mali and Adreno GPUs, half triggers 16-bit fast paths that are 2x faster.
Never use discard (clip/alpha-test) on PowerVR or Apple GPUs. It breaks Early-Z/Hidden Surface Removal, forcing the GPU to process fragments it would otherwise skip. Use alpha blending instead, or move discard to a separate pass.
Keep Shader Graph node count under 100 for mobile. Each node compiles to shader instructions. Exceeding ~100 nodes risks exceeding mobile GPU instruction limits and causes performance cliffs.
Never use while or do-while loops in shaders targeting OpenGL ES 2.0 / WebGL 1.0. Only counting for loops are allowed. Even on ES 3.0+, avoid dynamic loop bounds — they prevent the compiler from unrolling.
Avoid dynamic branching on mobile. Mobile GPUs often execute both branches regardless. Use step(), lerp(), saturate() for branchless alternatives.
Multiply scalars before vectors. float3 result = scalar1 * scalar2 * vector is faster than float3 result = vector * scalar1 * scalar2 because the first scalar multiply is 1 ALU op instead of 3.
Limit texture samples. Each sample costs bandwidth. Target ≤4 texture samples per fragment on low-end mobile. Use channel packing to combine maps (see texture-packing-variant-stripping skill).
Use noforwardadd surface shader directive to skip additional per-pixel light passes. Each additional light pass doubles overdraw.
Understanding tile-based rendering is essential for mobile shader optimization.
Mobile GPUs split the screen into small tiles (16×16 for Mali, variable for Adreno) and process each tile entirely on-chip before writing to main memory. This means:
half hardware.half is genuinely 16-bit and 2x throughput. float is 32-bit. No fixed type.// HLSL/ShaderLab precision mapping to GLSL:
// float → highp (32-bit) — positions, UVs, depth
// half → mediump (16-bit) — colors, normals, most calculations
// fixed → lowp (11-bit) — simple color ops only (deprecated on many GPUs)
// USE half FOR:
half4 color = tex2D(_MainTex, uv); // Texture samples
half3 normal = normalize(i.worldNormal); // Normals
half NdotL = saturate(dot(normal, lightDir)); // Lighting
half fresnel = pow(1.0h - NdotL, 4.0h); // Fresnel
// USE float FOR:
float3 worldPos = mul(unity_ObjectToWorld, v.vertex).xyz; // World position
float2 uv = TRANSFORM_TEX(v.texcoord, _MainTex); // UV transforms
float depth = Linear01Depth(rawDepth); // Depth
On fillrate-bound mobile GPUs, vertex count is orders of magnitude lower than fragment count. Move calculations to the vertex shader whenever visually acceptable:
// BAD — per-pixel calculation (runs millions of times)
half4 frag(v2f i) : SV_Target {
half rim = 1.0h - saturate(dot(normalize(i.viewDir), normalize(i.worldNormal)));
half rimPower = pow(rim, _RimPower);
return _RimColor * rimPower;
}
// GOOD — per-vertex calculation (runs thousands of times)
v2f vert(appdata v) {
v2f o;
o.vertex = TransformObjectToHClip(v.vertex.xyz);
float3 viewDir = normalize(GetWorldSpaceViewDir(TransformObjectToWorld(v.vertex.xyz)));
float3 worldNormal = TransformObjectToWorldNormal(v.normal);
o.rimFactor = pow(1.0 - saturate(dot(viewDir, worldNormal)), _RimPower);
return o;
}
half4 frag(v2f i) : SV_Target {
return _RimColor * i.rimFactor; // Just interpolated value
}
Use LOD tiers to serve different shader complexity per device capability:
Shader "Game/Character" {
// LOD 300: Full quality (desktop, high-end mobile)
SubShader {
LOD 300
Tags { "RenderType"="Opaque" "RenderPipeline"="UniversalPipeline" }
Pass {
// Normal map + specular + rim + emission
}
}
// LOD 150: Medium quality (mid-range mobile)
SubShader {
LOD 150
Tags { "RenderType"="Opaque" "RenderPipeline"="UniversalPipeline" }
Pass {
// Diffuse + simple lighting, no normal map
}
}
// LOD 100: Minimum quality (low-end mobile)
SubShader {
LOD 100
Tags { "RenderType"="Opaque" "RenderPipeline"="UniversalPipeline" }
Pass {
// Vertex-lit only, single texture
}
}
Fallback "Universal Render Pipeline/Simple Lit"
}
Set at runtime based on device:
// In a device detection script:
if (SystemInfo.graphicsMemorySize < 1024)
Shader.globalMaximumLOD = 100;
else if (SystemInfo.graphicsMemorySize < 2048)
Shader.globalMaximumLOD = 150;
else
Shader.globalMaximumLOD = 300;
Built-in LOD values: Mobile/VertexLit = 100, Mobile/Diffuse = 150, Mobile/Bumped = 200.
Fully baked lighting eliminates real-time shadow maps and reduces fragment shaders to a single lightmap fetch. This is often the difference between 30 and 60 FPS on low-end devices.
Configure the URP Asset for mobile:
mobile-post-processing skill).When reviewing a mobile shader, verify:
half precisionfloatdiscard/clip on iOS/PowerVR targets (or isolated in separate pass)while/do-while loopsnoforwardadd used on surface shaders (or additional lights disabled in URP)