| name | profiler-guide |
| description | Reference guide for Apple CPU and GPU profiling tools — workflows, recording traces, analysis patterns, and all available instruments |
| user-invocable | false |
Apple Profiler — Plugin Guide
Workflow
CPU profiling (.trace files)
- Record with
xctrace record (see Recording CPU Traces)
- Open with
profiler_open_trace to see metadata and available tables
- Analyze with
profiler_top_functions, profiler_cpu_samples, etc.
GPU profiling (.gputrace files)
- Capture with Metal environment variables or
MTLCaptureManager (see Capturing GPU Traces)
- Open with
profiler_gpu_open to see kernels, command buffers, encoders
- Analyze with
profiler_gpu_timeline, profiler_gpu_dependencies, profiler_gpu_counters
Choosing a Template
Use the template that matches what you're investigating:
| Template | Use when | Key tables |
|---|
| Time Profiler | General CPU hotspot analysis | time-profile (CPU samples with stack traces) |
| CPU Profiler | Detailed CPU profiling with cycle counts | cpu-profile (CPU samples with stack traces) |
| Metal System Trace | GPU performance, Metal API, shader execution | time-profile, metal-gpu-intervals, metal-driver-intervals, metal-application-intervals |
| Allocations | Memory allocation tracking, leaks | allocation tables |
| Leaks | Memory leak detection | leak tables |
| System Trace | Thread scheduling, syscalls, VMFault | system trace tables |
| Swift Concurrency | async/await, actors, task scheduling | Swift actor/task tables |
| SwiftUI | View body evaluation, update triggers | SwiftUI-specific tables |
| App Launch | Launch time analysis | launch-related tables |
| Animation Hitches | Frame drops, hitches, render performance | hitch/frame tables |
| Network | HTTP traffic, connection analysis | HTTP/network tables |
| Hangs | Main thread unresponsiveness | potential-hangs |
| Core ML | ML model inference performance | Core ML tables |
| Game Performance | Combined CPU/GPU/memory for games | Multiple game-related tables |
| Power Profiler | Energy impact, battery drain | power/energy tables |
| RealityKit Trace | AR/3D rendering performance | RealityKit tables |
Recording CPU Traces
Use xctrace record to create .trace files for CPU/system analysis:
xctrace record --template "Time Profiler" --time-limit 10s --launch -- ./my-app arg1 arg2
xctrace record --template "Time Profiler" --time-limit 10s --attach <pid-or-name>
xctrace record --template "Time Profiler" --time-limit 5s --all-processes
xctrace record --template "Metal System Trace" --time-limit 10s --launch -- ./my-metal-app
xctrace record --template "Time Profiler" --output /tmp/recording.trace --launch -- ./my-app
xctrace record --template "Time Profiler" --env METAL_DEVICE_WRAPPER_TYPE=1 --launch -- ./my-app
Profiling a specific code section: Use os_signpost to mark the region of interest, then filter the trace by time range:
import os
let log = OSLog(subsystem: "com.myapp", category: .pointsOfInterest)
os_signpost(.begin, log: log, name: "HotPath")
os_signpost(.end, log: log, name: "HotPath")
Then record with --instrument "os_signpost" alongside the profiler template, use profiler_signpost_intervals to find the time range, and scope profiler_cpu_samples/profiler_top_functions to that window.
Capturing GPU Traces
GPU traces (.gputrace bundles) capture the full Metal command stream — every dispatch, buffer binding, barrier, and optionally shader performance counters.
Method 1: Environment Variables (simplest for CLI tools)
Set MTL_CAPTURE_ENABLED=1 and METAL_CAPTURE_ENABLED=1 to capture the entire Metal workload:
MTL_CAPTURE_ENABLED=1 METAL_CAPTURE_ENABLED=1 ./my-metal-app
MTL_CAPTURE_ENABLED=1 METAL_CAPTURE_ENABLED=1 \
METAL_CAPTURE_OUTPUT_PATH=/tmp/my_capture.gputrace \
./my-metal-app
This captures from the first Metal command to program exit. Good for short-running compute workloads.
Method 2: MTLCaptureManager (profile specific code sections)
For profiling a specific section of execution, add capture bracketing in the Metal code:
import Metal
func profileSection(device: MTLDevice, commandQueue: MTLCommandQueue) {
let captureManager = MTLCaptureManager.shared()
let descriptor = MTLCaptureDescriptor()
descriptor.captureObject = device
descriptor.destination = .gpuTraceDocument
descriptor.outputURL = URL(fileURLWithPath: "/tmp/my_capture.gputrace")
do {
try captureManager.startCapture(with: descriptor)
} catch {
print("Capture failed to start: \(error)")
return
}
captureManager.stopCapture()
}
For C/Objective-C Metal code:
#import <Metal/Metal.h>
MTLCaptureManager *mgr = [MTLCaptureManager sharedCaptureManager];
MTLCaptureDescriptor *desc = [[MTLCaptureDescriptor alloc] init];
desc.captureObject = device;
desc.destination = MTLCaptureDestinationGPUTraceDocument;
desc.outputURL = [NSURL fileURLWithPath:@"/tmp/my_capture.gputrace"];
NSError *err;
[mgr startCaptureWithDescriptor:desc error:&err];
[mgr stopCapture];
Important: The output directory must exist and the .gputrace must not already exist (delete first if re-capturing).
Method 3: Xcode (interactive)
- Open the project in Xcode
- Run with the GPU Frame Capture button (camera icon in debug bar)
- Click "Capture GPU Workload" during execution
- Save the capture as a
.gputrace file
Getting shader performance counters
The profiler_gpu_counters tool requires shader profiling data (streamData) which is only generated when the GPU trace is replayed with profiling enabled in Xcode:
- Open the
.gputrace in Xcode
- Click the Replay button (with shader profiling enabled in settings)
- Xcode writes
streamData into the .gputrace bundle
- Now
profiler_gpu_counters can read occupancy, bandwidth, cache hit rates, etc.
Available Tools
CPU analysis (.trace files — Time Profiler, CPU Profiler, Metal System Trace)
profiler_open_trace — Open trace and return metadata + table list.
profiler_cpu_samples — Individual samples with full stack traces. Use start_time_ns/end_time_ns to scope to a time window.
profiler_top_functions — Aggregated view: which functions consumed the most CPU. Also supports time-range filtering.
Hang detection (requires Hangs instrument)
profiler_hangs — Lists unresponsiveness intervals with type, duration, and thread.
Signpost analysis (requires os_signpost instrument)
profiler_signpost_events — Raw begin/end/event entries. Filter by subsystem, category, name.
profiler_signpost_intervals — Matched begin+end pairs with durations.
Generic table access (works with any .trace)
profiler_list_tables — Discover all available schemas in the trace.
profiler_query_table — Query any table by schema name. Use this for Metal GPU tables, thermal data, disk I/O, or anything not covered by the specialized tools above.
Correlated CPU+GPU analysis (Metal System Trace .trace files)
profiler_correlated_timeline — Time-bucketed view showing CPU and GPU activity side by side. Auto-detects execution phases: CPU_BOUND, GPU_BOUND, BALANCED, PIPELINE_BUBBLE, IDLE. Returns per-bucket breakdown plus summary with bottleneck classification.
GPU trace analysis (.gputrace files)
profiler_gpu_open — Structural overview: kernels, command buffers, encoders, dispatch/barrier counts.
profiler_gpu_timeline — Detailed dispatch events with kernel names, threadgroup sizes, buffer bindings. Filter by kernel pattern, command buffer, or encoder.
profiler_gpu_dependencies — Buffer hazard DAG with critical path analysis. View at dispatch, encoder, kernel, or command buffer scale.
profiler_gpu_counters — Shader profiling counters (occupancy, bandwidth, cache rates). Summary stats or full time-series.
profiler_gpu_scheduling — Scheduling overhead analysis. Identifies inter-encoder gaps (GPU idle between encoder submissions) and dispatch fusion candidates (many small dispatches that could be combined). Returns prioritized recommendations with estimated savings. Requires shader profiling data (streamData).
profiler_gpu_export_perfetto — Export to .pftrace for visualization in ui.perfetto.dev.
Analysis Patterns
"Why is my app slow?"
- Record with Time Profiler:
xctrace record --template "Time Profiler" --time-limit 10s --launch -- ./app
profiler_open_trace to confirm recording
profiler_top_functions with n=20 to find hotspots
profiler_cpu_samples with limit=50 to see stack traces for the hot functions
"Why does the UI hang?"
- Record with Hangs template or include Hangs instrument
profiler_hangs to find hang intervals
- Use the hang's
start_ns and duration_ns to scope profiler_cpu_samples to that time window
"Is my Metal code efficient?" (system-level)
- Record with Metal System Trace:
xctrace record --template "Metal System Trace" --launch -- ./app
profiler_list_tables to see available Metal tables
profiler_query_table with metal-gpu-intervals for GPU execution
profiler_query_table with metal-driver-intervals for driver overhead
profiler_top_functions to see CPU-side costs
"What happened in this specific time window?"
- Use
profiler_cpu_samples (no limit) to find the time range of interest from timestamps
- Call
profiler_top_functions(start_time_ns=X, end_time_ns=Y) to see what was hot in that window
- Call
profiler_cpu_samples(start_time_ns=X, end_time_ns=Y, limit=30) for detailed stack traces
"Optimize my Metal compute kernels" (GPU trace)
- Capture a
.gputrace (see Capturing GPU Traces)
profiler_gpu_open to see kernel list, dispatch counts, command buffer structure
profiler_gpu_scheduling to find scheduling overhead — inter-encoder gaps and fusion candidates
profiler_gpu_dependencies at encoder or kernel scale to find the critical path
profiler_gpu_timeline with kernel_filter to inspect specific dispatches (threadgroup sizes, buffer bindings)
profiler_gpu_counters for shader performance metrics (requires Xcode replay)
profiler_gpu_export_perfetto to create a visual timeline for the user
"Which GPU dispatches could run in parallel?"
profiler_gpu_dependencies with scale=dispatch to see the full buffer dependency DAG
- Check the
critical_path_length vs total_dispatches — large gap means parallelism exists
- Look for
isolated_nodes — dispatches with no dependencies that could overlap with anything
- Use
profiler_gpu_dependencies with scale=kernel to see which kernel types have cross-dependencies
"Profile both CPU and GPU sides"
- Record with Metal System Trace:
xctrace record --template "Metal System Trace" --time-limit 10s --launch -- ./app
profiler_correlated_timeline on the .trace to see CPU and GPU activity side by side with auto-detected phases
- Check the
summary.bottleneck field — it tells you if the app is CPU_BOUND, GPU_BOUND, or BALANCED
- For GPU-bound workloads: capture a
.gputrace and use profiler_gpu_scheduling to find if encoder consolidation would help
- For CPU-bound workloads: use
profiler_top_functions to find the CPU hotspot (is it command encoding? shader compilation? data preparation?)
"Should I combine my GPU operations?"
- Capture a
.gputrace with shader profiling data (see Getting shader performance counters)
profiler_gpu_scheduling — check summary.inter_encoder_gap_pct. If >5%, combining encoders will help
- Check
encoder_gaps for specific gaps and whether they're COMBINABLE (shared kernels) or COMBINABLE_DIFFERENT_KERNELS (different work but no CPU readback needed)
- Check
fusion_candidates for sequences of many small dispatches that could be batched into fewer, larger ones
- Follow the
recommendations — they're prioritized by potential savings
All Available Instruments
These can be added to a recording with --instrument <name>:
Activity Monitor, Allocations, Audio Client, Audio Server, Audio Statistics,
CPU Counters, CPU Profiler, Core Animation Activity, Core Animation Commits,
Core Animation FPS, Core Animation Server, Core ML, Data Faults, Data Fetches,
Data Saves, Disk I/O Latency, Disk Usage, Display, Filesystem Activity,
Filesystem Suggestions, Foundation Models, Frame Lifetimes, GCD Performance,
GPU, HTTP Traffic, Hangs, Hitches, Leaks, Location Energy Model,
Metal Application, Metal GPU Counters, Metal Performance Overview,
Metal Resource Events, Network Connections, Neural Engine, Points of Interest,
Power Profiler, Processor Trace, RealityKit Frames, RealityKit Metrics,
Runloops, Sampler, SceneKit Application, Swift Actors, Swift Tasks, SwiftUI,
System Call Trace, System Load, Thermal State, Thread State Trace,
Time Profiler, VM Tracker, Virtual Memory Trace, dyld Activity,
os_log, os_signpost, stdout/stderr