| name | profiling-api |
| description | Add profiling zones, metrics, and annotations to Kit-based C++ and Python code. Covers Carbonite macros (CARB_PROFILE_ZONE, CARB_PROFILE_FUNCTION, GPU zones), Python profiler API (decorators, begin/end), profiler masks, channels, Tracy plot data, event annotations, and automatic Kit Python function capture (CARB_PROFILING_PYTHON). Use when a developer asks how to add profiling spans to Kit/Carbonite code, configure masks/channels, record custom Tracy plots, or annotate traces with event markers. NOT for capturing traces (use profiling), analyzing traces (use nsys-analyze), or non-Kit Python function tracing (use nvtx-python). |
Profiling API — Instrumenting Kit-Based Code
How to add profiling zones, metrics, and annotations to C++ and Python code in the Carbonite/Kit ecosystem.
For capturing traces, see the profiling skill. For analyzing them, see nsys-analyze.
C++ Profiling Macros
Source: carb/profiler/Profile.h
Scope-Based Zone (most common)
#include <carb/profiler/Profile.h>
constexpr uint64_t kProfilerMask = 1;
void myFunction() {
CARB_PROFILE_ZONE(kProfilerMask, "My C++ function");
doHeavyWork();
}
Parameters: (maskOrChannel, zoneName, ...variadic_args)
- No variadic args → ProfileZoneStatic (pre-registered, faster)
- With variadic args → ProfileZoneDynamic (printf formatting)
Auto Function Name
void myFunction() {
CARB_PROFILE_FUNCTION(kProfilerMask);
}
Manual Begin/End
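auto zoneId = CARB_PROFILE_BEGIN(kProfilerMask, "Manual zone");
doHeavyWork(); // the work being measured
CARB_PROFILE_END(kProfilerMask, zoneId);
Prefer the RAII style (CARB_PROFILE_ZONE) over manual begin/end: a scoped zone closes on every exit path, including early returns and exceptions, so a begin can never be left without its matching end.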
GPU Zones
Kit's RTX renderer uses query-based GPU zone capture:
auto gpuCtx = CARB_PROFILE_CREATE_GPU_CONTEXT("Vulkan GPU", cpuTs, gpuTs, gpuPeriod, "vulkan");
CARB_PROFILE_GPU_QUERY_BEGIN(kProfilerMask, gpuCtx, queryId, "RTX Render Pass");
CARB_PROFILE_GPU_QUERY_END(kProfilerMask, gpuCtx, queryId);
CARB_PROFILE_GPU_SET_QUERY_VALUE(kProfilerMask, gpuCtx, queryId, gpuTimestamp);
Enable GPU zones in Tracy:
--/profiler/gpu/tracyInject/enabled=true
--/rtx/addTileGpuAnnotations=true
Python Profiling API
Decorator (simplest)
import carb.profiler
@carb.profiler.profile
def my_function():
do_something()
Manual begin/end
carb.profiler.begin(1, "My Python operation")
carb.profiler.end(1)
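A begin/end pair is easy to unbalance if the code between the two calls raises. The following is a minimal sketch of a context manager that pairs them, assuming only the begin(mask, name) and end(mask) calls shown above. The name profile_zone is hypothetical and not part of carb.profiler; the profiler object is passed in so that the helper can be exercised outside Kit.

```python
from contextlib import contextmanager


@contextmanager
def profile_zone(profiler, name, mask=1):
    """Open a zone on entry and close it on exit, even if the body raises.

    `profiler` is any object exposing begin(mask, name) and end(mask),
    which is the shape this document shows for carb.profiler.
    `profile_zone` itself is a hypothetical helper, not a Carbonite API.
    """
    profiler.begin(mask, name)
    try:
        yield
    finally:
        # Runs on normal exit and on exceptions, keeping begin/end balanced.
        profiler.end(mask)
```

Inside Kit, this would presumably be used as `with profile_zone(carb.profiler, "My Python operation"):`, passing either the module or the object returned by acquire_profiler_interface().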
Full IProfiler Interface
profiler = carb.profiler.acquire_profiler_interface()
profiler.begin(mask, name)
profiler.end(mask)
profiler.set_capture_mask(mask) -> int
profiler.get_capture_mask() -> int
profiler.value_float(mask, value, name)
profiler.value_int(mask, value, name)
profiler.value_uint(mask, value, name)
profiler.instant(mask, type, name)
profiler.flow(mask, type, id, name)
profiler.frame(mask, name)
profiler.set_python_profiling_enabled(bool)
profiler.is_python_profiling_enabled() -> bool
Types:
carb.profiler.InstantType.THREAD
carb.profiler.InstantType.PROCESS
carb.profiler.FlowType.BEGIN / END
Profiler Mask
64-bit bitmask controlling which zones are captured: (zone_mask & capture_mask) != 0
constexpr uint64_t kCaptureMaskNone = 0;
constexpr uint64_t kCaptureMaskAll = (uint64_t)-1;
constexpr uint64_t kCaptureMaskDefault = uint64_t(1);
constexpr uint64_t kCaptureMaskProfiler = uint64_t(1) << 63;
If a zone uses mask 0, Carbonite treats it as kCaptureMaskDefault (1).
Workflow: Start with --/app/profilerMask=1 (major spans only, minimal overhead). If you need more detail, remove the argument so the capture mask falls back to its default (ALL). Always start coarse, then zoom in.
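To make the capture rule concrete, here is a small pure-Python model of it. It only mirrors the expression and constants stated above (including the mask-0 fallback); it is an illustration of the rule, not Carbonite's implementation, and the function name is_captured is hypothetical.

```python
# Python mirrors of the C++ constants listed above.
CAPTURE_MASK_NONE = 0
CAPTURE_MASK_ALL = (1 << 64) - 1      # same bit pattern as (uint64_t)-1
CAPTURE_MASK_DEFAULT = 1
CAPTURE_MASK_PROFILER = 1 << 63


def is_captured(zone_mask, capture_mask):
    """Model of the rule (zone_mask & capture_mask) != 0.

    A zone mask of 0 is treated as CAPTURE_MASK_DEFAULT, as described above.
    """
    effective = zone_mask if zone_mask != 0 else CAPTURE_MASK_DEFAULT
    return (effective & capture_mask) != 0
```

Under this model, a zone declared with mask 2 is dropped when the capture mask is 1 and kept when the capture mask is CAPTURE_MASK_ALL, which is why starting with mask 1 keeps only the major spans.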
Profiler Channels
Channels are a higher-level abstraction over masks. Each channel has a name and can be toggled at runtime via settings.
Declaring a Channel (C++)
CARB_PROFILE_DECLARE_CHANNEL("myext.rendering", 1, true, g_myRenderingChannel);
CARB_PROFILE_ZONE(g_myRenderingChannel, "My rendering work");
Runtime Toggle
--/profiler/channels/<name>/enabled=true|false
Commonly disabled during benchmarks (too noisy):
--/profiler/channels/carb.events/enabled=false
--/profiler/channels/carb.tasking/enabled=false
Memory channels:
--/profiler/channels/cpu.memory/enabled=true
--/profiler/channels/cpu.virtualmemory/enabled=true
--/profiler/channels/graphics.memory/enabled=true
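When a launch script toggles several channels, the arguments can be generated rather than typed by hand. This sketch builds them from the --/profiler/channels/<name>/enabled=true|false pattern shown above; the helper name channel_args is hypothetical, and the channel names in the usage note are the ones listed in this section.

```python
def channel_args(channels):
    """Turn {channel_name: enabled} into Kit command-line arguments.

    Follows the setting path pattern documented above:
    --/profiler/channels/<name>/enabled=true|false
    """
    args = []
    for name, enabled in channels.items():
        value = "true" if enabled else "false"
        args.append(f"--/profiler/channels/{name}/enabled={value}")
    return args
```

For a benchmark run, `channel_args({"carb.events": False, "carb.tasking": False})` yields the two arguments listed above for the noisy channels.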
Tracy Plot Data (Numeric Metrics)
Record time-series values displayed as graphs in Tracy's Plot view.
C++
float gpuFrameTimeMs = 8.5f;
CARB_PROFILE_VALUE(gpuFrameTimeMs, 1, "GPU Frame Time (ms)");
int32_t triangleCount = 1500000;
CARB_PROFILE_VALUE(triangleCount, 1, "Triangle Count");
uint32_t gpuMemoryMB = 4096;
CARB_PROFILE_VALUE(gpuMemoryMB, 1, "GPU Memory (MB)");
int gpuIndex = 0;
CARB_PROFILE_VALUE(gpuFrameTimeMs, 1, "GPU %d Frame Time", gpuIndex);
Python
profiler.value_float(1, 8.5, "GPU Frame Time (ms)")
profiler.value_int(1, 1500000, "Triangle Count")
profiler.value_uint(1, 4096, "GPU Memory (MB)")
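If metrics arrive from generic code, a thin dispatcher can pick the matching call by Python type. The sketch below assumes only the value_float, value_int, and value_uint signatures listed in this document; record_metric and its unsigned flag are hypothetical, and the choice of which integer call to use is a design decision left to the caller.

```python
def record_metric(profiler, name, value, mask=1, unsigned=False):
    """Send one sample to the plot called `name`, picking the call by type.

    `profiler` is any object exposing value_float / value_int / value_uint
    with the (mask, value, name) argument order shown above.
    """
    if isinstance(value, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        raise TypeError("bool is not a meaningful plot value")
    if isinstance(value, float):
        profiler.value_float(mask, value, name)
    elif isinstance(value, int):
        if unsigned:
            if value < 0:
                raise ValueError("unsigned metric cannot be negative")
            profiler.value_uint(mask, value, name)
        else:
            profiler.value_int(mask, value, name)
    else:
        raise TypeError(f"unsupported metric type: {type(value).__name__}")
```

Keeping one plot name on one value type throughout a capture is probably the safest choice, since the document does not say how a backend handles a series that mixes types.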
Event Annotations
Instant Events
CARB_PROFILE_EVENT(1, carb::profiler::InstantType::Thread, "Scene loading started");
CARB_PROFILE_EVENT(1, carb::profiler::InstantType::Process, "Phase transition: WARM -> BENCHMARK");
profiler.instant(1, carb.profiler.InstantType.THREAD, "Scene loading started")
profiler.instant(1, carb.profiler.InstantType.PROCESS, "Phase transition")
Display as Tracy messages (recommended):
--/plugins/carb.profiler-tracy.plugin/instantEventsAsMessages=true
command_macro.core Annotations
The omni.kit.command_macro.core extension auto-inserts [command_macro][Measurement] Start/End - <tag> events around benchmark measurements.
Automatic Python Function Capture
Capture all Python function calls without per-function instrumentation:
export CARB_PROFILING_PYTHON=1
Or programmatically:
profiler.set_python_profiling_enabled(True)
Performance warning: Automatic capture adds significant overhead, and the Tracy file grows roughly 4x (measured: 275 MB → 1.2 GB). Never enable it during benchmark measurement; use it only in the TRACY analysis phase.
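Given that overhead, it can help to limit automatic capture to the region under investigation and restore the previous state afterwards. This is a minimal sketch that relies only on set_python_profiling_enabled and is_python_profiling_enabled from the interface listing above; the context manager name python_capture is hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def python_capture(profiler):
    """Enable automatic Python function capture for one block only.

    `profiler` is any object exposing is_python_profiling_enabled() and
    set_python_profiling_enabled(bool), as listed for the profiler interface.
    """
    was_enabled = profiler.is_python_profiling_enabled()
    profiler.set_python_profiling_enabled(True)
    try:
        yield
    finally:
        # Restore whatever state was active before, even on exceptions.
        profiler.set_python_profiling_enabled(was_enabled)
```

In Kit the argument would presumably be the object returned by carb.profiler.acquire_profiler_interface(), with the suspect code placed inside the `with python_capture(profiler):` block.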
Profiling Backend Summary
| Backend | Plugin | Output | Best For |
|---|---|---|---|
| CPU (ChromeTrace) | carb.profiler-cpu.plugin | .json/.gz | Offline analysis, targeted captures |
| Tracy | carb.profiler-tracy.plugin | .tracy (live capture) | Real-time flame graphs, GPU context, stats |
| NVTX | carb.profiler-nvtx.plugin | .nsys-rep (via nsys) | GPU kernels, CUDA/Vulkan analysis |
The CPU backend can be toggled on and off at runtime for targeted captures:
profiler.set_capture_mask(1)  # capture zones whose mask matches bit 0
profiler.set_capture_mask(0)  # kCaptureMaskNone: stop capturing
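The on/off pair can be wrapped so that the capture window always closes and the previous mask is restored. The sketch below assumes only get_capture_mask and set_capture_mask from the interface listing; it reads the old mask explicitly rather than relying on the return value of set_capture_mask, whose meaning this document does not state. The name capture_window is hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def capture_window(profiler, mask=1):
    """Capture with `mask` for the duration of the block, then restore.

    `profiler` is any object exposing get_capture_mask() and
    set_capture_mask(mask), as listed for the profiler interface.
    """
    previous = profiler.get_capture_mask()
    profiler.set_capture_mask(mask)
    try:
        yield
    finally:
        # Put the earlier mask back so the rest of the run is unaffected.
        profiler.set_capture_mask(previous)
```

Starting the application with capture disabled and wrapping only the operation of interest in `with capture_window(profiler):` should keep the resulting trace small and focused.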