ワンクリックで
analyze-perfetto-trace
// Query and analyze Metro's perfetto compiler traces to find real hot spots and untraced time. Use whenever the user asks about metro compile-time perf, where time is going in a trace, or shares a perfetto screenshot.
// Query and analyze Metro's perfetto compiler traces to find real hot spots and untraced time. Use whenever the user asks about metro compile-time perf, where time is going in a trace, or shares a perfetto screenshot.
| name | analyze-perfetto-trace |
| description | Query and analyze Metro's perfetto compiler traces to find real hot spots and untraced time. Use whenever the user asks about metro compile-time perf, where time is going in a trace, or shares a perfetto screenshot. |
.perfetto-trace) or a perfetto UI screenshot.Metro writes perfetto traces into the configured traceDestination directory. As of the FIR + IR tracing rework, one compilation produces multiple files — one per FIR session and one per IR module fragment — all sharing a common id prefix:
<traceDestination>/<id>-<phase>-<moduleName>.perfetto-trace
<id> is a yyMMdd-HHmmss timestamp generated once per compilation. Every file from the same compilation invocation shares it, so you can group them by prefix.<phase> is fir or ir.<moduleName> is the FIR session name (commonMain, jvmMain, etc.) or the IR IrModuleFragment.name (often main). Filesystem-unsafe characters in module names are replaced with _.Examples: 260505-133503-fir-commonMain.perfetto-trace, 260505-133503-ir-main.perfetto-trace.
When asked "look at the trace", first decide which phase the user is asking about. FIR-side time (checkers, generators, supertype computation) lives in the fir-* files; IR-side time (graph processing, transformers) lives in the ir-* files. They are separate timelines — not a unified trace — so you cannot directly compare durations across files. If multiple older invocations exist in the directory, pick the freshest <id> group unless the user points at a specific file.
Use ./metrow trace (or the underlying scripts/trace-project.sh directly). It publishes Metro to mavenLocal, bumps the target project's metro version in gradle/libs.versions.toml, runs the given compile task with -Pmetro.traceDestination=metro/trace --rerun, locates the freshest .perfetto-trace (variant subdir varies by task), and copies it into tmp/traces/<timestamp>-<version>_<task>.perfetto-trace. The path is also written to tmp/traces/LATEST so you can chain with TRACE=$(cat tmp/traces/LATEST) in analysis.
./metrow trace <project-dir> <gradle-task> [version]
./metrow trace ~/dev/android/personal/CatchUp :app-scaffold:compileDebugKotlin
Pass --open-in-browser (or --open) to additionally launch the trace in ui.perfetto.dev — the script fetches Google's open_trace_in_ui helper once (cached at tmp/open_trace_in_ui) and fires it in the background so the UI can keep streaming the file:
./metrow trace --open-in-browser ~/dev/android/personal/CatchUp :app-scaffold:compileDebugKotlin
Equivalent direct call:
scripts/trace-project.sh [--open-in-browser] <project-dir> <gradle-task> [version]
Use this when the user asks for a "fresh trace" / "re-profile" / has just made a Metro change they want profiled against a real-world project.
For raw-perf iteration on a large generated project (no external repo needed), use benchmark/trace_compile.sh. It runs gradle-profiler against a fresh :app:component:compileKotlin --rerun of the benchmark project, with Metro's perfetto tracing enabled, and picks the iteration whose duration is closest to the measured-mean — so a single representative trace lands in tmp/traces/ (and tmp/traces/LATEST is updated).
benchmark/trace_compile.sh # run + pick + copy
benchmark/trace_compile.sh --open-in-browser # also open in ui.perfetto.dev
TRACE=$(cat tmp/traces/LATEST) # chain into analysis
Prereqs: the benchmark project must be generated for metro mode (cd benchmark && kotlin generate-projects.main.kts --mode metro). gradle-profiler is auto-installed on first run. Use this for "raw compile perf on a 500-module project" iteration loops where you don't need a real-world app like CatchUp.
Use the perfetto python library. It's available via pip but installed against a specific python — on this machine it's Python 3.13, not the default 3.14:
/opt/homebrew/opt/python@3.13/bin/python3.13 -c "from perfetto.trace_processor import TraceProcessor; print('ok')"
If python3 -c "import perfetto.trace_processor" fails with ModuleNotFoundError, switch to the 3.13 binary above. pip install perfetto installs against whatever python3 on PATH points to, which may not be the one actually invoked.
Do not try brew install perfetto — there is no homebrew cask/formula for it.
Always wrap the Python in a single -c invocation so it stays self-contained. The boilerplate:
/opt/homebrew/opt/python@3.13/bin/python3.13 -c "
from perfetto.trace_processor import TraceProcessor
tp = TraceProcessor(trace='<ABSOLUTE_PATH>')
r = list(tp.query('''
<SQL>
'''))
for row in r:
print(f'{row.dur/1e6:6.2f}ms {row.name}')
tp.close()
"
Notes:
trace= must be an absolute path.dur is in nanoseconds — divide by 1e6 for ms.tp.query() returns a rows iterator; wrap in list() if you'll iterate more than once.tp.close() at the end.slice tableThat's the one you'll use 95% of the time. Relevant columns:
| column | meaning |
|---|---|
id | slice id (primary key) |
name | trace span name (e.g. "Build GraphNode") |
dur | duration in nanoseconds |
parent_id | parent slice id (or NULL at top level) |
ts | start timestamp (nanoseconds since trace start) |
SELECT name, dur FROM slice ORDER BY dur DESC LIMIT 30
Gives an ordered view of the biggest spans. Start here.
SELECT name, SUM(dur) AS total_dur, COUNT(*) AS cnt
FROM slice GROUP BY name ORDER BY total_dur DESC LIMIT 30
Useful for per-class spans like "Visit X" that fire hundreds of times.
WITH parent AS (SELECT id FROM slice WHERE name = 'Build binding graph' LIMIT 1)
SELECT name, dur FROM slice
WHERE parent_id = (SELECT id FROM parent)
ORDER BY dur DESC
This is the most important query for finding gaps. Compare the summed children to the parent duration:
total_child = sum(row.dur for row in r)
print(f'children sum: {total_child/1e6:.2f}ms parent: <parent dur>ms gap: <diff>ms')
A large gap means there's untraced work inside the parent. That's where to add instrumentation or investigate.
SELECT name, dur FROM slice
WHERE parent_id = (SELECT id FROM slice WHERE name = 'Core transformers' LIMIT 1)
AND dur > 1e6 -- >1ms
ORDER BY dur DESC
SELECT name, dur FROM slice ORDER BY dur DESC LIMIT 1
If you want "all time spent under parent X including grandchildren":
WITH RECURSIVE descendants(id) AS (
SELECT id FROM slice WHERE name = 'Core transformers'
UNION
SELECT s.id FROM slice s JOIN descendants d ON s.parent_id = d.id
)
SELECT SUM(dur) FROM slice WHERE id IN (SELECT id FROM descendants)
Note: summing descendants double-counts time (parent dur already includes children). Use this for "total CPU in this subtree" questions, not for gap math.
This is the workflow you'll use most often when someone says "X looks slow but it's a black box":
SELECT dur FROM slice WHERE name = '<phase>' LIMIT 1.durs.trace("...") blocks in that phase, and add trace("...") around them.Worked example: on a 161ms catchup build, "Build binding graph" was 11.3ms but direct children summed to 4.2ms. The 7ms gap was construction of BindingLookup/IrBindingGraph/IrBindingStack at the top of generate(), which had no trace. Wrapping that in trace("Construct lookup & graph") surfaced it.
"Metro compiler").cnt in the sum-by-name query tells you whether to optimize per-call work (high cnt, small per-call) or one-shot cost (cnt=1, big dur).name = X?SELECT dur FROM slice WHERE name = '<name>' ORDER BY dur DESC LIMIT 5
Get the id first (SELECT id FROM slice WHERE name = '<name>' ORDER BY dur DESC LIMIT 1), then:
SELECT name, dur FROM slice WHERE parent_id = <id> ORDER BY dur DESC
SELECT thread.name, SUM(slice.dur) AS total
FROM slice JOIN thread_track ON slice.track_id = thread_track.id
JOIN thread ON thread_track.utid = thread.utid
GROUP BY thread.name ORDER BY total DESC