profiling-daft

Updated: May 29, 2026, 15:15

Use when profiling or optimizing the runtime of daft or its test suites — finding where time goes, choosing a profiler on macOS, or A/B-validating a perf change. Covers the benchmark-vs-profile split (and the existing bench infra), the macOS Apple-Silicon toolchain (samply, hyperfine, why dtrace is out), idle-gating on a shared machine, the shared-bin/DAFT_BINARY_DIR A/B trick, the EMIT_TIMING-first method, and a baseline map of where the manual-test suite's time actually goes.

Source: avihut/daft (GitHub repository by avihut)

Profiling daft

How to investigate where daft's runtime goes — the binary and the YAML test suite — and how to A/B-validate a fix. Read before any perf/optimization work.

Benchmark vs profile. daft already has rich benchmarking infra (compare wall-clock, prove a change is faster). Do not reinvent it — use it to validate. This skill covers profiling (find the bottleneck), which daft did not document.

Existing benchmarking infra (for validation):

- `mise run bench:<cmd>`: per-command comparison against competition/baseline (`benches/`).
- `mise run bench:tests:integration`: TUI bash-vs-YAML; `bench:tests:manual`: YAML timing.
- `benches/scenarios/test_manual_scale.sh`: percentiles over the manual suite.
- `DAFT_MANUAL_TEST_EMIT_TIMING=1`: per-scenario `[bench]` lines (see below).

Method (cheapest, highest-signal first)

1. Test the presupposition before chasing it. Do the arithmetic first: wall × workers ÷ steps ≈ per-step work. For the manual suite that's ~57s × 10 ÷ 2217 ≈ 250ms/step, which is git-operation territory, not process-startup territory. A "turn off feature X" hunch is often refuted by one division.
2. Mine the existing timing before instrumenting. Run `DAFT_MANUAL_TEST_EMIT_TIMING=1 mise run test:manual -- --jobs 1` and aggregate the `[bench] scenario="…" elapsed_ms=N setup_ms=N fixture_ms=N template_ms=N` lines. This buckets per-scenario cost for free and ranks the slow tail.
3. Only then add probes. Reuse the `DAFT_MANUAL_TEST_EMIT_TIMING` gate for new per-scenario timers; env-gate any daft-internal probe (e.g. a counter at a `gix::discover()` chokepoint) so it ships disabled.
4. Earn an "it's intrinsic" verdict; don't assume it. If you conclude a hot path can't be cut, prove it by looking inside (sample CPU, count calls), not by inspecting its shape. Redundant per-invocation work hides behind "git is just slow."
5. CPU sampling is load-robust; wall-clock is not. A flamegraph's relative breakdown survives background load; any wall-clock number (hyperfine, suite Duration) does not (see idle-gating).
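
The first two steps can be sketched in a few lines of Python. This is a minimal sketch, not part of daft: the function names are hypothetical, and the parser assumes only the `[bench] key=value …` line shape quoted above (quoted `scenario`, integer `*_ms` fields).

```python
import re
from collections import defaultdict

# key=value pairs: quoted (scenario="clone basic") or bare (elapsed_ms=812).
PAIR = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def parse_bench_line(line):
    """Return a dict for a `[bench] ...` line, or None for any other line."""
    line = line.strip()
    if not line.startswith("[bench]"):
        return None
    record = {}
    for key, quoted, bare in PAIR.findall(line):
        value = bare if bare else quoted
        # Only *_ms fields are treated as integers (assumed from the line shape).
        if key.endswith("_ms") and value.isdigit():
            record[key] = int(value)
        else:
            record[key] = value
    return record


def summarize(lines, top=5):
    """Sum every *_ms bucket and rank the slowest scenarios by elapsed_ms."""
    records = [r for r in map(parse_bench_line, lines) if r]
    totals = defaultdict(int)
    for record in records:
        for key, value in record.items():
            if key.endswith("_ms") and isinstance(value, int):
                totals[key] += value
    slowest = sorted(records, key=lambda r: r.get("elapsed_ms", 0), reverse=True)
    return dict(totals), slowest[:top]


def per_step_ms(wall_s, workers, steps):
    """Step 1's one-division sanity check: wall x workers / steps, in ms."""
    return wall_s * workers / steps * 1000
```

Feeding it the captured output of the `--jobs 1` run (for example `summarize(open("bench.log"))`, where `bench.log` is a hypothetical capture file) gives the bucket totals and the slow tail; `per_step_ms(57, 10, 2217)` reproduces the per-step estimate from step 1.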

macOS Apple-Silicon toolchain

| Tool | Use for | Notes |
| --- | --- | --- |
| `hyperfine` | wall-clock A/B of a CLI | Runs each command in a block (not interleaved), so idle-gate it. `--warmup`, `-N` (no shell), `--export-json`. |
| `samply` | CPU flamegraph of daft / the runner | `cargo binstall samply` (or `cargo install`). Needs debug symbols → build `--profile profiling`. Browser-based; follows child processes. |
| `/usr/bin/sample` | quick text call-tree | Built-in, no install; needs a process living long enough to attach. |
| `cargo-instruments` | off-CPU / syscall / exec trace | Needs full Xcode (Command Line Tools / `xcode-select --install` is not enough). Only when CPU sampling proves the cost is "spawn + wait." |
| `criterion` / `divan` | in-process microbench | For isolating one op (e.g. `generate_repo`). Per-process sampling is hopeless at tens-of-ms, so bench the op directly. |
| ~~`dtrace` / `dtruss`~~ | n/a | SIP-restricted on macOS; do not rely on it. Use samply. |

Short-lived processes (a daft invocation is tens of ms) yield too few samples for per-process attribution — loop the op, or use hyperfine for wall + samply on the aggregate suite run.

daft-specific gotchas

- Build with `[profile.profiling]` (release + debug symbols), never plain release: the release profile is `strip = true` + `opt-level = "z"`, so samply frames come back blank. Don't `cargo clean` between build and profile (unpacked split-debuginfo lives in `target/**/*.o`).
- Shared-bin hash invalidation. Editing any `.rs` changes the shared-bin content hash, forcing a slow opt-z + fat-LTO release rebuild. To A/B a runner (xtask) change cheaply, bypass it: `DAFT_BINARY_DIR=<cached release dir> cargo run -p xtask -- manual-test` rebuilds only debug xtask.
- Don't fork-count with a PATH `git` shim: it perturbs daft and hangs (`git rev-parse` blocked). Count forks from code, or instrument the spawn site.
- `gix::discover()` is cached per `GitCommand` instance, not across them (`src/git/mod.rs`). A command builds several `GitCommand`s (settings, hooks, itself), so it discovers the repo 2–3×. Watch for this multiplier in any per-command path.
- Replicate the test env for standalone profiling or you profile a different code path: `DAFT_TESTING=1` (gates background daemons; see below), a `DAFT_CONFIG_DIR` sandbox, and cwd inside a real worktree.
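
The last gotcha can be made mechanical. The sketch below builds a `samply record` invocation with the test environment replicated; it is illustrative only. The helper name, the binary path `target/profiling/daft`, and the choice of a fresh temp directory as the `DAFT_CONFIG_DIR` sandbox are assumptions, not something the daft repo prescribes.

```python
import os
import tempfile


def profiling_invocation(daft_binary, worktree, args, config_dir=None):
    """Return (command, env, cwd) for profiling one daft command under samply.

    Replicates the three pieces of the test env named above:
    DAFT_TESTING=1, a sandboxed DAFT_CONFIG_DIR, and cwd inside a worktree.
    """
    env = dict(os.environ)
    env["DAFT_TESTING"] = "1"
    # Assumption: an empty throwaway directory is an acceptable config sandbox.
    env["DAFT_CONFIG_DIR"] = config_dir or tempfile.mkdtemp(prefix="daft-config-")
    command = ["samply", "record", daft_binary, *args]
    return command, env, worktree
```

A caller would pass the result to `subprocess.run(command, env=env, cwd=cwd, check=True)`, for example with `profiling_invocation("target/profiling/daft", "/path/to/worktree", ["worktree-list"])`. Because a single invocation is only tens of ms, expect few samples from one run; loop the command or profile the aggregate suite as described above.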

Idle-gating (shared / multi-agent machines)

Other agents may be building in sibling worktrees. Re-verify idle immediately before each wall-clock bench (CPU sampling is exempt). A simple gate: 1-min loadavg < 5, no rustc > 40% CPU, no manual-test/cargo process, sustained ~90s. A suite run drives its own load to 40–90, so back-to-back runs see decaying self-inflicted averages — interpret accordingly.
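
One way to encode that gate is sketched below. The thresholds come from the paragraph above; everything else (function names, matching processes by command-line text, the 5-second poll interval) is an assumption for illustration, not daft tooling.

```python
import os
import subprocess
import time


def is_idle(load1, processes, max_load=5.0, max_rustc_cpu=40.0):
    """processes: iterable of (cpu_percent, command_line) tuples."""
    if load1 >= max_load:
        return False
    for cpu, command in processes:
        words = command.split()
        name = os.path.basename(words[0]) if words else ""
        if name == "rustc" and cpu > max_rustc_cpu:
            return False
        # Any cargo process or manual-test run counts as competing load.
        if name == "cargo" or "manual-test" in command:
            return False
    return True


def snapshot():
    """Read the 1-min load average and a (cpu%, command) list from ps."""
    out = subprocess.run(
        ["ps", "-A", "-o", "%cpu=", "-o", "command="],
        capture_output=True, text=True, check=True,
    ).stdout
    processes = []
    for line in out.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2:
            try:
                processes.append((float(parts[0].replace(",", ".")), parts[1]))
            except ValueError:
                continue
    return os.getloadavg()[0], processes


def wait_until_idle(sustain_s=90, poll_s=5):
    """Block until the machine has been continuously idle for sustain_s."""
    idle_since = None
    while True:
        now = time.monotonic()
        if is_idle(*snapshot()):
            if idle_since is None:
                idle_since = now
            if now - idle_since >= sustain_s:
                return
        else:
            idle_since = None
        time.sleep(poll_s)
```

Call `wait_until_idle()` immediately before each hyperfine or suite-timing run. It is unnecessary before a samply capture, since CPU sampling is exempt from the gate.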

`[profile.profiling]`

Checked into the workspace Cargo.toml. Tuned for readable flamegraphs + fast builds (clear frames + quick compile beat faithful-but-opaque fat-LTO for finding redundant calls): -O2, no LTO, many codegen units, full DWARF. Build with cargo build --profile profiling. For absolute-timing fidelity to the shipped binary, profile the size-optimized release instead (slower, opaquer).

Baseline map — where the manual suite's time goes

Measured on a 10-core Apple-Silicon Mac (post-#578). Re-measure after structural changes; treat as orientation, not gospel.

- Total: 581 scenarios / 2217 steps. Reported parallel Duration ≈ 57s; full `mise run test:manual` wall ≈ 64s.
- The suite is git-subprocess + filesystem bound (91%), not startup/feature bound. Summed core-work (÷ workers ≈ wall):
  - step-loop (daft invocations + git assertions): 506s / 91%
  - fixture provision: 45s / 8% (40s is inline repos bypassing the fixture cache)
  - template snapshot: 5.6s / 1% (dead work: `create_template()` runs every scenario but `reset()` is interactive-only)
  - sandbox dir setup: ~0
- Per-command cost is git/gix work, not startup. daft startup ≈ 5.5ms (faster than `bash -c true`); `daft worktree-list` ≈ 86ms (raw `git worktree list` ≈ 7ms). The gap is status-gathering + redundant discovery.
- Ruled out: worker oversubscription (Duration flat at `--jobs` 10/16/24 → CPU-saturated at ncpu); disabling startup features/daemons (already gated under `DAFT_TESTING`, the runner sets it); disabling WAL/coordinator/gitoxide/hooks (load-bearing → deletes test coverage). The expensive features are already off or are exactly what the scenarios assert.
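
The "summed core-work ÷ workers ≈ wall" relation can be checked directly from the bucket figures above. A minimal sketch, with hypothetical names and the near-zero sandbox bucket left out:

```python
def breakdown(buckets_s, workers):
    """Return (total seconds, percentage share per bucket, implied wall seconds)."""
    total = sum(buckets_s.values())
    shares = {name: round(100 * s / total, 1) for name, s in buckets_s.items()}
    return total, shares, total / workers


# Bucket totals copied from the baseline map (seconds of summed core-work).
BASELINE_S = {
    "step-loop": 506.0,
    "fixture provision": 45.0,
    "template snapshot": 5.6,
}

total, shares, implied_wall = breakdown(BASELINE_S, workers=10)
```

With these inputs the buckets sum to about 556.6s, so ten workers imply roughly 55.7s of wall time, in line with the reported parallel Duration of about 57s. When re-measuring, a large gap between the implied and reported wall would suggest time is going somewhere these buckets do not capture.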

The actionable wins from that map are tracked as perf issues (lineage #509): redundant gix::discover() (a ships-to-users win, not just harness), the dead template snapshot, and routing inline repos through the fixture cache.
