user-test-performance

Use when running deep performance and throughput investigations for HUD publishing paths (widgets/zones/tiles), including MCP HTTP benchmarks and gRPC bidi stream benchmarks. The canonical gRPC widget publish-load benchmark is `examples/widget_publish_load_harness` (Rust); the Python `grpc_widget_publish_perf.py` script is a secondary alternative.

Install command

npx skills add https://github.com/Tzeusy/tze-hud --skill user-test-performance

Source

Tzeusy/tze-hud

Updated May 10, 2026 at 17:56


User Test Performance

Run focused performance drills for publish throughput, latency, and transport bottlenecks. This skill is separate from /user-test functional validation and is tuned for repeatable measurement.

Skill-Creator + Brainstorming Contract

Before any run, define and document:

Question: what exact performance hypothesis are we testing?
Target: which target_id are we measuring?
Workload identity: what is the benchmark primary key?
Success criteria: what threshold/regression signal matters?

This skill is built to make those answers machine-auditable and historically comparable.
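
A minimal sketch of how the four answers could be captured before a run so they can be stored next to the results. The RunPlan class, its field names, and the example values are hypothetical and are not part of the skill's scripts.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RunPlan:
    """Hypothetical pre-run record; field names are illustrative only."""
    question: str            # the performance hypothesis under test
    target_id: str           # key into the target registry
    workload: dict           # parameters that identify the benchmark
    success_criteria: dict   # thresholds or regression signals that matter

    def validate(self) -> None:
        # Refuse to run when any of the four answers is empty.
        missing = [name for name, value in asdict(self).items() if not value]
        if missing:
            raise ValueError("run plan is missing: " + ", ".join(missing))


plan = RunPlan(
    question="Does paced widget publishing hold p99 under 50 ms?",
    target_id="user-test-windows-tailnet",
    workload={"transport": "grpc", "mode": "paced", "rate_rps": 20},
    success_criteria={"p99_ms_max": 50},
)
plan.validate()
print(json.dumps(asdict(plan), indent=2, sort_keys=True))
```

Serializing with sorted keys keeps the record stable across runs, which makes it easy to diff or attach to a results row.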

Mandatory Audit Metrics

For every benchmark run, record these numerics:

End-to-end latency: e2e_latency_ms
Throughput: throughput_rps (and transport-specific variants)
Bytes out: bytes_out
Bytes in: bytes_in
Success/error counts

Additional metrics currently tracked (recommended for regression triage):

MCP: min/p50/p95/p99/max/mean/stddev latency
gRPC: send_phase_ms, result_drain_ms, send_rps, end_to_end_rps
Byte efficiency: bytes_out_per_success, bytes_in_per_success
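
The latency distribution and byte-efficiency figures above can be derived from raw samples roughly as follows. This is a sketch under stated assumptions: the dictionary keys are illustrative, the nearest-rank percentile method is one common choice, and the actual scripts may compute these values differently.

```python
import math
import statistics


def percentile(ordered, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms, wall_s, bytes_out, bytes_in):
    """Summarize one run; each latency sample is one successful publish."""
    if not latencies_ms or wall_s <= 0:
        raise ValueError("need at least one sample and a positive wall time")
    ordered = sorted(latencies_ms)
    successes = len(ordered)
    return {
        "min_ms": ordered[0],
        "p50_ms": percentile(ordered, 50),
        "p95_ms": percentile(ordered, 95),
        "p99_ms": percentile(ordered, 99),
        "max_ms": ordered[-1],
        "mean_ms": statistics.fmean(ordered),
        "stddev_ms": statistics.pstdev(ordered),  # population stddev
        "throughput_rps": successes / wall_s,
        "bytes_out_per_success": bytes_out / successes,
        "bytes_in_per_success": bytes_in / successes,
    }


stats = summarize([12.0, 15.0, 11.0, 40.0, 13.0],
                  wall_s=0.5, bytes_out=600, bytes_in=250)
print(stats)
```

With small sample counts the p95 and p99 values collapse onto the maximum, so tail percentiles are only meaningful for runs with enough publishes.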

Future extensions worth adding:

host CPU/GPU/memory at start/end
network RTT/jitter snapshots
error taxonomy buckets over time

Deterministic Primary Key

Every run computes a deterministic primary_key from normalized benchmark fields (target, transport, workload params, pacing, etc.).

Same primary key across different timestamps = same benchmark config
This enables trend lines and regression detection over time
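
One way such a key can be derived is to hash a canonical serialization of the configuration fields. The sketch below is an assumption about the approach, not the scripts' actual implementation: the field names, the SHA-256 choice, and the 16-character truncation are all illustrative. Timestamps and measured results must stay out of the input, otherwise the key would change on every run.

```python
import hashlib
import json


def normalize(value):
    # Treat " Widget " and "widget" as the same configuration value.
    return value.strip().lower() if isinstance(value, str) else value


def primary_key(fields: dict) -> str:
    """Hash normalized fields; key order and string casing do not matter."""
    normalized = {name: normalize(value) for name, value in fields.items()}
    canonical = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


key = primary_key({
    "target_id": "user-test-windows-tailnet",
    "transport": "grpc",
    "mode": "paced",
    "count": 100,
    "duration_ms": 5000,
})
print(key)
```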

Historical Result Storage

Runs append to:

./.claude/skills/user-test-performance/reference/results.csv

This file is meant to be committed to version control. To compare historical performance, group rows by primary_key and order them by timestamp.
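
A minimal sketch of the append-and-group pattern. The column names here are assumptions; the real results.csv schema is defined by the benchmark scripts, and the example writes to a temporary file rather than the reference path.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path

# Hypothetical column set; the real file carries more fields.
FIELDS = ["timestamp_utc", "primary_key", "e2e_latency_ms", "throughput_rps"]


def append_run(csv_path: Path, row: dict) -> None:
    """Append one run, writing the header only when the file is new."""
    is_new = not csv_path.exists() or csv_path.stat().st_size == 0
    with csv_path.open("a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)


def history_by_key(csv_path: Path) -> dict:
    """Group rows by primary_key, oldest first within each group."""
    groups = defaultdict(list)
    with csv_path.open(newline="") as handle:
        for row in csv.DictReader(handle):
            groups[row["primary_key"]].append(row)
    for rows in groups.values():
        rows.sort(key=lambda r: r["timestamp_utc"])  # ISO 8601 sorts as text
    return dict(groups)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "results.csv"
    append_run(path, {"timestamp_utc": "2026-05-02T10:00:00Z", "primary_key": "abc",
                      "e2e_latency_ms": 14.0, "throughput_rps": 70.0})
    append_run(path, {"timestamp_utc": "2026-05-01T10:00:00Z", "primary_key": "abc",
                      "e2e_latency_ms": 12.0, "throughput_rps": 80.0})
    for key, rows in history_by_key(path).items():
        print(key, [r["throughput_rps"] for r in rows])
```

Values read back through csv.DictReader are strings, so numeric columns need converting before any arithmetic.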

Target Registry

Targets are defined in:

./.claude/skills/user-test-performance/reference/targets.json

Start with one target (user-test-windows-tailnet, same host as /user-test), then add more (for example, a remote MacBook target) under new target_id keys.

Scripts

scripts/mcp_publish_perf.py
- Benchmarks MCP publishes for widget or zone modes.
- Supports count, concurrency, pacing, target registry, traceability tags, thresholds, and CSV recording.
examples/widget_publish_load_harness (Rust — canonical gRPC widget benchmark)
- Compiled Rust binary; build with cargo build --release -p widget_publish_load_harness.
- Supports --mode burst|paced, --publish-count, --duration-s, --target-rate-rps, --target-p99-rtt-us, --target-throughput-rps, --normalization-mapping-approved, --layer4-output-root (Layer 4 artifact emission), and full target registry via --targets-file (default: ./targets/publish_load_targets.toml).
- Outputs a JSON artifact to benchmarks/publish-load/ by default.
scripts/grpc_widget_publish_perf.py (Python — secondary alternative)
- Benchmarks WidgetPublish on one gRPC bidi stream.
- Supports pacing, target registry, byte accounting, traceability tags, thresholds, and CSV recording.
- Uses local scripts/proto_gen/ stubs (self-contained inside this skill).
scripts/widget_soak_runner.py
- Runs the Rust gRPC widget harness concurrently for agent-alpha, agent-beta, and agent-gamma by default.
- Defaults to a 60-minute paced soak (--duration-s 3600) and writes per-agent artifacts plus soak_summary.json under benchmarks/soak/<timestamp>/.
- Use with the benchmark Windows config (app/tze_hud_app/config/benchmark.toml) and benchmark scheduled task (scripts/windows/install_benchmark_hud_task.ps1).
scripts/compare_results.py
- Compares candidate vs baseline runs from reference/results.csv.
- Reports metric deltas and threshold pass/fail for regression gates.
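
The comparison step can be pictured as a percentage delta per metric plus a direction-aware gate. The sketch below illustrates that idea only; it is not the logic of compare_results.py, and the metric directions and the 10% tolerance are assumptions.

```python
# For each metric, record whether a lower or a higher value is better.
DIRECTIONS = {"e2e_latency_ms": "lower", "throughput_rps": "higher"}


def percent_delta(baseline: float, candidate: float) -> float:
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (candidate - baseline) * 100.0 / baseline


def compare(baseline: dict, candidate: dict, max_regression_pct: float = 10.0) -> dict:
    """Return delta and pass/fail per metric for one baseline/candidate pair."""
    report = {}
    for metric, direction in DIRECTIONS.items():
        delta = percent_delta(baseline[metric], candidate[metric])
        # A regression is an increase for latency, a decrease for throughput.
        regression = delta if direction == "lower" else -delta
        report[metric] = {"delta_pct": delta,
                          "pass": regression <= max_regression_pct}
    return report


report = compare({"e2e_latency_ms": 100.0, "throughput_rps": 50.0},
                 {"e2e_latency_ms": 120.0, "throughput_rps": 49.0})
for metric, verdict in report.items():
    print(metric, verdict)
```

Only rows that share a primary key should be compared this way; a delta between different workload shapes says nothing about regressions.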

Run Selection (Progressive Discovery)

Use the minimum run shape that answers the current hypothesis:

Transport bottleneck hypothesis (per-request overhead, HTTP connection churn) -> mcp_publish_perf.py
Stream throughput hypothesis (single bidi stream, drain/result pacing) -> examples/widget_publish_load_harness (Rust, canonical); fallback: scripts/grpc_widget_publish_perf.py
Regression hypothesis (did we get better/worse than prior runs?) -> compare_results.py

If uncertain, start with one fast MCP run (--count 20) and one fast gRPC run (--publish-count 20 with the Rust harness, or --count 20 with the Python script), then deepen only the path that regresses.

Quick Commands

1) MCP widget: 100 publishes as fast as possible

python3 .claude/skills/user-test-performance/scripts/mcp_publish_perf.py \
  --target-id user-test-windows-tailnet \
  --mode widget \
  --widget-name main-progress \
  --count 100 \
  --concurrency 1 \
  --transition-ms 0

2) MCP zone: 100 publishes over 5 seconds

python3 .claude/skills/user-test-performance/scripts/mcp_publish_perf.py \
  --target-id user-test-windows-tailnet \
  --mode zone \
  --zone-name subtitle \
  --count 100 \
  --duration-ms 5000

3) gRPC widget stream (Rust — canonical): 1000 burst publishes

cargo run --release -p widget_publish_load_harness -- \
  --target-id user-test-windows-tailnet \
  --widget-name main-progress \
  --mode burst \
  --publish-count 1000

4) gRPC widget stream (Rust — canonical): 100 publishes paced over 5 seconds

cargo run --release -p widget_publish_load_harness -- \
  --target-id user-test-windows-tailnet \
  --widget-name main-progress \
  --mode paced \
  --duration-s 5 \
  --target-rate-rps 20

3a) gRPC widget stream (Python — secondary): 100 publishes on one bidi connection

python3 .claude/skills/user-test-performance/scripts/grpc_widget_publish_perf.py \
  --target-id user-test-windows-tailnet \
  --widget-name main-progress \
  --count 100

4a) gRPC widget stream (Python — secondary): 100 publishes over 5 seconds

python3 .claude/skills/user-test-performance/scripts/grpc_widget_publish_perf.py \
  --target-id user-test-windows-tailnet \
  --widget-name main-progress \
  --count 100 \
  --duration-ms 5000
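
The paced commands above describe the same workload in two ways: the Rust harness takes a duration and a target rate, while the Python script takes a count and a duration. The arithmetic linking them is sketched below. The helper names are hypothetical, and even spacing between publishes is an assumption about how the tools pace.

```python
def pacing_from_rate(duration_s: float, target_rate_rps: float):
    """Rust-style inputs: derive publish count and spacing."""
    count = round(duration_s * target_rate_rps)
    interval_ms = 1000.0 / target_rate_rps
    return count, interval_ms


def pacing_from_count(count: int, duration_ms: float):
    """Python-style inputs: derive effective rate and spacing."""
    interval_ms = duration_ms / count
    rate_rps = 1000.0 / interval_ms
    return rate_rps, interval_ms


def send_offsets_ms(count: int, interval_ms: float):
    """Offsets from run start at which each publish would be sent."""
    return [i * interval_ms for i in range(count)]


print(pacing_from_rate(5, 20))       # --duration-s 5 --target-rate-rps 20
print(pacing_from_count(100, 5000))  # --count 100 --duration-ms 5000
print(send_offsets_ms(100, 50.0)[-1])
```

Both forms work out to 100 publishes spaced 50 ms apart, so results from the two tools are comparable only when these derived values match.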

5) Compare latest run vs prior baseline for same primary key

python3 .claude/skills/user-test-performance/scripts/compare_results.py \
  --results-csv .claude/skills/user-test-performance/reference/results.csv \
  --target-id user-test-windows-tailnet \
  --transport mcp_http \
  --mode widget

6) Install the benchmark Windows launch task

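Copy app/tze_hud_app/config/benchmark.toml to C:\tze_hud\benchmark.toml, then register the benchmark task from Windows. The command below expects the installer script (scripts/windows/install_benchmark_hud_task.ps1 in this repo) to be present in C:\tze_hud as well: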

powershell -NoProfile -ExecutionPolicy Bypass `
  -File C:\tze_hud\install_benchmark_hud_task.ps1 `
  -BaseDir C:\tze_hud `
  -Psk $env:TZE_HUD_PSK
schtasks /Run /TN TzeHudBenchmarkOverlay

The installer stores the PSK as a DPAPI-protected file for the task user, and the runner passes it to tze_hud.exe through the TZE_HUD_PSK environment variable. Before relaunching, it stops only an existing tze_hud.exe process that was started with the benchmark config; it does not kill the production TzeHudOverlay process by executable name.

7) Three-agent 60-minute widget soak

python3 .claude/skills/user-test-performance/scripts/widget_soak_runner.py \
  --target-id user-test-windows-tailnet \
  --duration-s 3600 \
  --rate-rps 1 \
  --windows-live-metrics-path 'C:\tze_hud\perf\hud-wydpo\windowed_live_metrics.json' \
  --sample-windows-resources \
  --ssh-identity ~/.ssh/ecdsa_home

Traceability and Threshold Flags

The Python MCP and gRPC scripts (mcp_publish_perf.py, grpc_widget_publish_perf.py) support:

Traceability: --trace-spec-ref, --trace-rfc-ref, --trace-doctrine-ref, --trace-budget-ref
Thresholds: --expected-e2e-ms-max, --expected-p95-ms-max, --expected-p99-ms-max, --expected-throughput-rps-min, --expected-error-rate-max

These fields are persisted in results.csv for auditable historical comparisons.
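
A sketch of how max/min thresholds like these can be evaluated against a run's metrics. The dictionary keys mirror the flag names for readability but are assumptions, as is the convention that an unset flag is skipped; the scripts' own evaluation may differ.

```python
def error_rate(successes: int, errors: int) -> float:
    total = successes + errors
    return errors / total if total else 0.0


def evaluate(metrics: dict, limits: dict) -> dict:
    """Map each configured limit to True (pass) or False (fail)."""
    verdicts = {}
    for name, limit in limits.items():
        if limit is None:  # flag not supplied: nothing to check
            continue
        metric, _, kind = name.rpartition("_")  # e.g. "p95_ms" and "max"
        value = metrics[metric]
        verdicts[name] = value <= limit if kind == "max" else value >= limit
    return verdicts


metrics = {
    "e2e_ms": 1800.0,
    "p95_ms": 22.0,
    "p99_ms": 61.0,
    "throughput_rps": 55.0,
    "error_rate": error_rate(successes=99, errors=1),
}
limits = {
    "e2e_ms_max": 2000.0,
    "p95_ms_max": 25.0,
    "p99_ms_max": 50.0,
    "throughput_rps_min": 50.0,
    "error_rate_max": None,
}
print(evaluate(metrics, limits))
```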

The Rust harness (examples/widget_publish_load_harness) uses structured thresholds via --target-p99-rtt-us and --target-throughput-rps, with traceability embedded in the emitted JSON artifact (RFC-0005 / publish-load-harness spec ID) and Layer 4 artifact output (via --layer4-output-root).

Notes

The MCP runtime path in this repo currently uses one request per connection (no keep-alive), so high-rate MCP runs are limited by transport overhead: every publish opens a new connection.
gRPC byte stats count protobuf payload bytes (ByteSize), not full wire bytes; transport framing is excluded.
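
To get a feel for how far payload bytes sit below wire bytes, the fixed per-message framing can be added back in. The sketch below is a rough estimate only: it assumes one gRPC message per HTTP/2 DATA frame and ignores HEADERS frames, flow control, TLS, and TCP/IP overhead. The 5-byte gRPC message prefix and the 9-byte HTTP/2 frame header come from the respective protocol specifications; the function name is hypothetical.

```python
GRPC_MESSAGE_PREFIX_BYTES = 5  # 1-byte compressed flag + 4-byte message length
HTTP2_FRAME_HEADER_BYTES = 9   # fixed-size HTTP/2 frame header


def estimated_framed_bytes(payload_sizes):
    """Payload plus fixed framing, assuming one message per DATA frame."""
    per_message = GRPC_MESSAGE_PREFIX_BYTES + HTTP2_FRAME_HEADER_BYTES
    return sum(size + per_message for size in payload_sizes)


payloads = [40] * 100  # e.g. 100 publishes with a ByteSize of 40 each
payload_total = sum(payloads)
framed_total = estimated_framed_bytes(payloads)
overhead_pct = (framed_total - payload_total) * 100.0 / payload_total
print(payload_total, framed_total, overhead_pct)
```

For small messages the fixed framing is a large share of the total, which is why bytes_out_per_success understates network cost most for small widget updates.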