hf-perftest
// Benchmark Gradio app performance. Use when the user wants to load test a Gradio app, compare branches, profile a HF Space, or run A/B tests.
| name | hf-perftest |
| description | Benchmark Gradio app performance. Use when the user wants to load test a Gradio app, compare branches, profile a HF Space, or run A/B tests. |
pip install hf-perftest
hf-perftest list-apps
Built-in apps can be used by name (no file path needed): echo_text, file_heavy, image_to_image, streaming_chat, stateful_counter, llm_chat, text_to_image, audio_to_audio, video_to_video.
hf-perftest run \
--app echo_text \
--tiers 1,10,100 \
--requests-per-user 10 \
--output-dir results
Key options:
--app: Built-in app name or path to a Gradio app file (required)
--tiers: Comma-separated concurrent user counts (default: 1,10,100)
--requests-per-user: Rounds per tier (default: 10)
--mode burst|wave: Simultaneous or staggered requests (default: burst)
--concurrency-limit: App concurrency limit (default: 1, "none" for unlimited)
--mixed-traffic: Add background page loads, uploads, and downloads alongside predictions
--num-workers: Number of Gradio workers via GRADIO_NUM_WORKERS (default: 1)
--port: App port (default: 7860, auto-increments if occupied)
--api-name: Target API endpoint (auto-detected if omitted)
Single branch:
hf-perftest run-remote run \
--apps echo_text streaming_chat \
--branch main \
--hardware cpu-upgrade \
--tiers 1,10,100
A/B test:
hf-perftest run-remote ab \
--apps echo_text file_heavy \
--base main \
--branch my-optimization \
--hardware cpu-upgrade \
--tiers 1,10,100
Profile a HF Space:
hf-perftest run-remote run \
--apps owner/space-name \
--sidecar prompts.json \
--api-name /generate \
--branch main \
--hardware gpu-l4-1
Additional remote options:
--hardware: HF Jobs flavor (default: cpu-basic)
--sidecar: Prompt files for spaces
--timeout: Job timeout (default: 90m)
--dry-run: Preview without submitting
--run-name: Label for the run
hf-perftest result-schema
Prints the directory structure of benchmark results.
For apps with non-text inputs, create a .prompts.json sidecar file. Two formats:
String list (text-only inputs — replaces the first text input):
["A cat sitting on a windowsill", "Sunset over a mountain lake"]
List of lists (full data payloads — sent as-is):
[
["A cat sitting on a windowsill", 1024, 1024, 4, 42, true],
["Sunset over a mountain lake", 1024, 1024, 4, 42, true]
]
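Either format can be generated from a script instead of written by hand. The following is a minimal sketch, not part of hf-perftest: the helper name `write_sidecar` and the file name in the usage note are hypothetical, and the checks that reject an empty list or a mix of the two formats are assumptions, since the text above describes only the two pure forms.

```python
import json
from pathlib import Path


def write_sidecar(path, prompts):
    """Write prompts to a JSON sidecar file and report which format was used.

    Hypothetical helper: it accepts a list of strings or a list of lists and
    rejects anything else (the rejection rules are assumptions, not tool behavior).
    """
    if not isinstance(prompts, list) or not prompts:
        raise ValueError("prompts must be a non-empty list")
    all_strings = all(isinstance(p, str) for p in prompts)
    all_lists = all(isinstance(p, list) for p in prompts)
    if not (all_strings or all_lists):
        raise ValueError("use a list of strings or a list of lists, not a mix")
    # json.dumps turns Python True/False into JSON true/false.
    Path(path).write_text(json.dumps(prompts, indent=2), encoding="utf-8")
    return "strings" if all_strings else "lists"
```

For example, `write_sidecar("my_app.prompts.json", ["A cat sitting on a windowsill"])` would produce the string-list form, while passing rows such as `["Sunset over a mountain lake", 1024, 1024, 4, 42, True]` would produce the list-of-lists form.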
Results are saved to <output-dir>/<timestamp>/summary.json with per-tier breakdowns:
client_summary: p50/p90/p95/p99 client latency in ms, success rate
server_summary: Per-phase server timing (queue_wait, preprocess, fn_call, postprocess, total)
background_traffic: (if --mixed-traffic) p50/p90/p99 for page loads, uploads, downloads
Always validate with multiple runs, since single runs may be affected by system variance.
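Since each run writes to its own <output-dir>/<timestamp>/summary.json, several runs can be gathered in one place for comparison. The sketch below is an illustration only: `load_summaries` is a hypothetical helper, and it deliberately assumes nothing about the layout inside summary.json beyond it being valid JSON.

```python
import json
from pathlib import Path


def load_summaries(output_dir):
    """Return {run_directory_name: parsed summary.json} for every run found.

    Hypothetical helper; relies only on the <output-dir>/<timestamp>/summary.json
    path layout and does not interpret the JSON contents.
    """
    summaries = {}
    # Match exactly one directory level below output_dir; sorted for stable order.
    for path in sorted(Path(output_dir).glob("*/summary.json")):
        with path.open(encoding="utf-8") as fh:
            summaries[path.parent.name] = json.load(fh)
    return summaries
```

Calling `load_summaries("results")` after a few runs with `--output-dir results` would give one entry per run directory, which can then be inspected side by side.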
hf jobs logs <job_id>
hf jobs inspect <job_id>