roboflow-inference
// Deployment option comparison (serverless, dedicated, self-hosted, batch) and Workflow execution patterns. For raw API URL patterns, auth, and request/response formats, see roboflow-api-reference.
| name | roboflow-inference |
| description | Deployment option comparison (serverless, dedicated, self-hosted, batch) and Workflow execution patterns. For raw API URL patterns, auth, and request/response formats, see roboflow-api-reference. |
For agents (source of truth): This skill is authored in `roboflow/computer-vision-skills` and shipped with the Roboflow plugin. If your client has loaded the plugin (you'll see `roboflow:<name>` skills in your available skills list), use those local skills; they're read fresh from disk every session. The same content served as MCP resources at `roboflow://skills/<name>/...` is a fallback for clients without the plugin and may lag this repo. Don't call `ReadMcpResourceTool` for `roboflow://skills/...` URIs when a local `roboflow:<name>` skill is available.
Tip: If you're connected to the Roboflow MCP server, prefer its inference tools over raw HTTP, since auth is handled for you. For workflows the headline tool is `workflows_run`, which runs a saved workflow by `workflow_id` (the workflow URL slug); the workspace is inferred from the API key (see Finding your workspace slug). For single-model calls use `models_infer`. `workflow_specs_run` and `workflow_specs_validate` exist for the narrow inline-spec exceptions described under "Authoring Workflows" below.
Prefer Workflows over direct model inference. Workflows let you chain model + visualization + logic blocks in one call. Direct `models_infer` returns JSON only: no annotated images, and instance segmentation responses can be very large. See workflows and workflow-templates.
Authoring Workflows: don't paste JSON into chat or scripts. Workflows are authored on the Roboflow platform (storage, versioning, and retrieval go through the platform) and run from code by identifier. There are two authoring modes; propose one or infer the right one from session context, and never pick silently:

- Mode A: Agent-driven (MCP, in-session). For demos, previews, or when the user is committed to in-session "vibe coding". The agent designs the blocks, uses the MCP authoring tools to create and save the workflow on the platform during the session (ground the design with `workflow_blocks_list` / `workflow_blocks_get_schema`; validate with `workflow_specs_validate`), then runs it.
- Mode B: Platform-driven (Roboflow app + in-app agent). The better default for non-trivial or sophisticated cases, when the user prefers visual iteration, when they aren't committed to agent-driven authoring this session, or as the fallback when Mode A hits an issue. The agent proposes the block design and hands the user a link to the Workflows builder; the user builds (manually or with the more context-grounded in-app agent), tests in the preview, saves, and shares the workspace and workflow URL slugs back (both are visible in the builder URL: `app.roboflow.com/<workspace-slug>/workflows/<workflow-slug>`).

Either mode lands at the same run path: `workflows_run` (MCP) or `client.run_workflow(workspace_name=..., workflow_id=...)` (SDK). Inline specs (`workflow_specs_run`) are an exception, not a default: use them only when the user explicitly asks for a throwaway run, and validate the spec first with `workflow_specs_validate`. See workflows "Authoring & Deployment" for the full flow.
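As a minimal sketch of the SDK run path, the helper below calls `run_workflow` by identifier. It assumes the `inference-sdk` package's `InferenceHTTPClient`, the serverless endpoint `https://serverless.roboflow.com`, an API key in a `ROBOFLOW_API_KEY` environment variable, and a workflow whose image input is named `image`. The helper names and the slugs in the usage comment are hypothetical.

```python
import os


def make_client(api_url="https://serverless.roboflow.com"):
    """Build an SDK client; the import is deferred so this module loads without the SDK."""
    from inference_sdk import InferenceHTTPClient  # assumes `pip install inference-sdk`

    return InferenceHTTPClient(api_url=api_url, api_key=os.environ["ROBOFLOW_API_KEY"])


def run_saved_workflow(client, workspace_name, workflow_id, image, parameters=None):
    """Run a workflow saved on the platform, addressed by its two URL slugs."""
    return client.run_workflow(
        workspace_name=workspace_name,
        workflow_id=workflow_id,
        images={"image": image},  # assumption: the workflow's image input is named "image"
        parameters=parameters or {},
    )


# Usage (hypothetical slugs and file name):
# result = run_saved_workflow(make_client(), "my-workspace", "my-workflow", "photo.jpg")
```

Passing the client in as an argument keeps the call site identical across deployment targets; only the `api_url` given to `make_client` would change.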
For live video (webcam, RTSP, file): the MCP `workflows_run` tool only handles single static images. For live video, present the user with three options (don't pick one silently): (A) WebRTC → serverless GPU, (B) WebRTC → local `inference` server, or (C) in-process `InferencePipeline`. They have different setup costs, dependency sizes, and latency characteristics, so surface a brief one-line summary of each and let the user choose. See `roboflow://skills/inference/workflows` ("Video Stream" section) for full code and the comparison table.
| Option | Best For | Latency | Scaling | Cost Model | GPU |
|---|---|---|---|---|---|
| Serverless | Getting started, variable traffic | Low | Auto | Per-inference credit | Yes |
| Dedicated | Predictable workloads, low latency | Very low | Manual/autoscale | Per-hour credits | Optional |
| Self-hosted | Full control, edge | Hardware-dependent | Manual | Metered + infra cost | Optional |
| Batch Processing | Large offline datasets, videos | Async (minutes-hours) | Auto-provisioned | Per-job | Optional |
Serverless -- default choice. Zero setup, auto-scales, 20MB upload limit. Use models_infer or workflows_run MCP tools.
Dedicated -- need consistent latency, large models (Florence 2), or high throughput. Development and production tiers available. Subdomain: <name>.roboflow.cloud.
Self-hosted -- deploy Roboflow Inference via Docker on your own hardware (Jetson, cloud VMs, RPi). Same API surface as serverless -- just change api_url.
Batch Processing -- runs a Workflow on uploaded images/videos asynchronously. No real-time requirement. Results delivered as JSON.
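Because the self-hosted server exposes the same API surface as serverless, switching between the three request/response targets can come down to one URL. The sketch below is illustrative only: `api_url_for` is a hypothetical helper, and `https://serverless.roboflow.com` and the local default `http://localhost:9001` are assumptions not stated in this document. The dedicated pattern `<name>.roboflow.cloud` is taken from the text above.

```python
def api_url_for(option, dedicated_name=None, host="localhost", port=9001):
    """Map a deployment option to the api_url a client would be pointed at."""
    option = option.lower()
    if option == "serverless":
        return "https://serverless.roboflow.com"  # assumed serverless endpoint
    if option == "dedicated":
        if not dedicated_name:
            raise ValueError("dedicated deployments need their subdomain name")
        return f"https://{dedicated_name}.roboflow.cloud"
    if option == "self-hosted":
        return f"http://{host}:{port}"  # assumed default address of a local server
    # Batch Processing is job-based, so it has no per-request api_url here.
    raise ValueError(f"no api_url for deployment option: {option!r}")
```

The rest of the calling code stays the same whichever URL is returned, which is the practical meaning of "just change api_url".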
Real-time video (webcam/RTSP/file) -- three deployment options; ask the user which one before writing code:

- WebRTC → serverless GPU (`webrtc-gpu-small/medium/large`).
- WebRTC → local `inference` server: `pip install inference-cli && inference server start` (Docker recommended); lowest latency, isolates the heavy CV/model deps inside the server.
- `InferencePipeline` in-process: `pip install inference` in a venv (prefer `uv`); runs the workflow loop directly in the user's Python process, no separate server. Heavy deps (torch, opencv, onnxruntime) install locally.

All three have a slower first run (model download / warmup) before subsequent runs hit cached state. Tell the user this so they don't think the script is hung.

See `roboflow://skills/inference/workflows` ("Video Stream" section) for full code and a comparison table.

| Tool | Purpose |
|---|---|
| `models_list` | List trained models for a project |
| `models_get` | Get details for a trained model |
| `models_infer` | Run single-model inference on one image via serverless API |
| `models_train` | Start training a model on a dataset version |
| `models_get_training_status` | Check training progress and metrics |
| `workflows_run` | Preferred. Run a saved workflow by `workflow_id` (the workflow URL slug; workspace is inferred from the API key; see Finding your workspace slug). Optional parameters. |
| `workflow_specs_validate` | Validate an inline workflow spec without running it; use before any inline run. |
| `workflow_specs_run` | Exception only. Run an inline workflow spec, for explicit throwaway runs the user asked for. |
For most operations, prefer the Roboflow MCP tools above — they handle auth and need nothing installed locally. Reach for local Python packages only for the gaps: integration scripts (inference-sdk), Batch Processing / Data Staging (inference-cli), the self-hosted server (inference-cli), and asset scripts that need typed Python objects.
See local-tooling for what to install for which use case, the recommended uv-based env setup, conda / venv fallbacks, and common pitfalls.
For canonical response shapes (object detection, classification, segmentation, keypoint) with all fields including class_id, detection_id, class_confidence, see roboflow://skills/api-reference/inference.
Instance segmentation points arrays are the main culprit for bloated responses. Each detection includes a polygon with potentially hundreds of coordinate pairs. A single image with many detections can return megabytes of JSON.
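To see how much of a response the polygons account for, a rough client-side measurement can help. This sketch assumes a response shaped like `{"predictions": [{..., "points": [{"x": ..., "y": ...}, ...]}]}`; `strip_points` and `polygon_share` are hypothetical helpers, not part of any Roboflow package.

```python
import json


def strip_points(response):
    """Return a copy of the response whose predictions carry no polygon `points`."""
    slim = dict(response)
    slim["predictions"] = [
        {key: value for key, value in pred.items() if key != "points"}
        for pred in response.get("predictions", [])
    ]
    return slim


def polygon_share(response):
    """Fraction of the serialized JSON size that the `points` arrays account for."""
    full = len(json.dumps(response))
    slim = len(json.dumps(strip_points(response)))
    return (full - slim) / full
```

`strip_points` builds new prediction dicts and leaves the original response untouched, so the full polygons remain available if a later step needs them.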
Mitigation strategies:
- Use `class_filter` to only return classes you need
- Drop the `points` array when you only need bounding boxes

Workflow image outputs are a second culprit. Visualization blocks (bounding box, polygon, mask, label, halo, …) emit rendered images as base64-encoded blobs inside the response; a 720p annotated frame is hundreds of KB of JSON-escaped string. When you call `workflows_run` / `workflow_specs_run` via MCP, this routinely overflows the tool-result token budget. Decode every image-shaped output (`{"type": "base64", "value": "..."}`) and write it to disk instead of carrying it through agent context. Don't hard-code field names: the output keys are whatever the workflow author declared via `JsonField`, so iterate `output.keys()` and shape-check.
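The decode-and-write step can be sketched as follows. It is a minimal example that assumes `output` is the result dict for one input image; the `.jpg` suffix, the `{"type": "file", ...}` placeholder, and the helper names are hypothetical choices, and output keys are assumed to be safe to use as file names.

```python
import base64
from pathlib import Path


def is_image_output(value):
    """Shape check: a dict like {"type": "base64", "value": "<data>"}."""
    return (
        isinstance(value, dict)
        and value.get("type") == "base64"
        and isinstance(value.get("value"), str)
    )


def offload_images(output, out_dir, suffix=".jpg"):
    """Write image-shaped outputs to disk and replace them with their file paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    slim = {}
    for key in output.keys():  # keys are whatever the workflow author declared
        value = output[key]
        if is_image_output(value):
            path = out_dir / f"{key}{suffix}"  # assumption: the image format is not stated
            path.write_bytes(base64.b64decode(value["value"]))
            slim[key] = {"type": "file", "path": str(path)}
        else:
            slim[key] = value
    return slim
```

The returned dict keeps every non-image field as-is, so only short file paths travel through agent context in place of the base64 payloads.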
What it is. A Roboflow-managed cloud service that runs a Workflow over a batch of images or videos asynchronously, provisioning the infrastructure for you. "Ideal for asynchronously processing large amounts of data." — Roboflow docs.
Problem it solves. Bulk inference over thousands to millions of files without standing up your own GPUs, queues, or autoscaler. You hand Roboflow a Workflow plus a batch of inputs, pay per job, and get JSON results back when the job finishes.
Pick it when the data is stored (not live), per-file cost matters more than per-file latency, and minutes-to-hours per job is acceptable. Pick something else when you need real-time per-request results (use Serverless or Dedicated) or air-gapped/on-prem processing (use Self-hosted).
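One way to read these selection rules is as a small decision function. This is only a sketch of the guidance in this document, not an official rule set: `choose_deployment`, its parameters, and the precedence order are assumptions made for illustration.

```python
def choose_deployment(data_is_stored, needs_realtime, air_gapped=False, predictable_load=False):
    """Suggest a deployment option from a few coarse requirements."""
    if air_gapped:
        return "self-hosted"  # air-gapped / on-prem processing
    if data_is_stored and not needs_realtime:
        return "batch"  # stored data, minutes-to-hours per job is acceptable
    # Real-time per-request results: dedicated for predictable workloads, else serverless.
    return "dedicated" if predictable_load else "serverless"
```

For example, `choose_deployment(data_is_stored=True, needs_realtime=False)` returns `"batch"`, while the same call with `needs_realtime=True` falls through to `"serverless"`.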
Surfaces: Roboflow web UI, inference rf-cloud CLI, and REST API.
The inference rf-cloud CLI exposes two subcommand groups: data-staging (manage input/output batches) and batch-processing (submit and monitor jobs). Run any command with --help for the full option list.
Minimal end-to-end:
# Stage images
inference rf-cloud data-staging create-batch-of-images \
--images-dir ./my-images --batch-id my-batch
# Submit
inference rf-cloud batch-processing process-images-with-workflow \
--workflow-id my-workflow --batch-id my-batch
# -> prints JOB_ID
# Monitor
inference rf-cloud batch-processing show-job-details --job-id JOB_ID
# Export results
inference rf-cloud data-staging export-batch \
--target-dir ./results --batch-id OUTPUT_BATCH_ID
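If the same four steps need to run from a script, the commands can be assembled as argument lists and passed to `subprocess`. The sketch below uses only the subcommands and flags shown above; the helper names are hypothetical, and nothing is assumed about the CLI's output format, so the job ID and output batch ID still have to be read from what the CLI prints.

```python
import subprocess

CLI = ["inference", "rf-cloud"]


def stage_images_cmd(images_dir, batch_id):
    return CLI + ["data-staging", "create-batch-of-images",
                  "--images-dir", str(images_dir), "--batch-id", batch_id]


def submit_cmd(workflow_id, batch_id):
    return CLI + ["batch-processing", "process-images-with-workflow",
                  "--workflow-id", workflow_id, "--batch-id", batch_id]


def job_details_cmd(job_id):
    return CLI + ["batch-processing", "show-job-details", "--job-id", job_id]


def export_cmd(target_dir, batch_id):
    return CLI + ["data-staging", "export-batch",
                  "--target-dir", str(target_dir), "--batch-id", batch_id]


def run(cmd):
    """Run one CLI step and return its stdout; raises CalledProcessError on failure."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

Building argument lists instead of shell strings avoids quoting problems with paths that contain spaces, for example `run(stage_images_cmd("./my images", "my-batch"))`.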
Data Staging commands — see batch-staging for nuances (data sources, JSONL reference format, multipart batches, webhook notifications):
| Command | Purpose |
|---|---|
| `data-staging list-batches` | List staging batches in the workspace |
| `data-staging create-batch-of-images` | Create an input batch from a local directory, signed-URL JSONL, or cloud-storage path |
| `data-staging create-batch-of-videos` | Same as above, but for video files |
| `data-staging show-batch-details` | Show metadata for a single batch |
| `data-staging list-batch-content` | List file URLs in a batch (filter by part, write JSONL) |
| `data-staging list-ingest-details` | Per-shard ingest status for debugging URL ingests |
| `data-staging export-batch` | Download all files from a batch (e.g. job outputs) to a local directory |
Batch Processing (job) commands — see batch-jobs for nuances (compute configuration, workflow parameters, image-output persistence, aggregation format, video FPS, restarts, TRT compilation):
| Command | Purpose |
|---|---|
| `batch-processing list-jobs` | List jobs in the workspace |
| `batch-processing show-job-details` | Show stages and current status of a single job |
| `batch-processing process-images-with-workflow` | Submit an image-batch job |
| `batch-processing process-videos-with-workflow` | Submit a video-batch job |
| `batch-processing fetch-logs` | Fetch job logs (filter by severity, write JSONL) |
| `batch-processing abort-job` | Terminate a running job |
| `batch-processing restart-job` | Restart a failed job (optionally with new compute settings) |
| `batch-processing trt-compile` | Compile a model to TensorRT for one or more NVIDIA devices |
For pricing, see plans-and-pricing.

Full reference: Roboflow Batch Processing docs.