一键在 Manus 中运行任何 Skill

$pwd:

rt-vlm

Name: Rt Vlm
Author: NVIDIA-AI-Blueprints

// Use this skill when working with the RTVI VLM or RT-VLM microservice API on VSS 3.1. Generate dense captions and alerts for stored video files and live RTSP streams via `/v1/generate_captions_alerts`; upload media via `/v1/files`; add and remove live streams with `/v1/streams/add` and `/v1/streams/delete/{stream_id}`; call OpenAI-compatible `/v1/chat/completions`; consume Kafka caption, incident, and error topics; or debug rtvi-vlm responses. For deployment, read `references/deploy-rt-vlm-service.md` first.

在 Manus 中运行

$ git log --oneline --stat

stars:1,481

forks:315

updated:2026年5月7日 23:11

文件资源管理器

7 个文件

SKILL.md

readonly

related-skills.json

同仓库

deploy.md

from "NVIDIA-AI-Blueprints/video-search-and-summarization"

Deploy, debug, or tear down any VSS profile using a compose-centric workflow — config (dry-run) with env overrides, review resolved compose, then compose up. Use this skill when the user says "deploy vss", "deploy `profile`", "debug deploy", "verify deployment", or "why is my vss deploy broken".

2026-05-221.5k

video-summarization.md

from "NVIDIA-AI-Blueprints/video-search-and-summarization"

Summarize a video by calling the VLM NIM or the Long Video Summarization (LVS) microservice directly. For short videos (under 60s) call the VLM's OpenAI-compatible chat completions endpoint; for long videos (60s or longer) call the LVS microservice. Use when asked to summarize a video, describe what happens in a video, analyze a recording, call or debug LVS summarize/model/health/recommended-config/metrics endpoints, or configure and troubleshoot the LVS service that backs long-video summarization.

2026-05-071.5k

vios.md

from "NVIDIA-AI-Blueprints/video-search-and-summarization"

Query VIOS REST APIs: sensor list, recording timelines, video clip extraction, snapshot capture, add/delete sensors and streams

2026-05-071.5k

alerts.md

from "NVIDIA-AI-Blueprints/video-search-and-summarization"

Manage and monitor VSS alerts after the alerts profile is deployed. The deployment's mode (CV vs VLM real-time) is fixed at deploy time and determines the workflow — start/stop real-time alerts via the VSS Agent on a VLM deployment, onboard CV alerts by adding RTSP streams to VIOS on a CV deployment, query incidents, customize verifier prompts. Use when asked to start/stop a real-time alert, check or list alerts, add a camera, use a sample video for alerts, customize alert prompts, or view verdicts.

2026-05-071.5k

report.md

from "NVIDIA-AI-Blueprints/video-search-and-summarization"

Produce video analysis reports by discovering the deployed VSS agent, querying POST /generate for a timestamped captioned summary of the clip, then formatting the agent reply as the standard Video Analysis Report markdown.

2026-05-071.5k

video-analytics.md

from "NVIDIA-AI-Blueprints/video-search-and-summarization"

Query video analytics data and metrics from Elastic search via the VA-MCP server (port 9901). This includes incidents, alerts, sensor data, and metrics. Use for any question about violations, alerts, incidents, object counts, speeds, occupancy, or anything that requires looking up recorded events. This is the primary way to answer a question that requires incidents, alerts and other metrics such as people counts and violations.

2026-05-071.5k

package.json

"author": "NVIDIA-AI-Blueprints"

"repository": "NVIDIA-AI-Blueprints/video-search-and-summarization"

打开 GitHub 仓库查看创作者相关仓库

$ install --global

$ download --local

在 Manus 中运行

$ useful --forSOC

软件开发工程师计算机与数学类职业15-1252L4

name	rt-vlm
description	Use this skill when working with the RTVI VLM or RT-VLM microservice API on VSS 3.1. Generate dense captions and alerts for stored video files and live RTSP streams via `/v1/generate_captions_alerts`; upload media via `/v1/files`; add and remove live streams with `/v1/streams/add` and `/v1/streams/delete/{stream_id}`; call OpenAI-compatible `/v1/chat/completions`; consume Kafka caption, incident, and error topics; or debug rtvi-vlm responses. For deployment, read `references/deploy-rt-vlm-service.md` first.
license	Apache-2.0
metadata	{"version":"3.1.0","github-url":"https://github.com/NVIDIA-AI-Blueprints/video-search-and-summarization","tags":"nvidia blueprint operational deployment"}

RTVI VLM Usage API (VSS 3.1)

RTVI VLM is NVIDIA's real-time vision-language microservice: decode video (file or RTSP) → segment into chunks → run a VLM (cosmos-reason1, cosmos-reason2, or any OpenAI-compatible model) → stream dense captions back over SSE/HTTP and publish captions + incident alerts + errors to Kafka. Use this skill whenever you need to hit any /v1/... endpoint on the VSS 3.1 rtvi-vlm microservice: caption generation, file upload, live-stream management, health checks, NIM-compatible chat completions, Prometheus metrics. API reference: https://docs.nvidia.com/vss/latest/real-time-vlm-api.html.

Setup

export BASE_URL="http://localhost:8000"     # RTVI VLM host:port — matches $RTVI_VLM_PORT in compose
export API_KEY="$NGC_API_KEY"               # Bearer token (NGC key works if the service was deployed with NGC auth)

Every request below uses Authorization: Bearer $API_KEY. Health endpoints (/v1/health/*, /v1/ready, /v1/live, /v1/startup) typically work without auth.

Smoke test before use:

curl -fsS "$BASE_URL/v1/health/ready" && curl -fsS "$BASE_URL/v1/models" | jq

Quick Start — dense captions from a local video

# 1. Upload the video, capture its file id
FILE_ID=$(curl -fsS -X POST "$BASE_URL/v1/files" \
  -H "Authorization: Bearer $API_KEY" \
  -F "file=@/path/to/warehouse.mp4" \
  -F "purpose=vision" \
  -F "media_type=video" | jq -r '.id')

# 2. Generate captions + alerts (SSE stream of chunked responses)
curl -N -X POST "$BASE_URL/v1/generate_captions_alerts" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"id\": \"$FILE_ID\",
    \"prompt\": \"Write a concise dense caption for each 10-second segment of this warehouse video.\",
    \"model\": \"cosmos-reason1\",
    \"chunk_duration\": 10,
    \"stream\": true
  }"

Endpoints

Captions

Generate VLM captions and alerts for videos and live streams.

`POST /v1/generate_captions_alerts` — Generate VLM captions (and alerts) for video/stream

Required:

Field	Type	Description
`id`	string \| array	UUID of a previously-uploaded file, or id of an active live stream. Accepts a list of ids for batch
`prompt`	string	User prompt to the VLM (e.g. dense-caption instruction)
`model`	string	Model name — see `GET /v1/models`

Key optional fields:

Field	Type	Default	Description
`system_prompt`	string	—	System prompt; use `<think></think><answer></answer>` tags to enable reasoning on Cosmos Reason
`enable_reasoning`	boolean	false	Turn on reasoning for Cosmos Reason models
`enable_audio`	boolean	false	Transcribe audio (via Riva) and fold into captions
`chunk_duration`	integer	—	Segment video into N-second chunks (`0` = no chunking)
`chunk_overlap_duration`	integer	0	Overlap between consecutive chunks
`num_frames_per_second_or_fixed_frames_chunk`	number	—	FPS (if `use_fps_for_chunking=true`) or fixed frames per chunk
`use_fps_for_chunking`	boolean	false	Interpret above as FPS vs. fixed-frame count
`vlm_input_width` / `vlm_input_height`	int	—	Resize frames before inference (0 = native)
`media_info`	object	—	`{"start_offset_ms": ..., "end_offset_ms": ...}` to process a slice of a file (not live streams)
`stream`	boolean	false	SSE: emit per-chunk caption deltas as `data:` events (recommended for long videos)
`max_tokens` / `temperature` / `top_p` / `top_k` / `seed` / `ignore_eos`			Standard sampling controls
`response_format`	object	—	Query response format object
`mm_processor_kwargs`	object	—	Extra kwargs for the multimodal processor (e.g. size, shortest/longest edge)

curl -N -X POST "$BASE_URL/v1/generate_captions_alerts" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "id": "123e4567-e89b-12d3-a456-426614174000",
    "prompt": "Dense-caption this warehouse video, one sentence per 10s chunk.",
    "model": "cosmos-reason1",
    "chunk_duration": 10,
    "stream": true
  }'

Response (200, SSE when stream=true): each event payload has start_ts, end_ts, content, and a terminal {"status": "completed"} event. Response (200, non-stream): { "id", "object": "caption", "choices": [{...}], "usage": {...} }.

`DELETE /v1/generate_captions_alerts/{stream_id}` — Stop caption generation for a live stream

Stops inference while leaving the stream registered. Pair with DELETE /v1/streams/delete/{stream_id} to also un-register the RTSP source.

curl -X DELETE "$BASE_URL/v1/generate_captions_alerts/$STREAM_ID" -H "Authorization: Bearer $API_KEY"

Files

Upload and manage media files consumed by /v1/generate_captions_alerts.

`POST /v1/files` — Upload a media file (multipart)

curl -X POST "$BASE_URL/v1/files" -H "Authorization: Bearer $API_KEY" \
  -F "file=@./video.mp4" -F "purpose=vision" -F "media_type=video"

Response: { "id", "object": "file", "bytes", "created_at", "filename", "purpose" }.

`GET /v1/files?purpose=vision` — List uploaded files

`GET /v1/files/{file_id}` — File metadata

`GET /v1/files/{file_id}/content` — Download original file content

`DELETE /v1/files/{file_id}` — Delete file (releases asset storage)

Live Stream

RTSP stream lifecycle.

`POST /v1/streams/add` — Register one or more RTSP streams

Required per stream: liveStreamUrl (must start with rtsp://), description. Optional: username, password, sensor_name, and placement metadata (place_name, place_type, place_lat, place_lon, place_alt, place_coordinate_x, place_coordinate_y).

STREAM_ID=$(curl -fsS -X POST "$BASE_URL/v1/streams/add" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d '{"streams":[{"liveStreamUrl":"rtsp://cam:8554/live","description":"warehouse cam 1"}]}' \
  | jq -r '.results[0].id')

`GET /v1/streams/get-stream-info` — List active streams

`DELETE /v1/streams/delete/{stream_id}` — Remove a single stream

`DELETE /v1/streams/delete-batch` — Remove many (`{"stream_ids":[...]}`)

NIM Compatible

OpenAI-compatible endpoints for interop with OpenAI/NVIDIA-API clients.

`POST /v1/chat/completions` — OpenAI-compatible chat (text + multimodal)

Required: messages, model. Text-only requests omit id / video_url / image_url.

curl -X POST "$BASE_URL/v1/chat/completions" -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"cosmos-reason1","messages":[{"role":"user","content":"Summarize this scene."}]}'

`POST /v1/completions` — OpenAI-compatible legacy completions

`GET /v1/version` — `{ "version": "3.1.0-..." }`

`GET /v1/license` — license text

`GET /v1/manifest` — NIM manifest

`GET /v1/health/live` · `GET /v1/health/ready` — NIM-style probes

Models · Metadata · Metrics · Health Check

`GET /v1/models` — List loaded VLMs: `{ "data": [{ "id", "object": "model", "owned_by" }] }`

`GET /v1/metadata` — Service metadata (build, release, image tag)

`GET /v1/metrics` — Prometheus metrics (plain text)

`GET /v1/ready` · `GET /v1/live` · `GET /v1/startup` — Kubernetes-style probes

Common Workflows

The four scenarios from the VSS 3.1 RT-VLM Usage Skill requirements.

1. Dense captions from a stored video file

# Upload → capture file id → generate captions (SSE stream)
FILE_ID=$(curl -fsS -X POST "$BASE_URL/v1/files" \
  -H "Authorization: Bearer $API_KEY" \
  -F "file=@warehouse.mp4" -F "purpose=vision" -F "media_type=video" | jq -r '.id')

curl -N -X POST "$BASE_URL/v1/generate_captions_alerts" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d "{
    \"id\": \"$FILE_ID\",
    \"prompt\": \"Describe warehouse events in 1 sentence per 10s chunk.\",
    \"model\": \"cosmos-reason1\",
    \"chunk_duration\": 10,
    \"stream\": true
  }"

# When done, free storage:
curl -X DELETE "$BASE_URL/v1/files/$FILE_ID" -H "Authorization: Bearer $API_KEY"

2. Dense captions from an RTSP live stream

# Register the stream
STREAM_ID=$(curl -fsS -X POST "$BASE_URL/v1/streams/add" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d '{"streams":[{"liveStreamUrl":"rtsp://10.0.0.5:8554/warehouse","description":"warehouse cam"}]}' \
  | jq -r '.results[0].id')

# Start continuous caption generation (runs until stream stops or DELETE)
curl -N -X POST "$BASE_URL/v1/generate_captions_alerts" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d "{
    \"id\": \"$STREAM_ID\",
    \"prompt\": \"Describe each event; start each sentence with a timestamp.\",
    \"model\": \"cosmos-reason1\",
    \"chunk_duration\": 10,
    \"num_frames_per_second_or_fixed_frames_chunk\": 2,
    \"use_fps_for_chunking\": true,
    \"stream\": true
  }" &

# Tear down when finished:
curl -X DELETE "$BASE_URL/v1/generate_captions_alerts/$STREAM_ID" -H "Authorization: Bearer $API_KEY"
curl -X DELETE "$BASE_URL/v1/streams/delete/$STREAM_ID"  -H "Authorization: Bearer $API_KEY"

Pre-req: the container was started with:

RTVI_VLM_KAFKA_ENABLED=true

RTVI_VLM_KAFKA_TOPIC=vision-llm-messages

RTVI_VLM_KAFKA_INCIDENT_TOPIC=vision-llm-events-incidents

RTVI_VLM_ERROR_MESSAGE_TOPIC=vision-llm-errors

HOST_IP=

STREAM_ID=$(curl -fsS -X POST "$BASE_URL/v1/streams/add"
-H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json"
-d '{"streams":[{"liveStreamUrl":"rtsp://10.0.0.5:8554/warehouse","description":"warehouse cam"}]}'
| jq -r '.results[0].id')

curl -N -X POST "$BASE_URL/v1/generate_captions_alerts"
-H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json"
-d "{ "id": "$STREAM_ID", "prompt": "You are a warehouse monitoring system. Describe the scene in one sentence, then on a new line output exactly:\nAnomaly Detected: Yes/No\nReason: \nFlag an anomaly if any worker is missing a hard hat or high-vis vest.", "system_prompt": "Answer the user's question correctly in yes or no.", "model": "cosmos-reason2", "chunk_duration": 60, "chunk_overlap_duration": 10, "stream": true }"


**Consume alerts from Kafka** (when using the VSS foundational Kafka container).
Kafka values are NvSchema protobuf payloads, so use `print.value=false` for a
clean validation pass that shows timestamp, key, and headers without dumping
binary payload bytes:
```bash
docker exec mdx-kafka kafka-console-consumer \
  --bootstrap-server 127.0.0.1:9092 \
  --topic vision-llm-events-incidents \
  --from-beginning \
  --timeout-ms 5000 \
  --max-messages 10 \
  --property print.timestamp=true \
  --property print.key=true \
  --property print.headers=true \
  --property print.value=false

If Kafka is not running in the VSS mdx-kafka container, use the Kafka CLI from the host running the broker:

kafka-console-consumer \
  --bootstrap-server "$HOST_IP:9092" \
  --topic vision-llm-events-incidents \
  --from-beginning \
  --timeout-ms 5000 \
  --max-messages 10 \
  --property print.timestamp=true \
  --property print.key=true \
  --property print.headers=true \
  --property print.value=false

Incident protobuf (ext.proto :: Incident) key fields: sensorId, timestamp, end, objectIds, frameIds, place, analyticsModule, category, isAnomaly (true for alerts), llm (nested VisionLLM), info map including triggerPhrase, verdict, requestId, chunkIdx, streamId, alertCategory (if the deployment supports the alert_category query field — post-3.1).

3. Kafka workflows (alerts + message bus)

Dense captioning with alerts on an RTSP stream and the HTTP-vs-Kafka response model are documented in references/kafka-workflows.md.

Error Reference

Code	Meaning	Common Cause
400	Bad Request	Missing required field (`id`, `prompt`, `model`); unsupported `media_type`; unknown `model` name
401	Unauthorized	Missing/invalid `Authorization: Bearer $API_KEY` — or wrong key format (expect `nvapi-...`)
404	Not Found	`file_id` deleted / stream_id not registered / wrong endpoint path (note: `{stream_id}` is required on `DELETE /v1/streams/delete/{stream_id}`)
413	Payload Too Large	Uploaded file exceeds server `MAX_FILE_SIZE`; increase or pre-chunk the video
422	Unprocessable Entity	Pydantic schema violation — e.g. `use_fps_for_chunking=true` without `num_frames_per_second_or_fixed_frames_chunk`; stream ids supplied to a file-only field like `media_info`
429	Rate Limited	Too many concurrent streams — raise `VLM_BATCH_SIZE` or spread across instances
500	Server Error	VLM inference exception (OOM, model unavailable) — check `docker logs rtvi-vlm-*`
503	Service Busy	Startup not complete (model still downloading) or upstream NIM dependency unhealthy

Gotchas

3.1 GA endpoint is /v1/generate_captions_alerts, not /v1/generate_captions. The rename lands in a post-3.1 build. For VSS 3.1 releases (rtvi_vlm/26.01.x–26.02.3), always use the _alerts suffix. https://docs.nvidia.com/vss/latest/real-time-vlm-api.html is the canonical reference.
No URL-based input in 3.1 GA — the url/media_type/creation_time fields were added post-3.1. You must upload via POST /v1/files first and then pass the returned id.
Alert trigger = the tokens "yes" or "true" in the VLM response (case-insensitive). There is no per-request alert flag. Design prompts with an explicit Anomaly Detected: Yes/No line and set system_prompt to constrain the model to Yes/No answers (per the VSS docs). Every chunk is published to KAFKA_TOPIC; matched chunks additionally go to KAFKA_INCIDENT_TOPIC with isAnomaly=true, info["triggerPhrase"] set to the matched tokens, and info["verdict"]="confirmed".
No alert_category query field in the 3.1 OpenAPI spec. The Kafka incident topic defaults incident.category = "vlm-alert" on 3.1. Post-3.1 builds expose an optional alert_category request field to override incident.category.
Kafka topics are server-side config, not per-request. The KAFKA_* env vars (via compose RTVI_VLM_KAFKA_* rewrites) are fixed at container start — clients can't override topics on a per-request basis. Kafka publish is additive to the HTTP response, never a replacement.
stream=true returns Server-Sent Events, not chunked JSON. Use curl -N (no buffering). Each event is data: {"content": "...", "start_ts": ..., "end_ts": ...}\n\n, terminated by data: {"status":"completed"}\n\n. Without stream=true the server buffers until the full video is processed — fine for short clips (<1 min), avoid for live streams.
chunk_duration=0 disables chunking — the entire video is sent to the VLM as one shot. Only meaningful for short clips; long videos will OOM or exceed max_model_len.
Default frame budget caps at VLLM_MM_PROCESSOR_VIDEO_NUM_FRAMES (256). Requesting FPS that implies >256 frames per chunk is silently capped; drop FPS or shorten chunk_duration to stay within budget.
enable_reasoning requires a Cosmos Reason model. Passing it with Qwen3-VL or other non-reasoning models is a no-op.
/v1/metrics requires auth, unlike /v1/health/*. Prometheus scrapers need the Bearer token.
File upload is multipart, not JSON. Use -F file=@path -F purpose=vision -F media_type=video; a -d body returns 422.
Live-stream lifecycle requires two deletes to fully tear down: DELETE /v1/generate_captions_alerts/{stream_id} stops inference; DELETE /v1/streams/delete/{stream_id} un-registers the stream. Skipping the second leaks RTSP connection resources.

rt-vlm

同仓库更多 Skills

同仓库更多 Skills

RTVI VLM Usage API (VSS 3.1)

Setup

Quick Start — dense captions from a local video

Endpoints

Captions

POST /v1/generate_captions_alerts — Generate VLM captions (and alerts) for video/stream

DELETE /v1/generate_captions_alerts/{stream_id} — Stop caption generation for a live stream

Files

POST /v1/files — Upload a media file (multipart)

GET /v1/files?purpose=vision — List uploaded files

GET /v1/files/{file_id} — File metadata

GET /v1/files/{file_id}/content — Download original file content

DELETE /v1/files/{file_id} — Delete file (releases asset storage)

Live Stream

POST /v1/streams/add — Register one or more RTSP streams

GET /v1/streams/get-stream-info — List active streams

DELETE /v1/streams/delete/{stream_id} — Remove a single stream

DELETE /v1/streams/delete-batch — Remove many ({"stream_ids":[...]})

NIM Compatible

POST /v1/chat/completions — OpenAI-compatible chat (text + multimodal)

POST /v1/completions — OpenAI-compatible legacy completions

GET /v1/version — { "version": "3.1.0-..." }

GET /v1/license — license text

GET /v1/manifest — NIM manifest

GET /v1/health/live · GET /v1/health/ready — NIM-style probes

Models · Metadata · Metrics · Health Check

GET /v1/models — List loaded VLMs: { "data": [{ "id", "object": "model", "owned_by" }] }

GET /v1/metadata — Service metadata (build, release, image tag)

GET /v1/metrics — Prometheus metrics (plain text)

GET /v1/ready · GET /v1/live · GET /v1/startup — Kubernetes-style probes

Common Workflows

1. Dense captions from a stored video file

2. Dense captions from an RTSP live stream

Pre-req: the container was started with:

RTVI_VLM_KAFKA_ENABLED=true

RTVI_VLM_KAFKA_TOPIC=vision-llm-messages

RTVI_VLM_KAFKA_INCIDENT_TOPIC=vision-llm-events-incidents

RTVI_VLM_ERROR_MESSAGE_TOPIC=vision-llm-errors

HOST_IP=

3. Kafka workflows (alerts + message bus)

Error Reference

Gotchas

RTVI VLM Usage API (VSS 3.1)

Setup

Quick Start — dense captions from a local video

Endpoints

Captions

POST /v1/generate_captions_alerts — Generate VLM captions (and alerts) for video/stream

DELETE /v1/generate_captions_alerts/{stream_id} — Stop caption generation for a live stream

Files

POST /v1/files — Upload a media file (multipart)

GET /v1/files?purpose=vision — List uploaded files

GET /v1/files/{file_id} — File metadata

GET /v1/files/{file_id}/content — Download original file content

DELETE /v1/files/{file_id} — Delete file (releases asset storage)

Live Stream

POST /v1/streams/add — Register one or more RTSP streams

GET /v1/streams/get-stream-info — List active streams

DELETE /v1/streams/delete/{stream_id} — Remove a single stream

DELETE /v1/streams/delete-batch — Remove many ({"stream_ids":[...]})

NIM Compatible

POST /v1/chat/completions — OpenAI-compatible chat (text + multimodal)

POST /v1/completions — OpenAI-compatible legacy completions

GET /v1/version — { "version": "3.1.0-..." }

GET /v1/license — license text

GET /v1/manifest — NIM manifest

GET /v1/health/live · GET /v1/health/ready — NIM-style probes

Models · Metadata · Metrics · Health Check

GET /v1/models — List loaded VLMs: { "data": [{ "id", "object": "model", "owned_by" }] }

GET /v1/metadata — Service metadata (build, release, image tag)

GET /v1/metrics — Prometheus metrics (plain text)

GET /v1/ready · GET /v1/live · GET /v1/startup — Kubernetes-style probes

Common Workflows

1. Dense captions from a stored video file

2. Dense captions from an RTSP live stream

Pre-req: the container was started with:

RTVI_VLM_KAFKA_ENABLED=true

`POST /v1/generate_captions_alerts` — Generate VLM captions (and alerts) for video/stream

`DELETE /v1/generate_captions_alerts/{stream_id}` — Stop caption generation for a live stream

`POST /v1/files` — Upload a media file (multipart)

`GET /v1/files?purpose=vision` — List uploaded files

`GET /v1/files/{file_id}` — File metadata

`GET /v1/files/{file_id}/content` — Download original file content

`DELETE /v1/files/{file_id}` — Delete file (releases asset storage)

`POST /v1/streams/add` — Register one or more RTSP streams

`GET /v1/streams/get-stream-info` — List active streams

`DELETE /v1/streams/delete/{stream_id}` — Remove a single stream

`DELETE /v1/streams/delete-batch` — Remove many (`{"stream_ids":[...]}`)

`POST /v1/chat/completions` — OpenAI-compatible chat (text + multimodal)

`POST /v1/completions` — OpenAI-compatible legacy completions

`GET /v1/version` — `{ "version": "3.1.0-..." }`

`GET /v1/license` — license text

`GET /v1/manifest` — NIM manifest

`GET /v1/health/live` · `GET /v1/health/ready` — NIM-style probes

`GET /v1/models` — List loaded VLMs: `{ "data": [{ "id", "object": "model", "owned_by" }] }`

`GET /v1/metadata` — Service metadata (build, release, image tag)

`GET /v1/metrics` — Prometheus metrics (plain text)

`GET /v1/ready` · `GET /v1/live` · `GET /v1/startup` — Kubernetes-style probes

`POST /v1/generate_captions_alerts` — Generate VLM captions (and alerts) for video/stream

`DELETE /v1/generate_captions_alerts/{stream_id}` — Stop caption generation for a live stream

`POST /v1/files` — Upload a media file (multipart)

`GET /v1/files?purpose=vision` — List uploaded files

`GET /v1/files/{file_id}` — File metadata

`GET /v1/files/{file_id}/content` — Download original file content

`DELETE /v1/files/{file_id}` — Delete file (releases asset storage)

`POST /v1/streams/add` — Register one or more RTSP streams

`GET /v1/streams/get-stream-info` — List active streams

`DELETE /v1/streams/delete/{stream_id}` — Remove a single stream

`DELETE /v1/streams/delete-batch` — Remove many (`{"stream_ids":[...]}`)

`POST /v1/chat/completions` — OpenAI-compatible chat (text + multimodal)

`POST /v1/completions` — OpenAI-compatible legacy completions

`GET /v1/version` — `{ "version": "3.1.0-..." }`

`GET /v1/license` — license text

`GET /v1/manifest` — NIM manifest

`GET /v1/health/live` · `GET /v1/health/ready` — NIM-style probes

`GET /v1/models` — List loaded VLMs: `{ "data": [{ "id", "object": "model", "owned_by" }] }`

`GET /v1/metadata` — Service metadata (build, release, image tag)

`GET /v1/metrics` — Prometheus metrics (plain text)

`GET /v1/ready` · `GET /v1/live` · `GET /v1/startup` — Kubernetes-style probes