---
name: deploy
description: Deploy, debug, or tear down any VSS profile using a compose-centric workflow — config (dry-run) with env overrides, review resolved compose, then compose up. Use this skill when the user says "deploy vss", "deploy `profile`", "debug deploy", "verify deployment", or "why is my vss deploy broken".
license: Apache-2.0
metadata: {"version":"3.1.0","github-url":"https://github.com/NVIDIA-AI-Blueprints/video-search-and-summarization","tags":"nvidia blueprint deployment"}
---
# VSS Deploy

Deploy any VSS profile using a compose-centric workflow: build env overrides, generate the resolved compose file (dry-run), review it, then deploy. This replaces direct `dev-profile.sh` execution with validated, auditable steps.
## Profile Routing

| User says | Profile | Reference |
|---|---|---|
| "deploy vss" / "deploy base" | base | references/base.md |
| "deploy alerts" / "alert verification" / "real-time alerts" | alerts | references/alerts.md |
| "deploy for incident report" | alerts | references/alerts.md |
| "deploy lvs" / "video summarization" | lvs | references/lvs.md |
| "deploy search" / "video search" | search | references/search.md |
Edge hardware routing (DGX Spark, AGX/IGX Thor): see [`references/edge.md`](references/edge.md) for the 4B-LLM recipe (`config_edge.yml` + standalone vLLM on port 30081). Edge platforms share a single unified-memory GPU between the LLM and the VLM, so Nemotron Edge 4B is the default LLM, and Nemotron Nano 9B v2 FP8 is an option when memory allows.
## When to Use

- Deploy VSS / start VSS / bring up a profile
- Deploy a specific profile (`base`, `alerts`, `lvs`, `search`)
- Do a dry-run / preview what will be deployed
- Change deployment config (hardware, LLM mode, GPU assignment)
- Tear down a running deployment
- Debug or verify an existing deployment (see [Debugging a Deployment](#debugging-a-deployment))
## How it works

Run `docker compose` commands directly on the host instead of going through `dev-profile.sh`.
## Before Deploying

- **Repo path**: find `video-search-and-summarization/` on disk. Check `TOOLS.md` if available.
- **NGC CLI & API key**: see [`references/ngc.md`](references/ngc.md). Check that `$NGC_CLI_API_KEY` is set.
- **System prerequisites** (GPU VRAM, driver, Docker, NVIDIA Container Toolkit): the canonical reference is the [VSS prerequisites page](https://docs.nvidia.com/vss/3.1.0/prerequisites.html). It lists supported hardware, per-profile GPU requirements, and the minimum driver/CUDA version per NIM. Read it and pick the LLM/VLM placement that fits the host; don't guess thresholds from this skill.
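## Pre-flight Check

Run before every deploy. Do not proceed if any check fails.

```bash
# Check 1: the driver can enumerate the GPUs
nvidia-smi --query-gpu=index,name --format=csv,noheader
# Check 2: Docker lists the nvidia runtime (look for "nvidia" on the Runtimes line)
docker info 2>/dev/null | grep -i "runtimes"
# Check 3: a container can reach the GPU
docker run --rm --gpus all ubuntu:22.04 nvidia-smi 2>&1 | head -5
```

If check 2 or 3 fails, see [`references/prerequisites.md`](references/prerequisites.md).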
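When the pre-flight runs from a script rather than by eye, the three checks can be folded into one function that stops at the first failure. A minimal sketch, assuming bash: `vss_preflight` is a hypothetical name (nothing in the repo ships it), and treating check 2 as passed when the `Runtimes` line mentions `nvidia` is an assumption about how to read `docker info`.

```bash
# Hypothetical helper (not shipped with VSS): run the three pre-flight checks in order.
# Returns non-zero at the first failing check so a caller can abort the deploy.
vss_preflight() {
  # Check 1: the driver can enumerate GPUs.
  if ! nvidia-smi --query-gpu=index,name --format=csv,noheader; then
    echo "pre-flight check 1 failed: nvidia-smi cannot list GPUs" >&2
    return 1
  fi
  # Check 2: assumed to pass when the Runtimes line of `docker info` mentions nvidia.
  if ! docker info 2>/dev/null | grep -i "runtimes" | grep -qi "nvidia"; then
    echo "pre-flight check 2 failed: nvidia runtime not registered with Docker" >&2
    return 1
  fi
  # Check 3: a throwaway container can see the GPU.
  if ! docker run --rm --gpus all ubuntu:22.04 nvidia-smi >/dev/null 2>&1; then
    echo "pre-flight check 3 failed: GPU not reachable from a container" >&2
    return 1
  fi
  echo "pre-flight OK"
}
```

Calling `vss_preflight || exit 1` at the top of a deploy script makes a failed check abort before Step 0.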
## Deployment Flow

Always follow this sequence. Never skip the dry-run.
### Step 0 — Tear down any existing deployment

If a deployment already exists, tear it down first. The full procedure (the `resolved.yml`-driven path, the container-name catch-all patterns covering the dev-profile compose files, and why leftovers cause `/sensor/list` 502s) lives in [`references/teardown.md`](references/teardown.md).

If a `resolved.yml` from a prior deploy exists, prefer it: it knows about all compose-profile services that were brought up.

```bash
if [ -f "$REPO/deployments/resolved.yml" ]; then
  docker compose -f "$REPO/deployments/resolved.yml" down --remove-orphans
fi
```

Catch-all: remove every VSS-stack container the dev-profile compose files bring up. Without this, leftovers from a prior deploy linger (especially the `*-smc` set, which the alerts compose profile shares with the `*-dev` set on host networking and port 30000) and either:

- bind ports the new deploy needs, so the second `sensor-ms` fails to bind and `/sensor/list` returns 502 (issue #151), or
- pass the new deploy's container-name health checks while serving stale data from the prior deploy's DB.

The patterns below cover everything declared in `deployments/vst/{2d,3d,smc,developer,ps}/`, `deployments/foundational/`, `deployments/agents/`, `deployments/proxy/`, and the `dev-profile-*` compose files.

```bash
docker ps -a --format '{{.Names}}' \
  | grep -E '^(vss-|mdx-|perception-|rtvi-|alert-|nvstreamer-|sensor-ms-|vst-ingress-|vst-mcp-|vst-file-proxy|centralizedb-|storage-ms-|streamprocessing-ms-|sdr-(http|streamprocessing)-|envoy-(http|streamprocessing)-|rtspserver-ms-|recorder-ms-|replaystream-ms-|livestream-ms-|metropolis-vss-ui|phoenix)' \
  | xargs -r docker rm -f
```

On a host's first deploy there is no `resolved.yml`, so the `down` step is skipped, and the catch-all matches nothing (`xargs -r` then runs no `docker rm`). Both blocks are safe to run unconditionally.
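The two halves of Step 0 can be packaged as functions so the catch-all can be previewed before anything is removed. A sketch only: `VSS_CONTAINER_RE`, `vss_leftover_containers`, and `vss_teardown` are hypothetical names, and the regex is copied verbatim from the catch-all above, so it needs updating whenever that pattern does.

```bash
# Container-name pattern copied from the Step 0 catch-all.
VSS_CONTAINER_RE='^(vss-|mdx-|perception-|rtvi-|alert-|nvstreamer-|sensor-ms-|vst-ingress-|vst-mcp-|vst-file-proxy|centralizedb-|storage-ms-|streamprocessing-ms-|sdr-(http|streamprocessing)-|envoy-(http|streamprocessing)-|rtspserver-ms-|recorder-ms-|replaystream-ms-|livestream-ms-|metropolis-vss-ui|phoenix)'

# Hypothetical filter: reads container names on stdin, prints only VSS-stack ones.
vss_leftover_containers() {
  grep -E "$VSS_CONTAINER_RE"
}

# Hypothetical wrapper for all of Step 0; $1 is the repo path.
vss_teardown() {
  local repo="$1"
  if [ -z "$repo" ]; then
    echo "usage: vss_teardown <repo-path>" >&2
    return 2
  fi
  # Prefer the resolved compose file from the prior deploy when it exists.
  if [ -f "$repo/deployments/resolved.yml" ]; then
    docker compose -f "$repo/deployments/resolved.yml" down --remove-orphans || return 1
  fi
  # Catch-all for anything the compose file did not know about.
  docker ps -a --format '{{.Names}}' | vss_leftover_containers | xargs -r docker rm -f
}
```

Running `docker ps -a --format '{{.Names}}' | vss_leftover_containers` first shows exactly what would be deleted; `vss_teardown "$REPO"` then performs the removal.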
### Step 1 — Gather context
Discover what's available on the host and cross-reference with the
[VSS prerequisites page](https://docs.nvidia.com/vss/3.1.0/prerequisites.html)
to choose a deployment shape that fits.
| Value | How to determine |
|---|---|
| **Profile** | Match user intent to routing table above. Default: `base` |
| **Repo path** | Find `video-search-and-summarization/` on disk |
| **Hardware** | `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader` → look up per-GPU VRAM against the prerequisites page |
| **LLM/VLM placement** | Pick `local_shared`, `local`, or `remote` per LLM/VLM based on available GPUs + `$LLM_REMOTE_URL` / `$VLM_REMOTE_URL` / `$NGC_CLI_API_KEY`. If no combination on this host satisfies the prerequisites, stop and report the blocker instead of silently picking another shape. |
| **API keys** | `NGC_CLI_API_KEY` for local NIMs, `NVIDIA_API_KEY` for remote |
| **Host IP** | `hostname -I \| awk '{print $1}'` |
**Hardware profile mapping:**
| GPU name contains | HARDWARE_PROFILE | Recommended LLM path |
|---|---|---|
| H100 | `H100` | Nano 9B v2 (NIM) |
| L40S | `L40S` | Nano 9B v2 (NIM) |
| RTX 6000 Ada, RTX PRO 6000 | `RTXPRO6000BW` | Nano 9B v2 (NIM) |
| GB10 (DGX Spark) | `DGX-SPARK` | **Edge 4B** (vLLM) — see [`references/edge.md`](references/edge.md) |
| IGX | `IGX-THOR` | **Edge 4B** (vLLM) — see [`references/edge.md`](references/edge.md) |
| AGX | `AGX-THOR` | **Edge 4B** (vLLM) — see [`references/edge.md`](references/edge.md) |
| Other | `OTHER` | — |
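The mapping table can be applied mechanically to the `nvidia-smi` output. A sketch under stated assumptions: `vss_hardware_profile` and `vss_detect_hardware_profile` are hypothetical helpers, the first GPU is assumed to be representative of the host, and matching is case-sensitive on the substrings from the "GPU name contains" column, checked in the table's order.

```bash
# Hypothetical helper: map one GPU name to a HARDWARE_PROFILE value per the table above.
vss_hardware_profile() {
  case "$1" in
    *H100*)                            echo H100 ;;
    *L40S*)                            echo L40S ;;
    *"RTX 6000 Ada"*|*"RTX PRO 6000"*) echo RTXPRO6000BW ;;
    *GB10*)                            echo DGX-SPARK ;;
    *IGX*)                             echo IGX-THOR ;;
    *AGX*)                             echo AGX-THOR ;;
    *)                                 echo OTHER ;;
  esac
}

# Hypothetical wrapper: classify the host by its first GPU.
vss_detect_hardware_profile() {
  vss_hardware_profile "$(nvidia-smi --query-gpu=name --format=csv,noheader | head -n 1)"
}
```

An `OTHER` result has no recommended LLM path in the table, which is a signal to consult the prerequisites page before continuing.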
**Minimum GPU count per (profile × mode × platform).** Canonical source
is the [VSS prerequisites page](https://docs.nvidia.com/vss/3.1.0/prerequisites.html);
reproduced here so the skill can fail fast when the host is too small:
| Profile | Mode | H100 / RTX PRO 6000 (Blackwell) | L40S | DGX-Spark / IGX-Thor / AGX-Thor |
|---|---|---|---|---|
| `base` | shared (`local_shared` LLM + VLM) | **1** | — (48 GB/GPU too small) | **1** (Edge 4B + VLM, unified memory) |
| `base` | dedicated (`local` LLM + VLM) | **2** | **2** | — |
| `base` | `remote-llm` | **1** (VLM local) | **1** (VLM local) | **1** (remote LLM only) |
| `base` | `remote-vlm` | **1** (LLM local) | **1** (LLM local) | — |
| `base` | `remote-all` | **0** | **0** | **0** |
| `lvs` | shared | **1** | — | - |
| `lvs` | dedicated | **2** | **2** | — |
| `lvs` | `remote-llm/vlm` | 1 | 1 | - |
| `lvs` | `remote-all` | 0 | 0 | - |
| `alerts` (verification / CV) | shared | **2** | — | — |
| `alerts` (verification / CV) | dedicated | **3** | **3** | — |
| `alerts` (verification / CV) | `remote-all` | 1 | 1 | 1 |
| `alerts` (verification / CV) | `remote-llm/vlm` | 2 | 2 | 1 |
| `alerts` (real-time / VLM) | shared | **2** | — | — |
| `alerts` (real-time / VLM) | dedicated | **3** | **3** | — |
| `alerts` (real-time / VLM) | `remote-llm` | 2 | 2 | 1 |
| `search` | shared | **2** | — | - |
| `search` | dedicated | **3** | **3** | — |
| `search` | `remote-*` | **2** | **2** | - |
A few hard rules encoded in the table:
- **L40S can't do `shared`.** 48 GB is not enough VRAM for LLM + VLM
on a single GPU. Fall back to `dedicated` or a `remote-*` mode.
- **L40S needs +1 GPU for alerts / search vs H100** because the
shared-on-one-GPU trick doesn't work — RT-CV / Embed1 must take
their own GPU, and LLM+VLM still need a second.
- **DGX-Spark / Thor are early-access for most profiles.** Only
`base` + `lvs` are expected to fully land locally; `alerts` /
`search` currently require a remote LLM. See
[`references/edge.md`](references/edge.md).
If the host's (GPU count × VRAM) combination doesn't appear above,
**stop and report the blocker** — don't silently pick a different
mode.
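To fail fast, the minimum-GPU table can be encoded as data and checked against the host before any compose command runs. This is a hypothetical sketch, not a repo script: the two alerts variants are keyed as `alerts-cv` (verification / CV) and `alerts-vlm` (real-time / VLM), the combined cells `remote-llm/vlm` and `remote-*` are expanded into one row per mode (reading `remote-*` as `remote-llm`, `remote-vlm`, and `remote-all` is an assumption), and the hardware column is chosen from the `HARDWARE_PROFILE` values in the mapping table. The prerequisites page stays authoritative, so re-check these rows whenever it changes.

```bash
# Hypothetical encoding of the minimum-GPU table above.
# Columns: profile, mode, H100/RTX PRO 6000 (Blackwell), L40S, DGX-Spark/IGX-Thor/AGX-Thor.
# "-" marks an unsupported combination.
VSS_MIN_GPU_TABLE='
base shared 1 - 1
base dedicated 2 2 -
base remote-llm 1 1 1
base remote-vlm 1 1 -
base remote-all 0 0 0
lvs shared 1 - -
lvs dedicated 2 2 -
lvs remote-llm 1 1 -
lvs remote-vlm 1 1 -
lvs remote-all 0 0 -
alerts-cv shared 2 - -
alerts-cv dedicated 3 3 -
alerts-cv remote-all 1 1 1
alerts-cv remote-llm 2 2 1
alerts-cv remote-vlm 2 2 1
alerts-vlm shared 2 - -
alerts-vlm dedicated 3 3 -
alerts-vlm remote-llm 2 2 1
search shared 2 - -
search dedicated 3 3 -
search remote-llm 2 2 -
search remote-vlm 2 2 -
search remote-all 2 2 -
'

# Usage: vss_min_gpus <profile> <mode> <HARDWARE_PROFILE>; prints the minimum GPU count.
vss_min_gpus() {
  local profile="$1" mode="$2" hw="$3" col n
  case "$hw" in
    H100|RTXPRO6000BW)           col=3 ;;
    L40S)                        col=4 ;;
    DGX-SPARK|IGX-THOR|AGX-THOR) col=5 ;;
    *) echo "no minimum-GPU data for hardware profile: $hw" >&2; return 1 ;;
  esac
  n=$(printf '%s\n' "$VSS_MIN_GPU_TABLE" \
        | awk -v p="$profile" -v m="$mode" -v c="$col" '$1 == p && $2 == m { print $c }')
  if [ -z "$n" ] || [ "$n" = "-" ]; then
    echo "unsupported: profile=$profile mode=$mode hardware=$hw" >&2
    return 1
  fi
  echo "$n"
}

# Usage: vss_check_gpu_count <profile> <mode> <HARDWARE_PROFILE> <gpus-on-host>
vss_check_gpu_count() {
  local need
  need=$(vss_min_gpus "$1" "$2" "$3") || return 1
  if [ "$4" -lt "$need" ]; then
    echo "blocker: $1/$2 on $3 needs $need GPU(s), host has $4" >&2
    return 1
  fi
}
```

For example, `vss_check_gpu_count base shared L40S 2` fails with an "unsupported" message because L40S cannot run `shared`, which is the cue to stop and report the blocker rather than switch modes silently.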
> **Edge shared mode requires Edge 4B + `HF_TOKEN`.** On DGX Spark and AGX/IGX
> Thor, both LLM and VLM must fit in unified memory, AND the standard
> `nvcr.io/nim/nvidia/nvidia-nemotron-nano-9b-v2:1` image has a broken arm64
> manifest. You must run `NVIDIA-Nemotron-Edge-4B-v2.1-EA-020126_FP8` as a
> standalone vLLM container on port 30081 with the agent pointed at it via
> `--use-remote-llm`. Full recipe and the mandatory `HF_TOKEN` verification
> step are in [`references/edge.md`](references/edge.md).
### Step 1b — Prepare the data directory
The data directory layout (asset paths, ownership, mount points, profile-specific subdirs) is documented in [`references/data-directory.md`](references/data-directory.md). Read that file before deploying for the first time on a host or when changing profiles.
```bash
# Profile-specific subdirs:
# alerts → mkdir -p "$DATA/data_log/vss_video_analytics_api" "$DATA/videos/dev-profile-alerts" "$DATA/models/rtdetr-its" "$DATA/models/gdino"
# search → mkdir -p "$DATA/models"
chmod -R 777 "$DATA/data_log" "$DATA/agent_eval"
# If you created $DATA/models above, also: chmod -R 777 "$DATA/models"
```
**FORBIDDEN:** `chown -R ubuntu:ubuntu $MDX_DATA_DIR` (or any recursive `chown`). To a shell admin it looks like good housekeeping, but in this stack it is the deploy-breaking command: several containers run as their own uids (see the table below), and after the chown you get a "healthy" deploy (containers Up, endpoints returning 200) whose video pipeline is silently broken. Use `chmod -R 777` on the specific subdirs above and nothing else.
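The commands above can be wrapped so the right subdirs are created and opened for a given profile without ever touching ownership. A minimal sketch: `vss_prepare_data_dir` is a hypothetical helper, it takes the data root (the `$DATA` of the commands above) as its first argument, and it also creates `data_log` and `agent_eval` on the assumption that they may not exist yet. [`references/data-directory.md`](references/data-directory.md) remains the source for the full layout.

```bash
# Hypothetical helper (not part of the repo): create and open the data subdirs for one profile.
# $1 = data root ($DATA above), $2 = profile (base, alerts, lvs, search).
vss_prepare_data_dir() {
  local data="$1" profile="$2" models_created=0
  if [ -z "$data" ] || [ -z "$profile" ]; then
    echo "usage: vss_prepare_data_dir <data-dir> <profile>" >&2
    return 2
  fi
  # Assumed common dirs: the chmod below needs them to exist.
  mkdir -p "$data/data_log" "$data/agent_eval" || return 1
  # Profile-specific subdirs, as listed above.
  case "$profile" in
    alerts)
      mkdir -p "$data/data_log/vss_video_analytics_api" \
               "$data/videos/dev-profile-alerts" \
               "$data/models/rtdetr-its" "$data/models/gdino" || return 1
      models_created=1 ;;
    search)
      mkdir -p "$data/models" || return 1
      models_created=1 ;;
  esac
  # Open permissions on the specific subdirs only; never chown anything.
  chmod -R 777 "$data/data_log" "$data/agent_eval" || return 1
  if [ "$models_created" -eq 1 ]; then
    chmod -R 777 "$data/models" || return 1
  fi
}
```

Usage: `vss_prepare_data_dir "$DATA" alerts`.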
Known per-container uid gotchas (each uses a bind mount under `$DATA`):

| Container | Image | Runs as | Mount path | Symptom if permissions are wrong |
|---|---|---|---|---|
| `centralizedb-dev` | `postgres:17.6-alpine` | uid 70 | `$DATA/data_log/vst/postgres/db` | Can't read its own PGDATA → VST `sensor_details` query fails → uploaded videos never appear in `/vst/api/v1/sensor/streams` → warehouse E2E check returns empty |
| `mdx-redis` | `redis:8.2.2-alpine` | uid 999 | `$DATA/data_log/redis/log`, `/redis/data` | "Can't open the log file: Permission denied" → redis dies → `envoy-streamprocessing` dies (needs a Redis Lua script) → stream pipeline broken |
| `elasticsearch` | `elasticsearch` | uid 1000 | `$DATA/data_log/elastic/{data,logs}` | `AccessDeniedException` on startup → ES refuses to start |
| `vst` / `sensor-ms-dev` | `vst` | uid 1000 | `$DATA/data_log/vst/*` (videos, clips) | 403 on ingest or stream write |
`chmod -R 777 $DATA/data_log` covers all of these. Do NOT chown them to individual uids: containers that initialize their own dirs on first start (like postgres) re-chown them to their own uid, and a later chown back to `ubuntu` breaks them.

If postgres is already broken (common when redeploying without a clean data dir), remove its data directory (this discards the existing VST database) and restart the container:

```bash
sudo rm -rf "$DATA/data_log/vst/postgres"
docker restart centralizedb-dev
```
### Step 1c — If deploying on Brev, set up secure-link env vars

Brev-specific env vars (`BREV_ENV_ID`, secure-link patterns) are documented in [`references/brev.md`](references/brev.md).
### Step 2 — Build env_overrides

Produce an `env_overrides` dict from the user request and the gathered context: choose remote or local LLM/VLM, set credentials, point at endpoints, and set platform-specific flags. The full mapping (every override key, when it applies, defaults, profile-specific differences) lives in [`references/env-overrides.md`](references/env-overrides.md).
### Step 3 — Config / dry-run

**Env file location:** `<repo>/deployments/developer-workflow/dev-profile-<profile>/.env`

This is the authoritative `.env`. Every verifier, healthcheck, and post-deploy tool reads from this path. When you apply env overrides (from Step 2 or from the user's prompt), write them directly to this file, not to `generated.env`.

`generated.env` is a scratchpad that `dev-profile.sh` produces during its own internal flow; the verifier does NOT read it, and the next invocation wipes it. An agent that uses `dev-profile.sh` as a one-shot deploy but leaves the base `.env` untouched will silently fail env checks even when the stack comes up cleanly. If you used `dev-profile.sh` and see `generated.env` on disk, copy its key/value lines back into the base `.env`, or re-apply your `sed` commands against the base `.env` afterwards. The base `.env` is the source of truth.
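Applying overrides with ad-hoc `sed` makes it easy to end up with a duplicate or missing key. A sketch of an idempotent alternative, assuming plain `KEY=VALUE` lines (no `export` prefix, no quoting rules): `vss_set_env` and `vss_merge_env` are hypothetical helpers that write straight to the base `.env`, the second covering the case of copying `generated.env` back.

```bash
# Hypothetical helper: set KEY=VALUE in an env file, replacing every existing KEY= line
# or appending the pair if the key is absent. Creates the file if it does not exist.
vss_set_env() {
  local file="$1" key="$2" value="$3" tmp
  tmp=$(mktemp) || return 1
  if [ -f "$file" ]; then
    # Pass key/value through the environment so awk does not interpret backslashes.
    K="$key" V="$value" awk '
      BEGIN { k = ENVIRON["K"]; v = ENVIRON["V"]; done = 0 }
      index($0, k "=") == 1 { print k "=" v; done = 1; next }
      { print }
      END { if (!done) print k "=" v }
    ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  else
    printf '%s=%s\n' "$key" "$value" > "$tmp"
  fi
  # Copy back with cat (not mv) so the target keeps its permissions.
  cat "$tmp" > "$file" && rm -f "$tmp"
}

# Hypothetical helper: copy every KEY=VALUE line from src (e.g. generated.env) into dst.
vss_merge_env() {
  local src="$1" dst="$2" line
  [ -f "$src" ] || { echo "missing: $src" >&2; return 1; }
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      ''|'#'*) continue ;;   # skip blank lines and comments
      *=*) vss_set_env "$dst" "${line%%=*}" "${line#*=}" || return 1 ;;
    esac
  done < "$src"
}
```

For example, `vss_set_env "$REPO/deployments/developer-workflow/dev-profile-base/.env" HARDWARE_PROFILE H100`, assuming the profile's `.env` uses the `HARDWARE_PROFILE` key named in the hardware mapping table. After a `dev-profile.sh` run, point `vss_merge_env` at the `generated.env` on disk and the base `.env`.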
```bash
REPO=/path/to/video-search-and-summarization
PROFILE=base
ENV_FILE="$REPO/deployments/developer-workflow/dev-profile-$PROFILE/.env"
cd "$REPO/deployments"
docker compose --env-file "$ENV_FILE" config > resolved.yml
```

The resolved YAML is saved to `<repo>/deployments/resolved.yml`.
### Step 3b — Verify `resolved.yml` has no unexpanded `${...}` tokens

Unexpanded `${VAR}` tokens in `resolved.yml` mean compose did not see those env values. The diagnostic procedure and common culprits live in [`references/troubleshooting.md`](references/troubleshooting.md).
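A quick scan for leftover tokens can run before Step 4. A sketch, assuming the file is the `resolved.yml` written in Step 3: `vss_check_resolved` is a hypothetical helper that prints each offending line with its line number and skips `$${...}`, the compose escape for a literal dollar sign. It only catches the braced `${VAR}` form this step describes.

```bash
# Hypothetical helper: return 0 if the resolved compose file has no unexpanded ${VAR}
# tokens, 1 (after printing them with line numbers) if it does, 2 if the file is missing.
vss_check_resolved() {
  local file="${1:-resolved.yml}"
  [ -f "$file" ] || { echo "missing: $file" >&2; return 2; }
  # A "${" not preceded by another "$" ("$${" is an escaped literal dollar).
  if grep -nE '(^|[^$])[$][{][A-Za-z_][A-Za-z0-9_]*' "$file"; then
    return 1
  fi
  return 0
}
```

Usage: `vss_check_resolved "$REPO/deployments/resolved.yml" || echo "fix the .env before Step 4"`.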
### Step 4 — Review
Show the user a summary of what will be deployed:
- Profile name and hardware
- LLM/VLM models and mode (local/remote/local_shared)
- Services that will start
- GPU device assignment
- Key endpoints (UI port, agent port)
Ask: "Looks good — deploy now?" and wait for confirmation before Step 5.
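Part of the Step 4 summary can be pulled straight out of `resolved.yml`. `docker compose -f resolved.yml config --services` is the authoritative service list; the awk sketch below (hypothetical helper `vss_review_summary`) also reports GPU pinning from `device_ids` entries and `NVIDIA_VISIBLE_DEVICES` values, assuming the two-space block-style YAML that `docker compose config` emits. It is a text scan, not a YAML parser, so treat a missing line as "not detected" rather than "not set".

```bash
# Hypothetical review helper: print "service <name>", "gpu <service> <id>" for deploy
# device_ids entries, and "gpu-env <service> <value>" for NVIDIA_VISIBLE_DEVICES.
vss_review_summary() {
  local file="${1:-resolved.yml}"
  [ -f "$file" ] || { echo "missing: $file" >&2; return 2; }
  awk '
    /^services:[[:space:]]*$/ { in_s = 1; next }
    /^[^ ]/ { in_s = 0 }                        # any other top-level key ends services
    !in_s { next }
    /^  [A-Za-z0-9_.-]+:[[:space:]]*$/ {        # two-space indent: a service name
      svc = $1; sub(/:$/, "", svc); print "service " svc; ids = 0; next
    }
    /device_ids:/ { ids = 1; next }
    ids && /^[[:space:]]*- "?[A-Za-z0-9-]+"?[[:space:]]*$/ {
      id = $2; gsub(/"/, "", id); print "gpu " svc " " id; next
    }
    /NVIDIA_VISIBLE_DEVICES:/ { v = $2; gsub(/"/, "", v); print "gpu-env " svc " " v; next }
    { ids = 0 }                                 # any other line ends a device_ids list
  ' "$file"
}
```

Usage: `vss_review_summary "$REPO/deployments/resolved.yml"`, then combine its output with the profile, model, and endpoint details for the summary shown to the user.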
**Exception: autonomous mode.** If the user's request already asks you to run autonomously (e.g. "deploy X autonomously", "run without confirmation", "non-interactive"), skip the confirmation prompt and proceed straight to Step 5. This path exists so automated eval and CI invocations don't hang waiting for a human reply that will never come. In all other cases, a human must approve.
### Step 5 — Deploy

```bash
cd "$REPO/deployments"
docker compose -f resolved.yml up -d
```

Do NOT use `--force-recreate` on retries. It destroys already-warm NIM containers, forcing another 3–5 min of torch.compile and CUDA-graph capture per NIM. If the previous `up -d` partially failed, fix the root cause (usually permissions or an env typo) and just re-run `up -d`; Docker re-creates only the containers whose config changed or that are down.

Deploy takes about 10–20 min on the first run (image pulls and model downloads). Monitor it with:

```bash
docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'
docker compose -f "$REPO/deployments/resolved.yml" logs --tail 50 <service>
```

The deploy is complete when all `mdx-*` containers show `Up` status.
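The completion criterion above (every `mdx-*` container `Up`) can be polled instead of eyeballed. A sketch: `vss_mdx_all_up` and `vss_wait_for_mdx` are hypothetical helpers, the 1800 s timeout and 30 s interval are arbitrary defaults sized to the 10–20 min first-run estimate, and only the `Up` prefix of the status is checked, not the healthcheck state.

```bash
# Hypothetical filter: reads `docker ps -a --format '{{.Names}}\t{{.Status}}'` lines on stdin,
# prints each mdx-* container whose status does not start with "Up", and returns 0 only
# when at least one mdx-* container exists and all of them are Up.
vss_mdx_all_up() {
  awk -F '\t' '
    $1 ~ /^mdx-/ {
      seen++
      if ($2 !~ /^Up/) { print $1 "\t" $2; bad++ }
    }
    END { if (seen > 0 && bad == 0) exit 0; exit 1 }
  '
}

# Hypothetical poller: re-check every $2 seconds (default 30) for up to $1 seconds (default 1800).
vss_wait_for_mdx() {
  local timeout="${1:-1800}" interval="${2:-30}" waited=0
  until docker ps -a --format '{{.Names}}\t{{.Status}}' | vss_mdx_all_up >/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${waited}s; mdx-* containers not Up:" >&2
      docker ps -a --format '{{.Names}}\t{{.Status}}' | vss_mdx_all_up >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "all mdx-* containers are Up"
}
```

Usage: `vss_wait_for_mdx` right after `up -d`; on timeout, the listed containers are the ones to inspect with `docker compose ... logs`.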
### Step 6 — Report endpoints

| Profile | Agent UI | REST API | Other |
|---|---|---|---|
| base | :3000 | :8000 (Swagger at /docs) | — |
| alerts | :3000 | :8000 | VIOS dashboard :30888/vst/ |
| lvs | :3000 | :8000 | — |
| search | :3000 | :8000 | — |
Use workflow skills after deployment:
- alerts / incident-report → alert management and incident queries
- video-search → semantic video search
- video-summarization → long video summarization
- vios → camera/stream management via VIOS
- video-analytics → Elasticsearch queries
## Tear Down

```bash
cd "$REPO/deployments"
docker compose -f resolved.yml down
```
## Debugging a Deployment

Use this workflow when the user asks to "debug the deploy", "verify it's working", "why is the agent not responding", or similar. The goal is to confirm the full video-ingestion-to-agent-answer path, not just that containers are "Up".

Each profile reference doc (e.g. [`references/base.md`](references/base.md)) has a Debugging section listing the exact commands to run for that profile.
### Quick checks (all profiles)

```bash
docker ps --format 'table {{.Names}}\t{{.Status}}'
curl -sf http://localhost:8000/docs >/dev/null && echo "agent OK"
curl -sf http://localhost:3000/ >/dev/null && echo "ui OK"
curl -sf http://localhost:30082/v1/models | python3 -m json.tool
curl -sf http://localhost:30081/v1/models | python3 -m json.tool
```
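The quick checks can be run as one pass that reports every failure instead of stopping at the first. A sketch: `vss_probe` and `vss_quick_checks` are hypothetical helpers, the 5 s timeout is an arbitrary choice, and the host defaults to `localhost` as in the commands above. A FAIL on 30081 or 30082 may simply mean that model runs remotely in this deployment.

```bash
# Hypothetical helper: probe one URL, print OK/FAIL, and return 1 on failure.
vss_probe() {
  local name="$1" url="$2"
  if curl -sf --max-time 5 -o /dev/null "$url"; then
    echo "OK   $name $url"
  else
    echo "FAIL $name $url"
    return 1
  fi
}

# Hypothetical helper: run the quick checks against $1 (default localhost),
# report every result, and return 1 if any probe failed.
vss_quick_checks() {
  local host="${1:-localhost}" rc=0
  vss_probe agent     "http://$host:8000/docs"       || rc=1
  vss_probe ui        "http://$host:3000/"           || rc=1
  vss_probe model-api "http://$host:30082/v1/models" || rc=1
  vss_probe model-api "http://$host:30081/v1/models" || rc=1
  return "$rc"
}
```

Usage: `vss_quick_checks` on the host, or `vss_quick_checks <host-ip>` from another machine that can reach those ports.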
### End-to-end video sanity check

After the quick checks above pass, drive a real query through the agent: for example, ask it (over the REST API or the UI) to describe a video you've uploaded to VST. If the agent returns a non-empty answer, the upload → ingest → inference → reply path is healthy. If it fails, `docker logs vss-agent` shows which stage tripped.
## Troubleshooting

- `unknown or invalid runtime name: nvidia` → NVIDIA Container Toolkit not installed, or Docker not restarted after installing it. See [`references/prerequisites.md`](references/prerequisites.md).
- NGC auth error → re-export `NGC_CLI_API_KEY` or follow [`references/ngc.md`](references/ngc.md).
- GPU not detected → run `sudo modprobe nvidia && sudo modprobe nvidia_uvm`, then retry.
- `docker compose -f resolved.yml up` fails because `resolved.yml` does not exist → run the dry-run (`docker compose config > resolved.yml`, Step 3) first.
- `cosmos-reason2-8b` crash → redeploy the full stack (known issue: the NIM cannot be restarted on its own).