Execute qualquer Skill no Manus
com um clique

Execute qualquer Skill no Manus com um clique

$pwd:

deploy

Name: Deploy
Author: NVIDIA-AI-Blueprints

// Deploy, debug, or tear down any VSS profile using a compose-centric workflow — config (dry-run) with env overrides, review resolved compose, then compose up. Use this skill when the user says "deploy vss", "deploy `profile`", "debug deploy", "verify deployment", or "why is my vss deploy broken".

Executar no Manus

$ git log --oneline --stat

stars:1.481

forks:315

updated:22 de maio de 2026 às 20:47

Explorador de arquivos

19 arquivos

SKILL.md

readonly

related-skills.json

mesmo repositório

rt-vlm.md

from "NVIDIA-AI-Blueprints/video-search-and-summarization"

Use this skill when working with the RTVI VLM or RT-VLM microservice API on VSS 3.1. Generate dense captions and alerts for stored video files and live RTSP streams via `/v1/generate_captions_alerts`; upload media via `/v1/files`; add and remove live streams with `/v1/streams/add` and `/v1/streams/delete/{stream_id}`; call OpenAI-compatible `/v1/chat/completions`; consume Kafka caption, incident, and error topics; or debug rtvi-vlm responses. For deployment, read `references/deploy-rt-vlm-service.md` first.

2026-05-071.5k

video-summarization.md

from "NVIDIA-AI-Blueprints/video-search-and-summarization"

Summarize a video by calling the VLM NIM or the Long Video Summarization (LVS) microservice directly. For short videos (under 60s) call the VLM's OpenAI-compatible chat completions endpoint; for long videos (60s or longer) call the LVS microservice. Use when asked to summarize a video, describe what happens in a video, analyze a recording, call or debug LVS summarize/model/health/recommended-config/metrics endpoints, or configure and troubleshoot the LVS service that backs long-video summarization.

2026-05-071.5k

vios.md

from "NVIDIA-AI-Blueprints/video-search-and-summarization"

Query VIOS REST APIs: sensor list, recording timelines, video clip extraction, snapshot capture, add/delete sensors and streams

2026-05-071.5k

alerts.md

from "NVIDIA-AI-Blueprints/video-search-and-summarization"

Manage and monitor VSS alerts after the alerts profile is deployed. The deployment's mode (CV vs VLM real-time) is fixed at deploy time and determines the workflow — start/stop real-time alerts via the VSS Agent on a VLM deployment, onboard CV alerts by adding RTSP streams to VIOS on a CV deployment, query incidents, customize verifier prompts. Use when asked to start/stop a real-time alert, check or list alerts, add a camera, use a sample video for alerts, customize alert prompts, or view verdicts.

2026-05-071.5k

report.md

from "NVIDIA-AI-Blueprints/video-search-and-summarization"

Produce video analysis reports by discovering the deployed VSS agent, querying POST /generate for a timestamped captioned summary of the clip, then formatting the agent reply as the standard Video Analysis Report markdown.

2026-05-071.5k

video-analytics.md

from "NVIDIA-AI-Blueprints/video-search-and-summarization"

Query video analytics data and metrics from Elastic search via the VA-MCP server (port 9901). This includes incidents, alerts, sensor data, and metrics. Use for any question about violations, alerts, incidents, object counts, speeds, occupancy, or anything that requires looking up recorded events. This is the primary way to answer a question that requires incidents, alerts and other metrics such as people counts and violations.

2026-05-071.5k

package.json

"author": "NVIDIA-AI-Blueprints"

"repository": "NVIDIA-AI-Blueprints/video-search-and-summarization"

Abrir repositório GitHub Ver repositórios do creator

$ install --global

$ download --local

Executar no Manus

$ useful --forSOC

Administradores de redes e sistemas de computadorInformática e Matemática15-1244L4

name	deploy
description	Deploy, debug, or tear down any VSS profile using a compose-centric workflow — config (dry-run) with env overrides, review resolved compose, then compose up. Use this skill when the user says "deploy vss", "deploy `profile`", "debug deploy", "verify deployment", or "why is my vss deploy broken".
license	Apache-2.0
metadata	{"version":"3.1.0","github-url":"https://github.com/NVIDIA-AI-Blueprints/video-search-and-summarization","tags":"nvidia blueprint deployment"}

VSS Deploy

Deploy any VSS profile using a compose-centric workflow: build env overrides, generate resolved compose (dry-run), review, then deploy. Replaces direct dev-profile.sh execution with validated, auditable steps.

Profile Routing

User says	Profile	Reference
"deploy vss" / "deploy base"	`base`	`references/base.md`
"deploy alerts" / "alert verification" / "real-time alerts"	`alerts`	`references/alerts.md`
"deploy for incident report"	`alerts`	`references/alerts.md`
"deploy lvs" / "video summarization"	`lvs`	`references/lvs.md`
"deploy search" / "video search"	`search`	`references/search.md`

Edge hardware routing (DGX Spark, AGX/IGX Thor): see references/edge.md for the 4B-LLM recipe (config_edge.yml + standalone vLLM on port 30081). Edge platforms share a single unified-memory GPU between LLM and VLM, so the Nemotron Edge 4B is the default and the Nemotron Nano 9B v2 FP8 is an option when memory allows.

When to Use

Deploy VSS / start VSS / bring up a profile
Deploy a specific profile (base, alerts, lvs, search)
Do a dry-run / preview what will be deployed
Change deployment config (hardware, LLM mode, GPU assignment)
Tear down a running deployment
Debug or verify an existing deployment (see Debugging a Deployment)

How it works

Run docker compose commands directly on the host:

# 1. Apply env overrides to the profile .env file
# 2. docker compose --env-file .env config > resolved.yml   (dry-run)
# 3. Review resolved.yml
# 4. docker compose -f resolved.yml up -d

Before Deploying

Repo path — find video-search-and-summarization/ on disk. Check TOOLS.md if available.
NGC CLI & API key — see references/ngc.md. Check $NGC_CLI_API_KEY is set.
System prerequisites (GPU VRAM, driver, Docker, NVIDIA Container Toolkit) — canonical reference is the VSS prerequisites page. That page lists supported hardware, per-profile GPU requirements, and the minimum driver/CUDA version per NIM. Read it and pick the LLM/VLM placement that fits the host — don't guess thresholds from this skill.

Pre-flight Check

Run before every deploy. Do not proceed if any check fails.

# 1. GPU visible
nvidia-smi --query-gpu=index,name --format=csv,noheader

# 2. NVIDIA runtime in Docker
docker info 2>/dev/null | grep -i "runtimes"

# 3. NVIDIA runtime works end-to-end
docker run --rm --gpus all ubuntu:22.04 nvidia-smi 2>&1 | head -5

If check 2 or 3 fails, see references/prerequisites.md.

Deployment Flow

Always follow this sequence. Never skip the dry-run.

Step 0 — Tear down any existing deployment

If a deployment already exists, tear it down first. Full procedure (resolved.yml-driven path, container-name catch-all patterns covering dev-profile compose files, why leftovers cause /sensor/list 502s) lives in references/teardown.md.

If a resolved.yml from a prior deploy exists, prefer it — it

knows about all compose-profile services that were brought up.

if [ -f "$REPO/deployments/resolved.yml" ]; then docker compose -f "$REPO/deployments/resolved.yml" down --remove-orphans fi

Catch-all: remove every VSS-stack container the dev-profile compose

files bring up. Without this, leftovers from a prior deploy linger

(especially the *-smc set, which the alerts compose profile shares

with the *-dev set on host networking and port 30000) and either:

- bind ports the new deploy needs → second sensor-ms fails to bind

→ /sensor/list returns 502 (issue #151), or

- pass the new deploy's container-name health checks while serving

stale data from the prior deploy's DB.

The patterns below cover everything declared in

deployments/vst/{2d,3d,smc,developer,ps}/, deployments/foundational/,

deployments/agents/, deployments/proxy/, and the dev-profile-*

compose files.

docker ps -a --format '{{.Names}}'
| grep -E '^(vss-|mdx-|perception-|rtvi-|alert-|nvstreamer-|sensor-ms-|vst-ingress-|vst-mcp-|vst-file-proxy|centralizedb-|storage-ms-|streamprocessing-ms-|sdr-(http|streamprocessing)-|envoy-(http|streamprocessing)-|rtspserver-ms-|recorder-ms-|replaystream-ms-|livestream-ms-|metropolis-vss-ui|phoenix)'
| xargs -r docker rm -f


If this is the host's first deploy, the `docker compose down`
line is a no-op (exit 0 with no containers to stop) — safe to run
unconditionally.

### Step 1 — Gather context

Discover what's available on the host and cross-reference with the
[VSS prerequisites page](https://docs.nvidia.com/vss/3.1.0/prerequisites.html)
to choose a deployment shape that fits.

| Value | How to determine |
|---|---|
| **Profile** | Match user intent to routing table above. Default: `base` |
| **Repo path** | Find `video-search-and-summarization/` on disk |
| **Hardware** | `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader` → look up per-GPU VRAM against the prerequisites page |
| **LLM/VLM placement** | Pick `local_shared`, `local`, or `remote` per LLM/VLM based on available GPUs + `$LLM_REMOTE_URL` / `$VLM_REMOTE_URL` / `$NGC_CLI_API_KEY`. If no combination on this host satisfies the prerequisites, stop and report the blocker instead of silently picking another shape. |
| **API keys** | `NGC_CLI_API_KEY` for local NIMs, `NVIDIA_API_KEY` for remote |
| **Host IP** | `hostname -I \| awk '{print $1}'` |

**Hardware profile mapping:**

| GPU name contains | HARDWARE_PROFILE | Recommended LLM path |
|---|---|---|
| H100 | `H100` | Nano 9B v2 (NIM) |
| L40S | `L40S` | Nano 9B v2 (NIM) |
| RTX 6000 Ada, RTX PRO 6000 | `RTXPRO6000BW` | Nano 9B v2 (NIM) |
| GB10 (DGX Spark) | `DGX-SPARK` | **Edge 4B** (vLLM) — see [`references/edge.md`](references/edge.md) |
| IGX | `IGX-THOR` | **Edge 4B** (vLLM) — see [`references/edge.md`](references/edge.md) |
| AGX | `AGX-THOR` | **Edge 4B** (vLLM) — see [`references/edge.md`](references/edge.md) |
| Other | `OTHER` | — |

**Minimum GPU count per (profile × mode × platform).** Canonical source
is the [VSS prerequisites page](https://docs.nvidia.com/vss/3.1.0/prerequisites.html);
reproduced here so the skill can fail fast when the host is too small:

| Profile | Mode | H100 / RTX PRO 6000 (Blackwell) | L40S | DGX-Spark / IGX-Thor / AGX-Thor |
|---|---|---|---|---|
| `base` | shared (`local_shared` LLM + VLM) | **1** | — (48 GB/GPU too small) | **1** (Edge 4B + VLM, unified memory) |
| `base` | dedicated (`local` LLM + VLM) | **2** | **2** | — |
| `base` | `remote-llm` | **1** (VLM local) | **1** (VLM local) | **1** (remote LLM only) |
| `base` | `remote-vlm` | **1** (LLM local) | **1** (LLM local) | — |
| `base` | `remote-all` | **0** | **0** | **0** |
| `lvs` | shared | **1** | — | - |
| `lvs` | dedicated | **2** | **2** | — |
| `lvs` | `remote-llm/vlm` | 1 | 1 | - |
| `lvs` | `remote-all` | 0 | 0 | - |
| `alerts` (verification / CV) | shared | **2**  | — | — |
| `alerts` (verification / CV) | dedicated | **3** | **3**  | — |
| `alerts` (verification / CV) | `remote-all` | 1 | 1 | 1 |
| `alerts` (verification / CV) | `remote-llm/vlm` | 2 | 2 | 1 |
| `alerts` (real-time / VLM) | shared | **2** | — | — |
| `alerts` (real-time / VLM) | dedicated | **3** | **3**  | — |
| `alerts` (real-time / VLM) | `remote-llm` | 2 | 2 | 1 |
| `search` | shared | **2** | — | - |
| `search` | dedicated | **3** | **3**  | — |
| `search` | `remote-*` | **2**  | **2** | - |

A few hard rules encoded in the table:

- **L40S can't do `shared`.** 48 GB is not enough VRAM for LLM + VLM
  on a single GPU. Fall back to `dedicated` or a `remote-*` mode.
- **L40S needs +1 GPU for alerts / search vs H100** because the
  shared-on-one-GPU trick doesn't work — RT-CV / Embed1 must take
  their own GPU, and LLM+VLM still need a second.
- **DGX-Spark / Thor are early-access for most profiles.** Only
  `base` + `lvs` are expected to fully land locally; `alerts` /
  `search` currently require a remote LLM. See
  [`references/edge.md`](references/edge.md).

If the host's (GPU count × VRAM) combination doesn't appear above,
**stop and report the blocker** — don't silently pick a different
mode.

> **Edge shared mode requires Edge 4B + `HF_TOKEN`.** On DGX Spark and AGX/IGX
> Thor, both LLM and VLM must fit in unified memory, AND the standard
> `nvcr.io/nim/nvidia/nvidia-nemotron-nano-9b-v2:1` image has a broken arm64
> manifest. You must run `NVIDIA-Nemotron-Edge-4B-v2.1-EA-020126_FP8` as a
> standalone vLLM container on port 30081 with the agent pointed at it via
> `--use-remote-llm`. Full recipe and the mandatory `HF_TOKEN` verification
> step are in [`references/edge.md`](references/edge.md).

### Step 1b — Prepare the data directory

The data directory layout (asset paths, ownership, mount points, profile-specific subdirs) is documented in [`references/data-directory.md`](references/data-directory.md). Read that file before deploying for the first time on a host or when changing profiles.
# Profile-specific subdirs:
#   alerts → mkdir -p "$DATA/data_log/vss_video_analytics_api" "$DATA/videos/dev-profile-alerts" "$DATA/models/rtdetr-its" "$DATA/models/gdino"
#   search → mkdir -p "$DATA/models"
chmod -R 777 "$DATA/data_log" "$DATA/agent_eval"
# If you created $DATA/models above, also: chmod -R 777 "$DATA/models"

FORBIDDEN: chown -R ubuntu:ubuntu $MDX_DATA_DIR (or any recursive chown).

This is "good housekeeping" to a shell-admin instinct but is the deploy- breaking command in this stack. You will observe a "healthy" deploy (containers Up, endpoints 200) while the video pipeline is silently broken. Use chmod -R 777 on the specific subdirs above — nothing else.

Known per-container uid gotchas (each uses a bind mount under $DATA):

Container	Image	Runs as	Mount path	Symptom if permissions wrong
`centralizedb-dev`	postgres:17.6-alpine	uid 70	`$DATA/data_log/vst/postgres/db`	Can't read own PGDATA → VST `sensor_details` query fails → uploaded videos never appear in `/vst/api/v1/sensor/streams` → warehouse E2E check returns empty
`mdx-redis`	redis:8.2.2-alpine	uid 999	`$DATA/data_log/redis/log`, `/redis/data`	"Can't open the log file: Permission denied" → redis dies → `envoy-streamprocessing` dies (needs Redis Lua script) → stream pipeline broken
`elasticsearch`	elasticsearch	uid 1000	`$DATA/data_log/elastic/{data,logs}`	"AccessDeniedException" on startup → ES refuses to start
`vst` / `sensor-ms-dev`	vst	uid 1000	`$DATA/data_log/vst/*` (videos, clips)	403 on ingest or stream write

chmod -R 777 $DATA/data_log covers all of these. Do NOT chown them to individual uids — containers that init their own dirs on first start (like postgres) will then re-chown to their uid and a later chown back to ubuntu breaks them.

If postgres is already broken (common when redeploying without a clean data-dir):

sudo rm -rf "$DATA/data_log/vst/postgres"  # postgres re-initializes on next start
docker restart centralizedb-dev

Step 1c — If deploying on Brev, set up secure-link env vars

Brev-specific env vars (BREV_ENV_ID, secure-link patterns) are documented in references/brev.md.

Step 2 — Build env_overrides

Produce an env_overrides dict from the user request and the gathered context: choose remote/local LLM/VLM, set credentials, point at endpoints, set platform-specific flags. The full mapping (every override key, when it applies, defaults, profile-specific differences) lives in references/env-overrides.md.

Step 3 — Config / dry-run

Env file location: <repo>/deployments/developer-workflow/dev-profile-<profile>/.env

This is the authoritative .env. Every verifier, healthcheck, and post-deploy tool reads from this path. When you apply env overrides (from Step 2 or from the user's prompt), write them directly to this file — not to generated.env.

generated.env is a scratchpad that dev-profile.sh produces during its own internal flow; it is NOT read by the verifier and is wiped on the next invocation. An agent that uses dev-profile.sh as a one-shot deploy but leaves the base .env untouched will silently fail env checks even when the stack comes up cleanly. If you used dev-profile.sh and see generated.env on disk, copy its key/value lines back into the base .env, or re-apply your sed commands against the base .env after the fact. The base .env is the source of truth.

REPO=/path/to/video-search-and-summarization
PROFILE=base
ENV_FILE=$REPO/deployments/developer-workflow/dev-profile-$PROFILE/.env

# Read current .env, apply overrides, write back
# (read lines, update matching keys, append new keys, write)

# Resolve compose
cd $REPO/deployments
docker compose --env-file $ENV_FILE config > resolved.yml

The resolved YAML is saved to <repo>/deployments/resolved.yml.

Step 3b — Verify resolved.yml has no unexpanded ${...} tokens

Unexpanded ${VAR} tokens in resolved.yml mean compose did not see those env values. Diagnostic procedure and common culprits live in references/troubleshooting.md.

Step 4 — Review

Show the user a summary of what will be deployed:

Profile name and hardware
LLM/VLM models and mode (local/remote/local_shared)
Services that will start
GPU device assignment
Key endpoints (UI port, agent port)

Ask: "Looks good — deploy now?" and wait for confirmation before Step 5.

Exception — autonomous mode. If the user's request already asks you to run autonomously (e.g. "deploy X autonomously", "run without confirmation", "non-interactive"), skip the confirmation prompt and proceed straight to Step 5. This path exists so automated eval / CI invocations don't hang waiting for a human reply they'll never get. In all other cases, a human must approve.

Step 5 — Deploy

cd $REPO/deployments
docker compose -f resolved.yml up -d

Do NOT use --force-recreate on retries. It destroys already-warm NIM containers, forcing another 3–5 min torch.compile + CUDA-graph capture per NIM. If the previous up -d partially failed, fix the root cause (usually perms or an env typo) and just re-run up -d — Docker will re-create only the containers whose config changed or that are down.

Deploy takes ~10-20 min on first run (image pulls + model downloads). Monitor:

# Container status
docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'

# Logs for a specific service
docker compose -f $REPO/deployments/resolved.yml logs --tail 50 <service>

Deploy is complete when all mdx-* containers show Up status.

Step 6 — Report endpoints

Profile	Agent UI	REST API	Other
base	`:3000`	`:8000` (Swagger at `/docs`)	—
alerts	`:3000`	`:8000`	VIOS dashboard `:30888/vst/`
lvs	`:3000`	`:8000`	—
search	`:3000`	`:8000`	—

Use workflow skills after deployment:

alerts / incident-report → alert management and incident queries
video-search → semantic video search
video-summarization → long video summarization
vios → camera/stream management via VIOS
video-analytics → Elasticsearch queries

Tear Down

cd $REPO/deployments
docker compose -f resolved.yml down

Debugging a Deployment

Use this workflow when the user asks to "debug the deploy", "verify it's working", "why is the agent not responding", or similar. The goal is to confirm the full video-ingestion-to-agent-answer path, not just that containers are "Up".

Each profile reference doc (e.g. references/base.md) has a Debugging section listing the exact commands to run for that profile.

Quick checks (all profiles)

# 1. All expected containers Up
docker ps --format 'table {{.Names}}\t{{.Status}}'

# 2. Agent API + UI responding
curl -sf http://localhost:8000/docs >/dev/null && echo "agent OK"
curl -sf http://localhost:3000/ >/dev/null && echo "ui OK"

# 3. VLM NIM responding (base/lvs profiles)
curl -sf http://localhost:30082/v1/models | python3 -m json.tool

# 4. LLM NIM responding
curl -sf http://localhost:30081/v1/models | python3 -m json.tool

End-to-end video sanity check

After the quick checks above pass, drive a real query through the agent — e.g. ask it over the REST API or UI to describe a video you've uploaded to VST. If the agent returns a non-empty answer, the upload → ingest → inference → reply path is healthy. If it fails, docker logs vss-agent shows which stage tripped.

Troubleshooting

unknown or invalid runtime name: nvidia → NVIDIA Container Toolkit not installed or Docker not restarted. See references/prerequisites.md.
NGC auth error → re-export NGC_CLI_API_KEY or follow references/ngc.md.
GPU not detected → run sudo modprobe nvidia && sudo modprobe nvidia_uvm, then retry.
docker compose up fails with "no resolved.yml" → run the dry-run (docker compose config > resolved.yml, Step 3) first.
cosmos-reason2-8b crash → must redeploy the full stack (known issue: NIM cannot restart alone).

deploy

Mais deste repositório

Mais deste repositório

VSS Deploy

Profile Routing

When to Use

How it works

Before Deploying

Pre-flight Check

Deployment Flow

Step 0 — Tear down any existing deployment

If a resolved.yml from a prior deploy exists, prefer it — it

knows about all compose-profile services that were brought up.

Catch-all: remove every VSS-stack container the dev-profile compose

files bring up. Without this, leftovers from a prior deploy linger

(especially the *-smc set, which the alerts compose profile shares

with the *-dev set on host networking and port 30000) and either:

- bind ports the new deploy needs → second sensor-ms fails to bind

→ /sensor/list returns 502 (issue #151), or

- pass the new deploy's container-name health checks while serving

stale data from the prior deploy's DB.

The patterns below cover everything declared in

deployments/vst/{2d,3d,smc,developer,ps}/, deployments/foundational/,

deployments/agents/, deployments/proxy/, and the dev-profile-*

compose files.

Step 1c — If deploying on Brev, set up secure-link env vars

Step 2 — Build env_overrides

Step 3 — Config / dry-run

Step 3b — Verify resolved.yml has no unexpanded ${...} tokens

Step 4 — Review

Step 5 — Deploy

Step 6 — Report endpoints

Tear Down

Debugging a Deployment

Quick checks (all profiles)

End-to-end video sanity check

Troubleshooting

VSS Deploy

Profile Routing

When to Use

How it works

Before Deploying

Pre-flight Check

Deployment Flow

Step 0 — Tear down any existing deployment

If a resolved.yml from a prior deploy exists, prefer it — it

knows about all compose-profile services that were brought up.

Catch-all: remove every VSS-stack container the dev-profile compose

files bring up. Without this, leftovers from a prior deploy linger

(especially the *-smc set, which the alerts compose profile shares

with the *-dev set on host networking and port 30000) and either:

- bind ports the new deploy needs → second sensor-ms fails to bind

→ /sensor/list returns 502 (issue #151), or

- pass the new deploy's container-name health checks while serving

stale data from the prior deploy's DB.

The patterns below cover everything declared in

deployments/vst/{2d,3d,smc,developer,ps}/, deployments/foundational/,

deployments/agents/, deployments/proxy/, and the dev-profile-*

compose files.

Step 1c — If deploying on Brev, set up secure-link env vars

Step 2 — Build env_overrides

Step 3 — Config / dry-run

Step 3b — Verify resolved.yml has no unexpanded ${...} tokens

Step 4 — Review

Step 5 — Deploy

Step 6 — Report endpoints

Tear Down

Debugging a Deployment

Quick checks (all profiles)

End-to-end video sanity check

Troubleshooting