| name | run |
| description | ALWAYS invoke this skill before running any sparkrun CLI commands. Never run sparkrun directly via Bash without loading this skill first. Covers launching, monitoring, stopping, and checking status of inference workloads on NVIDIA DGX Spark. |
Provides a complete reference for launching, monitoring, and stopping LLM inference workloads using sparkrun on NVIDIA DGX Spark systems. Covers the full lifecycle: browse recipes, check VRAM fit, launch jobs, view logs, check status, stop workloads, run benchmarks, tune kernels, and manage the inference proxy.
<Use_When>
- User wants to run an LLM inference model on DGX Spark
- User wants to check status of running workloads
- User wants to stop a running inference job
- User wants to view logs from a running workload
- User wants to preview VRAM requirements before launching
- User wants to benchmark an inference workload
- User wants to tune MoE kernels for better performance
- User wants to manage the inference proxy
- User wants to monitor cluster metrics
- User asks "how do I run", "start", "launch", "deploy" a model
</Use_When>
<Do_Not_Use_When>
- User wants to install sparkrun or set up a cluster -- use the setup skill instead
- User wants to manage recipe registries or create custom recipes -- use the registry skill instead
- User is asking about sparkrun internals or development
</Do_Not_Use_When>
Run a Recipe
sparkrun run <recipe> --tp 1 --no-follow
sparkrun run <recipe> --hosts <ip> --no-follow
sparkrun run <recipe> --cluster <name> --no-follow
sparkrun run <recipe> --hosts <ip1>,<ip2>,... --no-follow
sparkrun run <recipe> --tp <N> --no-follow
sparkrun run <recipe> --port 9000 --gpu-mem 0.8 --no-follow
sparkrun run <recipe> -o max_model_len=8192 -o attention_backend=triton --no-follow
sparkrun run <recipe> --served-model-name my-model --no-follow
sparkrun run <recipe> --pp 2 --tp 2 --no-follow
sparkrun run <recipe> --max-model-len 32768 --no-follow
sparkrun run <recipe> --ensure --no-follow
sparkrun run <recipe> --restart unless-stopped --no-follow
sparkrun run <recipe> --no-rm --no-follow
sparkrun run <recipe> --transfer-mode push --no-follow
sparkrun run <recipe> --dry-run
CRITICAL: Always use --no-follow when running from an agent/skill context to avoid blocking on log streaming. Then use sparkrun cluster status or sparkrun logs separately to check on the job.
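The launch-then-check pattern described above could be wrapped roughly as follows. This is a minimal sketch: `launch_detached` is a hypothetical helper name, and it assumes `sparkrun run` exits non-zero when the launch fails. The flags themselves are the ones listed in this section.

```shell
# Sketch: launch a recipe without attaching to logs, then check the cluster.
# launch_detached is a hypothetical helper, not part of sparkrun.
launch_detached() {
  local recipe="$1" cluster="$2"
  # --no-follow returns control instead of streaming logs.
  # Assumption: a failed launch yields a non-zero exit status.
  sparkrun run "$recipe" --cluster "$cluster" --no-follow || return 1
  # Separate status call to confirm the containers are running.
  sparkrun cluster status --cluster "$cluster"
}
```

A call such as `launch_detached <recipe> <name>` returns once the status output has been printed, so an agent is not left blocked on a log stream.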
Check Status
sparkrun cluster status
sparkrun cluster status --cluster <name>
sparkrun cluster status --hosts <ip1>,<ip2>,...
sparkrun cluster check-job <recipe> --cluster <name>
sparkrun cluster check-job <cluster_id> --hosts <ip1>,<ip2>
sparkrun cluster check-job <recipe> --check-http-models
sparkrun cluster check-job <recipe> --json
Monitor Cluster Metrics
sparkrun cluster monitor --cluster <name>
sparkrun cluster monitor --cluster <name> --simple
sparkrun cluster monitor --cluster <name> --json
sparkrun cluster monitor --cluster <name> --interval 5
View Logs
sparkrun logs <recipe> --cluster <name>
sparkrun logs <recipe> --hosts <ip1>,<ip2>,...
sparkrun logs <recipe> --tp <N>
sparkrun logs <cluster_id>
sparkrun logs <cluster_id> --cluster <name>
sparkrun logs <recipe> --tail 200
Stop a Workload
sparkrun stop <recipe> --cluster <name>
sparkrun stop <recipe> --hosts <ip1>,<ip2>,...
sparkrun stop <recipe> --tp <N>
sparkrun stop <cluster_id>
sparkrun stop <cluster_id> --cluster <name>
sparkrun stop --all --cluster <name>
sparkrun stop --all --hosts <ip1>,<ip2>,...
sparkrun stop <recipe> --dry-run
Browse and Inspect Recipes
sparkrun list
sparkrun list --all
sparkrun list --registry <name>
sparkrun list --runtime vllm
sparkrun list <query>
sparkrun recipe search <query>
sparkrun recipe search <query> --registry <name> --runtime sglang
sparkrun recipe show <recipe>
sparkrun recipe show <recipe> --tp <N>
sparkrun recipe validate <recipe>
sparkrun recipe vram <recipe> --tp <N> --max-model-len 32768
sparkrun recipe export <recipe>
sparkrun recipe export <recipe> --json
sparkrun recipe export <recipe> --save out.yaml
Use sparkrun recipe search as the first attempt when looking for a particular recipe. Use sparkrun recipe show when given a specific recipe name or file -- it may not appear in search results.
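One way to encode the search-versus-show guidance is sketched below. `find_recipe` is a hypothetical helper, and the rule it uses to decide that a target is "specific" (a `.yaml`/`.yml` path or the `@registry/name` form) is an assumption made for illustration; a plain recipe name would still go through search here.

```shell
# Sketch: pick "recipe show" for explicit targets, "recipe search" otherwise.
# find_recipe and its matching heuristic are illustrative assumptions.
find_recipe() {
  local target="$1"
  case "$target" in
    # Recipe file or explicit @registry/name: inspect it directly,
    # since it may not appear in search results.
    *.yaml|*.yml|@*/*) sparkrun recipe show "$target" ;;
    # Anything else is treated as a free-text query.
    *) sparkrun recipe search "$target" ;;
  esac
}
```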
Benchmark
sparkrun benchmark <recipe> --tp 1
sparkrun benchmark <recipe> --cluster <name>
sparkrun benchmark <recipe> --tp 2 --profile <profile_name>
sparkrun benchmark <recipe> --skip-run --tp 1
sparkrun benchmark <recipe> --no-stop --tp 1
sparkrun benchmark <recipe> -b depth=0,2048,4096 -b tg=32,128
sparkrun benchmark <recipe> --framework llama-benchy --timeout 3600
sparkrun benchmark <recipe> --dry-run
Kernel Tuning
sparkrun tune sglang <recipe> --hosts <ip>
sparkrun tune sglang <recipe> --cluster <name> --tp 1 --tp 2 --tp 4
sparkrun tune sglang <recipe> -H <ip> --parallel 2
sparkrun tune vllm <recipe> --hosts <ip>
sparkrun tune vllm <recipe> --cluster <name> --tp 4
Inference Proxy
sparkrun proxy start --cluster <name>
sparkrun proxy start --port 4000
sparkrun proxy status
sparkrun proxy models
sparkrun proxy models --refresh
sparkrun proxy load <recipe> --cluster <name>
sparkrun proxy unload <recipe> --cluster <name>
sparkrun proxy alias add my-model "Qwen/Qwen3-1.7B"
sparkrun proxy alias remove my-model
sparkrun proxy alias list
sparkrun proxy stop
<Tool_Usage>
All sparkrun commands are executed via the Bash tool. No MCP tools are required.
When running workloads:
- Always use the --no-follow flag with sparkrun run
- After launching, run sparkrun cluster status to confirm containers are running
- Use the logs/stop commands from status output to manage jobs
- For monitoring, use --simple or --json mode (the TUI requires an interactive terminal)
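The monitoring rule above can be approximated by checking whether stdout is a terminal. The sketch below uses a hypothetical helper, `monitor_cluster`, and assumes that "no TTY on stdout" is a good enough proxy for an agent context.

```shell
# Sketch: fall back to --simple when stdout is not an interactive terminal.
# monitor_cluster is a hypothetical helper, not part of sparkrun.
monitor_cluster() {
  local cluster="$1"
  if [ -t 1 ]; then
    # Interactive terminal: the default TUI can render.
    sparkrun cluster monitor --cluster "$cluster"
  else
    # Piped or captured output (typical for agents): plain text mode.
    sparkrun cluster monitor --cluster "$cluster" --simple
  fi
}
```

Swapping `--simple` for `--json` in the second branch would suit callers that parse the output.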
</Tool_Usage>
<Key_Options>
sparkrun run options:
| Option | Description |
|---|---|
| --hosts, -H | Comma-separated host list |
| --hosts-file | File with hosts (one per line) |
| --cluster | Use a saved cluster |
| --tp, --tensor-parallel | Override tensor parallelism (= node count) |
| --pp, --pipeline-parallel | Override pipeline parallelism |
| --port | Override serve port |
| --gpu-mem | GPU memory utilization (0.0-1.0) |
| --max-model-len | Override maximum model context length |
| --served-model-name | Override the served model name |
| --image | Override container image |
| -o KEY=VALUE | Override any recipe default |
| --ensure | Only launch if not already running; exit 0 if already up |
| --no-rm | Don't auto-remove containers on exit |
| --restart POLICY | Docker restart policy (no, always, unless-stopped, on-failure[:N]) |
| --transfer-mode | Resource transfer mode (auto, local, push, delegated) |
| --dry-run, -n | Show what would be done |
| --no-follow | Don't attach to logs after launch |
| --foreground | Run in foreground (blocking) |
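
Because --tp equals the node count, the value can be derived from the host list instead of being typed twice. The sketch below is one possible approach: `host_count` and `run_on_hosts` are hypothetical helpers, and the host string is assumed to be non-empty and comma-separated.

```shell
# Sketch: derive --tp from a comma-separated host list.
# host_count and run_on_hosts are hypothetical helpers, not part of sparkrun.
host_count() {
  local hosts="$1" commas
  # Count the commas; N commas separate N+1 hosts.
  # Assumption: the list is non-empty and has no trailing comma.
  commas=$(printf '%s' "$hosts" | tr -cd ',' | wc -c)
  echo $((commas + 1))
}

run_on_hosts() {
  local recipe="$1" hosts="$2"
  # Keep --tp in step with the number of hosts passed via --hosts.
  sparkrun run "$recipe" --hosts "$hosts" --tp "$(host_count "$hosts")" --no-follow
}
```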
</Key_Options>
<Important_Notes>
- Always use --no-follow when running from an automated/agent context to avoid blocking
- Use sparkrun cluster status after launching to confirm containers are running
- --tp N must match the number of hosts (DGX Spark = 1 GPU per host)
- sparkrun stop and sparkrun logs accept both recipe names and cluster IDs as targets
- If --tp, --port, or --served-model-name were used during run, pass the same values to stop and logs
- Use sparkrun recipe show <recipe> --tp N (or sparkrun recipe vram) to preview VRAM estimates before running
- Container names follow the pattern sparkrun_{hash}_{role}, where the hash is derived from runtime + model + sorted hosts + overrides
- Ctrl+C while following logs detaches safely -- it never kills the inference job
- Use sparkrun stop --all to stop all sparkrun containers without specifying a recipe
- --solo is deprecated; use --tp 1 instead
- Recipe names support @registry/name syntax for explicit registry selection
- sparkrun update upgrades sparkrun itself (if installed via uv) and updates all registries
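
Since the container hash depends on the overrides, keeping them in one place makes it harder for run, logs, and stop to drift apart. The sketch below is illustrative only: the `JOB_*` variables, the `job_*` helpers, and the example values are all hypothetical.

```shell
# Sketch: define a job's overrides once and reuse them for run, logs, and stop.
# JOB_RECIPE, JOB_OVERRIDES, and the job_* helpers are hypothetical names.
JOB_RECIPE="my-recipe"
JOB_OVERRIDES="--tp 2 --port 9000"

# $JOB_OVERRIDES is intentionally unquoted so it splits into separate flags.
job_run()  { sparkrun run  "$JOB_RECIPE" $JOB_OVERRIDES --no-follow; }
job_logs() { sparkrun logs "$JOB_RECIPE" $JOB_OVERRIDES --tail 200; }
job_stop() { sparkrun stop "$JOB_RECIPE" $JOB_OVERRIDES; }
```

Targeting the job by cluster ID, which stop and logs also accept, sidesteps the matching problem entirely.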
</Important_Notes>
Task: {{ARGUMENTS}}