dstack
dstack is an open-source control plane for GPU provisioning and orchestration across GPU clouds, Kubernetes, and on-prem clusters.
| name | dstack |
| description | dstack is an open-source control plane for GPU provisioning and orchestration across GPU clouds, Kubernetes, and on-prem clusters. |
dstack provisions and orchestrates workloads across GPU clouds, Kubernetes, and on-prem via fleets.
When to use this skill:
- Working with *.dstack.yml configurations

dstack operates through three core components:
- dstack server - can run locally, remotely, or via dstack Sky (managed)
- dstack CLI - applies configurations and manages or inspects fleets, runs, logs, events, volumes, gateways, and offers; it uses project configurations stored in ~/.dstack/config.yml, which can be managed with dstack project
- dstack configuration files - YAML files ending with .dstack.yml

dstack apply shows a plan and submits configuration changes. For run configurations, it attaches when the run reaches running by default: it configures SSH access, forwards declared ports, and streams logs. With -d, it submits and exits.
echo "n" | dstack apply -f <config>dstack apply -f <config> -y -ddstack ps -vdstack attach locally and share the outputCRITICAL: Never propose dstack CLI commands or YAML syntaxes that don't exist.
If unsure, verify flags with --help. NEVER do the following:
- Invent flags instead of checking --help
- Run dstack apply for runs without -d in automated contexts (blocks indefinitely)
- Use echo "y" | when the -y flag is available

When unsure about a command, run dstack <command> --help first.
dstack --help # List all commands
dstack apply -h <configuration type> # Flags for apply per configuration type (dev-environment, task, service, fleet, etc)
dstack fleet --help # Fleet subcommands
dstack ps --help # Flags for ps
Commands that stream indefinitely in the foreground:
- dstack attach
- dstack apply without -d for runs
- dstack ps -w

Agents should avoid blocking: use -d, timeouts, or background attach. When attach is needed, run it in the background by default (nohup ...), but describe it to the user simply as "attach" unless they ask for a live foreground session. Prefer dstack ps -v and poll in a loop if the user wants to watch status.
All other commands: Use 10-60s timeout. Most complete within this range. While waiting, monitor the output - it may contain errors, warnings, or prompts requiring attention.
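The timeout discipline above can be exercised without a dstack server; in this sketch, `sleep 5` is a stand-in for a streaming dstack command such as dstack logs or dstack attach:

```shell
# Bound a potentially long-running command. `sleep 5` stands in for a
# streaming dstack command; GNU coreutils `timeout` kills it after 1s.
status=0
timeout 1 sleep 5 || status=$?
# `timeout` exits with 124 when the time limit was hit
echo "exit status: $status"
```

Note that `timeout` ships with GNU coreutils (standard on Linux); on other platforms a different wrapper may be needed.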
Confirmation handling:
- dstack apply, dstack stop, and dstack fleet delete require confirmation
- Use the -y flag to auto-confirm when the user has already approved
- For dstack stop, always use -y after the user confirms to avoid interactive prompts
- Use echo "n" | to preview a dstack apply plan without executing (avoid echo "y" |; prefer -y)

Best practices:
- Show the plan before executing dstack apply (unless it's an exception)
- Use the -y flag to skip confirmation prompts
- Submit runs detached (-d)

After submitting a run with -d (dev-environment, task, service), first determine whether submission failed. If the apply output shows errors (validation, no offers, etc.), stop and surface the error.
If the run was submitted, do a quick status check with dstack ps -v, then guide the user through relevant next steps:
If you need to prompt for next actions, be explicit about the dstack step and command (avoid vague questions). When speaking to the user, refer to the action as "attach" (not "background attach").
- Poll dstack ps -v every 10-20s if the user wants updates.
- Once the run is running, attach to surface the IDE link/port forwarding/SSH alias, then ask whether to open the IDE link. Never open links without explicit approval.
- Use dstack logs for progress; attach only if full log replay is required.

dstack attach runs until interrupted and blocks the terminal. Agents must avoid indefinite blocking. If a brief attach is needed, use a timeout to capture initial output (IDE link, SSH alias) and then detach.
Note: dstack attach writes SSH alias info under ~/.dstack/ssh/config (and may update ~/.ssh/config) to enable ssh <run name>, IDE connections, port forwarding, and real-time logs (dstack attach --logs). If the sandbox cannot write there, the alias will not be created.
Permissions guardrail: If dstack attach fails due to sandbox permissions, request permission escalation to run it outside the sandbox. If escalation isn’t approved or attach still fails, ask the user to run dstack attach locally and share the IDE link/SSH alias output.
Background attach (non-blocking default for agents):
nohup dstack attach <run name> --logs > /tmp/<run name>.attach.log 2>&1 & echo $! > /tmp/<run name>.attach.pid
Then read the output:
tail -n 50 /tmp/<run name>.attach.log
Offer live follow only if asked:
tail -f /tmp/<run name>.attach.log
Stop the background attach (preferred):
kill "$(cat /tmp/<run name>.attach.pid)"
If the PID file is missing, fall back to a specific match (avoid killing all attaches):
pkill -f "dstack attach <run name>"
Why this helps: it keeps the attach session alive (including port forwarding) while the agent remains usable. IDE links and SSH instructions appear in the log file -- surface them and ask whether to open the link (open "<link>" on macOS, xdg-open "<link>" on Linux) only after explicit approval.
If background attach fails in the sandbox (permissions writing ~/.dstack or ~/.ssh, timeouts), request escalation to run attach outside the sandbox. If not approved, ask the user to run attach locally and share the IDE link/SSH alias.
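The nohup/PID-file lifecycle above can be rehearsed end to end without a dstack server; here `sleep 30` stands in for dstack attach <run name> --logs:

```shell
# Start the long-running process detached, capturing output and the PID.
# `sleep 30` is a stand-in for `dstack attach <run name> --logs`.
nohup sleep 30 > /tmp/demo.attach.log 2>&1 &
echo $! > /tmp/demo.attach.pid

# Read whatever output has been captured so far
tail -n 50 /tmp/demo.attach.log

# Stop via the recorded PID (preferred over a broad pkill -f)
kill "$(cat /tmp/demo.attach.pid)"
```

The PID file makes teardown precise: only the one recorded process is killed, never a sibling attach session.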
"Run something": When the user asks to run a workload (dev environment, task, service), use dstack apply with the appropriate configuration. Note: dstack run only supports dstack run get --json for retrieving run details -- it cannot start workloads.
"Connect to" or "open" a dev environment: If a dev environment is already running, use dstack attach <run name> --logs (agent runs it in the background by default) to surface the IDE URL (cursor://, vscode://, etc.) and SSH alias. If sandboxed attach fails, request escalation or ask the user to run attach locally and share the link.
dstack supports run configurations (dev environments, tasks, and services) and infrastructure configurations (fleets, volumes, and gateways). Configuration files can be named <name>.dstack.yml or simply .dstack.yml.
Common parameters: All run configurations (dev environments, tasks, services) support many parameters including:
- Clone a repo (repo) or mount existing repos (repos)
- Upload local files (files; see concept docs for examples)
- Use a custom Docker image (image); use docker: true if you want to use Docker from inside the container (VM-based backends only)
- Pass environment variables (env), often via .envrc. Secrets are supported but less common.
- Mount volumes (volumes), specify disk size

Best practices:
- Set the name property for easier management
- List only variable names in the env section (e.g., - HF_TOKEN), not values. Recommend storing actual values in a .envrc file alongside the configuration, applied via source .envrc && dstack apply.
- python and image are mutually exclusive in run configurations. If image is set, do not set python.

files and repos intent policy: Use files and repos only when the user intends to use local/repo files inside the run.
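The name-only env pattern can be tried in isolation; the path and the HF_TOKEN placeholder value below are illustrative, not real credentials:

```shell
# Create an illustrative .envrc next to the configuration
# (placeholder value; keep the real file out of version control).
cat > /tmp/demo.envrc <<'EOF'
export HF_TOKEN=hf_placeholder
EOF

# Load it before applying, e.g.: source .envrc && dstack apply -f task.dstack.yml
source /tmp/demo.envrc
echo "$HF_TOKEN"
```

With the variable exported in the shell, a configuration that lists only `- HF_TOKEN` under env picks the value up at apply time.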
- When that intent is clear, use files or repos as appropriate.

files guidance:
- Each files path is placed under the run's working_dir (default or set by user).

repos + image/working directory guidance:
- Map the repo via repos (e.g., .:/dstack/run).
- Set working_dir to the same path.
- With dstack default images, the default working_dir is already /dstack/run.

Use for: Interactive development with IDE integration (VS Code, Cursor, etc.).
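A sketch of the repos/working_dir pairing with a custom image; the image name and name field are illustrative assumptions:

```yaml
type: dev-environment
name: repo-demo
image: nvcr.io/nvidia/pytorch:24.07-py3   # illustrative custom image
ide: vscode
repos:
  - .:/dstack/run        # mount the current repo at this path
working_dir: /dstack/run # point the working directory at the same path
```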
type: dev-environment
name: cursor
python: "3.12"
ide: vscode
resources:
gpu: 80GB
Concept documentation | Configuration reference
Use for: Batch jobs, training runs, fine-tuning, web applications, any executable workload.
Key features: Distributed training (multi-node) and port forwarding for web apps.
type: task
name: train
python: "3.12"
env:
- HUGGING_FACE_HUB_TOKEN
commands:
- uv pip install -r requirements.txt
- uv run python train.py
ports:
- 8501 # Optional: expose ports for web apps
resources:
gpu: A100:40GB:2
Port forwarding: When you specify ports, dstack apply forwards them to localhost while attached. Use dstack attach <run name> to reconnect and restore port forwarding. The run name becomes an SSH alias (e.g., ssh <run name>) for direct access.
Distributed training: Multi-node tasks are supported (e.g., via nodes) and require fleets that support inter-node communication (see placement: cluster in fleets).
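A minimal multi-node sketch under those constraints; the launch command, counts, and names are illustrative assumptions, and the fleet backing it needs placement: cluster:

```yaml
type: task
name: ddp-train
nodes: 2                  # one job per node; requires a placement: cluster fleet
python: "3.12"
env:
  - HUGGING_FACE_HUB_TOKEN
commands:
  - uv pip install -r requirements.txt
  - uv run torchrun --nnodes=2 train.py   # illustrative launch command
resources:
  gpu: A100:80GB:8
```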
Concept documentation | Configuration reference
Use for: Deploying models or web applications as production endpoints.
Key features: OpenAI-compatible model serving, auto-scaling (RPS/queue), custom gateways with HTTPS.
type: service
name: llama31
python: "3.12"
env:
- HF_TOKEN
commands:
- uv pip install vllm
- uv run vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct
port: 8000
model: meta-llama/Meta-Llama-3.1-8B-Instruct
resources:
gpu: 80GB
disk: 200GB
Service endpoints:
- Without a gateway: <server URL>/proxy/services/<project name>/<run name>/
- With a gateway: https://<run name>.<gateway domain>/
- Unless auth is false, include Authorization: Bearer <user token> on service requests.
- If model is set, service.model.base_url from dstack run get <run name> --json provides the model endpoint. For OpenAI-compatible models (the default, unless format is set otherwise), this will be service.url + /v1.

curl -sS -X POST "https://<run name>.<gateway domain>/v1/chat/completions" \
-H "Authorization: Bearer <user token>" \
-H "Content-Type: application/json" \
-d '{"model":"<model name>","messages":[{"role":"user","content":"Hello"}],"max_tokens":64}'
Concept documentation | Configuration reference
Use for: Pre-provisioning infrastructure for workloads, managing on-prem GPU servers, creating auto-scaling instance pools.
type: fleet
name: my-fleet
nodes: 0..2
resources:
gpu: 24GB..
disk: 200GB
spot_policy: auto # other values: spot, on-demand
idle_duration: 5m
On-demand provisioning: When nodes is a range (e.g., 0..2), dstack creates a template and provisions instances on demand within the min/max. Use idle_duration to terminate idle instances.
Distributed workloads: Use placement: cluster for fleets intended for multi-node tasks that require inter-node networking.
SSH fleet (on-prem or pre-provisioned):
type: fleet
name: on-prem-fleet
ssh_config:
user: ubuntu
identity_file: ~/.ssh/id_rsa
hosts:
- 192.168.1.10
- 192.168.1.11
Concept documentation | Configuration reference
Use for: Persistent storage for datasets, model checkpoints, training artifacts.
type: volume
name: my-volume
backend: aws
region: us-east-1
resources:
disk: 500GB
Instance volumes (local, ephemeral, often optional):
type: dev-environment
# ... other config
volumes:
- instance_path: /dstack-cache/pip
path: /root/.cache/pip
optional: true
- instance_path: /dstack-cache/huggingface
path: /root/.cache/huggingface
optional: true
Mounting volumes: Use volumes in dev environments, tasks, and services. Network volumes persist independently; instance volumes are tied to the instance lifecycle.
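A sketch of mounting the network volume above into a task; the run name and mount path are illustrative assumptions:

```yaml
type: task
name: train-with-data
commands:
  - python train.py
volumes:
  - name: my-volume   # the network volume defined above
    path: /data       # mount point inside the container
resources:
  gpu: 24GB
```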
Concept documentation | Configuration reference
Use for: Gateways are optional for basic service endpoints. They are required when a service uses auto-scaling or rate limits, needs HTTPS on a custom domain, requires WebSockets, or cannot work with the server proxy path prefix.
type: gateway
name: my-gateway
backend: aws
region: us-east-1
domain: example.com
Concept documentation | Configuration reference
Important behavior:
- dstack apply shows a plan with estimated costs and may ask for confirmation
- When detached (-d), it submits and exits without attaching

Workflow for applying run configurations (dev-environment, task, service):
Show plan:
echo "n" | dstack apply -f config.dstack.yml
Display the FULL output including the offers table and cost estimate. Do NOT summarize or reformat.
Wait for user confirmation. Do NOT proceed until the user explicitly confirms.
Execute (only after user confirms):
dstack apply -f config.dstack.yml -y -d
Verify apply status:
dstack ps -v
Workflow for infrastructure (fleet, volume, gateway):
Show plan:
echo "n" | dstack apply -f infra.dstack.yml
Display the FULL output. Do NOT summarize or reformat.
Wait for user confirmation.
Execute:
dstack apply -f infra.dstack.yml -y
Verify: Use dstack fleet, dstack volume, or dstack gateway respectively.
# Create/update fleet
dstack apply -f fleet.dstack.yml
# List fleets
dstack fleet
# Get fleet details
dstack fleet get my-fleet
# Get fleet details as JSON (for troubleshooting)
dstack fleet get my-fleet --json
# Delete entire fleet (use -y when user already confirmed)
dstack fleet delete my-fleet -y
# Delete specific instance from fleet (use -y when user already confirmed)
dstack fleet delete my-fleet -i <instance num> -y
# List all runs
dstack ps
# Verbose output with full details
dstack ps -v
# JSON output (for troubleshooting/scripting)
dstack ps --json
# Get specific run details as JSON
dstack run get my-run-name --json
# Attach and replay logs from start (preferred, unless asked otherwise)
dstack attach my-run-name --logs
# Attach without replaying logs (restores port forwarding + SSH only)
dstack attach my-run-name
# Stream logs (tail mode)
dstack logs my-run-name
# Debug mode (includes additional runner logs)
dstack logs my-run-name -d
# Fetch logs from specific replica (multi-node runs)
dstack logs my-run-name --replica 1
# Fetch logs from specific job
dstack logs my-run-name --job 0
# Stop specific run (use -y after user confirms)
dstack stop my-run-name -y
# Abort (force stop)
dstack stop my-run-name --abort
Offers represent available instance configurations that match resource
requirements. If --fleet is omitted, dstack offer checks all configured
backends. Listing offers does not create capacity; submitting a run still
requires at least one fleet that can provision or reuse matching instances.
Use --fleet to inspect offers available through specific fleets.
# Filter by specific backend
dstack offer --backend aws
# Filter by GPU type
dstack offer --gpu A100
# Filter by GPU memory
dstack offer --gpu 24GB..80GB
# Combine filters
dstack offer --backend aws --gpu A100:80GB
# Limit to a specific fleet
dstack offer --fleet my-fleet
# Combine offers from multiple fleets
dstack offer --fleet my-fleet --fleet other-fleet
# JSON output (for troubleshooting/scripting)
dstack offer --json
With one --fleet, dstack offer shows offers available through that fleet. With multiple --fleet, it combines offers available through the selected fleets. Identical backend offers are shown once, while matching existing instances stay separate.
Max offers: By default, dstack offer returns the first N offers (the output also
includes the total count). Use --max-offers N to increase the limit.
Grouping: Prefer --group-by gpu for aggregated output across all offers,
not --max-offers. Other supported fields are backend, region, and
count; region requires backend.
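The 24GB..80GB range syntax accepted by --gpu can be parsed with a small helper; this is an illustrative sketch, not part of dstack:

```python
from __future__ import annotations


def parse_memory_range(spec: str) -> tuple[float, float | None]:
    """Parse dstack-style memory ranges like '24GB', '24GB..80GB', '24GB..'.

    Returns (min_gb, max_gb); max_gb is None for open-ended ranges.
    """
    def to_gb(part: str) -> float:
        return float(part.removesuffix("GB"))

    if ".." in spec:
        lo, _, hi = spec.partition("..")
        return to_gb(lo), (to_gb(hi) if hi else None)
    exact = to_gb(spec)
    return exact, exact


print(parse_memory_range("24GB..80GB"))  # (24.0, 80.0)
print(parse_memory_range("24GB.."))      # (24.0, None)
print(parse_memory_range("80GB"))        # (80.0, 80.0)
```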
When diagnosing issues with dstack workloads or infrastructure:
Use JSON output for detailed inspection:
dstack fleet get my-fleet --json
dstack run get my-run --json
dstack ps -n 10 --json
dstack offer --json
Check verbose run status:
dstack ps -v
Examine logs with debug output:
dstack logs my-run -d
Attach with log replay:
dstack attach my-run --logs
Common issues:
- No matching offers: check dstack offer; if submitting a run, ensure at least one fleet can provision or reuse matching instances
- Apply fails: check dstack apply output for specific errors
- Run stuck provisioning: use dstack ps -v to see provisioning status; consider spot vs on-demand

When errors occur:
Core documentation:
Additional concepts:
Guides:
Accelerator-specific examples:
Full documentation: https://dstack.ai/llms-full.txt