| name | truefoundry-deploy |
| description | Deploys applications to TrueFoundry. Handles single HTTP services, async/queue workers, multi-service projects, and declarative manifest apply. Supports `tfy apply`, `tfy deploy`, docker-compose translation, and CI/CD pipelines. Use when deploying apps, applying manifests, shipping services, or orchestrating multi-service deployments. |
| license | MIT |
| compatibility | Requires Bash, curl, and access to a TrueFoundry instance |
| metadata | {"disable-model-invocation":"true"} |
| allowed-tools | Bash(tfy*) Bash(*/tfy-api.sh *) Bash(*/tfy-version.sh *) Bash(docker *) Bash(tfy deploy*) Bash(curl *) |
Routing note: For ambiguous user intents, use the shared clarification templates in references/intent-clarification.md.
Deploy to TrueFoundry
Route user intent to the right deployment workflow. Load only the references you need.
Intent Router
| User Intent | Action | Reference |
|---|
| "deploy", "deploy my app", "ship this" | Single HTTP service | deploy-service.md |
| "mount this file", "mount config file", "mount certificate file", "mount key file" | Single service with file mounts (no image rebuild) | deploy-service.md |
| "tfy apply", "apply manifest", "deploy from yaml" | Declarative manifest apply | deploy-apply.md |
| "deploy everything", "full stack", docker-compose, "docker-compose.yaml", "compose.yaml" | Multi-service: use compose as source of truth | deploy-multi.md + compose-translation.md |
| "async service", "queue consumer", "worker" | Async/queue service | deploy-async.md |
| "deploy LLM", "serve model" | Model serving intent (may be ambiguous) | Ask user: dedicated model serving (llm-deploy) or generic service deploy (deploy) |
| "deploy helm chart" | Helm chart intent | Confirm Helm path and collect chart details, then proceed with helm workflow |
| "deploy postgres docker", "dockerized postgres", "deploy redis docker", "database in docker/container" | Containerized database intent | Proceed with deploy workflow (do not route to Helm) |
| "deploy database", "deploy postgres", "deploy redis" | Ambiguous infra intent | Ask user: Helm chart (helm) or containerized service (deploy) |
Load only the reference file matching the user's intent. Do not preload all references.
General Principle: Ask, Don't Assume
When in doubt, ask. If any deployment parameter is ambiguous or missing — branch, workspace, image, port, resources, environment — ask the user rather than picking a value and proceeding silently. A wrong assumption can deploy to the wrong environment, from the wrong branch, or with the wrong configuration. The cost of one extra question is always lower than the cost of a bad deploy.
Examples of things to ask rather than assume:
- Which workspace to deploy to (even if only one exists)
- Which branch to build from (especially if the manifest branch differs from the local branch)
- Whether to use the existing manifest as-is or update it
- Which Docker image tag or registry to use
- Whether the service should be public or internal
Do NOT silently default to the current value of anything that could have changed or that the user has not explicitly confirmed for this deployment.
Prerequisites (All Workflows)
grep '^TFY_' .env 2>/dev/null || true
env | grep '^TFY_' 2>/dev/null || true
export TFY_HOST="${TFY_HOST:-${TFY_BASE_URL%/}}"
tfy --version 2>/dev/null || echo "Install: pip install 'truefoundry==0.5.0'"
ls tfy-manifest.yaml truefoundry.yaml 2>/dev/null
TFY_BASE_URL and TFY_API_KEY must be set (env or .env).
TFY_HOST must be set before any tfy CLI command. The export above handles this automatically.
TFY_WORKSPACE_FQN required. HARD RULE: Never auto-pick a workspace. Always ask the user to confirm, even if only one workspace exists or a preference is saved. See references/prerequisites.md for the full workspace confirmation flow.
- For full credential setup, see
references/prerequisites.md.
WARNING: Never use source .env. The tfy-api.sh script handles .env parsing automatically. For shell access: grep KEY .env | cut -d= -f2-
CRITICAL: tfy apply vs tfy deploy
HARD RULE: tfy apply does NOT support build_source.type: local. If the manifest has a local build source, you MUST use tfy deploy -f <manifest>. Using tfy apply with a local build source will fail with: must match exactly one schema in oneOf.
| Scenario | Command | Works? |
|---|
Pre-built image (image.type: image) | tfy apply -f manifest.yaml | Yes |
build_source.type: git | tfy apply -f manifest.yaml | Yes |
build_source.type: git | tfy deploy -f manifest.yaml | Yes |
build_source.type: local | tfy deploy -f manifest.yaml | Yes |
build_source.type: local | tfy apply -f manifest.yaml | NO — will fail |
Before running any deploy command, check the manifest:
- If
build_source.type: local → use tfy deploy -f
- Otherwise →
tfy apply -f is fine
Pre-Flight Manifest Validation (MANDATORY)
Before attempting any deploy/apply, run these checks. Fix issues before deploying — do not deploy a known-bad manifest.
1. Exposed port requires host
If any port has expose: true, it must have a host field. Deploying without it will fail with: Host must be provided to expose port.
Auto-generate the host if missing:
TFY_API_SH=~/.claude/skills/truefoundry-deploy/scripts/tfy-api.sh
CLUSTER_ID=$(echo "$TFY_WORKSPACE_FQN" | cut -d: -f1)
bash $TFY_API_SH GET "/api/svc/v1/clusters/$CLUSTER_ID"
Pattern: {service-name}-{workspace-name}.{base_domain}
2. Local build source requires tfy deploy
If the manifest contains build_source.type: local, ensure the deploy command is tfy deploy -f, NOT tfy apply.
3. capacity_type compatibility
spot_fallback_on_demand is not supported on all clusters. If you're unsure, use on_demand or omit capacity_type entirely to let the platform decide. Valid safe values: on_demand, spot.
4. build_spec.type must be exact
Only dockerfile and tfy-python-buildpack are valid. Do NOT use docker, build, python, or any other value.
5. Git branch mismatch (existing manifest + git source)
If an existing manifest has build_source.type: git with a branch_name set, compare it to the current local branch before deploying:
grep -h 'branch_name:' "$MANIFEST_FILE" 2>/dev/null | head -1 | sed 's/.*branch_name:[[:space:]]*//'
git branch --show-current 2>/dev/null
If the branches differ, stop and ask the user:
The manifest specifies branch_name: {manifest_branch}, but your current local branch is {current_branch}.
Which branch should be deployed?
- Keep manifest branch:
{manifest_branch} (deploy as-is, no manifest change)
- Use current branch:
{current_branch} (update branch_name in the manifest)
Never silently override the manifest's branch_name with the current local branch.
Quick Ops (Inline)
Apply a manifest (pre-built image or git source)
export TFY_HOST="${TFY_HOST:-${TFY_BASE_URL%/}}"
tfy apply -f tfy-manifest.yaml --dry-run --show-diff
tfy apply -f tfy-manifest.yaml
Deploy from local source
export TFY_HOST="${TFY_HOST:-${TFY_BASE_URL%/}}"
tfy deploy -f truefoundry.yaml --no-wait
Reminder: tfy apply does NOT support build_source.type: local. Use tfy deploy -f for local builds.
Minimal service manifest template
name: my-service
type: service
image:
type: image
image_uri: docker.io/myorg/my-api:v1.0
ports:
- port: 8000
expose: false
app_protocol: http
resources:
cpu_request: 0.5
cpu_limit: 1
memory_request: 512
memory_limit: 1024
ephemeral_storage_request: 1000
ephemeral_storage_limit: 2000
env:
LOG_LEVEL: info
replicas: 1
workspace_fqn: "WORKSPACE_FQN_HERE"
Public access template (when expose: true)
ports:
- port: 8000
expose: true
host: my-service-my-workspace.ml.your-org.truefoundry.cloud
app_protocol: http
Host is REQUIRED when expose: true. Auto-generate it: {service-name}-{workspace-name}.{base_domain}. Get base_domain from cluster discovery (see cluster-discovery.md).
Check deployment status
TFY_API_SH=~/.claude/skills/truefoundry-deploy/scripts/tfy-api.sh
bash $TFY_API_SH GET '/api/svc/v1/apps?workspaceFqn=WORKSPACE_FQN&applicationName=SERVICE_NAME'
Or use the applications skill.
Post-Deploy Monitoring (MANDATORY)
HARD RULE: After every successful tfy apply or tfy deploy command, you MUST monitor the deployment to completion. Do NOT stop after the apply/deploy command returns. Do NOT ask the user "should I monitor?" — just do it. Do NOT say "you can check the status" — YOU check the status. The deployment is not done until you confirm a terminal state.
Monitoring procedure
Immediately after deploy/apply succeeds, start polling. Do not wait for the user to ask.
Poll loop — execute this yourself, do not delegate to the user:
TFY_API_SH=~/.claude/skills/truefoundry-deploy/scripts/tfy-api.sh
bash $TFY_API_SH GET '/api/svc/v1/apps?workspaceFqn=WORKSPACE_FQN&applicationName=SERVICE_NAME'
Or use MCP tool call if available:
tfy_applications_list(filters={"workspace_fqn": "WORKSPACE_FQN", "application_name": "SERVICE_NAME"})
How to check: The response is at data[0].deployment.currentStatus. Use state.isTerminalState as the authoritative check.
Terminal states (state.isTerminalState === true) — stop polling:
DEPLOY_SUCCESS → report success, replicas, endpoint URL
BUILD_FAILED, DEPLOY_FAILED, FAILED → fetch logs, diagnose, suggest fix (see below)
PAUSED → report paused
CANCELLED → report cancelled
Non-terminal states — keep polling, report progress each time:
INITIALIZED → "Deployment initialized, waiting..."
BUILDING (status) or transition BUILDING → "Build in progress..."
BUILD_SUCCESS → "Build succeeded, deploying..."
ROLLOUT_STARTED or transition DEPLOYING → "Deploying (X/Y replicas ready)..."
DEPLOY_FAILED_WITH_RETRY → "Deploy failed, retrying..."
On success
- Report final status and replicas (e.g., "2/2 ready")
- Show endpoint URL if service has an exposed port
- Run a quick HTTP health check if endpoint is available:
curl -sf -o /dev/null -w '%{http_code}' "https://ENDPOINT_URL" || true
On failure
- Fetch recent logs (last 5 minutes) using
logs skill or direct API
- Identify root cause from logs (OOMKilled, CrashLoopBackOff, ImagePullBackOff, port mismatch, probe failure, build error)
- Follow deploy-debugging.md for diagnosis
- Apply one fix and retry once; if still failed, report to user with summary and log excerpt and stop
On timeout (10 minutes)
Report current state and elapsed time. Do NOT silently give up — tell the user:
Monitoring timed out after 10 minutes. Current status: ROLLOUT_STARTED (transition: DEPLOYING).
The deployment is still in progress. You can re-run monitoring or check the TrueFoundry dashboard.
NEVER end your response after a deploy/apply command without reporting a terminal deployment status (state.isTerminalState === true). If you are about to end your response and you have not confirmed DEPLOY_SUCCESS, DEPLOY_FAILED, BUILD_FAILED, FAILED, PAUSED, or CANCELLED, you are violating this rule — go back and poll.
Post-Deploy Configuration (Ask After Success)
After deployment succeeds (DEPLOY_SUCCESS), ask the user about the following configuration options. Do not silently skip these — present them as a checklist and let the user decide.
1. Public vs Private URL
Ask the user:
Your service is deployed. How should it be accessed?
1. **Public URL** — Accessible from the internet (expose: true with a host)
2. **Private/Internal only** — Only accessible within the cluster (expose: false)
If the user picks public and the port doesn't already have expose: true + host, update the manifest and redeploy.
2. Authentication
Ask the user:
Do you want to add authentication to your service?
1. **No auth** — Anyone with the URL can access it
2. **TrueFoundry login** — Users must log in via TrueFoundry (truefoundry_oauth)
3. **JWT auth** — Verify JWT tokens from a custom identity provider
4. **Basic auth** — Username/password protection
If the user picks an auth option, add the appropriate auth block to the port configuration and redeploy.
3. Auto-shutdown vs Always Running
Ask the user:
Should the service auto-shutdown when idle?
1. **Always running** — Keep replicas up at all times (default)
2. **Auto-shutdown after idle** — Scale to zero after no requests for a period (saves cost)
→ Recommended wait_time: 900 seconds (15 min) for dev, longer for staging
If the user picks auto-shutdown, add the auto_shutdown block to the manifest:
auto_shutdown:
wait_time: 900
Skip these prompts if the user explicitly said they don't want changes, or if this is a redeploy of an existing service that already has these configured.
REST API fallback (when CLI unavailable)
See references/cli-fallback.md for converting YAML to JSON and deploying via tfy-api.sh.
Auto-Detection: Single vs Multi-Service
Before creating any manifest, scan the project:
- Check for
docker-compose.yml, docker-compose.yaml, or compose.yaml first. If present (or user mentions docker-compose), treat it as the primary source of truth: load deploy-multi.md and compose-translation.md, generate manifests from the compose file, wire services per service-wiring.md, then complete deployment. Do not ask the user to manually create manifests when a compose file exists.
- Look for multiple
Dockerfile files across the project
- Check for service directories with their own dependency files in
services/, apps/, frontend/, backend/
- Compose file present or user says "docker-compose" → Multi-service from compose: load
deploy-multi.md + compose-translation.md
- Single service → Load
references/deploy-service.md
- Multiple services (no compose) → Load
references/deploy-multi.md
Multi-Service Deployment Order (MANDATORY)
HARD RULE: When deploying multiple services, you MUST deploy in dependency order, create secrets between tiers, and wire services before deploying dependents. Never deploy all services at once.
Tier-by-tier flow:
TIER 0: Infrastructure (DB, Cache, Queue) → deploy → wait for pods ready → create TFY secrets
TIER 1: Backend (APIs, workers) → deploy with secrets + DNS wiring → verify connectivity
TIER 2: Frontend / gateway → deploy with backend URLs → verify end-to-end
Key rules:
- Create TFY secret groups with infra credentials between Tier 0 and Tier 1 — never put raw passwords in manifests
- SPA frontends (React, Vue) MUST use backend's public URL, not internal DNS
DEPLOY_SUCCESS does NOT mean Helm pods are ready — poll actual readiness
- Present the dependency graph and deploy plan to the user before deploying
For step-by-step orchestration, examples, and common patterns, see deploy-ordering.md. For dependency graphs, DNS wiring, and compose translation, see deploy-multi.md, service-wiring.md, and dependency-graph.md.
Secrets Handling (MANDATORY: Always Use TFY Secrets)
HARD RULE: NEVER put sensitive values directly in the manifest env block. ALWAYS create a TrueFoundry secret group first, then reference the secrets using tfy-secret:// format. This is non-negotiable — even for "quick" or "test" deployments.
Workflow for any env var that looks sensitive (matches *PASSWORD*, *SECRET*, *TOKEN*, *KEY*, *API_KEY*, *DATABASE_URL*, *CONNECTION_STRING*, *CREDENTIALS*, or any value the user explicitly says is sensitive):
- Ask the user for the secret values (or confirm they want to store them)
- Create a secret group using the
secrets skill:
- Reference them in the manifest with
tfy-secret:// format:
env:
LOG_LEVEL: info
DB_PASSWORD: tfy-secret://my-org:my-service-secrets:DB_PASSWORD
API_KEY: tfy-secret://my-org:my-service-secrets:API_KEY
Pattern: tfy-secret://<TENANT_NAME>:<SECRET_GROUP_NAME>:<SECRET_KEY> where TENANT_NAME is the subdomain of TFY_BASE_URL.
If the user provides a raw secret value in the manifest or asks you to put it directly in env:
- Warn them: "Secrets should not be stored as plain text in manifests."
- Offer to create a TFY secret group for them
- Only proceed with raw values if the user explicitly insists after the warning
Use the secrets skill for guided secret group creation. For the full workflow, see references/deploy-service.md (Secrets Handling section).
File Mounts (Config, Secrets, Shared Data)
When users ask to mount files into a deployment, prefer manifest mounts over Dockerfile edits:
type: secret for sensitive file content (keys, certs, credentials)
type: config_map for non-sensitive config files
type: volume for writable/shared runtime data
See references/deploy-service.md (File Mounts section) for the end-to-end workflow.
Shared References
These references are available for all workflows — load as needed:
| Reference | Contents |
|---|
manifest-schema.md | Complete YAML field reference (single source of truth) |
manifest-defaults.md | Per-service-type defaults with YAML templates |
cli-fallback.md | CLI detection and REST API fallback pattern |
cluster-discovery.md | Extract cluster ID, base domains, available GPUs |
resource-estimation.md | CPU, memory, GPU sizing rules of thumb |
health-probes.md | Startup, readiness, liveness probe configuration |
gpu-reference.md | GPU types and VRAM reference |
container-versions.md | Pinned container image versions |
prerequisites.md | Credential setup and .env configuration |
rest-api-manifest.md | Full REST API manifest reference |
Workflow-Specific References
| Reference | Used By |
|---|
deploy-api-examples.md | deploy-service |
deploy-errors.md | deploy-service |
deploy-scaling.md | deploy-service |
load-analysis-questions.md | deploy-service |
codebase-analysis.md | deploy-service |
tfy-apply-cicd.md | deploy-apply |
tfy-apply-extra-manifests.md | deploy-apply |
deploy-ordering.md | deploy-multi (tier-by-tier orchestration) |
compose-translation.md | deploy-multi |
dependency-graph.md | deploy-multi |
multi-service-errors.md | deploy-multi |
multi-service-patterns.md | deploy-multi |
service-wiring.md | deploy-multi |
deploy-debugging.md | All deploy/apply (when status is failed) |
async-errors.md | deploy-async |
async-queue-configs.md | deploy-async |
async-python-library.md | deploy-async |
async-sidecar-deploy.md | deploy-async |
Composability
- Find workspace: Use
workspaces skill
- Monitor rollout: Use
monitor skill to track deployment progress
- Check what's deployed: Use
applications skill
- View logs: Use
logs skill
- Manage secrets: Use
secrets skill
- Deploy Helm charts: Use
helm skill
- Deploy LLMs: Use
llm-deploy skill
- Test after deploy: Use
service-test skill
Success Criteria
- User confirmed service name, resources, port, and deployment source before deploying
- Deployment URL and status reported back to the user
- Deployment status verified automatically immediately after apply/deploy (no extra prompt)
- Health probes configured for production deployments
- Secrets stored securely (not hardcoded in manifests)
- For multi-service: all services wired together and working end-to-end