| name | Debugging Omnistrate Deployments |
| description | Systematically debug failed Omnistrate instance deployments using a progressive workflow that identifies root causes efficiently while avoiding token limits. Applies to deployment failures, probe issues, and helm-based resources. |
Tool: mcp__omnistrate-platform__omnistrate-ctl_instance_describe
Flags: --deployment-status --output json
Extract: overall instance status and per-resource deployment status.
Key Benefit: Returns concise status, significantly reduces token usage vs full describe
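A minimal invocation (the omctl subcommand is assumed to mirror the MCP tool name):
omctl instance describe <instance-id> --deployment-status --output json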
Tool: mcp__omnistrate-platform__omnistrate-ctl_workflow_list
Flags: --instance-id <id> --output json
Extract workflow IDs, types, and start/end times for failed deployments.
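For example (subcommand assumed to mirror the MCP tool name):
omctl workflow list --instance-id <instance-id> --output json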
Phase 1 - Summary (Always Start Here):
omctl workflow events <workflow-id> --service-id <id> --environment-id <id> --output json
Extract: step names, statuses, and timestamps; identify which steps failed so Phase 2 can target them.
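A sketch of isolating failed steps with jq; the .steps and .status field names are assumptions about the JSON shape and should be checked against real output:
omctl workflow events <workflow-id> --service-id <id> --environment-id <id> --output json \
  | jq '.steps[] | select(.status == "FAILED")'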
Phase 2 - Detail (Only for Failed Steps):
omctl workflow events <workflow-id> --service-id <id> --environment-id <id> \
--resource-key <name> --step-types <type> --detail --output json
Use parameters:
--resource-key: Target a specific resource
--step-types: Filter to a specific step type (Bootstrap, Compute, Deployment, Network, Storage, Monitoring)
--detail: Include full event details (use sparingly)
--since/--until: Time-bound queries
Extract from detail view: error messages and pod-level events, which feed the timeline visualization below.
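For example, drilling into a failed Deployment step (the resource key "cluster" is a hypothetical placeholder; substitute your own):
omctl workflow events <workflow-id> --service-id <id> --environment-id <id> \
  --resource-key cluster --step-types Deployment --detail --output json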
Pod Event Timeline: Create ASCII visualizations showing deployment progression:
HH:MM:SS ┬─── ✗ FailedScheduling
│ pod/app-0: Insufficient memory
│
HH:MM:SS ├─── ⚡ TriggeredScaleUp
│ nodegroup-1: adding 2 nodes
│
HH:MM:SS ├─── 📥 Pulling image:latest
│ (duration: 2m15s)
│
HH:MM:SS └─── ✅ Started
3/3 pods Running
Symbols: ✗ failed, ✅ success, ⚡ autoscaler, 💾 storage, 📥 image, 🚀 runtime, ⚠️ warning
When: Resource DEPLOYING with probe failures, containers Running but not Ready, no conclusive evidence from previous steps
Tool: mcp__omnistrate-platform__omnistrate-ctl_deployment-cell_update-kubeconfig + kubectl
omctl deployment-cell update-kubeconfig <cell-id> --kubeconfig /tmp/kubeconfig
kubectl get pods -n <instance-id> --kubeconfig /tmp/kubeconfig
kubectl logs <pod-name> -c service -n <instance-id> --kubeconfig /tmp/kubeconfig --tail=50
Look for: probe failure or CrashLoopBackOff events, application startup errors, and repeated container restarts.
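When containers are Running but not Ready, kubectl describe surfaces the readiness/liveness probe events directly:
kubectl describe pod <pod-name> -n <instance-id> --kubeconfig /tmp/kubeconfig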
When: Helm resources with conflicting status, need application credentials, deployment state unclear
Tool: Same kubeconfig setup with --role cluster-admin + helm
omctl deployment-cell update-kubeconfig <cell-id> --kubeconfig /tmp/kubeconfig --role cluster-admin
helm list -n <instance-id> --kubeconfig /tmp/kubeconfig
helm status <release-name> -n <instance-id> --kubeconfig /tmp/kubeconfig
Extract: release status and revision, rendered notes (often containing application credentials and endpoints), and any failed hooks.
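If the status output is inconclusive, helm get can pull the rendered notes and supplied values (these often include application credentials, so handle them carefully):
helm get notes <release-name> -n <instance-id> --kubeconfig /tmp/kubeconfig
helm get values <release-name> -n <instance-id> --kubeconfig /tmp/kubeconfig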
Tool: mcp__omnistrate-platform__omnistrate-ctl_operations_events
Use the time windows identified during workflow analysis, filter by relevant event types, and request --output json.
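A sketch of a time-bounded query; the --since/--until flags are assumed to behave like the workflow-events flags and should be verified against the command's --help output:
omctl operations events --since <workflow-start> --until <workflow-end> --output json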
See OMNISTRATE_SRE_REFERENCE.md for additional reference material.