troubleshoot-virt
Troubleshoot stuck VMs and migrations in OpenShift Virtualization and MTV/Forklift. Use when VMs won't start, DataVolumes are stuck, migrations fail, or cluster resources are exhausted.
| name | troubleshoot-virt |
| description | Troubleshoot stuck VMs and migrations in OpenShift Virtualization and MTV/Forklift. Use when VMs won't start, DataVolumes are stuck, migrations fail, or cluster resources are exhausted. |
Use this guide when VMs or migrations are stuck, failing, or behaving unexpectedly.
This skill requires:
- oc debug-queries (kubectl-debug-queries) -- for listing resources, logs, events
- oc mtv (kubectl-mtv) -- for MTV health, plans, providers
- oc metrics (kubectl-metrics) -- for node resource usage

If any tool is missing, install with:
curl -sSL https://raw.githubusercontent.com/yaacov/kubectl-debug-queries/main/install.sh | bash
curl -sSL https://raw.githubusercontent.com/yaacov/kubectl-mtv/main/install.sh | bash
curl -sSL https://raw.githubusercontent.com/yaacov/kubectl-metrics/main/install.sh | bash
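Before running the installers, it can help to see which plugins are actually missing. The following is a minimal sketch, assuming the plugins are installed as executables named kubectl-debug-queries, kubectl-mtv, and kubectl-metrics on PATH (the names given in the list above); the function name missing_plugins is hypothetical.

```shell
#!/bin/sh
# Print the name of each required plugin binary that is not found on PATH.
# ASSUMPTION: the plugins are installed under the kubectl-<name> binary
# names listed in the requirements above.
missing_plugins() {
    for bin in kubectl-debug-queries kubectl-mtv kubectl-metrics; do
        command -v "$bin" >/dev/null 2>&1 || printf '%s\n' "$bin"
    done
}

# Example: call missing_plugins and run only the installers for the
# names it prints. No output means all three tools were found.
```

If the function prints nothing, all three tools are available and no installer needs to run.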
When something is stuck, check these in order:
oc debug-queries list --resource nodes --all-namespaces
oc metrics query --query "avg(instance:node_cpu:ratio) * 100"
oc metrics query --query "(1 - sum(node_memory_MemAvailable_bytes) / sum(node_memory_MemTotal_bytes)) * 100"
oc metrics query --query "sum(kube_node_status_condition{condition='Ready',status='true'})"
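To make the memory query above concrete, here is a small worked example of the same arithmetic, (1 - available / total) * 100, done locally. The helper name mem_used_pct and the byte values are illustrative assumptions, not output from a real cluster.

```shell
#!/bin/sh
# mem_used_pct AVAILABLE_BYTES TOTAL_BYTES
# Mirrors the PromQL expression (1 - available / total) * 100.
mem_used_pct() {
    awk -v avail="$1" -v total="$2" \
        'BEGIN { printf "%.1f\n", (1 - avail / total) * 100 }'
}

# Illustrative values: 16 GiB available out of 64 GiB total.
mem_used_pct 17179869184 68719476736   # prints 75.0
```

Because the query sums over all nodes before dividing, a single saturated node can hide behind a moderate cluster-wide figure, which is why the per-node checks that follow still matter.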
Check what's consuming resources on a specific node:
oc debug-queries list --resource pods --all-namespaces --query "where spec.nodeName = '<node-name>'"
Check for node conditions:
oc debug-queries get --resource node --name <node-name> --namespace default
If nodes show MemoryPressure or DiskPressure, VMs and migration pods cannot be scheduled.
A default StorageClass is required for DataVolumes to work. Without it, PVCs won't provision.
oc debug-queries list --resource storageclass --all-namespaces
If none is default, set one (requires shell):
oc annotate storageclass <name> storageclass.kubernetes.io/is-default-class=true
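As a cross-check with plain oc, the default-class annotation can be read directly. This is a sketch that assumes the standard oc get jsonpath output is available; the helper name default_storageclasses is hypothetical, and it only filters text so it can be tried without a cluster.

```shell
#!/bin/sh
# Read "name<TAB>annotation-value" lines on stdin and print the names of
# StorageClasses whose is-default-class annotation is exactly "true".
default_storageclasses() {
    awk -F '\t' '$2 == "true" { print $1 }'
}

# Intended use against a live cluster (standard oc/kubectl jsonpath):
#   oc get storageclass -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.storageclass\.kubernetes\.io/is-default-class}{"\n"}{end}' \
#     | default_storageclasses
```

No output means no default is set. More than one name is also worth investigating, since only one class is expected to be the default.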
CDI uses StorageProfiles to determine accessModes and volumeMode for each StorageClass. A misconfigured profile can cause DataVolumes to fail.
oc debug-queries list --resource storageprofile --all-namespaces
oc debug-queries get --resource storageprofile --name <storageclass-name> --namespace default --output yaml
A healthy StorageProfile has status.claimPropertySets populated with accessModes and volumeMode.
DataVolumes manage the lifecycle of importing/cloning disk images into PVCs.
oc debug-queries list --resource dv --namespace <namespace>
oc debug-queries get --resource dv --name <dv-name> --namespace <namespace>
Common DV phases: ImportScheduled -> ImportInProgress -> Succeeded. Pending (stuck) usually means a storage or scheduling problem.
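The phase guidance above can be captured as a small lookup, for example when scanning many DataVolumes. This sketch covers only the phases named in this guide; the function name dv_phase_hint and the hint wording are assumptions.

```shell
#!/bin/sh
# dv_phase_hint PHASE
# Map a DataVolume phase to a next step, using only the phases named above.
dv_phase_hint() {
    case "$1" in
        ImportScheduled|ImportInProgress)
            echo "import in progress: watch the importer pod" ;;
        Succeeded)
            echo "import finished" ;;
        Pending)
            echo "possibly stuck: check storage and scheduling" ;;
        *)
            echo "phase not covered here: inspect the DV and namespace events" ;;
    esac
}

dv_phase_hint Pending   # prints "possibly stuck: check storage and scheduling"
```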
oc debug-queries list --resource pvc --namespace <namespace>
oc debug-queries get --resource pvc --name <pvc-name> --namespace <namespace>
A PVC stuck in Pending usually means no StorageClass, no available capacity, or WaitForFirstConsumer binding (the PVC binds only once a pod that uses it is scheduled).
When a DataVolume is importing, CDI creates temporary pods. If those pods are stuck, the DV won't progress.
oc debug-queries list --resource pods --namespace <namespace> --query "where name ~= '.*importer.*|.*clone.*|.*upload.*'"
oc debug-queries get --resource pod --name <importer-pod> --namespace <namespace>
oc debug-queries logs --name <importer-pod> --namespace <namespace>
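The pod-name filter used above follows a regular shape: each name fragment is wrapped in .* and the alternatives are joined with |. A minimal sketch of a helper that builds such a filter is shown below; the function name pod_name_query is hypothetical, and the query syntax is copied from the commands in this guide rather than from the tool's reference.

```shell
#!/bin/sh
# pod_name_query FRAGMENT...
# Build a "where name ~= '...'" filter that matches pods whose name
# contains any of the given fragments.
pod_name_query() {
    pattern=""
    for frag in "$@"; do
        pattern="${pattern:+$pattern|}.*${frag}.*"
    done
    printf "where name ~= '%s'\n" "$pattern"
}

pod_name_query importer clone upload
# prints: where name ~= '.*importer.*|.*clone.*|.*upload.*'
```

The result can be passed to --query, for example: oc debug-queries list --resource pods --namespace <namespace> --query "$(pod_name_query importer clone upload)".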
oc debug-queries get --resource vm --name <vm-name> --namespace <namespace>
oc debug-queries get --resource vmi --name <vm-name> --namespace <namespace>
Common stuck reasons:
Each running VM has a virt-launcher pod. If the pod is stuck, the VM won't start.
oc debug-queries list --resource pods --namespace <namespace> --selector "kubevirt.io=virt-launcher"
oc debug-queries get --resource pod --name <virt-launcher-pod> --namespace <namespace>
oc debug-queries logs --name <virt-launcher-pod> --namespace <namespace>
oc debug-queries logs --name <virt-launcher-pod> --namespace <namespace> --container compute
Namespace events often reveal the root cause faster than anything else.
oc debug-queries events --namespace <namespace> --sort-by time_desc
oc debug-queries events --namespace <namespace> --query "where type = 'Warning'"
oc debug-queries events --namespace <namespace> --name <vm-name> --resource VirtualMachine
The operator namespace varies by installation (commonly openshift-mtv or konveyor-forklift). Always discover it first:
oc mtv health --all-namespaces --skip-logs
The health output includes a "Namespace:" line. Use its value as <forklift-namespace> in all subsequent commands in this section.
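If the namespace needs to be captured in a script, a text filter over the health output is one option. This is only a sketch: it assumes the output contains a line of the form "Namespace: value", which should be confirmed against the real output first. The function name forklift_namespace is hypothetical.

```shell
#!/bin/sh
# Read health output on stdin and print the value from the first line
# that contains "Namespace:".
# ASSUMPTION: the line looks like "Namespace: <value>"; adjust the
# pattern if the actual output differs.
forklift_namespace() {
    awk -F ':[ \t]*' '/Namespace:/ { print $2; exit }'
}

# Intended use:
#   FORKLIFT_NS=$(oc mtv health --all-namespaces --skip-logs | forklift_namespace)
```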
The health command includes built-in log analysis by default. Use one of:
oc mtv health --all-namespaces
This includes log analysis (default 100 lines per pod). For deeper log analysis:
oc mtv health --all-namespaces --log-lines 200
For a fast check without log analysis:
oc mtv health --all-namespaces --skip-logs
Forklift runs in the namespace discovered above.
oc debug-queries list --resource pods --namespace <forklift-namespace>
oc debug-queries logs --name deployment/forklift-controller --namespace <forklift-namespace> --container main
oc debug-queries logs --name deployment/forklift-controller --namespace <forklift-namespace> --container inventory
Key pods: forklift-controller (main migration controller), forklift-api, forklift-validation, forklift-volume-populator-controller.
Before writing log queries, discover the actual field names and values:
oc debug-queries logs --name deployment/forklift-controller --namespace <forklift-namespace> --container main --tail 5 --output json
This shows the parsed fields (level, message, logger, source, fields.*) and their actual values for the target workload.
Forklift controllers use the logger field (e.g., plan|ocp, storageMap|ocp, provider) rather than source (which is empty). Filter by logger:
oc debug-queries logs --name deployment/forklift-controller --namespace <forklift-namespace> --container main --tail 200 --query "where logger ~= 'plan.*'"
Tip: If a level query returns no matches, check what levels the workload actually uses. Level strings vary by workload -- controller-runtime logs normalize to ERROR, INFO, DEBUG; klog-format logs may normalize to E, W, I, F. Run with --output json and --tail 5 first to see actual level values.
Filter logs by structured fields (e.g., provider name):
oc debug-queries logs --name deployment/forklift-controller --namespace <forklift-namespace> --container main --tail 200 --query "where fields.provider ~= '.*<provider-name>.*'"
For newest-first log output (useful when checking recent errors):
oc debug-queries logs --name deployment/forklift-controller --namespace <forklift-namespace> --container main --tail 200 --sort-by time_desc --query "where level = 'ERROR'"
Full-text search when you don't know which field contains the value:
oc debug-queries logs --name deployment/forklift-controller --namespace <forklift-namespace> --container main --tail 200 --query "where raw_line ~= '.*<search-term>.*'"
oc mtv get plan -n <namespace>
oc mtv get plan --name <plan-name> -n <namespace>
oc mtv get plan --name <plan-name> --vms -n <namespace>
oc mtv get plan --name <plan-name> --disk -n <namespace>
oc mtv describe plan --name <plan-name> -n <namespace>
Forklift reports most failures through status conditions and Kubernetes events, not ERROR-level logs. Prioritize these over log searching:
oc debug-queries get --resource plans --name <plan-name> --namespace <ns> --output json --query "select name, status.conditions"
oc debug-queries events --namespace <ns> --query "where involvedObject.name ~= '.*<plan-name>.*'"
oc debug-queries get --resource storagemaps --name <storage-map-name> --namespace <ns> --output json --query "select name, status.conditions"
oc debug-queries get --resource networkmaps --name <network-map-name> --namespace <ns> --output json --query "select name, status.conditions"
oc debug-queries logs --name deployment/forklift-controller --namespace <forklift-namespace> --container main --since 10m --tail 200 --query "where fields.plan ~= '.*<plan-name>.*'"
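The priority order described above (conditions, then events, then controller logs) can be wrapped in one function. The sketch below reuses the exact commands from this section; the function name plan_triage and the RUN dry-run variable are illustrative assumptions.

```shell
#!/bin/sh
# plan_triage PLAN PLAN_NAMESPACE FORKLIFT_NAMESPACE
# Runs the plan checks in priority order: conditions, events, then logs.
# Set RUN=echo to print the commands instead of executing them.
plan_triage() {
    plan=$1
    ns=$2
    fns=$3
    # 1. Status conditions on the plan itself.
    ${RUN:-} oc debug-queries get --resource plans --name "$plan" \
        --namespace "$ns" --output json --query "select name, status.conditions"
    # 2. Kubernetes events that mention the plan.
    ${RUN:-} oc debug-queries events --namespace "$ns" \
        --query "where involvedObject.name ~= '.*${plan}.*'"
    # 3. Recent controller log lines tagged with the plan.
    ${RUN:-} oc debug-queries logs --name deployment/forklift-controller \
        --namespace "$fns" --container main --since 10m --tail 200 \
        --query "where fields.plan ~= '.*${plan}.*'"
}

# Dry run: RUN=echo plan_triage <plan-name> <ns> <forklift-namespace>
```

Running with RUN=echo first is a cheap way to check the substituted names before querying the cluster.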
During migration, Forklift creates pods in the target namespace (not the operator namespace):
oc debug-queries list --resource pods --namespace <namespace> --query "where name ~= '.*virt-v2v.*|.*populator.*|.*importer.*'"
oc debug-queries logs --name <virt-v2v-pod> --namespace <namespace>
oc mtv get provider -n <namespace>
oc mtv describe provider --name <provider-name> -n <namespace>
The KubeVirt operator components run in the openshift-cnv namespace (OpenShift) or the kubevirt namespace. The commands below use openshift-cnv; substitute your namespace if it differs.
oc debug-queries list --resource pods --namespace openshift-cnv --query "where name ~= '.*virt-operator.*|.*virt-controller.*|.*virt-handler.*|.*virt-api.*|.*cdi-.*'"
Check for pod restarts (sign of instability):
oc metrics query --query "topk(10, sort_desc(kube_pod_container_status_restarts_total))"
Logs from key components:
oc debug-queries logs --name deployment/virt-controller --namespace openshift-cnv
oc debug-queries logs --name deployment/cdi-deployment --namespace openshift-cnv
- oc debug-queries list nodes + oc metrics query CPU utilization + oc debug-queries get vmi for scheduling errors
- oc debug-queries list storageclass (look for default), oc debug-queries get storageprofile (claimPropertySets)
- oc debug-queries list pods with query for importer, then oc debug-queries logs
- oc mtv health, oc mtv get plan with --vms --disk flags, converter pod logs
- oc debug-queries list pvc, oc debug-queries get vmi

When you need to discover available flags or verify syntax:
oc mtv <command> --help
oc debug-queries logs --help
oc metrics query --help