Name: Troubleshoot Virt
Author: kubev2v

// Troubleshoot stuck VMs and migrations in OpenShift Virtualization and MTV/Forklift. Use when VMs won't start, DataVolumes are stuck, migrations fail, or cluster resources are exhausted.

updated:May 3, 2026 at 12:13

package.json

"author": "kubev2v"

"repository": "kubev2v/mtv-skills"

name	troubleshoot-virt
description	Troubleshoot stuck VMs and migrations in OpenShift Virtualization and MTV/Forklift. Use when VMs won't start, DataVolumes are stuck, migrations fail, or cluster resources are exhausted.

Troubleshooting VMs and Migrations

Use this guide when VMs or migrations are stuck, failing, or behaving unexpectedly.

Required CLI Tools

This skill requires:

oc debug-queries (kubectl-debug-queries) -- for listing resources, logs, events
oc mtv (kubectl-mtv) -- for MTV health, plans, providers
oc metrics (kubectl-metrics) -- for node resource usage

If any tool is missing, install with:

curl -sSL https://raw.githubusercontent.com/yaacov/kubectl-debug-queries/main/install.sh | bash
curl -sSL https://raw.githubusercontent.com/yaacov/kubectl-mtv/main/install.sh | bash
curl -sSL https://raw.githubusercontent.com/yaacov/kubectl-metrics/main/install.sh | bash
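Before running the installers, it can help to see which plugins are actually missing. The following is a minimal sketch, not part of the skill itself: `check_tools` is a hypothetical helper name, and it assumes each plugin subcommand accepts `--help` and that the call exits non-zero when the plugin is not installed.

```shell
# check_tools [runner]: print "missing: <plugin>" for every required plugin
# whose --help probe fails. The runner defaults to oc; pass kubectl if needed.
# Assumption: a missing plugin makes the probe exit non-zero.
check_tools() {
  runner="${1:-oc}"
  for probe in "debug-queries logs" "mtv health" "metrics query"; do
    # $probe is intentionally unquoted so it splits into plugin + subcommand.
    if ! "$runner" $probe --help >/dev/null 2>&1; then
      echo "missing: ${probe%% *}"
    fi
  done
}
```

Calling `check_tools` with no arguments probes through `oc`; no output means all three probes succeeded.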

Quick Triage Checklist

When something is stuck, check these in order:

Node resources -- is the cluster out of CPU/memory/pods?
Storage -- is the default StorageClass set? Are PVCs bound? Are DataVolumes progressing?
VM status -- what do the VM/VMI conditions say?
Pod status -- is the virt-launcher or importer pod stuck/erroring?
Events -- what do namespace events say?

1. Node Resources

oc debug-queries list --resource nodes --all-namespaces

oc metrics query --query "avg(instance:node_cpu:ratio) * 100"
oc metrics query --query "(1 - sum(node_memory_MemAvailable_bytes) / sum(node_memory_MemTotal_bytes)) * 100"
oc metrics query --query "sum(kube_node_status_condition{condition='Ready',status='true'})"
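The memory query above reports the used share of cluster memory as `(1 - available / total) * 100`. As a worked example of that arithmetic, here is a small sketch; `mem_used_pct` is a hypothetical helper, not part of any of the plugins.

```shell
# mem_used_pct <available_bytes> <total_bytes>
# Same arithmetic as the PromQL expression: (1 - available / total) * 100.
mem_used_pct() {
  awk -v avail="$1" -v total="$2" 'BEGIN {
    if (total <= 0) { print "total must be positive" > "/dev/stderr"; exit 1 }
    printf "%.1f\n", (1 - avail / total) * 100
  }'
}
```

For instance, `mem_used_pct 16 64` prints `75.0`: with 16 units available out of 64, three quarters of memory is in use.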

Check what's consuming resources on a specific node:

oc debug-queries list --resource pods --all-namespaces --query "where spec.nodeName = '<node-name>'"

Check for node conditions:

oc debug-queries get --resource node --name <node-name> --namespace default

If a node reports MemoryPressure or DiskPressure, new VM and migration pods will not be scheduled on that node. If every schedulable node is under pressure, those pods stay Pending.

2. Storage

Default StorageClass

A default StorageClass is required for DataVolumes to work. Without it, PVCs won't provision.

oc debug-queries list --resource storageclass --all-namespaces

If none is default, set one (requires shell):

oc annotate storageclass <name> storageclass.kubernetes.io/is-default-class=true
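If you have shell access, the standard `oc get storageclass` table marks the default class with `(default)` after its name. The sketch below filters that table; `default_storageclass` is a hypothetical helper name, and it assumes the usual column layout with a header row.

```shell
# default_storageclass: read `oc get storageclass` table output on stdin and
# print the name of every class marked "(default)".
# Exits non-zero when no default class is found.
default_storageclass() {
  awk 'NR > 1 && $2 == "(default)" { print $1; found = 1 }
       END { exit found ? 0 : 1 }'
}
```

A possible use is `oc get storageclass | default_storageclass || echo "no default StorageClass"`.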

StorageProfile

CDI uses StorageProfiles to determine accessModes and volumeMode for each StorageClass. A misconfigured profile can cause DataVolumes to fail.

oc debug-queries list --resource storageprofile --all-namespaces
oc debug-queries get --resource storageprofile --name <storageclass-name> --namespace default --output yaml

A healthy StorageProfile has status.claimPropertySets populated with accessModes and volumeMode.

DataVolumes (DV)

DataVolumes manage the lifecycle of importing/cloning disk images into PVCs.

oc debug-queries list --resource dv --namespace <namespace>
oc debug-queries get --resource dv --name <dv-name> --namespace <namespace>

A DV normally moves through ImportScheduled -> ImportInProgress -> Succeeded. A DV that stays in Pending usually indicates a storage or scheduling problem.
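The phase-to-next-step mapping described here can be written down as a small lookup. This is only a sketch: `dv_phase_hint` is a hypothetical helper, and it covers just the phases named in this guide.

```shell
# dv_phase_hint <phase>: print a suggested next step for a DataVolume phase.
# Only the phases mentioned in this guide are handled explicitly.
dv_phase_hint() {
  case "$1" in
    Succeeded)
      echo "done" ;;
    ImportScheduled|ImportInProgress)
      echo "in progress: check the importer pod if it does not advance" ;;
    Pending)
      echo "stuck: check StorageClass, PVC binding, and scheduling" ;;
    *)
      echo "unknown phase: inspect the DataVolume conditions" ;;
  esac
}
```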

PVCs

oc debug-queries list --resource pvc --namespace <namespace>
oc debug-queries get --resource pvc --name <pvc-name> --namespace <namespace>

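A PVC stuck in Pending usually means there is no StorageClass, no available capacity, or the StorageClass uses WaitForFirstConsumer binding (the PVC stays Pending until a pod that uses it is scheduled).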
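To pick out Pending claims quickly from a shell, the standard `oc get pvc` table can be filtered on its STATUS column. A minimal sketch, assuming the usual layout where NAME is the first column and STATUS the second; `pending_pvcs` is a hypothetical helper name.

```shell
# pending_pvcs: read `oc get pvc -n <namespace>` table output on stdin and
# print the names of claims whose STATUS column is Pending.
pending_pvcs() {
  awk 'NR > 1 && $2 == "Pending" { print $1 }'
}
```

For example: `oc get pvc -n <namespace> | pending_pvcs`.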

CDI Importer/Cloner Pods

When a DataVolume is importing, CDI creates temporary pods. If those pods are stuck, the DV won't progress.

oc debug-queries list --resource pods --namespace <namespace> --query "where name ~= '.*importer.*|.*clone.*|.*upload.*'"
oc debug-queries get --resource pod --name <importer-pod> --namespace <namespace>
oc debug-queries logs --name <importer-pod> --namespace <namespace>

3. VM Status

oc debug-queries get --resource vm --name <vm-name> --namespace <namespace>
oc debug-queries get --resource vmi --name <vm-name> --namespace <namespace>

Common stuck reasons:

Unschedulable: not enough CPU/memory on any node
DataVolumeError: boot disk DV failed
ErrImagePull: containerdisk image not found
Guest agent not connected: VM running but no agent

4. Pod Status (virt-launcher)

Each running VM has a virt-launcher pod. If the pod is stuck, the VM won't start.

oc debug-queries list --resource pods --namespace <namespace> --selector "kubevirt.io=virt-launcher"
oc debug-queries get --resource pod --name <virt-launcher-pod> --namespace <namespace>
oc debug-queries logs --name <virt-launcher-pod> --namespace <namespace>
oc debug-queries logs --name <virt-launcher-pod> --namespace <namespace> --container compute

5. Events

Namespace events often reveal the root cause faster than anything else.

oc debug-queries events --namespace <namespace> --sort-by time_desc
oc debug-queries events --namespace <namespace> --query "where type = 'Warning'"
oc debug-queries events --namespace <namespace> --name <vm-name> --resource VirtualMachine

6. Migration Troubleshooting (MTV/Forklift)

Discover the Forklift namespace

The operator namespace varies by installation (commonly openshift-mtv or konveyor-forklift). Always discover it first:

oc mtv health --all-namespaces --skip-logs

The health output includes a line starting with "Namespace: ". Use the value on that line for all subsequent commands in this section.
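If you want the namespace in a shell variable, something like the following could extract it. This is a sketch under a stated assumption: the health output is plain text with a line of the form `Namespace: <value>`. The exact output format is not confirmed here, and `forklift_ns` is a hypothetical helper name.

```shell
# forklift_ns: read `oc mtv health` text output on stdin and print the value
# of the first "Namespace:" line. Exits non-zero if no such line exists.
# Assumption: the line looks like "Namespace: <value>".
forklift_ns() {
  awk -F': *' '/^[ \t]*Namespace:/ {
      sub(/[ \t]+$/, "", $2)   # drop trailing whitespace
      print $2; found = 1; exit
    }
    END { exit found ? 0 : 1 }'
}
```

A possible use is `ns=$(oc mtv health --all-namespaces --skip-logs | forklift_ns)`.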

Quick health check

The health command runs built-in log analysis by default:

oc mtv health --all-namespaces

By default it analyzes the last 100 log lines per pod. For deeper log analysis:

oc mtv health --all-namespaces --log-lines 200

For a fast check without log analysis:

oc mtv health --all-namespaces --skip-logs

Forklift pods

Forklift runs in the namespace discovered above.

oc debug-queries list --resource pods --namespace <forklift-namespace>
oc debug-queries logs --name deployment/forklift-controller --namespace <forklift-namespace> --container main
oc debug-queries logs --name deployment/forklift-controller --namespace <forklift-namespace> --container inventory

Key pods: forklift-controller (main migration controller), forklift-api, forklift-validation, forklift-volume-populator-controller.

Querying Forklift logs

Before writing log queries, discover the actual field names and values:

oc debug-queries logs --name deployment/forklift-controller --namespace <forklift-namespace> --container main --tail 5 --output json

This shows the parsed fields (level, message, logger, source, fields.*) and their actual values for the target workload.

Forklift controllers use the logger field (e.g., plan|ocp, storageMap|ocp, provider) rather than source (which is empty). Filter by logger:

oc debug-queries logs --name deployment/forklift-controller --namespace <forklift-namespace> --container main --tail 200 --query "where logger ~= 'plan.*'"

Tip: If a level query returns no matches, check what levels the workload actually uses. Level strings vary by workload -- controller-runtime logs normalize to ERROR, INFO, DEBUG; klog-format logs may normalize to E, W, I, F. Run with --output json and --tail 5 first to see actual level values.

Filter logs by structured fields (e.g., provider name):

oc debug-queries logs --name deployment/forklift-controller --namespace <forklift-namespace> --container main --tail 200 --query "where fields.provider ~= '.*<provider-name>.*'"

For newest-first log output (useful when checking recent errors):

oc debug-queries logs --name deployment/forklift-controller --namespace <forklift-namespace> --container main --tail 200 --sort-by time_desc --query "where level = 'ERROR'"

Full-text search when you don't know which field contains the value:

oc debug-queries logs --name deployment/forklift-controller --namespace <forklift-namespace> --container main --tail 200 --query "where raw_line ~= '.*<search-term>.*'"
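The substring filters above all share one shape, `where <field> ~= '.*<term>.*'`. A small helper can build that string so the quoting stays consistent. This is a sketch: `contains_query` is a hypothetical name, and it does not escape regex metacharacters or single quotes in the term.

```shell
# contains_query <field> <term>: print a query that matches <term> anywhere
# in <field>, in the same form as the examples in this guide.
# Note: <term> is inserted as-is, so regex metacharacters keep their meaning.
contains_query() {
  printf "where %s ~= '.*%s.*'\n" "$1" "$2"
}
```

It could then be used as `--query "$(contains_query fields.provider <provider-name>)"`.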

Migration plan status

oc mtv get plan -n <namespace>
oc mtv get plan --name <plan-name> -n <namespace>
oc mtv get plan --name <plan-name> --vms -n <namespace>
oc mtv get plan --name <plan-name> --disk -n <namespace>
oc mtv describe plan --name <plan-name> -n <namespace>

Debug a failing plan

Forklift reports most failures through status conditions and Kubernetes events, not ERROR-level logs. Prioritize these over log searching:

Check plan status conditions first:

oc debug-queries get --resource plans --name <plan-name> --namespace <ns> --output json --query "select name, status.conditions"

Check events for the plan and its mappings:

oc debug-queries events --namespace <ns> --query "where involvedObject.name ~= '.*<plan-name>.*'"

Check mapping status conditions:

oc debug-queries get --resource storagemaps --name <storage-map-name> --namespace <ns> --output json --query "select name, status.conditions"
oc debug-queries get --resource networkmaps --name <network-map-name> --namespace <ns> --output json --query "select name, status.conditions"

Only then check controller logs for reconcile context:

oc debug-queries logs --name deployment/forklift-controller --namespace <forklift-namespace> --container main --since 10m --tail 200 --query "where fields.plan ~= '.*<plan-name>.*'"
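The order above (conditions, then events, then controller logs) can be captured in a helper that prints the commands for one plan so they can be reviewed before running. This is a sketch: `plan_debug_steps` is a hypothetical name, and the mapping checks are left out because the map names are not derivable from the plan name alone.

```shell
# plan_debug_steps <plan-name> <plan-namespace> <forklift-namespace>
# Print, in priority order, the commands from this section for one plan.
# Nothing is executed; the output is meant to be read or piped to a shell.
plan_debug_steps() {
  plan="$1"; ns="$2"; fns="$3"
  cat <<EOF
oc debug-queries get --resource plans --name $plan --namespace $ns --output json --query "select name, status.conditions"
oc debug-queries events --namespace $ns --query "where involvedObject.name ~= '.*$plan.*'"
oc debug-queries logs --name deployment/forklift-controller --namespace $fns --container main --since 10m --tail 200 --query "where fields.plan ~= '.*$plan.*'"
EOF
}
```

Printing instead of executing keeps the helper safe to call on any workstation; `plan_debug_steps <plan-name> <ns> <forklift-namespace> | sh` would run the steps in sequence.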

Migration pods (per-VM)

During migration, Forklift creates pods in the target namespace (not the operator namespace):

oc debug-queries list --resource pods --namespace <namespace> --query "where name ~= '.*virt-v2v.*|.*populator.*|.*importer.*'"
oc debug-queries logs --name <virt-v2v-pod> --namespace <namespace>

Provider connectivity

oc mtv get provider -n <namespace>
oc mtv describe provider --name <provider-name> -n <namespace>

7. KubeVirt Operator Pods

The KubeVirt operator components run in the openshift-cnv namespace on OpenShift, or in the kubevirt namespace on upstream KubeVirt.

oc debug-queries list --resource pods --namespace openshift-cnv --query "where name ~= '.*virt-operator.*|.*virt-controller.*|.*virt-handler.*|.*virt-api.*|.*cdi-.*'"

Check for pod restarts (sign of instability):

oc metrics query --query "topk(10, sort_desc(kube_pod_container_status_restarts_total))"

Logs from key components:

oc debug-queries logs --name deployment/virt-controller --namespace openshift-cnv
oc debug-queries logs --name deployment/cdi-deployment --namespace openshift-cnv

8. Common Stuck Scenarios

VM stuck in Scheduling

Cause: Not enough CPU/memory on any schedulable node
Check: oc debug-queries list nodes + oc metrics query CPU utilization + oc debug-queries get vmi for scheduling errors
Fix: Free up node resources, scale cluster, or use a smaller instance type

DataVolume stuck in Pending

Cause: No default StorageClass, or StorageProfile misconfigured
Check: oc debug-queries list storageclass (look for default), oc debug-queries get storageprofile
Fix: Set a default StorageClass, ensure StorageProfile has claimPropertySets

DataVolume stuck in ImportInProgress

Cause: Importer pod failing (network, auth, image not found)
Check: oc debug-queries list pods with query for importer, then oc debug-queries logs
Fix: Check source URL, credentials, network policies

Migration plan stuck

Cause: Provider unreachable, disk transfer stalled, converter pod OOM
Check: oc mtv health, oc mtv get plan with --vms --disk flags, converter pod logs
Fix: Check provider connectivity, increase converter memory via settings, check storage throughput

VM stuck in Pending after migration

Cause: Target PVCs not bound, insufficient resources for target VM
Check: oc debug-queries list pvc, oc debug-queries get vmi
Fix: Ensure target storage has capacity, check node resources

Self-Learning Rule

When you need to discover available flags or verify syntax:

oc mtv <command> --help
oc debug-queries logs --help
oc metrics query --help
