Version: 3.6

Troubleshooting, Known Issues, and Limitations for KubeVirt VMs

This page provides troubleshooting steps, known issues, and limitations for KubeVirt VMs managed with Portworx.

Troubleshooting

Reverse migration issues with forklifted VMs

When you migrate a VM from vSphere to OpenShift by using the Migration Toolkit for Virtualization (MTV) or the Forklift operator, the system creates a conversion pod (virt-v2v) in the target namespace. This pod performs disk transformation and remains in a Completed state after migration.

By default, the pod is not deleted so that you can inspect its logs. However, it can block reverse migration using Stork. As a result, PVCs and PVs in the namespace may remain stuck in a Terminating state, which prevents them from being recreated.

Workaround

Before starting a reverse migration, delete the virt-v2v conversion pod:

  1. Identify the pod:

    oc get pods -n <target-namespace> -l forklift.app=virt-v2v
  2. Confirm the pod is in the Completed state.

  3. Delete the pod:

    oc delete pod <pod-name> -n <target-namespace>
note

This issue applies to Forklift-based migrations where the virt-v2v pod is retained after migration for observability and debugging.
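
The workaround steps above can be combined into a small helper. This is a minimal sketch, not an official tool: the function name delete_completed_v2v_pods is hypothetical, and it assumes that pods shown as Completed report status.phase=Succeeded, which lets a field selector replace the manual state check.

```shell
#!/usr/bin/env bash
# Sketch: remove finished virt-v2v conversion pods before a reverse migration.
# Assumption: "Completed" pods report status.phase=Succeeded.

delete_completed_v2v_pods() {
  local ns="$1"
  local pods
  # List only conversion pods that have finished successfully.
  pods=$(oc get pods -n "$ns" -l forklift.app=virt-v2v \
    --field-selector=status.phase=Succeeded -o name) || return 1
  if [ -z "$pods" ]; then
    echo "no completed virt-v2v pods in $ns"
    return 0
  fi
  # -o name yields entries such as pod/<pod-name>, which oc delete accepts.
  # Word splitting on $pods is intentional: one argument per pod.
  oc delete -n "$ns" $pods
}
```

Call it as delete_completed_v2v_pods <target-namespace> before you start the reverse migration. Pods that are still running are left untouched.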

Known issues

KubeVirt VM eviction during abrupt node power-off

When a node running a KubeVirt VM loses power abruptly, KubeVirt applies the default eviction policy (LiveMigrate) and attempts to migrate the VM to another node once the source node becomes unavailable.

If the VM uses Portworx volumes with replication (for example, repl2 or repl3), the VM may restart on another node. In some cases, the VM may boot into safe mode due to the interruption. Restart the VM to recover from safe mode.
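
To see which eviction policy a given VM declares, you can read the evictionStrategy field from its spec. The following is a minimal sketch; the function name vm_eviction_strategy is hypothetical, and it assumes the strategy, when set per VM, lives at spec.template.spec.evictionStrategy.

```shell
#!/usr/bin/env bash
# Sketch: print the eviction strategy declared on a VM.
# Assumption: an empty result means the VM does not set the field itself
# and the cluster-wide default applies.

vm_eviction_strategy() {
  local vm="$1" ns="$2"
  local strategy
  strategy=$(oc get vm "$vm" -n "$ns" \
    -o jsonpath='{.spec.template.spec.evictionStrategy}') || return 1
  echo "${strategy:-unset (cluster default applies)}"
}
```

If the VM has booted into safe mode, a restart with virtctl restart <vm-name> -n <namespace> is one way to perform the recovery described above.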

Thick-provisioned PVCs after VMware-to-KubeVirt migration

When you migrate VMs from a VMware environment to a KubeVirt environment with volumeMode: Block, the persistent volume claims (PVCs) may appear fully utilized or thick-provisioned. This behavior can increase pool usage and trigger unnecessary volume expansion.

Golden image and golden PVC behavior

A golden image is a preconfigured VM disk used as a template for consistent VM provisioning. A golden PVC is a pre-provisioned PVC that contains a golden image. You can clone these PVCs to create new VMs.

  • Golden PVCs created via HTTP import can behave like thick-provisioned volumes, consuming more physical storage than the actual image size. Cloned volumes from these PVCs, particularly when using Portworx raw block volumes, may exhibit increased capacity usage and degraded performance compared to sharedv4 volumes. To optimize space efficiency, run defragmentation inside the guest VM.
  • VMs and their associated PVCs may be scheduled on the same node as the golden PVC. This can lead to resource bottlenecks. To avoid this, maintain multiple golden PVCs.
  • If you create a golden PVC with a replication factor of 3 and a node or Portworx restart occurs during VM creation, the resulting cloned PVC may be created with a replication factor of 2. There is no automatic correction, but replication can be manually restored using repl add on the affected PVC.
  • When multiple VMs are created at the same time from a single golden PVC, some VMs may display a "Running" status while the guest operating system is unresponsive. To recover the VM, perform a power cycle:
    • OpenShift: Use the OpenShift UI or virtctl CLI.
    • SUSE Virtualization: Use the SUSE Virtualization UI or virtctl CLI.
  • When multiple VM disks are cloned from a single template PVC, all the resulting clone volumes reside on the same set of replica nodes. This can create I/O hot spots and degrade performance for those nodes. To reduce this risk, create multiple template PVCs distributed across different nodes and round-robin between them when provisioning new VMs.
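
The round-robin approach from the last item can be sketched as a small selection helper. The function name pick_golden_pvc and the PVC names in the usage example are hypothetical; the sketch only chooses a template name and leaves the clone itself to your provisioning workflow.

```shell
#!/usr/bin/env bash
# Sketch: choose a golden PVC for the Nth VM by cycling through the templates.
# Usage: pick_golden_pvc <vm-index> <golden-pvc>...

pick_golden_pvc() {
  local index="$1"
  shift
  [ "$#" -gt 0 ] || { echo "no golden PVCs given" >&2; return 1; }
  # Drop leading names until the chosen one is first in the list.
  shift $(( index % $# ))
  printf '%s\n' "$1"
}
```

For example, pick_golden_pvc 4 golden-a golden-b golden-c selects golden-b, so consecutive VM indexes spread their clones across the templates and the replica nodes that back them.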

OpenShift-specific known issues

  • A known issue in libvirt affects the use of 4K block volumes and may cause VMs to pause due to I/O errors. This issue is resolved in OpenShift Container Platform (OCP) version 4.16 or later. For more information, see Red Hat solution.

  • OpenShift Virtualization (OSV) versions 4.18.4 and earlier incorrectly handle discards when used with Portworx block devices. As a workaround, disable discards on the Portworx volume by including the parameter nodiscard: true in the StorageClass.
    You can also toggle the nodiscard setting after PVC creation by running pxctl volume update --nodiscard on <your_pvc_name>. After the update, restart the pod for the change to take effect.
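
As an illustration of the discard workaround, a StorageClass with the nodiscard parameter might look like the sketch below. The StorageClass name and the repl value are hypothetical, and the sketch assumes the Portworx CSI provisioner pxd.portworx.com. StorageClass parameter values must be strings, so true is quoted.

```shell
#!/usr/bin/env bash
# Sketch: print a StorageClass manifest that disables discards.
# Assumptions: hypothetical name px-kubevirt-nodiscard, repl "3",
# and the pxd.portworx.com CSI provisioner.

nodiscard_storageclass() {
  cat <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: px-kubevirt-nodiscard
provisioner: pxd.portworx.com
parameters:
  repl: "3"
  nodiscard: "true"
allowVolumeExpansion: true
EOF
}
```

To create the StorageClass, pipe the output to the cluster with nodiscard_storageclass | oc apply -f -.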

Limitations

KubeVirt VMs might not automatically fail over to another node

On OpenShift Container Platform, when a node hosting KubeVirt virtual machines (VMs) becomes unavailable, the VMs might not automatically fail over to another node. This behavior is expected. To address this, you can either deploy a node remediation operator for automated handling or manually trigger a failover:

Deploy a node remediation operator

To automate failover and ensure that VMs are rescheduled on healthy nodes, deploy the Node Health Check (NHC) Operator in OCP.

  1. Install the NHC Operator using the OpenShift Web Console or OpenShift CLI.

    note

    The NHC Operator includes Self-Node Remediation (SNR) functionality by default.

  2. Configure a Node Health Check to monitor worker nodes and specify a remediation duration. For detailed configuration steps, see Red Hat documentation.

    note

    Ensure that you select the worker nodes when creating the Node Health Check.

Once configured, the NHC Operator detects node failures, drains unhealthy nodes after the specified duration, and triggers the rescheduling of pods, including VMs, to healthy nodes.
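
For orientation, a Node Health Check that targets worker nodes might resemble the sketch below. Treat every value as an assumption to verify against the Red Hat documentation for your OCP version: the resource name, the 300s durations, the minHealthy threshold, and in particular the remediation template name and namespace, which differ between operator releases.

```shell
#!/usr/bin/env bash
# Sketch: print a NodeHealthCheck manifest for worker nodes.
# Assumptions: hypothetical name, thresholds, and durations; the
# remediationTemplate name and namespace depend on the installed
# Self-Node Remediation release.

nhc_manifest() {
  cat <<'EOF'
apiVersion: remediation.medik8s.io/v1alpha1
kind: NodeHealthCheck
metadata:
  name: kubevirt-worker-nhc
spec:
  minHealthy: 51%
  selector:
    matchExpressions:
      - key: node-role.kubernetes.io/worker
        operator: Exists
  remediationTemplate:
    apiVersion: self-node-remediation.medik8s.io/v1alpha1
    kind: SelfNodeRemediationTemplate
    name: self-node-remediation-automatic-strategy-template
    namespace: openshift-workload-availability
  unhealthyConditions:
    - type: Ready
      status: "False"
      duration: 300s
    - type: Ready
      status: Unknown
      duration: 300s
EOF
}
```

Review the output, adjust it for your cluster, and then apply it with nhc_manifest | oc apply -f -. The duration values control how long a node must stay unhealthy before remediation starts.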

Manually trigger VM failover

If automated remediation is not feasible, you can manually trigger VM failover by draining and removing the unavailable node.

  1. Drain the unavailable node to safely evict workloads from it:

    oc drain <down-node>
  2. Delete the node from the OCP cluster:

    oc delete node <down-node>
  3. Wait for the pods and VMs to terminate and restart on a healthy node.

After rescheduling, the KubeVirt VMs will return to a Ready state.
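
The manual steps above can be chained so that the node is deleted only if the drain succeeds. This is a minimal sketch: the function name failover_down_node is hypothetical, and the extra oc drain flags and the 300s timeout are assumptions that are often needed for an unreachable node rather than part of the documented procedure.

```shell
#!/usr/bin/env bash
# Sketch: drain and remove an unavailable node, then list VM instances.
# Assumption: the additional drain flags and timeout suit your workloads.

failover_down_node() {
  local node="$1"
  # Evict workloads first; stop here if the drain fails so that the node
  # is not deleted prematurely.
  oc drain "$node" --ignore-daemonsets --delete-emptydir-data --force --timeout=300s || return 1
  oc delete node "$node" || return 1
  # Show VM instances and the nodes they run on to confirm rescheduling.
  oc get vmi --all-namespaces -o wide
}
```

Run failover_down_node <down-node> and repeat the final oc get vmi command until the VMs report a healthy node.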