Manage storage for KubeVirt VMs in ARO
This feature is under Directed Availability. Please engage with your Portworx representative if you are interested and need to enable it in your environment under the current guidelines.
KubeVirt is an extension for Kubernetes that offers the flexibility to manage traditional VM-based workloads alongside modern containerized applications within a Kubernetes framework.
Portworx provides resources that VMs can use for both their initial startup process and for retaining data even when they are not running. To utilize OpenShift features, such as live migration, these volumes must have the `ReadWriteMany` access mode.
Follow the instructions on this page to create a StorageClass, which you can use to create the necessary PVCs.
Prerequisites
- An OpenShift cluster that supports KubeVirt.
- OpenShift Virtualization is enabled.
Create a StorageClass
To ensure PVCs are compatible with KubeVirt virtual machines, they must be configured with the `ReadWriteMany` access mode and use NFS version 3.0 with the `nolock` mount option, as shown in the `sharedv4_mount_options` parameter below. To meet these requirements, create PVCs from a StorageClass with the following parameters configured:

```yaml
sharedv4: "true"
sharedv4_mount_options: vers=3.0,nolock
```
- Create the `px-kubevirt-sc.yaml` file:

  ```yaml
  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    name: portworx-rwx-kubevirt
  provisioner: pxd.portworx.com
  parameters:
    repl: "3"
    sharedv4: "true"
    sharedv4_mount_options: vers=3.0,nolock
  volumeBindingMode: WaitForFirstConsumer
  allowVolumeExpansion: true
  ```

  Note:
  - The `volumeBindingMode=WaitForFirstConsumer` flag enables Portworx to intelligently place the volumes. For more information, see the KubeVirt page.
  - PVCs used directly by the VMs should not include the `cdi.kubevirt.io/storage.bind.immediate.requested=true` annotation, because this annotation overrides the `WaitForFirstConsumer` setting in the StorageClass.
  - When migrating from vSphere to OpenShift Container Platform with Forklift or the Migration Toolkit for Virtualization, use `volumeBindingMode=Immediate` for a successful migration.
- Run the following command to apply your StorageClass:

  ```
  oc apply -f px-kubevirt-sc.yaml
  ```
Portworx optimizes volume placement and access for KubeVirt VMs using principles of hyperconvergence and collocation:
- Shared volumes for live migration: Live migration requires two `virt-launcher` pods to run simultaneously, with the same volume mounted in both pods. The type of shared volume, either bind-mounted or NFS-mounted, depends on where the volume is attached when the `virt-launcher` pod is created:
  - Bind-mount: Used when the volume is attached on the same node where the `virt-launcher` pod starts, optimizing performance through hyperconvergence.
  - NFS-mount: Used when the volume is attached on a different node. Volumes can switch between bind-mount and NFS-mount by live-migrating or restarting the VM.
- Collocation: Portworx ensures that multiple volumes used by a single KubeVirt VM are placed on the same set of replica nodes. This automatic collocation simplifies achieving hyperconvergence.
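To see where a volume's replicas were placed, you can inspect it from a Portworx node with `pxctl`; the volume name below is a placeholder:

```
pxctl volume inspect pvc-xxxx-xxx-xxx
```

The output includes a section listing the nodes that hold each replica; volumes belonging to the same VM should show the same set of replica nodes.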
Create a PVC
You can create PVCs using one of the following methods, and Portworx will automatically recognize them as KubeVirt volumes. Ensure these PVCs are configured with the `RWX` access mode:
- Virtualization tab in the OpenShift web console
- Konveyor Forklift or Migration Toolkit for Virtualization
- Containerized Data Importer's (CDI) DataVolume
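As an example of the CDI method, the following DataVolume sketch imports a VM image into an `RWX` PVC backed by the StorageClass created earlier; the DataVolume name and image URL are hypothetical placeholders:

```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: fedora-rootdisk                    # hypothetical name
  namespace: <vm-namespace>
spec:
  source:
    http:
      url: "https://example.com/images/fedora.qcow2"   # hypothetical image URL
  pvc:
    accessModes:
    - ReadWriteMany                        # required for live migration
    storageClassName: portworx-rwx-kubevirt
    resources:
      requests:
        storage: 10Gi
```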
Once PVCs are created, run the following command to verify that they have the `RWX` access mode:

```
oc get pvc -n <vm-namespace>
NAME                  STATUS   VOLUME             CAPACITY   ACCESS MODES   STORAGECLASS            AGE
<your-kubevirt-pvc>   Bound    pvc-xxxx-xxx-xxx   1Gi        RWX            portworx-rwx-kubevirt   15h
```

The output should show the PVCs with the `RWX` access mode.
If you are creating a PVC using some other mechanism, ensure the following:
- Add the `portworx.io/app: kubevirt` annotation to the PVC spec, as shown in the sketch after this list. This ensures that Portworx applies KubeVirt-specific logic when processing the volume.
- Maintain the same HA or replication factor for all volumes associated with a VM.
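A minimal sketch of such a manually created PVC, assuming the StorageClass from earlier; the PVC name is a hypothetical placeholder:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-kubevirt-data                   # hypothetical name
  namespace: <vm-namespace>
  annotations:
    portworx.io/app: kubevirt              # tells Portworx to apply KubeVirt-specific logic
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: portworx-rwx-kubevirt
  resources:
    requests:
      storage: 10Gi
```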
Create a VM
Refer to the applicable version of the OpenShift documentation to create a KubeVirt VM.
For OpenShift version 4.14 or newer, add the `cdi.kubevirt.io/storage.usePopulator: "false"` annotation to your VM spec, as shown below:
```yaml
dataVolumeTemplates:
- metadata:
    annotations:
      cdi.kubevirt.io/storage.usePopulator: "false"
```
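For context, a minimal VirtualMachine sketch with this annotation in place might look like the following; the VM name, image URL, and sizing are hypothetical placeholders:

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: fedora-vm                          # hypothetical name
spec:
  running: true
  dataVolumeTemplates:
  - metadata:
      name: fedora-vm-rootdisk
      annotations:
        cdi.kubevirt.io/storage.usePopulator: "false"
    spec:
      source:
        http:
          url: "https://example.com/images/fedora.qcow2"   # hypothetical image URL
      pvc:
        accessModes:
        - ReadWriteMany
        storageClassName: portworx-rwx-kubevirt
        resources:
          requests:
            storage: 10Gi
  template:
    spec:
      domain:
        devices:
          disks:
          - name: rootdisk
            disk:
              bus: virtio
        resources:
          requests:
            memory: 2Gi
      volumes:
      - name: rootdisk
        dataVolume:
          name: fedora-vm-rootdisk
```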
Once the VMs are created, each VM will start running in a `virt-launcher` pod.
KubeVirt facilitates the live migration of VMs with Portworx `ReadWriteMany` volumes. However, the underlying libvirtd lacks this capability, prohibiting such live migrations. To address this, the Stork webhook controller modifies the `virt-launcher` pod manifest. It achieves this by inserting a special shared library through `LD_PRELOAD`. This library intercepts the `statfs()` system call made by libvirtd when accessing a Portworx volume. Here is the code of this shared library.
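As an illustration of the general `LD_PRELOAD` interposition technique only (this is not the actual Stork library linked above), a minimal sketch might look like the following; the mount-path prefix checked here and the reported filesystem type are hypothetical assumptions:

```c
/* Hypothetical sketch of an LD_PRELOAD interposer for statfs(); the real
 * library used by Stork differs.
 * Build: gcc -shared -fPIC -o libpx_statfs.so px_statfs.c -ldl
 * Use:   LD_PRELOAD=/path/to/libpx_statfs.so libvirtd ... */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <string.h>
#include <sys/vfs.h>
#include <linux/magic.h>

static int (*real_statfs)(const char *, struct statfs *);

int statfs(const char *path, struct statfs *buf)
{
    if (!real_statfs) {
        /* Resolve the real statfs() from the next object in the link chain (libc). */
        real_statfs = (int (*)(const char *, struct statfs *))dlsym(RTLD_NEXT, "statfs");
    }
    int rc = real_statfs(path, buf);

    /* Hypothetical check: if the path lives under the kubelet mount tree used
     * by Portworx volumes, report the filesystem as NFS so that libvirtd
     * treats it as shared storage and permits live migration. */
    if (rc == 0 && strncmp(path, "/var/lib/kubelet/", 17) == 0) {
        buf->f_type = NFS_SUPER_MAGIC;
    }
    return rc;
}
```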
Portworx ensures:
- The newly created VMs (even those created with operators such as Konveyor Forklift or the Migration Toolkit for Virtualization) have their volumes collocated during creation. Stork will schedule the VMs on nodes where volume replicas exist, making the VMs hyperconverged (bind-mounted).
- During planned node maintenance, OpenShift will live-migrate the VMs out of that node. When OpenShift reboots the node, Portworx will perform a sharedv4 service (NFS) failover, and as part of this failover, it will live-migrate the VMs to ensure they are hyperconverged once again.
- Existing VMs with non-collocated volumes will be identified and corrected by a background job.
Manage KubeVirt VMs during Portworx node upgrades
When upgrading Portworx on a node, the Portworx Operator manages KubeVirt VMs by initiating a live migration before the upgrade begins. Here’s what happens during this process:
- Eviction notice: As the operator attempts to evict VMs from a node, it generates an event on the storage node stating:

  ```
  Warning: UpdatePaused - The update of the storage node <node-name> has been paused because there are 3 KubeVirt VMs running on the node. Portworx will live-migrate the VMs, and the update will proceed once no VMs remain on this node.
  ```
- Migration failure: If the operator cannot successfully live-migrate a VM, the upgrade is paused, and the following event is recorded:

  ```
  Warning: FailedToEvictVM - Live migration <migration-name> failed for VM <vm-namespace>/<vm-name> on node <node-name>. Please stop or migrate the VM manually to continue the update of the storage node.
  ```
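To check for these events, you can list warning events in the namespace where Portworx is installed; the namespace below is an assumption, so adjust it to match your deployment:

```
oc get events -n portworx --field-selector type=Warning
```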
Manage KubeVirt VMs with adjusted filesystem overhead
When creating VMs from templates on Portworx filesystem volumes, you might encounter an error due to insufficient filesystem overhead. To resolve this issue, follow the steps below to increase the overhead:
When you add a virtual machine disk to a PVC that uses the filesystem volume mode, you must ensure that there is enough space on the PVC for the VM disk and for file system overhead, such as metadata. By default, OpenShift Virtualization reserves 5.5% of the PVC space for overhead, reducing the space available for virtual machine disks by that amount. You can configure a different overhead value by editing the HyperConverged Operator (HCO) object.
Prerequisite
- Install the OpenShift CLI (`oc`).
The following procedure explains how to change the default file system overhead value to 8%:
- Edit the HCO object:

  ```
  oc edit hyperconverged kubevirt-hyperconverged -n openshift-cnv
  ```
- Populate the fields to set the overhead to 8%. For example:

  ```yaml
  spec:
    filesystemOverhead:
      global: "0.08"
      storageClass:
        <storage_class_name>: "0.08"
  ```

  where:
  - `global`: The default file system overhead percentage used for any storage classes that do not already have a set value. Setting `global: "0.08"` reserves 8% of the PVC for file system overhead.
  - `<storage_class_name>`: If you want to set the overhead for a specific storage class to 8%, replace `<storage_class_name>` with the name of your storage class.
- Save and exit the editor to update the HCO object.
- Verify changes to CDIConfig:

  ```
  oc get cdiconfig -o yaml
  ```
- View your specific changes to CDIConfig:

  ```
  oc get cdiconfig -o jsonpath='{.items..status.filesystemOverhead}'
  ```
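If the change was applied, the output should resemble the following sketch; your storage class names will differ:

```
{"global":"0.08","storageClass":{"portworx-rwx-kubevirt":"0.08"}}
```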
By following these steps, you can adjust the filesystem overhead to 8%, ensuring that there is enough space on the PVC for the VM disk and file system overhead. For further details, refer to the Red Hat article.
Opt out of Live Migration
If you prefer not to have the operator live-migrate VMs during the Portworx upgrade, add the `operator.libopenstorage.org/evict-vms-during-update: "false"` annotation to the StorageCluster object. This annotation can also be used to resume an update that has been paused due to the presence of KubeVirt VMs.
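For example, you can set the annotation with `oc annotate`; the StorageCluster name and namespace below are placeholders for your deployment:

```
oc -n <px-namespace> annotate storagecluster <storagecluster-name> operator.libopenstorage.org/evict-vms-during-update=false --overwrite
```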
This feature is designed to upgrade one node at a time, which is the default setting. Upgrading multiple nodes simultaneously can lead to VMs being paused or restarted, and may stall the upgrade process.
Further Reading
For more details on virtual machine live migration, refer to the Virtual machine live migration documentation.