Manage Shared Block Device (RWX Block) for KubeVirt VMs
The Portworx RWX Block volume is designed and qualified only for use with the KubeVirt VM use case. It isn't intended for use as a generic ReadWriteMany (RWX) block volume type in other scenarios. Contact support to confirm suitability for other use cases (outside of the KubeVirt VM) before deployment.
KubeVirt is an extension for Kubernetes that offers the flexibility to manage traditional VM-based workloads alongside modern containerized applications within a Kubernetes framework.
Portworx supports volumes for virtual machines (VMs) as storage disks in various configurations. To enable OpenShift features such as live migration, these volumes must support the ReadWriteMany (RWX) access mode. Portworx supports the RWX block volume type, which offers improved performance in certain scenarios, and also supports the RWX file system (FS) volume type.
Portworx does not support the following configurations with Shared Raw Block for KubeVirt VMs:
- Synchronous disaster recovery
- Asynchronous disaster recovery for virtual machines (VMs) on Portworx Raw Block volumes that were migrated from another environment, such as VMware
Prerequisites
- An OpenShift cluster that supports KubeVirt.
- OpenShift Virtualization is enabled.
- Ensure that you are running the latest versions of OpenShift Virtualization and the Migration Toolkit for Virtualization Operator that are compatible with your current OpenShift version. Failing to do so may result in issues with live migration operations and other functionalities.
- Review the Known issues.
Create a StorageClass
Create a StorageClass if one does not already exist. The following is an example StorageClass.
OpenShift Virtualization (OSV) versions 4.18.3 and earlier have a known issue that causes discards to be handled incorrectly when used with Portworx block devices. As a workaround, you can disable discards for the Portworx volume by including the parameter `nodiscard: true` in the StorageClass.
- Create the `px-kubevirt-sc.yaml` file:

  ```yaml
  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    name: px-rwx-block-kubevirt
  provisioner: pxd.portworx.com
  parameters:
    repl: "3"
    nodiscard: "true" # Disables discard operations on the block device to help avoid known compatibility issues on OpenShift Container Platform (OCP) versions 4.18 and earlier.
  volumeBindingMode: Immediate
  allowVolumeExpansion: true
  ```

- Run the following command to apply your StorageClass:

  ```shell
  oc apply -f px-kubevirt-sc.yaml
  ```
Create a PVC
- Using the above StorageClass, define a PVC with the following configuration:

  - `accessModes`: `ReadWriteMany` for shared volume access.
  - `volumeMode`: `Block` to create a raw block device volume.

  ```yaml
  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: rwx-disk-1
    labels:
      portworx.io/app: kubevirt
  spec:
    accessModes:
      - ReadWriteMany
    resources:
      requests:
        storage: 100Gi
    storageClassName: px-rwx-block-kubevirt
    volumeMode: Block
  ```

- Apply this PVC to your cluster:

  ```shell
  kubectl apply -f pvc.yaml
  ```
When deploying KubeVirt VMs, reference the PVC created in the previous step to attach the Portworx RWX raw block volume to the VM. Ensure the VM configuration specifies the correct StorageClass and volume.
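The following is a minimal sketch of how the PVC can be referenced in a VM specification; the disk name `datadisk-1` and the `virtio` bus are illustrative assumptions, while `rwx-disk-1` matches the example PVC created above:

```yaml
# Fragment of a VirtualMachine spec, under spec.template.spec
domain:
  devices:
    disks:
    - name: datadisk-1              # hypothetical disk name
      disk:
        bus: virtio
volumes:
- name: datadisk-1                  # must match the disk name above
  persistentVolumeClaim:
    claimName: rwx-disk-1           # RWX block PVC created in the previous step
```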
Once the VM is running with the specified PVC, you can perform live migration using OpenShift's native functionality. The shared RWX volume ensures data consistency during the migration process by allowing simultaneous read/write operations for the source and destination nodes.
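Live migration is typically started from the OpenShift console or with `virtctl migrate`. As a minimal declarative sketch, it can also be requested with a `VirtualMachineInstanceMigration` object; the object name `migrate-rwx-vm` is an assumption, and `<vm-name>` is the name of your running VM:

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstanceMigration
metadata:
  name: migrate-rwx-vm              # hypothetical name for this migration request
spec:
  vmiName: <vm-name>                # name of the running VirtualMachineInstance to migrate
```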
If you are manually creating a PVC and attaching it to a VM, ensure the following:
- Add the `portworx.io/app: kubevirt` label to the PVC spec, as shown in the example PVC above. This ensures that Portworx applies KubeVirt-specific logic when processing the volume.
- Maintain the same HA or replication factor for all volumes associated with a VM.
VM configuration guidelines for Portworx raw block volumes
When using Portworx RWX block volumes with VMs, specific configurations are required to ensure compatibility and performance. The following guidance outlines considerations for root and data disks, block sizes, and bootloaders.
Block size and bootloader compatibility
The VM disk configuration defaults to a 512-byte block size. The hypervisor handles the translation so that 512-byte operations work over a provisioned Portworx storage disk with a 4096-byte block size, so no configuration changes are needed to ensure block size compatibility.
- Portworx block volumes always use a 4096-byte block size.
- VM disks default to a 512-byte block size unless otherwise specified. Specifying a logical block size of 512 bytes and a physical block size of 4096 bytes in the VM disk specification is an optional configuration detail within the VM that may help applications or file systems optimize performance, if supported.
- VM root disks also contain a bootloader, which can be either EFI or BIOS. BIOS supports booting only from disks with a 512-byte block size. EFI supports booting from disks with either a 4096-byte or 512-byte block size. They are independent bootloading mechanisms. The root disk configuration determines which bootloader is used.
- You can identify the root disk configuration by examining the QCOW2 image or the disk partition table.
- EFI requires an EFI system partition, a GPT partition table, and related components.
- BIOS requires a partition marked as bootable.
- For example, RHEL configures its QCOW2 cloud images to boot using both EFI and BIOS by creating two partitions that contain the required information. Note that not all distributions support this configuration.
VM spec example with custom block size
```yaml
...
spec:
  domain:
    devices:
      disks:
      - bootOrder: 1
        blockSize:
          custom:
            logical: 512
            physical: 4096
        disk:
          bus: virtio
        name: rootdisk
      - name: fio-data-disk-1
        blockSize:
          custom:
            logical: 4096
            physical: 4096
...
```
- Set `bootOrder: 1` to indicate the root disk.
- If `blockSize` is not specified, the default is 512 for both logical and physical sizes.
- For additional data disks, specifying both logical and physical block sizes as 4096 is recommended for improved performance.
VM bootloader spec example
Portworx supports both UEFI and BIOS bootloaders.

- To use UEFI, define the `bootloader` section with `efi: {}`.
- If no `bootloader` section is present, BIOS is used by default.
```yaml
...
spec:
  domain:
    firmware:
      bootloader:
        efi: {}
...
```
Supported configuration matrix
VM Root disk
| S.No | Portworx block size (bytes) | VM physical block size (bytes) | VM logical block size (bytes) | Bootloader (root disk only) | Supported |
|---|---|---|---|---|---|
| 1 | 4096 | 4096 / 512 | 512 | BIOS | Yes |
| 2 | 4096 | 4096 / 512 | 512 | EFI | Yes |
| 3 | 4096 | 4096 | 4096 | BIOS | No |
| 4 | 4096 | 4096 | 4096 | EFI only (qcow2 with 4K block size) | Yes |
Additional disks
| S.No | Portworx block size (bytes) | VM physical block size (bytes) | VM logical block size (bytes) | Supported |
|---|---|---|---|---|
| 1 | 4096 | 4096 | 512 | Yes |
| 2 | 4096 | 4096 | 4096 | Yes (Recommended) |
Create a VM
Refer to the applicable version of the OpenShift documentation and KubeVirt user guide to create a KubeVirt VM.
Once the VMs are created, each VM will start running in a `virt-launcher` pod.
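The following is a minimal sketch of a VirtualMachine manifest that ties the pieces on this page together. The VM name, memory sizing, run strategy, the data disk name, and the root-disk PVC name `rwx-root-disk` (assumed to already contain a bootable OS image, for example cloned from a golden PVC) are illustrative assumptions; the bootloader, block sizes, and data PVC reference follow the examples above:

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: rwx-block-vm                 # hypothetical VM name
spec:
  runStrategy: Always                # start the VM as soon as it is created
  template:
    spec:
      domain:
        firmware:
          bootloader:
            efi: {}                  # UEFI boot; omit this section to boot with BIOS
        memory:
          guest: 4Gi                 # illustrative sizing
        devices:
          disks:
          - name: rootdisk
            bootOrder: 1
            blockSize:
              custom:
                logical: 512
                physical: 4096
            disk:
              bus: virtio
          - name: datadisk-1
            blockSize:
              custom:
                logical: 4096
                physical: 4096
            disk:
              bus: virtio
      volumes:
      - name: rootdisk
        persistentVolumeClaim:
          claimName: rwx-root-disk   # hypothetical RWX block PVC containing the OS image
      - name: datadisk-1
        persistentVolumeClaim:
          claimName: rwx-disk-1      # RWX block data PVC created earlier on this page
```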
Manage KubeVirt VMs during Portworx node upgrades
When upgrading Portworx on a node, the Portworx Operator manages KubeVirt VMs by initiating a live migration before the upgrade begins. Here’s what happens during this process:
- Eviction notice: As the operator attempts to evict virtual machines (VMs) from a node, it generates the following event if it is unable to migrate the VMs:

  Warning: UpdatePaused - The update of the storage node <node-name> has been paused because there are 3 KubeVirt VMs running on the node. Portworx will live-migrate the VMs and the storage node will be updated after there are no VMs left on this node.

- Migration failure: If the operator cannot successfully live-migrate a VM, the upgrade is paused, and the following event is recorded:

  Warning: FailedToEvictVM - Live migration <migration-name> failed for VM <vm-namespace>/<vm-name> on node <node-name>. Please stop or migrate the VM manually to continue the update of the storage node.
Known issues
- A known issue in `libvirt` affects the use of 4K block volumes and may cause VMs to pause due to I/O errors. This issue is resolved in OpenShift Container Platform (OCP) version 4.16 or later. For more information, see the Red Hat solution.
- OpenShift Virtualization (OSV) versions 4.18.3 and earlier contain a known issue in which discards are handled incorrectly when used with Portworx block devices. As a workaround, disable discards on the Portworx volume by including the parameter `nodiscard: true` in the StorageClass. The `nodiscard` setting can also be toggled after PVC creation by using the command `pxctl volume update --nodiscard on <your_pvc_name>`. After updating, restart the pod for the change to take effect.
- When VMs are migrated from a VMware environment to a KubeVirt environment with `VolumeMode: Block`, the persistent volume claims (PVCs) may appear to have no free space remaining or to be thick-provisioned. This can increase pool usage and may trigger unnecessary volume expansion.
- Golden image and golden PVC behavior

  A golden image is a preconfigured virtual machine disk image used as a standard template to create consistent VMs.

  A golden PVC is a pre-provisioned persistent volume claim that contains a golden image. These PVCs are cloned to provision new VMs using a consistent and repeatable method.

  - Golden PVCs created via HTTP import can behave like thick-provisioned volumes, consuming more physical storage than the actual image size. Cloned volumes from these PVCs, particularly when using Portworx Raw Block volumes, may exhibit increased capacity usage and degraded performance compared to sharedv4 volumes. To optimize space efficiency, run defragmentation inside the guest VM.
  - VMs and their associated PVCs may be scheduled on the same node as the golden PVC. This can lead to resource bottlenecks. To avoid this, maintain multiple golden PVCs.
  - If a golden PVC is created with a replication factor of 3 and a node or Portworx restart occurs during VM creation, the resulting cloned PVC may be created with a replication factor of 2. There is no automatic correction, but replication can be manually restored using `repl add` on the affected PVC.
  - When multiple VMs are created at the same time from a single golden PVC, some VMs may display a "Running" status while the guest operating system is unresponsive. To recover the VM, perform a power cycle using the OpenShift UI or the `virtctl` CLI.
  - When multiple VM disks are cloned from a single template PVC, all the resulting clone volumes reside on the same set of replica nodes. This can create I/O hot spots and degrade performance for those nodes. To reduce this risk, create multiple template PVCs distributed across different nodes and round-robin between them when provisioning new VMs.