Version: 3.3

Set up Disaster Recovery

Disaster Recovery (DR) is a process that ensures the availability and recoverability of services and resources within a cluster in the event of a disaster. A DR strategy implemented with Portworx minimizes data loss caused by unforeseen incidents that disrupt business operations. The goal is to restore the operational status of a cluster quickly, so that data is accessible again as soon as possible after a disaster occurs.

You can easily manage the failover and failback of your applications. Failover is the process of migrating an application or workload from your source cluster to a destination cluster in the event of a failure or disruption in the source cluster. Failback is the process of moving the application and its data back to the source cluster once the source cluster is restored and operational again.

Types of disaster recovery

This section describes the synchronous and asynchronous methods for achieving Disaster Recovery (DR) between multiple clusters when using Portworx.

Synchronous DR

Portworx metro overview

Synchronous DR involves the immediate replication of any changes made to data or applications on the source cluster to the destination cluster.

In a Synchronous DR setup, a single Portworx cluster spans both the source and destination clusters. This is achieved by providing a common external key-value store endpoint during the installation of Portworx on all clusters. Once a single stretched Portworx cluster is established, volumes are automatically replicated across the clusters. This ensures consistent and up-to-date data on both the source and destination clusters.

important

Cluster-wide operators are not migrated as part of a DR migration if they are not installed in the same namespace as the applications you want to migrate (for example, in OpenShift, the operator installation defaults to the openshift-operators namespace). As a result, after migration, you will not be able to scale your applications up or down on the destination cluster using storkctl.

Synchronous DR is supported on the following platforms with vSphere cloud drive configurations:
- Tanzu Kubernetes Grid Service (TKGS)
- Vanilla Kubernetes
- OpenShift deployed on vSphere or Bare Metal

Synchronous DR is supported with local drives on the following platforms:
- OpenShift on Bare Metal
- RKE2

In a Synchronous DR setup:
- The maximum supported replication factor is 2.
- The sharedv4 service volumes are not supported in the Portworx cluster.
- KubeVirt VMs are not supported.

With this setup, the Recovery Point Objective (RPO) is zero, and the Recovery Time Objective (RTO) is less than 60 seconds.

You should consider this method when:

- Your source and destination clusters are within the same Metropolitan Area Network (MAN), including:
  - The same cloud region or on-premises environment (potentially in different zones).
  - The same datacenter, or datacenters located within 50 miles of each other.
- Network latency between the nodes remains under 10 ms. This low latency is required for seamless synchronization and replication of data between the source and destination clusters.

Asynchronous DR

Portworx Scheduled migration overview

Asynchronous DR involves replicating data from a source cluster to a destination cluster with a delay between the data changes occurring on the source cluster and their replication to the destination cluster.

important

Cluster-wide operators are not migrated during a DR migration unless they are installed in the same namespace as the applications you want to migrate. As a result, after migration, you will not be able to scale your applications up or down on the destination cluster using storkctl.
- For OpenShift, the operator installation defaults to the openshift-operators namespace.
- For Kubernetes, ensure operators are installed in the same namespace as your applications.

Disaster recovery is not supported for FlashArray Direct Access and FlashBlade Direct Access volumes. Do not configure migration schedules for namespaces containing these volumes.

In an Asynchronous DR setup, a separate Portworx cluster is installed on each Kubernetes cluster. This method can be used in a heterogeneous environment. To replicate volumes, you pair the clusters and create migration schedules that migrate applications and volumes between them.

With this setup, the Recovery Point Objective (RPO) is 15 minutes and the Recovery Time Objective (RTO) is less than 60 seconds.

You should consider this setup when:

- The nodes in your clusters are in different regions or datacenters.
- The network latency between the nodes is higher than 10 ms.
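
The selection criteria given for the two methods can be sketched as a small decision helper. This is an illustrative sketch only, not part of Portworx or storkctl: the `ClusterLink` type and the `recommend_dr_method` function are hypothetical names, and treating a latency of exactly 10 ms as asynchronous is an assumption, since the criteria only cover "under" and "higher than" 10 ms. The sketch does not check platform support or the other restrictions listed for each method.

```python
from dataclasses import dataclass

# Thresholds taken from the criteria described in this section.
MAX_SYNC_LATENCY_MS = 10.0       # Synchronous DR needs latency under 10 ms
MAX_SYNC_DISTANCE_MILES = 50.0   # datacenters within a 50-mile proximity


@dataclass(frozen=True)
class ClusterLink:
    """Hypothetical description of the path between source and destination."""
    latency_ms: float       # measured node-to-node network latency
    distance_miles: float   # physical distance between the datacenters
    same_region: bool       # True if both clusters share one cloud region


def recommend_dr_method(link: ClusterLink) -> str:
    """Return "synchronous" or "asynchronous" based on the stated criteria."""
    # Same metropolitan area: same region, or datacenters close enough.
    within_metro = (
        link.same_region or link.distance_miles <= MAX_SYNC_DISTANCE_MILES
    )
    # Assumption: exactly 10 ms is not "under 10 ms", so it falls to async.
    if within_metro and link.latency_ms < MAX_SYNC_LATENCY_MS:
        return "synchronous"
    return "asynchronous"


if __name__ == "__main__":
    metro = ClusterLink(latency_ms=4.0, distance_miles=12.0, same_region=False)
    remote = ClusterLink(latency_ms=35.0, distance_miles=800.0, same_region=False)
    print(recommend_dr_method(metro))
    print(recommend_dr_method(remote))
```

A result of "synchronous" only means the latency and proximity criteria are met. Confirm the platform and volume restrictions for the chosen method before committing to it.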
