Version: 3.1

Fail over an application within asynchronous DR in ARO

In the event of a disaster, when one of your Kubernetes clusters becomes inaccessible, you can fail over the applications running on it to an operational Kubernetes cluster.

The examples on this page make the following assumptions. Update the values to match your environment:

- Source Cluster is the Kubernetes cluster which is down and where your applications were originally running.
- Destination Cluster is the Kubernetes cluster where the applications will be failed over.
- The Zookeeper application is being failed over to the destination cluster.

Follow the instructions on this page to fail over your applications to the destination cluster. The instructions cover both scenarios: a controlled failover and disaster recovery.

Prerequisites

You must ensure that Stork version 24.2.0 or newer is installed on both the source and destination clusters.

Note: If you are using a Stork version prior to 24.2.0, then you can follow this procedure to perform a failover.

For operators deployed from the OpenShift OperatorHub:
- Ensure that the operator is deployed on both the source and the destination clusters in the application namespace.
- Ensure that the migration schedule is created with the --exclude-resource-types flag to exclude operator-related resources.
  
  Example:
```
storkctl create migrationschedule -c cluster-pair -n zookeeper migration-schedule --exclude-resource-types ClusterServiceVersion,operatorconditions,OperatorGroup,InstallPlan,Subscription --exclude-selectors olm.managed=true
```
- Ensure that the operator and applications are in a scaled-down state on the source cluster. Stork uses spec.replicas to scale down most standard Kubernetes controllers, such as Deployments and StatefulSets. However, for applications managed by an Operator, you must create an ApplicationRegistration CR, which gives Stork the information it needs to scale down the application.
  For more information, see ApplicationRegistrations.

Perform failover

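This section covers two scenarios: disaster recovery, where the source cluster is inaccessible, and controlled failover, where the source cluster is still accessible.

Disaster recovery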

In the event of a disaster, you can migrate an application or workload from the source cluster to the destination cluster by running the storkctl perform failover command.

You can include or exclude a subset of namespaces for migration by using one of the following flags:

--include-namespaces - Use this flag to include a subset of namespaces for the migration.
--exclude-namespaces - Use this flag to exclude a subset of namespaces for the migration.

The --include-namespaces and --exclude-namespaces flags are mutually exclusive.

If your clusters are paired in a unidirectional manner, you must use the --skip-source-operations flag to skip the source cluster operations.

To start the failover operation, run the following command in the destination cluster:

storkctl perform failover -m <migration-schedule> -n <migration-schedule-namespace>

Example:

$ storkctl perform failover -m migration-schedule -n zookeeper

Started failover for MigrationSchedule zookeeper/migration-schedule
To check failover status use the command : `storkctl get failover failover-migration-schedule-2024-05-20-140139 -n zookeeper`
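The failover action name is needed later to check the status. As a minimal sketch, assuming the command output keeps the format shown in the example above, a small helper can extract the name so that it does not have to be copied by hand. The function name failover_action_name is hypothetical and is not part of storkctl.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of storkctl): reads the output of
# `storkctl perform failover` on stdin and prints the failover action name.
# Assumption: the output contains a hint of the form
#   storkctl get failover <action-name> -n <namespace>
# as in the example above.
failover_action_name() {
  sed -n 's/.*storkctl get failover \([^ ]*\) -n .*/\1/p' | head -n 1
}

# Usage (requires storkctl and access to the destination cluster):
#   action=$(storkctl perform failover -m migration-schedule -n zookeeper | failover_action_name)
#   echo "$action"
```

If the output format differs in your Stork version, the helper prints nothing, so check for an empty result before using it.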

Controlled failover

If your source cluster is accessible and you want to migrate an application or workload from the source cluster to the destination cluster, you can perform a controlled failover by running the storkctl perform failover command.

By default, Portworx scales down the resources and suspends the migration schedule in the source cluster, ensuring data consistency. If you only need the applications to start on the destination cluster and do not need data consistency between the source and destination clusters, you can use the --skip-source-operations flag to skip source cluster operations.

You can include or exclude a subset of namespaces for migration by using one of the following flags:

--include-namespaces - Use this flag to include a subset of namespaces for the migration.
--exclude-namespaces - Use this flag to exclude a subset of namespaces for the migration.

The --include-namespaces and --exclude-namespaces flags are mutually exclusive.
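The flag rules above can be sketched as a small wrapper that assembles the command line and rejects the invalid combination before anything runs. This is an illustration only: build_failover_cmd is a hypothetical function, it prints the command instead of executing it, and it assumes that namespace lists are passed as comma-separated values (for example ns1,ns2).

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a `storkctl perform failover` command line
# without running it.
# Assumption: namespace lists are comma-separated values.
build_failover_cmd() {
  local schedule="$1" namespace="$2"
  shift 2
  local include="" exclude="" skip_source=0
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --include-namespaces|--exclude-namespaces)
        # Both flags need a value; stop if it is missing.
        if [ "$#" -lt 2 ]; then
          echo "missing value for $1" >&2
          return 2
        fi
        if [ "$1" = "--include-namespaces" ]; then include="$2"; else exclude="$2"; fi
        shift 2
        ;;
      --skip-source-operations) skip_source=1; shift ;;
      *) echo "unknown option: $1" >&2; return 2 ;;
    esac
  done
  # The two namespace flags are mutually exclusive.
  if [ -n "$include" ] && [ -n "$exclude" ]; then
    echo "use either --include-namespaces or --exclude-namespaces, not both" >&2
    return 1
  fi
  local cmd="storkctl perform failover -m $schedule -n $namespace"
  if [ -n "$include" ]; then cmd="$cmd --include-namespaces $include"; fi
  if [ -n "$exclude" ]; then cmd="$cmd --exclude-namespaces $exclude"; fi
  if [ "$skip_source" -eq 1 ]; then cmd="$cmd --skip-source-operations"; fi
  echo "$cmd"
}

# Usage:
#   build_failover_cmd migration-schedule zookeeper --skip-source-operations
```

Printing the command first makes it easy to review the flags before running the failover on the destination cluster.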

If your clusters are paired in a unidirectional manner, you must create a reverse ClusterPair in the destination-to-source direction with the same name as the ClusterPair in the source-to-destination direction to ensure data consistency between the clusters.

To start the failover operation, run the following command in the destination cluster:

storkctl perform failover -m <migration-schedule> -n <migration-schedule-namespace>

Example:

$ storkctl perform failover -m migration-schedule -n zookeeper

Started failover for MigrationSchedule zookeeper/migration-schedule
To check failover status use the command : `storkctl get failover failover-migration-schedule-2024-05-20-140139 -n zookeeper`

Check failover status

Run the following command to check the status of the failover operation. The value of <failover-action-name> appears in the output of the storkctl perform failover command in the previous section.

storkctl get failover <failover-action-name> -n <migration-schedule-namespace>

Example:

$ storkctl get failover failover-migration-schedule-2024-05-20-140139 -n zookeeper

NAME                                            CREATED               STAGE       STATUS       MORE INFO
failover-migration-schedule-2024-05-20-140139   20 May 24 14:02 UTC   Completed   Successful   Scaled up Apps in : 1/1 namespaces

If the status is failed, you can use the oc describe actions <failover-action-name> -n <migration-schedule-namespace> command to get more information about the failure.
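For scripted failovers, the status check can be repeated until the operation finishes. The following is a sketch under stated assumptions: the CREATED column ends with the literal field UTC as in the example output, the STAGE and STATUS values are single words, and a failed operation reports the status Failed. The function names failover_stage_status and wait_for_failover are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `storkctl get failover` table output on stdin
# and prints "<STAGE> <STATUS>".
# Assumption: the CREATED column ends with the field "UTC" and the next two
# fields are STAGE and STATUS, as in the example output above.
failover_stage_status() {
  awk '{ for (i = 1; i < NF - 1; i++) if ($i == "UTC") { print $(i + 1), $(i + 2); exit } }'
}

# Hypothetical helper: polls the failover action until it succeeds, fails,
# or the attempts run out. Requires storkctl on the destination cluster.
# Assumption: the STATUS column reads "Successful" or "Failed" when done.
wait_for_failover() {
  local action="$1" namespace="$2" attempts="${3:-60}" interval="${4:-10}" result
  while [ "$attempts" -gt 0 ]; do
    result=$(storkctl get failover "$action" -n "$namespace" | failover_stage_status)
    echo "$result"
    case "$result" in
      *" Successful") return 0 ;;
      *" Failed") return 1 ;;
    esac
    attempts=$((attempts - 1))
    sleep "$interval"
  done
  echo "timed out waiting for failover $action" >&2
  return 2
}

# Usage:
#   wait_for_failover failover-migration-schedule-2024-05-20-140139 zookeeper
```

The default of 60 attempts at 10-second intervals is arbitrary; adjust both to match how long failovers take in your environment.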

Verify volumes and Kubernetes resources

To verify the volumes and Kubernetes resources that are migrated to the destination cluster, run the following command:

oc get all -n <migration-schedule-namespace>

Example:

$ oc get all -n zookeeper

NAME                      READY   STATUS    RESTARTS   AGE
pod/zk-544ffcc474-6gx64   1/1     Running   0          18h

NAME                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/zk-service   ClusterIP   10.233.22.60   <none>        3306/TCP   18h

NAME                 READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/zk   1/1     1            1           18h

NAME                            DESIRED   CURRENT   READY   AGE
replicaset.apps/zk-544ffcc474   1         1         1       18h
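
Note that oc get all does not list PersistentVolumeClaims, so the volumes need a separate oc get pvc query. As a rough sketch, the check can be scripted with a small filter that flags pods that are not fully ready. The function name unready_pods is hypothetical, and the filter assumes the default column layout of oc get pods (NAME, READY, STATUS, ...).

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `oc get pods --no-headers` output on stdin and
# prints the names of pods that are not Running or not fully ready.
# Assumption: default column layout (NAME READY STATUS RESTARTS AGE).
# Pods of finished Jobs (STATUS Completed) are also reported.
unready_pods() {
  awk '{ split($2, ready, "/"); if ($3 != "Running" || ready[1] != ready[2]) print $1 }'
}

# Usage (requires oc and access to the destination cluster):
#   oc get pods -n zookeeper --no-headers | unready_pods
#   oc get pvc -n zookeeper
```

An empty result from the filter means that every pod in the namespace is Running with all containers ready.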
