Failback an application in OpenShift vSphere for synchronous DR
Failback is the process of moving the application and its data back to the source cluster once the source cluster is restored and operational again.
Once your unhealthy Kubernetes cluster is back up and running, the Portworx nodes in that cluster do not immediately rejoin the cluster. They stay in the Out of Quorum state until you explicitly activate this cluster domain. After the domain is marked as Active, you can fail back the applications if needed.
The examples on this page use the following assumptions. Update the values to match your environment:
- Source Cluster is the Kubernetes cluster which is down and where your applications were originally running. The cluster domain for this source cluster is us-east-1a.
- Destination Cluster is the Kubernetes cluster where the applications will be failed over. The cluster domain for this destination cluster is us-east-1b.
- The Zookeeper application is being failed over to the destination cluster.
Prerequisites
- You must ensure that Stork version 24.2.0 or newer is installed on both the source and destination clusters.
Note: If you are using a Stork version prior to 24.2.0, then you can follow this procedure to perform a failback.
- For operators deployed from the OpenShift OperatorHub, ensure that the operator and applications are in a scaled-down state on the destination cluster. Stork leverages spec.replicas from most of the standard Kubernetes controllers, such as Deployments and StatefulSets. However, for applications managed by an Operator, an ApplicationRegistration CR needs to be created, which provides Stork with the information required to scale down the application. For more information, see ApplicationRegistrations.
Create a reverse ClusterPair
Skip this section if you have created a bidirectional ClusterPair, and move to the next section.
You need to create a reverse ClusterPair if you initially paired your clusters in a unidirectional manner (from source to destination) and now need to establish a pairing from the destination cluster back to the source cluster. The reverse ClusterPair enables communication in the reverse direction (from destination to source), which allows for failback.
Run the following command from your destination cluster to create a reverse ClusterPair:
storkctl create clusterpair reverse-migration-cluster-pair \
--namespace <migrationnamespace> \
--src-kube-file <destination-kubeconfig-file> \
--dest-kube-file <source-kubeconfig-file> \
--mode sync-dr \
--unidirectional
Make sure that you pass the destination cluster's kubeconfig file to --src-kube-file and the source cluster's kubeconfig file to --dest-kube-file, as shown in the above command.
Reactivate your source cluster domain
Once your source cluster is operational, perform the following steps from your destination cluster to activate your source cluster domain:
- Run the following command to activate the source cluster domain:
storkctl activate clusterdomain us-east-1a
Cluster Domain activate operation started successfully for us-east-1a
- Verify that the source cluster domain is activated:
storkctl get clusterdomainsstatus
NAME            LOCAL-DOMAIN   ACTIVE                                     INACTIVE   CREATED
px-dr-cluster   us-east-1a     us-east-1a (InSync), us-east-1b (InSync)              29 Nov 22 22:09 UTC
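If you want to script this verification, the following is a minimal sketch rather than an official check. It assumes the output keeps the layout shown in the sample above, where an active and synchronized domain appears as `<domain> (InSync)`. The function name `domain_in_sync` is hypothetical.

```shell
# Hypothetical helper: reads `storkctl get clusterdomainsstatus` output on
# stdin and succeeds only if the given cluster domain ($1) is listed
# as "<domain> (InSync)". Assumes the layout shown in the sample output.
domain_in_sync() {
  grep -Fq "$1 (InSync)"
}

# Usage (run from a cluster where storkctl is configured):
#   storkctl get clusterdomainsstatus | domain_in_sync us-east-1a \
#     && echo "us-east-1a is active and in sync"
```

The helper only matches the literal text from the sample output, so treat a non-match as a prompt to inspect the full command output manually.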
Reverse sync your clusters
If the destination cluster has been running applications for some time, it is possible that the state of your application on the destination cluster differs from your source cluster. This is due to the creation of new resources or changes in data within stateful applications on the destination cluster.
It is recommended to perform one migration from destination cluster to your source cluster before failing back your applications, so that you have the most up-to-date applications on your original source cluster.
Because both of your clusters are now accessible, follow these steps to configure a reverse migration schedule:
- Create a schedule policy on your destination cluster using the instructions in the Create a schedule policy section.
- Create a migration schedule on your destination cluster using the storkctl create migrationschedule command. For more information on how to use the command, see Create MigrationSchedule with storkctl.
For operators deployed from the OpenShift OperatorHub, create a migration schedule with the --exclude-resource-types flag to exclude operator-related resources, as shown in the following example:
storkctl create migrationschedule -c cluster-pair -n zookeeper reverse-migration-schedule --exclude-resource-types ClusterServiceVersion,operatorconditions,OperatorGroup,InstallPlan,Subscription --exclude-selectors olm.managed=true
Perform failback
You can perform a failback using the storkctl perform failback -m <reverse-migration-schedule> -n <reverse-migration-schedule-namespace> command.
You can also use one of the following flags to include or exclude a subset of namespaces for the migration. You cannot use both flags at the same time.
- --include-namespaces: Includes a subset of namespaces for the migration.
- --exclude-namespaces: Excludes a subset of namespaces for the migration.
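The mutual exclusivity of the two flags can be illustrated with a small wrapper. This is a sketch under stated assumptions: the function name `failback_cmd` is hypothetical, it only prints the command instead of running it, and the comma-separated value format for the namespace flags is an assumption that you should confirm with `storkctl perform failback --help`.

```shell
# Hypothetical wrapper: prints the failback command for a migration
# schedule ($1) in a namespace ($2) with at most one namespace filter.
#   $3 = value for --include-namespaces (optional)
#   $4 = value for --exclude-namespaces (optional)
# Assumption: both flags take a comma-separated list of namespaces.
failback_cmd() {
  if [ -n "${3:-}" ] && [ -n "${4:-}" ]; then
    echo "error: use --include-namespaces or --exclude-namespaces, not both" >&2
    return 1
  fi
  cmd="storkctl perform failback -m $1 -n $2"
  if [ -n "${3:-}" ]; then
    cmd="$cmd --include-namespaces $3"
  fi
  if [ -n "${4:-}" ]; then
    cmd="$cmd --exclude-namespaces $4"
  fi
  echo "$cmd"
}

# Usage: print the command, review it, then run it yourself.
#   failback_cmd reverse-migration-schedule zookeeper "ns1,ns2"
```

Printing the command rather than executing it keeps the sketch safe to try, because a failback changes where your applications run.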
To start the failback operation, run the following command in the destination cluster:
storkctl perform failback -m <reverse-migration-schedule> -n <reverse-migration-schedule-namespace>
Example:
storkctl perform failback -m reverse-migration-schedule -n zookeeper
Started failback for MigrationSchedule zookeeper/reverse-migration-schedule
To check failback status use the command : `storkctl get failback failback-reverse-migration-schedule-2024-05-21-115006 -n zookeeper`
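The output above embeds the failback action name inside the suggested status command. If you script the failback, a sketch like the following can extract that name. It assumes the output keeps the wording shown in the example, and the function name `failback_action_name` is hypothetical.

```shell
# Hypothetical helper: reads `storkctl perform failback` output on stdin
# and prints the action name from the suggested
# "storkctl get failback <name> -n <namespace>" command.
# Assumes the output wording shown in the example above.
failback_action_name() {
  sed -n 's/.*storkctl get failback \([^ ]*\) .*/\1/p' | head -n 1
}

# Usage:
#   name=$(storkctl perform failback -m reverse-migration-schedule -n zookeeper \
#            | failback_action_name)
```

If the helper prints nothing, the output format differs from the example and you should read the action name from the command output directly.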
Check failback status
Run the following command to check the status of the failback operation. You can get the failback-action-name from the output of the storkctl perform failback command.
storkctl get failback <failback-action-name> -n <reverse-migration-schedule-namespace>
Example:
storkctl get failback failback-reverse-migration-schedule-2024-05-21-115006 -n zookeeper
NAME                                                    CREATED               STAGE       STATUS       MORE INFO
failback-reverse-migration-schedule-2024-05-21-115006   21 May 24 11:50 UTC   Completed   Successful   Scaled up Apps in : 1/1 namespaces
If the status indicates a failure, you can use the oc describe actions <failback-action-name> -n <reverse-migration-schedule-namespace> command to get more information about the failure.
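To act on the status in a script, the following sketch checks whether the STATUS column reads Successful. It assumes the column layout from the example output, where the CREATED value spans five whitespace-separated fields so that STATUS is the eighth field. The function name `failback_succeeded` is hypothetical, and no other status strings are assumed.

```shell
# Hypothetical helper: reads `storkctl get failback` output on stdin and
# succeeds only if the first data row reports STATUS "Successful".
# Assumes the example layout: NAME, a five-field CREATED timestamp,
# STAGE, then STATUS as the eighth whitespace-separated field.
failback_succeeded() {
  awk 'NR == 2 && $8 == "Successful" { ok = 1 } END { exit !ok }'
}

# Usage:
#   if storkctl get failback "$name" -n zookeeper | failback_succeeded; then
#     echo "failback completed successfully"
#   else
#     oc describe actions "$name" -n zookeeper
#   fi
```

A non-zero result means only that the row did not read Successful at that moment, so the operation may still be in progress.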
Verify volumes and Kubernetes resources are migrated
To verify the volumes and Kubernetes resources that are migrated to the source cluster, run the following command:
oc get all -n <reverse-migration-schedule-namespace>
Example:
oc get all -n zookeeper
NAME                      READY   STATUS    RESTARTS   AGE
pod/zk-544ffcc474-6gx64   1/1     Running   0          18h

NAME                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/zk-service   ClusterIP   10.233.22.60   <none>        3306/TCP   18h

NAME                 READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/zk   1/1     1            1           18h

NAME                            DESIRED   CURRENT   READY   AGE
replicaset.apps/zk-544ffcc474   1         1         1       18h