Version: 3.5

Failover an application

In the event of a disaster, when one of your Kubernetes clusters becomes inaccessible, you can fail over the applications running on it to an operational Kubernetes cluster.

The following considerations are used in the examples on this page. Update them to the appropriate values for your environment:

  • Source Cluster is the Kubernetes cluster which is down and where your applications were originally running.
  • Destination Cluster is the Kubernetes cluster to which the applications will be failed over.
  • The Zookeeper application is being failed over to the destination cluster.

Follow the instructions on this page to fail over your applications to the destination cluster. These instructions apply to both scenarios: a controlled failover and a disaster recovery.

important

Portworx supports failover operations at the namespace level. As a result:

  • You cannot perform individual application or VM-level failover within a namespace.
  • You can fail over a subset of the migrated VMs using label-based selection. See: Perform selective failover

Prerequisites

  • You must ensure that Stork version 24.2.0 or newer is installed on both the source and destination clusters.

    note

    If you are using a Stork version prior to 24.2.0, then you can follow this procedure to perform a failover.
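You can verify the installed Stork version on each cluster before proceeding. The following is a minimal sketch, assuming Stork runs as the `stork` Deployment in the `kube-system` namespace; adjust the namespace and deployment name to match your installation:

```shell
# Print the Stork image (and therefore its version) on the current cluster.
# Assumes Stork is deployed as the "stork" Deployment in kube-system;
# adjust -n if your installation uses a different namespace.
kubectl get deployment stork -n kube-system \
  -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'
```

Run the same check against both the source and the destination cluster contexts.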

  • For operators deployed from the OpenShift OperatorHub:

    • Ensure that the operator is deployed on both the source and the destination clusters in the application namespace.

    • Ensure that the migration schedule is created with the --exclude-resource-types flag to exclude operator-related resources, as shown in the following example:

      storkctl create migrationschedule -c cluster-pair -n zookeeper migration-schedule --exclude-resource-types ClusterServiceVersion,operatorconditions,OperatorGroup,InstallPlan,Subscription --exclude-selectors olm.managed=true
    • Ensure that the operator and applications are scaled down on the source cluster. Stork reads spec.replicas from most standard Kubernetes controllers, such as Deployments and StatefulSets. For applications managed by an Operator, however, you must create an ApplicationRegistration CR, which provides Stork with the information it needs to scale the application down. For more information, see the Application Registration document.
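Before starting the failover, you can confirm the scale-down on the source cluster. A minimal check, reusing the zookeeper namespace from the examples on this page:

```shell
# On the SOURCE cluster: list controllers in the application namespace and
# confirm their desired replica counts are 0 before failing over.
kubectl get deployments,statefulsets -n zookeeper \
  -o custom-columns='KIND:.kind,NAME:.metadata.name,REPLICAS:.spec.replicas'
```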

Perform failover

In the event of a disaster, you can migrate an application or workload from the source cluster to the destination cluster by running the storkctl perform failover command.

Perform selective failover

You can use the following flags to include or exclude specific namespaces or resources during the failover operation, giving you finer control over what is activated on the destination cluster.

  • Namespace filters

    • --include-namespaces - Include only the specified subset of namespaces in the failover.
    • --exclude-namespaces - Exclude the specified namespaces from the failover.
  • Resource filters

    • --selectors - Activate/deactivate only resources with the specified labels.
    • --exclude-selectors - Exclude resources with the specified labels from activation/deactivation.
    • --resource-types - Filter resources by type. Format: KIND or GROUP/VERSION/KIND (e.g., apps/v1/Deployment). Can be specified multiple times.
note

You cannot use --include-namespaces and --exclude-namespaces together, nor --selectors and --exclude-selectors together; doing so returns an error:

storkctl perform failover --selectors=app=busybox --exclude-selectors=app=px-mongo-mongodb
error: can provide only one of --selectors or --exclude-selectors values at once
If your clusters are paired in a unidirectional manner, you must use the --skip-source-operations flag to skip the source cluster operations.
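For unidirectionally paired clusters, the invocation might look like the following; this is a sketch reusing the migration schedule from the examples on this page:

```shell
# Skip operations on the source cluster during failover
# (required when the clusters are paired unidirectionally).
storkctl perform failover -m migration-schedule -n zookeeper --skip-source-operations
```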

To start the failover operation, run the following command in the destination cluster:

storkctl perform failover -m <migration-schedule> -n <migration-schedule-namespace>

Example:

storkctl perform failover -m migration-schedule -n zookeeper
Started failover for MigrationSchedule zookeeper/migration-schedule
To check failover status use the command : `storkctl get failover failover-migration-schedule-2024-05-20-140139 -n zookeeper`

Examples with filtering:

Namespace filtering:

# Perform failover for only specific namespaces
storkctl perform failover -m migration-schedule -n zookeeper --include-namespaces=app-ns1,app-ns2
# Perform failover for all namespaces except specific ones
storkctl perform failover -m migration-schedule -n zookeeper --exclude-namespaces=test-ns

Resource filtering:

# Perform failover for only the resources with the label app=busybox
storkctl perform failover -m migration-schedule -n zookeeper --selectors=app=busybox
# Perform failover for all resources except the ones with the label app=busybox
storkctl perform failover -m migration-schedule -n zookeeper --exclude-selectors=app=busybox
# Perform failover for only StatefulSet resources
storkctl perform failover -m migration-schedule -n zookeeper --resource-types=StatefulSet

# Perform failover for only Deployment and CronJob resources
storkctl perform failover -m migration-schedule -n zookeeper --resource-types Deployment --resource-types batch/v1/CronJob


If the last migration status is in the PartialSuccess state, you'll be prompted to proceed with the failover operation. To bypass this prompt, use the --force flag.
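With the --force flag, the prompt shown after a PartialSuccess migration is bypassed; for example:

```shell
# Proceed with the failover even if the last migration ended in PartialSuccess.
storkctl perform failover -m migration-schedule -n zookeeper --force
```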


Perform additional selective failovers

If a partial failover has been performed and you want to initiate another failover for the remaining applications or VMs, you must use a new MigrationSchedule instead of reusing the one referenced in the earlier failover.

  1. Delete the MigrationSchedule that was referenced by the first failover from both source and destination clusters.

  2. Create a new MigrationSchedule on the source cluster.

  • Use --exclude-selectors to exclude applications/VMs (and their volumes) that are already failed over.

  • Optionally, you can use --selectors to migrate only a specific subset of remaining resources.

  3. Perform the additional failover by referencing the newly created MigrationSchedule.

Why a new MigrationSchedule is required

  • Prevents conflicts with resources that are already failed over and activated on the destination cluster.

  • Ensures isolation so that ongoing and last-mile migration only applies to the remaining (non-failed-over) applications/VMs and their volumes.

  • Allows consecutive failovers to proceed smoothly without impacting workloads that have already been failed over.

Using a fresh MigrationSchedule isolates the migration scope to the remaining workloads, providing better control and preventing unintended interference with already migrated resources.
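The steps above can be sketched as a command sequence. The schedule names and the app=busybox label are illustrative; substitute the cluster pair, namespace, and labels from your environment:

```shell
# 1. Delete the MigrationSchedule used by the first failover
#    (run on both the source and the destination cluster).
kubectl delete migrationschedule migration-schedule -n zookeeper

# 2. On the SOURCE cluster, create a new MigrationSchedule that excludes the
#    workloads already failed over (label app=busybox is a hypothetical example).
storkctl create migrationschedule -c cluster-pair -n zookeeper migration-schedule-2 \
  --exclude-selectors app=busybox

# 3. On the DESTINATION cluster, fail over the remaining workloads by
#    referencing the new MigrationSchedule.
storkctl perform failover -m migration-schedule-2 -n zookeeper
```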

Check failover status

Run the following command to check the status of the failover operation. The failover action name is printed in the output of the storkctl perform failover command, as shown in the previous section.

storkctl get failover <failover-action-name> -n <migration-schedule-namespace>

Example:

storkctl get failover failover-migration-schedule-2024-05-20-140139 -n zookeeper
NAME                                            CREATED               STAGE       STATUS       MORE INFO
failover-migration-schedule-2024-05-20-140139   20 May 24 14:02 UTC   Completed   Successful   Scaled up Apps in : 1/1 namespaces

If the status is Failed, use the following command to get more information about the failure:

kubectl describe actions <failover-action-name> -n <migration-schedule-namespace>

Verify volumes and Kubernetes resources are migrated

To verify the volumes and Kubernetes resources that are migrated to the destination cluster, run the following command:

kubectl get all -n <migration-schedule-namespace>

Example:

kubectl get all -n zookeeper
NAME                      READY   STATUS    RESTARTS   AGE
pod/zk-544ffcc474-6gx64   1/1     Running   0          18h

NAME                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/zk-service   ClusterIP   10.233.22.60   <none>        3306/TCP   18h

NAME                 READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/zk   1/1     1            1           18h

NAME                            DESIRED   CURRENT   READY   AGE
replicaset.apps/zk-544ffcc474   1         1         1       18h