Version: 3.2

Import data into Portworx PVCs in GCP Anthos

In modern Kubernetes-based infrastructure, data migration and application deployment are critical tasks. This document provides a step-by-step guide to importing application data from a PVC backed by a non-Portworx storage driver into PVCs created by Portworx.

To import data into a Portworx PVC, Stork uses rsync to copy the data from an existing PVC into a PVC backed by Portworx. Stork runs a Kubernetes Job that executes the rsync command inside a container. This is useful if you are a new customer who previously used a different storage provider and now needs to import data from non-Portworx PVCs into Portworx PVCs.

Prerequisites

  • A Kubernetes cluster set up
  • Portworx deployed on this cluster with Stork version 23.8.0 or higher
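
One way to check the Stork version prerequisite from the command line is sketched below. It assumes that Stork runs as a Deployment named stork in the kube-system namespace and that the image tag is the Stork version; both are assumptions, so adjust them for your installation. The helpers version_at_least and stork_version are hypothetical names, not part of kubectl or Stork.

```shell
#!/usr/bin/env bash
# Sketch: compare the deployed Stork version against the 23.8.0 minimum.

# version_at_least HAVE WANT: succeeds when HAVE >= WANT in version order.
# Relies on `sort -V` (version sort), available in GNU coreutils.
version_at_least() {
  local have="$1" want="$2"
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# stork_version: prints the tag of the first container image in the Stork Deployment.
# Assumption: Deployment "stork" in "kube-system" (override with STORK_NAMESPACE).
stork_version() {
  local image
  image="$(kubectl get deployment stork -n "${STORK_NAMESPACE:-kube-system}" \
    -o jsonpath='{.spec.template.spec.containers[0].image}')" || return 1
  # Keep only the text after the last colon, which is the image tag.
  printf '%s\n' "${image##*:}"
}
```

After sourcing the functions, a check could look like this:

    version_at_least "$(stork_version)" 23.8.0 && echo "Stork is new enough"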

Import an application and its data onto PVCs

  1. Define a StorageClass and PVC to set up Portworx storage.

    Create a Portworx PVC using the px-csi-db StorageClass. This StorageClass is created for you when you install Portworx. This is the PVC into which you will import data.

    kubectl create -f destination-pvc.yaml

    destination-pvc.yaml

    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: postgres-data
      labels:
        app: postgres
    spec:
      storageClassName: px-csi-db
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
  2. Scale down the application replicas to 0 to avoid data conflicts during migration.

    Stork supports only offline data import. Scaling down the application that uses the non-Portworx PVC keeps the data consistent while it is imported into the Portworx PVC.

    kubectl scale --replicas=0 <deployment>/<your-application-name>

    Replace <deployment>/<your-application-name> in the above command with the appropriate resource.

  3. Create a DataExport object specifying the source and destination of the data import.

    The DataExport CR is the main driver for triggering the import from a non-Portworx PVC (source) into the Portworx PVC (destination). Both PVCs are specified in the DataExport CR specification.

    kubectl create -f dataexport.yaml

    dataexport.yaml

    apiVersion: kdmp.portworx.com/v1alpha1
    kind: DataExport
    metadata:
      name: postgres-export
      namespace: default
    spec:
      type: rsync
      source:
        apiVersion: v1
        kind: PersistentVolumeClaim
        name: pgbench-data
        namespace: default
      destination:
        apiVersion: v1
        kind: PersistentVolumeClaim
        name: postgres-data
        namespace: default
  4. Monitor the progress of the data export using kubectl describe.

    The following are sample outputs for a data export:

    In progress

    Spec:
      Destination:
        API Version: v1
        Kind: PersistentVolumeClaim
        Name: postgres-data
        Namespace: default
      Source:
        API Version: v1
        Kind: PersistentVolumeClaim
        Name: pgbench-data
        Namespace: default
      Type: rsync
    Status:
      Reason:
      Stage: TransferInProgress
      Status: InProgress
      Transfer ID: default/import-rsync-pgbench-data
    Events: <none>

    Completed

    Spec:
      Destination:
        API Version: v1
        Kind: PersistentVolumeClaim
        Name: postgres-data
        Namespace: default
      Source:
        API Version: v1
        Kind: PersistentVolumeClaim
        Name: pgbench-data
        Namespace: default
      Type: rsync
    Status:
      Progress Percentage: 100
      Stage: Final
      Status: Successful
      Transfer ID: default/import-rsync-pgbench-data
    Events:
  5. Update the application's deployment configuration to use the Portworx PVC.

    This step uses kubectl edit to modify your existing application so that it uses the Portworx PVC into which the data has been imported. Depending on your deployment model, change the application specification to reference the new Portworx PVC.

    kubectl edit <deployment>/<your-application-name> -n <your-application-namespace>
  6. Restore the application to its desired replica count:

    kubectl scale --replicas=1 <deployment>/<your-application-name>

    Replace <deployment>/<your-application-name> in the above command with the appropriate resource.
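
The monitoring in step 4 can also be scripted instead of re-running kubectl describe by hand. The following is a minimal sketch, not an official Stork tool. It assumes that the Status field shown in the sample output is exposed as .status.status on the DataExport object and that a failed export reports the value Failed; both are assumptions inferred from the sample output. The function name wait_for_dataexport is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: poll a DataExport until it reports a terminal status.
# Assumption: the "Status" shown by kubectl describe maps to .status.status.

# wait_for_dataexport NAME NAMESPACE [TRIES] [DELAY_SECONDS]
# Returns 0 on "Successful"; returns 1 on "Failed" or when TRIES polls are used up.
wait_for_dataexport() {
  local name="$1" namespace="$2" tries="${3:-60}" delay="${4:-10}"
  local i status
  for ((i = 1; i <= tries; i++)); do
    # A failed kubectl call is treated as "status not known yet".
    status="$(kubectl get dataexport "$name" -n "$namespace" \
      -o jsonpath='{.status.status}' 2>/dev/null)" || status=""
    echo "poll $i/$tries: status=${status:-unknown}"
    case "$status" in
      Successful) return 0 ;;
      Failed) return 1 ;;  # "Failed" is an assumed failure value
    esac
    sleep "$delay"
  done
  return 1
}
```

With the names used in the steps above, the call between steps 3 and 5 could look like this:

    wait_for_dataexport postgres-export default && echo "import finished"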

Additional options

This section provides options for customization, such as specifying a custom Docker registry, using image pull secrets, and tweaking rsync flags. You should provide these options to Stork through environment variables, which you can configure in the StorageCluster specification.

When using a custom Docker registry

If you use a custom Docker registry, Stork must pull the rsync image from that registry when it starts the Job that runs rsync. To customize the rsync image name, set the following environment variable in the StorageCluster specification:

stork:
  enabled: true
  env:
  - name: KDMP_RSYNC_IMAGE
    value: <custom-registry>/eeacms/rsync:<tag>

This allows you to specify a unique image location from your custom Docker registry.

When using Image Pull Secrets

The rsync operation runs inside the eeacms/rsync container. If you need an image pull secret to pull this image, provide the name of the Kubernetes secret as an environment variable. Create the image pull secret in the same namespace where Stork is deployed. You configure this environment variable in the StorageCluster specification:

stork:
  enabled: true
  env:
  - name: KDMP_RSYNC_IMAGE_SECRET
    value: <image-secret-name>

This allows for secure retrieval of the rsync image using the specified image pull secret.
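
If the pull secret does not exist yet, it can be created with kubectl create secret docker-registry. The sketch below wraps that command in a hypothetical helper, create_rsync_pull_secret. The default namespace kube-system and the REGISTRY_PASSWORD variable are assumptions made for illustration; use the namespace where Stork runs in your cluster.

```shell
#!/usr/bin/env bash
# Sketch: create a registry pull secret in the namespace where Stork runs.

# create_rsync_pull_secret NAME SERVER USERNAME [NAMESPACE]
# The registry password is read from the REGISTRY_PASSWORD environment variable.
create_rsync_pull_secret() {
  local name="$1" server="$2" username="$3" namespace="${4:-kube-system}"
  if [ -z "${REGISTRY_PASSWORD:-}" ]; then
    echo "REGISTRY_PASSWORD is not set" >&2
    return 1
  fi
  kubectl create secret docker-registry "$name" \
    --namespace "$namespace" \
    --docker-server="$server" \
    --docker-username="$username" \
    --docker-password="$REGISTRY_PASSWORD"
}
```

The secret name passed as NAME is the value to use for KDMP_RSYNC_IMAGE_SECRET.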

Customizing the rsync flags

By default, the rsync command in the rsync job pod runs with the flags -avz. To specify your own set of rsync flags, add the following environment variable to the StorageCluster specification:

stork:
  enabled: true
  env:
  - name: KDMP_RSYNC_FLAGS
    value: "-<custom-flags>"

Be sure to include the hyphen at the beginning of your custom flags in the value field. This lets you tune the rsync operation to your requirements.

Supported environment variables

Provide any of the following environment variables in the env section for Stork in the StorageCluster specification:

Environment variable         Description
KDMP_RSYNC_IMAGE             Custom image name for the rsync pod deployed by Stork’s KDMP controller
KDMP_RSYNC_IMAGE_SECRET      Image pull secret for the rsync pod deployed by Stork’s KDMP controller
KDMP_RSYNC_OPENSHIFT_SCC     OpenShift SCC to be used with the rsync pod deployed by Stork’s KDMP controller
KDMP_RSYNC_FLAGS             Custom rsync flags used by the rsync command that runs inside the rsync pod deployed by Stork’s KDMP controller
KDMP_RSYNC_REQUEST_CPU       CPU request for the rsync pod deployed by Stork’s KDMP controller
KDMP_RSYNC_REQUEST_MEMORY    Memory request for the rsync pod deployed by Stork’s KDMP controller
KDMP_RSYNC_LIMIT_CPU         CPU limit for the rsync pod deployed by Stork’s KDMP controller
KDMP_RSYNC_LIMIT_MEMORY      Memory limit for the rsync pod deployed by Stork’s KDMP controller
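
After changing the StorageCluster specification, it can help to confirm that a variable actually reached the Stork pod template. The sketch below reads one variable back with a kubectl JSONPath filter. It assumes a Deployment named stork in the kube-system namespace with Stork as the first container; these names are assumptions, and stork_env_value is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: read one KDMP_* variable back from the running Stork Deployment.
# Assumption: Deployment "stork" in "kube-system", Stork as the first container.

# stork_env_value VAR_NAME [NAMESPACE]: prints the value, or nothing if it is not set.
stork_env_value() {
  local var="$1" namespace="${2:-kube-system}"
  kubectl get deployment stork -n "$namespace" \
    -o jsonpath="{.spec.template.spec.containers[0].env[?(@.name==\"$var\")].value}"
}
```

For example, to check the rsync flags currently in effect:

    stork_env_value KDMP_RSYNC_FLAGS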