Import data into Portworx PVCs in GCP Anthos
In modern Kubernetes-based infrastructure, data migration and application deployment are critical tasks. This document provides a step-by-step guide to importing application data from a PVC backed by a non-Portworx storage driver into PVCs created by Portworx.
To import data into a Portworx PVC, Stork uses rsync to copy the data from an existing PVC into a PVC backed by Portworx. Stork runs a Kubernetes Job that executes the rsync command inside a container. This is useful if you are onboarding from a different storage provider and need to import data from non-Portworx PVCs into Portworx PVCs.
Prerequisites
- A Kubernetes cluster set up
- Portworx deployed on this cluster with Stork version 23.8.0 or higher
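The Stork version prerequisite can be checked with a small shell helper. The following is a minimal sketch, not an official Portworx tool: the function names `version_at_least` and `stork_image_tag` are hypothetical, the comparison relies on `sort -V` from GNU coreutils, and the commented example assumes Stork runs as a Deployment named `stork` in the `kube-system` namespace, which may differ in your installation.

```shell
# version_at_least CURRENT MINIMUM
# Succeeds (exit status 0) when CURRENT is the same as or newer than MINIMUM.
# Uses `sort -V` (version sort) to find the lower of the two versions.
version_at_least() {
    current=$1
    minimum=$2
    lowest=$(printf '%s\n%s\n' "$current" "$minimum" | sort -V | head -n 1)
    [ "$lowest" = "$minimum" ]
}

# stork_image_tag IMAGE
# Prints the tag portion of a container image reference
# (everything after the last colon).
stork_image_tag() {
    printf '%s\n' "${1##*:}"
}

# Example (assumption: a Deployment named "stork" in kube-system):
#   image=$(kubectl get deployment stork -n kube-system \
#       -o jsonpath='{.spec.template.spec.containers[0].image}')
#   version_at_least "$(stork_image_tag "$image")" 23.8.0 && echo "Stork is new enough"
```

If the image tag carries a suffix or the image is referenced by digest, the tag extraction above will not yield a plain version number, so treat the result as a hint and confirm the version manually.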
Import an application and its data onto PVCs
-
Define a StorageClass and PVC to set up Portworx storage.
Create a Portworx PVC using the px-csi-db StorageClass. This StorageClass is created for you when you install Portworx. This is the PVC into which you will import data.
kubectl create -f destination-pvc.yaml
destination-pvc.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: postgres-data
  labels:
    app: postgres
spec:
  storageClassName: px-csi-db
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
-
Scale down the application replicas to 0 to avoid data conflicts during migration.
Stork supports only offline data import. Scaling down the application that uses the non-Portworx PVC keeps the data consistent while it is imported into a Portworx PVC.
kubectl scale --replicas=0 <deployment>/<your-application-name>
Replace <deployment>/<your-application-name> in the above command with the appropriate resource.
-
Create a DataExport object specifying the source and destination of the data import.
The DataExport CR is the main driver for triggering the import between a non-Portworx PVC (source) and the Portworx PVC (destination). Both PVCs are specified in the DataExport CR specification.
kubectl create -f dataexport.yaml
dataexport.yaml
apiVersion: kdmp.portworx.com/v1alpha1
kind: DataExport
metadata:
  name: postgres-export
  namespace: default
spec:
  type: rsync
  source:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: pgbench-data
    namespace: default
  destination:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: postgres-data
    namespace: default
-
Monitor the progress of the data export using kubectl describe.
The following are sample outputs for a data export process:
In progress
Spec:
  Destination:
    API Version: v1
    Kind: PersistentVolumeClaim
    Name: postgres-data
    Namespace: default
  Source:
    API Version: v1
    Kind: PersistentVolumeClaim
    Name: pgbench-data
    Namespace: default
  Type: rsync
Status:
  Reason:
  Stage: TransferInProgress
  Status: InProgress
  Transfer ID: default/import-rsync-pgbench-data
Events: <none>
Completed
Spec:
  Destination:
    API Version: v1
    Kind: PersistentVolumeClaim
    Name: postgres-data
    Namespace: default
  Source:
    API Version: v1
    Kind: PersistentVolumeClaim
    Name: pgbench-data
    Namespace: default
  Type: rsync
Status:
  Progress Percentage: 100
  Stage: Final
  Status: Successful
  Transfer ID: default/import-rsync-pgbench-data
Events:
-
Update the application's deployment configuration to use the Portworx PVC.
Use kubectl edit to modify your existing application so that it uses the newly created Portworx PVC into which the data has been imported. Depending on your deployment model, you will need to change the application specification to reference the new Portworx PVC.
kubectl edit <deployment>/<your-application-name> -n <your-application-namespace>
-
Restore the application to its desired replica count:
kubectl scale --replicas=1 <deployment>/<your-application-name>
Replace <deployment>/<your-application-name> in the commands above with the appropriate resource.
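The monitoring step above can be scripted instead of rerunning kubectl describe by hand. The following is a minimal sketch under stated assumptions: the function names `dataexport_status` and `wait_for_dataexport` are hypothetical, the parser relies on the describe layout shown in the sample outputs, and `Failed` is an assumed failure value that does not appear in those samples.

```shell
# dataexport_status
# Reads `kubectl describe dataexport` output on stdin and prints the value of
# the nested "Status:" field (for example InProgress or Successful).
# The top-level "Status:" section header has no value and is skipped.
dataexport_status() {
    awk '$1 == "Status:" && NF >= 2 { print $2; exit }'
}

# wait_for_dataexport NAME NAMESPACE [INTERVAL_SECONDS]
# Polls the DataExport until it reports Successful (returns 0)
# or Failed (returns 1). The default polling interval is 10 seconds.
wait_for_dataexport() {
    name=$1
    namespace=$2
    interval=${3:-10}
    while :; do
        status=$(kubectl describe dataexport "$name" -n "$namespace" | dataexport_status)
        echo "DataExport $namespace/$name status: ${status:-unknown}"
        case $status in
            Successful) return 0 ;;
            Failed) return 1 ;;  # assumption: failure value not shown in the samples
        esac
        sleep "$interval"
    done
}

# Example, using the names from dataexport.yaml:
#   wait_for_dataexport postgres-export default
```

Running the wait before editing the deployment helps ensure the application is only pointed at the Portworx PVC after the import has finished. The loop has no timeout, so interrupt it manually if the export stalls.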