Version: 3.3

Decommission a Node

This guide describes a recommended workflow for decommissioning a Portworx node in your cluster.

Note: The following steps don't apply if you're using an auto-scaling group (ASG) to manage your Portworx nodes. For details about how you can change the size of your auto-scaling group, see the Scaling the Size of Your Auto Scaling Group page of the AWS documentation.

Migrate application pods using Portworx volumes that are running on this node

If you plan to remove Portworx from a node, you must first migrate the applications on that node that use Portworx volumes. If Portworx is not running, existing application containers end up with read-only volumes, and new ones fail to start.

On OpenShift, perform the following steps to migrate selected pods:

Cordon the node using the following command:
```
oc adm cordon <node>
```
Reschedule application pods using Portworx volumes on different nodes:
```
oc delete pod <pod-name> -n <application-namespace>
```
Because application pods are expected to be managed by a controller such as a Deployment or StatefulSet, the controller creates a replacement pod on another node.

On Kubernetes, perform the following steps to migrate selected pods:

Cordon the node using the following command:
```
kubectl cordon <node>
```
Reschedule application pods using Portworx volumes on different nodes:
```
kubectl delete pod <pod-name> -n <application-namespace>
```
Because application pods are expected to be managed by a controller such as a Deployment or StatefulSet, Kubernetes creates a replacement pod on another node.
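To decide which pods to reschedule, it can help to list the pods that are currently scheduled on the cordoned node. The following is a minimal sketch, assuming a bash-compatible shell and kubectl access to the cluster; the function name pods_on_node is hypothetical. It only prints kubectl delete pod commands for you to review, because it does not check which pods actually use Portworx volumes. On OpenShift, the same approach should work with oc in place of kubectl.

```shell
# Hypothetical helper: print one "kubectl delete pod" command for each pod
# scheduled on the given node. Review the list and run only the commands for
# pods that use Portworx volumes.
pods_on_node() {
  local node="$1"
  # --field-selector spec.nodeName limits the list to pods on this node;
  # custom-columns with --no-headers prints "<namespace> <name>" per line.
  kubectl get pods --all-namespaces \
    --field-selector "spec.nodeName=${node}" \
    -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name \
    --no-headers |
  while read -r ns name; do
    printf 'kubectl delete pod %s -n %s\n' "$name" "$ns"
  done
}

# Usage:
#   pods_on_node <node>
```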

Remove Portworx from the cluster

To decommission the Portworx node from the cluster, follow either the "Removing a functional node from a cluster" section or the "Removing offline nodes" section, depending on whether the node is still functional.

Removing a functional node from a cluster

You may need to remove a functional Portworx node from the cluster. This section demonstrates how to remove a node by running commands on the node itself, and how to remove a node by running commands from another node.

The following output shows the state of the cluster and the different node IPs and node IDs:

```
pxctl status
```
```
Status: PX is operational
Node ID: xxxxxxxx-xxxx-xxxx-xxxx-0027f6bbcbd1
        IP: X.X.X.0
        Local Storage Pool: 1 pool
        POOL    IO_PRIORITY     SIZE    USED    STATUS  ZONE    REGION
        0       LOW             64 GiB  11 GiB  Online  c       us-east-1
        Local Storage Devices: 1 device
        Device  Path            Media Type              Size            Last-Scan
        0:1     /dev/xvdf       STORAGE_MEDIUM_SSD      64 GiB          25 Feb 17 21:13 UTC
        total                   -                       64 GiB
Cluster Summary
        Cluster ID: xxxxxxxx-xxxx-xxxx-xxxx-d4a612b74cc3
        IP              ID                                      Used    Capacity        Status
        172.31.40.38    xxxxxxxx-xxxx-xxxx-xxxx-2bc112f5f131    11 GiB  64 GiB          Online
        172.31.37.211   xxxxxxxx-xxxx-xxxx-xxxx-a85e0514ae8b    11 GiB  64 GiB          Online
        172.31.35.130   xxxxxxxx-xxxx-xxxx-xxxx-893373631483    11 GiB  64 GiB          Online
        172.31.45.106   xxxxxxxx-xxxx-xxxx-xxxx-2eeddcd64d51    11 GiB  64 GiB          Online
        172.31.45.56    xxxxxxxx-xxxx-xxxx-xxxx-ec8e1420e645    11 GiB  64 GiB          Online
        172.31.46.119   xxxxxxxx-xxxx-xxxx-xxxx-0027f6bbcbd1    11 GiB  64 GiB          Online (This node)
        172.31.39.201   xxxxxxxx-xxxx-xxxx-xxxx-936b1b58aa24    11 GiB  64 GiB          Online
        172.31.33.151   xxxxxxxx-xxxx-xxxx-xxxx-41e70a72eafd    11 GiB  64 GiB          Online
        172.31.33.252   xxxxxxxx-xxxx-xxxx-xxxx-428e727eb6b8    11 GiB  64 GiB          Online
Global Storage Pool
        Total Used      :  99 GiB
        Total Capacity  :  576 GiB
```

Suspend active cloudsnap operations

Identify any active cloudsnap operations being run on the node that you intend to decommission:

```
pxctl cloudsnap status
```

The STATE of active operations shows as Backup-Active:

```
NAME                                    SOURCEVOLUME        STATE           NODE            TIME-ELAPSED        COMPLETED
xxxxxxxx-xxxx-xxxx-xxxx-278535e49860    885345022234521857  Backup-Done     10.13.90.125    39.746191264s       Tue, 22 Mar 2022 22:53:37 UTC
xxxxxxxx-xxxx-xxxx-xxxx-3c3ccf47f276    186701534582547510  Backup-Done     10.13.90.122    1.677455484s        Tue, 22 Mar 2022 23:59:49 UTC
xxxxxxxx-xxxx-xxxx-xxxx-73176c2d03e2    885345022234521857  Backup-Done     10.13.90.125    27.550329395s       Wed, 23 Mar 2022 00:00:15 UTC
xxxxxxxx-xxxx-xxxx-xxxx-2307865c1b93    649554470078043771  Backup-Active   10.13.90.125    5m12.61653365s
```

From this output, identify the volumes with active backups. For example, if node 10.13.90.125 is being decommissioned, the volume with an active backup is 649554470078043771.
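If the cloudsnap list is long, you can filter it instead of reading it by eye. The following is a minimal sketch, assuming the column layout shown above (NAME, SOURCEVOLUME, STATE, NODE, ...); the function name active_backup_volumes is hypothetical. It reads the pxctl cloudsnap status output on standard input and prints the SOURCEVOLUME of every Backup-Active operation on the given node IP.

```shell
# Hypothetical helper: print the SOURCEVOLUME of each Backup-Active cloudsnap
# operation running on the node with the given IP. Assumes column 2 is
# SOURCEVOLUME, column 3 is STATE, and column 4 is NODE, as in the sample output.
active_backup_volumes() {
  local node_ip="$1"
  # NR > 1 skips the header row.
  awk -v ip="$node_ip" 'NR > 1 && $3 == "Backup-Active" && $4 == ip { print $2 }'
}

# Usage:
#   pxctl cloudsnap status | active_backup_volumes 10.13.90.125
```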

Identify the namespace of the volume on which the cloudsnap operation is occurring. The namespace is displayed in the Labels section of the output from the following command. Replace <source_volume> with the SOURCEVOLUME value of the volume that is in the Backup-Active state in the previous output:

```
pxctl volume inspect <source_volume>
```
```
Volume               :  649554470078043771
Name                 :  pvc-xxxxxxxx-xxxx-xxxx-xxxx-d4827680f6de
Size                 :  500 GiB
Format               :  ext4
HA                   :  3
IO Priority          :  LOW
Creation time        :  Mar 22 20:37:52 UTC 2022
Shared               :  v4 (service)
Status               :  up
State                :  Attached: xxxxxxxx-xxxx-xxxx-xxxx-10f4e076cac8 (10.13.90.119)
Last Attached        :  Mar 22 20:37:58 UTC 2022
Device Path          :  /dev/pxd/pxd649554470078043771
Labels               :  mount_options=nodiscard=true,namespace=vdbench,nodiscard=true,pvc=vdbench-pvc-sharedv4,repl=3,sharedv4=true,sharedv4_svc_type=ClusterIP
...
```
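To pull the namespace out of the Labels line directly, you can filter the inspect output. The following is a minimal sketch, assuming the labels are printed as comma-separated key=value pairs on a single Labels line, as in the example above; the function name volume_namespace is hypothetical.

```shell
# Hypothetical helper: read "pxctl volume inspect" output on stdin and print
# the value of the namespace label. Only the Labels line is examined, and the
# key must be preceded by whitespace or a comma so that keys which merely end
# in "namespace" are not matched.
volume_namespace() {
  sed -n '/^[[:space:]]*Labels[[:space:]]*:/s/.*[[:space:],]namespace=\([^,]*\).*/\1/p'
}

# Usage:
#   pxctl volume inspect 649554470078043771 | volume_namespace
```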

Suspend the snapshot schedule for the volume and wait for the current backup to complete:

```
storkctl suspend volumesnapshotschedule vdbench-pvc-sharedv4-schedule -n vdbench
```

Verify the suspension. The SUSPEND field shows true:

```
storkctl get volumesnapshotschedule -n vdbench
```
```
NAME                            PVC                    POLICYNAME   PRE-EXEC-RULE   POST-EXEC-RULE   RECLAIM-POLICY   SUSPEND   LAST-SUCCESS-TIME
vdbench-pvc-sharedv4-schedule   vdbench-pvc-sharedv4   testpolicy                                    Delete           true      22 Mar 22 17:10 PDT
```

Repeat these steps until all active cloudsnap operations complete and all backup operations on the node that you want to decommission are suspended.

Prevention of data loss

If a node hosts a volume with a replication factor of 1, Portworx does not allow you to decommission that node, because doing so would result in data loss.

One possible workaround is to increase the replication factor of single-replica volumes by running pxctl volume ha-update.

List all the volumes hosted on the node that you are decommissioning:

```
pxctl volume list --node xxxxxxxx-xxxx-xxxx-xxxx-2eeddcd64d51
```
```
ID                      NAME                                            SIZE    HA      SHARED  ENCRYPTED  PROXY-VOLUME     IO_PRIORITY     STATUS                          SNAP-ENABLED
633738568577538909      pvc-xxxxxxxx-xxxx-xxxx-xxxx-d4827680f6de        2 GiB   3       no      no         no               LOW             up - attached on 172.31.45.106    no
161898313715947409      pvc-xxxxxxxx-xxxx-xxxx-xxxx-68f0d970e10c        2 GiB   1       no      no         no               LOW             up - attached on 172.31.45.106    no
```
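On a node with many volumes, you can generate the ha-update commands for every single-replica volume from this listing. The following is a minimal sketch, assuming the column layout shown above; the function name single_replica_ha_updates is hypothetical. Because SIZE is printed as two fields (for example, "2 GiB"), the HA value is taken from the field right after the size unit. The sketch prints the commands for review rather than running them, and uses --repl 2 as in the example that follows.

```shell
# Hypothetical helper: read "pxctl volume list --node <node-id>" output on stdin
# and print a "pxctl volume ha-update --repl 2" command for each volume whose
# HA (replication factor) is 1. Assumes SIZE appears as "<number> <unit>",
# such as "2 GiB", immediately followed by the HA column.
single_replica_ha_updates() {
  awk 'NR > 1 {
    for (i = 3; i < NF; i++) {
      if ($i ~ /^[KMGTP]iB$/) {
        # The field after the size unit is the HA column.
        if ($(i + 1) == "1") {
          printf "pxctl volume ha-update --repl 2 %s\n", $1
        }
        break
      }
    }
  }'
}

# Usage:
#   pxctl volume list --node <node-id> | single_replica_ha_updates
```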

Increase the replication factor:

```
pxctl volume ha-update --repl 2 161898313715947409
```

Once the volume is completely replicated onto another node, continue decommissioning the node. Because the volume now has a replica on another node, decommissioning the node reduces the volume's replication factor and removes the node.

Placing the node in maintenance mode

After identifying the node to be removed (for example, from the pxctl status output shown above), place the node in maintenance mode:

```
pxctl service maintenance --enter
```
```
This is a disruptive operation, PX will restart in maintenance mode.
Are you sure you want to proceed ? (Y/N): y
Entered maintenance mode.
```

Run the cluster delete command

Example 1: Running the cluster delete command from a different node

SSH to 172.31.46.119 and run the following command:

```
pxctl cluster delete xxxxxxxx-xxxx-xxxx-xxxx-2eeddcd64d51
```
```
Node xxxxxxxx-xxxx-xxxx-xxxx-2eeddcd64d51 successfully deleted.
```

Example 2: Running the cluster delete command from the same node

SSH to 172.31.33.252 and run the following command:

```
pxctl cluster delete xxxxxxxx-xxxx-xxxx-xxxx-428e727eb6b8
```
```
Node xxxxxxxx-xxxx-xxxx-xxxx-428e727eb6b8 successfully deleted.
```

Clean up Portworx metadata on the node

Note: Remove the Portworx installation or stop Portworx on the node to allow the cleanup of Portworx metadata. See remove Portworx installation/stop Portworx.

To learn how to remove or clean up Portworx metadata on the decommissioned node, see clean up Portworx metadata on the node.

Removing offline nodes

This section describes how to remove an offline node from a cluster. If you are specifically interested in decommissioning nodes with no storage that have been offline for an extended period, see Automatic decommission of storageless nodes.

Identify the cluster that needs to be managed:

```
pxctl status
```
```
Status: PX is operational
Node ID: xxxxxxxx-xxxx-xxxx-xxxx-3e2b01cd0bc3
    IP: X.X.X.197
    Local Storage Pool: 2 pools
    Pool	IO_Priority	Size	Used	Status	Zone	Region
    0	LOW		200 GiB	1.0 GiB	Online	default	default
    1	LOW		120 GiB	1.0 GiB	Online	default	default
    Local Storage Devices: 2 devices
    Device	Path				Media Type		Size	Last-Scan
    0:1	/dev/mapper/volume-27dbb728	STORAGE_MEDIUM_SSD	200 GiB		08 Jan 17 05:39 UTC
    1:1	/dev/mapper/volume-0a31ef46	STORAGE_MEDIUM_SSD	120 GiB		08 Jan 17 05:39 UTC
    total					-			320 GiB
Cluster Summary
    Cluster ID: xxxxxxxx-xxxx-xxxx-xxxx-0242ac110002
    Node IP: X.X.X.197 - Capacity: 2.0 GiB/320 GiB Online (This node)
    Node IP: 10.99.117.129 - Capacity: 1.2 GiB/100 GiB Online
    Node IP: 10.99.119.1 - Capacity: 1.2 GiB/100 GiB Online
Global Storage Pool
    Total Used    	:  4.3 GiB
    Total Capacity	:  520 GiB
```

List the nodes in the cluster:

```
pxctl cluster list
```
```
Cluster ID: xxxxxxxx-xxxx-xxxx-xxxx-0242ac110002
Status: OK

Nodes in the cluster:
ID					DATA IP		CPU		MEM TOTAL	MEM FREE	CONTAINERS	VERSION		STATUS
xxxxxxxx-xxxx-xxxx-xxxx-3e2b01cd0bc3	X.X.X.197	1.629073	8.4 GB		7.9 GB		N/A		1.1.2-c27cf42	Online
xxxxxxxx-xxxx-xxxx-xxxx-dd5084dce208	10.99.117.129	0.125156	8.4 GB		8.0 GB		N/A		1.1.3-b33d4fa	Online
xxxxxxxx-xxxx-xxxx-xxxx-d3478c485a61	10.99.119.1	0.25		8.4 GB		8.0 GB		N/A		1.1.3-b33d4fa	Online
```

List the volumes in the cluster:

```
pxctl volume list
```
```
ID			NAME	SIZE	HA	SHARED	ENCRYPTED	PRIORITY	STATUS
845707146523643463	testvol	1 GiB	1	no	no		LOW	up - attached on X.X.X.197
```

In this case, there is one volume in the cluster, and it is attached to the node with IP X.X.X.197.

Identify the node that you want to remove from the cluster. If the node you want to remove is already offline, skip to the next step.

To manually set a node offline, apply the px/enabled=remove label to the node.

On OpenShift:
```
oc label nodes <node> px/enabled=remove --overwrite
```
On Kubernetes:
```
kubectl label nodes <node> px/enabled=remove --overwrite
```

In the following example, node X.X.X.197 has been marked as offline:

```
pxctl cluster list
```
```
Cluster ID: xxxxxxxx-xxxx-xxxx-xxxx-0242ac110002
Status: OK

Nodes in the cluster:
ID					DATA IP		CPU		MEM TOTAL	MEM FREE	CONTAINERS	VERSION		STATUS
xxxxxxxx-xxxx-xxxx-xxxx-dd5084dce208	10.99.117.129	5.506884	8.4 GB	8.0 GB		N/A		1.1.3-b33d4fa	Online
xxxxxxxx-xxxx-xxxx-xxxx-d3478c485a61	10.99.119.1	0.25		8.4 GB	8.0 GB		N/A		1.1.3-b33d4fa	Online
xxxxxxxx-xxxx-xxxx-xxxx-3e2b01cd0bc3	X.X.X.197	-		-	N/A		1.1.2-c27cf42	Offline
```
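In a larger cluster, you can extract the IDs of offline nodes from this listing rather than scanning it. The following is a minimal sketch, assuming STATUS is always the last column, as in the output above; the function name offline_node_ids is hypothetical. The printed IDs are the values you would later pass to pxctl cluster delete, so double-check each one before deleting anything.

```shell
# Hypothetical helper: read "pxctl cluster list" output on stdin and print the
# ID (first column) of every node whose STATUS (last column) is Offline.
offline_node_ids() {
  awk '$NF == "Offline" { print $1 }'
}

# Usage:
#   pxctl cluster list | offline_node_ids
```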

Attach and detach the volume on one of the surviving nodes:

```
pxctl host attach 845707146523643463
```
```
Volume successfully attached at: /dev/pxd/pxd845707146523643463
```
```
pxctl host detach 845707146523643463
```
```
Volume successfully detached
```

Delete the local volume that belonged to the offline node:

```
pxctl volume delete 845707146523643463
```
```
Volume 845707146523643463 successfully deleted.
```

Delete the node that is offline:

```
pxctl cluster delete xxxxxxxx-xxxx-xxxx-xxxx-3e2b01cd0bc3
```
```
Node xxxxxxxx-xxxx-xxxx-xxxx-3e2b01cd0bc3 successfully deleted.
```

List the nodes in the cluster to make sure that the node is removed:

```
pxctl cluster list
```
```
Cluster ID: xxxxxxxx-xxxx-xxxx-xxxx-0242ac110002
Status: OK

Nodes in the cluster:
ID					DATA IP		CPU		MEM TOTAL	MEM FREE	CONTAINERS	VERSION		STATUS
xxxxxxxx-xxxx-xxxx-xxxx-dd5084dce208	10.99.117.129	4.511278	8.4 GB	8.0 GB		N/A		1.1.3-b33d4fa	Online
xxxxxxxx-xxxx-xxxx-xxxx-d3478c485a61	10.99.119.1	0.500626	8.4 GB	8.0 GB		N/A		1.1.3-b33d4fa	Online
```

Show the cluster status:

```
pxctl status
```
```
Status: PX is operational
Node ID: xxxxxxxx-xxxx-xxxx-xxxx-dd5084dce208
    IP: X.X.X.199
    Local Storage Pool: 1 pool
    Pool	IO_Priority	Size	Used	Status	Zone	Region
    0	LOW		100 GiB	1.2 GiB	Online	default	default
    Local Storage Devices: 1 device
    Device	Path				Media Type		Size	Last-Scan
    0:1	/dev/mapper/volume-9f6be49c	STORAGE_MEDIUM_SSD	100 GiB	08 Jan 17 06:34 UTC
    total					-			100 GiB
Cluster Summary
    Cluster ID: xxxxxxxx-xxxx-xxxx-xxxx-0242ac110002
    Node IP: 10.99.117.129 - Capacity: 1.2 GiB/100 GiB Online (This node)
    Node IP: 10.99.119.1 - Capacity: 1.2 GiB/100 GiB Online
Global Storage Pool
    Total Used    	:  2.3 GiB
    Total Capacity	:  200 GiB
```

Automatic decommission of storageless nodes

Storageless nodes that are initialized and added to the cluster may not be needed once they complete their tasks (such as in a scheduler workflow). If they are taken offline or destroyed, the cluster will still retain the nodes and mark them as offline.

If such nodes eventually make up a majority of the cluster, the cluster no longer has a quorum of online nodes. The solution is to run cluster delete commands to remove these nodes, which becomes laborious as the number of such nodes grows or as they are added and taken down more frequently.

To help with this, Portworx automatically removes offline nodes with no storage from the cluster after a grace period of 48 hours. No CLI command is needed to turn on or trigger this feature.

Remove Portworx installation from the node

Apply the px/enabled=remove label to the node. This removes the existing Portworx systemd service and also applies the px/enabled=false label to stop Portworx from running on the node in the future.

For example, the following commands remove the existing Portworx installation from a node (such as minion2) and ensure that the Portworx pod doesn't run there in the future. Replace <node> with the name of the node.

On OpenShift:
```
oc label nodes <node> px/enabled=remove --overwrite
```
On Kubernetes:
```
kubectl label nodes <node> px/enabled=remove --overwrite
```

Note: If you plan to decommission this node from the Kubernetes cluster altogether, no further steps are needed.

Ensure application pods using Portworx don’t run on this node

If you need to continue using the node without Portworx, you will need to ensure your application pods using Portworx volumes don’t get scheduled here.

You can ensure this by adding the schedulerName: stork field to your application specs (Deployment, StatefulSet, and so on). Stork is a scheduler extension that schedules pods using Portworx PVCs only on nodes where Portworx is running. Refer to the Using scheduler convergence article for more information.

Another way to achieve this is to use inter-pod affinity.

You need to define a pod affinity rule in your applications that ensures application pods are scheduled only on nodes where the Portworx pod is running.

Consider the following nginx example:

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        # Inter-pod affinity rule restricting nginx pods to run only on nodes where Portworx pods are running (Portworx pods have a label
        # name=portworx which is used in the rule)
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: name
                operator: In
                values:
                - "portworx"
            topologyKey: kubernetes.io/hostname
            namespaces:
            # Namespace where the Portworx pods run; change it if Portworx is
            # installed in a different namespace (for example, portworx)
            - "kube-system"
      hostNetwork: true
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
        volumeMounts:
        - name: nginx-persistent-storage
          mountPath: /usr/share/nginx/html
      volumes:
      - name: nginx-persistent-storage
        persistentVolumeClaim:
          claimName: px-nginx-pvc
```

You can also configure this at the cluster level: setting spec.stork.args.webhook-controller to true in the StorageCluster makes Stork the default scheduler for workloads that use Portworx volumes:

```
apiVersion: core.libopenstorage.org/v1
kind: StorageCluster
metadata:
  name: portworx
  namespace: portworx
spec:
  stork:
    enabled: true
    args:
      # Stork args are string values, so "true" is quoted
      webhook-controller: "true"
```

Uncordon the node

On OpenShift, you can now uncordon the node:
```
oc adm uncordon <node>
```
On Kubernetes, you can now uncordon the node:
```
kubectl uncordon <node>
```

If you want to permanently decommission the node, you can skip the following step.

(Optional) Rejoin node to the cluster

If you want Portworx to start again on this node and join as a new node, follow the node rejoin steps.

Disconnect or delete drives from your FlashArray

If you are using Pure FlashArray as a cloud drive provider, follow the additional instructions to decommission FlashArray nodes.
