Configure stork-controller-config ConfigMap parameters
Clusters that contain a large number of Kubernetes resources can span a broad spectrum of resource and system configurations. To make the solution viable across this wide range of configurations, users can alter the ConfigMap parameters.
Add the parameters specified in the table below to the stork-controller-config ConfigMap in the kube-system namespace, and alter the values as required to suit your configuration:
| Key/Parameter | Default Value | Description |
|---|---|---|
| large-resource-size-limit | 1 MB | Sets the size limit to adapt to the cluster-wide setting of etcd's message size if your cluster has modified the default etcd message size of 1.5 MB. The default value of 1 MB is derived by subtracting approximately 500 KB (reserved for etcd headers and overhead) from the default etcd size limit of 1.5 MB. If your cluster's etcd size limit has been modified, adjust this value by subtracting approximately 500 KB from your cluster's configured etcd size limit. |
| resource-count-limit | 500 | Sets the maximum number of resources that will be grouped together for upload, regardless of whether the size limit is reached. Use this parameter to reduce the number of Kubernetes API calls by limiting uploads to a single large resource group. |
| restore-volume-backup-count | 25 | Sets the number of volumes that will be restored in a single batch. If the restore process fails with a "device busy" error, reduce this value below 25. |
| restore-volume-sleep-interval | 20s | Sets the time interval between two batches of volumes that will be restored. Increase this value to allow more time for the backend storage to process each batch before the next one begins. |
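As a sketch, these parameters can be set as data entries in the stork-controller-config ConfigMap. The values below are illustrative defaults, not recommendations; large-resource-size-limit is expressed in bytes:

```yaml
# Illustrative stork-controller-config ConfigMap; the values shown are
# examples only. Tune them to suit your cluster.
apiVersion: v1
kind: ConfigMap
metadata:
  name: stork-controller-config
  namespace: kube-system
data:
  large-resource-size-limit: "1048576"    # in bytes (~1 MB)
  resource-count-limit: "500"
  restore-volume-backup-count: "25"
  restore-volume-sleep-interval: "20s"
```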
The behavior of these parameters is explained below:
- Large-resource-size-limit: If your cluster's etcd message size is configured to be smaller than the default value of 1.5 MB, alter this parameter's value to adapt to that cluster-wide setting. Specify an appropriate value in bytes.
- Resource-count-limit: If the number of resources overloads the Kubernetes API server, you may see the following errors in the stork log, and the backup operation can eventually time out:

  ```
  time="2023-04-22T04:22:49Z" level=debug msg="Monitoring storage nodes"
  time="2023-04-22T04:23:55Z" level=warning msg="gatherResourceInChunks: failed to list resources"
  time="2023-04-22T04:23:55Z" level=error msg="Error getting resources: the server was unable to return a response in the time allotted, but may still be processing the request" ApplicationBackupName=<application-backup-name> ApplicationBackupUID=<application-backup-uid> Namespace=<namespace-name> ResourceVersion=<resource-version>
  time="2023-04-22T04:23:55Z" level=error msg="Error backing up resources: the server was unable to return a response in the time allotted, but may still be processing the request" ApplicationBackupName=<application-backup-name> ApplicationBackupUID=<application-backup-uid> Namespace=<namespace-name> ResourceVersion=<resource-version>
  time="2023-04-22T04:23:55Z" level=error msg="Error backing up volumes: the server was unable to return a response in the time allotted, but may still be processing the request" ApplicationBackupName=<application-backup-name> ApplicationBackupUID=<application-backup-uid> Namespace=<namespace-name> ResourceVersion=<resource-version>
  ```

  To troubleshoot this scenario, reduce the default value of 500 resource queries at a time to a smaller number, such as 200 or 300.
- Restore-volume-backup-count: This configuration parameter defines the number of volumes that will be restored in a single batch. When the restore process fails with a device busy error, one probable cause is that too large a batch of PVCs was supplied to the restore process, causing the backend storage system to fail with the device busy error. Here is the sample error message displayed in the web console window for this scenario:

  ```
  Restore failed for volume: cloudsnap Restore id:<restore_id> for <backup-name> did not succeed: [createRestoreDestinationVol, Failed to create restore vol err:Volume (Name: <pvc-name>)] create failed error: Volume is busy on Node-not-assigned, processingNode <node-name>]
  ```

  As a troubleshooting measure, alter the default value of this parameter to a value below 25.
- Restore-volume-sleep-interval: This parameter sets the time interval between two batches of volume restores. Increase the default value to allow more time between two batches of restore.
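As an illustration of the sizing guidance for large-resource-size-limit, the byte value can be derived from your cluster's etcd message-size limit by subtracting the approximately 500 KB reserved for etcd headers and overhead. The 2 MB etcd limit below is a hypothetical example:

```shell
# Hypothetical sizing calculation for large-resource-size-limit (bytes).
# Assumes the cluster's etcd message-size limit was raised to 2 MB.
ETCD_LIMIT_BYTES=$((2 * 1024 * 1024))   # cluster's configured etcd limit
OVERHEAD_BYTES=$((500 * 1024))          # ~500 KB reserved for etcd overhead
echo $((ETCD_LIMIT_BYTES - OVERHEAD_BYTES))   # value to set, in bytes
```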
Large resource NFS backups and restores
KDMP job pods consume increased amounts of memory during large resource backup and restore operations to NFS backup locations. As a result, you may see out-of-memory alerts or failures of the NFS job pods that run on each target cluster. In these scenarios, Portworx by Pure Storage recommends increasing the CPU and memory limits by adding the following parameters to the kdmp-config ConfigMap, which resides in the kube-system namespace on the target cluster:
- KDMP_NFSEXECUTOR_REQUEST_CPU
- KDMP_NFSEXECUTOR_LIMIT_CPU
- KDMP_NFSEXECUTOR_REQUEST_MEMORY
- KDMP_NFSEXECUTOR_LIMIT_MEMORY
For more information on these parameters and how to configure them, see kdmp-config ConfigMap parameters.
Note that these keys are not present in the kdmp-config ConfigMap by default. When you edit the ConfigMap with the kubectl command, refer to the usage guidance on the kdmp-config ConfigMap parameters page to set these parameters. In this scenario, the CPU and memory limit parameters are set to double their default values.
For example, consider a cluster with 4 nodes and 50,000 resources composed of ConfigMap and Secret resource types. The maximum memory limit (KDMP_NFSEXECUTOR_LIMIT_MEMORY) required to back up and restore data in such an environment is approximately 3Gi. Note that this is an approximate value; actual memory usage may vary depending on your environment and configuration.
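As a sketch, the NFS executor overrides could be added to the kdmp-config ConfigMap as shown below. The CPU and memory values are illustrative only (the 3Gi memory limit matches the example environment above); tune them to your workload:

```yaml
# Illustrative kdmp-config ConfigMap entries for the NFS executor.
# The resource values are examples, not recommended defaults.
apiVersion: v1
kind: ConfigMap
metadata:
  name: kdmp-config
  namespace: kube-system
data:
  KDMP_NFSEXECUTOR_REQUEST_CPU: "500m"
  KDMP_NFSEXECUTOR_LIMIT_CPU: "1"
  KDMP_NFSEXECUTOR_REQUEST_MEMORY: "1Gi"
  KDMP_NFSEXECUTOR_LIMIT_MEMORY: "3Gi"
```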