Configure tolerations and resource sizing in PDS

Taints and tolerations are mechanisms in Kubernetes used to control which pods can be scheduled on specific nodes.

Taints

Taints are restrictions applied to nodes that prevent certain pods from being scheduled on them unless those pods explicitly indicate they can tolerate the taints. They ensure that specific nodes are reserved for particular workloads or purposes, such as high-memory tasks, GPU-enabled applications, or critical system services.

Tolerations

Tolerations are configurations applied to pods that allow them to tolerate or match specific taints on nodes. A toleration does not force a pod onto a node; it merely permits the pod to be scheduled there if needed. They allow pods to bypass taints and be scheduled on restricted nodes.

How taints and tolerations work together:

A taint is added to a node, such as:

kubectl taint nodes node1 key=value:NoSchedule

This prevents any pods without a matching toleration from being scheduled on node1.

A toleration is added to a pod, such as:

tolerations:
- key: key
  operator: Equal
  value: value
  effect: NoSchedule

This allows the pod to be scheduled on node1 despite the taint.

Node configuration example

A node with high memory has the following taint:

kubectl taint nodes high-memory-node memory=high:NoSchedule

Pod configuration example

A pod needing high memory adds a matching toleration:

tolerations:
- key: memory
  operator: Equal
  value: high
  effect: NoSchedule

This ensures the pod is scheduled only on the high-memory-node.

Resource sizing

Resource sizing in Kubernetes helps define how much CPU and memory a pod can request or use. This ensures fair resource allocation and prevents resource overcommitment.

Predefined sizes:

Small: Ideal for lightweight applications requiring minimal resources, such as microservices, development environments, and basic monitoring agents. It provides just enough CPU and memory to support non-critical workloads.
Medium: Offers a balanced configuration for moderate workloads like web applications, APIs, and small to medium-sized databases.
Large: Designed for high-performance workloads such as big data processing, machine learning, enterprise databases, and resource-intensive backup jobs.

note

The exact resource allocation for each size depends on the component. For example, in the case of px-app-operator, the predefined resource allocations are as follows:

# Small
app.resources.small:
  limits:
    cpu: 250m
    memory: 512Mi
  requests:
    cpu: 100m
    memory: 128Mi

# Medium
app.resources.medium:
  limits:
    cpu: 500m
    memory: 1024Mi
  requests:
    cpu: 200m
    memory: 256Mi

# Large
app.resources.large:
  limits:
    cpu: 750m
    memory: 2048Mi
  requests:
    cpu: 500m
    memory: 512Mi

Custom sizing

Allows administrators to explicitly define the exact CPU and memory requirements for workloads:

Requests: The minimum resources a pod needs to run. If a node doesn’t have the requested resources, the pod won’t be scheduled.
Limits: The maximum resources a pod can use. If a pod exceeds its limits, it might be throttled or terminated.

Example:

resources:
  requests:
    cpu: 500m
    memory: 1Gi
  limits:
    cpu: 2
    memory: 4Gi

Requests:

CPU: 500 millicores (half a CPU core)
Memory: 1 GiB
The scheduler ensures the node has at least these resources available before scheduling the pod.

Limits:

CPU: 2 cores
Memory: 4 GiB
The pod cannot exceed these limits, ensuring resource control.

Importance of tolerations and resource sizing

Cluster efficiency: Ensure workloads are scheduled appropriately based on node capabilities.
Fair resource allocation: Prevent resource monopolization by limiting resource usage for pods.
Flexibility: Accommodate diverse workloads with varying resource needs.
Isolation: Reserve nodes for specific workloads using taints, and allow only matching pods to access them using tolerations.

Configure tolerations using APIs

Using PDS APIs, you can configure tolerations during deployment creation, updates, and restore operations. Here’s a detailed explanation of how to manage tolerations through the APIs:

note

The PDS APIs listed below are accessible exclusively via the HTTP/2 (gRPC) protocol. Ensure that your client or integration supports gRPC to interact with these APIs effectively.

Create Deployment API

When creating a deployment, you can specify tolerations in the topologies field of the API request. This configuration ensures that the pods are scheduled on nodes with matching taints.

Tolerations are included in the deployment specification to match the taints on the target nodes.
The scheduler ensures that pods are scheduled only on nodes where tolerations match the taints.

Example deployment request:

{
  "dataServiceDeploymentTopologies": [{
    "podSchedulingConfig": {
      "tolerations": [
        {
          "key": "node-type",
          "operator": "Equal",
          "value": "high-memory",
          "effect": "NoSchedule"
        }
      ]
    }
  }]
}

where:

key: Specifies the taint key on the target node (for example, node-type).
operator: Defines the matching condition (Equal or Exists).
value: The value of the taint key (for example, high-memory).
effect: Determines the taint effect (NoSchedule, PreferNoSchedule, or NoExecute).

Explicitly empty tolerations

Specify an empty list of tolerations in the request.
Ensures no tolerations are applied, allowing pods to be scheduled on any available node without matching taints.

Use case: Allowing flexibility in node selection for non-critical workloads.

Example:

{
  "dataServiceDeploymentTopologies": [{
    "podSchedulingConfig": {
      "tolerations": []
    }
  }]
}

No tolerations specified

If the podSchedulingConfig field is omitted, no tolerations are applied.
Use case: General-purpose deployments without specific node constraints.

Explicitly empty tolerations

Specify an empty list of tolerations in the request.
Ensures no tolerations are applied, allowing pods to be scheduled on any available node without matching taints.

Use case: Allowing flexibility in node selection for non-critical workloads.

Example:

{
  "dataServiceDeploymentTopologies": [{
    "podSchedulingConfig": {
      "tolerations": []
    }
  }]
}

No tolerations specified

If the podSchedulingConfig field is omitted, no tolerations are applied.
Use case: General-purpose deployments without specific node constraints.

Update Deployment API

After creating a deployment, you can modify tolerations using the update deployment API. This allows you to:

Add new tolerations.
Modify existing tolerations.
Remove tolerations.

Use cases

Scaling: When new nodes with different taints are added to the cluster, update the deployment tolerations to match those nodes.
Optimization: Adjust tolerations to improve workload placement and resource utilization.

Example update request:

{
  "dataServiceDeploymentTopologies": [{
    "podSchedulingConfig": {
      "tolerations": [
        {
          "key": "zone",
          "operator": "Equal",
          "value": "us-west",
          "effect": "NoSchedule"
        }
      ]
    }
  }]
}

Explicitly empty tolerations

Specify an empty list of tolerations to remove all tolerations from the deployment.
Ensures pods are scheduled without any node-specific constraints.

Use case: Generalizing deployments for broader node availability.

Example:

{
  "dataServiceDeploymentTopologies": [{
    "podSchedulingConfig": {
      "tolerations": []
    }
  }]
}

No tolerations specified

If the podSchedulingConfig field is omitted in the update request, existing tolerations in the deployment remain unchanged.
Use case: Maintaining current tolerations when modifying other deployment attributes.

Create Restore API

You can also configure tolerations during restore operations. The behavior depends on how tolerations are specified in the restore request.

With new tolerations

Explicitly define tolerations for the restore operation.
These tolerations override any tolerations from the source deployment.

Use case: Restoring data to a node with different taints than the original deployment.

Example:

{
  "config": {
    "podSchedulingConfig": {
      "tolerations": [
        {
          "key": "backup-zone",
          "operator": "Equal",
          "value": "zone1",
          "effect": "NoSchedule"
        }
      ]
    }
  }
}

Explicitly empty tolerations

Specify an empty list of tolerations in the restore request.
This configuration ensures no tolerations are applied to the restored pods.

Use case: Removing all tolerations for the restored pods to allow them to be scheduled anywhere in the cluster.

Example:

{
  "config": {
    "podSchedulingConfig": {
      "tolerations": []
    }
  }
}

Default tolerations

If no tolerations are specified in the restore request, the tolerations from the source deployment are inherited.

Use case: Maintaining consistency with the original deployment tolerations.

Summary of toleration behaviors

API	Toleration field	Behavior
Create Deployment	New tolerations	Applies the specified tolerations for pod scheduling.
	Empty tolerations	Ensures no tolerations are applied, allowing scheduling on any node.
	No tolerations specified	Default behavior with no node-specific constraints.
Update Deployment	New tolerations	Adds or updates tolerations for pod scheduling.
	Empty tolerations	Removes all tolerations from the deployment.
	No tolerations specified	Leaves existing tolerations unchanged.
Create Restore	New tolerations	Overrides the source deployment tolerations with the specified tolerations.
	Empty tolerations	Removes all tolerations from the restored pods.
	No tolerations specified	Inherits tolerations from the source deployment.

Configure custom taints, tolerations, and resource settings for Portworx applications

This section explains how you can configure custom taints, tolerations, and resource settings for Kubernetes nodes and schedule pods (PDS and Portworx Agent) accordingly.

Configure Portworx Agent

Access the PDS platform UI to generate a deployment manifest for Portworx Agent.

The generated manifest serves as the base for adding custom configurations.
Modify the manifest to include tolerations in the bootstrapper configuration. Tolerations should match the taints on the target nodes.

Example toleration configuration:
```
tolerations:
  - effect: NoSchedule
    key: custom-key
    operator: Equal
    value: custom-value
```
An overlay configuration provides a structured way to customize Helm values for Portworx Agent and PDS applications.
Create a ConfigMap for Portworx Agent.

Define tolerations and resource settings within the values.yaml section.

Example overlay configMap for Portworx Agent:

apiVersion: v1
kind: ConfigMap
metadata:
  name: px-agent-overlay
  namespace: px-system
data:
  values.yaml: |-
    global:
      tolerations:
        - effect: NoSchedule
          key: px-agent-key
          operator: Equal
          value: px-agent-value
    px-app-operator:
      size: medium
      tolerations:
        - effect: NoSchedule
          key: app-key
          operator: Equal
          value: app-value
    teleport-kube-agent:
      tolerations:
        - effect: NoSchedule
          key: app-key
          operator: Equal
          value: app-value    

note

Tolerations specified under the global section in the values.yaml configuration provide default settings for all operators. However, if both global and operator-specific tolerations are provided, the operator-specific tolerations will override the global configurations, allowing unique tolerations to be applied for specific operators.

Configure PDS applications

Define resource sizing.

PDS applications support predefined and custom resource sizes:
- Predefined sizes: Small, Medium, Large.
- Custom size: Define explicit CPU and memory requirements.
Refer to this section for more information.

Custom resource example:
```
resources:
  limits:
    cpu: 530m
    memory: 600Mi
  requests:
    cpu: 230m
    memory: 200Mi
```
Resource configurations are supported for the following Operators:
- Portworx Agent Operators:
  - px-agent/px-app-operator
  - px-agent/px-tc-operator
- PDS Operators:
  - pds/pds-backups-operator
  - pds/pds-deployments-operator
  - pds/pds-target-operator
  - pds/pds-mutator
  - pds/pds-external-dns

Create an overlay ConfigMap for PDS applications, including tolerations and resource settings.

Example overlay ConfigMap for PDS:

apiVersion: v1
kind: ConfigMap
metadata:
  name: pds-overlay
  namespace: px-system
data:
  values.yaml: |-
    global:
      tolerations:
      - effect: NoSchedule
        key: pds-key
        operator: Equal
        value: pds-value
      pds-backups-operator:
        size: large
        tolerations:
        - effect: NoSchedule
          key: backups-key
          operator: Equal
          value: backups-value

Apply configurations using overlays

Use kubectl to apply the overlay ConfigMaps to the cluster:

kubectl apply -f px-agent-overlay.yaml
kubectl apply -f pds-overlay.yaml

Integrate overlays with target cluster applications.

Overlays provide additional Helm values before installing PDS and Portworx Agent, ensuring the correct tolerations and resource settings are applied.
If changes are made to the overlay configurations, [force reconcile](force reconcile) Portworx Agent and PDS applications to ensure the updates are applied to the applications.

Install PDS with tolerations and custom resource settings

This section provides a detailed guide to configuring tolerations and resource settings for PDS and Portworx Agent applications.

Access the PDS platform and navigate to the section for creating PDS deployments.
Generate a Kubernetes manifest for Portworx Agent.

Modify the bootstrapper job by adding tolerations directly to the bootstrapper configuration in the manifest.

Example manifest :

apiVersion: batch/v1
kind: Job
metadata:
  name: px-agent-bootstrapper
  namespace: px-system
spec:
  template:
    metadata:
      labels:
        app: px-agent-bootstrapper
    spec:
      tolerations:
        - effect: NoSchedule
          key: node-type
          operator: Equal
          value: high-performance
        - effect: NoExecute
          key: infra
          operator: Exists
      containers:
        - name: px-agent
          image: portworx/px-agent:latest
          resources:
            requests:
              memory: 256Mi
              cpu: 200m
            limits:
              memory: 512Mi
              cpu: 500m
      restartPolicy: OnFailure

Add Custom Resource Settings and tolerations to overlays.

Use overlays to configure tolerations and resource settings for PDS and Portworx Agent applications. Overlays act as an additional layer for Helm configurations.

Overlay for PDS configurations:

Create a ConfigMap for PDS.
Define tolerations and resource settings for each PDS Operator.

Example overlay ConfigMap for PDS:

apiVersion: v1
kind: ConfigMap
metadata:
  name: pds-overlay
  namespace: px-system
data:
values.yaml: |-
  global:
    tolerations:
    - effect: NoSchedule
      key: pds-global-key
      operator: Equal
      value: pds-global-value

  pds-mutator:
    tolerations:
    - effect: NoSchedule
      key: mutator-key
      operator: Equal
      value: mutator-value
    size: small

  pds-target-operator:
    tolerations:
    - effect: NoSchedule
      key: target-key
      operator: Equal
      value: target-value
    size: medium
    externalDNS:
      chartConfig:
        size: small
        tolerations:
        - effect: NoSchedule
          key: external-dns-key
          operator: Equal
          value: extenal-dns-value
        - effect: NoSchedule
          key: key
          operator: Equal

  pds-backups-operator:
    tolerations:
    - effect: NoSchedule
      key: backups-key
      operator: Equal
      value: backups-value
    size: large

  px-deployments-operator:
    tolerations:
    - effect: NoSchedule
      key: deployments-key
      operator: Equal
      value: deployments-value
    size: custom
    manager:
      resources:
        limits:
          cpu: 530m
          memory: 600Mi
        requests:
          cpu: 230m
          memory: 200Mi

Apply configurations

Run the following commands to apply the configurations:

kubectl apply -f px-agent-overlay.yaml
kubectl apply -f pds-overlay.yaml

Proceed with the installation of PDS and Portworx Agent applications. The tolerations and resource settings defined in the overlays will automatically apply.

Force reconciliation for updates

If you update the tolerations or resource settings, you can force reconcile Portworx Agent and PDS applications to apply changes to running applications.

Edit tolerations, resource settings, and other properties using overlay config

This section provides a detailed guide on how to modify tolerations, resource settings, and other configuration properties of applications deployed via PDS and Portworx Agent using overlay configurations. It also explains how to apply the updated settings using force reconciliation.

Overlay configurations allow users to define custom Helm values for applications such as PDS and Portworx Agent. By editing these overlay configurations, users can:

Update tolerations to match new node taints.
Modify resource settings (for example: CPU, memory).
Adjust other deployment-specific properties.

Force reconciliation ensures that any changes made in the overlay configuration are applied to running applications without requiring manual redeployment.

To edit overlay configurations:

Identify the ConfigMap associated with the application you want to update. For example:
- px-agent-overlay for Portworx Agent.
- pds-overlay for PDS Operators.

Edit the overlay configuration to include the updated values for tolerations, resource settings, or other properties.

Example update for Portworx Agent overlay:

apiVersion: v1
kind: ConfigMap
metadata:
  name: px-agent-overlay
  namespace: px-system
data:
values.yaml: |-
  global:
    tolerations:
    - effect: NoSchedule
      key: infra-key
      operator: Equal
      value: infra-value

  px-app-operator:
    tolerations:
    - effect: NoSchedule
      key: app-key
      operator: Equal
      value: app-value
    size: medium

  px-tc-operator:
    tolerations:
    - effect: NoSchedule
      key: tc-key
      operator: Equal
      value: tc-value
    size: custom
    resources:
      limits:
        cpu: 530m
        memory: 630Mi
      requests:
        cpu: 230m
        memory: 200Mi

Example update for PDS overlay:

apiVersion: v1
kind: ConfigMap
metadata:
  name: pds-overlay
  namespace: px-system
data:
  values.yaml: |-
    pds-mutator:
      tolerations:
      - effect: NoSchedule
        key: new-mutator-key
        operator: Equal
        value: mutator-value
      size: medium
    pds-backups-operator:
      tolerations:
      - effect: NoExecute
        key: backups-key
        operator: Exists
      size: custom
      manager:
        resources:
          limits:
            cpu: 1.5
            memory: 3Gi
          requests:
            cpu: 800m
            memory: 2Gi

Save the updated ConfigMap.

Apply the updated ConfigMap to the cluster:

kubectl apply -f px-agent-overlay.yaml
kubectl apply -f pds-overlay.yaml

Apply the updated configuration using force reconcile. Force reconciliation ensures that the updated overlay configuration is applied to running Portworx Agent and PDS applications.
Use the kubectl get tcapp command to check the status of the TargetClusterApplication resource:
```
kubectl get tcapp px-agent -n px-system
```
```
kubectl get tcapp pds -n px-system
```
Look for updated pods or events indicating that the changes have been applied.

Node requirements for scheduling data services with taints and tolerations

When using taints and tolerations for scheduling data services, it is essential to ensure that the cluster has enough nodes available to accommodate the desired number of data service instances. Improper node planning or insufficient nodes can result in unscheduled pods and degraded functionality.

Node allocation based on data service instances

Each data service instance (or pod) requires a dedicated node for scheduling. If the number of available nodes with matching taints is less than the required instances, some pods will remain unscheduled.

Use case example:

If you taints 2 nodes for data service deployments, taint applied to nodes:

kubectl taint nodes node1 pds=true:NoSchedule
kubectl taint nodes node2 pds=true:NoSchedule

If you deploy PostgreSQL with 3 instances (replicas), tolerations applied in the deployment:
```
tolerations:
- key: pds
  operator: Equal
  value: true
  effect: NoSchedule
```

Outcome: Only 2 pods will be scheduled on the tainted nodes, while the third pod will remain in a pending state due to the lack of a suitable node.

To avoid such scenarios, plan and taint nodes based on the required number of data service instances:

Calculate node requirements: Total nodes needed = Number of instances for the largest expected deployment.
Example scenarios:
- Single-node deployment: Suitable for lightweight workloads or development environments.
- Multi-node deployment with replicas: For high availability or production-grade setups, ensure at least one node per instance.
- Taint nodes appropriately: Apply taints to reserve nodes exclusively for PDS workloads:
```
kubectl taint nodes node1 pds=true:NoSchedule
kubectl taint nodes node2 pds=true:NoSchedule
kubectl taint nodes node3 pds=true:NoSchedule
```

Error scenarios and resolutions

Scenario	Cause	Resolution
Pod remains unscheduled	Insufficient tainted nodes	Add more tainted nodes or reduce replicas.
Node overcommitment	Too many pods scheduled on a single node	Apply proper resource limits and requests in deployment.
Misconfigured tolerations	Toleration key or value mismatch with taint	Verify and update toleration configurations.

Taints​

Tolerations​

How taints and tolerations work together:​

Resource sizing​

Custom sizing​

Importance of tolerations and resource sizing​

Configure tolerations using APIs​

Create Deployment API​

Explicitly empty tolerations​

No tolerations specified​

Explicitly empty tolerations​

No tolerations specified​

Update Deployment API​

Use cases​

Explicitly empty tolerations​

No tolerations specified​

Create Restore API​

With new tolerations​

Explicitly empty tolerations​

Default tolerations​

Summary of toleration behaviors​

Configure custom taints, tolerations, and resource settings for Portworx applications​

Configure Portworx Agent​

Configure PDS applications​

Apply configurations using overlays​

Install PDS with tolerations and custom resource settings​

Apply configurations​

Force reconciliation for updates​

Edit tolerations, resource settings, and other properties using overlay config​

Node requirements for scheduling data services with taints and tolerations​

Node allocation based on data service instances​

Error scenarios and resolutions​

Taints

Tolerations

How taints and tolerations work together:

Resource sizing

Custom sizing

Importance of tolerations and resource sizing

Configure tolerations using APIs

Create Deployment API

Explicitly empty tolerations

No tolerations specified

Explicitly empty tolerations

No tolerations specified

Update Deployment API

Use cases

Explicitly empty tolerations

No tolerations specified

Create Restore API

With new tolerations

Explicitly empty tolerations

Default tolerations

Summary of toleration behaviors

Configure custom taints, tolerations, and resource settings for Portworx applications

Configure Portworx Agent

Configure PDS applications

Apply configurations using overlays

Install PDS with tolerations and custom resource settings

Apply configurations

Force reconciliation for updates

Edit tolerations, resource settings, and other properties using overlay config

Node requirements for scheduling data services with taints and tolerations

Node allocation based on data service instances

Error scenarios and resolutions