Configure Batch Alerts for Schedules

Portworx Backup introduces an end-to-end bulk alerting system for backup schedule operations that streamlines how failures are detected, grouped, and delivered. This feature consolidates multiple failures related to backup schedules into a single, easy-to-read notification.

Instead of overwhelming users with repetitive email alerts, the system provides clear visibility into how many backup schedules failed and why. These grouped alerts are delivered as consolidated email notifications that summarize all related issues in an easy-to-read table. As conditions change, the alerts stay relevant: groups update automatically when new failures occur or when existing issues are resolved, so users always have a real-time view of their backup schedule status.

This feature also provides reliable failure tracking and efficient grouping of notifications. The result is reduced alert fatigue, improved visibility into bulk schedule operation issues, and a better experience for monitoring and troubleshooting backup schedules.

To set batch alerts for backup schedule-related operations, Portworx Backup introduces two new keys, groupWait and groupInterval. You need to update these two keys in the AlertmanagerConfig custom resource to set up batch email notifications for schedule failures in your environment.

  • groupWait: defines how long Portworx Backup waits before sending the first email after a failure is detected. During this wait time, it collects additional matching alerts so that they can all be included in one consolidated notification instead of sending separate emails immediately.

  • groupInterval: defines the minimum time gap between subsequent emails for the same group of alerts. If new failures are added to the existing group, Portworx Backup does not send another notification immediately; it waits until this interval has elapsed before sending an updated email.

    Example:

    Suppose you set groupWait: 1h and groupInterval: 2h:

    • You delete 5 backup schedules through the web console and all 5 operations fail. Portworx Backup generates individual failure metrics but automatically groups them into a single alert. One hour after the first failure, the system sends the first email summarizing all 5 failures in a consolidated table; any additional failures that occur within that hour are added to the same group.

    • Later, you delete 2 more schedules and both operations fail. Portworx Backup appends these fresh failures to the existing alert group, increasing the count to 7, and sends an updated email (second notification) listing all 7 failures.

    • When 3 of these operations are later resolved successfully, leaving only 4 failures, the group adjusts accordingly and another updated email (third notification) is sent after 2 hours, reflecting the reduced set of issues.

    This process continues, ensuring users always receive a clear, consolidated, and up-to-date view of their backup schedule failures.
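
    In the AlertmanagerConfig resource, these example values correspond to the following keys in the route that matches BulkOperationFailure alerts (a minimal sketch; the full route appears later in this procedure):

    groupWait: 1h
    groupInterval: 2h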

Prerequisites

  • Portworx Backup 2.10 and later
  • Preconfigured SMTP alerts
  • kubectl CLI installed
  • Permissions to view and edit resources in the namespace where Portworx Backup is deployed

How to configure

To configure batch email alerts for backup schedule failures in Portworx Backup:

  1. List existing AlertmanagerConfig objects in the namespace where Portworx Backup is deployed:

    kubectl get alertmanagerconfig -n <pxb-namespace>

    Example

    kubectl get alertmanagerconfig -n px-backup

    This displays all AlertmanagerConfig resources in the px-backup namespace.
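
    If you are not sure which namespace Portworx Backup is deployed in, you can list AlertmanagerConfig resources across all namespaces using a standard kubectl option (shown here as an optional variation):

    kubectl get alertmanagerconfig --all-namespaces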

  2. View details of a specific AlertmanagerConfig:

    kubectl describe alertmanagerconfig <config-name> -n <pxb-namespace>

    Alternatively, to view the raw YAML:

    kubectl get alertmanagerconfig <config-name> -n <pxb-namespace> -o yaml
  3. Edit the AlertmanagerConfig:

    kubectl edit alertmanagerconfig <config-name> -n <pxb-namespace>

    This opens the resource in your default text editor (usually vi or nano).

    Note that editing with kubectl edit modifies the resource in place. For version control, export the YAML first and then manage the changes.
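
    For example, a minimal sketch of that workflow (the file name alertmanagerconfig.yaml is illustrative):

    # Export the current configuration so it can be tracked in version control
    kubectl get alertmanagerconfig <config-name> -n <pxb-namespace> -o yaml > alertmanagerconfig.yaml

    # Edit groupWait and groupInterval in the file, remove server-generated fields
    # such as resourceVersion, uid, and creationTimestamp, then apply the changes
    kubectl apply -f alertmanagerconfig.yaml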

    The kubectl edit command displays output similar to the following:

    apiVersion: v1
    items:
    - apiVersion: monitoring.coreos.com/v1alpha1
      kind: AlertmanagerConfig
      metadata:
        creationTimestamp: "2025-08-13T16:43:54Z"
        generation: 1
        labels:
          app: px-backup-alert-configs
          orgId: default
          owner: <owner>
          receiver_uid: <receiver-uid>
        name: <px-backup-xxx-xxx-xxx>
        namespace: px-backup
        resourceVersion: "11053855"
        uid: <resource-uid>
      spec:
        receivers:
        - emailConfigs:
          - authPassword:
              key: password
              name: <px-backup-alertmanagerconfig-xxx-xxx-xxx-xxx-xxx>
            authUsername: yyy@gmail.com
            from: zzz@gmail.com
            headers:
            - key: Subject
              value: Portworx Backup - {{ if eq .GroupLabels.alertname "BackupAlert" }}Backup
                Status Alert{{ else if eq .GroupLabels.alertname "RestoreAlert" }}Restore
                Status Alert{{ else if eq .GroupLabels.alertname "ClusterAlert" }}Cluster
                Status Alert{{ else if eq .GroupLabels.alertname "BackupLocationAlert"
                }}Backup Location Status Alert{{ else if eq .GroupLabels.alertname "BackupLocationLimitedAvailabilityAlert"
                }}Backup Location Limited Availability Alert{{ else if eq .GroupLabels.alertname
                "PartialBackupAlert" }}Partial Backup Success Alert{{ else if eq .GroupLabels.alertname
                "BulkOperationFailure" }}Bulk Operation Failure Alert{{ else }}Unknown
                Alert Type{{ end }}
            html: '{{ template "pxc_template.tmpl" . }}'
            smarthost: smtp.gmail.com:yy
            tlsConfig:
              ca: {}
              cert: {}
            to: xxx@purestorage.com
          name: <px-backup-xxx-xxx-xxx-xxx-xxx>
        - name: "null"
        route:
          matchers:
          - name: user_id
            value: <user-uuid-xxx-xxx-xxx-xxx-xxx>
          receiver: "null"
          routes:
          - groupBy:
            - alertname
            - object_type
            - operation
            - user_id
            groupInterval: 30s
            groupWait: 30s
            matchers:
            - name: alertname
              value: BulkOperationFailure
            receiver: <px-backup-xxx-xxx-xxx-xxx-xxx>
            repeatInterval: 87600h
          - groupBy:
            - alertname
            groupInterval: 1m
            groupWait: 30s
            matchers:
            - matchType: '!='
              name: alertname
              value: BulkOperationFailure
            - name: backfill
              value: ""
            receiver: <px-backup-xxx-xxx-xxx-xxx-xxx>
            repeatInterval: 2208h
    kind: List
    metadata:
      resourceVersion: ""
  4. In the route whose matcher has value: BulkOperationFailure, update the following keys with the required values:

    groupInterval: 30s
    groupWait: 30s

    The default value for both keys is 30s.

    For example, to set the values in minutes:

    groupInterval: 5m
    groupWait: 10m
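
    For reference, here is a sketch of how the BulkOperationFailure route might look after this change, with the placeholder values retained from the earlier output:

    - groupBy:
      - alertname
      - object_type
      - operation
      - user_id
      groupInterval: 5m
      groupWait: 10m
      matchers:
      - name: alertname
        value: BulkOperationFailure
      receiver: <px-backup-xxx-xxx-xxx-xxx-xxx>
      repeatInterval: 87600h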
  5. Save and exit the editor.

    Kubernetes automatically applies the updated configuration.

  6. Confirm the changes were applied correctly:

    kubectl get alertmanagerconfig <config-name> -n <pxb-namespace> -o yaml
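
    For example, to quickly check the groupWait and groupInterval values in the output, you can filter it (a simple grep; note that it matches these keys in every route):

    kubectl get alertmanagerconfig <config-name> -n <pxb-namespace> -o yaml | grep -E 'groupWait|groupInterval'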
  7. Check the logs of the Alertmanager pod to confirm that it reloaded the updated configuration without errors:

    kubectl logs <alertmanager-pod> -n <pxb-namespace>
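
    If you do not know the Alertmanager pod name, you can look it up first, for example:

    kubectl get pods -n <pxb-namespace> | grep alertmanager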

Best Practices

  • Review email alerts and resolve failures promptly to clear metrics and reduce noise.

  • For environments with large numbers of schedules, monitor group sizes to avoid oversized email alerts.