Configure Batch Alerts for Schedules
Portworx Backup introduces an end-to-end bulk alerting system for backup schedule operations that streamlines how failures are detected, grouped, and delivered. This feature consolidates multiple failures related to backup schedules into a single, easy-to-read notification.
Instead of overwhelming users with repetitive email alerts, the system delivers clear visibility into how many backup schedules failed and why. These grouped alerts are delivered as consolidated email notifications that summarize all related issues in an easy-to-read table. The alerts stay relevant as conditions change: groups update automatically when new failures occur or when existing issues are resolved, so users always have a real-time view of their backup schedule status.
This feature also provides reliable failure tracking and efficient grouping of notifications. The result is reduced alert fatigue, improved visibility into bulk schedule operation issues, and a better experience when monitoring and troubleshooting backup schedules.
To enable batch alerts for backup schedule-related operations, Portworx Backup introduces two new keys: `groupWait` and `groupInterval`. Update these two keys in the `AlertmanagerConfig` custom resource to configure batch email notifications for schedule failures in your environment.
- `groupWait`: defines how long Portworx Backup waits before sending the first email after a failure is detected. During this wait time, it collects additional matching alerts so that they can all be included in one consolidated notification instead of being sent as separate emails immediately.
- `groupInterval`: defines the minimum time gap between subsequent emails for the same group of alerts. If new failures are added to the existing group, Portworx Backup does not send another notification immediately.

Example:

If you set `groupWait: 1h` and `groupInterval: 2h`, and you delete 5 backup schedules through the web console and all 5 operations fail, Portworx Backup generates individual failure metrics but automatically groups them into a single alert. The system then sends the first email summarizing all 5 failures in a consolidated table 1 hour after the first failure occurs (if more failures occur within this 1 hour, they are added to the same group as well). Later, if you delete 2 more schedules and both fail, Portworx Backup appends these two fresh failures to the existing alert group, increasing the count to 7, and sends an updated email (second notification) with all 7 failures. When 3 of these operations are later resolved successfully, leaving only 4 failures, the group adjusts accordingly, and another updated email (third notification) is sent after 2 hours, reflecting the reduced set of issues. This process continues to ensure users always receive a clear, consolidated, and up-to-date view of their backup schedule failures.
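For reference, the child route for this example would look roughly like the following sketch. The receiver name is a placeholder, and the `groupBy` and `repeatInterval` values are taken from the sample resource shown later in this topic; your resource may differ.

```
routes:
- groupBy:
  - alertname
  - object_type
  - operation
  - user_id
  groupWait: 1h        # first consolidated email is sent 1 hour after the first failure
  groupInterval: 2h    # follow-up emails for the same group are sent at most every 2 hours
  matchers:
  - name: alertname
    value: BulkOperationFailure
  receiver: <px-backup-receiver>
  repeatInterval: 87600h
```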
Prerequisites
- Portworx Backup 2.10 or later
- Preconfigured SMTP alerts
- kubectl CLI installed
- Permissions to view and edit resources in the namespace where Portworx Backup is deployed
How to configure
To configure batch email alerts for backup schedule failures in Portworx Backup:
- List existing `AlertmanagerConfig` objects in the namespace where Portworx Backup is deployed:

```
kubectl get alertmanagerconfig -n <pxb-namespace>
```

Example:

```
kubectl get alertmanagerconfig -n px-backup
```

This displays all `AlertmanagerConfig` resources in the `px-backup` namespace.
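The output is similar to the following; the resource name and age shown here are placeholders and will differ in your environment:

```
NAME                      AGE
px-backup-xxx-xxx-xxx     15d
```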
- View details of a specific `AlertmanagerConfig`:

```
kubectl describe alertmanagerconfig <config-name> -n <pxb-namespace>
```

Or, to view the raw YAML:

```
kubectl get alertmanagerconfig <config-name> -n <pxb-namespace> -o yaml
```
- Edit the `AlertmanagerConfig`:

```
kubectl edit alertmanagerconfig <config-name> -n <pxb-namespace>
```

This opens the resource in your default text editor (usually `vi` or `nano`). Note that editing with `kubectl edit` modifies the resource in place. For version control, export the YAML first and then manage the changes.

The `kubectl edit` command displays output similar to the following:

```
apiVersion: v1
items:
- apiVersion: monitoring.coreos.com/v1alpha1
  kind: AlertmanagerConfig
  metadata:
    creationTimestamp: "2025-08-13T16:43:54Z"
    generation: 1
    labels:
      app: px-backup-alert-configs
      orgId: default
      owner: <owner>
      receiver_uid: <receiver-uid>
    name: <px-backup-xxx-xxx-xxx>
    namespace: px-backup
    resourceVersion: "11053855"
    uid: <resource-uid>
  spec:
    receivers:
    - emailConfigs:
      - authPassword:
          key: password
          name: <px-backup-alertmanagerconfig-xxx-xxx-xxx-xxx-xxx>
        authUsername: yyy@gmail.com
        from: zzz@gmail.com
        headers:
        - key: Subject
          value: Portworx Backup - {{ if eq .GroupLabels.alertname "BackupAlert" }}Backup
            Status Alert{{ else if eq .GroupLabels.alertname "RestoreAlert" }}Restore
            Status Alert{{ else if eq .GroupLabels.alertname "ClusterAlert" }}Cluster
            Status Alert{{ else if eq .GroupLabels.alertname "BackupLocationAlert"
            }}Backup Location Status Alert{{ else if eq .GroupLabels.alertname "BackupLocationLimitedAvailabilityAlert"
            }}Backup Location Limited Availability Alert{{ else if eq .GroupLabels.alertname
            "PartialBackupAlert" }}Partial Backup Success Alert{{ else if eq .GroupLabels.alertname
            "BulkOperationFailure" }}Bulk Operation Failure Alert{{ else }}Unknown
            Alert Type{{ end }}
        html: '{{ template "pxc_template.tmpl" . }}'
        smarthost: smtp.gmail.com:yy
        tlsConfig:
          ca: {}
          cert: {}
        to: xxx@purestorage.com
      name: <px-backup-xxx-xxx-xxx-xxx-xxx>
    - name: "null"
    route:
      matchers:
      - name: user_id
        value: <user-uuid-xxx-xxx-xxx-xxx-xxx>
      receiver: "null"
      routes:
      - groupBy:
        - alertname
        - object_type
        - operation
        - user_id
        groupInterval: 30s
        groupWait: 30s
        matchers:
        - name: alertname
          value: BulkOperationFailure
        receiver: <px-backup-xxx-xxx-xxx-xxx-xxx>
        repeatInterval: 87600h
      - groupBy:
        - alertname
        groupInterval: 1m
        groupWait: 30s
        matchers:
        - matchType: '!='
          name: alertname
          value: BulkOperationFailure
        - name: backfill
          value: ""
        receiver: <px-backup-xxx-xxx-xxx-xxx-xxx>
        repeatInterval: 2208h
kind: List
metadata:
  resourceVersion: ""
```
- Update the following keys in the route that contains the matcher `value: BulkOperationFailure` with the required values:

```
groupInterval: 30s
groupWait: 30s
```

The default value for both of these keys is `30s`. If you want to set the values in minutes, for example:

```
groupInterval: 5m
groupWait: 10m
```
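If you prefer to script this change rather than edit the resource interactively, a minimal sketch using `kubectl patch` could look like the following. It assumes the `BulkOperationFailure` route is the first entry under `spec.route.routes`, as in the sample output above; verify the index in your resource before patching.

```
# Patch the first child route (index 0) with the new batching values.
kubectl patch alertmanagerconfig <config-name> -n <pxb-namespace> --type=json -p '[
  {"op": "replace", "path": "/spec/route/routes/0/groupWait", "value": "10m"},
  {"op": "replace", "path": "/spec/route/routes/0/groupInterval", "value": "5m"}
]'
```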
- Save and exit the editor. Kubernetes automatically applies the updated configuration.
- Confirm that the changes were applied correctly (see the sketch after these steps for a quicker check of just the grouping keys):

```
kubectl get alertmanagerconfig <config-name> -n <pxb-namespace> -o yaml
```
- Check the logs of the Alertmanager pod to ensure it reloaded the updated configuration without errors:

```
kubectl logs <alertmanager-pod> -n <pxb-namespace>
```
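As a quicker alternative to reading the full YAML, the following sketch prints only the grouping keys for each child route. It assumes the structure shown in the sample output above (matchers and batching keys under `spec.route.routes`):

```
kubectl get alertmanagerconfig <config-name> -n <pxb-namespace> \
  -o jsonpath='{range .spec.route.routes[*]}{.matchers[*].value}{"\t"}{.groupWait}{"\t"}{.groupInterval}{"\n"}{end}'
```

The route whose matcher value is `BulkOperationFailure` should show the `groupWait` and `groupInterval` values you set.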
Best Practices
- Review email alerts and resolve failures promptly to clear metrics and reduce noise.
- For environments with a large number of schedules, monitor group sizes to avoid oversized email alerts; see the sketch below for one way to check the current group size.
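One way to check how many `BulkOperationFailure` alerts are currently grouped is to query the Alertmanager API directly. This is a minimal sketch, not a supported procedure: it assumes the Alertmanager in the Portworx Backup namespace listens on the default port 9093, that `jq` is installed locally, and `<alertmanager-service>` is a placeholder for the actual service name in your deployment.

```
# Forward the Alertmanager API port locally (placeholder service name).
kubectl port-forward svc/<alertmanager-service> 9093:9093 -n <pxb-namespace> &

# Count the alerts currently held for the BulkOperationFailure group.
curl -s 'http://localhost:9093/api/v2/alerts?filter=alertname%3D%22BulkOperationFailure%22' | jq length
```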