Configure Batch Alerts for Schedules

Portworx Backup introduces an end-to-end bulk alerting system for backup schedule operations that streamlines how failures are detected, grouped, and delivered. This feature consolidates multiple failures related to backup schedules into a single, easy-to-read notification.

Instead of overwhelming users with repetitive email alerts, the system provides clear visibility into how many backup schedules failed and why. These grouped alerts are delivered as consolidated email notifications that summarize all related issues in an easy-to-read table. As conditions change, the alerts stay relevant: groups update automatically when new failures occur or when existing issues are resolved, so users always have a real-time view of their backup schedule status.

This feature also provides reliable failure tracking and efficient grouping of notifications. The result is reduced alert fatigue, improved visibility into bulk schedule operation issues, and a better experience for monitoring and troubleshooting backup schedules.

To set batch alerts for backup schedule-related operations, Portworx Backup introduces two new keys, groupWait and groupInterval. You need to update these two keys in the AlertmanagerConfig custom resource to set up batch email notifications for schedule failures in your environment.

  • groupWait: defines how long Portworx Backup waits before sending the first email after a failure is detected. During this wait time, it collects additional matching alerts so that they can all be included in one consolidated notification instead of sending separate emails immediately.

  • groupInterval: defines the minimum time gap between subsequent emails for the same group of alerts. If new failures are added to the existing group, Portworx Backup does not send another notification immediately; it waits until this interval has elapsed before sending an updated email.

    Example:

    Suppose you set groupWait: 1h and groupInterval: 2h:

    • You delete 5 backup schedules through the web console and all 5 operations fail. Portworx Backup generates individual failure metrics but automatically groups them into a single alert. One hour after the first failure, the system sends the first email summarizing all 5 failures in a consolidated table; any additional failures that occur within that hour are added to the same group.

    • Later, you delete 2 more schedules and both operations fail. Portworx Backup appends these fresh failures to the existing alert group, increasing the count to 7, and sends an updated email (second notification) listing all 7 failures.

    • When 3 of these operations are later resolved successfully, leaving only 4 failures, the group adjusts accordingly and another updated email (third notification) is sent after 2 hours, reflecting the reduced set of issues.

    This process continues, ensuring users always receive a clear, consolidated, and up-to-date view of their backup schedule failures.
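
    In the AlertmanagerConfig resource, these example values correspond to the following keys in the route that matches BulkOperationFailure alerts (a minimal sketch; the full route appears later in this procedure):

    groupWait: 1h
    groupInterval: 2h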

Prerequisites

  • Portworx Backup 2.10 and later
  • Preconfigured SMTP alerts
  • kubectl CLI installed
  • Permissions to view and edit resources in the namespace where Portworx Backup is deployed

How to configure

To configure batch email alerts for backup schedule failures in Portworx Backup:

  1. List existing AlertmanagerConfig objects in the namespace where Portworx Backup is deployed:

    kubectl get alertmanagerconfig -n <pxb-namespace>

    Example

    kubectl get alertmanagerconfig -n px-backup

    This displays all AlertmanagerConfig resources in the px-backup namespace.
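
    If you are not sure which namespace Portworx Backup is deployed in, you can list AlertmanagerConfig resources across all namespaces using a standard kubectl option (shown here as an optional variation):

    kubectl get alertmanagerconfig --all-namespaces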

  2. View details of a specific AlertmanagerConfig:

    kubectl describe alertmanagerconfig <config-name> -n <pxb-namespace>

    Alternatively, to view the raw YAML:

    kubectl get alertmanagerconfig <config-name> -n <pxb-namespace> -o yaml
  3. Edit the AlertmanagerConfig:

    kubectl edit alertmanagerconfig <config-name> -n <pxb-namespace>

    This opens the resource in your default text editor (usually vi or nano).

    Note that editing with kubectl edit modifies the resource in place. For version control, export the YAML first and then manage the changes.
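
    For example, a minimal sketch of that workflow (the file name alertmanagerconfig.yaml is illustrative):

    # Export the current configuration so it can be tracked in version control
    kubectl get alertmanagerconfig <config-name> -n <pxb-namespace> -o yaml > alertmanagerconfig.yaml

    # Edit groupWait and groupInterval in the file, remove server-generated fields
    # such as resourceVersion, uid, and creationTimestamp, then apply the changes
    kubectl apply -f alertmanagerconfig.yaml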

    The kubectl edit command displays output similar to the following:

    apiVersion: v1
    items:
    - apiVersion: monitoring.coreos.com/v1alpha1
      kind: AlertmanagerConfig
      metadata:
        creationTimestamp: "2025-08-13T16:43:54Z"
        generation: 1
        labels:
          app: px-backup-alert-configs
          orgId: default
          owner: <owner>
          receiver_uid: <receiver-uid>
        name: <px-backup-xxx-xxx-xxx>
        namespace: px-backup
        resourceVersion: "11053855"
        uid: <resource-uid>
      spec:
        receivers:
        - emailConfigs:
          - authPassword:
              key: password
              name: <px-backup-alertmanagerconfig-xxx-xxx-xxx-xxx-xxx>
            authUsername: yyy@gmail.com
            from: zzz@gmail.com
            headers:
            - key: Subject
              value: Portworx Backup - {{ if eq .GroupLabels.alertname "BackupAlert" }}Backup
                Status Alert{{ else if eq .GroupLabels.alertname "RestoreAlert" }}Restore
                Status Alert{{ else if eq .GroupLabels.alertname "ClusterAlert" }}Cluster
                Status Alert{{ else if eq .GroupLabels.alertname "BackupLocationAlert"
                }}Backup Location Status Alert{{ else if eq .GroupLabels.alertname "BackupLocationLimitedAvailabilityAlert"
                }}Backup Location Limited Availability Alert{{ else if eq .GroupLabels.alertname
                "PartialBackupAlert" }}Partial Backup Success Alert{{ else if eq .GroupLabels.alertname
                "BulkOperationFailure" }}Bulk Operation Failure Alert{{ else }}Unknown
                Alert Type{{ end }}
            html: '{{ template "pxc_template.tmpl" . }}'
            smarthost: smtp.gmail.com:yy
            tlsConfig:
              ca: {}
              cert: {}
            to: xxx@purestorage.com
          name: <px-backup-xxx-xxx-xxx-xxx-xxx>
        - name: "null"
        route:
          matchers:
          - name: user_id
            value: <user-uuid-xxx-xxx-xxx-xxx-xxx>
          receiver: "null"
          routes:
          - groupBy:
            - alertname
            - object_type
            - operation
            - user_id
            groupInterval: 30s
            groupWait: 30s
            matchers:
            - name: alertname
              value: BulkOperationFailure
            receiver: <px-backup-xxx-xxx-xxx-xxx-xxx>
            repeatInterval: 87600h
          - groupBy:
            - alertname
            groupInterval: 1m
            groupWait: 30s
            matchers:
            - matchType: '!='
              name: alertname
              value: BulkOperationFailure
            - name: backfill
              value: ""
            receiver: <px-backup-xxx-xxx-xxx-xxx-xxx>
            repeatInterval: 2208h
    kind: List
    metadata:
      resourceVersion: ""
  4. In the route whose matcher has value: BulkOperationFailure, update the following keys with the required values:

    groupInterval: 30s
    groupWait: 30s

    The default value for both keys is 30s.

    For example, to set the values in minutes:

    groupInterval: 5m
    groupWait: 10m
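
    For reference, here is a sketch of how the BulkOperationFailure route might look after this change, with the placeholder values retained from the earlier output:

    - groupBy:
      - alertname
      - object_type
      - operation
      - user_id
      groupInterval: 5m
      groupWait: 10m
      matchers:
      - name: alertname
        value: BulkOperationFailure
      receiver: <px-backup-xxx-xxx-xxx-xxx-xxx>
      repeatInterval: 87600h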
  5. Save and exit the editor.

    Kubernetes automatically applies the updated configuration.

  6. Confirm the changes were applied correctly:

    kubectl get alertmanagerconfig <config-name> -n <pxb-namespace> -o yaml
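
    For example, to quickly check the groupWait and groupInterval values in the output, you can filter it (a simple grep; note that it matches these keys in every route):

    kubectl get alertmanagerconfig <config-name> -n <pxb-namespace> -o yaml | grep -E 'groupWait|groupInterval'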
  7. Check the logs of the Alertmanager pod to confirm that it reloaded the updated configuration without errors:

    kubectl logs <alertmanager-pod> -n <pxb-namespace>
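
    If you do not know the Alertmanager pod name, you can look it up first, for example:

    kubectl get pods -n <pxb-namespace> | grep alertmanager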

Best Practices

  • Review email alerts and resolve failures promptly to clear metrics and reduce noise.

  • For environments with large numbers of schedules, monitor group sizes to avoid oversized email alerts.