Portworx Backup Metrics
PX-Backup exposes Prometheus metrics through the /metrics endpoint, providing monitoring data for backups, restores, clusters, backup locations, and other resources. This is a point-in-time REST API endpoint that returns current metric values when queried; historical data and time-range queries require a Prometheus server that scrapes and stores these metrics over time. This guide describes the available metrics, their labels, value ranges, and usage patterns.
Before accessing metrics, ensure you have set up Prometheus to scrape PX-Backup metrics. Refer to the following guides based on your environment:
Scraping Endpoint
PX-Backup metrics can be scraped from either of the following endpoints:
http://px-backup-svc-endpoint:<rest_port>/metrics
where <rest_port> is the REST API port (default: 10001), or
http://<external-ip>:10001/metrics
where <external-ip> is the external IP address of the PX-Backup service and 10001 is the default REST API port.
Metrics Format
All metrics use the pxbackup_ prefix and follow Prometheus naming conventions. The endpoint returns metrics in Prometheus exposition (OpenMetrics) format.
Note: The /metrics endpoint returns all available data on every request; it does not provide incremental (delta) or filtered output.
Metrics returned by the /metrics endpoint without the pxbackup_ prefix can be ignored.
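Because the endpoint returns everything on each request, any filtering has to happen on the client side. The following is a minimal sketch, assuming the response is plain Prometheus text exposition output. The helper names (fetch_metrics, pxbackup_samples) and the sample text are illustrative and not part of PX-Backup.

```python
import urllib.request


def fetch_metrics(url: str, timeout: float = 10.0) -> str:
    """Return the raw text served by a /metrics endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def pxbackup_samples(exposition_text: str) -> list[str]:
    """Keep only sample lines whose metric name starts with pxbackup_."""
    kept = []
    for raw_line in exposition_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if line.startswith("pxbackup_"):
            kept.append(line)
    return kept


# Hypothetical scrape output; the label set is shortened for illustration.
SAMPLE = """\
# TYPE go_goroutines gauge
go_goroutines 42
# TYPE pxbackup_cluster_status gauge
pxbackup_cluster_status{name="production-k8s",org_id="default"} 1
"""

print(pxbackup_samples(SAMPLE))
```

Against a live deployment, the same filter could be applied to the result of fetch_metrics("http://<external-ip>:10001/metrics") with the placeholder replaced by the real service address.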
Backfill Behavior
When the PX-Backup pod restarts, it loads existing objects from the datastore and creates metrics with the backfill="true" label. This ensures that metrics remain available for existing backups, restores, and clusters after a pod restart.
Metrics Categories
Backup Status and Performance Metrics
The default metric retention is 90 days; however, retention periods vary by metric type. Refer to the individual metric descriptions for details.
pxbackup_backup_status
Type: Gauge
Lifecycle: Created at backup start, updated during execution, removed on backup object deletion
Description: Current status of backups in PX-Backup
Usage: Monitor backup health, detect failures, track backup lifecycle
| Label | Description | Type | Value Range | Example |
|---|---|---|---|---|
| name | Backup name | string | User-defined backup name | "mysql-backup-001" |
| namespaces | Kubernetes namespaces backed up | string | Comma-separated namespace list | "default,kube-system" |
| cluster | Source cluster name | string | Cluster identifier | "prod-cluster-1" |
| user_id | User who created the backup | string | User identifier/email | "admin@company.com" |
| schedule_name | Associated backup schedule | string | Schedule name or empty string | "daily-backup-schedule" |
| org_id | Organization ID | string | Organization identifier | "default" |
| cluster_uid | Unique cluster identifier | string | UUID format | "a1b2cXXX-XXXX-XXX-abcd-ef123456XXXX" |
| error_reason | Error details for failed backups | string | Error message or empty | "Volume snapshot failed: timeout" |
| timestamp_in_secs | Timestamp of last update | string | Unix timestamp as string | "1699123456" |
| backup_namespace | Actual namespaces in backup | string | Comma-separated namespace list | "app1,app2,monitoring" |
| backfill | Indicates backfilled metric | string | "true" or empty string | "true" |
Status Values:
0: Invalid - Backup object is in invalid state
1: Pending - Backup is queued for execution
2: InProgress - Backup is currently running
3: Aborted - Backup was manually aborted
4: Failed - Backup failed with errors
5: Deleting - Backup is being deleted
6: Success - Backup completed successfully
7: Captured - Backup data captured (intermediate state)
8: PartialSuccess - Backup completed with some failures
9: DeletePending - Backup marked for deletion
10: CloudBackupMissing - Cloud backup data is missing
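When consuming pxbackup_backup_status outside of Prometheus, the numeric value has to be translated back into a state name. The sketch below copies the mapping from the status values listed above. The set of states treated as needing attention is an assumption made for illustration and is not defined by PX-Backup.

```python
# Numeric values of pxbackup_backup_status mapped to state names.
BACKUP_STATUS = {
    0: "Invalid",
    1: "Pending",
    2: "InProgress",
    3: "Aborted",
    4: "Failed",
    5: "Deleting",
    6: "Success",
    7: "Captured",
    8: "PartialSuccess",
    9: "DeletePending",
    10: "CloudBackupMissing",
}

# Assumption: which states warrant attention is a local policy choice.
ATTENTION_STATES = {"Failed", "PartialSuccess", "CloudBackupMissing"}


def backup_status_name(value: float) -> str:
    """Translate a scraped gauge value into its state name."""
    code = int(value)
    return BACKUP_STATUS.get(code, f"Unknown({code})")


def needs_attention(value: float) -> bool:
    """Return True when the state is in the illustrative attention set."""
    return backup_status_name(value) in ATTENTION_STATES


print(backup_status_name(6.0))
print(needs_attention(4.0))
```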
pxbackup_backup_count
Type: Counter
Lifecycle: Created at first backup completion, incremented on each subsequent backup, persists within Prometheus retention window (typically 90 days)
Description: Total number of backup operations (cumulative)
Usage: Track backup frequency, generate rates
| Label | Description | Type | Value Range |
|---|---|---|---|
| cluster_name | Source cluster name | string | Cluster identifier |
| user_id | Backup owner | string | User identifier |
| org_id | Organization ID | string | Organization identifier |
| cluster_uid | Cluster UUID | string | UUID format |
| status | Final backup status | string | Status enum as string |
Status Values: "Success", "Failed", "PartialSuccess"
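Since pxbackup_backup_count is cumulative, a rate is derived from the difference between two scrapes. The following sketch shows one common way to do that by hand, treating a drop in the value as a counter reset in the same spirit as Prometheus rate calculations. The function names are illustrative, and a Prometheus server would normally perform this calculation itself.

```python
def counter_increase(previous: float, current: float) -> float:
    """Increase of a cumulative counter between two scrapes."""
    if current < previous:
        # The counter went backwards, so assume it restarted from zero.
        return current
    return current - previous


def per_second_rate(previous: float, current: float, interval_seconds: float) -> float:
    """Average per-second rate over the scrape interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return counter_increase(previous, current) / interval_seconds


# Hypothetical counter values taken 60 seconds apart.
print(counter_increase(10, 14))
print(per_second_rate(10, 40, 60))
```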
Backup Schedule Metrics
pxbackup_backup_schedule_status
Note: This metric is excluded in the OCP Prometheus
Type: Gauge
Lifecycle: Created when backup schedule is configured, updated when schedule is suspended/resumed, removed on schedule deletion
Description: Status of backup schedules (active/suspended)
Usage: Monitor schedule health, detect suspended schedules
| Label | Description | Type | Value Range |
|---|---|---|---|
| name | Schedule name | string | User-defined schedule name |
| namespaces | Scheduled namespaces | string | Comma-separated namespace list |
| cluster | Target cluster | string | Cluster identifier |
| user_id | Schedule owner | string | User identifier |
Values:
0: Active - Schedule is running normally
1: Suspended - Schedule is suspended/paused
Restore Metrics
pxbackup_restore_status
Type: Gauge
Lifecycle: Created at restore start, updated during execution, removed on restore object deletion
Description: Current status of restore operations
Usage: Monitor restore health, track restore progress
| Label | Description | Type | Value Range | Example |
|---|---|---|---|---|
| name | Restore name | string | User-defined restore name | "mysql-restore-001" |
| namespaces | Target namespaces for restore | string | Comma-separated list | "prod-ns,app-ns" |
| cluster | Target cluster name | string | Cluster identifier | "staging-cluster" |
| user_id | Restore owner | string | User identifier | "admin@company.com" |
| cluster_uid | Target cluster UUID | string | UUID format | "b2c3d4e5-f6g7-8901-bcde-f23456789012" |
| error_reason | Error details for failed restores | string | Error message or empty | "PVC creation failed" |
| backup | Source backup name | string | Original backup name | "mysql-backup-001" |
| timestamp_in_secs | Last update timestamp | string | Unix timestamp | "1699123456" |
| org_id | Organization ID | string | Organization identifier | "default" |
| backfill | Backfilled metric indicator | string | "true" or empty | "" |
Status Values:
0: Invalid - Restore object is invalid
1: Pending - Restore is queued
2: InProgress - Restore is running
3: Aborted - Restore was aborted
4: Failed - Restore failed
5: Deleting - Restore is being deleted
6: Success - Restore completed successfully
7: Retained - Restore data retained
8: PartialSuccess - Restore completed with some failures
pxbackup_restore_count
Type: Counter
Lifecycle: Created at first restore completion, incremented on each subsequent restore, persists indefinitely
Description: Total number of restore operations (cumulative)
Usage: Track restore frequency
| Label | Description | Type | Value Range | Example |
|---|---|---|---|---|
| cluster | Target cluster name | string | Cluster identifier | "cluster-name" |
| cluster_uid | Target cluster UUID | string | UUID format | "670XXXXX-9b11-40a3-XXXX-eda95aXXXXXX" |
| org_id | Organization ID | string | Organization identifier | "default" |
| status | Final restore status | string | Status enum as string | "Failed" |
| user_id | Restore owner | string | User identifier/UUID | "70aXXXXX-419c-429f-XXXX-e302c2XXXXXX" |
Status Values: "Success", "Failed", "PartialSuccess"
Cluster Metrics
pxbackup_cluster_status
Type: Gauge
Lifecycle: Created at cluster registration, updated on connectivity checks, removed on cluster deletion
Description: Health status of registered clusters
Usage: Monitor cluster connectivity, detect offline clusters
| Label | Description | Type | Value Range | Example |
|---|---|---|---|---|
| name | Cluster name | string | User-defined cluster name | "production-k8s" |
| user_id | Cluster owner | string | User identifier | "admin@company.com" |
| org_id | Organization ID | string | Organization identifier | "default" |
| cluster_uid | Unique cluster identifier | string | UUID format | "c3d4e5f6-g7h8-9012-cdef-345678901234" |
| error_reason | Error details for failed clusters | string | Error message or empty | "Connection timeout" |
| timestamp_in_secs | Last status update time | string | Unix timestamp | "1699123456" |
| backfill | Backfilled metric indicator | string | "true" or empty | "" |
Status Values:
0: Invalid - Cluster configuration is invalid
1: Online - Cluster is healthy and accessible
2: Offline - Cluster is not reachable
3: DeletePending - Cluster is marked for deletion
4: Pending - Cluster registration is pending
5: Failed - Cluster registration/connection failed
6: Success - Cluster successfully registered but not online yet
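Detecting offline clusters from a raw scrape means parsing each sample into its name, labels, and value, then checking for the Offline value (2). The sketch below uses a simplified regular-expression parser that is adequate for the sample lines shown but is not a full exposition-format parser (escaped characters in label values are not unescaped). The sample lines and helper names are hypothetical.

```python
import re

# Simplified patterns for lines of the form: name{label="value",...} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

CLUSTER_OFFLINE = 2  # "Offline" in the pxbackup_cluster_status value list


def parse_sample(line: str) -> tuple[str, dict[str, str], float]:
    """Split one sample line into metric name, labels, and numeric value."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


def offline_clusters(sample_lines: list[str]) -> list[str]:
    """Return the names of clusters whose status value is Offline."""
    names = []
    for line in sample_lines:
        name, labels, value = parse_sample(line)
        if name == "pxbackup_cluster_status" and int(value) == CLUSTER_OFFLINE:
            names.append(labels.get("name", ""))
    return names


# Hypothetical samples with a shortened label set.
LINES = [
    'pxbackup_cluster_status{name="production-k8s",org_id="default",error_reason=""} 1',
    'pxbackup_cluster_status{name="staging-cluster",org_id="default",error_reason="Connection timeout"} 2',
    'pxbackup_backup_status{name="mysql-backup-001",cluster="prod-cluster-1"} 6',
]

print(offline_clusters(LINES))
```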
Backup Location Metrics
pxbackup_backup_location_status
Type: Gauge
Lifecycle: Created when backup location is configured, updated during periodic validation checks, removed on location deletion
Description: Status of backup locations in Portworx Backup
Usage: Monitor backup destination health
| Label | Description | Type | Value Range |
|---|---|---|---|
| name | Backup location name | string | User-defined location name |
| user_id | Location owner | string | User identifier |
| org_id | Organization ID | string | Organization identifier |
| error_reason | Error details | string | Error message or empty |
| timestamp_in_secs | Last validation time | string | Unix timestamp |
| backfill | Backfilled metric | string | "true" or empty |
Status Values:
0: Invalid - Location configuration is invalid
1: Valid - Location is accessible and working
2: DeletePending - Location is being deleted
3: ValidationInProgress - Location is being validated
4: ValidationFailed - Location validation failed
5: LimitedAvailability - Location has limited functionality
pxbackup_backuplocation_metrics
Note: This metric is excluded in the OCP Prometheus
Type: Gauge
Lifecycle: Created when backup location is configured/added, value remains constant at 1, removed on location deletion
Description: Count of configured backup locations
Usage: Track backup destination inventory
Labels: name, user_id, org_id
Value: Always 1 (indicates location exists)
Cloud Credential Metrics
pxbackup_cloudcred_metrics
Note: This metric is excluded in the OCP Prometheus
Type: Gauge
Lifecycle: Created when cloud credential is configured/added, value remains constant at 1, removed on credential deletion
Description: Count and type of cloud credentials configured in Portworx Backup
Usage: Track credential inventory
| Label | Description | Type | Value Range |
|---|---|---|---|
| name | Credential name | string | User-defined name |
| user_id | Credential owner | string | User identifier |
Cloud Credential Type Values:
0: Invalid - Invalid credential type
1: AWS - Amazon Web Services credentials
2: Azure - Microsoft Azure credentials
3: Google - Google Cloud Platform credentials
4: IBM - IBM Cloud credentials
5: Rancher - Rancher credentials
Policy Metrics
pxbackup_schedpolicy_metrics
Note: This metric is excluded in the OCP Prometheus
Type: Gauge
Lifecycle: Created when backup schedule policy is configured, removed on policy deletion
Description: Count of schedule policies in Portworx Backup
Usage: Track policy inventory
Labels: name, type, user_id
Value: Always 1 (indicates policy exists)
pxbackup_volumeresourceonlypolicy_metrics
Type: Gauge
Lifecycle: Created when volume resource only policy is configured, removed on policy deletion
Description: Count of volume resource only policies
Usage: Track specialized policy inventory
Labels: name, type, user_id
Value: Always 1 (indicates policy exists)
pxbackup_rule_metrics
Type: Gauge
Lifecycle: Created when rule is configured, removed on rule deletion
Description: Count of backup rules in Portworx Backup
Usage: Track rule inventory
Labels: name, user_id
Value: Always 1 (indicates rule exists)
Backup information metrics, virtual machine metrics, backup volume metrics, and virtual machine resource metrics are supported starting with PX-Backup version 2.10.1.
Backup Information Metrics
pxbackup_backup_object_info
Type: Gauge
Lifecycle: Created at backup start, updated during execution, removed on backup object deletion
Description: Comprehensive backup information aggregating data from multiple backup-related metrics
Usage: Monitor complete backup details including scheduling, retention, resources, and virtual machines
Retention period: Defaults to 24 hours and can be changed with the Helm parameter pxbackup.backupInfoMetricsBackfillHours, up to a maximum of 720 hours (30 days). Setting the value to 0 removes the metrics from Portworx Backup.
To enable these metrics for OpenShift Container Platform (OCP) Prometheus or external Prometheus servers, you must set the pxbackup.enableExternalMetricsScraping Helm parameter during installation or upgrade.
| Label | Type | Description |
|---|---|---|
| name | string | Name of the backup object |
| uid | string | Unique identifier for the backup object |
| org_id | string | Organization UID that owns this backup |
| create_time_in_sec | int64 | Creation time in seconds (Unix timestamp) |
| cluster | string | Name of the cluster if this backup is a synced backup |
| namespaces | string | Namespaces where the backup is taken |
| label_selectors | string | Label selectors to choose resources for backup |
| status | string | Current status of the backup operation [Failed(4), Success(6), PartialSuccess(8)] |
| status_reason | string | Status reason of the backup operation |
| backup_path | string | Path where backup is stored |
| backup_schedule_name | string | Name of the backup schedule, if the backup was taken by schedule |
| backup_schedule_uid | string | Unique identifier of the backup schedule, if the backup was taken by schedule |
| total_size | integer | Total size of the backup |
| resource_count | integer | Total count of resources in backup |
| backup_location_name | string | Name of the backup location |
| backup_location_uid | string | Unique identifier of the backup location |
| cloud_credential_name | string | Name of the cloud credential object attached |
| cloud_credential_uid | string | Unique identifier for the cloud credential rule object |
| backup_type | string | Type of backup (generic or normal) |
| retention_period | integer | Backup retention period |
| cluster_name | string | Reference to cluster object |
| cluster_uid | string | Unique identifier for the cluster object |
| ns_label_selectors | string | Label selectors for choosing namespaces |
| large_resource_enabled | bool | Indicates whether the backup involves a large number of resources |
| backup_object_type | string | Indicates whether the backup covers all applications or is virtual machine specific (values: All, VirtualMachine) |
| skip_vm_auto_exec_rules | bool | Skip auto execution rules for VirtualMachine backup |
| direct_kdmp | bool | Option to take backup as direct KDMP |
| retention_time | string | Expiration timestamp for locked backup retention |
| volumes_completion_time | string | Timestamp at which the volume backup completed |
| resources_completion_time | string | Timestamp at which the resource backup completed |
| total_completion_time | string | Timestamp at which the entire backup completed |
| advanced_resource_label_selector | string | Advanced label selector supporting operators |
| schedule_policy_name | string | Name of the schedule policy object attached |
| schedule_policy_uid | string | Unique identifier of the schedule policy object attached |
| virtual_machines_total_count | int64 | Total count of virtual machines |
| virtual_machines_failed_count | int64 | Count of failed virtual machines |
| volume_resource_only_policy_name | string | Name of the volume resource only policy attached |
| volume_resource_only_policy_uid | string | Unique Identifier of the volume resource only policy attached |
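The retention rule described for pxbackup_backup_object_info (default of 24 hours, maximum of 720 hours, 0 to remove the metrics) can be sketched as a small validation helper, for example to check a value before passing it to pxbackup.backupInfoMetricsBackfillHours. The function name and the returned messages are illustrative, and rejecting negative values is an assumption rather than documented behavior.

```python
DEFAULT_BACKFILL_HOURS = 24   # documented default
MAX_BACKFILL_HOURS = 720      # documented maximum (30 days)


def describe_backfill_hours(hours: int = DEFAULT_BACKFILL_HOURS) -> str:
    """Validate a retention value and describe its documented effect."""
    # Assumption: negative values are treated as invalid.
    if hours < 0 or hours > MAX_BACKFILL_HOURS:
        raise ValueError(
            f"hours must be between 0 and {MAX_BACKFILL_HOURS}, got {hours}"
        )
    if hours == 0:
        return "info metrics removed"
    return f"info metrics retained for {hours} hours"


print(describe_backfill_hours())
print(describe_backfill_hours(720))
print(describe_backfill_hours(0))
```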
Virtual Machine Metrics
pxbackup_virtual_machine_info
Type: Gauge
Lifecycle: Created when virtual machine backup starts, updated during backup execution, removed on backup object deletion
Description: Information about virtual machines included in backups
Usage: Track virtual machine backup status and details
Retention period: Defaults to 24 hours and can be changed with the Helm parameter pxbackup.backupInfoMetricsBackfillHours, up to a maximum of 720 hours (30 days). Setting the value to 0 removes the metrics from Portworx Backup.
To enable these metrics for OpenShift Container Platform (OCP) Prometheus or external Prometheus servers, you must set the pxbackup.enableExternalMetricsScraping Helm parameter during installation or upgrade.
| Label | Type | Description |
|---|---|---|
| backup_name | string | Name of the backup that this virtual machine is part of |
| backup_id | string | Unique Reference to the backup Object |
| schedule_policy_name | string | Name of the schedule policy object attached |
| schedule_policy_uid | string | Unique identifier of the schedule policy object attached |
| cluster_name | string | Reference to cluster object |
| cluster_uid | string | Unique identifier for the cluster object |
| name | string | Name of the virtual machine |
| namespace | string | Namespace of the virtual machine |
| os_name | string | Operating system name |
| status | string | Status of the virtual machine backup |
| status_reason | string | Status reason of the virtual machine backup |
| create_time_in_sec | int64 | Creation time in seconds (Unix timestamp) |
Backup Volume Metrics
pxbackup_backup_volume_info
Type: Gauge
Lifecycle: Created when volume backup starts, updated during backup execution, removed on backup object deletion
Description: Detailed information about volumes included in backups
Usage: Track volume backup status, sizes, and storage details
Retention period: Defaults to 24 hours and can be changed with the Helm parameter pxbackup.backupInfoMetricsBackfillHours, up to a maximum of 720 hours (30 days). Setting the value to 0 removes the metrics from Portworx Backup.
To enable these metrics for OpenShift Container Platform (OCP) Prometheus or external Prometheus servers, you must set the pxbackup.enableExternalMetricsScraping Helm parameter during installation or upgrade.
| Label | Type | Description |
|---|---|---|
| backup_name | string | Name of the backup that this volume is part of |
| backup_id | string | Unique Reference to the backup Object |
| name | string | Name of the volume |
| namespace | string | Namespace of the volume |
| pvc | string | Persistent Volume Claim name |
| status | string | Status Value of the metric [ Failed(4), Success(6), PartialSuccess(8)] |
| driver_name | string | Storage driver name |
| total_size | integer | Total size of the volume |
| actual_size | integer | Actual backup size (incremental size for incremental backups) |
| storage_class | string | Storage class of the volume |
| pvc_id | string | Unique identifier for the PVC |
| provisioner | string | Storage provisioner |
| volume_snapshot | string | Volume snapshot reference |
| virtual_machine_name | string | Associated virtual machine name |
| backup_mode | string | Backup mode [ Not Supported(1), Full(2), Incremental(3) ] |
Virtual Machine Resource Metrics
pxbackup_virtual_machine_resource_info
Type: Gauge
Lifecycle: Created when virtual machine resource backup starts, updated during backup execution, removed on backup object deletion
Description: Information about Kubernetes resources associated with virtual machines
Usage: Track resource backup details for virtual machine workloads
Retention period: Defaults to 24 hours and can be changed with the Helm parameter pxbackup.backupInfoMetricsBackfillHours, up to a maximum of 720 hours (30 days). Setting the value to 0 removes the metrics from Portworx Backup.
To enable these metrics for OpenShift Container Platform (OCP) Prometheus or external Prometheus servers, you must set the pxbackup.enableExternalMetricsScraping Helm parameter during installation or upgrade.
| Label | Type | Description |
|---|---|---|
| virtual_machine_name | string | Name of the virtual machine associated with this resource |
| name | string | Name of the resource |
| namespace | string | Namespace of the resource |
| group | string | Group of the resource |
| kind | string | Kind of the resource |
| version | string | Version of the resource |
| backup_name | string | Name of the backup that this resource is part of |
| backup_id | string | Unique Reference to the backup Object |
Unsupported Metrics
The following metrics are not yet fully supported. Exclude them in production environments.
pxbackup_backup_size_bytes
pxbackup_backup_duration_seconds
pxbackup_backup_volume_count
pxbackup_backup_resource_count
pxbackup_restore_size_bytes
pxbackup_restore_duration_seconds
pxbackup_restore_volume_count
pxbackup_restore_resource_count
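One way to exclude the metrics listed above when post-processing a scrape is to drop samples by exact metric name. The following is a minimal sketch. The helper names and sample values are hypothetical, and in a Prometheus deployment the same exclusion would more typically be configured on the scrape side.

```python
import re

# Metric names listed as not fully supported.
UNSUPPORTED_METRICS = frozenset({
    "pxbackup_backup_size_bytes",
    "pxbackup_backup_duration_seconds",
    "pxbackup_backup_volume_count",
    "pxbackup_backup_resource_count",
    "pxbackup_restore_size_bytes",
    "pxbackup_restore_duration_seconds",
    "pxbackup_restore_volume_count",
    "pxbackup_restore_resource_count",
})


def metric_name(sample_line: str) -> str:
    """The name ends at the first "{" (labels) or whitespace (value)."""
    return re.split(r"[{\s]", sample_line.strip(), maxsplit=1)[0]


def drop_unsupported(sample_lines: list[str]) -> list[str]:
    """Remove samples whose metric name is in the unsupported set."""
    return [
        line for line in sample_lines
        if metric_name(line) not in UNSUPPORTED_METRICS
    ]


# Hypothetical samples with a shortened label set.
LINES = [
    'pxbackup_backup_size_bytes{name="mysql-backup-001"} 1024',
    'pxbackup_backup_status{name="mysql-backup-001"} 6',
    'pxbackup_backup_volume_info{name="vol-1"} 1',
]

print(drop_unsupported(LINES))
```

Matching on the exact name rather than a prefix matters here, because supported metrics such as pxbackup_backup_volume_info share a prefix with unsupported ones such as pxbackup_backup_volume_count.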