Portworx Alerts

PX provides a way to monitor your cluster using alerts. It has a predefined set of alerts, which are listed below. The alerts are broadly classified into the following types, based on the resource on which they are raised:

  1. Cluster
  2. Nodes
  3. Disks
  4. Volumes

Each alert has a severity from one of the following levels:

  1. INFO
  2. WARNING
  3. ALARM
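The two classifications above can be modeled as small enumerations when alerts are processed in scripts. The following is a minimal sketch, not part of PX itself: the class names, the numeric ordering of severities, and the `parse_severity` helper are all hypothetical. It also assumes that the labels NOTIFY and WARN, which appear in the table below, correspond to the INFO and WARNING levels listed here.

```python
from enum import Enum, IntEnum


class ResourceType(Enum):
    """Resource on which an alert is raised (labels as used in the alert table)."""
    CLUSTER = "CLUSTER"
    NODE = "NODE"
    DRIVE = "DRIVE"
    VOLUME = "VOLUME"


class Severity(IntEnum):
    """Severity levels; the numeric ordering is an assumption for comparisons."""
    INFO = 1
    WARNING = 2
    ALARM = 3


# Assumption: the table's NOTIFY and WARN labels map to INFO and WARNING.
_ALIASES = {"NOTIFY": Severity.INFO, "WARN": Severity.WARNING}


def parse_severity(label: str) -> Severity:
    """Convert a severity label (any case) into a Severity member."""
    key = label.strip().upper()
    if key in _ALIASES:
        return _ALIASES[key]
    return Severity[key]  # raises KeyError for an unknown label
```

With this in place, `parse_severity("WARN") >= Severity.WARNING` can be used to keep only alerts at or above a chosen level.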

List of Alerts

| Alert Code | Alert Type | Severity | Resource Type | Description |
|---|---|---|---|---|
| 0 | DriveOperationFailure | ALARM | DRIVE | Triggered when a drive operation such as add or replace fails. |
| 1 | DriveOperationSuccess | NOTIFY | DRIVE | Triggered when a drive operation such as add or replace succeeds. |
| 2 | DriveStateChange | WARN | DRIVE | Triggered when there is a change in the drive state, e.g. free disk space goes below the recommended level of 10%. |
| 3 | VolumeOperationFailureAlarm | ALARM | VOLUME | Triggered when a volume operation fails. Volume operations could be resize, cloudsnap, etc. The alert message gives more information about the specific error case. |
| 4 | VolumeOperationSuccess | NOTIFY | VOLUME | Triggered when a volume operation such as resize succeeds. |
| 5 | VolumeStateChange | WARN | VOLUME | Triggered when there is a change in the state of the volume. |
| 6 | VolGroupOperationFailure | ALARM | CLUSTER | Triggered when a volume group operation fails. |
| 7 | VolGroupOperationSuccess | NOTIFY | CLUSTER | Triggered when a volume group operation succeeds. |
| 8 | VolGroupStateChange | WARN | CLUSTER | Triggered when a volume group’s state changes. |
| 9 | NodeStartFailure | ALARM | CLUSTER | Triggered when a node in the PX cluster fails to start. |
| 10 | NodeStartSuccess | NOTIFY | CLUSTER | Triggered when a node in the PX cluster successfully initializes. |
| 11 | Internal PX Alert | - | - | Alert code used for internal PX bookkeeping. |
| 12 | NodeJournalHighUsage | ALARM | CLUSTER | Triggered when a node’s timestamp journal usage is not within limits. |
| 13 | IOOperation | ALARM | VOLUME | Triggered when an IO operation such as Block Read/Block Write fails. |
| 14-16 | Internal PX Alerts | - | - | Alert codes used for internal PX bookkeeping. |
| 17 | PXInitFailure | ALARM | NODE | Triggered when PX fails to initialize on a node. |
| 18 | PXInitSuccess | NOTIFY | NODE | Triggered when PX successfully initializes on a node. |
| 19 | PXStateChange | WARN | NODE | Triggered when the PX daemon shuts down in error. |
| 20 | VolumeOperationFailureWarn | WARN | VOLUME | Triggered when a volume operation fails. Volume operations could be resize, cloudsnap, etc. The alert message gives more information about the specific error case. |
| 21 | StorageVolumeMountDegraded | ALARM | NODE | Triggered when PX storage enters degraded mode on a node. |
| 22 | ClusterManagerFailure | ALARM | NODE | Triggered when the cluster manager on a PX node fails to start. The alert message gives more information about the specific error case. |
| 23 | KernelDriverFailure | ALARM | NODE | Triggered when an incorrect PX kernel module is detected. Indicates that PX was started with an incorrect version of the kernel module. |
| 24 | NodeDecommissionSuccess | NOTIFY | CLUSTER | Triggered when a node is successfully decommissioned from the PX cluster. |
| 25 | NodeDecommissionFailure | ALARM | CLUSTER | Triggered when a node could not be decommissioned from the PX cluster. |
| 26 | NodeDecommissionPending | WARN | CLUSTER | Triggered when a node decommission is kept in a pending state because the node has data that is not replicated on other nodes. |
| 27 | NodeInitFailure | ALARM | CLUSTER | Triggered when PX fails to initialize on a node. |
| 28 | Internal PX Alert | - | - | Alert code used for internal PX bookkeeping. |
| 29 | NodeScanCompletion | NOTIFY | NODE | Triggered when a node media scan completes without error. |
| 30 | VolumeSpaceLow | ALARM | VOLUME | Triggered when the free space available in a volume goes below a threshold. |
| 31 | ReplAddVersionMismatch | WARN | VOLUME | Triggered when a volume HA update fails with a version mismatch. |
| 32 | CloudsnapScheduleFailure | ALARM | NODE | Triggered if a cloudsnap schedule fails to configure. |
| 33 | CloudsnapOperationUpdate | NOTIFY | VOLUME | Triggered if a cloudsnap schedule is changed successfully. |
| 34 | CloudsnapOperationFailure | ALARM | VOLUME | Triggered when a cloudsnap operation fails. |
| 35 | CloudsnapOperationSuccess | NOTIFY | VOLUME | Triggered when a cloudsnap operation succeeds. |
| 36 | NodeMarkedDown | WARN | CLUSTER | Triggered when a PX node marks another node down because it is unable to connect to it. |
| 37 | VolumeCreateSuccess | NOTIFY | VOLUME | Triggered when a volume is successfully created. |
| 38 | VolumeCreateFailure | ALARM | VOLUME | Triggered when a volume creation fails. |
| 39 | VolumeDeleteSuccess | NOTIFY | VOLUME | Triggered when a volume is successfully deleted. |
| 40 | VolumeDeleteFailure | ALARM | VOLUME | Triggered when a volume deletion fails. |
| 41 | VolumeMountSuccess | NOTIFY | VOLUME | Triggered when a volume is successfully mounted at the requested path. |
| 42 | VolumeMountFailure | ALARM | VOLUME | Triggered when a volume cannot be mounted at the requested path. |
| 43 | VolumeUnmountSuccess | NOTIFY | VOLUME | Triggered when a volume is successfully unmounted. |
| 44 | VolumeUnmountFailure | ALARM | VOLUME | Triggered when a volume cannot be unmounted. The alert message provides more information about the specific error case. |
| 45 | VolumeHAUpdateSuccess | NOTIFY | VOLUME | Triggered when a volume’s replication factor (HA factor) is successfully updated. |
| 46 | VolumeHAUpdateFailure | ALARM | VOLUME | Triggered when an update to a volume’s replication factor (HA factor) fails. |
| 47 | SnapshotCreateSuccess | NOTIFY | VOLUME | Triggered when a volume snapshot is successfully created. |
| 48 | SnapshotCreateFailure | ALARM | VOLUME | Triggered when a volume snapshot creation fails. |
| 49 | SnapshotRestoreSuccess | NOTIFY | VOLUME | Triggered when a snapshot is successfully restored on a volume. |
| 50 | SnapshotRestoreFailure | ALARM | VOLUME | Triggered when the restore of a snapshot fails. |
| 51 | SnapshotIntervalUpdateFailure | ALARM | VOLUME | Triggered when an update of the snapshot interval for a volume fails. |
| 52 | SnapshotIntervalUpdateSuccess | NOTIFY | VOLUME | Triggered when the snapshot interval of a volume is successfully updated. |
| 53 | PXReady | NOTIFY | NODE | Triggered when PX is ready on a node. |
| 54 | StorageFailure | ALARM | NODE | Triggered when the provided storage drives could not be mounted by PX. |
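
When alert codes show up in logs or monitoring output, a local lookup table can make them easier to interpret. The sketch below is illustrative only and does not use any PX API: `AlertInfo`, `ALERTS`, `codes_for`, and `describe` are hypothetical names, and the dictionary holds just a handful of rows copied from the list above.

```python
from typing import List, NamedTuple, Optional


class AlertInfo(NamedTuple):
    alert_type: str
    severity: str
    resource: str


# A few rows copied from the alert list; extend with the remaining codes as needed.
ALERTS = {
    0: AlertInfo("DriveOperationFailure", "ALARM", "DRIVE"),
    5: AlertInfo("VolumeStateChange", "WARN", "VOLUME"),
    17: AlertInfo("PXInitFailure", "ALARM", "NODE"),
    30: AlertInfo("VolumeSpaceLow", "ALARM", "VOLUME"),
    37: AlertInfo("VolumeCreateSuccess", "NOTIFY", "VOLUME"),
    53: AlertInfo("PXReady", "NOTIFY", "NODE"),
}


def codes_for(severity: str, resource: Optional[str] = None) -> List[int]:
    """Return the sorted alert codes matching a severity and, optionally, a resource type."""
    return sorted(
        code
        for code, info in ALERTS.items()
        if info.severity == severity
        and (resource is None or info.resource == resource)
    )


def describe(code: int) -> str:
    """Return a one-line summary for an alert code, or a note if it is not in the map."""
    info = ALERTS.get(code)
    if info is None:
        return f"{code}: not in local lookup table"
    return f"{code}: {info.alert_type} ({info.severity}, {info.resource})"
```

For example, `codes_for("ALARM", "VOLUME")` selects only the volume-level alarms in the map, and `describe(53)` produces a short label suitable for a log line.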