Monitor PX-CSI with Prometheus
PX-CSI integrates with Prometheus to provide built-in metrics monitoring. This integration enables you to observe the health and performance of PX-CSI components across your cluster.
Monitoring is enabled by default. PX-CSI deploys Prometheus components into the portworx namespace, where Prometheus collects metrics from the PX-CSI controller and node plugin pods. You can view these metrics in the Prometheus UI or integrate them with tools such as Alertmanager and Grafana.
PX-CSI includes the following components as part of its monitoring stack:
- The Portworx Operator deploys the Prometheus Operator, which manages a custom resource defining the Prometheus configuration and deploys the Prometheus stack.
When monitoring is active, PX-CSI deploys the following resources in the portworx namespace:
- The Prometheus custom resource named px-prometheus, which defines retention periods, scrape intervals, and other Prometheus settings.
- A Prometheus instance, deployed as a StatefulSet, which collects and serves metrics.
PX-CSI creates and configures these components automatically. You do not need to deploy or manage them manually.
On OpenShift, PX-CSI uses the Prometheus instance that OpenShift deploys for monitoring rather than deploying px-prometheus. In your StorageCluster, set the spec.monitoring.prometheus.enabled field to false and the spec.monitoring.prometheus.exportMetrics field to true.
Enable monitoring
Monitoring is enabled by default. If it was disabled during installation, you can enable it by editing the StorageCluster specification for your platform:
- OpenShift Container Platform:

      spec:
        monitoring:
          prometheus:
            enabled: false       # Do not deploy PX Prometheus. Use OpenShift’s Prometheus instead.
            exportMetrics: true  # Export metrics to OpenShift Prometheus

- Other Kubernetes platforms:

      spec:
        monitoring:
          prometheus:
            enabled: true        # Deploy PX-CSI Prometheus stack
            exportMetrics: true  # Export metrics via ServiceMonitors

The enabled field determines if PX-CSI deploys its own Prometheus Operator and Prometheus instance. The exportMetrics field controls whether ServiceMonitor resources are created to allow Prometheus to scrape metrics from PX-CSI components.
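As a rough illustration of how the two fields combine per platform, the following Python sketch builds the corresponding StorageCluster patch body. The helper name `monitoring_patch` is hypothetical; only the field names and values come from the examples above.

```python
import json


def monitoring_patch(openshift: bool) -> dict:
    """Return a StorageCluster patch body for the monitoring fields.

    Hypothetical helper that mirrors the two platform examples above.
    """
    return {
        "spec": {
            "monitoring": {
                "prometheus": {
                    # On OpenShift, PX-CSI relies on the platform's Prometheus,
                    # so its own Prometheus stack stays disabled.
                    "enabled": not openshift,
                    # Metrics are exported on both platforms.
                    "exportMetrics": True,
                }
            }
        }
    }


if __name__ == "__main__":
    # Print the patch for a non-OpenShift cluster as JSON.
    print(json.dumps(monitoring_patch(openshift=False), indent=2))
```

One way to apply the printed JSON is as a merge patch with `kubectl patch --type merge` against your StorageCluster; the resource name depends on your installation.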
Disable monitoring
To disable monitoring, update the StorageCluster specification:

    spec:
      monitoring:
        prometheus:
          enabled: false
          exportMetrics: false

Verify monitoring
These steps are not applicable on OpenShift Container Platform, because PX-CSI uses the Prometheus instance deployed by OpenShift.
To confirm that monitoring is active and running as expected:
- Run the following command to check for Prometheus pods in the portworx namespace:

      kubectl -n portworx get pods | grep prometheus

  The output should include pod names similar to:

      prometheus-px-prometheus-0                 2/2   Running   0   23h
      px-prometheus-operator-764bb9c6cb-9qgvd    1/1   Running   0   23h

- Run the following command to forward Prometheus to port 9090:

      kubectl -n portworx port-forward prometheus-px-prometheus-0 9090:9090

- Open http://localhost:9090/targets to view the Prometheus targets. Verify that targets such as px-pure-csi-controller and px-pure-csi-node are listed and have a status of UP.
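The same check can be scripted against the Prometheus HTTP API instead of the web UI. The sketch below assumes the port-forward from the previous step is still running and that the PX-CSI scrape pool names contain `px-pure-csi`; the function names are hypothetical.

```python
import json
import urllib.request

# Assumption: Prometheus is reachable through the port-forward shown above.
PROMETHEUS_URL = "http://localhost:9090"


def fetch_targets(base_url: str = PROMETHEUS_URL) -> dict:
    """Fetch target status from the Prometheus HTTP API (/api/v1/targets)."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=10) as resp:
        return json.load(resp)


def px_csi_targets(payload: dict, name_fragment: str = "px-pure-csi") -> list:
    """Return (scrape pool, scrape URL, health) for each matching active target."""
    active = payload.get("data", {}).get("activeTargets", [])
    return [
        (t.get("scrapePool", ""), t.get("scrapeUrl", ""), t.get("health", "unknown"))
        for t in active
        if name_fragment in t.get("scrapePool", "")
    ]


if __name__ == "__main__":
    try:
        targets = px_csi_targets(fetch_targets())
    except OSError as exc:  # covers connection errors and timeouts
        print(f"could not reach Prometheus at {PROMETHEUS_URL}: {exc}")
    else:
        if not targets:
            print("no PX-CSI targets found")
        for pool, url, health in targets:
            print(f"{health:8} {pool} {url}")
```

A health value of `up` corresponds to the UP status shown on the targets page.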
Prometheus metrics
PX-CSI metrics contain information about PersistentVolumeClaim (PVC) usage, volume lifecycle operations, API requests to external endpoints, CSI call latencies, and host connection health.
API latency metrics
| Metric | Type | Description |
|---|---|---|
| px_csi_create_volume_latency_ms | Histogram | Latency histogram for the CreateVolume API |
| px_csi_delete_volume_latency_ms | Histogram | Latency histogram for the DeleteVolume API |
| px_csi_ctrlpublishvolume_latency_ms | Histogram | Latency histogram for the ControllerPublishVolume API |
| px_csi_ctrlunpublishvolume_latency_ms | Histogram | Latency histogram for the ControllerUnpublishVolume API |
| px_csi_nodestagevolume_latency_ms | Histogram | Latency histogram for the NodeStageVolume API |
| px_csi_nodeunstagevolume_latency_ms | Histogram | Latency histogram for the NodeUnstageVolume API |
| px_csi_nodepublishvolume_latency_ms | Histogram | Latency histogram for the NodePublishVolume API |
| px_csi_nodeunpublishvolume_latency_ms | Histogram | Latency histogram for the NodeUnpublishVolume API |
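Histogram metrics are usually read as quantiles rather than raw bucket counts. The following Python sketch builds a PromQL expression for that purpose; it assumes the metrics are exposed as classic Prometheus histograms with `_bucket` series, and the helper name is hypothetical.

```python
def latency_quantile_query(metric: str, quantile: float = 0.99, window: str = "5m") -> str:
    """Build a PromQL expression that estimates a latency quantile.

    Classic Prometheus histograms expose cumulative buckets as
    `<metric>_bucket` series with an `le` (upper bound) label, which
    histogram_quantile() consumes.
    """
    if not 0 <= quantile <= 1:
        raise ValueError("quantile must be between 0 and 1")
    return (
        f"histogram_quantile({quantile}, "
        f"sum by (le) (rate({metric}_bucket[{window}])))"
    )


if __name__ == "__main__":
    # Estimated 99th percentile CreateVolume latency over the last 5 minutes.
    print(latency_quantile_query("px_csi_create_volume_latency_ms"))
```

You can paste the resulting expression into the Prometheus UI or use it in a Grafana panel. Because the metric names end in `_ms`, the result is presumably in milliseconds.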
Volume attachment metric
| Metric | Type | Description |
|---|---|---|
| px_csi_attachments_per_node | Gauge | Number of volume attachments per node |
Volume usage metrics
Starting with PX-CSI 26.1.0, PX-CSI exposes volume usage metrics for filesystem volumes.
| Metric | Type | Description | Labels |
|---|---|---|---|
| px_volume_fs_usage_bytes | Gauge | Current filesystem used bytes in the volume | node, volumeid, volumename, pvc |
| px_volume_capacity_bytes | Gauge | Total capacity of the volume in bytes | node, volumeid, volumename, pvc |
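Because both gauges carry the same labels, they can be combined into a usage percentage per volume. The sketch below shows the idea in Python with hypothetical sample values; in PromQL, an expression such as `100 * px_volume_fs_usage_bytes / px_volume_capacity_bytes` should give a comparable result, assuming the label sets of the two series match.

```python
from typing import Dict, List, Tuple

Labels = Dict[str, str]
Sample = Tuple[Labels, float]

# Labels shared by both metrics, per the table above.
JOIN_LABELS = ("node", "volumeid", "volumename", "pvc")


def _key(labels: Labels) -> tuple:
    """Build a join key from the shared labels."""
    return tuple(labels.get(name, "") for name in JOIN_LABELS)


def usage_percent(used: List[Sample], capacity: List[Sample]) -> Dict[str, float]:
    """Return percent used per volume ID by joining the two gauges on their labels."""
    capacity_by_key = {_key(labels): value for labels, value in capacity}
    result = {}
    for labels, used_bytes in used:
        total = capacity_by_key.get(_key(labels))
        if total:  # skip volumes with missing or zero capacity
            result[labels.get("volumeid", "")] = 100.0 * used_bytes / total
    return result


if __name__ == "__main__":
    # Hypothetical label values and sizes, for illustration only.
    labels = {"node": "node-1", "volumeid": "vol-1", "volumename": "pvc-abc", "pvc": "data-0"}
    used = [(labels, 5 * 1024**3)]
    capacity = [(labels, 10 * 1024**3)]
    print(usage_percent(used, capacity))
```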
FlashArray and FlashBlade API request metrics
| Metric | Type | Description |
|---|---|---|
| px_csi_fafb_all_apis_requests_total | Counter | Total API requests to all configured FA/FB endpoints |
| px_csi_fafb_apis_volumes_requests_total | Counter | API requests to the /volumes endpoint |
| px_csi_fafb_apis_array_requests_total | Counter | API requests to the /arrays endpoint |
| px_csi_fafb_apis_volumesnapshots_requests_total | Counter | API requests to the /volume-snapshots endpoint |
| px_csi_fafb_apis_hosts_requests_total | Counter | API requests to the /hosts endpoint |
| px_csi_fafb_apis_controllers_requests_total | Counter | API requests to the /controllers endpoint |
| px_csi_fafb_apis_ports_requests_total | Counter | API requests to the /ports endpoint |
| px_csi_fafb_apis_alerts_requests_total | Counter | API requests to the /alerts endpoint |
| px_csi_fafb_apis_connections_requests_total | Counter | API requests to the /connections endpoint |
| px_csi_fafb_apis_login_requests_total | Counter | API requests to the /login endpoint |
| px_csi_fafb_apis_version_requests_total | Counter | API requests to the /api_version endpoint |
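Counters only ever increase, so they are most useful as a rate over time. The following Python sketch shows the arithmetic behind a per-second request rate computed from two samples of a counter such as px_csi_fafb_all_apis_requests_total. It is a simplification of what the PromQL `rate()` function does; the helper name and sample values are hypothetical.

```python
def counter_rate(v0: float, t0: float, v1: float, t1: float) -> float:
    """Per-second increase of a counter between two samples.

    Simplified model: if the value drops, the counter is assumed to have
    reset to zero once, so the increase is the new value itself.
    """
    if t1 <= t0:
        raise ValueError("second sample must be later than the first")
    increase = v1 - v0 if v1 >= v0 else v1
    return increase / (t1 - t0)


if __name__ == "__main__":
    # Hypothetical samples: 100 requests at t=0s, 160 requests at t=60s.
    print(counter_rate(100.0, 0.0, 160.0, 60.0))
```

In Prometheus itself, an expression such as `rate(px_csi_fafb_all_apis_requests_total[5m])` performs this calculation over all samples in the window.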
Kubelet volume metrics
Starting with PX-CSI 26.1.0, kubelet exposes standard Kubernetes volume metrics for PX-CSI volumes.
| Metric | Type | Description |
|---|---|---|
| kubelet_volume_stats_capacity_bytes | Gauge | Total capacity of the volume in bytes |
| kubelet_volume_stats_used_bytes | Gauge | Number of used bytes in the volume |
| kubelet_volume_stats_available_bytes | Gauge | Number of available bytes remaining in the volume |
| kubelet_volume_stats_inodes | Gauge | Total number of inodes in the volume (filesystem volumes only) |
| kubelet_volume_stats_inodes_used | Gauge | Number of used inodes in the volume (filesystem volumes only) |
| kubelet_volume_stats_inodes_free | Gauge | Number of free inodes in the volume (filesystem volumes only) |
Host connection health metrics
PX-CSI exposes host connection health metrics. The CSI node driver periodically checks the health of host storage connections (iSCSI, NVMe, and FC) and multipath devices, and exposes the results as Prometheus gauge metrics. Use these metrics to detect degraded or lost storage connections before they affect your workloads.
All metrics in this section include the node_name label, which identifies the Kubernetes node that reports the metric.
Host connection health metrics are available in PX-CSI version 26.2.0 or later.
iSCSI connection metrics
| Metric Name | Type | Description | Labels |
|---|---|---|---|
| px_csi_node_iscsi_sessions | Gauge | Total number of iSCSI sessions on the node | node_name |
| px_csi_node_iscsi_sessions_healthy | Gauge | Number of healthy iSCSI sessions on the node | node_name |
A session is counted as healthy when its iSCSI session state is LOGGED_IN.
NVMe connection metrics
| Metric Name | Type | Description | Labels |
|---|---|---|---|
| px_csi_node_nvme_subsystems | Gauge | Number of available NVMe subsystems that match Pure FA target NQNs | node_name, subsysnqn |
| px_csi_node_nvme_connections | Gauge | Total number of NVMe connections | node_name, transport_type |
| px_csi_node_nvme_connections_healthy | Gauge | Number of healthy NVMe connections on the node | node_name, transport_type |
A connection is counted as healthy when its state is live. The subsysnqn label contains the NVMe Qualified Name of the subsystem. The transport_type label identifies the NVMe transport type (rdma, tcp, or fc).
FC connection metrics
| Metric Name | Type | Description | Labels |
|---|---|---|---|
| px_csi_node_fc_hosts | Gauge | Number of FC hosts on the node | node_name |
| px_csi_node_fc_hosts_online | Gauge | Number of FC hosts that are online | node_name |
| px_csi_node_fc_rports | Gauge | Number of available FC remote-port connections on the node | node_name |
| px_csi_node_fc_rports_online | Gauge | Number of FC remote-port connections that are online | node_name |
FC hosts and remote ports are counted as online when their port_state is Online.
Multipath device metrics
| Metric Name | Type | Description | Labels |
|---|---|---|---|
| px_csi_multipath_device_total_paths | Gauge | Total number of paths for a multipath device on the node | node_name, volume_id |
| px_csi_multipath_device_healthy_paths | Gauge | Number of healthy paths for a multipath device on the node | node_name, volume_id |
| px_csi_multipath_device_unhealthy_paths | Gauge | Number of unhealthy paths for a multipath device on the node | node_name, volume_id |
A path is counted as healthy when its state is active. The volume_id label identifies the PX-CSI volume associated with the multipath device.
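Most of the host connection metrics come in total/healthy pairs, so a degraded connection shows up as a positive difference between the two. The Python sketch below generates one PromQL expression per pair; the pairing and the helper name are assumptions based on the tables in this section, not a shipped alert definition.

```python
# (total metric, healthy metric) pairs taken from the tables in this section.
HEALTH_PAIRS = [
    ("px_csi_node_iscsi_sessions", "px_csi_node_iscsi_sessions_healthy"),
    ("px_csi_node_nvme_connections", "px_csi_node_nvme_connections_healthy"),
    ("px_csi_node_fc_hosts", "px_csi_node_fc_hosts_online"),
    ("px_csi_node_fc_rports", "px_csi_node_fc_rports_online"),
    ("px_csi_multipath_device_total_paths", "px_csi_multipath_device_healthy_paths"),
]


def degraded_expr(total: str, healthy: str) -> str:
    """PromQL expression returning series where some connections are not healthy."""
    # Arithmetic binds tighter than comparison in PromQL: (total - healthy) > 0.
    return f"{total} - {healthy} > 0"


if __name__ == "__main__":
    for total, healthy in HEALTH_PAIRS:
        print(degraded_expr(total, healthy))
```

Each expression could serve as the basis for an Alertmanager rule. For multipath devices, `px_csi_multipath_device_unhealthy_paths > 0` expresses the same condition directly.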