Version: 3.1

Grafana with Portworx

note

This document presents the non-Kubernetes method of monitoring your Portworx cluster with Grafana. Please refer to the Prometheus and Grafana page if you are running Portworx on Kubernetes.

Prerequisites

Some of the panels in the Portworx dashboards require Node Exporter running on Portworx nodes. Without Node Exporter, these panels might show ‘No data’.
The Prometheus AlertManager plugin must be installed on Grafana, and the AlertManager endpoint must be added as a data source.

Configure Grafana

Start Grafana with the following docker run command:

docker run --restart=always --name grafana -d -p 3000:3000 grafana/grafana
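
Before logging in, it can help to confirm that the container is serving requests. The following is a minimal sketch, assuming Grafana is reachable at the URL you pass in and that `curl` is installed. The function name `wait_for_grafana` and the retry settings are illustrative and not part of Portworx or Grafana. It polls Grafana's `/api/health` endpoint until it responds.

```shell
# Poll Grafana's health endpoint until it answers or the attempts run out.
# Arguments (illustrative): base URL, number of attempts (default 30).
wait_for_grafana() {
  base_url="$1"
  attempts="${2:-30}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl return non-zero on HTTP errors; -s keeps it quiet.
    if curl -fs "$base_url/api/health" >/dev/null 2>&1; then
      echo "Grafana is up"
      return 0
    fi
    # Pause between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then sleep 2; fi
    i=$((i + 1))
  done
  echo "Grafana did not respond at $base_url" >&2
  return 1
}
```

For example, `wait_for_grafana http://your_ip_address:3000` returns successfully once Grafana is ready, or fails after the attempts are exhausted.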

Log in to Grafana at http://your_ip_address:3000 in your browser. The default Grafana login is admin/admin.
Once you are logged in, Grafana asks you to configure a data source. Use the Prometheus instance that you configured earlier. To use the templates that are provided later, name your data source 'prometheus':
- Choose 'Prometheus' from the 'Type' dropdown.
- Name the data source 'prometheus'.
- Add the URL of your Prometheus server under HTTP settings -> URL.
- Select Save & Test.
Import the Portworx-provided Cluster, Volume, Node, and Performance Grafana templates: from the dropdown on the left in your Grafana dashboard, select Dashboards followed by Import, and add each of these templates.
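
The Prometheus settings from the steps above can also be supplied as a Grafana provisioning file instead of being entered in the UI. The following is a minimal sketch, assuming your Prometheus server is reachable at the URL you pass in. The function name `write_datasource_yaml`, the output file name, and the example URL are hypothetical. The name `prometheus` matches the name the templates expect.

```shell
# Write a Grafana data source provisioning file for Prometheus.
# Arguments (illustrative): Prometheus URL, output file path.
write_datasource_yaml() {
  prom_url="$1"
  out_file="$2"
  cat > "$out_file" <<EOF
apiVersion: 1
datasources:
  - name: prometheus
    type: prometheus
    access: proxy
    url: ${prom_url}
    isDefault: true
EOF
}
```

For example, after running `write_datasource_yaml http://prometheus.example:9090 prometheus.yaml`, you could mount the file into the container by adding `-v "$PWD/prometheus.yaml:/etc/grafana/provisioning/datasources/prometheus.yaml"` to the docker run command. Grafana reads that directory at startup.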

note

The Performance dashboard should be run only on systems without memory constraints. The dashboard plots a large dataset range, so it's not recommended for systems that have restricted memory.
Some clusters running with caching enabled might not show any data in I/O Rate.

Once added, you can view your dashboards. The following section provides a detailed description of various Grafana dashboards for monitoring Portworx.

Grafana dashboards for Portworx

Portworx provides several Grafana dashboard templates for monitoring your cluster. Once imported, these dashboards provide a real-time view of the system’s performance and status, helping you maintain performance and quickly diagnose issues.

Etcd dashboard

The Etcd Dashboard provides metrics specific to the etcd component, which is critical for cluster coordination.

Key panels include:

Disk Sync Duration: Tracks the latency of persisting etcd log entries to disk. High values (> 1s) may indicate performance problems with the KVDB disk.
Up: Monitors the health of KVDB nodes.
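
To check disk sync latency outside Grafana, you can query Prometheus directly. The sketch below assumes that your KVDB exposes the standard etcd histogram `etcd_disk_wal_fsync_duration_seconds` and that Prometheus scrapes it. Whether the Portworx dashboard uses exactly this expression is not confirmed here. The function names and the 5m default window are illustrative.

```shell
# Build a PromQL expression for the 99th-percentile WAL fsync latency per instance.
# Argument (illustrative): rate window, default 5m.
kvdb_fsync_p99_query() {
  window="${1:-5m}"
  printf 'histogram_quantile(0.99, sum by (instance, le) (rate(etcd_disk_wal_fsync_duration_seconds_bucket[%s])))\n' "$window"
}

# Run an instant query against the Prometheus HTTP API.
# Arguments (illustrative): Prometheus base URL, PromQL expression.
query_prometheus() {
  curl -sG "$1/api/v1/query" --data-urlencode "query=$2"
}
```

For example, `query_prometheus http://prometheus.example:9090 "$(kvdb_fsync_p99_query)"` returns a JSON result in seconds, so values above 1 correspond to the threshold mentioned for the Disk Sync Duration panel.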

Portworx Cluster dashboard

This dashboard provides an overview of the cluster's storage and health.

Key panels include:

Usage Meter: Displays the percentage of utilized storage compared to total capacity.
Capacity Used: Shows the actual storage space used in the cluster.
Nodes (total): Displays the number of nodes in the Portworx cluster.
Storage Providers: Indicates how many storage nodes are currently online.
Quorum: Tracks the quorum status of the cluster.
Nodes online: Number of online nodes in the cluster (includes storage and storage-less).
Avg. Cluster CPU: Monitors the average CPU usage across all nodes.
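
The Usage Meter value is a ratio of used to total capacity. As a worked example, the sketch below computes that percentage from two byte counts. The function name `usage_percent` is hypothetical, and the inputs stand in for whatever used and total capacity values your cluster reports.

```shell
# Print used capacity as a percentage of total capacity, to one decimal place.
# Arguments (illustrative): used bytes, total bytes.
usage_percent() {
  LC_ALL=C awk -v used="$1" -v total="$2" 'BEGIN {
    # Guard against a missing or zero total to avoid division by zero.
    if (total <= 0) { print "n/a"; exit }
    printf "%.1f\n", (used / total) * 100
  }'
}
```

For example, `usage_percent 250 1000` prints `25.0`.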

Portworx Node dashboard

The Node dashboard focuses on individual nodes within the cluster.

Key panels include:

PWX Disk Usage: Monitors the Portworx storage space used per node.
PWX Disk IO: Displays the time spent on disk read and write operations per node.
PWX Disk Throughput: Shows the rate of total bytes read and written for each node.
PWX Disk Latency: Provides the average time spent on read and write operations for each node.

Portworx Volume dashboard

The Volume dashboard provides insights into the performance and utilization of storage volumes within the cluster. It is divided into two main sections: All Volumes in the Cluster and Individual Volumes, offering a detailed view of both overall and per-volume metrics.

All Volumes in the Cluster

This section displays metrics aggregated across all volumes in the cluster, helping you track overall performance and identify any potential bottlenecks.

Avg Read Latency (1m): Average time (in seconds) spent on completing read operations during the last minute for all volumes.
Avg Write Latency (1m): Average time (in seconds) spent on completing write operations during the last minute for all volumes.
Top n Volumes by Capacity: Lists the top n volumes in the cluster based on their storage capacity.
Top n Volumes by IO Depth: Lists the top n volumes based on the number of I/O operations currently in progress.

Individual Volumes

This section provides metrics for each individual volume in the cluster, allowing for detailed monitoring of specific volume performance and usage.

Replication Level (HA): Displays both the current and configured High Availability (HA) level for the volume.
Avg Read Latency: Average time (in seconds) spent per successfully completed read operation for the volume.
Avg Write Latency: Average time (in seconds) spent per successfully completed write operation for the volume.
Volume Usage: Shows the total capacity and the used storage space for the volume.
Volume Latency: Displays the average time (in seconds) spent per successfully completed read and write operations during the given interval.
Volume IOPs: Number of successfully completed I/O operations per second for the volume.
Volume IO Depth: Number of I/O operations currently in progress for the volume.
Volume IO Throughput: Displays the number of bytes read and written per second for the volume.

Portworx Performance dashboard

The Performance dashboard provides a comprehensive view of the performance metrics for your Portworx cluster. This dashboard helps you monitor the cluster’s overall health, storage usage, and I/O performance, enabling you to quickly identify any issues affecting performance.

Key panels include:

Members: Displays the total number of nodes in your Portworx cluster.
Total Volumes: The total number of volumes in the cluster.
Storage Providers: Number of storage nodes that are currently online.
Attached Volumes: Indicates the number of volumes that are attached to the nodes.
Storage Offline: The count of nodes where the storage is either full or down.
Avg HA Level: The average High Availability (HA) level of all volumes in the cluster.
Total Available: Displays the total available storage space in the cluster.
Total Used: The total size of volumes that have been provisioned. This is calculated based on the utilized disk space across all nodes.
Volume Total Used: Shows the used storage space of all volumes combined.
Storage Usage: Displays the utilized storage space for each individual node.
Storage Pending IO: Number of read and write operations that are currently in progress for each node.

Volume-specific metrics

Latency (Volume): Displays the average time (in seconds) spent per successfully completed read and write operations for each volume during the specified interval.
Discarded Bytes: The total number of discarded bytes on the volume. These discards are replicated based on the volume’s replication factor. When an application deletes files, the file system converts these deletions into block discards on the Portworx volume.
PX Pool Write Latency: The write latency experienced by Portworx when writing I/O operations to the page cache.
PX Pool Write Throughput: The write throughput observed by Portworx, combining all I/O operations across all replicas provisioned on the pool. These represent the application-level I/Os performed on the pool.
PX Pool Flush Latency: The time taken for Portworx to complete periodic flush/sync operations, which ensure the stability of data and associated metadata in the page cache.
PX Pool Flush Throughput: The amount of data synced during each flush/sync operation, averaged over the time period.
Volume IO Throughput: The number of bytes read and written per second for each volume, averaged over the interval.

Custom metrics and additional monitoring

Portworx also offers a wide range of custom metrics for monitoring specific aspects of your environment. For more information on available metrics, you can refer to the Portworx Metrics documentation.

Using Grafana to monitor Portworx clusters provides visibility into the health, performance, and usage of your storage environment. With built-in dashboards and customizable metrics, you can quickly identify issues and ensure your infrastructure runs smoothly.
