Collect diagnostics
About diagnostics
The Portworx diagnostics bundle, known as diags, is a support bundle that provides all the necessary information for the Portworx support team to diagnose any problems in your cluster. It includes the following information:
- Portworx journal logs
- Output from common Portworx CLI commands that provide details about cluster, nodes, and volumes
- Basic information about the operating system of the worker nodes
- Stack and heap files of Portworx processes
- Alerts generated by the cluster
- Cores or traces from Portworx processes (if found)
When Pure1 integration is enabled, Portworx automatically uploads the diags to Pure Storage's call home service, Pure1. This removes the need to upload diags to support tickets manually, which reduces the time required to resolve cluster issues.
Prerequisites
Outbound access to the internet to allow connection to Pure1
Enable Pure1 integration
Telemetry to Pure1 is enabled by default when you generate the StorageCluster spec from Portworx Central, unless you disable it by setting spec.monitoring.telemetry.enabled to false.
If telemetry is enabled and the PX_HTTPS_PROXY environment variable is set, Portworx disables telemetry and creates a Kubernetes event of type Warning to notify you.
If you disabled telemetry during installation, you can enable it by setting spec.monitoring.telemetry.enabled to true in your StorageCluster spec:
spec:
  monitoring:
    telemetry:
      enabled: true
Enabling or disabling telemetry on an existing cluster triggers a rolling restart of the Portworx cluster for the change to take effect. It is recommended to perform this update during a planned maintenance window to avoid any service disruption.
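Instead of editing the spec by hand, the same field could be toggled with a merge patch. The following is a minimal sketch, not part of Portworx tooling: the helper name set_telemetry is hypothetical, and the namespace and StorageCluster name are values you supply.

```shell
#!/bin/sh
# Hypothetical helper (not part of Portworx tooling): toggles
# spec.monitoring.telemetry.enabled on a StorageCluster with a merge patch.
# Usage: set_telemetry <px-namespace> <storagecluster-name> <true|false>
set_telemetry() {
  ns="$1"; name="$2"; enabled="$3"
  case "$enabled" in
    true|false) ;;
    *) echo "enabled must be true or false" >&2; return 1 ;;
  esac
  # Changing this field triggers a rolling restart of Portworx, so run
  # this only during a planned maintenance window.
  kubectl -n "$ns" patch storagecluster "$name" --type merge \
    -p "{\"spec\":{\"monitoring\":{\"telemetry\":{\"enabled\":$enabled}}}}"
}
```

For example, set_telemetry portworx <storagecluster-name> true enables telemetry, and passing false disables it. A merge patch touches only the named field and leaves the rest of the spec unchanged.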
After you enable Pure1 integration:
- Portworx automatically uploads the diagnostics bundle and real-time Prometheus metrics from your Portworx cluster to Pure1. If you are running Portworx Operator version 25.5.2 or later, you can disable uploading real-time Prometheus metrics to Pure1 by disabling the metrics collector while keeping telemetry enabled. For more information, see Customize metrics collector.
- You can use Pure1 AI Copilot to interact with your Portworx storage clusters using natural language. For more information, see Portworx on Pure1 AI Copilot.
If you are running Operator version 25.5.2 or later with telemetry enabled and you use GitOps to manage your StorageCluster, set metricsCollector.enabled to true in your Git manifest to keep the repository in sync with the cluster state. This is necessary because applying a StorageCluster spec that contains only telemetry.enabled: true automatically sets metricsCollector.enabled to true on the cluster.
spec:
  monitoring:
    telemetry:
      enabled: true
      metricsCollector:
        enabled: true
Additional configuration for air-gapped clusters
In air-gapped clusters, telemetry is supported via a simple or traditional proxy that serves in HTTP mode. If you have a next generation firewall acting as a proxy, or if you have problems with the telemetry configuration, contact Portworx Support for assistance.
To enable telemetry, configure a proxy and add it to the PX_HTTP_PROXY or PX_HTTPS_PROXY environment variable in your StorageCluster specification. Telemetry communicates through the specified proxy.
...
spec:
  env:
  - name: PX_HTTP_PROXY
    value: "http://<IP:port>"
  - name: PX_HTTPS_PROXY
    value: "http://<IP:port>"
...
Customize metrics collector
If you are running Portworx Operator 25.5.2 or later, you can configure the metrics collector separately from other telemetry components.
The metrics collector requires telemetry to be enabled. When telemetry is enabled, the metrics collector is enabled by default. You can disable the metrics collector while keeping telemetry enabled, but you cannot enable the metrics collector without telemetry.
The telemetry metrics collector scrapes Prometheus metrics from Portworx pods, forwards these real-time metrics to Pure1 for monitoring and analytics, and provides data for Pure1 dashboards showing capacity, utilization, and performance trends.
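The dependency between telemetry and the metrics collector described in this section can be summarized as a small decision table. The sketch below is illustrative only: the helper name collector_state and its output strings are hypothetical, and it does not query a cluster.

```shell
#!/bin/sh
# Hypothetical helper: reports the resulting state for a pair of settings,
# following the rules described in this section.
# Usage: collector_state <telemetry: true|false> [metricsCollector: true|false]
collector_state() {
  telemetry="$1"
  collector="${2:-}"   # empty means the field is not set in the spec
  if [ "$telemetry" = "true" ]; then
    # The collector defaults to enabled whenever telemetry is enabled.
    if [ "$collector" = "false" ]; then
      echo "telemetry on, metrics collector off"
    else
      echo "telemetry on, metrics collector on"
    fi
  elif [ "$collector" = "true" ]; then
    # The collector cannot be enabled without telemetry.
    echo "unsupported: metrics collector requires telemetry"
  else
    echo "telemetry off, metrics collector off"
  fi
}
```

For example, collector_state true reports that the collector is on, because it is enabled by default when telemetry is enabled.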
Disable metrics collector while keeping telemetry enabled
When you disable the metrics collector, real-time metrics forwarding to Pure1 stops, which means Pure1 dashboards no longer display current metrics. However, diagnostics upload to Pure1 continues to work, and support functionality remains intact. Pure1 AI Copilot continues to function because it uses diagnostics data. Your own Prometheus monitoring solutions continue to work normally because Portworx continues exporting metrics locally.
spec:
  monitoring:
    telemetry:
      enabled: true
      metricsCollector:
        enabled: false
This configuration disables real-time metrics forwarding to Pure1 while maintaining diagnostics upload for support purposes.
Use a custom image for the metrics collector
spec:
  monitoring:
    telemetry:
      enabled: true
      metricsCollector:
        enabled: true
        image: "portworx/realtime-metrics:<version>"
This configuration allows you to deploy a specific image version for the metrics collector.
How to use diags with support tickets
To resolve a support case, you need to provide your cluster UUID to the Portworx support team. This allows them to retrieve your diags from Pure1 and diagnose the problem in your cluster.
Find your cluster's UUID
Run the following command to get your cluster UUID:
Kubernetes:
kubectl get storagecluster -n <px-namespace>
NAME                                              CLUSTER UUID                           STATUS   VERSION           AGE
px-cluster-xxxxxxxx-xxxx-xxxx-xxxx-ec53d3ba5b39   xxxxxxxx-xxxx-xxxx-xxxx-b2249dff3501   Online   01d934d_d03a058   26h
OpenShift:
oc get storagecluster -n <px-namespace>
NAME                                              CLUSTER UUID                           STATUS   VERSION           AGE
px-cluster-xxxxxxxx-xxxx-xxxx-xxxx-ec53d3ba5b39   xxxxxxxx-xxxx-xxxx-xxxx-b2249dff3501   Online   01d934d_d03a058   26h
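If you only need the UUID, for example to paste it into a support ticket, you could extract it from the same table output. This is a minimal sketch: the helper name px_cluster_uuid is hypothetical, and it assumes the column order shown above, with the UUID in the second column.

```shell
#!/bin/sh
# Hypothetical helper: prints only the cluster UUID, taken from the second
# column of the table that `kubectl get storagecluster` prints.
# Usage: px_cluster_uuid <px-namespace>
px_cluster_uuid() {
  # --no-headers drops the header row so that only data rows reach awk.
  kubectl get storagecluster -n "$1" --no-headers | awk '{ print $2 }'
}
```

If the namespace contains more than one StorageCluster, this prints one UUID per line. On OpenShift, replace kubectl with oc.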
Collect diagnostics
Portworx diagnostics are collected in two main ways: on demand, and automatically on crash.
On-demand diagnostics
You can collect diagnostics either using the pxctl CLI or the PortworxDiag custom resource.
On-demand diagnostics using pxctl CLI
You can run diagnostics directly from the node using the pxctl CLI. The most commonly used command is:
/opt/pwx/bin/pxctl service diags -a
This command generates a diagnostics bundle on the node. If Pure1 telemetry is enabled, the bundle is automatically uploaded to Pure1.
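If you do not have a shell on the node, one option is to run the same command inside a Portworx pod through kubectl exec. The sketch below rests on assumptions that you should verify in your environment: the helper name px_collect_diags is hypothetical, and the label selector name=portworx is assumed to match your Portworx pods.

```shell
#!/bin/sh
# Hypothetical helper: runs `pxctl service diags -a` inside the first
# Portworx pod found in the given namespace.
# Usage: px_collect_diags <px-namespace>
px_collect_diags() {
  ns="$1"
  # Assumption: Portworx pods carry the label name=portworx.
  pod="$(kubectl get pods -n "$ns" -l name=portworx \
    -o jsonpath='{.items[0].metadata.name}')" || return 1
  [ -n "$pod" ] || { echo "no Portworx pod found in $ns" >&2; return 1; }
  kubectl exec "$pod" -n "$ns" -- /opt/pwx/bin/pxctl service diags -a
}
```

This runs the command only in the first pod that the selector returns. To target a particular node, pick the pod scheduled on that node instead.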
On-demand diagnostics using PortworxDiag custom resource
The PortworxDiag custom resource is available only with Portworx Enterprise 3.3.0 or later and Operator 25.2.0 or later.
The PortworxDiag custom resource allows you to collect diagnostics at the cluster level. When you create this resource, the Portworx Operator:
- Creates temporary pods for diagnostics collection.
- Collects node-level diagnostics and Portworx pod logs.
- Stores the diagnostics bundle in the /var/cores directory.
- Deletes the pods after the logs are collected.
The diagnostics bundle includes the following:
- Node diagnostics:
  - Output from pxctl service diags
  - Stack traces
  - Heap logs
- Pod diagnostics:
  - Logs from all Portworx pods
  - Pod YAML manifests
You can also specify the nodes and volumes for which you want to collect diagnostics. For more information, see PortworxDiag CRD.
To collect diagnostics:
- Enable the spec.clusterDiags field in your StorageCluster CR spec.
  StorageCluster CR:
  ...
  spec:
    clusterDiags:
      enabled: true
  ...
- Create a PortworxDiag custom resource. For more information, see PortworxDiag CRD.
Portworx is installed in the portworx namespace by default. If you didn't customize the namespace during installation, use portworx for the namespace field in the manifest. If you used a different namespace, replace <namespace> with your specific namespace.
Example:
apiVersion: portworx.io/v1
kind: PortworxDiag
metadata:
  name: cluster-diag-obj1
  namespace: <namespace> # Replace with your namespace; if default, use portworx
spec:
  portworx:
    generateCore: true
    podDiags: true
    nodes:
      all: true # You can either use `all: true` or specify IDs and/or labels by using the `ids` and/or `labels` fields.
      ids: []
      labels: {}
    volume:
      ids: []
      labels: {}
- Apply the PortworxDiag custom resource.
kubectl apply -f <your-portworxdiag-resource>.yaml
Once diagnostics collection is complete, the status of the custom resource includes the name and path of each diagnostics tar file.
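To read that status from the command line, you could query the resource directly. This is a minimal sketch: the helper name px_diag_status is hypothetical, and it assumes that kubectl resolves the resource name portworxdiag for the PortworxDiag kind in your cluster.

```shell
#!/bin/sh
# Hypothetical helper: prints the status stanza of a PortworxDiag object.
# Usage: px_diag_status <px-namespace> <portworxdiag-name>
px_diag_status() {
  # Assumption: the PortworxDiag kind is addressable as `portworxdiag`.
  kubectl get portworxdiag "$2" -n "$1" -o jsonpath='{.status}'
}
```

For the example manifest above, you would run px_diag_status <namespace> cluster-diag-obj1. Use -o yaml instead of the jsonpath expression if you prefer to see the full object.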
On crash
If a Portworx process crashes on a node, Portworx automatically collects diagnostics on that node. If Pure1 telemetry is enabled, the diagnostics bundle is automatically uploaded to Pure1.
Disable Pure1 integration
To disable the metrics collector and telemetry integration, add the following section in your StorageCluster spec:
...
spec:
  monitoring:
    telemetry:
      enabled: false
...
This removes the telemetry container from the Portworx pod.
If you want to disable only the metrics collector while keeping telemetry active for diagnostics upload, refer to the Customize metrics collector section for more information.
See Generate a complete diagnostics package for the entire CLI syntax.