Collect diagnostics
About diagnostics
The Portworx diagnostics bundle, known as diags, is a support bundle that provides all the necessary information for the Portworx support team to diagnose any problems in your cluster. It includes the following information:
- Portworx journal logs
- Output from common Portworx CLI commands that provide details about cluster, nodes, and volumes
- Basic information about the operating system of the worker nodes
- Stack and heap files of Portworx processes
- Alerts generated by the cluster
- Cores or traces from Portworx processes (if found)
When telemetry is enabled, Portworx automatically uploads the diags to Pure1, Everpure's call home service. This removes the need to manually upload diags to support tickets and reduces the time required to resolve cluster issues.
Prerequisites
Outbound access to the internet to allow connection to Pure1
Enable Pure1 integration
Telemetry to Pure1 is enabled by default when you generate the StorageCluster spec from Portworx Central, unless you disable it by setting spec.monitoring.telemetry.enabled to false.
For information about enabling telemetry, see Portworx Telemetry.
If a Portworx process runs into an issue on a node, Portworx will automatically collect diagnostics. If Pure1 telemetry is enabled, the diagnostics bundle is automatically uploaded to Pure1.
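To check how telemetry is currently configured, you can read the spec.monitoring.telemetry.enabled field back from the StorageCluster object. The following is a minimal sketch rather than an official Portworx tool: the function name telemetry_setting and the KUBECTL override are illustrative, and it assumes the StorageCluster is in the namespace you pass in.

```shell
# Illustrative helper (not part of Portworx): print the value of
# spec.monitoring.telemetry.enabled for each StorageCluster in a namespace.
# Set KUBECTL=oc to use the OpenShift CLI instead of kubectl.
telemetry_setting() {
  namespace=$1
  "${KUBECTL:-kubectl}" get storagecluster -n "$namespace" \
    -o jsonpath='{.items[*].spec.monitoring.telemetry.enabled}'
}

# Example: telemetry_setting <px-namespace>
```

Empty output means the field is not set explicitly in the spec.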
Collect diagnostics
Using pxctl
You can run diagnostics directly from the node using the pxctl CLI. The most commonly used command is:
/opt/pwx/bin/pxctl service diags -a
This command generates a diagnostics bundle on the node. If Pure1 telemetry is enabled, the bundle is automatically uploaded to Pure1.
Portworx also automatically triggers diagnostics collection if a process encounters a critical issue on a node.
For more information about collecting diagnostics using pxctl, see Generate a complete diagnostics package.
Using the PortworxDiag custom resource
The PortworxDiag custom resource is only available when using Portworx Enterprise 3.3.0 and Operator 25.2.0 or later.
The PortworxDiag custom resource allows you to collect diagnostics at the cluster level. When you create this resource, the Portworx Operator:
- Creates temporary pods for diagnostics collection.
- Collects node-level diagnostics and Portworx pod logs.
- Stores the diagnostics bundle in the /var/cores directory.
- Deletes the pods after the logs are collected.
The diagnostics bundle includes the following:
- Node diagnostics (on-demand CRs only):
  - Output from pxctl service diags
  - Stack traces
  - Heap logs
  - Core dumps (when generateCore: true is set in the CR spec)
- Pod diagnostics:
  - Logs from all Portworx pods
  - Pod YAML manifests
Note: Starting with Portworx Enterprise 3.6.1, pod diagnostics also include metadata from Kubernetes CRD resources for Portworx, Stork, KubeVirt, KDMP, and storage-related objects.
You can collect diagnostics using the PortworxDiag CR either on-demand or periodically.
- On-demand diagnostics collection using PortworxDiag CR
- Periodic diagnostics collection using PortworxDiag CR
On-demand diagnostics collection using PortworxDiag CR
- Enable the spec.clusterDiags field in your StorageCluster CR spec.
  StorageCluster CR:
      ...
      spec:
        clusterDiags:
          enabled: true
      ...
- Create a PortworxDiag custom resource. For more information, see PortworxDiag CRD.
  Example (PortworxDiag CR):
      apiVersion: portworx.io/v1
      kind: PortworxDiag
      metadata:
        name: cluster-diag-obj1
        namespace: <portworx> # Replace with the namespace where Portworx is installed
      spec:
        portworx:
          generateCore: true
          podDiags: true
          nodes:
            all: true # You can either use `all: true` or specify IDs and/or labels by using the `ids` and/or `labels` fields.
            ids: []
            labels: {}
          volumes:
            ids: []
            labels: {}
  You can also specify the nodes and volumes for which you want to collect diagnostics. For more information, see PortworxDiag CRD.
- Apply the PortworxDiag custom resource.
      kubectl apply -f <your-portworxdiag-resource>.yaml
Once diagnostics collection is complete, the status of the custom resource includes the name and path of each diagnostics tar file.
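To watch for completion, you can read the object back after applying it. The following is a minimal sketch, assuming kubectl can address the resource as portworxdiag (the lowercase kind name; adjust it if your cluster registers a different resource name). The helper name show_portworxdiag is illustrative. Because the exact layout of the status section is not described here, the sketch prints the whole object instead of extracting specific fields.

```shell
# Illustrative helper (not part of Portworx): print a PortworxDiag object,
# including its status section, as YAML.
# Assumption: the resource can be addressed as "portworxdiag".
# Set KUBECTL=oc to use the OpenShift CLI instead of kubectl.
show_portworxdiag() {
  name=$1
  namespace=$2
  "${KUBECTL:-kubectl}" get portworxdiag "$name" -n "$namespace" -o yaml
}

# Example: show_portworxdiag cluster-diag-obj1 <portworx>
```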
Periodic diagnostics collection using PortworxDiag CR
The Portworx Operator supports automatic, periodic collection of pod-level diagnostics. When enabled, the operator creates PortworxDiag CRs every 4 hours and collects logs from all Portworx pods and pod YAML manifests. If you have enabled telemetry, the collected diagnostics are automatically uploaded to Pure1.
On a fresh install with clusterDiags enabled, the first diagnostics custom resource (CR) is created immediately. If you enable clusterDiags on an existing cluster, the first CR is created after the four-hour interval elapses. If the operator restarts, it resumes the interval based on the creation time of the last periodic CR instead of resetting the timer. If no previous periodic CR exists, for example, if it was deleted manually, a new CR is created immediately after the operator restarts.
Periodic diagnostics collection is supported with Portworx Operator 26.2.0 or later and Portworx Enterprise 3.6.1 or later.
Enable periodic diagnostics collection:
- Set clusterDiags.enabled: true in your StorageCluster specification:
  StorageCluster CR:
      spec:
        clusterDiags:
          enabled: true
- Apply the updated StorageCluster specification.
Automatic cleanup:
- Diagnostic CRs: Completed PortworxDiag CRs are automatically deleted by the operator when the next scheduled PortworxDiag CR is created. This doesn't apply to manually created PortworxDiag CRs.
- Periodic Pod diagnostic tarballs: Periodic Pod diagnostic tarballs (/var/cores/*-periodic-poddiags-*.tar.gz) are cleaned up by the Pure1 phonehome runner. When a new periodic diagnostic tarball is added to a node, tarballs older than two hours are removed. If only one periodic diagnostic tarball exists on a node, it is retained regardless of age until a newer tarball is created. This behavior ensures that at least one periodic diagnostic tarball is always present on each node.
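To see which periodic pod diagnostic tarballs are currently on a node, you can list the files that match the pattern above. This is a minimal sketch: the function name list_periodic_poddiags is illustrative, and the directory argument defaults to /var/cores. It only lists files and does not delete anything, since cleanup is handled by the phonehome runner.

```shell
# Illustrative helper (not part of Portworx): list periodic pod diagnostic
# tarballs in a directory, newest first. Returns non-zero and prints nothing
# if no file matches the pattern.
list_periodic_poddiags() {
  dir=${1:-/var/cores}
  ls -t "$dir"/*-periodic-poddiags-*.tar.gz 2>/dev/null
}

# Example (run on the node): list_periodic_poddiags
```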
How to use diags with support tickets
To resolve a support case, you need to provide your cluster UUID to the Portworx support team. This allows them to retrieve your diags from Pure1 and diagnose the problem in your cluster.
Find your cluster's UUID
Run the following command to get your cluster UUID:
Kubernetes:
kubectl get storagecluster -n <px-namespace>
NAME                                              CLUSTER UUID                           STATUS   VERSION           AGE
px-cluster-xxxxxxxx-xxxx-xxxx-xxxx-ec53d3ba5b39   xxxxxxxx-xxxx-xxxx-xxxx-b2249dff3501   Online   01d934d_d03a058   26h
OpenShift:
oc get storagecluster -n <px-namespace>
NAME                                              CLUSTER UUID                           STATUS   VERSION           AGE
px-cluster-xxxxxxxx-xxxx-xxxx-xxxx-ec53d3ba5b39   xxxxxxxx-xxxx-xxxx-xxxx-b2249dff3501   Online   01d934d_d03a058   26h
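If you want only the UUID, for example to paste into a support ticket, you can filter the table output. This is a minimal sketch that assumes the default column layout shown above, where the UUID is the second whitespace-separated field of each data row. The function name cluster_uuid_from_table is illustrative.

```shell
# Illustrative helper (not part of Portworx): read the table printed by
# `kubectl get storagecluster` (or `oc get storagecluster`) on stdin and
# print the CLUSTER UUID value of each data row, skipping the header line.
cluster_uuid_from_table() {
  awk 'NR > 1 { print $2 }'
}

# Example:
#   kubectl get storagecluster -n <px-namespace> | cluster_uuid_from_table
```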
Configure the core file path on OpenShift (optional)
On OpenShift clusters, if you want core files to be written to /var/cores, configure the kernel core dump path by using a MachineConfig resource. Portworx Enterprise sets a default core path automatically, but you can override it if required.
Apply the following MachineConfig to set the core dump location on all worker nodes:
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-core-pattern
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - path: /etc/sysctl.d/99-core-pattern.conf
          mode: 0644
          contents:
            source: data:text/plain;charset=utf-8,kernel.core_pattern=/var/cores/core-%25e-sig%25s-user%25u-group%25g-pid%25p-time%25t
This configuration sets the kernel.core_pattern sysctl so that core dumps are written to /var/cores with the format: core-<executable>-sig<signal>-user<uid>-group<gid>-pid<pid>-time<timestamp>
Applying this MachineConfig triggers a rolling reboot of worker nodes through the Machine Config Operator. Ensure sufficient cluster capacity before applying it.
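The source value in the MachineConfig is a percent-encoded data URL in which %25 stands for a literal % character. The sketch below decodes that value to show the sysctl line that ends up on the node, and checks whether a node's active core pattern points at /var/cores. Both function names are illustrative, and the decoder handles only the %25 escape, which is the only one used in this manifest.

```shell
# Illustrative helper: strip the data URL prefix and turn each %25 back
# into a literal %, yielding the contents of 99-core-pattern.conf.
decode_core_pattern_source() {
  printf '%s\n' "$1" | sed -e 's|^data:text/plain;charset=utf-8,||' -e 's|%25|%|g'
}

# Illustrative helper: succeed if the core pattern read from the given file
# (default: the kernel's live setting) writes cores under /var/cores.
core_pattern_uses_var_cores() {
  file=${1:-/proc/sys/kernel/core_pattern}
  case "$(cat "$file")" in
    /var/cores/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (run on a worker node after the rollout completes):
#   core_pattern_uses_var_cores && echo "cores go to /var/cores"
```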