Version: 3.6

Collect diagnostics

About diagnostics

The Portworx diagnostics bundle, known as diags, is a support bundle that provides all the necessary information for the Portworx support team to diagnose any problems in your cluster. It includes the following information:

  • Portworx journal logs
  • Output from common Portworx CLI commands that provide details about cluster, nodes, and volumes
  • Basic information about the operating system of the worker nodes
  • Stack and heap files of Portworx processes
  • Alerts generated by the cluster
  • Cores or traces from Portworx processes (if found)

When telemetry is enabled, Portworx automatically uploads the diags to Pure1, Everpure's call home service. This removes the need to attach diags to support tickets manually and reduces the time required to resolve cluster issues.

Prerequisites

Outbound access to the internet to allow connection to Pure1

Enable Pure1 integration

Telemetry to Pure1 is enabled by default when you generate the StorageCluster spec from Portworx Central, unless you disable it by setting spec.monitoring.telemetry.enabled to false. For information about enabling telemetry, see Portworx Telemetry.
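To confirm how telemetry is set on a running cluster, you can read the field back from the StorageCluster object. The following is a minimal sketch; the function name `px_telemetry_enabled` and its two arguments (StorageCluster name and namespace) are illustrative placeholders, not part of Portworx.

```shell
# Print the value of spec.monitoring.telemetry.enabled for a StorageCluster.
# Arguments: <storagecluster-name> <px-namespace> (values you supply).
px_telemetry_enabled() {
  kubectl get storagecluster "$1" -n "$2" \
    -o jsonpath='{.spec.monitoring.telemetry.enabled}'
}

# Usage: px_telemetry_enabled <storagecluster-name> <px-namespace>
```

An empty result means the field is not set explicitly in the spec.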

If a Portworx process runs into an issue on a node, Portworx will automatically collect diagnostics. If Pure1 telemetry is enabled, the diagnostics bundle is automatically uploaded to Pure1.

Collect diagnostics

Using pxctl

You can run diagnostics directly from the node using the pxctl CLI. The most commonly used command is:

/opt/pwx/bin/pxctl service diags -a

This command generates a diagnostics bundle on the node. If Pure1 telemetry is enabled, the bundle is automatically uploaded to Pure1.

Portworx also automatically triggers diagnostics collection if a process encounters a critical issue on a node.

For more information about collecting diagnostics using pxctl, see Generate a complete diagnostics package.
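If you do not have a shell on the node, the same pxctl command can typically be run through a Portworx pod. The sketch below assumes that the Portworx pods carry the label `name=portworx`; the function name `px_collect_diags` is hypothetical. The bundle is generated on the node where the selected pod runs.

```shell
# Run "pxctl service diags -a" inside the first Portworx pod found.
# Argument: <px-namespace>.
# Assumption: Portworx pods are labeled name=portworx; adjust the selector if not.
px_collect_diags() {
  local ns="$1" pod
  pod=$(kubectl get pods -n "$ns" -l name=portworx \
        -o jsonpath='{.items[0].metadata.name}') || return 1
  [ -n "$pod" ] || { echo "no Portworx pod found in $ns" >&2; return 1; }
  kubectl exec "$pod" -n "$ns" -- /opt/pwx/bin/pxctl service diags -a
}

# Usage: px_collect_diags <px-namespace>
```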

Using the PortworxDiag custom resource

note

The PortworxDiag custom resource is only available when using Portworx Enterprise 3.3.0 and Operator 25.2.0 or later.

The PortworxDiag custom resource allows you to collect diagnostics at the cluster level. When you create this resource, the Portworx Operator:

  1. Creates temporary pods for diagnostics collection.
  2. Collects node-level diagnostics and Portworx pod logs.
  3. Stores the diagnostics bundle in the /var/cores directory.
  4. Deletes the pods after the logs are collected.

The diagnostics bundle includes the following:

  • Node diagnostics (on-demand CRs only):

    • Output from pxctl service diags
    • Stack traces
    • Heap logs
    • Core dumps (when generateCore: true is set in the CR spec)
  • Pod diagnostics:

    • Logs from all Portworx pods
    • Pod YAML manifests
    note

    Starting with Portworx Enterprise 3.6.1, pod diagnostics also include metadata from Kubernetes CRD resources for Portworx, Stork, KubeVirt, KDMP, and storage-related objects.

You can collect diagnostics using the PortworxDiag CR either on-demand or periodically.

On-demand diagnostics collection using PortworxDiag CR

  1. Enable the spec.clusterDiags field in your StorageCluster CR spec.

    StorageCluster CR
    ...
    spec:
      clusterDiags:
        enabled: true
    ...
  2. Create a PortworxDiag custom resource. For more information, see PortworxDiag CRD.

    Example:

    PortworxDiag CR
    apiVersion: portworx.io/v1
    kind: PortworxDiag
    metadata:
      name: cluster-diag-obj1
      namespace: <portworx> # Replace with the namespace where Portworx is installed
    spec:
      portworx:
        generateCore: true
        podDiags: true
        nodes:
          all: true # You can either use `all: true` or specify IDs and/or labels by using the `ids` and/or `labels` fields.
          ids: []
          labels: {}
        volumes:
          ids: []
          labels: {}

    You can also specify the nodes and volumes for which you want to collect diagnostics. For more information, see PortworxDiag CRD.

  3. Apply the PortworxDiag custom resource.

    kubectl apply -f <your-portworxdiag-resource>.yaml

Once diagnostics collection is complete, the status of the custom resource includes the name and path of each diagnostics tar file.
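To read that status, you can query the custom resource directly. This is a sketch only: it assumes the resource can be addressed as `portworxdiag` in kubectl, and it prints the whole `status` object because the individual status field names are not listed here. The function name `px_diag_status` is hypothetical.

```shell
# Print the status block of a PortworxDiag custom resource.
# Arguments: <portworxdiag-name> <px-namespace>.
px_diag_status() {
  kubectl get portworxdiag "$1" -n "$2" -o jsonpath='{.status}'
}

# Usage: px_diag_status cluster-diag-obj1 <px-namespace>
```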

Periodic diagnostics collection using PortworxDiag CR

The Portworx Operator supports automatic, periodic collection of pod-level diagnostics. When enabled, the operator creates a PortworxDiag CR every four hours and collects logs and pod YAML manifests from all Portworx pods. If you have enabled telemetry, the collected diagnostics are automatically uploaded to Pure1.

On a fresh install with clusterDiags enabled, the first diagnostics custom resource (CR) is created immediately. If you enable clusterDiags on an existing cluster, the first CR is created after the four-hour interval elapses. If the operator restarts, it resumes the interval based on the creation time of the last periodic CR instead of resetting the timer. If no previous periodic CR exists, for example, if it was deleted manually, a new CR is created immediately after the operator restarts.
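The restart behavior can be illustrated with a little arithmetic. The sketch below is not the operator's code; it only models the rule described above, where the next CR is due four hours after the last periodic CR was created. The function name `seconds_until_next_diag` is hypothetical, and both inputs are Unix epoch seconds.

```shell
# Seconds remaining until the next periodic CR is due.
# Arguments: <creation time of last periodic CR> <current time>, both epoch seconds.
seconds_until_next_diag() {
  local last="$1" now="$2" interval=$((4 * 3600))
  local remaining=$(( last + interval - now ))
  # An overdue interval means the next CR is due immediately.
  [ "$remaining" -gt 0 ] || remaining=0
  echo "$remaining"
}

# Usage: seconds_until_next_diag <last-cr-epoch> "$(date +%s)"
```

For example, if the last periodic CR was created one hour before the operator restarted, three hours remain instead of a fresh four.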

note

Periodic diagnostics collection is supported with Portworx Operator 26.2.0 or later and Portworx Enterprise 3.6.1 or later.

Enable periodic diagnostics collection:

  1. Set clusterDiags.enabled: true in your StorageCluster specification:

    StorageCluster CR
    spec:
      clusterDiags:
        enabled: true
  2. Apply the updated StorageCluster specification.
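As an alternative to editing and re-applying the full specification, the same field can be set with a merge patch. This is a minimal sketch; the function name `px_enable_cluster_diags` and its arguments are placeholders.

```shell
# Set spec.clusterDiags.enabled to true on a StorageCluster using a merge patch.
# Arguments: <storagecluster-name> <px-namespace>.
px_enable_cluster_diags() {
  kubectl patch storagecluster "$1" -n "$2" --type merge \
    -p '{"spec":{"clusterDiags":{"enabled":true}}}'
}

# Usage: px_enable_cluster_diags <storagecluster-name> <px-namespace>
```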

Automatic cleanup:

  • Diagnostic CRs: Completed PortworxDiag CRs are automatically deleted by the operator when the next scheduled PortworxDiag CR is created. This doesn't apply to manually created PortworxDiag CRs.
  • Periodic Pod diagnostic tarballs: Periodic Pod diagnostic tarballs (/var/cores/*-periodic-poddiags-*.tar.gz) are cleaned up by the Pure1 phonehome runner. When a new periodic diagnostic tarball is added to a node, tarballs older than two hours are removed. If only one periodic diagnostic tarball exists on a node, it is retained regardless of age until a newer tarball is created. This behavior ensures that at least one periodic diagnostic tarball is always present on each node.
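To see which periodic tarballs on a node already exceed the two-hour threshold, a read-only listing such as the following can help. It is a sketch that only mirrors the age criterion described above. It does not reproduce the phonehome runner's logic and deletes nothing. The function name `list_stale_periodic_diags` is hypothetical.

```shell
# List periodic pod diagnostic tarballs last modified more than 120 minutes ago.
# Argument: directory to scan (defaults to /var/cores).
list_stale_periodic_diags() {
  local dir="${1:-/var/cores}"
  find "$dir" -maxdepth 1 -type f \
    -name '*-periodic-poddiags-*.tar.gz' -mmin +120
}

# Usage (on the node): list_stale_periodic_diags /var/cores
```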

How to use diags with support tickets

To resolve a support case, you need to provide your cluster UUID to the Portworx support team. This allows them to retrieve your diags from Pure1 and diagnose the problem in your cluster.

Find your cluster's UUID

Run the following command to get your cluster UUID:

kubectl get storagecluster -n <px-namespace>
NAME                                              CLUSTER UUID                           STATUS   VERSION           AGE
px-cluster-xxxxxxxx-xxxx-xxxx-xxxx-ec53d3ba5b39   xxxxxxxx-xxxx-xxxx-xxxx-b2249dff3501   Online   01d934d_d03a058   26h
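If you only need the UUID, for example to paste it into a support ticket, you can extract the second column of that output. This sketch relies on the column layout shown above. The function name `px_cluster_uuid` is a placeholder.

```shell
# Print only the CLUSTER UUID column for each StorageCluster in a namespace.
# Argument: <px-namespace>.
px_cluster_uuid() {
  kubectl get storagecluster -n "$1" --no-headers | awk '{print $2}'
}

# Usage: px_cluster_uuid <px-namespace>
```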

Configure the core file path on OpenShift (optional)

On OpenShift clusters, if you want core files to be written to /var/cores, configure the kernel core dump path by using a MachineConfig resource. Portworx Enterprise sets a default core path automatically, but you can override it if required.

Apply the following MachineConfig to set the core dump location on all worker nodes:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-core-pattern
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - path: /etc/sysctl.d/99-core-pattern.conf
          mode: 0644
          contents:
            source: data:text/plain;charset=utf-8,kernel.core_pattern=/var/cores/core-%25e-sig%25s-user%25u-group%25g-pid%25p-time%25t

This configuration sets the core_pattern sysctl so that core dumps are written to /var/cores with the format:
core-<executable>-sig<signal>-user<uid>-group<gid>-pid<pid>-time<timestamp>
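The `%25` sequences in the `source` value are percent-encoded `%` characters, which is how a literal `%` is written inside a data URL. The small sketch below shows the pattern that results after decoding. The function name `decode_percent` is illustrative, and it handles only the `%25` escape used in this example.

```shell
# Replace each percent-encoded "%25" with a literal "%".
decode_percent() {
  printf '%s\n' "$1" | sed 's/%25/%/g'
}

decode_percent '/var/cores/core-%25e-sig%25s-user%25u-group%25g-pid%25p-time%25t'
# prints: /var/cores/core-%e-sig%s-user%u-group%g-pid%p-time%t
```

After the nodes reboot, you can compare this with the active value by reading /proc/sys/kernel/core_pattern on a worker node.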

note

Applying this MachineConfig triggers a rolling reboot of worker nodes through the Machine Config Operator. Ensure sufficient cluster capacity before applying it.