Version: 3.3

Diagnostics collection in AKS

About diagnostics

The Portworx diagnostics bundle, known as diags, is a support bundle that provides all the necessary information for the Portworx support team to diagnose any problems in your cluster. It includes the following information:

- Portworx journal logs
- Output from common Portworx CLI commands that provide details about the cluster, nodes, and volumes
- Basic information about the operating system of the worker nodes
- Stack and heap files of Portworx processes
- Alerts generated by the cluster
- Cores or traces from Portworx processes (if found)

Portworx automatically uploads the diags to Pure Storage's call home service, Pure1. This eliminates the manual work of uploading diags to support tickets and reduces the time required to resolve cluster issues.

Prerequisites

Outbound access to the internet to allow connection to Pure1

Enable Pure1 integration

Enabling telemetry adds a new telemetry pod on each node. This pod is responsible for uploading Portworx diagnostics to Pure1.

Fresh installs

Telemetry and the metrics collector are enabled by default for all new clusters when you generate a spec from Portworx Central. You can disable them later in the StorageCluster spec.

Note: If you disabled telemetry during the installation of Portworx and want to enable it now, you must update the StorageCluster spec and restart the pods to load the updated configuration.
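As a rough sketch of the spec update, the helper below builds a JSON merge patch for the `spec.monitoring.telemetry.enabled` field and applies it with `kubectl patch`. The function names `telemetry_patch` and `set_telemetry` are hypothetical, and the cluster name and namespace are placeholders. The sketch does not restart any pods, so that step still has to be done separately.

```shell
#!/bin/sh
# Build a JSON merge patch that sets spec.monitoring.telemetry.enabled.
telemetry_patch() {
  case "$1" in
    true|false) ;;
    *) echo "usage: telemetry_patch true|false" >&2; return 1 ;;
  esac
  printf '{"spec":{"monitoring":{"telemetry":{"enabled":%s}}}}\n' "$1"
}

# Apply the patch to a StorageCluster.
# $1 = StorageCluster name, $2 = namespace, $3 = true|false
set_telemetry() {
  patch=$(telemetry_patch "$3") || return 1
  kubectl patch storagecluster "$1" -n "$2" --type merge -p "$patch"
}

# Example (not run here):
#   set_telemetry <cluster-name> <px-namespace> true
```

Editing the StorageCluster manifest directly and re-applying it should have the same effect; the patch form is only convenient for one-off changes.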

Upgrades

When upgrading to Portworx Operator 23.3.0 or newer, the following conditions apply:

- Telemetry is enabled by default unless it is disabled in the StorageCluster spec or the PX_HTTPS_PROXY environment variable is configured.
- If telemetry is already enabled and the PX_HTTPS_PROXY environment variable is set, telemetry is disabled. When this happens, a Kubernetes warning event is generated to notify you of the change.
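The two upgrade conditions can be read as a small decision rule. The sketch below restates them as a shell function for illustration only; `telemetry_after_upgrade` is a hypothetical helper, not part of Portworx, and its inputs are values you would determine yourself from the StorageCluster spec.

```shell
# Predict the telemetry state after upgrading to Operator 23.3.0 or newer,
# following the two upgrade conditions listed above.
telemetry_after_upgrade() {
  spec_setting="$1"   # "enabled", "disabled", or "unset" in the StorageCluster spec
  proxy_set="$2"      # "yes" if PX_HTTPS_PROXY is configured, otherwise "no"
  if [ "$spec_setting" = "disabled" ] || [ "$proxy_set" = "yes" ]; then
    echo "disabled"
  else
    echo "enabled"
  fi
}

# To look for the warning event after an upgrade (not run here):
#   kubectl get events -n <px-namespace> --field-selector type=Warning
```

For example, `telemetry_after_upgrade enabled yes` prints `disabled`, which matches the second condition.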

How to use diags with support tickets

To resolve a support case, you need to provide your cluster UUID to the Portworx support team. This will allow them to retrieve your diags from Pure1 and diagnose the problem in your cluster.

Find your cluster's UUID

Run the following command to get your cluster UUID:

kubectl get storagecluster -n <px-namespace>

NAME                                              CLUSTER UUID                           STATUS   VERSION           AGE
px-cluster-xxxxxxxx-xxxx-xxxx-xxxx-ec53d3ba5b39   xxxxxxxx-xxxx-xxxx-xxxx-b2249dff3501   Online   01d934d_d03a058   26h
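If you want only the UUID, for example to paste into a support ticket, a small filter over the table output is usually enough. The sketch below assumes the column layout shown above, with the UUID in the second column and already populated; `cluster_uuid` is a hypothetical helper name.

```shell
# Print the CLUSTER UUID column from `kubectl get storagecluster` table output.
# Reads the table on stdin and skips the header row.
cluster_uuid() {
  awk 'NR > 1 && NF >= 2 { print $2 }'
}

# Example (not run here):
#   kubectl get storagecluster -n <px-namespace> | cluster_uuid
```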

Collect diagnostics

Portworx diagnostics are collected in two main ways: on demand and on crash.

On-demand diagnostics

You can collect diagnostics either using the pxctl CLI or the PortworxDiag custom resource.

On-demand diagnostics using `pxctl` CLI

You can run diagnostics directly from the node using the pxctl CLI. The most commonly used command is:

/opt/pwx/bin/pxctl service diags -a

This command generates a diagnostics bundle on the node. If Pure1 telemetry is enabled, the bundle is automatically uploaded to Pure1.
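If you do not have a shell on the node, one option is to run the same command inside a Portworx pod with `kubectl exec`. The sketch below assumes that Portworx pods carry the label `name=portworx`, which may not hold in every installation; `run_diags_via_pod` is a hypothetical helper name. It picks the first matching pod, so the bundle is generated on whichever node hosts that pod.

```shell
# Run `pxctl service diags -a` inside one Portworx pod.
# Assumption: Portworx pods carry the label name=portworx.
# $1 = namespace where Portworx is installed
run_diags_via_pod() {
  ns="$1"
  pod=$(kubectl get pods -n "$ns" -l name=portworx \
          -o jsonpath='{.items[0].metadata.name}') || return 1
  [ -n "$pod" ] || { echo "no Portworx pod found in $ns" >&2; return 1; }
  kubectl exec "$pod" -n "$ns" -- /opt/pwx/bin/pxctl service diags -a
}

# Example (not run here):
#   run_diags_via_pod <px-namespace>
```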

On-demand diagnostics using `PortworxDiag` custom resource

Note: The PortworxDiag custom resource is only available when using Portworx Enterprise 3.3.0 and Operator 25.2.0 or later.

The PortworxDiag custom resource allows you to collect diagnostics at the cluster level. When you create this resource, the Portworx Operator:

- Creates temporary pods for diagnostics collection.
- Collects node-level diagnostics and Portworx pod logs.
- Stores the diagnostics bundle in the /var/cores directory.
- Deletes the pods after the logs are collected.

The diagnostics bundle includes the following:

Node diagnostics:
- Output from pxctl service diags
- Stack traces
- Heap logs
Pod diagnostics:
- Logs from all Portworx pods
- Pod YAML manifests

You can also specify the nodes and volumes for which you want to collect diagnostics. For more information, see PortworxDiag CRD.

To collect diagnostics:

Enable the spec.clusterDiags field in your StorageCluster CR spec.
StorageCluster CR
```
...
spec:
  clusterDiags:
    enabled: true
...
```
Create a PortworxDiag custom resource. For more information, see PortworxDiag CRD.

Portworx is installed in the portworx namespace by default. If you didn’t customize the namespace during installation, use portworx for the namespace field in the manifest. If you used a different namespace, replace <NAMESPACE> with your specific namespace.

Example:

PortworxDiag CR
apiVersion: portworx.io/v1
kind: PortworxDiag
metadata:
  name: cluster-diag-obj1
  namespace: <NAMESPACE> # Replace with your namespace; use portworx if you kept the default
spec:
  portworx:
    generateCore: true
    podDiags: true
    nodes:
      # Either set `all: true` or select nodes with the `ids` and/or `labels` fields.
      all: true
      ids: []
      labels: {}
    volume:
      ids: []
      labels: {}

Apply the PortworxDiag custom resource.

kubectl apply -f <your-portworxdiag-resource>.yaml
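The manifest and apply steps above can also be combined so that the namespace placeholder is filled in for you. This is a minimal sketch that reproduces the example manifest; `render_portworxdiag` is a hypothetical helper name, and the namespace defaults to `portworx` when the second argument is omitted.

```shell
# Print a PortworxDiag manifest that mirrors the example above.
# $1 = resource name, $2 = namespace (defaults to portworx)
render_portworxdiag() {
  name="$1"
  ns="${2:-portworx}"
  cat <<EOF
apiVersion: portworx.io/v1
kind: PortworxDiag
metadata:
  name: ${name}
  namespace: ${ns}
spec:
  portworx:
    generateCore: true
    podDiags: true
    nodes:
      all: true
      ids: []
      labels: {}
    volume:
      ids: []
      labels: {}
EOF
}

# Example (not run here):
#   render_portworxdiag cluster-diag-obj1 portworx | kubectl apply -f -
```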

Once diagnostics collection is complete, the status of the custom resource includes the name and path of each diagnostics tar file.
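To read that status, you can fetch the resource as YAML and print only its top-level `status` section. The sketch below assumes that kubectl resolves the resource type as `portworxdiag`, and it makes no assumption about the field names inside `status`; both helper names are hypothetical.

```shell
# Print only the top-level `status:` section of a YAML document read from stdin.
status_section() {
  awk '/^status:/ { p = 1; print; next } /^[^ ]/ { p = 0 } p'
}

# Fetch a PortworxDiag resource and show its status.
# $1 = PortworxDiag name, $2 = namespace
# Assumption: kubectl resolves the resource type as `portworxdiag`.
show_diag_status() {
  kubectl get portworxdiag "$1" -n "$2" -o yaml | status_section
}

# Example (not run here):
#   show_diag_status cluster-diag-obj1 <px-namespace>
```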

On crash

If a Portworx process runs into an issue on a node, Portworx automatically collects diagnostics. If Pure1 telemetry is enabled, the diagnostics bundle is automatically uploaded to Pure1.

Disable Pure1 integration

Operator-based install

To disable the metrics collector and telemetry integration, add the following section to your StorageCluster spec:

StorageCluster CR
...
spec:
  monitoring:
    telemetry:
      enabled: false
...

This removes the telemetry container from the Portworx pod.

See Generate a complete diagnostics package for the entire CLI syntax.
