Operate and troubleshoot Autopilot in ARO
Summary and Key concepts
Summary
This article provides operational procedures for monitoring and troubleshooting Portworx Autopilot. It explains how to gather information about objects monitored by Autopilot using autopilotruleobjects, which are created when an object’s conditions are triggered. The article includes commands for listing and describing these objects, and for filtering them by rule. It also covers steps for collecting troubleshooting data, including creating a support bundle by capturing diagnostic information and logs from the Autopilot pod, which can then be sent to Portworx support for further analysis.
Kubernetes Concepts
- Custom Resource Definitions (CRD): autopilotruleobject is a custom resource created to track the state of objects monitored by Autopilot.
- oc get: A command to list Kubernetes objects, such as autopilotruleobjects, in OpenShift clusters.
- oc exec: Executes commands in a running pod, such as sending signals to the Autopilot process for diagnostics.
Portworx Concepts
- Autopilot: Automates storage operations such as volume resizing or storage pool rebalancing.
This section provides common operational procedures for monitoring and troubleshooting your Autopilot installation.
Troubleshooting objects monitored by Autopilot
Follow the steps in the sections below to troubleshoot objects monitored by Autopilot.
Get recent statuses using AutopilotRuleObjects
For each object it monitors, Autopilot creates a corresponding autopilotruleobject instance in the namespace of that object.
- For volumes (PVCs), the autopilotruleobject instance will be in the namespace of the PVC.
- For storage pools, the autopilotruleobject instance will be in the namespace where Portworx is installed.
The autopilotruleobject is created only if the object's conditions were triggered at least once. In other words, you will not see an autopilotruleobject for an object that has always been in the Normal state.
List all autopilotruleobjects
The following command lists all Autopilot rule objects in all namespaces:
oc get autopilotruleobjects --all-namespaces
Instead of entering the full autopilotruleobjects string, you can use the aro alias.
oc get aro --all-namespaces
NAMESPACE   NAME                                       AGE
pg1         pvc-xxxxxxxx-xxxx-xxxx-xxxx-03263da7b04c   1s
Describe a specific object
To view an object's recent statuses, describe it. The Status section of the output contains the list:
oc describe aro -n pg1 pvc-xxxxxxxx-xxxx-xxxx-xxxx-03263da7b04c
Name:         pvc-xxxxxxxx-xxxx-xxxx-xxxx-03263da7b04c
Namespace:    pg1
Labels:       rule=volume-resize
Annotations:  <none>
API Version:  autopilot.libopenstorage.org/v1alpha1
Kind:         AutopilotRuleObject
Metadata:
  Creation Timestamp:  2020-08-26T22:29:45Z
  Generation:          2
  Owner References:
    API Version:           autopilot.libopenstorage.org/v1alpha1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  AutopilotRule
    Name:                  volume-resize
    UID:                   xxxxxxxx-xxxx-xxxx-xxxx-62fbd2d5dbbc
  Resource Version:        7554069
  Self Link:               /apis/autopilot.libopenstorage.org/v1alpha1/namespaces/pg1/autopilotruleobjects/pvc-xxxxxxxx-xxxx-xxxx-xxxx-03263da7b04c
  UID:                     xxxxxxxx-xxxx-xxxx-xxxx-37fe57310baf
Status:
  Items:
    Last Process Timestamp:  2020-08-26T22:29:45Z
    Message:                 rule: volume-resize:pvc-xxxxxxxx-xxxx-xxxx-xxxx-03263da7b04c transition from Normal => Triggered
    State:                   Triggered
    Last Process Timestamp:  2020-08-26T22:30:19Z
    Message:                 rule: volume-resize:pvc-xxxxxxxx-xxxx-xxxx-xxxx-03263da7b04c transition from Triggered => ActionAwaitingApproval
    State:                   ActionAwaitingApproval
Events:                      <none>
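When only the most recent state matters, the describe output above can be reduced to a single value. The following is a minimal sketch, not part of Autopilot: latest_state is a hypothetical helper name, and it assumes the output layout shown above, where each status item ends with a State: line and the items appear oldest first.

```shell
# latest_state (hypothetical helper): read `oc describe aro` output on stdin
# and print the value of the last "State:" line, i.e. the most recent state.
# Prints nothing if the input contains no State: line.
latest_state() {
  awk '$1 == "State:" { state = $2 } END { if (state != "") print state }'
}
```

For the object in the example above, piping the output of oc describe aro -n pg1 pvc-xxxxxxxx-xxxx-xxxx-xxxx-03263da7b04c into latest_state prints ActionAwaitingApproval.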
List autopilotruleobjects for a given autopilotrule
You can use the label selector rule=<RULE_NAME> to list only the autopilotruleobjects that belong to a given autopilotrule.
oc get aro --all-namespaces -l rule=volume-resize
NAMESPACE   NAME                                       AGE
pg1         pvc-xxxxxxxx-xxxx-xxxx-xxxx-03263da7b04c   4m3s
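The label selector can also be combined with the describe command to inspect every object that belongs to one rule. The following is a rough sketch under stated assumptions: describe_rule_objects is a hypothetical function name, and it relies on the standard --no-headers flag of oc get so that each output line starts with the namespace followed by the object name.

```shell
# describe_rule_objects (hypothetical helper): describe every
# autopilotruleobject that carries the label rule=<RULE_NAME>.
describe_rule_objects() {
  if [ -z "$1" ]; then
    echo "usage: describe_rule_objects <RULE_NAME>" >&2
    return 1
  fi
  # With --all-namespaces and --no-headers, each line is: NAMESPACE NAME AGE
  oc get aro --all-namespaces -l "rule=$1" --no-headers |
    while read -r aro_ns aro_name _; do
      # Redirect stdin so the describe call cannot consume the list being read.
      oc describe aro -n "$aro_ns" "$aro_name" < /dev/null
    done
}
```

For example, describe_rule_objects volume-resize describes each object created for the volume-resize rule, one after another.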
Troubleshooting Autopilot
- Create a directory (ap-cores) in which to store your support bundle files, and send the support signal to the autopilot process:
  mkdir ap-cores
  POD=$(oc get pods -n portworx -l name=autopilot | grep -v NAME | awk '{print $1}')
  oc exec -n portworx $POD -- killall -SIGUSR1 autopilot
- Copy the support bundle files from your OCP cluster to your directory:
  oc cp portworx/$POD:/tmp/aut-diags.zip ap-cores/aut-diags.zip
  ls ap-cores
- Collect your autopilot pod logs into an autopilot-pod.log file within your ap-cores directory:
  oc logs $POD -n portworx --tail=99999 > ap-cores/autopilot-pod.log
Once you've created a support bundle and collected your logs, send all of the files in the ap-cores/ directory to Portworx support.
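The collection steps above can be wrapped in one function so that a missing pod or a failed copy stops the run early. This is a sketch rather than an official tool: collect_autopilot_bundle and its variable names are hypothetical, the namespace and directory defaults mirror the commands above, and it assumes the bundle at /tmp/aut-diags.zip is available by the time the copy runs, as the steps above imply.

```shell
# collect_autopilot_bundle (hypothetical helper)
#   $1: namespace where Autopilot runs (default: portworx)
#   $2: local output directory (default: ap-cores)
collect_autopilot_bundle() {
  ap_ns=${1:-portworx}
  ap_dir=${2:-ap-cores}
  mkdir -p "$ap_dir" || return 1

  # Take the first pod that matches the label used in the steps above.
  ap_pod=$(oc get pods -n "$ap_ns" -l name=autopilot --no-headers |
    awk 'NR == 1 { print $1 }')
  if [ -z "$ap_pod" ]; then
    echo "no autopilot pod found in namespace $ap_ns" >&2
    return 1
  fi

  # Send the support signal, then copy the bundle and save the pod logs.
  oc exec -n "$ap_ns" "$ap_pod" -- killall -SIGUSR1 autopilot || return 1
  oc cp "$ap_ns/$ap_pod:/tmp/aut-diags.zip" "$ap_dir/aut-diags.zip" || return 1
  oc logs "$ap_pod" -n "$ap_ns" --tail=99999 > "$ap_dir/autopilot-pod.log"
}
```

Calling collect_autopilot_bundle with no arguments uses the portworx namespace and the ap-cores directory. If the copy fails because the bundle has not been written yet, rerunning the function after a short pause is a reasonable first check.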