Operate and troubleshoot Autopilot in AWS EKS
Summary and Key concepts
Summary:
This article provides troubleshooting procedures for objects monitored by Portworx Autopilot. It explains how to view the status of objects through autopilotruleobject instances, which are created when a monitored object triggers a condition. Commands are provided to list and describe these rule objects, showing recent statuses and transitions. Additionally, the article covers how to generate a support bundle by signaling the Autopilot process, copying diagnostic files from the cluster, and capturing logs, which can then be sent to Portworx support for further assistance.
Kubernetes Concepts:
- Custom Resource Definitions (CRD): autopilotruleobject is a custom resource created when an Autopilot rule is triggered, storing the monitored object’s status and actions.
- kubectl exec: Executes commands in a running pod, such as sending a signal to the Autopilot process to create diagnostic files.
- kubectl get: Lists Kubernetes objects like autopilotruleobjects.
Portworx Concepts:
- Autopilot: Automates storage management tasks based on conditions and actions, like resizing volumes or rebalancing storage pools.
This section provides common operational procedures for monitoring and troubleshooting your Autopilot installation.
Troubleshooting objects monitored by Autopilot
Follow the steps in the sections below to troubleshoot objects monitored by Autopilot.
Get recent statuses using AutopilotRuleObjects
Autopilot creates a corresponding autopilotruleobject instance for each object it monitors, in the namespace of that object.
- For volumes (PVCs), the autopilotruleobject instance will be in the namespace of the PVC.
- For storage pools, the autopilotruleobject instance will be in the namespace where Portworx is installed.
The autopilotruleobject is created only if an object's conditions were triggered at least once. In other words, you will not see an autopilotruleobject if the object was always in the normal state.
List all autopilotruleobjects
The following command lists all Autopilot rule objects in all namespaces:
kubectl get autopilotruleobjects --all-namespaces
Instead of entering the full autopilotruleobjects string, you can use the aro alias:
kubectl get aro --all-namespaces
NAMESPACE NAME AGE
pg1 pvc-xxxxxxxx-xxxx-xxxx-xxxx-03263da7b04c 1s
Describe a specific object
The Status section contains a list of recent object statuses:
kubectl describe aro -n pg1 pvc-xxxxxxxx-xxxx-xxxx-xxxx-03263da7b04c
Name: pvc-xxxxxxxx-xxxx-xxxx-xxxx-03263da7b04c
Namespace: pg1
Labels: rule=volume-resize
Annotations: <none>
API Version: autopilot.libopenstorage.org/v1alpha1
Kind: AutopilotRuleObject
Metadata:
Creation Timestamp: 2020-08-26T22:29:45Z
Generation: 2
Owner References:
API Version: autopilot.libopenstorage.org/v1alpha1
Block Owner Deletion: true
Controller: true
Kind: AutopilotRule
Name: volume-resize
UID: xxxxxxxx-xxxx-xxxx-xxxx-62fbd2d5dbbc
Resource Version: 7554069
Self Link: /apis/autopilot.libopenstorage.org/v1alpha1/namespaces/pg1/autopilotruleobjects/pvc-xxxxxxxx-xxxx-xxxx-xxxx-03263da7b04c
UID: xxxxxxxx-xxxx-xxxx-xxxx-37fe57310baf
Status:
Items:
Last Process Timestamp: 2020-08-26T22:29:45Z
Message: rule: volume-resize:pvc-xxxxxxxx-xxxx-xxxx-xxxx-03263da7b04c transition from Normal => Triggered
State: Triggered
Last Process Timestamp: 2020-08-26T22:30:19Z
Message: rule: volume-resize:pvc-xxxxxxxx-xxxx-xxxx-xxxx-03263da7b04c transition from Triggered => ActionAwaitingApproval
State: ActionAwaitingApproval
Events: <none>
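When you only need the current state of an object, you can filter the describe output rather than reading the full listing. The following is a minimal sketch, assuming the output keeps the State: lines shown above and that the last one is the most recent; latest_aro_state is a hypothetical helper name, not part of Autopilot or kubectl.

```shell
# Hypothetical helper: read `kubectl describe aro` output on stdin and
# print the value of the last "State:" line, which is the most recent
# status item in the sample output above.
# Returns non-zero if no "State:" line is present.
latest_aro_state() {
  awk '
    $1 == "State:" { state = $2 }
    END {
      if (state == "") exit 1
      print state
    }'
}

# Example usage (requires cluster access):
#   kubectl describe aro -n pg1 <OBJECT_NAME> | latest_aro_state
```

Parsing describe output is convenient for ad hoc checks, but the text layout is not a stable interface, so treat this as a troubleshooting aid rather than something to build automation on.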
List autopilotruleobjects for a given autopilotrule
You can use the label selector rule=<RULE_NAME> to list autopilotruleobjects only for that autopilotrule.
kubectl get aro --all-namespaces -l rule=volume-resize
NAMESPACE NAME AGE
pg1 pvc-xxxxxxxx-xxxx-xxxx-xxxx-03263da7b04c 4m3s
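If you run this query often, the selector can be wrapped in a small shell function. This is a sketch under stated assumptions: aro_for_rule is a hypothetical name, and it relies on kubectl's --no-headers flag so that each output line is NAMESPACE, NAME, AGE as in the listing above.

```shell
# Hypothetical helper: print "namespace/name" for every
# autopilotruleobject labeled with the given rule name.
aro_for_rule() {
  rule="$1"
  if [ -z "$rule" ]; then
    echo "usage: aro_for_rule <RULE_NAME>" >&2
    return 2
  fi
  # --no-headers drops the NAMESPACE/NAME/AGE header row, so awk sees
  # only data rows: column 1 is the namespace, column 2 is the name.
  kubectl get aro --all-namespaces -l "rule=${rule}" --no-headers |
    awk '{ print $1 "/" $2 }'
}

# Example usage (requires cluster access):
#   aro_for_rule volume-resize
```

The namespace/name form makes it easy to pass each result on to a describe command for that object.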
Troubleshooting Autopilot
- Create a directory (ap-cores) in which to store your support bundle files and send the support signal to the Autopilot process:
  mkdir ap-cores
  POD=$(kubectl get pods -n portworx -l name=autopilot | grep -v NAME | awk '{print $1}')
  kubectl exec -n portworx $POD -- killall -SIGUSR1 autopilot
- Copy the support bundle files from your Kubernetes cluster to your directory:
  kubectl cp portworx/$POD:/tmp/aut-diags.zip ap-cores/aut-diags.zip
  ls ap-cores
- Collect and place your Autopilot pod logs into an autopilot-pod.log file within your temporary directory:
  kubectl logs $POD -n portworx --tail=99999 > ap-cores/autopilot-pod.log
Once you've created a support bundle and collected your logs, send all of the files in the ap-cores/ directory to Portworx support.
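The POD lookup in the first step assumes that exactly one Autopilot pod matches the name=autopilot label. If none or several match, the later kubectl exec, kubectl cp, and kubectl logs commands fail or target the wrong pod. The sketch below adds a guard for that case; single_pod_name is a hypothetical helper, not part of Portworx or kubectl.

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and print
# the pod name only when exactly one pod is listed. The header row
# (first column "NAME") and blank lines are ignored.
single_pod_name() {
  awk '
    NF > 0 && $1 != "NAME" { name = $1; count++ }
    END {
      if (count == 1) { print name; exit 0 }
      printf "expected 1 autopilot pod, found %d\n", count > "/dev/stderr"
      exit 1
    }'
}

# Example usage (requires cluster access):
#   POD=$(kubectl get pods -n portworx -l name=autopilot | single_pod_name) || exit 1
```

Failing early here avoids sending the support signal with an empty or ambiguous pod name.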