Advanced procedures

This topic provides information about troubleshooting and manually recovering data in PDS.

Troubleshoot diverged GTIDs in MySQL

The MySQL data service in PDS handles most pod crashes and outages automatically. For example, instances can fail over and rejoin the cluster on reboot. In some cases, however, a pod is unable to reboot the cluster after an outage and keeps failing with the following error:

The instance `instance-a` has an incompatible Global Transaction Identifier (GTID) set with the seed instance `instance-b` (GTIDs diverged). If you wish to proceed, the `force` option must be explicitly set.

This means that the instances cannot agree on which of them should become the new primary, because the data on those instances has diverged.

To troubleshoot this issue:

  1. Review the GTIDs in the binary logs of the instances and choose the instance that contains the latest or most appropriate changes to continue with. You can inspect the transactions on the instances by:

    • opening a shell into the mysql container of the pods

    • using MySQL tools such as mysql and mysqlbinlog
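
    For example, a minimal way to compare what each instance has applied (the login user and the binary log file name are placeholders):

    # show the GTID set this instance has already applied
    mysql -u root -p -e "SELECT @@GLOBAL.gtid_executed;"

    # list the GTID events recorded in a binary log file
    mysqlbinlog --base64-output=decode-rows -vv <binlog-file> | grep GTID_NEXT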

  2. Once you have selected the instance to use as the seed, force reboot the cluster by executing the following commands inside the mysql container of the selected pod:

    seed_instance=$(hostname -f)
    mysqlsh --host=$seed_instance --user=innodb-config --password=$password -- dba reboot-cluster-from-complete-outage --force --primary=$seed_instance:3306
  3. Check the cluster status and wait for the cluster to recover:

    mysqlsh --host=$seed_instance --user=innodb-config --password=$password -- cluster status

If the cluster does not become healthy, or if some nodes do not come online, continue with the following:

  • removing the failing instances:

    mysqlsh ... -- cluster remove-instance <other_instance>
  • and re-adding the instances:

    mysqlsh ... -- cluster add-instance <other_instance> --recoveryMethod=clone
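
For example, assuming the same connection options as in the steps above and a failing instance named instance-c (a placeholder):

    mysqlsh --host=$seed_instance --user=innodb-config --password=$password -- cluster remove-instance instance-c:3306
    mysqlsh --host=$seed_instance --user=innodb-config --password=$password -- cluster add-instance instance-c:3306 --recoveryMethod=clone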

See restoring and rebooting a cluster for more information.

Recover Cassandra pods from corrupt commit logs

After you deploy the Cassandra data service and reboot the worker nodes, the Cassandra pods might fail to come up and form the cluster because of corrupt commit logs:

cassandra ERROR 18:22:11 Exiting due to error while processing commit log during initialization.
cassandra org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException: Mutation checksum failure at 23031717 in Next section at 23028485 in CommitLog-7-1676881531203.log

To recover Cassandra pods from the corrupt commit logs:

  1. Scale the deployment pds-deployment-controller-manager to 0 replicas.
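
    For example (the namespace is a placeholder):

    kubectl scale deployment pds-deployment-controller-manager -n <namespace> --replicas=0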

  2. Edit the Cassandra statefulset by adding the following line under spec.template.spec.containers:

    command: ["/bin/sleep", "3650d"]
    Note: The statefulset name is identical to the deployment name in the PDS UI.
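
    A minimal sketch of the relevant part of the statefulset after the edit (the container name cassandra is an assumption):

    spec:
      template:
        spec:
          containers:
            - name: cassandra
              command: ["/bin/sleep", "3650d"]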

  3. Delete all Cassandra pods and wait for pod 0 to start running (it starts running, but never becomes ready).
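
    For example, given that statefulset pods are named <statefulset-name>-0, <statefulset-name>-1, and so on (the namespace and statefulset name are placeholders):

    kubectl delete pods -n <namespace> <statefulset-name>-0 <statefulset-name>-1 <statefulset-name>-2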

  4. Shell into pod 0 and delete the corrupt commit log. For example:

    rm /srv/pds/data/commitlog/CommitLog-7-1676881531203.log
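
    The shell itself can be opened with the same kubectl exec pattern used elsewhere in this topic (the namespace and statefulset name are placeholders):

    kubectl exec -it -n <namespace> <statefulset-name>-0 -- bash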
  5. Exit the shell of pod 0 and scale pds-deployment-controller-manager back to 1.

  6. Wait approximately 30 seconds for the deployment operator to update the statefulset.

  7. Delete Cassandra pod 0.

All Cassandra nodes should now come up successfully. However, it is recommended that you shell back into one of the Cassandra pods and run nodetool repair.
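
For example (the namespace and pod name are placeholders):

    kubectl exec -it -n <NAMESPACE> <POD-NAME> -- nodetool repair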

Update the Kubernetes secret after changing the pds password

If you change the password for the pds user, you must also update the corresponding Kubernetes secret for the deployment. To base64-encode the new password and update the Kubernetes secret:

  1. Get the Kubernetes secret for the Couchbase data service:

    kubectl get secrets -n <namespace-where-the-Couchbase-data-service-is-deployed>
  2. Encode your new administrator password into base64. Use echo -n so that a trailing newline is not included in the encoded value:

    echo -n <the-updated-password> | base64
  3. Update the Kubernetes secret with the new base64-encoded administrator password:

    kubectl get secret cb-rke-qichff-creds -n cb -o json | jq '.data["password"]="UGFzc3dvcmQx"' | kubectl apply -f -
    secret/cb-rke-qichff-creds configured
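
To verify the update, you can decode the stored value (the secret name and namespace match the example above):

    kubectl get secret cb-rke-qichff-creds -n cb -o jsonpath='{.data.password}' | base64 -d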

Update the pds password in the cqlshrc file for Cassandra pods

If you change the password for the pds user, you must also update the cqlshrc file located on each Cassandra pod:

  1. Shell into each Cassandra pod:

    kubectl exec -it -n <NAMESPACE> <POD-NAME> -- bash
  2. Open the cqlshrc file:

    vi ./cassandra/cqlshrc
  3. Replace the password for the default pds user with the new password.
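
    After the edit, the credentials section of cqlshrc should look similar to the following sketch (assuming the credentials are stored under [authentication]; the password value is a placeholder):

    [authentication]
    username = pds
    password = <new-password>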
