Troubleshoot diverged GTIDs in MySQL
The MySQL data service in PDS handles most pod crashes and outages automatically. For example, instances can fail over and rejoin the cluster on reboot. In some cases, however, a pod is unable to reboot the cluster after an outage and keeps failing with the following error:
The instance `instance-a` has an incompatible Global Transaction Identifier (GTID) set with the seed instance `instance-b` (GTIDs diverged). If you wish to proceed, the `force` option must be explicitly set.
This means that the instances cannot agree on which one should become the new primary because their data has diverged.
To troubleshoot this issue:
- Review the GTIDs in the binary logs of the instances and choose the instance that contains the latest or most appropriate changes to continue with. You can inspect the transactions on the instances by:
  - opening a shell into the `mysql` container of the pods
  - using MySQL tools such as `mysql` and `mysqlbinlog`
- Once you have selected the instance to use as the seed, force a reboot of the cluster by running the following commands inside the `mysql` container of the selected pod (`$password` must hold the password of the `innodb-config` user):
  seed_instance=$(hostname -f)
  mysqlsh --host=$seed_instance --user=innodb-config --password=$password -- dba reboot-cluster-from-complete-outage --force --primary=$seed_instance:3306
- Check the cluster status and wait for the cluster to recover:
  mysqlsh --host=$seed_instance --user=innodb-config --password=$password -- cluster status
  If the cluster does not become healthy, or if some nodes do not come online, continue by:
  - removing the failing instances:
    mysqlsh ... -- cluster remove-instance <other_instance>
  - and re-adding the instances:
    mysqlsh ... -- cluster add-instance <other_instance> --recoveryMethod=clone
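
The status check and the remove/re-add steps above can be wrapped in small shell helpers. The following is only a sketch under stated assumptions: the function names `cluster_health`, `wait_for_cluster`, and `rejoin_instance` are hypothetical, the connection options for `mysqlsh` are passed in by the caller rather than guessed, and the parsing assumes that `cluster status` prints pretty-printed JSON in which the cluster-level `"status"` field appears before the per-member ones, with `OK` as the healthy value.

```shell
# Print the first "status" value found in `cluster status` output on stdin.
# Assumption: multi-line JSON with the cluster-level status listed first.
cluster_health() {
  sed -n 's/.*"status": *"\([^"]*\)".*/\1/p' | head -n 1
}

# Poll the cluster until it reports OK.
# Usage: wait_for_cluster ATTEMPTS DELAY_SECONDS [mysqlsh connection options...]
wait_for_cluster() {
  attempts=$1; delay=$2; shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    health=$(mysqlsh "$@" -- cluster status 2>/dev/null | cluster_health)
    echo "attempt $((i + 1)): status=${health:-unknown}"
    if [ "$health" = "OK" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Remove one instance and re-add it with clone recovery.
# Usage: rejoin_instance INSTANCE [mysqlsh connection options...]
rejoin_instance() {
  instance=$1; shift
  mysqlsh "$@" -- cluster remove-instance "$instance" &&
    mysqlsh "$@" -- cluster add-instance "$instance" --recoveryMethod=clone
}
```

A possible use is `wait_for_cluster 30 10 <connection options> || rejoin_instance <other_instance> <connection options>`, repeated for each instance that stays offline. Note that clone recovery replaces the data on the re-added instance with a copy from the cluster.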
See restoring and rebooting a cluster for more information.
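
When reviewing GTIDs in the first step, it can help to see exactly which transactions one instance has that another lacks. The sketch below builds a query around MySQL's built-in `GTID_SUBTRACT` function; the helper name `gtid_only_in_first_sql` is hypothetical, and how you connect to each instance (user, password, host) is left to you.

```shell
# Build a query listing the GTIDs present in the first set but not the second.
# GTID_SUBTRACT(set1, set2) is a built-in MySQL function.
gtid_only_in_first_sql() {
  # Strip whitespace, since multi-UUID GTID sets are printed over several lines.
  first=$(printf '%s' "$1" | tr -d '[:space:]')
  second=$(printf '%s' "$2" | tr -d '[:space:]')
  printf "SELECT GTID_SUBTRACT('%s', '%s') AS only_in_first;\n" "$first" "$second"
}

# Example use inside the mysql container of a pod (not executed here):
#   gtid_a=$(mysql <credentials> -N -r -e "SELECT @@GLOBAL.gtid_executed")
#   # ...collect gtid_b the same way on the other instance...
#   mysql <credentials> -e "$(gtid_only_in_first_sql "$gtid_a" "$gtid_b")"
```

Running the query in both directions shows what each side has that the other lacks. If both results are non-empty, the instances have truly diverged, and the transactions on the instance you do not pick as seed will be discarded when it is re-added with clone recovery.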