Version: 24.12.01

Recover Cassandra pods from corrupt commit logs

After you deploy the Cassandra data service and then reboot the worker nodes, the Cassandra pods can fail to come up and form the cluster because of corrupt commit logs. In that case, the pod logs contain errors such as:

cassandra ERROR 18:22:11 Exiting due to error while processing commit log during initialization.
cassandra org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException: Mutation checksum failure at 23031717 in Next section at 23028485 in CommitLog-7-1676881531203.log
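
To find which file to remove later, the commit log file name can be pulled out of the error output. The following is a minimal sketch that assumes the error lines have the format shown above; the function name `corrupt_commitlog_name` is hypothetical, and `grep -o` is assumed to be available (GNU and BSD grep both provide it).

```shell
# Read pod log lines on stdin and print the name of each commit log file
# that appears in a CommitLogReadException line (one name per line, deduplicated).
corrupt_commitlog_name() {
  grep 'CommitLogReadException' \
    | grep -o 'CommitLog-[0-9]*-[0-9]*\.log' \
    | sort -u
}
```

For example, the output of `kubectl logs` for the failing pod could be piped into `corrupt_commitlog_name`.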

To recover Cassandra pods from the corrupt commit logs:

  1. Scale the pds-deployment-controller-manager deployment to 0 replicas.

  2. Edit the Cassandra statefulset by adding the following line to the Cassandra container under spec.template.spec.containers:

    command: ["/bin/sleep", "3650d"]

    Note: The statefulset name is identical to the deployment name in the PDS UI.

  3. Delete all Cassandra pods and wait for pod 0 to start running. Pod 0 reaches the Running state but never becomes ready.

  4. Shell into pod 0 and delete the corrupt commit log. For example:

    rm /srv/pds/data/commitlog/CommitLog-7-1676881531203.log

  5. Exit the shell of pod 0 and scale pds-deployment-controller-manager back to 1 replica.

  6. Wait approximately 30 seconds for the deployment operator to update the statefulset.

  7. Delete Cassandra pod 0.
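
Steps 1, 2, 4, 5, and 7 above can be sketched as kubectl commands. This is only an illustration under stated assumptions: the `pds-system` namespace for the operator, the `my-namespace` and `my-cassandra` placeholders, the function names, and the assumption that the Cassandra container is the first entry in the containers list are not taken from the procedure and must be checked against your cluster.

```shell
# Sketch of steps 1, 2, 4, 5, and 7. The namespaces and statefulset name below
# are placeholders (assumptions); replace them with the values for your cluster.
KUBECTL="${KUBECTL:-kubectl}"                 # set KUBECTL=echo to print commands instead of running them
PDS_NAMESPACE="${PDS_NAMESPACE:-pds-system}"  # assumed namespace of the PDS operator
NAMESPACE="${NAMESPACE:-my-namespace}"        # hypothetical namespace of the Cassandra deployment
STS="${STS:-my-cassandra}"                    # hypothetical statefulset name

# Steps 1 and 5: scale the PDS deployment operator to the given replica count.
scale_operator() {
  $KUBECTL scale deployment pds-deployment-controller-manager \
    --namespace "$PDS_NAMESPACE" --replicas="$1"
}

# Step 2: override the container command so the pod starts without Cassandra.
# Assumes the Cassandra container is the first entry in the containers list.
hold_cassandra() {
  $KUBECTL patch statefulset "$STS" --namespace "$NAMESPACE" --type=json \
    -p '[{"op":"add","path":"/spec/template/spec/containers/0/command","value":["/bin/sleep","3650d"]}]'
}

# Step 4: delete one corrupt commit log file (given by file name) inside pod 0.
remove_commit_log() {
  $KUBECTL exec "${STS}-0" --namespace "$NAMESPACE" -- \
    rm "/srv/pds/data/commitlog/$1"
}

# Step 7: delete pod 0 so that it restarts from the updated statefulset.
delete_pod0() {
  $KUBECTL delete pod "${STS}-0" --namespace "$NAMESPACE"
}
```

A run would call `scale_operator 0`, `hold_cassandra`, `remove_commit_log CommitLog-7-1676881531203.log`, `scale_operator 1`, and finally `delete_pod0`, with step 3 and the wait in step 6 done by hand in between. Setting `KUBECTL=echo` first prints each command without touching the cluster, which is a cheap way to review the commands before running them.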

All Cassandra nodes should now come up successfully. Even so, it is recommended to shell back into one of the Cassandra pods and run nodetool repair.
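
The repair can also be started without an interactive shell. The following is a minimal sketch; the function name `run_repair` and the pod and namespace arguments are hypothetical, and it assumes that nodetool is on the container's PATH and needs no extra credentials in your deployment.

```shell
KUBECTL="${KUBECTL:-kubectl}"   # set KUBECTL=echo to print the command instead of running it

# Run "nodetool repair" inside one Cassandra pod.
# $1 = pod name, $2 = namespace (both supplied by the caller).
run_repair() {
  $KUBECTL exec "$1" --namespace "$2" -- nodetool repair
}
```

For example, `run_repair my-cassandra-0 my-namespace`, where both names are placeholders.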