Possible Data Loss when using K8s Operator with "CLEAR ORPHAN DATABASES" command

mmanthena · April 21, 2023, 9:53pm

We have identified a critical issue that could potentially lead to data loss, and we have resolved it in SingleStoreDB versions 8.0.17, 7.9.20, and 7.8.26. We will be back-porting this fix to SingleStoreDB versions 7.6.29 (ETA: 4/24) and 7.5.25 (ETA: June 2023).
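
To check a deployment against the releases listed above, something like the following sketch may help. It is not an official tool: the helper name `has_fix` and the table `FIXED_PATCH` are made up for illustration, and the version string is assumed to be in plain `major.minor.patch` form. Release lines not named in this post return `None` rather than a guess.

```python
# Patch releases named in this notice as carrying the fix, keyed by (major, minor).
# The 7.6 and 7.5 entries are the planned back-ports (see the ETAs above).
FIXED_PATCH = {
    (8, 0): 17,
    (7, 9): 20,
    (7, 8): 26,
    (7, 6): 29,
    (7, 5): 25,
}


def has_fix(version):
    """Return True or False for release lines listed above, None for any other line.

    Assumes a plain numeric version string such as "8.0.17".
    """
    major, minor, patch = (int(part) for part in version.strip().split(".")[:3])
    fixed = FIXED_PATCH.get((major, minor))
    if fixed is None:
        return None  # release line not covered by this notice
    return patch >= fixed


if __name__ == "__main__":
    for v in ("8.0.16", "8.0.17", "7.8.30", "7.7.1"):
        print(v, has_fix(v))
```

A `None` result means this notice does not say whether that release line is affected, so treat it as "unknown" and check the release notes for that line.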

We recommend that all customers who use the "CLEAR ORPHAN DATABASES" command upgrade, and we especially urge those who self-manage SingleStoreDB using the K8s Operator to upgrade as soon as possible. These customers are at greater risk of data loss because the K8s Operator often calls this command during routine maintenance actions. We have discovered that when database tables on the master aggregator (MA) are in an unrecoverable state and the "CLEAR ORPHAN DATABASES" command is called on a database partition, some tables within this partition are dropped.

Orphaned partitions refer to partition databases and tables on leaf nodes that are no longer part of the cluster. This can occur when tables are dropped on the MA during asynchronous replication and the node fails before it can complete, resulting in orphaned tables. The “CLEAR ORPHAN DATABASES” command is typically used to remove these orphaned partitions.

It is important to note that data loss only occurs when the “CLEAR ORPHAN DATABASES” command is called while the MA is offline or when database tables on the MA are unrecoverable. When database tables on the MA are unrecoverable, the Engine cannot validate which tables are orphaned, resulting in all tables being dropped, including reference tables. If database tables are dropped, you can use the point-in-time recovery (PITR) feature or the latest backup to restore your database.