Backup fails: ERROR 2004 UNKNOWN_ERR_CODE: Leaf Error

Hello,
we are getting the following error when trying to back up to S3: ERROR 2004 UNKNOWN_ERR_CODE: Leaf Error (:3306): Socket closed due to keepalive probe failures.
The database is about 500GB, has 35 tables and 32 partitions. I created a split partitions backup on Sunday to match the recommended number of partitions for our setup. I followed the exact steps in the docs. Since then it has not been possible to make any backups.

Can someone provide more information about the error code and possible solutions? Sadly I can’t find anything in the docs about this.

Things I already tried, all of which resulted in the same error:

  • increase the backup timeout (using the TIMEOUT clause in the backup query)
  • increase connection_timeout
  • increase subprocess_io_idle_timeout_ms
  • decrease backup_max_threads (tried: 4, 8, 16, 32)
  • different master aggregators
  • stopped all pipelines
  • restarted cluster multiple times
  • upgraded to 8.0.17
  • ‘FILL CONNECTION POOLS’ on each node
  • using WITH INIT clause

I also can’t create a local backup, as the command immediately throws an invalid permission error (even though the directory is owned by memsql:memsql and it actually writes data before throwing).

thanks & best,
tom

I assume you are running self-hosted, is that right (not on our managed service)?

The “Socket closed due to keepalive probe failures” error might occur if there is an issue with the underlying network, which causes a disconnection between the client and the leaf nodes. This might lead to unexpected or partial results, especially during data load, when the leaves have to wait and eventually time out [1, 2].

Please consider checking the network connection and verifying that there are no underlying issues causing the disconnection or timeout. Additionally, make sure that the data you are loading into SingleStore does not have a large number of duplicates, as this can also cause the leaf nodes to time out when using LOAD DATA IGNORE [1, 2].
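
As a rough first pass at the network check, something like the sketch below could be run from the aggregator host to see whether each leaf accepts TCP connections on port 3306 (the port shown in the error). It uses only the Python standard library; the function names and the host names in the usage comment are hypothetical placeholders, not anything from SingleStore tooling. Note that a successful connect only shows the port is reachable at that moment. It does not reproduce a keepalive failure during a long-running backup, so intermittent drops may still need to be investigated with the hosting provider.

```python
import socket


def can_reach(host, port=3306, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_nodes(hosts, port=3306, timeout=5.0):
    """Map each host to whether it accepted a TCP connection on `port`."""
    return {host: can_reach(host, port, timeout) for host in hosts}


# Example usage (host names are placeholders, replace with your leaf nodes):
# for host, ok in check_nodes(["leaf-1.example.internal", "leaf-2.example.internal"]).items():
#     print(f"{host}:3306 {'reachable' if ok else 'UNREACHABLE'}")
```

Running this repeatedly over the duration of a typical backup window would give a slightly better picture than a single check, since the failure reported here happens on connections that have been open for a while.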

If the problem persists, please reply with more information about your installation and the command you are running that’s failing.

Hi, seems like it’s an issue with our hosting provider. I was able to do a local backup and migrated to another provider, now everything’s working as expected. Thank you!
