My cluster is suddenly running into ERROR 1016: ER_CANT_OPEN_FILE

Hello,
on a customer site we have just experienced a big issue. The customer can't run queries anymore because they are getting ERROR 1016: ER_CANT_OPEN_FILE:

2021-07-27 21:03:28 ERROR ConsumerPod:81 - Errore in gestione Pod Procedure 
java.sql.SQLException: Unhandled exception
Type: ER_CANT_OPEN_FILE
Message: Leaf Error (192.168.20.133:3306): Can't open file: 'columns/METERING_3/1/1358001/70356085' (errno: 2)
Callstack:
  #0 Line 23 in `METERING`.`SANITY_POD2` called from
  #1 Line 15 in `METERING`.`INIT_POD6` called from
  #2 Line 3 in `METERING`.`DEQUEE_POD_SIMPLE`

	at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:965)
	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3978)
	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3914)
	at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2530)
	at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2683)
	at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2495)
	at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1903)
	at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1242)
	at com.mysql.jdbc.CallableStatement.execute(CallableStatement.java:837)
	at org.apache.commons.dbcp2.DelegatingPreparedStatement.execute(DelegatingPreparedStatement.java:94)
	at org.apache.commons.dbcp2.DelegatingPreparedStatement.execute(DelegatingPreparedStatement.java:94)
	at mapxml.utils.ConsumerPod.run(ConsumerPod.java:73)

The cluster is made up of 3 nodes (master + 2 leaves) and the installed MemSQL version is 6.8.9. I don't see any errors when I go through Dashboard/Events/Hosts/Nodes in MemSQL Studio. We have also tried to restart the server (and the MemSQL instance, of course), but without success.

I have no idea how to work around this or solve it.
Thanks for any kind of help.

It looks like SingleStore cannot open a database data file. I’d recommend that you check if that file exists, and if it does not, restore from a backup. If it does exist, check the permissions on it to make sure memsqld can read it.

Thanks for your reply.
Can you describe how to perform both of the checks you suggested? Also, the customer doesn't take backups.

The data files are stored under the path given in the datadir variable. Do

show variables like 'datadir'

to see the path. Then look there for the file in question and check whether it exists and, if it does, what its permissions are.
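As a minimal sketch of that check (the directory below is purely illustrative; since the error names the leaf 192.168.20.133, the file path should be relative to that leaf's data directory, so run this on that node):

show variables like 'datadir';
-- Suppose this returns /var/lib/memsql/leaf-3306/data (hypothetical path).
-- The file from the error message would then be expected at
--   /var/lib/memsql/leaf-3306/data/columns/METERING_3/1/1358001/70356085
-- Verify on that host that the file exists and is readable by the OS user running memsqld.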

Sorry to hear there are no backups. Best practice is always to take backups.

Hi,

This error likely means you have hit a bug in the older version of MemSQL you're running there. It can be repaired, but it will take a bit of effort.

There is a _REPAIR_TABLE command you can run directly on the leaf node with the corruption, but you first have to find the table and partition that are corrupted. It will be a partition database and table on 192.168.20.133:3306.

You can find them with this query:

select * from information_schema.columnar_segments where FILE = 'columns/METERING_3/1/1358001/70356085';

Use the database and table name from that query, log in to 192.168.20.133:3306 with the mysql client (you can't run this via an aggregator; it has to be against the leaf), and run _REPAIR_TABLE <db>.<table>.
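A minimal sketch of that last step, assuming the query above reported the partition database METERING_3 and a table named pod_data (a hypothetical name; substitute whatever the query actually returns):

-- Connected with the mysql client directly to the leaf at 192.168.20.133:3306,
-- not through an aggregator:
_REPAIR_TABLE METERING_3.pod_data;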

I would strongly recommend upgrading to at least the latest 6.8 patch release (6.8.24) if you want to stay on the 6.8 engine. You're better off upgrading to a more recent major version, though.

-Adam


Thank you for your suggestions. This time the customer preferred to restore a full server backup.