Troubleshooting: removed transaction logs

Hi all,

I’m trying to remove a transaction log file (e.g., test_1_log_v1_0) and then restore it.

I couldn’t find any way other than backing up, dropping, and restoring the database.
Is there a better way that keeps the database online?

Also, while the DB was in use, I could not find any indication that the log file had been removed. Is there a way to check?
I tried watching the log with tail -f memsql.log, but nothing showed up.

Thank you in advance.

Hi @kyoungho.kum. Why are you trying to remove the transaction log file? And what do you mean by “restore it”?

That’s a situation I’ve created for troubleshooting.

By ‘restore’ I mean ‘restore database’.
That is, the database was backed up, dropped, and then restored. Recovering the database this way was successful.

For reference, there was no problem until the engine was restarted, but after the restart, the database became unavailable.

What SingleStore version are you running?

What system error message did you get when you tried to restart?

  • Version
    Currently it is 7.3.11, but in the past it was 7.3.10 and 7.3.7 (the problem is the same on all of them).

+------------+------------+-------+-------+---------------+--------------+---------+----------------+--------------------+--------------+
| MemSQL ID  | Role       | Host  | Port  | Process State | Connectable? | Version | Recovery State | Availability Group | Bind Address |
+------------+------------+-------+-------+---------------+--------------+---------+----------------+--------------------+--------------+
| 1153C774B4 | Master     | node1 | 3306  | Running       | True         | 7.3.11  | Online         |                    | 0.0.0.0      |
| D8B3F071F5 | Aggregator | node1 | 23306 | Running       | True         | 7.3.11  | Online         |                    | 0.0.0.0      |
| 40D95A9004 | Leaf       | node1 | 3308  | Running       | True         | 7.3.11  | RecoveryFailed | 1                  | 0.0.0.0      |
| 43996A07F0 | Leaf       | node1 | 3309  | Running       | True         | 7.3.11  | Online         | 1                  | 0.0.0.0      |
+------------+------------+-------+-------+---------------+--------------+---------+----------------+--------------------+--------------+

  • Logs
/MA/tracelog/memsql.log
   ...There is no error log...

/LF1/tracelog/memsql.log

   58860917 2021-06-04 13:38:07.159  ERROR: Thread 114891: Fn: Some error happened during Replication Management. Processed 15 out of 16.
   59069532 2021-06-04 13:38:07.367  ERROR: Thread 114891: Fn: Some error happened during Replication Management. Processed 15 out of 16.
   59277362 2021-06-04 13:38:07.575  ERROR: Thread 114891: Fn: Some error happened during Replication Management. Processed 15 out of 16.
   ... repeated ...

Please see below.
I removed a transaction log file and restarted the DB.

[13:34:27 madamgold@gpuserver /var/lib/memsql/LF1/data/logs]# ls -rtl test*
-rw------- 1 memsql memsql 16777216 Apr 12 16:37 test_log_v1_4096
-rw------- 1 memsql memsql 67108864 May 12 14:16 test_1_log_v1_16384
-rw------- 1 memsql memsql 67108864 May 12 14:16 test_0_log_v1_16384
-rw------- 1 memsql memsql 16777216 Jun 4 13:34 test_log_v1_0
-rw------- 1 memsql memsql 67108864 Jun 4 13:34 test_1_log_v1_0
-rw------- 1 memsql memsql 67108864 Jun 4 13:34 test_0_log_v1_0

[13:34:29 madamgold@gpuserver /var/lib/memsql/LF1/data/logs]# sudo rm -f test_0_log_v1_0

[13:34:58 madamgold@gpuserver /var/lib/memsql/LF1/data/logs]# ls -rtl test*
-rw------- 1 memsql memsql 16777216 Apr 12 16:37 test_log_v1_4096
-rw------- 1 memsql memsql 67108864 May 12 14:16 test_1_log_v1_16384
-rw------- 1 memsql memsql 67108864 May 12 14:16 test_0_log_v1_16384
-rw------- 1 memsql memsql 67108864 Jun 4 13:35 test_1_log_v1_0
-rw------- 1 memsql memsql 16777216 Jun 4 13:36 test_log_v1_0

Thanks.

I think you’re saying that you removed a log file and tried to restart the database as a test. In general you can’t remove log files – they are critical to the successful operation and restart of the database.

Snapshots and transaction logs are important files for database durability.

So we are looking into how to fix things if one of these files has a problem.

I’ve found that once transaction log files are removed, they cannot be recovered. The problem goes unnoticed until the engine is restarted, and after the restart the database becomes unusable. (It can be restored from a backup, of course, but by then it might be too late.)

The only solution is to avoid causing the problem in the first place. We have to manage our transaction log files carefully.
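
Since the engine did not report the missing file while it was running, one way to notice a removal earlier might be to watch the log directory from outside the database. Below is a minimal Python sketch of that idea, not a SingleStore feature: it records the file names in a directory and later reports which ones have disappeared. The function names `snapshot` and `removed_since` are my own, and the demo uses a temporary directory with file names copied from the listing above. I am also assuming that the engine may delete old log files on its own as part of normal housekeeping, so a reported file is something to look into, not proof of a problem.

```python
import os
import tempfile


def snapshot(log_dir):
    """Return the set of regular file names currently in log_dir."""
    return {entry.name for entry in os.scandir(log_dir) if entry.is_file()}


def removed_since(baseline, log_dir):
    """Return a sorted list of names that were in baseline but are gone now."""
    return sorted(baseline - snapshot(log_dir))


if __name__ == "__main__":
    # Demo on a throwaway directory; for real use, point log_dir at the
    # node's logs directory (e.g. the .../LF1/data/logs path shown above).
    with tempfile.TemporaryDirectory() as log_dir:
        for name in ("test_log_v1_0", "test_0_log_v1_0", "test_1_log_v1_0"):
            open(os.path.join(log_dir, name), "w").close()

        baseline = snapshot(log_dir)

        # Simulate the accidental removal from this thread.
        os.remove(os.path.join(log_dir, "test_0_log_v1_0"))

        print(removed_since(baseline, log_dir))
```

Run periodically (for example from cron) with the baseline saved between runs, something like this could raise an alert before the next restart, while a backup can still be taken or restored.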

Thanks for your kind reply.