Hey! Last night I ran memsql-admin register-node -y --json --force ... and then memsql-admin start-node -y --memsql-id <ID> against a leaf node (from the master aggregator node, of course). When I woke up, the leaf was still in the “Recovering” state, so I called start-node again and waited about 10 minutes. Here’s what memsql-admin list-nodes shows me:
+------------+--------+-------------+------+---------------+--------------+---------+----------------+--------------------+--------------+
| MemSQL ID  |  Role  |    Host     | Port | Process State | Connectable? | Version | Recovery State | Availability Group | Bind Address |
+------------+--------+-------------+------+---------------+--------------+---------+----------------+--------------------+--------------+
| 967171DBBC | Master | 10.0.45.3   | 3306 | Running       | True         | 6.8.11  | Online         |                    | 0.0.0.0      |
| 101B4127FD | Leaf   | 10.0.39.177 | 3306 | Running       | True         | 6.8.11  | Recovering     | 1                  | 0.0.0.0      |
+------------+--------+-------------+------+---------------+--------------+---------+----------------+--------------------+--------------+
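For context, here’s roughly how I’ve been cross-checking from the SQL side on the master aggregator, beyond list-nodes (the host/port are the master aggregator’s from the table above, and I’m assuming a plain MySQL-compatible client is fine since MemSQL speaks the MySQL protocol):
# Cluster-level view of the leaf as the master aggregator sees it; the State
# and Availability_Group columns should line up with what list-nodes reports:
mysql -h 10.0.45.3 -P 3306 -u root -p -e "SHOW LEAVES;"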
And here’s what sudo tail -n 100 /mnt/raid/memsql/tracelogs/memsql.log shows me so far:
133270214 2019-10-31 15:02:19.696  ERROR: Thread 115068: Fn: Some error happened during Replication Management. Processed 2 out of 4
139790686 2019-10-31 15:02:26.217   INFO: Recovering secondary key `deft_2`.`datoms_ref`.`ivt`
143277894 2019-10-31 15:02:29.704  ERROR: Thread 115068: Fn: Some error happened during Replication Management. Processed 2 out of 4
153284812 2019-10-31 15:02:39.711  ERROR: Thread 115068: Fn: Some error happened during Replication Management. Processed 2 out of 4
159675906 2019-10-31 15:02:46.102   INFO: Recovering secondary key `deft_1`.`datoms_string`.`eavt`
162415437 2019-10-31 15:02:48.841   INFO: Recovering secondary key `deft_2`.`datoms_ref`.`eavt`
163297285 2019-10-31 15:02:49.723  ERROR: Thread 115068: Fn: Some error happened during Replication Management. Processed 2 out of 4
166546699 2019-10-31 15:02:52.973   INFO: Recovering secondary key `deft_2`.`datoms_ref`.`avt`
173305009 2019-10-31 15:02:59.731  ERROR: Thread 115068: Fn: Some error happened during Replication Management. Processed 2 out of 4
174770068 2019-10-31 15:03:01.196   INFO: Recovering secondary key `deft_1`.`datoms_string`.`avt`
180796892 2019-10-31 15:03:07.223   INFO: Recovering secondary key `deft_2`.`datoms_ref`.`vt`
183314803 2019-10-31 15:03:09.741  ERROR: Thread 115068: Fn: Some error happened during Replication Management. Processed 2 out of 4
193329751 2019-10-31 15:03:19.756  ERROR: Thread 115068: Fn: Some error happened during Replication Management. Processed 2 out of 4
203011969 2019-10-31 15:03:29.438   INFO: Recovering secondary key `deft_2`.`datoms_ref`.`tt`
203347296 2019-10-31 15:03:29.773  ERROR: Thread 115068: Fn: Some error happened during Replication Management. Processed 2 out of 4
208494557 2019-10-31 15:03:34.921   INFO: Recovering secondary key `deft_1`.`datoms_string`.`vt`
208868212 2019-10-31 15:03:35.294   INFO: Recovering secondary key `deft_2`.`datoms_ref`.`p`
212574665 2019-10-31 15:03:39.001   INFO: Preparing to recover secondary keys for table `deft_2`.`datoms_short`
212574729 2019-10-31 15:03:39.001   INFO: Recovering secondary key `deft_2`.`datoms_short`.`ieavt`
212575379 2019-10-31 15:03:39.001   INFO: Recovering secondary key `deft_2`.`datoms_short`.`iavt`
212575804 2019-10-31 15:03:39.002   INFO: Recovering secondary key `deft_2`.`datoms_short`.`ivt`
212576035 2019-10-31 15:03:39.002   INFO: Recovering secondary key `deft_2`.`datoms_short`.`eavt`
212576386 2019-10-31 15:03:39.002   INFO: Recovering secondary key `deft_2`.`datoms_short`.`avt`
212576803 2019-10-31 15:03:39.003   INFO: Recovering secondary key `deft_2`.`datoms_short`.`vt`
212577053 2019-10-31 15:03:39.003   INFO: Recovering secondary key `deft_2`.`datoms_short`.`tt`
212577469 2019-10-31 15:03:39.003   INFO: Recovering secondary key `deft_2`.`datoms_short`.`p`
212577863 2019-10-31 15:03:39.004   INFO: Preparing to recover secondary keys for table `deft_2`.`datoms_string`
212765739 2019-10-31 15:03:39.192   INFO: Recovering secondary key `deft_2`.`datoms_string`.`ieavt`
213354939 2019-10-31 15:03:39.781  ERROR: Thread 115068: Fn: Some error happened during Replication Management. Processed 2 out of 4
220860837 2019-10-31 15:03:47.287   INFO: Recovering secondary key `deft_2`.`datoms_string`.`iavt`
223367782 2019-10-31 15:03:49.794  ERROR: Thread 115068: Fn: Some error happened during Replication Management. Processed 2 out of 4
233382269 2019-10-31 15:03:59.808  ERROR: Thread 115068: Fn: Some error happened during Replication Management. Processed 2 out of 4
243391441 2019-10-31 15:04:09.817  ERROR: Thread 115068: Fn: Some error happened during Replication Management. Processed 2 out of 4
245806903 2019-10-31 15:04:12.233   INFO: Recovering secondary key `deft_2`.`datoms_string`.`ivt`
253402606 2019-10-31 15:04:19.829  ERROR: Thread 115068: Fn: Some error happened during Replication Management. Processed 2 out of 4
…
295026679 2019-10-31 15:05:01.453   INFO: Clearing snapshot version for database `deft_1` as it is about to go online.
295062983 2019-10-31 15:05:01.489   INFO: RecoverDatabaseThread[deft_1]: End
295483958 2019-10-31 15:05:01.910   INFO: STOP REPLICATING deft_1
295484017 2019-10-31 15:05:01.910   INFO: Thread 115083: StopSlaveThreads: Stopping slave thread for database `deft_1`.
295484070 2019-10-31 15:05:01.910   INFO: Completed STOP REPLICATING deft_1
295484093 2019-10-31 15:05:01.910   INFO: DB deft_1: has original Id 11395525453320891103.
295484104 2019-10-31 15:05:01.910   INFO: PROMOTE ALL LOCKS RELEASED deft_1
303494218 2019-10-31 15:05:09.920  ERROR: Thread 115068: Fn: Some error happened during Replication Management. Processed 3 out of 4
…
423633071 2019-10-31 15:07:10.059  ERROR: Thread 115068: Fn: Some error happened during Replication Management. Processed 3 out of 4
426466282 2019-10-31 15:07:12.892   INFO: Recovering secondary key `deft_2`.`datoms_string`.`p`
430241989 2019-10-31 15:07:16.668   INFO: [deft_2] Scanning columnstore segments: Completed, elapsed: 0 ms, disk: 0 ms
430242484 2019-10-31 15:07:16.668   INFO: Clearing snapshot version for database `deft_2` as it is about to go online.
430279079 2019-10-31 15:07:16.705   INFO: RecoverDatabaseThread[deft_2]: End
430279189 2019-10-31 15:07:16.705   INFO: Completed recovery of user databases.
430640805 2019-10-31 15:07:17.067   INFO: STOP REPLICATING deft_2
430640881 2019-10-31 15:07:17.067   INFO: Thread 115080: StopSlaveThreads: Stopping slave thread for database `deft_2`.
430640930 2019-10-31 15:07:17.067   INFO: Completed STOP REPLICATING deft_2
430640977 2019-10-31 15:07:17.067   INFO: DB deft_2: has original Id 11395525453320891103.
430640995 2019-10-31 15:07:17.067   INFO: PROMOTE ALL LOCKS RELEASED deft_2
The logs do show repeated errors, but it looks like there has been forward progress of some sort. However, memsql-admin list-nodes still shows the leaf as “Recovering”, and it’s been 8 minutes since the log was last written to. I’m pretty pessimistic at the moment, since my first thought is that the data is corrupted. Luckily I do have backups, but they may have been taken slightly after the last known good state; we’ll see.
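In the meantime, here’s what I plan to run directly against the leaf to see whether deft_1 and deft_2 really did come back online, since the trace log suggests both finished recovery (host/port are the leaf’s from the table above, and I’m assuming SHOW DATABASES EXTENDED reports per-database role/state on 6.8 the way I remember):
# Per-database role/state as reported by the leaf itself; based on the log
# above I'd expect deft_1 and deft_2 to show as online rather than recovering:
mysql -h 10.0.39.177 -P 3306 -u root -p -e "SHOW DATABASES EXTENDED;"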
Any insight you can bring to bear on this would be extremely appreciated, since we’re dead in the water until we fix this. Thanks!
That does make me feel better. It’s still strange that it’s stuck in the Recovering state, though. The latest output in the master aggregator’s log is a bunch of statistics queries, and just before that it says there is an availability group with no leaves. I have an urgent meeting, so I’ll have to get back to you with more specifics on that within an hour or so.
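When I’m back I’ll also paste the exact master aggregator log lines about that availability group; roughly this is how I’ll pull them (the log path on the master is an assumption, mirroring the leaf’s layout, and the grep pattern is just a guess at the wording):
# Recent availability-group messages from the master aggregator's trace log
# (path assumed to mirror the leaf's /mnt/raid/memsql/tracelogs layout):
sudo grep -i "availability group" /mnt/raid/memsql/tracelogs/memsql.log | tail -n 20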