About failure to disable HA (load_balanced mode)

Hi,

In load_balanced HA mode, I tried to remove all leaf nodes in availability group 2 in order to disable high availability, but the removal failed.

Is there a way to do this, or is it a bug?

Please see below.

[11:30:58 madamgold@gpuserver ~]# sdb-admin list-nodes
+------------+--------+---------------+------+---------------+--------------+---------+----------------+--------------------+--------------+
| MemSQL ID  | Role   | Host          | Port | Process State | Connectable? | Version | Recovery State | Availability Group | Bind Address |
+------------+--------+---------------+------+---------------+--------------+---------+----------------+--------------------+--------------+
| 62D6451F6F | Master | 192.168.1.200 | 3306 | Running       | True         | 7.3.6   | Online         |                    | 0.0.0.0      |
| 40D95A9004 | Leaf   | 192.168.1.200 | 3307 | Running       | True         | 7.3.6   | Online         | 2                  | 0.0.0.0      |
| 43996A07F0 | Leaf   | 192.168.1.200 | 3308 | Running       | True         | 7.3.6   | Online         | 1                  | 0.0.0.0      |
| 9DAEDD0F9A | Leaf   | 192.168.1.200 | 3309 | Running       | True         | 7.3.6   | Online         | 2                  | 0.0.0.0      |
| 8A6FD7D32F | Leaf   | 192.168.1.200 | 3310 | Running       | True         | 7.3.6   | Online         | 1                  | 0.0.0.0      |
+------------+--------+---------------+------+---------------+--------------+---------+----------------+--------------------+--------------+

[11:31:02 madamgold@gpuserver ~]# memsql -p

singlestore> show databases;
+--------------------+
| Database           |
+--------------------+
| cluster            |
| information_schema |
| memsql             |
+--------------------+
3 rows in set (0.00 sec)

singlestore> create database test;
Query OK, 1 row affected (3.01 sec)

singlestore> show partitions on test;
+---------+---------------+------+--------+--------+
| Ordinal | Host          | Port | Role   | Locked |
+---------+---------------+------+--------+--------+
|       0 | 192.168.1.200 | 3310 | Master |      0 |
|       0 | 192.168.1.200 | 3307 | Slave  |      0 |
|       1 | 192.168.1.200 | 3310 | Master |      0 |
|       1 | 192.168.1.200 | 3307 | Slave  |      0 |
|       2 | 192.168.1.200 | 3310 | Master |      0 |
|       2 | 192.168.1.200 | 3309 | Slave  |      0 |
|       3 | 192.168.1.200 | 3310 | Master |      0 |
|       3 | 192.168.1.200 | 3309 | Slave  |      0 |
|       4 | 192.168.1.200 | 3308 | Master |      0 |
|       4 | 192.168.1.200 | 3307 | Slave  |      0 |
|       5 | 192.168.1.200 | 3308 | Master |      0 |
|       5 | 192.168.1.200 | 3307 | Slave  |      0 |
|       6 | 192.168.1.200 | 3308 | Master |      0 |
|       6 | 192.168.1.200 | 3309 | Slave  |      0 |
|       7 | 192.168.1.200 | 3308 | Master |      0 |
|       7 | 192.168.1.200 | 3309 | Slave  |      0 |
|       8 | 192.168.1.200 | 3307 | Master |      0 |
|       8 | 192.168.1.200 | 3310 | Slave  |      0 |
|       9 | 192.168.1.200 | 3307 | Master |      0 |
|       9 | 192.168.1.200 | 3310 | Slave  |      0 |
|      10 | 192.168.1.200 | 3307 | Master |      0 |
|      10 | 192.168.1.200 | 3308 | Slave  |      0 |
|      11 | 192.168.1.200 | 3307 | Master |      0 |
|      11 | 192.168.1.200 | 3308 | Slave  |      0 |
|      12 | 192.168.1.200 | 3309 | Master |      0 |
|      12 | 192.168.1.200 | 3310 | Slave  |      0 |
|      13 | 192.168.1.200 | 3309 | Master |      0 |
|      13 | 192.168.1.200 | 3310 | Slave  |      0 |
|      14 | 192.168.1.200 | 3309 | Master |      0 |
|      14 | 192.168.1.200 | 3308 | Slave  |      0 |
|      15 | 192.168.1.200 | 3309 | Master |      0 |
|      15 | 192.168.1.200 | 3308 | Slave  |      0 |
+---------+---------------+------+--------+--------+
32 rows in set (0.00 sec)

singlestore> show leaves;
+---------------+------+--------------------+---------------+-----------+--------+--------------------+------------------------------+--------+-------------------------+
| Host          | Port | Availability_Group | Pair_Host     | Pair_Port | State  | Opened_Connections | Average_Roundtrip_Latency_ms | NodeId | Grace_Period_In_seconds |
+---------------+------+--------------------+---------------+-----------+--------+--------------------+------------------------------+--------+-------------------------+
| 192.168.1.200 | 3310 | 1                  | 192.168.1.200 | 3309      | online | 20                 | 0.172                        | 11     | NULL                    |
| 192.168.1.200 | 3308 | 1                  | 192.168.1.200 | 3307      | online | 10                 | 0.136                        | 19     | NULL                    |
| 192.168.1.200 | 3307 | 2                  | 192.168.1.200 | 3308      | online | 16                 | 0.128                        | 22     | NULL                    |
| 192.168.1.200 | 3309 | 2                  | 192.168.1.200 | 3310      | online | 17                 | 0.120                        | 23     | NULL                    |
+---------------+------+--------------------+---------------+-----------+--------+--------------------+------------------------------+--------+-------------------------+
4 rows in set (0.00 sec)

singlestore> remove leaf '192.168.1.200':3307;
Query OK, 1 row affected (4.96 sec)

singlestore> remove leaf '192.168.1.200':3309;
ERROR 1772 (HY000): There are no online leaves to move partitions to.

Thanks in advance.

I added a “force” like this and it worked.

Is this intended?

singlestore> remove leaf '192.168.1.200':3309 force;
Query OK, 1 row affected (1.34 sec)

singlestore> show partitions on test;
+---------+---------------+------+--------+--------+
| Ordinal | Host          | Port | Role   | Locked |
+---------+---------------+------+--------+--------+
|       0 | 192.168.1.200 | 3310 | Master |      0 |
|       1 | 192.168.1.200 | 3310 | Master |      0 |
|       2 | 192.168.1.200 | 3310 | Master |      0 |
|       3 | 192.168.1.200 | 3310 | Master |      0 |
|       4 | 192.168.1.200 | 3308 | Master |      0 |
|       5 | 192.168.1.200 | 3308 | Master |      0 |
|       6 | 192.168.1.200 | 3308 | Master |      0 |
|       7 | 192.168.1.200 | 3308 | Master |      0 |
|       8 | 192.168.1.200 | 3310 | Master |      0 |
|       9 | 192.168.1.200 | 3310 | Master |      0 |
|      10 | 192.168.1.200 | 3308 | Master |      0 |
|      11 | 192.168.1.200 | 3308 | Master |      0 |
|      12 | 192.168.1.200 | 3310 | Master |      0 |
|      13 | 192.168.1.200 | 3310 | Master |      0 |
|      14 | 192.168.1.200 | 3308 | Master |      0 |
|      15 | 192.168.1.200 | 3308 | Master |      0 |
+---------+---------------+------+--------+--------+

Thanks again…


Welcome back! What's your setting for the global variable leaf_failover_fanout? I assume it's load_balanced, correct?

Yes, that’s right.

I know the FORCE flag disables the rebalancing behavior of REMOVE LEAF.

But then the cluster will not be able to stay online.
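For anyone following along, the fanout mode in question can be checked from the SQL prompt with the standard MySQL-style variables query (variable name as discussed above; this is a sketch, assuming the usual SHOW VARIABLES syntax):

```sql
-- Show the current failover fanout mode;
-- expected values are 'paired' or 'load_balanced'
SHOW GLOBAL VARIABLES LIKE 'leaf_failover_fanout';
```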

Which version of singlestore? I don’t think this is expected behavior (I couldn’t reproduce it just now when I tried vs our latest internal build).

-Adam

One of our engineers was able to reproduce it internally, and we’ve opened a bug to track it. @kyoungho.kum thank you for bringing it to our attention.