How long is the partition locked while REBALANCE PARTITIONS is running?

hstoyanov · May 20, 2020, 8:01pm

You can see what REBALANCE will execute by running EXPLAIN REBALANCE PARTITIONS ON db. While the rebalance runs, SHOW REBALANCE STATUS ON db shows details on the progress being made.
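If you want to script this check, here is a rough Python sketch. It assumes a DB-API 2.0 cursor connected to the master aggregator. MemSQL speaks the MySQL wire protocol, so a MySQL driver such as pymysql should work, but the driver choice is an assumption. The helper names `explain_rebalance`, `rebalance_status` and `_quote_ident` are hypothetical.

```python
def _quote_ident(name: str) -> str:
    # Backtick-quote a database name, doubling any embedded backticks.
    return "`" + name.replace("`", "``") + "`"


def explain_rebalance(cursor, db: str) -> list:
    """Return the operations REBALANCE PARTITIONS would run, without running them."""
    cursor.execute(f"EXPLAIN REBALANCE PARTITIONS ON {_quote_ident(db)}")
    return list(cursor.fetchall())


def rebalance_status(cursor, db: str) -> list:
    """Return the current SHOW REBALANCE STATUS rows for db.

    Call this from a separate connection while the rebalance is running.
    The rows are returned as-is; no particular column layout is assumed.
    """
    cursor.execute(f"SHOW REBALANCE STATUS ON {_quote_ident(db)}")
    return list(cursor.fetchall())
```

Reviewing the `explain_rebalance` output before starting is a cheap way to see how many COPY/PROMOTE pairs (and therefore how many short lock windows) a rebalance will involve.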

As an example, consider what happens when rebalance executes a COPY followed by a PROMOTE:

COPY PARTITION db:0 TO 'memsql-leaf01':3306
PROMOTE PARTITION db:0 ON 'memsql-leaf01':3306

The COPY command creates an async slave on the leaf. The master instance sends a snapshot plus its logs to the async slave, and then keeps sending all new logs to it. The master commits writes and ships logs to the slave without waiting for the slave to acknowledge anything.
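The async phase can be illustrated with a toy Python model. The class names are hypothetical, and this is not how MemSQL is implemented. The slave starts from a snapshot of the master's log. Each later commit is acknowledged to the client immediately while its log record is still in flight to the slave.

```python
from collections import deque
from typing import Optional


class AsyncSlave:
    """Toy async slave: applies shipped log records whenever it gets to them."""

    def __init__(self, snapshot):
        self.applied = list(snapshot)  # starts as a copy of the master's data (the snapshot)
        self.in_flight = deque()       # log records sent by the master but not yet applied

    def receive(self, record):
        self.in_flight.append(record)

    def apply_pending(self, limit: Optional[int] = None):
        # Apply up to `limit` queued records (all of them if limit is None).
        n = len(self.in_flight) if limit is None else min(limit, len(self.in_flight))
        for _ in range(n):
            self.applied.append(self.in_flight.popleft())


class Master:
    def __init__(self):
        self.log = []
        self.slave = None

    def add_async_slave(self):
        # Seed the new slave with a snapshot of everything committed so far.
        self.slave = AsyncSlave(self.log)
        return self.slave

    def commit(self, record):
        self.log.append(record)
        if self.slave is not None:
            self.slave.receive(record)  # ship the log record to the slave...
        return "committed"              # ...and acknowledge without waiting for it
```

The gap between `master.log` and `slave.applied` is the replication lag an async slave can build up. Because the master never waits, this phase adds no latency to writes.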

The COPY command then turns the async slave into a sync slave. During that switch, queries to the master partition may see a small increase in latency. After the switch, a write to the master partition succeeds only after the sync slave acknowledges that it has received it.
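The sync phase can be sketched the same way, with a thread standing in for the slave. The function name is hypothetical, and this is an illustration rather than MemSQL internals. Each write counts as committed only after the slave acknowledges receiving it. As a result, nothing is ever committed that the slave lacks, but every write pays for that wait.

```python
import queue
import threading


def run_sync_replication(records, timeout=1.0):
    """Toy sync slave: each record is committed only after the slave acks receipt.

    Returns (committed, received) as seen by the master and the slave.
    """
    to_slave = queue.Queue()
    acks = queue.Queue()
    received = []

    def slave():
        while (rec := to_slave.get()) is not None:
            received.append(rec)  # the record is now "received" by the slave
            acks.put(rec)         # acknowledge receipt back to the master

    worker = threading.Thread(target=slave, daemon=True)
    worker.start()
    committed = []
    try:
        for rec in records:
            to_slave.put(rec)
            # Block until the slave acknowledges this record; this wait is the
            # extra latency each write pays once the slave is sync.
            acked = acks.get(timeout=timeout)
            if acked != rec:
                raise RuntimeError(f"expected ack for {rec!r}, got {acked!r}")
            committed.append(rec)
    finally:
        to_slave.put(None)  # tell the slave thread to stop
        worker.join()
    return committed, received
```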

Next, PROMOTE locks access to the partition on all aggregators, which blocks queries to the partition. Then the master and the existing slave are locked. Together, these two steps ensure that no transactions are running on the partition while the change happens.

Then the sync slave is promoted to master and the old master is demoted to a slave. Once that completes, PROMOTE unlocks the partition and points all aggregators to its new location.
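The PROMOTE sequence can be walked through with a toy Python model. All names are hypothetical. The model only mirrors the order of steps described above, not MemSQL's actual code. Queries to the partition are blocked from the first step until the aggregators are unlocked, and that window is the lock duration in question.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Aggregator:
    name: str
    routes: dict = field(default_factory=dict)  # partition -> host serving it as master
    locked: set = field(default_factory=set)    # partitions whose queries are blocked


@dataclass
class Instance:
    host: str
    role: str            # "master" or "slave"
    locked: bool = False


def promote(partition: str, aggregators: List[Aggregator], master: Instance,
            sync_slave: Instance, existing_slave: Optional[Instance] = None) -> List[str]:
    """Toy walk-through of the PROMOTE steps; returns the steps in order."""
    steps = []
    # 1. Lock the partition on every aggregator: new queries to it now block.
    for agg in aggregators:
        agg.locked.add(partition)
    steps.append("lock partition on aggregators")
    # 2. Lock the master (and the existing slave, if any) so no transaction is running.
    held = [inst for inst in (master, existing_slave) if inst is not None]
    for inst in held:
        inst.locked = True
    steps.append("lock master and existing slave")
    # 3. Swap roles: the sync slave becomes master, the old master becomes a slave.
    sync_slave.role, master.role = "master", "slave"
    steps.append("promote sync slave, demote old master")
    # 4. Point aggregators at the new master and unlock. Repointing each
    #    aggregator before unlocking it is a modelling choice here.
    for inst in held:
        inst.locked = False
    for agg in aggregators:
        agg.routes[partition] = sync_slave.host
        agg.locked.discard(partition)
    steps.append("repoint aggregators and unlock")
    return steps
```

Steps 1 and 4 touch every aggregator. That is why the number of aggregators is one of the factors that stretches the blocking window.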

The duration of the lock depends on three things:

- the size of your cluster, in particular the number of leaves and aggregators;
- the kind of workload;
- the hardware.

A larger cluster sees longer blocking, because there are more aggregators to coordinate. Writing more data or having long-running transactions can also make it take longer. Hard drives vs. SSDs make a difference, because writes on whichever instance is the master have to be written to disk. Network latency also affects how long each of the many steps takes.

I wrote a short test to measure this on my laptop. A write query was blocked for 0.02 s during the async-to-sync switch, and for 1-2 seconds during the PROMOTE.

I would recommend measuring it on your cluster. The docs describe how to issue COPY and PROMOTE operations manually.
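A minimal harness for such a measurement, sketched in Python, runs a small write in a loop from one session while COPY and PROMOTE run from another. It records every write that took longer than a threshold. `probe_write_latency` and the `write` callable are hypothetical, and the threshold is a placeholder to tune.

```python
import time
from typing import Callable, List, Tuple


def probe_write_latency(write: Callable[[], None], duration_s: float,
                        stall_threshold_s: float = 0.01
                        ) -> Tuple[float, List[Tuple[float, float]]]:
    """Call write() repeatedly for duration_s seconds.

    Returns the worst single-call latency and a list of
    (offset_from_start_s, latency_s) for every call slower than stall_threshold_s.
    """
    stalls = []
    worst = 0.0
    t0 = time.monotonic()
    while (start := time.monotonic()) - t0 < duration_s:
        write()
        latency = time.monotonic() - start
        worst = max(worst, latency)
        if latency > stall_threshold_s:
            stalls.append((start - t0, latency))
    return worst, stalls
```

In a real run, `write` could execute a one-row INSERT into a hypothetical scratch table and commit it through any MySQL-compatible driver. Set `duration_s` long enough to cover the whole COPY and PROMOTE. The stall offsets should then line up with the async-to-sync switch and the PROMOTE.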

Is there a target minimal delay you want to hit?