Error when trying to REPLICATE any database from a cluster

tomas · November 2, 2020, 3:24pm

Hi,

When I try to replicate from a specific cluster, I run this command:

REPLICATE DATABASE my_db FROM root:'myrootpassword'@master-server:3306;

and it ends up with this error:

ERROR 1218 (08S01) at line 1: Error connecting to master: could not find master information.

I tried it from multiple test servers/clusters, but it always ends up with the same error. I have run replication like this many times on other clusters with no problems; only this one cluster fails. I get this error for every database on that cluster. Everything else on this cluster works fine.

Can someone explain what that error means or what to do about it?

Regards
Tom

rodrigo · November 3, 2020, 9:45pm

Hi Tomas,

That error tells us that the cluster ran into some issue while connecting to the remote cluster. There are a few ways this can happen, but you can usually find more information in the tracelogs, which show where in the process the error occurred and may include more details about the error.

Further, can you clarify the following?
You said you've been able to replicate databases successfully before, but this cluster is not working. Is it the source cluster that always causes you problems (you've tried to replicate from master-server to different clusters, and it always fails the same way), or is it the destination cluster (you've tried to replicate from many different master servers, and it always has the issue)?

The answer changes a lot what to look for. The first thing to check, though, is network connectivity. Every node in the source cluster must be able to connect to every node in the destination cluster, and vice versa, because replication happens directly from leaf to leaf, not through the aggregators. As a preliminary check, you can ping every host of the other cluster from every host of one cluster. Even better, connect with the mysql client from each node of one cluster to every node of the other cluster (you'll need a user that can connect to the leaves). That rules out a firewall that blocks MemSQL's ports to external connections while still letting pings through.
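A TCP-level check sits between ping and a full mysql login: it confirms that the port itself accepts connections, without needing credentials on the leaves. Below is a minimal sketch in Python, assuming the nodes listen on 3306 (the port used in the original command; your leaves may use different ports). The function names are hypothetical helpers, and the script only tests connectivity from the machine it runs on, so it has to be run on each node of one cluster against the hosts of the other, and then in the opposite direction.

```python
import socket
from itertools import product
from typing import Dict, Iterable, Tuple


def check_tcp(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened from this machine."""
    try:
        # create_connection resolves the host name and completes the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures
        return False


def check_cluster_ports(
    hosts: Iterable[str], ports: Iterable[int], timeout: float = 2.0
) -> Dict[Tuple[str, int], bool]:
    """Try every (host, port) pair and report which ones are reachable."""
    return {(h, p): check_tcp(h, p, timeout) for h, p in product(hosts, ports)}
```

For example, on each source leaf you could call check_cluster_ports(["dest-leaf-1", "dest-leaf-2"], [3306]) (hypothetical host names) and look for any False entries. A host that answers ping but shows False here points to a firewall or a service that isn't listening on that port.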

If the issue is on the source cluster, that cluster at some point had a master aggregator failover via AGGREGATOR SET AS MASTER, and you're not on the latest patch release of MemSQL, you may be running into a known bug that was recently fixed.

This bug was fixed in 7.1.8 and 7.0.22 (see the release notes for those versions in the SingleStore documentation).

If that is the case, try upgrading to see whether the issue is resolved (you have to upgrade all involved clusters).
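To check whether a cluster is already on a release with the fix, a small helper can compare its version against the two patch releases named above (7.1.8 and 7.0.22). This is a sketch that only knows about the 7.0 and 7.1 lines mentioned in this thread; it returns None for any other line rather than guessing. It assumes a plain major.minor.patch string, for instance one read with SELECT @@memsql_version on each cluster. The function names are hypothetical.

```python
from typing import Optional, Tuple

# Patch releases that, per this thread, contain the fix for each release line
FIXED_IN = {(7, 0): 22, (7, 1): 8}


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse a plain 'major.minor.patch' string into integers."""
    major, minor, patch = (int(part) for part in version.strip().split(".")[:3])
    return major, minor, patch


def has_failover_fix(version: str) -> Optional[bool]:
    """True/False for 7.0.x and 7.1.x versions; None for lines this thread doesn't cover."""
    major, minor, patch = parse_version(version)
    fixed_patch = FIXED_IN.get((major, minor))
    if fixed_patch is None:
        return None
    return patch >= fixed_patch
```

Since every involved cluster needs the fix, run the check against each cluster's version, e.g. all(has_failover_fix(v) for v in versions), and treat a None result as "check the release notes for that line".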

tomas · November 4, 2020, 9:53pm

Thank you, rodrigo. Upgrading to 7.1.11 did the job.

Tomas