Please suggest best approach to improve Join between three large tables with different shard keys


Let me give a brief idea about what we are trying to achieve.

Basically we have three tables
Table1 which has shard key on column linkID. It has 2 other columns postId and personId among many others.
Table2 which has shard key on postId.It does not have linkID.
Table3 which has shard key on column personId. It does not have linkID.

We are trying to join the three tables :
Table1 left join Table2 on postId
Left join Table3 on personId.

All three tables have excess of a 5M records.

Any suggestions on how to improve the join performance?

Two tables that are sharded on their join columns can be joined with a “collocated join” so neither table has to be shuffled between nodes. So consider sharding table1 and table2 on postId or table2 and table3 on personId. You can’t do both.