Since there is problem with distributed joins (where shard key columns are different), how we are doing is extract the result set from driving table and then in the loop we are using the extracted result set and querying the other table in the loop.
for i in (collect(first_query))
echo select *
we are returning blank when there is no corresponding match in the 2nd table. All I am trying to is stop echoing when there is no record retuned.
Prathap Reddy G
I suppose you could run a count(*) on a query inside the loop with the same from/where clauses, and if it is > 0, then do the echo select. That’ll take more time but should work. To save effort, you could put the results in a temp table, then if the temp table has > 0 rows, return them, then delete all rows from the temp table, and repeat.
Hi Hanson, Thank you for your response, due to SLA requirements, we could not execute the same query twice one for checking record existence and then returning the results. Temp table in production environment is a big maintenance headache. The result set should be delivered to our UI thru api call. Any better approach is much appreciable.
Another thought is, collect the inner query result in an array, then if it is empty, don’t do anything, else, somehow return the contents of the array.
I’m interested in the fundamental problem though. What was the performance issue with the original query that made you want to do this? Was it doing a broadcast or shuffle that was too expensive, and you are trying to do a more point-to-point lookup approach that was cheaper?
Hi Hanson, it’s doing broadcast and consuming lot of cluster resources because of the shard key on the tables is not same. Array approach won’t help up because we will have results returned from previous iterations.