Improving GROUP BY memory usage on a columnstore table

What are the best practices for minimizing memory usage for GROUP BYs on columnstore tables – is there any way to do so? Does the sort key make a difference?

This is a fairly general question. The best way to reduce memory is to have fewer groups or fewer total columns in the GROUP BY and aggregate lists, or to group by smaller data types, such as INT instead of BIGINT.
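To make that concrete, here's a minimal sketch against a hypothetical `events` table (the table and column names are illustrative, not from this thread); the narrower the key columns, the smaller each hash-table entry in the group-by:

```sql
-- Wider keys: a BIGINT plus a VARCHAR make each hash-table entry large.
SELECT user_id, country, COUNT(*) AS cnt
FROM events
GROUP BY user_id, country;

-- Narrower keys: same grouping, but the VARCHAR is replaced by a small
-- INT surrogate (assumes a country_id column exists or can be added);
-- dropping key columns that aren't actually needed helps in the same way.
SELECT user_id, country_id, COUNT(*) AS cnt
FROM events
GROUP BY user_id, country_id;
```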

Also, we do support hash group-by spilling (to disk) during query execution now.

Merge join can take less space, but it can be slower, because it can’t use operations on encoded data.
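Since the question also asked about the sort key, here's a hedged sketch of defining it on the grouping columns, using the same hypothetical table. Whether the optimizer then actually picks a lower-memory, merge-style plan instead of a hash group-by depends on the query and the engine version, so treat this as something to verify with EXPLAIN or PROFILE rather than a guarantee:

```sql
-- Columnstore table whose sort key matches the grouping columns, so rows
-- are kept physically ordered on those keys (SORT KEY syntax as in 7.3+).
CREATE TABLE events_by_key (
    user_id     INT NOT NULL,
    country_id  INT NOT NULL,
    amount      DECIMAL(18, 2),
    SORT KEY (user_id, country_id),
    SHARD KEY (user_id)
);
```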

Why do you ask?


Ah, so it’s not super straightforward. I have noticed that the fewer columns in the group by, the better. Glad that SingleStore doesn’t require every non-aggregated value to appear in the group by.

I’m asking because I have a use case in which one of my org’s apps performs such an aggregation roughly every hour, and I’m trying to identify ways to minimize resource usage… I can modify the underlying structure of this table however I see fit and tailor it to this use case, so I was trying to see what changes might make sense. The columnstore table in question has ~22M rows, which produce ~15M records in this aggregation, and it takes 14.5 GB of memory to perform the Hash Group By as-is.
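For anyone measuring the same thing, a sketch of reading per-operator memory with PROFILE / SHOW PROFILE (the query below is a stand-in for the real aggregation, not the actual one):

```sql
-- Run the aggregation under the profiler (stand-in query).
PROFILE
SELECT user_id, country_id, SUM(amount) AS total
FROM events
GROUP BY user_id, country_id;

-- Show the annotated plan; the HashGroupBy operator reports its memory
-- usage, so the effect of narrower keys or a different sort key is visible.
SHOW PROFILE;
```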

We’re on 7.5.8 - is the spilling option available in this version, or was that introduced later?
Thanks!
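One way to check directly on the cluster rather than relying on release notes; the LIKE pattern is a guess at the variable naming, not a documented name:

```sql
-- Exact engine version running on the cluster.
SELECT @@memsql_version;

-- Any spilling-related engine variables this version exposes; an empty
-- result suggests the knobs (and likely the feature) aren't available yet.
SHOW VARIABLES LIKE '%spill%';
```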