Show slow performance response using order by

cwatts · January 9, 2020, 6:17pm

If you want to be able to avoid scanning the whole table to find the top 300 rows, your sort (clustered columnstore) key needs to support that. This means your order by fields (including order i.e. DESC) must be a prefix of your sort key. This would be accomplished by making event_arrival_time in your current sort key event_arrival_time desc.

Today we have:
event_arrival_time, action, operator_id, game_id, correlation_id
Isnt it almost the same?

The order that your fields appear in the key matters. Rows are sorted lexicographically, so putting game_id directly after event_arrival_time will get you better segment elimination. To help understand this, consider a sorted dictionary of words. There is a contiguous range of words starting with “co” (“c” corresponding to an event_arrival_time and “o” corresponding to a game_id), so it is easier to find those words than words containing “c_o” (where “_” is a wildcard representing possible values for action).

idan · January 9, 2020, 6:26pm

so we do have event_arrival_time as prefix. so we are almost there.
you saying we shall have something like that:

event_arrival_time, game_id, action, operator_id, correlation_id

thats almost the same isnt it?

quoting my last note:

cwatts · January 9, 2020, 6:35pm

If you want to sort DESC, you need event_arrival_time desc as a prefix, which you do not have. My suggestion is event_arrival_time desc, game_id, action, operator_id, correlation_id. The desc is to support the order by limit without scanning the whole table, and the game_id earlier is for better segment elimination.

thats almost the same isnt it?

Sure, they are similar, but the differences are important.

idan · January 9, 2020, 7:34pm

correlation_id has high cardinality are you sure it shall be described in cluster columnstore key ? I thought I understood that only low cardinality coulmns should be added to that key

cwatts · January 9, 2020, 7:51pm

For the purposes of optimizing this query, nothing after game_id matters. I was just trying to make minimal changes to the key you already had.

idan · January 9, 2020, 8:08pm

We are planning to recreate this table so if I can get more tips from experienced expert like you(in regarding how shall I design the columnstore cluster index key Ofcourse that is highly appreciated

Would you recommend keeping high cardinally column on that key?

Thanks again for your help

cwatts · January 9, 2020, 8:21pm

Referring to the documentation on sort key guidelines may be useful.

There is nothing inherently wrong with having high-cardinality columns in the sort key. Date/datetimes are commonly found in sort keys, for example. The two primary things to consider in choosing a key are segment elimination and merge cost. Documentation on segment elimination can be found here and on the merger here. You need to estimate if the benefit to read performance given by segment elimination (which of course requires you to be filtering on correlation_id in the first place) is worth the write amplification given by extra merger work.

idan · January 10, 2020, 6:21am

I read that like 20 times:) I didnt know you can add to the columnstore key the variable DESC. we have veriouse queries on that table. this table planned to be huge(billions of transactions) ! perhaps I did mistake by not explaining you the motivation of having order by desc. mybe Iam doing something wrong here by desgin.

We have backoffice where we transactional reports to customers. we use that query to fetch the results to generate this report. We do this by pagination and show it on the screen (using pages). we dont want to fetch all table at once so we paginate it using ordered by desc with the query you already familiar:

SELECT * FROM contribution.cont_event cont WHERE game_id = 'RUUvr1574773090344' AND action IN ('PLACE_BET') AND event_arrival_time BETWEEN '2019-12-02 00:00:00' AND '2019-12-31 23:59:59' ORDER BY event_arrival_time DESC LIMIT 30 OFFSET 0

This way we can show pages to the end customer ordered by date desc and he can scroll to the next ones as he wishes(google-stype). perhaps we overkill memsql for this and we need to think about other way.

i found things like: Third Normal Form, Star Schema, and a Performance Centric Data Strategy

I do understand the concept still dont understand how memsql applies this. but hey iam sure thats for another question which I already posted (How to leverage memsql for reports that combining multiple tables - #2 by franck.leveneur - Off-Topic - SingleStore Forums)

Ive done: SHOW COLUMNAR MERGE STATUS FOR cont_event; merger showed good results - I think:

Merger,State,Plan,Progress,Partition
(Current groups),NULL,"17,1",NULL,7
(Current groups),NULL,"17,1",NULL,6
(Current groups),NULL,"17,1",NULL,1
(Current groups),NULL,"17,1",NULL,5
(Current groups),NULL,"17,1",NULL,15
(Current groups),NULL,"17,1",NULL,4
(Current groups),NULL,"17,1",NULL,3
(Current groups),NULL,"17,1",NULL,2
(Current groups),NULL,"17,1",NULL,0
(Current groups),NULL,"17,1",NULL,8
(Current groups),NULL,"17,1",NULL,9
(Current groups),NULL,"17,1",NULL,10
(Current groups),NULL,"17,1",NULL,11
(Current groups),NULL,"17,1",NULL,12
(Current groups),NULL,"17,1",NULL,13
(Current groups),NULL,"17,1",NULL,14

idan · January 20, 2020, 12:20pm

Hi @cwatts was wondering if you had another chance to look on my last response?? Thank you