I have the data below represented with over 170 distinct key values which I am supposed to use a shard key column. The counts below are representation of the number of records for each key.
ColumnStore Table. The table will be sourced for analytic reports
Table1 KeyColumn Count(1) 12345 1000 67890 300 98765 5000 76543 2500 43789 100000
I am calculating one more column with the formula (row_number() over (partition by keycolumn) % 100). So that every 100 rows within each key will have a value ranging from 0 to 99 as an addition column ShardColumn.
Now I have given shardkey as (KeyColumn,ShardColumn)
And in the select where clause I am using the below where clause.
where KeyColumn = 12345 and ShardColumn in (0,1,2,3,4,5,6,7,8…99)
Now, when I did the explain and profile the gather partition is InList and also the skew percentage has reduced from 80 to 11.
I need suggestion or the comment on the above approach whether it is an optimized table design in terms of aggregation and the indexing also by avoiding the data skew.
The explain and profile has said it. But I need a direct suggestion from your team would be more helpful and confident.