Unique key in clustered columnstore

faijaz1990 · January 7, 2022, 6:28am

Hi Team ,

I was trying to create a clustered columnstore table with unique key constraints. I understand that columnstore supports primary/unique key but just would like to check whether clustered columnstore supports unique key?I am getting below error while trying to create a columnstore table with primary key .

I’m using singlestore version 7.6

adam · January 10, 2022, 3:37pm

Hi,

In 7.6 all tables are columnstore by default (in version before 7.5 they were rowstore by default). The clustered columnstore syntax is only needed if you want the table to have a sort key. For example, this is a columnstore with a unique key in 7.6

create table t(a int primary key);

show create table t; will give us:

CREATE TABLE `t` (
  `a` int(11) NOT NULL,
  UNIQUE KEY `PRIMARY` (`a`) USING HASH,
  SHARD KEY `__SHARDKEY` (`a`),
  KEY `__UNORDERED` () USING CLUSTERED COLUMNSTORE
) AUTOSTATS_CARDINALITY_MODE=INCREMENTAL AUTOSTATS_HISTOGRAM_MODE=CREATE AUTOSTATS_SAMPLING=ON SQL_MODE='STRICT_ALL_TABLES' |`

If you want a sort key on the same column as a unique key. The syntax would be this:

create table abc_test(
a varchar(30), 
b int, 
c varchar(30), 
primary key(a), 
key(a) using clustered columnstore);

Or you can use the shorter sort key syntax:

create table abc_test(
a varchar(30), 
b int, 
c varchar(30), 
primary key(a), 
sort key(a));

The primary key syntax is used to create an index, so you have to specify the sort key separately. We should give a clearer error that combining the two syntax is not allowed.

Note that, as of today, you can’t have a unique key on columns that don’t contain by your shard key columns, so you need to remove the shard key on b if you want to keep the primary key on a.

mpskovvang · October 20, 2022, 11:57am

Hi @adam

Should a columnstore table with a primary key also define a sort key, if all queries against the table use the primary key in a where clause?

Will the sort key make any performance improvement?

I understand, that a sort key is only necessary when querying columns without an index or when sorting the result with an ORDER BY clause. Am I wrong?

adam · October 20, 2022, 6:00pm

Yeah, it depends on your query patterns.

Sort keys give you:

Faster range scans on the primary key (pk > x)
Merge joins (we only do merge join today if there is a sort key on the join columns)
Faster sorting (for order by, window functions, etc.)

The sort key causes some extra IO cost to keep the data sorted, but that cost is mostly asynchronous to writes queries (for the most part… some large data loads do some extra sorting, but otherwise the background merger will keep the LSM tree sorted by the sort key).