Columnstore key guidelines

I wanted to share the key guidelines that I put together so far. I still have a few related questions as well in bold.

Shard keys

  • A table has up to one shard key.
  • If no shard key is defined, then it defaults to the primary key.
  • If no shard or primary key is defined, or an empty shard key is defined, then data is distributed across partitions uniformly at random.
  • Shard key fields cannot be updated (or altered) so they must be fields that never change such as primary key fields.
  • If a table has a primary key defined, then its shard key field(s) must be a subset of the primary key field(s).
  • Shard key values should be evenly distributed, i.e., be mostly unique or have a consistent number of occurrences.
  • The shard key should consist of the most common subset of fields on which there are frequent joins, filters, or aggregations. Generally, fewer fields are better.
  • The order of the shard key fields does not matter because it is ultimately hashed into a single value. Question: If two tables have shard keys that consist of the same fields but in different order, will they still result in local joins, or does the order need to be consistent?

Columnstore table keys

  • A columnstore table has one columnstore key, up to one shard key, and no primary key.
  • Since primary keys cannot be defined, their fields should be defined as an unenforced unique key, which is informational and acts as a hint for query plans. UNIQUE KEY (<primary key fields>) UNENFORCED RELY Question: Are there any downsides to defining an unenforced unique key?
  • The columnstore key should consist of the fields on which there are always, or most commonly, filters generally ordered from least to most unique up to the point of uniqueness. Question: In addition to commonly filtered fields, do commonly aggregated fields benefit from being included in the columnstore key?
  • In general, the first columnstore key fields must be used in a query to benefit from the usage of the remaining fields.
  • As such, the columnstore key should not be more precise than common query filters. For example, it may be better to include a date rather timestamp field followed by another commonly filtered field to leverage both.
1 Like