Columnstore segments scanning optimization

Hello,

I have encountered weird behavior regarding columnstore table, when adding an AND condition will increase number of scanned segments instead of filtering the results scanned by previous more eliminating condition

Here is the table:

CREATE TABLE `recommendations` (
  `_item_id` bigint(20) unsigned NOT NULL,
  `_recommended_item_id` bigint(20) unsigned NOT NULL,
  `_score` float NOT NULL,
  `_algorithm_id` tinyint(3) unsigned NOT NULL,
  /*!90618 SHARD */ KEY `_recommended_item_id` (`_recommended_item_id`),
  KEY `_item_id` (`_item_id`,`_algorithm_id`,`_recommended_item_id`,`_score`) /*!90619 USING CLUSTERED COLUMNSTORE */ /*!90621 WITH(COLUMNSTORE_SEGMENT_ROWS=10000) */
) /*!90623 AUTOSTATS_CARDINALITY_MODE=INCREMENTAL, AUTOSTATS_HISTOGRAM_MODE=CREATE, AUTOSTATS_SAMPLING=ON */ /*!90623 SQL_MODE='STRICT_ALL_TABLES' */

When running query like:

SELECT *
FROM recommendations
WHERE
recommendations._item_id IN(X);

For my specific set of ids this scans 27 segments, returns 150k rows which are then filtered.

For query like:

SELECT *
FROM recommendations
WHERE
recommendations._item_id IN(X) AND _recommended_item_id IN(Y);

this scans 118 segments and returns 1.2M rows which are then filtered.

How can I force memsql to scan columnstore table using _item_id condition and then just filter out rows using _recommended_item_id condition? I have tried index hints, adding extra hash index only with _item_id, but that did nothing.

Thanks
Jan

Hi Jan. In your case, the order of the sort key is having an impact. Could you re-profile the queries if you change the sort key to

(`_item_id`,`_recommended_item_id`, `_algorithm_id`,`_score`)

I will follow up to see if better segment elimination is possible with an AND like yours.