Error when using subselect - Scalar subselect where outer table is not a sharded table

When I try to fetch a single value for each row via a sub query, I get the error ERROR 1749 ER_DISTRIBUTED_FEATURE_LOCKDOWN: Feature 'Scalar subselect where outer table is not a sharded table' is not supported by MemSQL Distributed.

Is there something wrong with the SQL or the table creation script?

select x.*,
(select last_visited_ts from link li2 where li2.id=x.url_link_id) last_visited_ts
  from (
  select l.url_link_id,
         count(1) pageviews,
         count(distinct l.sessionid_id) user_sessions
    from log l
   where l.timestamp > date_add(current_date(), interval -1000 day)
     and l.dg_intern_extern_ie = 'I'
group by l.url_link_id
order by pageviews desc
limit 10000
) x;

CREATE TABLE log (
						ID bigint not null,						
						TIMESTAMP DATETIME, 
						URL_LINK_ID bigint, 				
						SESSIONID_ID bigint, 
						SESSIONORDER bigint,
						DG_INTERN_EXTERN_IE bigint,
						month_date date,						
						key(TIMESTAMP, SESSIONID_ID, SESSIONORDER, month_date) using clustered columnstore,
						shard(url_link_id))

CREATE TABLE link (
						ID bigint not null,
						DOMAIN_ID bigint not null,
						INTERN_EXTERN_IEB bigint not null,
						CREATED_DATE DATE,
						url varchar(4000) not null,	
						title varchar(4000) not null,	
						last_visited_ts DATE,
						key(id) using clustered columnstore,
						shard(id));

The outermost nested select, which you are using in the SELECT list of the top-level query, can’t be used there – only stored sharded tables can be used like that.

Consider refactoring your query to use regular joins as much as possible, not nested selects.

We opened an internal task to track this issue and will consider enabling this query pattern to run in a future release.

You can resolve this by adding unique key (id) using hash to the link table.

This limitation is related to the subselect (select last_visited_ts from link li2 where li2.id=x.url_link_id) which only works if there is at most one match (one id in link for each url_link_id). Because the system can’t know for sure that there is at most one match, it is limited in what methods it can use to execute it. Adding the unique key gives the database the information it needs to be able to execute the query efficiently.

@jack Does your workaround only work with rowstore tables? We’re running into the same error using Dapper Plus micro-ORM via C# which must be generating these subselects under the covers.

Heres the table but trying to apply your fix we’re getting error:
Feature ‘UNIQUE HASH index on columnstore table with multiple columns’ is not supported by MemSQL.

CREATE TABLE dataSegment2 (
  `SegmentID` INT NOT NULL,
  `BookingID` INT NOT NULL,
  `AirlineCode` VARCHAR(10) NULL,
  `ArriveDateTime` DATETIME NULL,
  `DepartCityCode` VARCHAR(10) NULL,
  `DepartDateTime` DATETIME NULL,
   unique key(SegmentID,BOOKINGID) USING HASH,
   SHARD(BookingID),
   KEY (BookingID) USING CLUSTERED COLUMNSTORE
  )  ;

I don’t think we also want to shard by SegmentID because most of our tables join to each other on BookingID so that would result in data movement?

The unique key I mentioned needs to be on table link on column id for the query you originally posted to run - it needs to exactly match the columns used in the join condition in the subselect (select last_visited_ts from link li2 where li2.id=x.url_link_id). If there are more subselects of this type you may need to add unique keys for each.

Currently columnstore tables only support single-column unique keys, so unfortunately if your schema/query requires a unique key on two columns this is only possible on rowstore for now (support for it is coming in a future release)

1 Like

I definitely enjoyed reading it, you’re a great author!

is this rectified? At which release we can expect this issue to be resolved?

Please let us know.