How do split table rows into ranges for processing?

hi,
I am trying to build a query to split a given table’s rows into range of values based on a key column and block size. For example, if I have a table with a key column with values starting from 1 to 20000 and my block size is 5000, I want to get 4 ranges in the query response.

strt_ky end_ky
--------------------
1  5000
5001 10000
10001 15000
15001 20000

I have the equivalent queries that works fine on other database platforms. But when I try to form similar query on Singlestore, I run into different issues stating so and so feature is not supported…

I am using rowstore table with v7.8 self managed version of Singlestore. The no of rows in the table may vary from 2M to 50M. Idea is the split the data into ranges and process them in parallel from distributed process.

Below is the Oracle query that I am using as reference.

WITH KEY_COL_RANGE AS (
    SELECT ROW_NUMBER() OVER(ORDER BY A.MYCOL1 ASC) AS RANK,
        A.MYCOL1 AS KEY_COL
    FROM MYTABLE A
    ),
    KEY_RANGE as (SELECT ROW_NUMBER() OVER(ORDER BY RANK) AS RANK1,
        RANK,
        KEY_COL
    FROM KEY_COL_RANGE
    WHERE MOD(RANK, CAST(5000 AS INT)) IN (1, 0)
    OR RANK IN (SELECT MAX(RANK) FROM KEY_COL_RANGE))
    SELECT(SELECT CAST(R2.KEY_COL AS VARCHAR(20))
    FROM KEY_RANGE R2
        WHERE R2.RANK1 = R1.RANK1-1 ) AS STRT_KY_VAL
        , CAST(R1.KEY_COL AS VARCHAR(20)) END_KY_VAL
    ,R1.RANK AS END_POS
        FROM KEY_RANGE R1
    WHERE(MOD(RANK1,2) = 0 OR RANK1 = (SELECT MAX(RANK1) FROM KEY_RANGE))
    ORDER BY END_POS

Thanks!

Consider putting the rowset you want in a temp table, then select from the temp table and use an expression to bucket the serial numbers to desired “block numbers.” It can use MOD, or CASE or some other expression you decide is convenient.

If you are getting an error on SingleStore and not Oracle, (a) consider trying the latest SingleStore, 8.1, and (b) if it still fails, give us your CREATE TABLE statements and the error message you are getting. If we have a gap in our SQL coverage, we’ll take it into consideration for later.

i need to run this as an inline query as our application supports multiple data sources. So cant use temp table. Also, we are not scheduled to upgrade to v8.x anytime soon in next 3 months as we just started out with 7.8. So have to get it working on 7.8 only.

We get the error Feature ‘Scalar subselect where outer table is not a sharded table’ is not supported by SingleStore Distributed.. Looks like RANK1 = (SELECT …) and RANK IN (SELECT …) is causing the issue. I have anyway now modified the query to have some interim CTE to hold the max value and have rewritten the query and it works now.

To create the issue on your end, you can use below script for defining a simple table.

DROP TABLE IF EXISTS MYTABLE FORCE;
CREATE ROWSTORE TABLE MYTABLE (
  MYCOL1	BIGINT	NOT NULL PRIMARY KEY);
INSERT INTO MYTABLE(MYCOL1) VALUES(1);
INSERT INTO MYTABLE(MYCOL1) VALUES(2);
INSERT INTO MYTABLE(MYCOL1) VALUES(3);
INSERT INTO MYTABLE(MYCOL1) VALUES(4);
INSERT INTO MYTABLE(MYCOL1) VALUES(5);

Thanks!

I’m glad you found a workaround.

I tweaked your example to make it parse and ran it on 8.1. It does generate an “unsupported” error in 8.1. I’ve recorded this as a feature request.

This is the version I ran.

DROP TABLE IF EXISTS MYTABLE FORCE;

CREATE ROWSTORE TABLE MYTABLE (MYCOL1 BIGINT PRIMARY KEY not null);
INSERT INTO MYTABLE(MYCOL1) VALUES(1);
INSERT INTO MYTABLE(MYCOL1) VALUES(2);
INSERT INTO MYTABLE(MYCOL1) VALUES(3);
INSERT INTO MYTABLE(MYCOL1) VALUES(4);
INSERT INTO MYTABLE(MYCOL1) VALUES(5);

WITH KEY_COL_RANGE AS (
    SELECT ROW_NUMBER() OVER(ORDER BY A.MYCOL1 ASC) AS RANK,
        A.MYCOL1 AS KEY_COL
    FROM MYTABLE A
    ),
    KEY_RANGE as (SELECT ROW_NUMBER() OVER(ORDER BY RANK) AS RANK1,
        RANK,
        KEY_COL
    FROM KEY_COL_RANGE
    WHERE MOD(RANK, CAST(5000 AS UNSIGNED INT)) IN (1, 0)
    OR RANK IN (SELECT MAX(RANK) FROM KEY_COL_RANGE))
    SELECT(SELECT (R2.KEY_COL :> VARCHAR(20))
    FROM KEY_RANGE R2
        WHERE R2.RANK1 = R1.RANK1-1 ) AS STRT_KY_VAL
        , (R1.KEY_COL :> VARCHAR(20)) END_KY_VAL
    ,R1.RANK AS END_POS
        FROM KEY_RANGE R1
    WHERE(MOD(RANK1,2) = 0 OR RANK1 = (SELECT MAX(RANK1) FROM KEY_RANGE))
    ORDER BY END_POS;