Hi,
I’m working with a large sample of trading data from multiple years, with each row as a single trade. Columns included in this table are the trade_datetime (MM-DD-YYYY HH:mm:ss.SSS), trade_date (MM-DD-YYYY), and trade_hour (HH).
My below query works as intended with a smaller set of data; this retains the latest trade data over a 1hr interval, then deletes everything else.
However with anything larger, 8.55M rows of data, I receive the following error in MemSQL:
ERROR 1956 ER_PARAMETER_CAPACITY_EXCEEDED: Query cannot be completed because the parameter array capacity of 1048576 was exceeded
I have plenty of resources available on this machine, along with 128GB or RAM, yet was wondering if there was a setting within MemSQL placing constraints on this parameter?
Query:
DELETE FROM tbl_trades
WHERE row IN
    (SELECT row
    FROM tbl_trades A
    WHERE trade_datetime <
        (SELECT MAX(trade_datetime)
        FROM tbl_trades
        WHERE ticker = A.ticker
        AND trade_date = A.trade_date
        AND trade_hour = A.trade_hour)
    GROUP BY ticker, trade_price, trade_volume, trade_datetime, trade_date, trade_hour);
Here is how the table was created:
CREATE TABLE IF NOT EXISTS tbl_trades
(
    row INT AUTO_INCREMENT,
    ticker TEXT NOT NULL,
    trade_price DECIMAL(12,2) NOT NULL,
    trade_volume DECIMAL(12,2) NOT NULL,
    trade_datetime DATETIME(6) NOT NULL,
    trade_date DATE NOT NULL,
    trade_hour INT NOT NULL,
    upload_time TIMESTAMP(6) DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    KEY(`row`) USING CLUSTERED COLUMNSTORE,
    SHARD KEY (`ticker`)
);
Thanks