INSERT IGNORE INTO for Rowstores on MemSQL 7.1

ymonye · October 28, 2020, 7:45pm

Hi all,

I’m using the INSERT IGNORE INTO statement to insert data while ignoring duplicate pre-existing information, into a Rowstore table. Below is how the table was created:

CREATE TABLE IF NOT EXISTS rs_list (id CHAR(42), name TEXT, type1 TEXT, type2 TEXT, type3 TEXT, type4 TEXT, type5 TEXT, SHARD KEY (id, name, type1, type2, type3, type4, type5), UNIQUE KEY (id, name, type1, type2, type3, type4, type5));

This used to work fine in versions of MemSQL prior to 7.1, yet noticed today these tables are still showing duplicate data (where each column matches the column of another row). Is there a revised method of creating Rowstore tables to fully support the INSERT IGNORE INTO command?

Also, has this or something similar been introduced to ColumnStores / Singlestores? I’d previously asked back in March of this year, as our operations would greatly benefit from running these outside of RowStores & freeing up tons of needed RAM.

Thanks!
Nick

adam · November 3, 2020, 5:27pm

Hi,

I likely need to see some example data. insert… ignore won’t insert duplicate data (it will ignore the duplicate key error though).

Insert ignore support for columnstore tables is coming in 7.3.

-Adam

sumanamara2020 · December 5, 2020, 7:02am

Try this:
INSERT INTO retweets_counter(user_id, num_retweets)
SELECT tweet::retweeted_user_id, 1
FROM batch
WHERE tweet::retweeted_user_id is not null;
ON DUPLICATE KEY UPDATE user_id = values(user_id);

Not sure if this is what you want though.

ymonye · December 9, 2020, 6:07am

Thanks @adam and @sumanamara2020.

Basically what I am trying to do, if my table has the following information:

id: "John", type1: "Atlanta", type2: "Georgia", type3: "Southeast", type4: "USA", type5: "North America"

I’d like to insert new rows into my table without creating duplicate entries for rows where all of the columns match.

If the next data I’m inserting is the below, for Sarah who lives in California (while there’s no other row that exists matching her information), it I’d expect it to write to the table with no problems:

id: "Sarah", type1: "San Diego", type2: "California", type3: "Southwest", type4: "USA", type5: "North America"

Now if I receive the same data as the previous “John” data, with all of the columns matching, and attempt to write it to the table, I would like MemSQL to instead ignore and not create a duplicate row:

id: "John", type1: "Atlanta", type2: "Georgia", type3: "Southeast", type4: "USA", type5: "North America"

I guess this is where I was confused with the “INSERT IGNORE INTO” query, as I’d believed it would ignore writing a new row if the data matched all of the columns of a pre-existing row. Now if I were to insert the same data with one of the columns different, for example, “Jack” instead of “John”, then yes, I’d expect MemSQL to write a new row:

id: "Jack", type1: "Atlanta", type2: "Georgia", type3: "Southeast", type4: "USA", type5: "North America"

adam · December 11, 2020, 7:04pm

Hi ymonye

You have a unique key on all the columns of your table correct (like in your table DDL above)? In that case INSERT… IGNORE should do what you want. A duplicate row will trigger a duplicate key error for the unique key with all columns in and INSERT… IGNORE will ignore that error (so the row won’t be inserted).

If I had to guess there is some subtle different in the two rows the look to be duplicates (could be white space or possibly odd utf8 character set differences). You may want to check select length(col) or select md5(col) and see if they are really the same.

ymonye · March 6, 2021, 4:33pm

Hi Adam,

I’m marking this one as resolved, as I figured out my issue. A lot of the rows of data I was using for my “INSERT IGNORE” included NULL data, and I learned below that NULL data was still identified as unique data. Those nulls were replaced with “N/A” strings.

adam · March 8, 2021, 3:08pm

Thanks for the update. That is correct, NULLs in MySQL/Singlestore don’t trigger unique key errors. This is a source of some confusion as different databases have different behaviors - though I believe our behavior is ANSI standard.

-Adam