Not able to write all data to rowstore table

priyatoni88 · July 8, 2020, 1:48pm

I am not able to write all of my data to a rowstore table.
Issue:
Data is only partially saved when writing from a Spark DataFrame to the table.
There are around 7 lakh (700,000) records to write, but only about 5 lakh (500,000) are saved, and
the write fails with the error below:
Caused by: java.sql.SQLException: Too many LOAD DATA errors. Limit was set to 1000. Postpend 'MAX_ERRORS <error_limit>' to LOAD DATA statement to set a higher limit, or use 'MAX_ERRORS 0' to disable error tracking.

Table definition:
CREATE TABLE master (
  product varchar(100) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL DEFAULT '',
  product_details varchar(200) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL DEFAULT '',
  PRIMARY KEY (product, product_details)
)

adam · July 9, 2020, 8:36pm

Can you check whether the following query gives any hints as to why your data isn't loading:

select * from information_schema.load_data_errors

priyatoni88 · July 14, 2020, 1:01am

I tried to execute this query, but it throws the error below.

Error Code: 1146. Table 'information_schema.load_data_errors' doesn't exist

priyatoni88 · July 14, 2020, 1:04am

I am using the Spark script below to write the data.
df.write
  .format("memsql")
  .option("ddlEndpoint", PropertiesLoader.ddlendpoint)
  .option("dmlEndpoint", PropertiesLoader.ddlendpoint)
  .option("user", PropertiesLoader.memsql_user)
  .option("password", PropertiesLoader.memsql_password)
  .option("database", PropertiesLoader.memsql_db)
  .option("truncate", true)
  .mode(SaveMode.Ignore)
  .save(dbname.master)

I also tried another way:

val connectionProperties = new Properties()
connectionProperties.put("driver", PropertiesLoader.memsql_jdbc_driver)
connectionProperties.put("jdbcUrl", PropertiesLoader.memsql_url)

df.write
  .mode(SaveMode.Ignore)
  .option("truncate", true)
  .jdbc(PropertiesLoader.memsql_url, dbname.master, connectionProperties)

But I ended up with the same error.

roxannapourzand · July 14, 2020, 2:22am

Hi Priya,

It appears you are using a SaveMode of Ignore, which will, by default, run LOAD DATA with SKIP DUPLICATE KEY ERRORS. Based on this (and without the specific load data errors you are experiencing), we can assume that you are likely receiving many errors that relate to duplicate entries for unique keys. The behavior of SKIP DUPLICATE KEY ERRORS is described here: LOAD DATA · SingleStore Documentation.

Do you have an existing table called 'master' when you are running this load? If so, do you expect the DataFrame you are loading to contain primary key values that already exist in the MemSQL table?
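
Duplicates inside the DataFrame itself would also count against the primary key. A minimal sketch for checking that, assuming the DataFrame columns are named product and product_details as in your table definition (the helper name duplicateKeys is only for illustration):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Returns one row per (product, product_details) pair that occurs more than
// once in df, together with its occurrence count.
// Assumption: df has columns named exactly like the table's primary key columns.
def duplicateKeys(df: DataFrame): DataFrame =
  df.groupBy("product", "product_details")
    .count()
    .filter(col("count") > 1)
```

Calling duplicateKeys(df).show(20, truncate = false) prints a sample of the repeated keys. Note that this comparison is case-sensitive, while the utf8_general_ci collation on your columns is case-insensitive, so values that differ only in letter case would still collide in the table without showing up here.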

Can you also provide a little bit of context on how you want the data to be saved? Do you want the data to be truncated/overwritten?

I ask the above question because you have specified truncate as true, but your SaveMode is Ignore, not Overwrite. I do not believe that this will truncate your table, since the SaveMode is not Overwrite.

Also, note that the truncate option (starting with the release candidate - Spark Version 3.0.0-rc-1) is deprecated. It will still work when used, but we now recommend you use OverWriteBehavior, which you can specify as 'truncate', 'merge', or 'dropandcreate'.
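
As a rough sketch of what that could look like, assuming the option key is spelled overwriteBehavior and that you do want the table emptied before the load (the helper name overwriteWriter and its parameters are placeholders, not part of the connector):

```scala
import org.apache.spark.sql.{DataFrame, DataFrameWriter, Row, SaveMode}

// Builds a writer that uses SaveMode.Overwrite together with the
// overwriteBehavior option instead of the deprecated truncate flag.
// Nothing is written until .save(...) is called on the result.
// Assumption: endpoint, user and password are supplied by the caller.
def overwriteWriter(df: DataFrame,
                    endpoint: String,
                    user: String,
                    password: String): DataFrameWriter[Row] =
  df.write
    .format("memsql")
    .option("ddlEndpoint", endpoint)
    .option("user", user)
    .option("password", password)
    .option("overwriteBehavior", "truncate")
    .mode(SaveMode.Overwrite)
```

You would then finish the write with something like overwriteWriter(df, endpoint, user, password).save("dbname.master"). This only changes how the existing table is treated; if the DataFrame itself contains duplicate primary keys, those rows would still need to be dealt with.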

Best,
Roxanna