Spark dataframe write to Meemsql

Hi,
I am using memsql version 2.0.1.I am trying to write a dataframe having values(“app_name”,“table_name”,3000,true),I wrote to Meemsql using

df.write
.option(truncate,false)
.mode(SaveMode.Append)
.jdbc(url,tablename,connectionProperties)

which was working fine and taking the boolean value true as ‘1’ till 16/04/2020.From 16/04/2020 onwards its taking Boolean value true as ‘0’.Why this happens?Please help

Regards
Smitha

Hello puliythsmitha,

Thank you for reaching out about this issue. I have a couple of follow up questions.

  • When you say that you are using memsql version 2.0.1, are you referring to the version of the MemSQL Spark Connector?
  • If so, can you confirm what version of MemSQL you are using as well? You can do so by running select @@memsql_version in the database.
  • Can you think of any changes made to the Spark data frame, Connector settings, the MemSQL database, or the table you are loading into in MemSQL on April 16, 2020?
  • Can you provide the makeup of the table you are loading into in MemSQL by running SHOW CREATE TABLE <table_name>?
  • Can you provide a sample output of your data frame that you are loading from in spark using df.show()?

As a side note, we strongly recommend you use the 3.0 version of the MemSQL Spark Connector (if you are indeed using version 2 from your comment above)

Best,
Roxanna

Hi,
Please find the Answers here:
Q)When you say that you are using memsql version 2.0.1, are you referring to the version of the MemSQL Spark Connector?

Ans:Yes.memsql-connector_2.11,version 2.0.1

Q)If so, can you confirm what version of MemSQL you are using as well? You can do so by running select @@memsql_version in the database.

Ans:6.8.12

Q)Can you think of any changes made to the Spark data frame, Connector settings, the MemSQL database, or the table you are loading into in MemSQL on April 16, 2020?

Ans:No changes made

Q)Can you provide the makeup of the table you are loading into in MemSQL by running SHOW CREATE TABLE <table_name>?
Ans:
CREATE TABLE sm_meta_table (
app_name text CHARACTER SET utf8 COLLATE utf8_general_ci,
table_name text CHARACTER SET utf8 COLLATE utf8_general_ci,
record_count bigint(20) DEFAULT NULL,
hdfs_path text CHARACTER SET utf8 COLLATE utf8_general_ci,
timestamp text CHARACTER SET utf8 COLLATE utf8_general_ci,
write_success bit(1) DEFAULT NULL,
memsql2write_status text CHARACTER SET utf8 COLLATE utf8_general_ci
/*!90618 , SHARD KEY () */
)

Q)Can you provide a sample output of your data frame that you are loading from in spark using df.show()?

--------±-------------------±-----------±-------------------±------------------±------------±------------------+
|app_name| table_name|record_count| hdfs_path| timestamp|write_success|memsql2write_status|
±-------±-------------------±-----------±-------------------±------------------±------------±------------------+
| My_App|my_table | 39162|hdfs://test/… |2020-04-19 08-41-54| true| STARTED|
±-------±-------------------±-----------±-------------------±------------------±------------±------------------

The same values reflect on Memsql table too except write_success,which gets converted to 0

Thanks
Smitha

Hi puliythsmitha,

Thank you for providing all the details!

You should use MemSQL’s actual BOOL data type to store the ‘true’ or ‘false’ you are attempting to store, not our BIT type. With our BOOL type, a value of 0 is considered FALSE, non-zero values are considered TRUE.

It is not clear why this value changed behavior on the 16th, but using the BOOL type will guarantee consistency.

Can you please change the write_success datatype to use BOOL instead? Please confirm the outcome of the same load and let us know if this resolves the issue.

For additional reference on our datatypes, please see: https://docs.memsql.com/v6.8/reference/sql-reference/datatypes/

Best,
Roxanna

Hi,

I was creating dataframe on spark and load the dataframe into Memsql using df.write command.

At dataframe level I mentioned the type as Boolean only,not BIT.My schema is like this:

val schema = StructType(Array(
StructField(“app_name”,StringType,true),
StructField(“table_name”,StringType,true),
StructField(“record_count”,LongType,true),
StructField(“hdfs_path”,StringType,true),
StructField(“timestamp”,StringType,true),
StructField(“write_success”,BooleanType,false),
StructField(“memsql2write_status”,StringType,true)
))
I don’t know how it gets converted to BIT type when loaded into Memsql.In that case how and where I can mention BOOL type ?

Regards
Smitha

Hi Puliyathsmitha,

Thank you for clarifying. Initially, I had thought that you were creating your table in MemSQL, but it looks like you are creating the table in Spark which is automatically creating the table and types in MemSQL.

After further discussion with engineering, the conversion from BIT to BOOLE is an issue with 2.0 connector on how BIT is treated. At this time, we recommend switching over to the MemSQL Spark Connector version 3.0 where this issue will not occur and BOOLE type will properly be converted.

We plan on releasing the 3.0 connector next month and will deprecate the 2.0 version of the connector shortly thereafter.

In a previous forum post, I recall you were unable to add the 3.0 connector to your dependencies. This is because you didn’t explicitly specify the version of the Beta (this issue won’t occurr once we GA next month).

Does adding the 3.0 connector to your dependency like the below work for you? This will add the latest version of the connector (beta10) to your project.

<dependency>
  <groupId>com.memsql</groupId>
  <artifactId>memsql-spark-connector_2.11</artifactId>
  <version>3.0.0-beta10-spark-2.3.4</version>
</dependency>

Github Link to the Connector: https://github.com/memsql/memsql-spark-connector/tree/3.0.0-beta

As an early user of the 3.0 Memsql spark connector, I cant agree more with Roxanna. Many bug fixes, features, and performance improvements.

2 Likes

Hi,
I tried the changes you suggested in my dependency file,pom.xml .But its throwing error message like dependency not found.
As a quick fix,I changed the Boolean true or false to Integer 1 or 0 at spark dataframe level and the same reflect on Memsql table also.

Regards
Smitha

Hi Puliyathsmitha,

Thanks for the update. That is strange the dependency is not working. Here is a link to the Maven Artifacts which shows the dependency snippet you can copy for different versions of spark. Are you able to access this?

https://search.maven.org/artifact/com.memsql/memsql-spark-connector_2.11

In any case, our partner teams are doing some testing on our end with Maven and adding our connector dependency, so we will share any instructions/updates when we have them.

Note - we still encourage you to test the new Spark Connector if you can (outside of Maven) and using a Spark instance, you should be able to run the following to start your Spark instance and make it aware of our Connector.

spark-shell --packages com.memsql:memsql-spark-connector_2.11:3.0.0-beta10-spark-2.4.4