"Failed to find data source: memsql" error is observed while connecting to "memsql" from "PySpark"

Content of main.py:

spark.conf.set("spark.datasource.memsql.ddlEndpoint", "hostname")
spark.conf.set("spark.datasource.memsql.dmlEndpoints", "hostname,hostname:3306")
spark.conf.set("spark.datasource.memsql.user", "username")
spark.conf.set("spark.datasource.memsql.password", "password")

df_rules = spark.read.format("memsql").option("ddlEndpoint", "hostname").option("username", "password").load("table_name")
df_rules.show()

spark-submit command (Spark v2.4.0):

spark-submit --master local[4] --jars valid_path/mariadb-java-client-2.4.0.jar main.py

Error:

java.lang.ClassNotFoundException: Failed to find data source: memsql. Please find packages at http://spark.apache.org/third-party-projects.html

What could be the possible reason for this error?

Hi @narendra,
It seems you haven't specified a path to the memsql-spark-connector jar.
You can find the right jar here.
Please download it and try again, specifying the path to the memsql-spark-connector jar file.
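For example, the connector jar can be added to your existing spark-submit command next to the MariaDB client jar. A minimal sketch, assuming a connector build for Spark 2.4 / Scala 2.11 (the exact jar file name and path depend on the version you download):

spark-submit --master local[4] --jars valid_path/mariadb-java-client-2.4.0.jar,valid_path/memsql-spark-connector_2.11-3.0.5-spark-2.4.4.jar main.py

Alternatively, --packages with the connector's Maven coordinates lets Spark pull the jar and its dependencies automatically.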

Best Regards


Hi,
How do I specify the path to the memsql-spark-connector jar file?
I have downloaded it and placed it in:
/home/vishwa/spark-3.0.1-bin-hadoop2.7/jars/memsql_connector.jar

I am still getting the same error:

An error occurred while calling o43.load. : java.lang.ClassNotFoundException: Failed to find data source: memsql

Hi @vishwajeetdabholkar,
Could you please share how you specified the path to the jar in Spark?
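For reference, the jar is usually registered either on the spark-submit command line (--jars /path/to/memsql_connector.jar) or through the spark.jars setting when the session is built. A minimal sketch using the path from your previous post (the app name is illustrative):

from pyspark.sql import SparkSession

# Register the connector jar before the session is created;
# the path below is the one from the earlier post.
spark = SparkSession.builder \
    .appName("memsql-read") \
    .config("spark.jars", "/home/vishwa/spark-3.0.1-bin-hadoop2.7/jars/memsql_connector.jar") \
    .getOrCreate()

Note that spark.jars has no effect if it is set with spark.conf.set() after the session already exists.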

Best Regards,
Blinov Ivan


Hi Ivan,

Please see the details at the following location:

Thanks,
Vishwajeet