Best way to push data from a SingleStore database

Hi everyone!
I have a tricky task to accomplish: I need a recurring procedure that transfers data from a SingleStore database to a non-SingleStore RDBMS.
Can you suggest an efficient way to address this task? I think pipelines won’t help, since they are used to insert data into MemSQL.

Thanks in advance!

If it’ll work for your scenario, consider running a cron job to periodically export new data from SingleStore and import it into your target. You can filter the data on the value of a datetime or timestamp column to pick up just the new items.
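Something like the sketch below could be a starting point for that, assuming plain JDBC on both sides. The hosts, credentials, and the events / updated_at / events_copy names are placeholders, so treat it as an outline rather than a finished implementation:

import java.sql.{DriverManager, Timestamp}

// Rough sketch of an incremental copy driven by a timestamp column.
// Hosts, credentials, and table/column names are placeholders.
object IncrementalCopy {
  def main(args: Array[String]): Unit = {
    // SingleStore speaks the MySQL wire protocol, so the MySQL JDBC driver works for the source.
    val src = DriverManager.getConnection("jdbc:mysql://singlestore-host:3306/db", "user", "pass")
    val dst = DriverManager.getConnection("jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB1", "user", "pass")

    // Persist this checkpoint (e.g. in a small state table) between runs.
    val lastRun = Timestamp.valueOf("2020-01-01 00:00:00")

    // Pull only the rows that changed since the last run.
    val select = src.prepareStatement(
      "SELECT id, payload, updated_at FROM events WHERE updated_at > ?")
    select.setTimestamp(1, lastRun)
    val rs = select.executeQuery()

    // Apply them on the Oracle side; a MERGE statement would handle updates as well as inserts.
    val insert = dst.prepareStatement(
      "INSERT INTO events_copy (id, payload, updated_at) VALUES (?, ?, ?)")
    while (rs.next()) {
      insert.setLong(1, rs.getLong("id"))
      insert.setString(2, rs.getString("payload"))
      insert.setTimestamp(3, rs.getTimestamp("updated_at"))
      insert.addBatch()
    }
    insert.executeBatch()

    src.close()
    dst.close()
  }
}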

If you’re a paying customer, consider also SingleStore Replicate for periodic bulk data transfers.

Many third-party data integration and data movement products can also be used to move data into and out of SingleStore via our native or MySQL connectors. Check our third-party integrations documentation page, which lists a few and which we update as new connectors are tested.

What kind of applications are you running against the source and target DBs? How often do you want to move the data? How many tables are involved? How much data per day will you move?

Hey Hanson, thanks for the reply!

My goal is to develop a job that pushes data from a SingleStore database in a cluster to 3 or 4 tables in an Oracle database.
We are currently importing data into the SingleStore database from XML files with a Java application connected to the database via JDBC.
This new job must always be running. I have to export tens of thousands of rows every day from a single table in the MemSQL cluster.
I can add a thread to the current Java application that starts a procedure to export this data. My question is about an efficient way to move it to Oracle.
Any specific hints given this additional info?
Thank you very much!

Thanks for the additional information. I think my original advice still stands. Given that you have tens of thousands of row updates per day, performance probably won’t be a big issue, so you could use regular INSERT/UPDATE/DELETE logic to apply the updates to the target database. You can probably use datetime/timestamp fields to decide what got inserted or updated. Tracking deletes is harder; if you need to, you could use a “key differencing” approach: store the old key set in a helper table every day (or whatever period you want), have your app look for keys that are no longer there, and remove those rows from the target DB.
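For the delete side, here is a rough sketch of that key-differencing idea, again over plain JDBC; the key_snapshot, events, and events_copy names are placeholders:

import java.sql.DriverManager

// Rough sketch of key differencing for deletes; table and column names are placeholders.
object DeleteSync {
  def main(args: Array[String]): Unit = {
    val ss  = DriverManager.getConnection("jdbc:mysql://singlestore-host:3306/db", "user", "pass")
    val ora = DriverManager.getConnection("jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB1", "user", "pass")

    // Keys that are in the last snapshot but no longer in the live table were deleted.
    val deleted = ss.createStatement().executeQuery(
      "SELECT s.id FROM key_snapshot s LEFT JOIN events e ON e.id = s.id WHERE e.id IS NULL")

    val del = ora.prepareStatement("DELETE FROM events_copy WHERE id = ?")
    while (deleted.next()) {
      del.setLong(1, deleted.getLong("id"))
      del.addBatch()
    }
    del.executeBatch()

    // Refresh the snapshot for the next run.
    val refresh = ss.createStatement()
    refresh.executeUpdate("DELETE FROM key_snapshot")
    refresh.executeUpdate("INSERT INTO key_snapshot (id) SELECT id FROM events")

    ss.close()
    ora.close()
  }
}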

Yes, I think your approach could work for my purposes. What do you suggest for exporting data from SingleStore into Oracle tables? Any specific third-party tool? That’s my biggest issue now.

Thanks!!

If it is really simple, and just a few tables, I’d try just using SELECT INTO FS or S3 etc. from SingleStore and bulk-loading the result into Oracle.
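As a sketch of that route, the export can be kicked off over JDBC and the resulting file bulk-loaded into Oracle with something like SQL*Loader. The exact SELECT … INTO FS / INTO S3 syntax and options are in the SELECT documentation, and the path and table name below are placeholders:

import java.sql.DriverManager

// Rough sketch: run a SELECT ... INTO FS export on the SingleStore side over JDBC.
// See the SELECT docs for the exact INTO FS / INTO S3 syntax and options.
object ExportToFile {
  def main(args: Array[String]): Unit = {
    val conn = DriverManager.getConnection("jdbc:mysql://singlestore-host:3306/db", "user", "pass")
    conn.createStatement().execute("SELECT * FROM events INTO FS '/tmp/events_export'")
    conn.close()
    // The resulting file(s) can then be bulk-loaded into Oracle, e.g. with SQL*Loader.
  }
}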

If it’s more complicated, and you are a paying customer, consider using SingleStore Replicate.

And a bunch of other data transfer tools will likely work as well, if you have one on hand. You might have to try the MySQL connector if they don’t support SingleStore as a named connector.

If I wanted to use the MemSQL Spark connector, would I be able to handle a MemSQL db as the source and an Oracle db as the target?

Where can I find the documentation for the SELECT INTO FS or S3 syntax that you mentioned earlier?

Thank you.

For SELECT INTO FS or S3, see the SELECT topic in the documentation.

Hi novalis83,

We don’t have a standard way to move data from SingleStore to Oracle with Spark.

That said, in addition to what Eric suggested, you may be able to read your data out of SingleStore into Spark and then use a standard JDBC connection to load it into Oracle.

For example, you can read data from SingleStore like this:

// "foo" is a placeholder for the source table (e.g. "db.table") in SingleStore
val df = spark.read
    .format("memsql")
    .load("foo")

Then you can write the data into Oracle using jdbc:

val connectionProperties = new java.util.Properties() // set "user" and "password" here
val jdbcUrl = s"jdbc:oracle:thin:@//${jdbcHostname}:${jdbcPort}/${jdbcDatabase}"
df.write.jdbc(jdbcUrl, "<tableName>", connectionProperties)

Thank you all for the helpful suggestions.
With respect to writing data into Oracle using JDBC, you showed a code snippet for an insert into a single table. What is the command to write into multiple tables?
Lastly, is the code you showed using the MemSQL Spark connector?

Thank you!

The read code above is using our Spark connector – you can see that the data source format provided specifies a “memsql” data source. The df.write code above is not necessarily using our connector, because all the data is already in a Spark DataFrame (from the read operation); it is a standard Spark write operation in Scala, independent of our connector. Once the data is in Spark and you are moving it elsewhere, you don’t need the connector.

We don’t believe there is a way to write into multiple tables in Spark with a single write operation. You will have to separate out the data you want to write into different tables, e.g.:

val jdbcUrl = s"jdbc:oracle:thin:@//${jdbcHostname}:${jdbcPort}/${jdbcDatabase}"

df.write.jdbc(jdbcUrl, "<tableName1>", connectionProperties)
df.write.jdbc(jdbcUrl, "<tableName2>", connectionProperties)

I got it!
One more question. Is the MemSQL Spark connector available for .NET Core projects, or do I have to use a MySQL JDBC Spark connector instead?

Thanks!