The SingleStore team enjoyed sponsoring and attending Spark Summit last week, where we spoke with hundreds of developers, data scientists, and architects all getting a better handle on modern data processing technologies like Spark and SingleStore. After a couple of days on the expo floor, I noticed several common questions. Below are some of the most frequent questions and answers exchanged in the SingleStore booth.
1. When should I use SingleStore?
SingleStore shines in use cases requiring analytics on a changing data set. The legacy data processing model, which creates separate silos for transactions and analytics, prevents updated data from reaching reports and dashboards until the nightly or weekly ETL job runs. Serving analytics from a real-time operational database means reports and dashboards are accurate up to the last event, not last week.
That said, SingleStore is a relational database and you can use it to build whatever application you want! In practice, many customers choose SingleStore because it is the only solution able to handle concurrent ingest and query execution for analyzing changing datasets in real time.
2. What does SingleStore have to do with Spark?
Short answer: you need to persist Spark data somewhere, whether in SingleStore or in another data store. Choosing SingleStore provides several benefits including:
Longer answer: There are two main use cases for Spark and SingleStore:
3. What’s the difference between SingleStore and Spark SQL?
There are several differences:
4. How do SingleStore and Spark interact with one another?
The SingleStore Spark Connector is an open source tool available on the SingleStore GitHub page. Under the hood, the connector creates a mapping between SingleStore database partitions and Spark RDD partitions. It also takes advantage of both systems’ distributed architectures to load data in parallel. The connector comes with a small library that includes the SingleStoreRDD class, allowing the user to create an RDD from the result of a SQL query in SingleStore. SingleStoreRDD also comes with a method called saveToSingleStore(), which makes it easy to write data to SingleStore after processing.
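To make the partition mapping idea concrete, here is a minimal sketch in plain Python. It does not use the connector or Spark at all: the database is simulated with in-memory lists, and every name in it (DB_PARTITIONS, read_partition, load_in_parallel, save_in_parallel) is hypothetical and chosen only for illustration. It shows the general pattern of reading one database partition per worker in parallel, processing each partition independently, and writing the results back partition by partition.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a database split into partitions;
# each inner list holds the rows stored on one partition.
DB_PARTITIONS = [
    [(1, "click"), (2, "view")],
    [(3, "click")],
    [(4, "purchase"), (5, "view"), (6, "click")],
]


def read_partition(index):
    """Read every row from a single (simulated) database partition."""
    return list(DB_PARTITIONS[index])


def load_in_parallel(num_partitions):
    """Build one in-memory partition per database partition, reading concurrently."""
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        # map() preserves input order, so result[i] came from database partition i.
        return list(pool.map(read_partition, range(num_partitions)))


def save_in_parallel(partitions):
    """Write each processed partition to its own (simulated) target partition."""
    target = [[] for _ in partitions]

    def write(i):
        # Each worker touches only its own target list, so no locking is needed.
        target[i].extend(partitions[i])

    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        list(pool.map(write, range(len(partitions))))
    return target


if __name__ == "__main__":
    loaded = load_in_parallel(len(DB_PARTITIONS))
    # Process each partition independently: keep only "click" events.
    clicks = [[row for row in part if row[1] == "click"] for part in loaded]
    print(save_in_parallel(clicks))
```

The one-to-one mapping is the point of the sketch: because each worker owns exactly one source partition and one target partition, reads and writes can proceed in parallel without coordination between workers.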
5. Can I have one of those cool t-shirts? (Of course!) What does the design mean?
The design is a graphical representation of Hybrid Transactional/Analytical Processing (HTAP), a term coined by Gartner. It refers to the convergence of transactional and analytical processing in a single database, usually for real-time analytics.
Circling back to the first question, SingleStore excels at this kind of hybrid workload. In addition to reducing latency and consolidating hardware, HTAP powers tight operational feedback loops that can create opportunities for net-new revenue and bottom-line cost savings. For more information on HTAP, read the Gartner Market Guide for In-Memory Databases.