Real-Time Data Platforms: SingleStore vs. Databricks
SingleStore and Databricks are both exceptional data platforms that address important challenges for their customers. However, when it comes to performance and cost, SingleStore has several major advantages: it is built from the ground up for performance, and that performance leads to lower cost. This blog is the first in a multi-part series examining these differences, and we begin with real-time analytics and operations, an area in which SingleStore excels. We have also observed that SingleStore has cost and performance advantages in non-real-time, batch ETL jobs; we will cover those in a follow-up blog.
Understanding the value of real-time data
To begin, let's establish the significance of real-time data. Why do customers value it? The simple answer is that in many use cases, the value of data diminishes as it ages. Whether you're optimizing a marketing campaign, monitoring trade speeds, pushing real-time inventory updates, observing network hiccups or watching security events, delays in customers' reactions translate into financial losses. The events generated by these sources arrive continuously, in a stream, which has led to the rise of streaming technologies. Databricks' recent blog, "Latency goes subsecond in Apache Spark Structured Streaming," aptly describes this:
“In our conversations with many customers, we have encountered use cases that require consistent sub-second latency. Such low latency use cases arise from applications like operational alerting and real time monitoring, a.k.a ‘operational workloads.’”
At SingleStore, we deal in milliseconds, because that's what matters to our customers. Let's call this quality latency and define it as the time it takes for one event to enter the platform, reach its destination and generate value.
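The following minimal Python sketch shows how per-event latency could be measured and summarized. It assumes each event carries two hypothetical timestamps. The first is recorded when the event enters the platform, and the second when a query, alert or update first uses it. The class, field names and helper functions are illustrative only and do not come from SingleStore or Databricks. The use cases above call for consistent latency, so the sketch reports tail percentiles alongside the median rather than a single average.

```python
import math
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical record of one event's journey; timestamps are in milliseconds."""
    event_id: str
    entered_at_ms: float  # assumed: when the event entered the platform
    value_at_ms: float    # assumed: when a query, alert or update first used it


def event_latency_ms(e: Event) -> float:
    """Latency as defined above: from entering the platform to generating value."""
    if e.value_at_ms < e.entered_at_ms:
        raise ValueError(f"{e.event_id}: value timestamp precedes entry timestamp")
    return e.value_at_ms - e.entered_at_ms


def percentile(sorted_values: list[float], p: float) -> float:
    """Nearest-rank percentile (0 < p <= 100) of an already sorted, non-empty list."""
    if not sorted_values:
        raise ValueError("no latencies to summarize")
    rank = math.ceil(p / 100 * len(sorted_values))
    return sorted_values[max(rank, 1) - 1]


def latency_summary(events: list[Event]) -> dict[str, float]:
    """Median and tail latency across a batch of events."""
    latencies = sorted(event_latency_ms(e) for e in events)
    return {
        "p50_ms": percentile(latencies, 50),
        "p99_ms": percentile(latencies, 99),
        "max_ms": latencies[-1],
    }


if __name__ == "__main__":
    sample = [
        Event("a", 0, 12),
        Event("b", 5, 9),
        Event("c", 10, 250),
        Event("d", 20, 31),
    ]
    print(latency_summary(sample))
```

For example, the four sample events have latencies of 12, 4, 240 and 11 ms. They give a p50 of 11 ms but a p99 and max of 240 ms, so a single slow event dominates the tail even when the median looks healthy.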
There are other important factors to consider. In their blog, Databricks correctly points out two more, throughput and cost, when describing the goal to “give users the flexibility to balance the tradeoff between throughput, cost and latency.” We'll add two more, availability and simplicity, to complete our goals for the ideal real-time data platform:
Minimize latency
Maximize throughput
Minimize cost
Maximize availability
Maximize simplicity
How SingleStore handles real-time use cases
First, we'd like to discuss SingleStore's recommended approach to real-time use cases: ingest streaming data into SingleStore and query it there, as illustrated in the following figure.