SingleStore is a hybrid relational database geared toward data-intensive applications. It runs on-premises, or in a private or public cloud. But what is a data-intensive application, and why use SingleStore even if you don’t think you have one?
Unfortunately, the ‘why’ is hard to pin down because several database vendors lean into the idea of data intensity, and messaging around the fastest database or the best performance gets even more complicated when analytics and reporting tool vendors add a data front end to their offerings. This blog post explores what a data-intensive application means, what makes SingleStore unique and why you should start using SingleStore today.
What Is Data Intensity?
When you hear the phrase ‘data intensive,’ your first thought might be, “What does data intensive mean?” Or even, “I don’t think I have a data-intensive application.”
The truth is, a lot of your existing applications should be data intensive, but limitations of database and hardware technology have guided architectural decisions every step of the way. Back in the ‘80s, I worked on a server with a 50MB drive. We had to severely limit what information we captured. When I started in data warehousing, the company I worked for specialized in successfully building multi-TB, customer-centric data warehouses. Although this was an unusual feat in 1998, what made us successful were our architectural decisions not only about what we could include, but also about what we should exclude from the data warehouse. Updates ran in a weekend batch, in the hope the warehouse would be ready for Monday morning. That logic is still being applied today!
We know that data arrives in real time to most of our record-keeping systems and applications, but we still take the same approach to our architecture. We limit our designs around bottlenecks. We don’t bring in too much data, so our load process doesn’t choke. We report on stale data, since our processes batch data to our analytics and reporting databases. We limit access so as not to overload the database. We archive data because drives fill up and are expensive. We create aggregate tables and cubes in batches at fixed intervals, because reporting on large volumes of data is just too slow. We extract, transform and load data multiple times going from source systems to operational data stores, to data warehouses, to data marts, to cubes, to data lakes. And then we flatten our data.
We have created this complicated architecture to overcome speed limitations in our storage and retrieval processes. A few years ago, I assisted on an hourly-reporting project at a large online retailer. Since our data came in nightly batch runs, we ended up building a completely separate process for the hourly reporting. Not only that, but it took 10 to 15 minutes into each hour before you could see the previous hour’s data. And users could only see the results for half the company.
Even if you’re able to work around these limitations, data-intensive applications are in our future. Five years ago it would have been inconceivable to track a UPS truck driving through a neighborhood; now, I watch it right on my phone. The expectation of real-time access to the things that affect our day-to-day lives only continues to grow. And if we remove the limitations of technology, we start to see how we can better interact with our customers and suppliers, starting with a database that handles data-intensive applications and sets the foundation for the future.
We can start now, phasing in applications that are limited by current designs: augmenting existing applications where data-intensive workloads demand it, building new applications and modernizing what we have. This lets us move seamlessly into the data-intensive next generation. It also removes the need, and the associated costs, of migrating later, when we would be choosing a database, programming language and deployment environment under pressure.
The other consideration for SingleStore implementations is a lower total cost of ownership (TCO) on less intensive applications, freeing budget not only from the database itself but also from the infrastructure (servers), while making better use of your organization’s staff.
Recognizing Data-Intensive Applications: 5 Key Criteria
So, now you know what data intensity means. But what makes up a data-intensive application? Here are five criteria to consider:

1. Data size: the volume of data the application must store and query
2. Ingest speed: how quickly new data arrives and must be written
3. Query latency: how fast responses are expected, often in milliseconds
4. Query complexity: joins and aggregations across large tables
5. Concurrency: how many users and services hit the database at once
It is the combination of these data-intensive properties that makes SingleStore unique. Some SingleStore competitors claim low latency (which they have), but only when supporting 50 or 60 concurrent users. Some claim high data volumes, but can’t support simple joins at speed. Reading data is quick, but updates and deletes are slow, since writes block reads. For some, the numbers look good, but on closer inspection they are a function of using in-memory dataframes in frameworks like Spark or languages like Python, not in the database itself.
SingleStore’s connectors allow you to work with dataframes and enhance dataframe performance. Other vendors’ ingest may be quick, but it can take minutes (or hours) before that data is usable in queries. SingleStore excels at handling data-intensive requirements at high speed, complexity and concurrency, and at a lower cost than the competition. Customer proofs of concept (POCs) and real-world deployments have shown SingleStore to be 10 to 100x faster at one-third the cost.
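As a hedged illustration of the dataframe workflow described above: the sketch below pulls fresh rows straight from the database into a pandas DataFrame rather than maintaining a separate in-memory copy. The table name (`events`), timestamp column (`ts`) and connection details are assumptions for illustration, not from this post; `singlestoredb` is SingleStore’s Python client, but any DB-API connection works here.

```python
# Illustrative sketch only: "events", "ts" and the connection URL are
# hypothetical placeholders, not a schema from this post.
import pandas as pd


def last_hour_sql(table: str = "events") -> str:
    """Build a query for rows ingested in the last hour (hypothetical schema)."""
    return f"SELECT * FROM {table} WHERE ts >= NOW() - INTERVAL 1 HOUR"


def load_last_hour(conn, table: str = "events") -> pd.DataFrame:
    """Run the query over any DB-API connection and return a DataFrame.

    For SingleStore, conn could come from something like
    singlestoredb.connect("user:password@host:3306/dbname").
    """
    return pd.read_sql(last_hour_sql(table), conn)
```

Because the query runs in the database, the freshness of the DataFrame is bounded by ingest-to-query latency in the database itself, which is the property the paragraph above is comparing across vendors.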
Trading ‘or’ for ‘and’: Database Features You Don’t Have to Decide Between
What makes SingleStore unique? SingleStore is able to handle what used to be trade-offs in data architecture. Most database vendors will say they can handle these trade-offs, and they can; it is just a matter of degree. In my personal experience, traditional databases like SQL Server can handle these requirements, but at a different scale: instead of ingesting 10M transactions per second like SingleStore, completing 15,000 to 20,000 is a challenge. Our ‘new’ database competitors boast 100K to 1M transactions per second. In most cases, you will need multiple database offerings from our competitors to meet those needs, with duplicate ETL processes, resulting in database sprawl.
On the flip side, we still need large-scale ingestion of data from sources that are not real time. The trade-off used to be batch OR real time. Being able to load and transform large volumes of data speeds up analysis and discovery on that data, and SingleStore handles the demands of real-time data as efficiently as it handles batch processing: batch AND real time. Other databases can have long lags between ingesting data and making it available for reporting. Being able to run machine learning algorithms at speed and scale makes predictions that save lives, as for Thorn, where reducing the time from ingest to reporting from 20 minutes to 200 milliseconds made all the difference in the world.
In Summary: Why Choose SingleStore?
Get the only database designed for data-intensive applications.