Using real-world examples and plain language, Bobby Coates, SingleStore Product Marketer, provides a comprehensive overview of In-Memory Databases (IMDBs). Learn what an In-Memory Database (IMDB) is, why it is useful, and how it compares to a traditional database.
What is an in-memory database (IMDB)?
In-memory databases get a lot of attention, but many folks don’t know what they are, why they’re different, and if they’re worth looking into. Let’s take a few minutes and talk about some of the differences between a traditional database and an in-memory database and why those differences are important.
Why use an In-Memory Database?
There are three main reasons developers opt to use an in-memory database:
- Very short persistence: Because the data lives in memory, the database can respond very quickly to simple queries. This is particularly beneficial for systems that need to act in the moment, such as game leaderboards. These systems are less suited to analyzing large amounts of data over long periods of time, because the data volumes grow very large and keeping them in memory can be cost-prohibitive.
- Pre-aggregated cache: Sometimes this is done as a sidecar to a slow data mart or data warehouse. Aggregates are generated on the large, slow system and then copied to the in-memory system. In this way, queries get a fast response, but the data is not real time and is often stale. While not a long-term solution, this is sometimes done as a stop-gap while a better solution is developed (a rough sketch of this pattern follows this list).
- Dashboard with a large number of users: Dashboards are often valuable across the enterprise, which means many people querying them at once. An in-memory database can leverage its performance to respond to that large number of queries, especially if the queries are known in advance and can be planned for.
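As a rough illustration of the pre-aggregated cache pattern, the SQL below builds daily aggregates on the large, slow system and then serves dashboard queries from the small summary table in the in-memory store. The table and column names are invented for the example.

```sql
-- Hypothetical sketch: build aggregates on the slow warehouse, then copy
-- them into the in-memory system that serves the dashboards.
-- Table and column names are illustrative only.

-- Run periodically on the large, slow system:
CREATE TABLE sales_daily_agg AS
SELECT sale_date,
       state,
       product,
       SUM(amount) AS total_sales
FROM sales
GROUP BY sale_date, state, product;

-- The aggregate rows are bulk-copied into the in-memory store, where
-- dashboard queries hit the small, pre-summarized table:
SELECT sale_date, total_sales
FROM sales_daily_agg
WHERE state = 'OR' AND product = 'shirt'
ORDER BY sale_date;
```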
How a traditional database stores and works with data: a real-world example
Here’s what happens when new data is submitted to a traditional database and later queried. We’ll use a common example in the retail industry of someone buying a shirt in Oregon and, later, a manager who requests the sum of shirt sales in Oregon:
- The sale (including information like selling price and location) is immediately written to memory (sometimes called RAM or volatile memory) on the database and a record is added to a log file (this is so that if the database or server fails the data can be added into the database when restored).
- At some point that is convenient to the database, the data is copied to disk, where that data will remain until used. Disk is the primary storage for the data. The data currently in memory is purged, freeing that memory for a different task.
- When a query is submitted to the database (let's say a manager wants to know the sum of shirt sales in Oregon), the database will read all the blocks of data from disk that include the relevant information (sales records in Oregon) and send it back to memory for processing (I'm simplifying here, but the core premise holds).
- Once in memory, the data can be aggregated (all the shirt sales in Oregon) and a single number is returned as the result of the SQL query (a rough SQL sketch of these steps follows this list).
- The data in memory is again purged, freeing that memory for another task. If the same query is submitted later, the process of reading data from disk is repeated.
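To make this walkthrough concrete, the SQL below sketches both the incoming sale and the manager's query. The `sales` table and its columns are invented for the example; a traditional database and an in-memory database would accept the same statements and differ only in where the data physically lives.

```sql
-- Hypothetical schema for the walkthrough; names are illustrative.
CREATE TABLE sales (
    sale_id  BIGINT PRIMARY KEY,
    product  VARCHAR(50),
    state    CHAR(2),
    amount   DECIMAL(10, 2),
    sold_at  DATETIME
);

-- The shirt sale in Oregon is written to the database
-- (and recorded in the log file for recovery).
INSERT INTO sales (sale_id, product, state, amount, sold_at)
VALUES (1001, 'shirt', 'OR', 24.99, '2023-06-01 10:15:00');

-- The manager's query: the database gathers the relevant records
-- (from disk in a traditional database, from memory in an IMDB)
-- and returns a single aggregated value.
SELECT SUM(amount) AS oregon_shirt_sales
FROM sales
WHERE state = 'OR' AND product = 'shirt';
```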
How is an in-memory database different?
Let’s revisit the same shirt sale in Oregon:
- The sale (including information like selling price and location) is immediately written to main memory (sometimes called RAM, or volatile memory) and a record is added to a log file (this is so that if the database or server fails, the data can be added into the database when restored) - so far nothing has changed.
- At some point that is convenient to the database, the data is copied to disk, where it will stay until it changes or the database is restarted. The data also remains in memory, and memory will be the primary storage for the data.
- As in our previous example, a query is submitted to the database to get the sum of shirt sales in Oregon. Only this time, the data is already in memory so there is no need to go to the disk to read it. It can be calculated from the data already in memory. This step is much faster compared to the previous example.
- The data can be aggregated and the single value (sum of shirt sales in Oregon) returned using SQL.
- The data once again remains in memory for further processing should another request be made.
When we think about the third step in the traditional example, it's important to remember that although it is presented as one step, it can actually involve many thousands or millions of trips to the disk to fetch the data for even a single query. The result is a database that can struggle to keep up. By keeping the data in memory, we reduce this demanding step to very little effort.
Where an In-Memory Database Falls Short
An in-memory database offers significant improvements in performance; however, challenges arise when applications need to deliver more than one capability at scale, including:
- Handling large amounts of data when your data set is too large (or too expensive) to fit in memory. Memory costs an order of magnitude more than disk space, so the choice becomes either spending a lot of money on memory or deciding which data will be fast and which will not. Those tradeoffs can be hard to determine ahead of time and expensive to change later.
- Handling concurrency. Modern data-intensive applications face high user concurrency. Internally, businesses also need to democratize data, putting information in the hands of workers at all levels, not just executives; better information in more hands generally produces superior results. However, as data grows, in-memory data stores struggle to scale their performance to meet high-concurrency requirements.
- Supporting analytics. In-memory databases are great at handling operational data, but are not optimized for analytics, especially for complex queries that need fast answers. Analytic data is typically very large, particularly when viewing trends over long periods of time, and it can be challenging to keep all of that information in memory.
- Ingesting and processing high volumes of streaming data. This is a key requirement for most real-time use cases, including many IoT use cases, where sensors often send data every second on critical parameters. Being able to ingest, process and analyze data as it lands is critical: identifying patterns, detecting anomalies and pinpointing exactly which components have failed, or are about to fail, can significantly shorten downtime.
There are quite a few databases that can do one of the above well, but very few that can do two, let alone all of them. If these are the challenges you face, consider looking at a modern database built for data-intensive applications, such as SingleStore.
SingleStore - So Much More Than an In-Memory Database
SingleStore behaves like an in-memory database when you need it, but can also handle other workloads, such as data warehouse workloads. We do this with our tiered storage model, which combines in-memory storage, on-disk storage and cloud object storage to deliver fast performance for any data-intensive application. Each of these storage tiers is highly optimized. For instance, when data is stored in memory, SingleStore uses skip-list indexes, which outperform the traditional B-tree.
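As a minimal sketch of how this tiering surfaces in SQL, the statements below create an in-memory rowstore table and a disk- and object-store-backed columnstore table. The table names are invented, and the exact defaults (for example, whether a plain CREATE TABLE produces a columnstore) depend on your SingleStore version, so check the current documentation.

```sql
-- Sketch only: table names are illustrative, and storage defaults
-- vary by SingleStore version; verify against the documentation.

-- An in-memory rowstore table, indexed with skip lists, suited to
-- point lookups and fast operational writes:
CREATE ROWSTORE TABLE sales_hot (
    sale_id  BIGINT PRIMARY KEY,
    product  VARCHAR(50),
    state    CHAR(2),
    amount   DECIMAL(10, 2)
);

-- A columnstore table backed by disk and cloud object storage,
-- suited to large analytical scans:
CREATE TABLE sales_history (
    sale_id  BIGINT,
    product  VARCHAR(50),
    state    CHAR(2),
    amount   DECIMAL(10, 2),
    sold_at  DATETIME,
    SORT KEY (sold_at)
);
```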
Learn more about how SingleStore leverages a tiered storage model to handle workloads targeted for in-memory databases (IMDBs) and simultaneously for cloud data warehouses (CDWs) in this blog article from SingleStore CTO and co-founder Adam Prout.
SingleStore offers a fast, distributed, highly scalable SQL database designed to power modern data-intensive applications. Interestingly, SingleStore started as an in-memory database (as MemSQL), but the platform has evolved into a highly performant unified database that can handle both transactions and analytics effectively.
With our patented Universal Storage, SingleStore is designed to deliver maximum performance for both transactional (OLTP) and analytical (OLAP) workloads in a single unified engine. Universal Storage combines the best of rowstores and columnstores: columnstore tables can process hundreds of millions of rows per second on a single core, whereas peak per-core performance on rowstores is about 20 million rows per second.
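To make the idea concrete, here is a small sketch of one table serving both kinds of work. It reuses the hypothetical sales_history columnstore table from the earlier sketch; the statements themselves are ordinary SQL.

```sql
-- Sketch: one Universal Storage table handling both an operational
-- write and an analytical scan (uses the hypothetical sales_history
-- table from the earlier sketch).

-- Transactional (OLTP-style) write:
INSERT INTO sales_history (sale_id, product, state, amount, sold_at)
VALUES (1002, 'shirt', 'OR', 19.99, NOW());

-- Analytical (OLAP-style) aggregation over the same table:
SELECT state, SUM(amount) AS total_sales
FROM sales_history
WHERE product = 'shirt'
GROUP BY state
ORDER BY total_sales DESC;
```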
With SingleStore you can ingest millions of events per second with ACID transactions using SingleStore Pipelines, while simultaneously delivering blazing fast SQL queries on billions of rows of data.
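As a rough sketch of what a Pipeline looks like, the statements below ingest CSV events from a Kafka topic into a table. The broker address, topic and table names are placeholders invented for the example; consult the SingleStore Pipelines documentation for the full syntax.

```sql
-- Sketch of a SingleStore Pipeline; the broker address, topic and
-- table name are placeholders for the example.
CREATE PIPELINE sales_stream AS
    LOAD DATA KAFKA 'kafka-broker.example.com:9092/sales-topic'
    INTO TABLE sales_history
    FIELDS TERMINATED BY ',';

START PIPELINE sales_stream;
```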
SingleStore is designed from the ground up as a multi-model database, supporting relational, key-value, document, geospatial, time series data, and full-text search. With a relational engine at its core, SingleStore delivers effortless scalability and simplicity to modern data architectures, reducing the overall complexity and cost.
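As an illustration of the multi-model idea, the sketch below mixes relational columns with JSON, geospatial and full-text data in a single table. Column names are invented, and the exact type and index syntax should be verified against the SingleStore documentation.

```sql
-- Sketch only: column names are invented; verify type and index
-- syntax against the SingleStore documentation.
CREATE TABLE stores (
    store_id   BIGINT,
    name       VARCHAR(100),
    attributes JSON,              -- document data
    location   GEOGRAPHYPOINT,    -- geospatial data
    notes      TEXT,
    SORT KEY (store_id),
    FULLTEXT (name, notes)        -- full-text search index
);
```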
With 10-100x performance at one-third the cost of traditional databases, SingleStore is ideal for applications that require fast data ingestion, low-latency queries and elastic scaling with familiar relational SQL.
If you have questions about how SingleStore helps address In-Memory Database requirements, ask them on our Forum. Development experts and engineers from SingleStore, as well as members of our user community are always happy to help out.
Follow us on Twitter.