Beginner's Guide to HTAP Databases

HTAP, or hybrid transaction/analytics processing, combines transactions, such as updating a database, with analytics, such as finding likely sales prospects.

An HTAP database supports both workloads in one database, providing speed and simplicity. And today, “cloud-native HTAP” is a thing; users want an HTAP database that they can mix and match smoothly with Kafka, Spark, and other technologies in the cloud. Use cases include fraud prevention, recommendation engines for e-commerce, smart power grids, and AI.

HTAP databases work with – and, to a certain degree, are designed for – integration with streaming data sources, such as Kafka, and messaging systems used for advanced analytics, AI and machine learning such as Spark. They serve multiple analytics clients, from business analysts typing in SQL queries, to BI tools, apps, and machine learning models, which generate queries in the scores or thousands per second.

Before HTAP – Separate Transactions and Analytics

HTAP combines different kinds of data processing into one coherent whole. The two types of processing differ considerably. Transaction processing – adding and updating records in a database – demands a very high degree of reliability for single-record operations, along with accuracy and speed. “Update Sandy Brown’s current address” is an example of a transactional update.

Analytics processing, on the other hand, means looking very quickly through one or more database tables for a single record, or many records, or total counts of a type of record. “Find me all the subscribers who live in Colorado and own their own home” is an example of an analytics request.

The first effective databases, first widely used in the 1970s and 1980s, were transaction-oriented. They came to be called online transaction processing (OLTP) systems. OLTP systems were optimized to work on underpowered computers with small hard disks – by today’s standards, of course. The only analytics was through printed reports, which might be sorted on various key fields, such as by state or ZIP code.

When analytics was added on later, the transactional systems were already busy, so the data was copied onto a separate computer, running different software. These databases are called online analytics processing (OLAP) databases. Data warehouses and data marts are specialized OLAP databases that house non-operational data for analysis.

Data on OLAP systems was queried using various languages, which coalesced around structured query language (SQL). At first, analytics queries were entered directly by individual analysts; eventually, business intelligence (BI) programs were used to make querying easier. More recently, software applications generate queries of their own, often at the rate of thousands per second.

An entire process and discipline called extract, transform, and load (ETL) was created, simply to move data from OLTP to OLAP. As part of the ETL process, data owners may mix different databases of their own, externally purchased data, social signals, and other useful information. However, the use of three different silos means that data in the OLAP databases is always out of date – often from one day to one week old.

The Move to HTAP

The OLTP/ETL/OLAP structure is still widely used today. However, over time, both OLAP and, more slowly, OLTP databases were given the ability to work in a distributed fashion. That is, a single data table can now be distributed across multiple machines.

Being distributed across several servers allows the data table to be much larger. A distributed data table can have its performance boosted at any time, simply by adding more servers to handle more transactions or reply to more queries. A database – one or more data tables, serving related functions on overlapping data – can now run on a flexibly sized array of machines, on-premises or in the cloud.

As these capabilities have added, the exciting possibility of intermixing OLTP and OLAP capabilities in a single database has come to fruition. The database software that makes this possible was named hybrid transaction and analytics processing (HTAP) by Gartner in 2013. In 2014, the first version of SingleStoreDB's HTAP product was made available.

This capability is so new that it has many names, including hybrid operational analytics processing (HOAP) and translytical databases (which combine trans_actions and ana_lytical functions). HTAP, HOAP, and translytical databases are also described as performing operational analytics – “analytics with an SLA,” or analytics that have to deliver near-real-time responsiveness. Gartner has also come up with augmented transaction processing (ATP), which describes a subset of HTAP workloads that include operational AI and machine learning.

See more: The Forrester Wave™: Translytical Data Platforms, Q4 2022

The Benefits of HTAP

HTAP has many benefits. HTAP creates a simpler architecture, because two separate types of databases, as well as the ETL process, are replaced by a single database. And data copies are eliminated. Instead of data being stored in OLTP, for transactions, then being copied to OLAP – perhaps multiple times – for analytics, a single source of truth resides in the HTAP database.

These fundamental changes have add-on benefits. Operations is much simpler and easier, because only one system is running, not several. Making a single database secure is easier than for multiple data copies on different systems. And data can be fresh – as soon as data comes in for processing, it’s also available for analytics. No more need to wait hours or days – sometimes longer – for data to go through OLTP and ETL before it’s available for analytics.

Very large cost benefits can be achieved by HTAP, along with related increases in revenues and decreases in costs. Simplicity in architecture and operations results in significant cost savings. Higher performance makes existing revenue-producing functions more productive, and makes new ones possible.

The Internet of Things (IoT) benefits greatly from HTAP. If you’re running a smart grid, you need to be running fast, from the latest data. Analysts, dashboards, and apps all need access to the same, updated data at once.

Machine learning and AI are actually impractical without HTAP. There isn’t much point to running a machine learning algorithm if you can’t be learning from current, as well as historical, data. No one wants to run a predictive maintenance program that tells you that your oil well was likely to need urgent maintenance a week ago, or that there were several interesting travel bargains available yesterday.

How SingleStore Fits

SingleStore was conceived as an HTAP database before the term was even coined by Gartner. The company was founded in 2011 to create a general-purpose database that supports SQL, while combining transactions and analytics in a single, fast, distributed database.

The result is now a cloud-native, scalable, SQL database that combines transactions and analytics. Today, SingleStoreDB Self-Managed is available for download, so you can run it in the cloud or on-premises, or in the form of an elastic managed service in the cloud, Singlestore Helios.

SingleStore allows customers to handle transactional operations in rowstore tables, which run entirely in memory. Most analytics functions are handled by columnstore tables, which reside largely on disk. Data is moved between tables by Pipelines, a fast and efficient alternative to ETL.

Today, SingleStore is going further, introducing SingleStore Universal Storage in 2019. Universal Storage is a new expression of the core idea behind HTAP. In Universal Storage, rowstore and columnstore tables each gain some of the attributes of the other. For instance, rowstore tables now have data compression, which was formerly the sole province of columnstore. And columnstore tables can now quickly find, and even update, a single record, or a few records – capabilities that were formerly the hallmark of rowstore.

And, we continue to push HTAP databases forward to power the development of real-time applications.

Increasingly, with Universal Storage, a single table type can fill both transactional and analytical needs. Any needed performance boost is provided by scalability, simply adding servers. Future plans for Universal Storage include even higher levels of convergence.