Do Unified Databases Make Polyglot Persistence Irrelevant?

Rick van der Lans

Independent Analyst, Consultant, Author and Internationally Acclaimed Lecturer

Driven by new requirements and new types of workloads, many organizations have invested in specialized database technologies in recent years. Their new systems had to support heavy transactional and analytical workloads. These organizations had been using general-purpose database technology for years, but discovered it was no longer good enough: the new data requirements demanded specialized database technology.

General-purpose database technology was suitable for developing almost any type of workload. With it, data warehouses, websites, transactional databases, and so on could be developed. It was good at everything, but it did not excel at anything. It was time for specialized database technologies, each optimized for a certain type of workload. Some were developed specifically for graph analytics, others for massive data-ingestion workloads, and still others for processing the advanced queries that data scientists run on massive amounts of data. Organizations needed the power of these technologies to meet their challenging new data requirements.

Many organizations needed support for several different workloads, but unfortunately, each of the specialized database technologies was developed for only one or two of them. This led organizations to deploy a myriad of database technologies, including:

  • Key-value stores
  • Graph databases
  • Time-series databases
  • In-memory lookup databases
  • And more

A myriad of different databases is not without its drawbacks.

First, it led to unbridled copying of data, because data had to be copied from source systems to multiple databases. For example, data was copied from source systems to a key-value store for very fast direct access to individual records; the same data was copied to a graph database for graph analytics, to a time-series database to process typical time-series queries, and finally to Parquet files so that data scientists could query the data using a SQL-on-Hadoop engine. Some organizations also deployed specialized NoSQL products to develop source systems that could support massive transactional workloads. But some of those technologies were optimized for transaction processing and not suitable for reporting and analytics, so that data also had to be copied to systems that could handle the analytical workload.
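
As a concrete illustration, here is a minimal Python sketch of this copy fan-out, assuming hypothetical client objects (kv_store, graph_db, ts_db, parquet_writer) that stand in for real drivers rather than any specific product's API:

    def replicate(record, kv_store, graph_db, ts_db, parquet_writer):
        """Push one source record into every specialized store (hypothetical clients)."""
        # Key-value store: very fast direct access to individual records.
        kv_store.put(record["id"], record)
        # Graph database: model the record as a node for graph analytics.
        graph_db.add_node(record["id"], properties=record)
        # Time-series database: index measurements by timestamp.
        ts_db.write(timestamp=record["timestamp"], fields=record)
        # Parquet file: columnar storage queried later via a SQL-on-Hadoop engine.
        parquet_writer.write(record)

Every store receives its own copy of the record, which is exactly the duplication described above.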

Second, when data is stored in so many different and independent databases, it goes without saying that data integration becomes a challenge.

Third, users of these databases work with copied data and therefore not with zero-latency data, even though users increasingly need zero-latency data for their work. In some companies, data is copied only once a night, resulting in very high data latency.

Fourth, the need eventually arose to combine time-series, graph, and text analytics in a single query. With the specialized database technologies, this had to be done in several steps, with each analytical step yielding an intermediate result that was passed on to the next technology for the next analytical step.
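
A hedged sketch of such a multi-step pipeline, again with hypothetical clients and invented query strings, shows how each engine's output becomes the next engine's input:

    def combined_analysis(ts_db, graph_db, text_engine):
        # Step 1: the time-series engine finds sensors with anomalous readings.
        anomalous_ids = ts_db.query("anomalies in the last 24 hours")
        # Step 2: the graph engine finds devices connected to those sensors.
        related_ids = graph_db.neighbors(anomalous_ids)
        # Step 3: the text engine scans maintenance logs for those devices.
        return text_engine.search(ids=related_ids, terms=["failure", "error"])

Each intermediate result must be extracted from one system and reloaded into the next, adding latency and complexity at every step.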

Working with different databases in this way is often referred to as polyglot persistence. The term indicates that data is stored in different databases, each supporting its own language. Developers must therefore use multiple (poly) languages (glot) to access all the stored data (persistence).
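
To make "multiple languages" concrete, here is an illustrative Python fragment retrieving the same customer from three stores; the SQL- and Cypher-style query texts and the kv_store client are hypothetical stand-ins, not any vendor's actual API:

    customer_id = "C42"

    # Relational store: SQL.
    sql_query = "SELECT * FROM customers WHERE id = :id"

    # Graph store: a Cypher-style pattern match.
    graph_query = "MATCH (c:Customer {id: $id}) RETURN c"

    # Key-value store: no query language at all, just a direct API call,
    # e.g. customer = kv_store.get(customer_id)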

Polyglot persistence was never the goal.

It was a way for organizations to deal with their new data requirements. At the time, they had no other choice.

To solve the issues described above, it is time to go back to monoglot persistence. Clearly, there is a need for database technology that can handle big transactional and analytical workloads concurrently, support multiple data models, and combine different types of analytics. This category is called unified databases, sometimes referred to as translytical or HTAP (Hybrid Transaction/Analytical Processing) databases.

Unified databases can:

  • Concurrently process an analytical and transactional workload.
  • Support multiple data models.
  • Combine multiple forms of analytics in a single query.

The effect is less copying of data, simpler data architectures, access to zero-latency data, and combined analytics.
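
By way of contrast with the multi-step pipeline shown earlier, here is a hedged sketch of the third capability: one query, against one unified database, combining time-series filtering with a graph traversal. The unified_db connection object and the GRAPH_NEIGHBORS function are invented for illustration and do not belong to any real product's SQL dialect:

    query = """
        SELECT s.sensor_id, AVG(s.value) AS avg_value
        FROM readings AS s
        WHERE s.ts > now() - INTERVAL '24 hours'         -- time-series filter
          AND s.sensor_id IN GRAPH_NEIGHBORS('pump-17')  -- graph traversal
        GROUP BY s.sensor_id
    """
    # result = unified_db.execute(query)  # one engine, no intermediate copies

Because both forms of analytics run inside one engine, no intermediate results need to be extracted and reloaded.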

Monoglot Persistence: The New Normal?

Again, polyglot persistence was never the goal; it was the only solution available at the time to meet the new data requirements. Monoglot persistence should be the goal, and unified databases can help us achieve it. Specialized database technology will always be needed, but with unified databases, far less often than a few years ago.

