Modern Data Warehousing, Meet AI

We are enchanted by the possibility of digital disruption. New computing approaches, from cloud to artificial intelligence and machine learning, promise new business models and untold efficiencies. We are closing the gap between science fiction and business operations.

A Quick Look Back

Let’s take a quick look back at data processing, and then come back to the industry frontier.

It started with data and a place to put it, which became the database. Then came the desire to understand the data through analytics, which led to the introduction of data warehouses. Data warehouses are a form of database better suited to analytics. Ultimately, databases and data warehouses are both datastores, and within the data industry the terms are sometimes used interchangeably.

The Current Data Landscape

Today, the transactional operations of many business applications are in fine shape. In particular, transactions driven by human or business-to-business interactions do not require significant computing resources.

Core transactional systems, or databases, can often benefit from faster analytics that do not disrupt the transactional workflow. For example, adding a real-time data warehouse to the architecture delivers immediate insights to drive business decisions.

A New Class of Transaction

But a new class of modern transactions has emerged, involving data that ranges from IoT sensor readings to website traffic logs to global financial reporting. The volume of these transactions exceeds what a traditional database or data warehouse can absorb. Enter the real-time data warehouse.

A Modern, Real-Time Data Warehouse

Entirely new systems are needed to capture modern transactional, event, and streaming data. A real-time data warehouse fits this need because it can ingest and persist data in real time while serving low-latency analytic queries to large numbers of concurrent users.

To handle the large volumes of data generated by all forms of ‘modern transactions’, a real-time data warehouse also needs machine learning capabilities to help extract insights from a vast array of live inputs.

Fast Machine Learning Built-In

Incorporating machine learning with your real-time data warehouse leads to a powerful, simplified data infrastructure. Getting there is straightforward.

Step 1 – Identify a modern transactional workload

This could be any volume of data that pushes the limits of your organization’s existing data systems. Even if you pull data from multiple sources, focus on building the ability to ingest large volumes rapidly. Good examples include data arriving from a message queue such as Kafka, or loaded from Hadoop or S3.
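As a minimal sketch of the batching idea behind high-volume ingestion, the Python below drains events from an in-process queue and writes them in batches with executemany. Both pieces are stand-ins, not the real systems: queue.Queue plays the role of a Kafka topic (no Kafka client is used), and an in-memory SQLite table plays the role of the real-time data warehouse. The table name events, its columns, and the batch size are hypothetical.

```python
import queue
import sqlite3

# Hypothetical batch size; larger batches amortize per-statement overhead.
BATCH_SIZE = 500
INSERT_SQL = "INSERT INTO events (sensor, reading) VALUES (?, ?)"


def _flush(conn: sqlite3.Connection, batch: list) -> int:
    """Write one batch of rows and return how many were written."""
    conn.executemany(INSERT_SQL, batch)
    written = len(batch)
    batch.clear()
    return written


def ingest(source: queue.Queue, conn: sqlite3.Connection) -> int:
    """Drain `source` (a stand-in for a message-queue topic) into the
    hypothetical `events` table in batches. Returns total rows written."""
    total = 0
    batch: list = []
    while True:
        try:
            batch.append(source.get_nowait())
        except queue.Empty:
            break
        if len(batch) >= BATCH_SIZE:
            total += _flush(conn, batch)
    if batch:  # write any partial final batch
        total += _flush(conn, batch)
    conn.commit()
    return total


# Usage: simulate 1,234 sensor events arriving on the queue.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (sensor TEXT, reading REAL)")
q: queue.Queue = queue.Queue()
for i in range(1234):
    q.put((f"sensor-{i % 10}", float(i)))
written = ingest(q, conn)
print(written)
```

A production pipeline would typically consume continuously rather than stop when the queue is momentarily empty; the drain-until-empty loop here just keeps the sketch finite.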

Step 2 – Derive immediate insight with SQL

With a real-time data warehouse, you can ingest data, including transactional data, and immediately query it with SQL. This provides a powerful, efficient, and universal approach to data exploration. It also opens the data to the wide range of technical and business analysts who already know SQL.
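To illustrate the "ingest, then query immediately with SQL" pattern, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for a real-time data warehouse. The page_views table and its sample rows are hypothetical; the point is only that freshly inserted rows are available to an ordinary GROUP BY query with no separate load or export step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (url TEXT, latency_ms INTEGER)")

# Ingest a few hypothetical web-traffic events.
rows = [
    ("/home", 120),
    ("/home", 80),
    ("/pricing", 200),
    ("/pricing", 100),
    ("/docs", 50),
]
conn.executemany("INSERT INTO page_views VALUES (?, ?)", rows)

# Query the freshly ingested rows right away with standard SQL.
summary = conn.execute(
    """
    SELECT url, COUNT(*) AS views, AVG(latency_ms) AS avg_latency
    FROM page_views
    GROUP BY url
    ORDER BY views DESC, url
    """
).fetchall()

for url, views, avg_latency in summary:
    print(url, views, avg_latency)
```

Any analyst who knows SQL could write the aggregate query above; that accessibility is the benefit the step describes.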

Step 3 – Delve into ML and AI

Certain real-time data warehouses, such as SingleStore, have machine learning capabilities, including:

DOT_PRODUCT to compare vectors directly in SQL
K-means clustering built with the database’s extensibility features
Bidirectional, high-throughput, highly parallel connector to Apache Spark
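The first two capabilities in the list rest on simple vector arithmetic. The Python sketch below only mirrors that math: dot_product computes the sum of element-wise products (the quantity a SQL dot-product function returns), and kmeans is a plain Lloyd's-algorithm implementation whose distance is expressed through dot products. It does not use SingleStore's DOT_PRODUCT function or its extensibility features; the function names, sample points, and initial centroids are all hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot_product(a: Vector, b: Vector) -> float:
    """Sum of element-wise products of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def squared_distance(a: Vector, b: Vector) -> float:
    # ||a - b||^2 = a.a - 2 a.b + b.b, written entirely with dot products.
    return dot_product(a, a) - 2 * dot_product(a, b) + dot_product(b, b)


def kmeans(points: List[Vector], centroids: List[Vector], iterations: int = 10) -> List[int]:
    """Lloyd's algorithm with caller-supplied initial centroids.
    Returns the cluster index assigned to each point."""
    centroids = [list(c) for c in centroids]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # leave an empty cluster's centroid where it is
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments


# Usage with two obvious clusters of 2-D points.
points = [[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [8.5, 7.5]]
print(dot_product([1, 2, 3], [4, 5, 6]))
labels = kmeans(points, centroids=[points[0], points[2]])
print(labels)
```

Running the arithmetic inside the data warehouse, rather than exporting vectors to a separate system as this sketch implicitly would, is what lets the architecture stay simple.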

By incorporating these capabilities within the real-time data warehouse, organizations can dramatically simplify data architectures and provide wide access to real-time information for faster critical decisions.

Today, we launched our latest O’Reilly book, Data Warehousing in the Age of Artificial Intelligence. To learn more, check out the snapshot below and download our latest ebook.

What’s Inside?

Chapter 1: The Role of a Modern Data Warehouse in the Age of AI
Enterprises are constantly collecting data. Having a dedicated data warehouse offers rich analytics without affecting the performance of the application. A modern data warehouse can support efficient query execution while also delivering high-performance transactional functionality that keeps the application and the analysis synchronized.

Chapter 2: Framing Data Processing with ML and AI
The world has become enchanted with the resurgence of AI and ML for solving business problems. All of these processes need places to store and process data. For modern workloads, we have moved past the monolithic era into the distributed era, and we can see how ML and AI will affect data processing itself.

Chapter 3: The Data Warehouse Has Changed
Decades ago, organizations used transactional databases to run analytics. Then applications evolved to collect data at much greater volume and velocity, driven by web and mobile technologies. Recently, a new class of data warehouse has emerged to address these changes in data while simplifying setup, management, and data access.

Chapter 4: The Path to the Cloud
There is no question that, whether public or private, cloud computing reigns as the new industry standard. When weighing the right cloud choices, data processing infrastructure remains a critical enabling decision.

Chapter 5: Historical Data
All of your business’s data is historical data; it represents events that happened in the past. In the context of your business operations, “real-time data” refers to data that is recent enough for its insights to inform time-sensitive decisions. Historical data itself may not change, but applications powered by models built on historical data will create new data that needs to be captured and analyzed.

Chapter 6: Building Real-Time Data Pipelines
Building any real-time application requires infrastructure and technologies that accommodate ultra-fast data capture and processing. A memory-optimized data warehouse provides both persistence for real-time and historical data and the ability to query streaming data in a single system.

Chapter 7: Combining Real Time with Machine Learning
Machine learning encompasses a broad class of techniques used for many purposes, and in general no two ML applications look identical. This is especially true for real-time applications, which are shaped not only by the goal of the data analysis but also by the time constraints of operating in a real-time window.

Chapter 8: Building the Ideal Stack for Machine Learning
An ML “stack” is not one-dimensional. Building an effective ML pipeline requires balancing two natural but competing impulses: use existing technology or build something yourself. Although many ML algorithms are not parallelizable (or not easily so), running the algorithm is only a single step in your pipeline.

Chapter 9: Strategies for Ubiquitous Deployment
As companies move to the cloud, the ability to span on-premises and cloud deployments remains a critical enterprise enabler. In this chapter, we take a look at hybrid cloud strategies.

Chapter 10: Real-Time Machine Learning Use Cases
Real-time data warehouses help companies take advantage of modern technology and are critical to the growth of ML, big data, and AI. Companies looking to stay current need a data warehouse that can support them. If your company wants to benefit from ML and AI and needs real-time data analytics, choosing the right data warehouse is essential to success.

Chapter 11: The Future of Data Processing for Artificial Intelligence
Maximizing value from ML applications hinges not only on having good models but also on having a system in which the models can continuously be improved. Companies employ data scientists because there is no such thing as a self-contained, complete ML solution. Just as the work at a growing business is never done, intelligent companies are always improving their analytics infrastructure.