Rethinking Lambda Architecture for Real-Time Analytics

Big data, as a concept and practice, has been around for quite some time now. Most companies have responded to the influx of data by adapting their data management strategy. However, managing data in real time still poses a challenge for many enterprises. Some have successfully incorporated streaming or processing tools that provide instant access to real-time data, but most traditional enterprises are still exploring options. Complicating the matter further, most enterprises need access to both historical and real-time data, which require distinct considerations and solutions.

Of the many approaches to managing real-time and historical data concurrently, the Lambda Architecture is by far the most talked about today. Like the shape of the Greek letter it is named for, the Lambda Architecture forks into two paths: a streaming (real-time) path and a batch path. It therefore combines a high-speed, real-time data service with an immutable data lake. Often, a serving layer sits on top of the streaming path to power applications or dashboards.
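To make the two paths concrete, here is a minimal, single-process sketch of the idea, assuming a hypothetical stream of page-view events. Every event is appended to an immutable master log (the batch path) and also used to update an incremental real-time view (the speed path). A query merges the precomputed batch view with the recent real-time increments. The class and method names are illustrative only and do not correspond to any product API.

```python
from collections import Counter

# Hypothetical event shape for illustration: (page_id, timestamp_seconds).
Event = tuple[str, int]


class LambdaPipeline:
    """Toy Lambda Architecture: each event is forked to a batch path and a speed path."""

    def __init__(self) -> None:
        self.master_log: list[Event] = []               # batch path: append-only history
        self.batch_view: Counter = Counter()            # rebuilt by the periodic batch job
        self.realtime_view: Counter = Counter()         # speed path: incremental counts

    def ingest(self, event: Event) -> None:
        # Fork: append to the immutable log AND update the real-time view.
        self.master_log.append(event)
        self.realtime_view[event[0]] += 1

    def run_batch(self) -> None:
        # Recompute the batch view from scratch over the full history, then drop
        # the real-time increments it now covers (safe here because this toy
        # runs synchronously; real systems must track which events each view saw).
        self.batch_view = Counter(page for page, _ in self.master_log)
        self.realtime_view.clear()

    def query(self, page: str) -> int:
        # Merge the batch view with the recent real-time increments.
        return self.batch_view[page] + self.realtime_view[page]


pipeline = LambdaPipeline()
for e in [("home", 1), ("cart", 2), ("home", 3)]:
    pipeline.ingest(e)
print(pipeline.query("home"))  # 2, served entirely from the real-time view
pipeline.run_batch()
pipeline.ingest(("home", 4))
print(pipeline.query("home"))  # 3, batch view (2) + real-time increment (1)
```

The design choice worth noting is that the batch view is always recomputed from the immutable log rather than patched in place, so any error in the speed path is corrected on the next batch run.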

A Fork in the Road

Many Internet-scale companies, such as Pinterest, Zynga, Akamai, and Comcast, have chosen SingleStore to deliver the high-speed data component of the Lambda Architecture. Some customers fork the input stream so that data is pushed into SingleStore and into a data lake, such as HDFS, in parallel.
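Forking an input stream to two destinations in parallel can be sketched with one bounded queue and one worker thread per sink. This is a minimal illustration, not how any particular customer built their pipeline: the sinks here are plain Python callables standing in for hypothetical writers to SingleStore and to a data lake such as HDFS.

```python
import queue
import threading
from typing import Callable, Iterable

_STOP = object()  # sentinel telling a sink worker to exit


def fork_stream(records: Iterable[dict], sinks: list[Callable[[dict], None]]) -> None:
    """Copy every record onto one queue per sink; each sink drains its own queue in a thread."""
    queues = [queue.Queue(maxsize=1000) for _ in sinks]

    def worker(q: queue.Queue, sink: Callable[[dict], None]) -> None:
        while True:
            item = q.get()
            if item is _STOP:
                break
            sink(item)

    threads = [threading.Thread(target=worker, args=(q, s)) for q, s in zip(queues, sinks)]
    for t in threads:
        t.start()
    for record in records:
        for q in queues:
            q.put(record)  # blocks if a sink falls behind (simple back-pressure)
    for q in queues:
        q.put(_STOP)
    for t in threads:
        t.join()


# Hypothetical stand-ins for a real-time store writer and a data lake writer.
realtime_store: list[dict] = []
data_lake: list[dict] = []
fork_stream([{"id": 1}, {"id": 2}, {"id": 3}], [realtime_store.append, data_lake.append])
```

Each sink receives every record in the original order, and a slow sink throttles the producer instead of silently dropping data. The sketch has no error handling: if a sink raises, its worker stops and the producer can eventually block, which a production pipeline would need to address.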

Here is an example of the Comcast Lambda Architecture:

The great thing about SingleStore is that it can fulfill both sides of the Lambda Architecture, not just the real-time component. Some customers use SingleStore's in-memory rowstore to support real-time streaming, and the disk-based columnstore as the batch service and data lake.

Here is an example of a financial services customer using SingleStore for both layers:

Real-Time Analytics

In this era of ubiquitous big data, it is not enough for companies to merely process data. Analyzing that data to detect patterns that can immediately be applied to maximize operational efficiency is the real driver of business value. SingleStore delivers real-time analytics on a rapidly changing data set, making it an ideal match for the speed layer of the Lambda Architecture. Other data stores cannot ingest data at high speed, lack analytical capabilities, or cannot scale affordably. SingleStore delivers a complete solution: the ability to handle millions of transactions per second while simultaneously performing complex multi-table join queries. Let’s dig into some of the features that make SingleStore a great solution for implementing the Lambda Architecture.

Scalability

SingleStore uses a distributed, shared-nothing architecture that scales on commodity hardware and local storage, supporting petabytes of data. SingleStore is a memory-first relational database that also offers a disk-based columnstore. In-memory optimization delivers high-speed data ingestion while simultaneously supporting analytics on the changing data set. The disk-based columnstore manages historical data, so historical trends can be combined with the “hot” in-memory data to deliver real-time analytics.
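The core idea behind a shared-nothing design is that each node owns a disjoint slice of the data, so adding nodes adds both storage and compute. The sketch below illustrates that idea with simple hash partitioning on a key column. It is a generic illustration under stated assumptions (the column names, the CRC32 routing function, and the four-node cluster are all hypothetical) and does not reproduce SingleStore's actual data distribution.

```python
import zlib
from collections import defaultdict
from typing import Callable


def node_for(key: str, num_nodes: int) -> int:
    """Route a row to one node by hashing its key (stable across processes, unlike hash())."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def partition(rows: list[dict], key_col: str, num_nodes: int) -> dict[int, list[dict]]:
    """Split rows into disjoint per-node partitions; no node stores another node's rows."""
    parts: dict[int, list[dict]] = defaultdict(list)
    for row in rows:
        parts[node_for(str(row[key_col]), num_nodes)].append(row)
    return dict(parts)


def distributed_count(parts: dict[int, list[dict]], predicate: Callable[[dict], bool]) -> int:
    # Each node filters only its own partition (in parallel, in a real cluster);
    # just the small per-node counts are sent back and summed by a coordinator.
    return sum(sum(1 for r in rows if predicate(r)) for rows in parts.values())


rows = [{"user_id": i, "amount": i * 10} for i in range(100)]
parts = partition(rows, "user_id", num_nodes=4)
print(distributed_count(parts, lambda r: r["amount"] >= 500))  # 50
```

Because every row lives on exactly one node, a query can run as many independent local scans whose partial results are cheap to combine.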

Multi-model, Multi-mode

SingleStore supports the ingestion of structured, semi-structured, and unstructured data. The flexibility to apply structure to data as analytics requires lets the database meet the business requirements of the operation. Real-time analytics requires a real-time data structure, which SingleStore provides through a fully relational model, while semi-structured (JSON) and unstructured data can be ingested as key-value pairs.

Full ANSI SQL support makes SingleStore readily accessible to data analysts, business analysts, and data scientists, reducing the amount of application code required. Plugging data visualization and query tools into the analytics architecture delivers immediate value from data to the business.

SingleStore also extends SQL with JSON support: querying a JSON document uses familiar SQL, plus extensions for traversing the document's key-value pairs.
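As a language-neutral illustration of what traversing a JSON document's key-value pairs means, the sketch below walks a parsed document one key (for objects) or index (for arrays) at a time and returns None when a path is absent, which is similar in spirit to how a JSON path expression in a SQL query yields NULL for missing keys. This is a generic Python sketch, not SingleStore's JSON syntax or implementation; the sample documents and field names are hypothetical.

```python
import json
from typing import Any, Optional, Union


def get_path(doc: Any, *keys: Union[str, int]) -> Optional[Any]:
    """Walk a parsed JSON document by object key or array index; return None if the path is absent."""
    current = doc
    for key in keys:
        if isinstance(current, dict) and isinstance(key, str):
            current = current.get(key)
        elif isinstance(current, list) and isinstance(key, int) and -len(current) <= key < len(current):
            current = current[key]
        else:
            return None
    return current


# Hypothetical raw events stored as JSON text.
raw = [
    '{"user": {"id": 7, "device": {"os": "ios"}}, "tags": ["new", "promo"]}',
    '{"user": {"id": 8, "device": {"os": "android"}}, "tags": []}',
]
docs = [json.loads(s) for s in raw]

# Comparable to filtering rows on a nested JSON field in a WHERE clause.
ios_users = [get_path(d, "user", "id") for d in docs if get_path(d, "user", "device", "os") == "ios"]
print(ios_users)  # [7]
```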

Open Source Connectors

SingleStore offers users several connectors for smooth integration with other data sources. One example is SingleStore Streamliner, a fully integrated Apache Spark solution. Streamliner simplifies the deployment of Spark, a critical component for building real-time data pipelines that provides advanced data enrichment and transformation. Another important connector is SingleStore Loader, which can easily import data from HDFS, as well as import and synchronize data from Amazon S3.

Marketplace Perspective

Customers are investing in SingleStore as they realize the value of data in real time, along with the power of SQL to analyze it. Pinterest, Akamai, Zynga, Comcast, and Tapjoy have all deployed SingleStore to power mission-critical applications. Customers in many industries, including financial services, advertising technology, energy, automotive, and retail, have invested for improved performance, the power and familiarity of SQL, or the low cost of scaling on shared-nothing commodity servers and storage.

Download the Community Edition of SingleStore today. It is free, fully functional and eager to tackle your workloads: singlestore.com/cloud-trial/