What Is Streaming Analytics?

What Is Streaming Analytics?

Streaming analytics is the continuous processing and analysis of data records, typically at high volumes and at high speed, to extract actionable insights and/or generate automated alerts or actions, all in real time.

Streaming analytics takes a different approach than batch processing: it continuously processes small amounts of data (often just a few kilobytes per record), while batch processing involves periodic (such as once daily) analysis of large volumes of data (up to multiple terabytes or more) aggregated via extract-transform-load (ETL) processes.
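The contrast can be sketched in a few lines of Python. This is an illustrative simplification, not any particular product's API: the batch function waits for the full data set before answering once, while the streaming function yields an up-to-date answer as each record arrives.

```python
from typing import Iterable, Iterator


def batch_average(records: list) -> float:
    """Batch: wait for the full data set, then compute one answer."""
    return sum(records) / len(records)


def streaming_average(records: Iterable) -> Iterator[float]:
    """Streaming: update the running answer as each record arrives."""
    total, count = 0.0, 0
    for value in records:
        total += value
        count += 1
        yield total / count  # an actionable result after every record


readings = [10.0, 12.0, 11.0, 13.0]
print(batch_average(readings))            # → 11.5 (one answer, after the fact)
print(list(streaming_average(readings)))  # → [10.0, 11.0, 11.0, 11.5]
```

The streaming variant never holds the whole data set in memory, which is why the same pattern scales from a four-element list to an unbounded feed.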

Streaming analytics is different from BI

Traditional business intelligence (BI) and analytics tools were designed to work with batch-processed static sources, and often require data to be duplicated into data warehouses or proprietary data stores. These tools lack streaming query capabilities. Modern streaming analytics and data visualization tools, by contrast, can query streaming data sources directly and deliver analysis and insights from them.

Use cases for streaming analytics

Streaming analytics serves myriad use cases across all industries. Many of them fall into a handful of technical scenarios, all of which generate enormous streams of operational data:

  • Transaction analytics: Analytics performed on streaming transaction data can detect meaningful transaction events and respond in real time. For example, applying streaming analytics to credit card transaction flows can detect anomalies that signal fraud, identifying potentially fraudulent transactions and stopping them before completion. This can trigger alerts such as notifications via mobile banking app, email, text or phone call, asking the customer to verify a suspicious transaction.

    Analogously, streaming analytics is a cornerstone of online commerce, advertising and content curation, aggregating clicks and interpreting user behavior. Products are recommended, ads targeted or content surfaced that the user is likely to be interested in, to prompt a purchase or other response.

  • Containers: Container technology packages an application together with its dependencies so it can run in the cloud, isolated from other processes. Amazon Web Services, Microsoft Azure and Google Cloud Platform have all embraced container and orchestration technologies such as Docker, rkt, Apache Mesos and Kubernetes.

    Containers generate vast streams of log data. As container clusters grow more complex, it becomes harder to reach the root cause of performance issues through static log analytics. Through continuous capture, correlation and querying of log data, streaming analytics tools speed root cause analysis of performance problems and optimize container performance.

  • Sensor data: Internet of Things (IoT) applications collect constant streams of data emitted by sensors in all manner of applications and environments: predictive maintenance of machinery and airplanes, electrical power grid monitoring and management, precision farming and aquaculture, oil and gas production, and much more.

    Streaming analytics processes and analyzes fast-moving live data from IoT devices to trigger automated, real-time actions or alerts. It is essential for spotting faults as they occur and course-correcting to maintain process quality and integrity before these issues become major problems.

  • Edge computing: Edge analytics applies streaming analytics technology to data generated at an individual sensor, network switch or other device located at the network edge, instead of waiting for it to be sent back to a centralized data store. For example, using streaming analytics to analyze incoming traffic at the network edge can determine whether a suspicious volume of traffic is a DDoS attack, or whether it is coming from a single IP address or a single user profile.
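To make the transaction-analytics scenario concrete, here is a minimal Python sketch of anomaly detection on a stream of card transactions. The window size and deviation threshold are arbitrary illustrative choices, not a production fraud model: each amount is compared against the card's recent spending history, and outliers are flagged before they are allowed to complete.

```python
from collections import deque
from statistics import mean, stdev


def make_fraud_detector(window: int = 10, threshold: float = 3.0):
    """Return a per-card checker that flags amounts far outside recent history."""
    history = deque(maxlen=window)  # sliding window of trusted amounts

    def check(amount: float) -> bool:
        suspicious = False
        if len(history) >= 3:
            mu, sigma = mean(history), stdev(history)
            # Flag amounts more than `threshold` std devs above the mean.
            suspicious = sigma > 0 and (amount - mu) / sigma > threshold
        if not suspicious:
            history.append(amount)  # only learn from transactions we trust
        return suspicious

    return check


check = make_fraud_detector()
for amt in [20.0, 25.0, 22.0, 24.0, 21.0]:
    check(amt)            # normal card activity builds the baseline
print(check(5000.0))      # → True: flagged before the transaction completes
print(check(23.0))        # → False: normal spend passes through
```

Because the state per card is just a small deque, the same logic can run per key across millions of cards in a distributed stream processor.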

Technology requirements for streaming analytics

  • Optimized architecture: Columnstore databases traditionally have been restricted to data warehouse uses where low latency queries are a secondary goal. Data ingestion in such systems is typically offline, batched, append-only or some combination thereof.

    To handle streaming analytics, a columnstore database implementation must treat low latency queries and ongoing writes as “first-class citizens,” with a focus on avoiding interference between read, ingest, update and storage optimization workloads. This broadens the range of viable columnstore workloads to include streaming analytics, with its stringent demands on query and data latency. These applications include operational systems that back adtech, financial services, fraud detection and other data streaming applications.

    SingleStoreDB is a modern, unified, real-time distributed SQL database. It uses fragmented snapshot transactions and optimistic storage reordering to meet the extremely low latency requirements of streaming analytics applications. SingleStoreDB stores data in tables and supports standard SQL data types. Geospatial and JSON data types are also first-class citizens in SingleStoreDB, which can store and query structured, semi-structured and unstructured data with equal ease.

    In SingleStoreDB, a table is either distributed or non-distributed (e.g., a reference table). There are two storage types for tables: in-memory rowstore and columnstore. All columnstore tables have an unexposed, in-memory rowstore table. SingleStoreDB automatically spills rows from the in-memory rowstore to columnstore. All data, including the hidden rowstore table, is queryable for the columnstore table.

  • Streaming ingest: Distributed, high-throughput, lock-free ingestion is essential for streaming analytics.

    SingleStoreDB offers built-in batch and real-time data pipelines, allowing massive parallel data ingest into distributed tables. SingleStore Pipelines continuously load data as it arrives from external sources including Apache Kafka, Amazon S3, Azure Blob, file system, Google Cloud Storage and HDFS data sources. JSON, Avro, Parquet and CSV data formats are supported.
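The rowstore-backed columnstore arrangement described above can be sketched in a few lines of Python. `HybridTable`, the flush threshold and the column layout here are hypothetical simplifications for illustration only, not SingleStoreDB internals: new rows land in an in-memory row buffer, spill into immutable column-oriented segments once the buffer fills, and queries scan both together so freshly ingested data is visible immediately.

```python
class HybridTable:
    """Toy model of a columnstore table backed by a small in-memory rowstore."""

    def __init__(self, columns, flush_at=4):
        self.columns = columns
        self.flush_at = flush_at
        self.rowstore = []   # recent rows, row-oriented, mutable
        self.segments = []   # older data, column-oriented, immutable

    def insert(self, row):
        self.rowstore.append(row)
        if len(self.rowstore) >= self.flush_at:
            # Spill: pivot buffered rows into one list-per-column segment.
            segment = {c: [r[i] for r in self.rowstore]
                       for i, c in enumerate(self.columns)}
            self.segments.append(segment)
            self.rowstore = []

    def scan(self, column):
        """Read one column across all segments plus the unflushed rows."""
        i = self.columns.index(column)
        values = [v for seg in self.segments for v in seg[column]]
        values += [r[i] for r in self.rowstore]
        return values


t = HybridTable(["ts", "amount"], flush_at=2)
for row in [(1, 9.5), (2, 3.0), (3, 7.25)]:
    t.insert(row)
print(t.scan("amount"))   # → [9.5, 3.0, 7.25]: flushed and fresh rows together
```

The design point the sketch captures is that readers never have to wait for a batch load: a scan unions the columnar segments with whatever is still sitting in the row buffer.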
