Turning Amazon S3 Into a Real-Time Analytics Pipeline


Seth Luersen

Previous Head of Training, Curriculum, and Certification Programs


SingleStoreDB Self-Managed 5.7 introduces a new pipeline extractor for Amazon Simple Storage Service (S3). Many modern applications interface with Amazon S3 to store data objects of up to 5 TB in buckets, providing a modern foundation for today's enterprise data lake.

Without analytics, the data is just a bunch of files

For modern enterprise data warehouses, the challenge is to harness the unlimited nature of S3 for ad-hoc and real-time analytics. For traditional data warehouse applications, extracting data from S3 requires additional services and background jobs that monitor buckets for new objects and then load those objects for reporting and analysis. Eliminating duplicates, handling errors, and applying transformations to the retrieved objects often requires extensive coding, middleware, or additional Amazon offerings.

From data lake to real-time data warehouse

A SingleStore S3 Pipeline extracts data from a bucket's objects, transforms the data as required, and loads the transformed data into columnstore or rowstore tables. SingleStore Pipelines use the power of distributed processing and in-memory computing to extract, transform, and load external data in parallel into each database partition, achieving exactly-once semantics.
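As a minimal sketch of what such a pipeline looks like in DDL, the following creates a pipeline that ingests CSV objects from a bucket into a table. The bucket name, region, credentials, table name, and schema are all illustrative placeholders, not part of the quickstart:

```sql
-- Hypothetical target table; the schema is illustrative.
CREATE TABLE clicks (
    ts DATETIME,
    user_id BIGINT,
    url VARCHAR(2048)
);

-- Sketch of an S3 pipeline; bucket, region, and credentials are placeholders.
CREATE PIPELINE clicks_pipeline AS
LOAD DATA S3 'my-company-logs'
CONFIG '{"region": "us-east-1"}'
CREDENTIALS '{"aws_access_key_id": "<id>", "aws_secret_access_key": "<secret>"}'
INTO TABLE clicks
FIELDS TERMINATED BY ',';
```

Once started, the pipeline distributes the bucket's objects across database partitions so each partition loads its share in parallel.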

To stream existing and new S3 objects while querying the streaming data with sub-second performance, a SingleStore S3 Pipeline runs perpetually. Rapid and continuous data ingest for real-time analytic queries is a native component of SingleStore. The constant data ingest allows you to deliver real-time analytics with ANSI SQL and power business intelligence applications like Looker, ZoomData, or Tableau.
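For example, an ad-hoc ANSI SQL query can aggregate over data the pipeline is still ingesting. The `clicks` table and its columns below are hypothetical, standing in for whatever table your pipeline targets:

```sql
-- Top pages in the last hour, computed over continuously ingested data;
-- the clicks table and its columns are illustrative.
SELECT url, COUNT(*) AS views
FROM clicks
WHERE ts >= NOW() - INTERVAL 1 HOUR
GROUP BY url
ORDER BY views DESC
LIMIT 10;
```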

SingleStore Pipelines are first-class database citizens. Database developers and administrators can easily create, test, alter, start, stop, and configure pipelines with basic data definition language (DDL) statements, or use a graphical user interface (GUI) in SingleStore Ops.
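The lifecycle statements look like the following sketch; the pipeline name and the batch-interval value are illustrative:

```sql
-- Dry-run one batch without writing rows, to validate the pipeline.
TEST PIPELINE clicks_pipeline LIMIT 1;

-- Begin continuous background ingest.
START PIPELINE clicks_pipeline;

-- Reconfigure a setting, e.g. the polling interval in milliseconds.
ALTER PIPELINE clicks_pipeline SET BATCH_INTERVAL 2500;

-- Pause ingest; resume later with START PIPELINE.
STOP PIPELINE clicks_pipeline;
```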

Excited to get started with SingleStore S3 Pipelines? Follow these steps:

  1. Open an AWS account. AWS offers an AWS Free Tier that includes 5 GB of Amazon S3 Storage, including 20,000 Get Requests and 2,000 Put Requests.

  2. Download a 30-day free trial of the SingleStore Enterprise Edition or use the SingleStore Official Docker Image to run SingleStore.

  3. With an available cluster running, create your first SingleStore S3 Pipeline using our S3 Pipelines Quickstart. The guide covers creating S3 buckets, a SingleStore database, and most importantly, a SingleStore S3 Pipeline.
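The steps above can be sketched end to end as follows. This is a hedged outline of what the quickstart walks through, not the quickstart itself: the database, bucket, table, and pipeline names are placeholders, and the credentials must be your own:

```sql
-- Placeholder database and table for the quickstart sketch.
CREATE DATABASE quickstart;
USE quickstart;

CREATE TABLE events (line TEXT);

-- Bucket, region, and credentials below are placeholders.
CREATE PIPELINE events_pipeline AS
LOAD DATA S3 'my-quickstart-bucket'
CONFIG '{"region": "us-east-1"}'
CREDENTIALS '{"aws_access_key_id": "<id>", "aws_secret_access_key": "<secret>"}'
INTO TABLE events;

-- Begin streaming existing and newly arriving objects.
START PIPELINE events_pipeline;
```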