Understanding Oracle’s Real-Time Ingestion Overhead With Kafka, and How SingleStore Is Better


Ankit Goyal

Pre-Sales Lead, MENA & GSIs


Kafka is a distributed data store optimized for ingesting and processing streaming data in real time. Streaming data is data that is continuously generated by thousands of data sources, which typically send in their data records simultaneously.

Kafka is generally used as a source for real-time ingestion use cases where the target is one or more relational databases acting as an ODS, data mart or data warehouse for real-time analytics.

This blog focuses on the difference between Oracle and SingleStore for this pattern: SingleStore's native integration with Kafka results in greater simplicity, better scalability and a lower TCO.

Let's talk about Kafka first, and what exactly it offers:

What is Kafka?


Kafka is used to build real-time data pipelines and streaming applications. A data pipeline reliably processes and moves data from one system to another, and a streaming application is an application that consumes those streams of data.

But this blog isn’t necessarily about Kafka — it’s about integrating with Kafka and processing streaming data in real time. Incoming data should be consumed by a database which can also act as a data mart or data warehouse.

Let's consider two scenarios: one where the selected database is Oracle, and one where it’s SingleStore.

Oracle integration with Kafka

Oracle is a popular RDBMS, widely used in industries like telecom, banking and infrastructure where large volumes of streaming data come in from IoT sensors. In my previous stint with Oracle (mostly with GoldenGate), every other prospect was looking to use GoldenGate to move data from Kafka to Oracle in real time.

The architecture for ingesting real-time data into Oracle from Kafka looks something like this:

In this architecture, data from Kafka is extracted by Oracle GoldenGate and replicated to the Oracle database. GoldenGate is Oracle's flagship technology for near real-time replication to and from Oracle databases.

Some of the overheads with this approach include:

  1. Total Cost of Ownership (TCO). Oracle GoldenGate is a great product, but it comes at a very high cost.
  2. Points of Failure (POF). There is more than one point of failure: replication stops if either the extract process or the replicat process fails.
  3. Operations cost. A specialized GoldenGate team is needed to keep the solution up and running, and to make sure data is always in sync.
  4. Infrastructure overhead. GoldenGate requires additional infrastructure (CPU, memory and disk), since it creates a number of processes that must always be running.

Now, let’s take a look at how Kafka ingestion differs with SingleStore.

What is SingleStore?

SingleStore is a real-time distributed SQL database offering speed, scale and simplicity for real-time analytics. It is a cloud-native operational database that excels in delivering instant insights and supports a variety of workloads, including machine learning and AI applications.

SingleStore offers a unified solution for both transactions and analytics, capable of managing structured, semi-structured and unstructured data — empowering organizations to make fast, data-driven decisions.

SingleStore integration with Kafka

SingleStore has native integrations with Kafka, allowing you to ingest data from Kafka topics directly into SingleStore tables. SingleStore Pipelines can natively pull data from and push data to Kafka, eliminating the need for any ETL/CDC tool in between.

SingleStore Pipelines

SingleStore Pipelines is a feature within SingleStore that enables efficient, continuous data ingestion from a variety of external sources in addition to Kafka.

A pipeline can extract, optionally transform and load data into SingleStore in real time. Pipelines support data sources including Amazon S3, Azure Blob Storage, file systems, Google Cloud Storage and HDFS, as well as database sources like MongoDB® and MySQL, and can handle formats including JSON, Avro, Parquet and CSV.
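
For instance, here is a minimal sketch of a pipeline loading CSV files from Amazon S3; the bucket, pipeline, table and credential placeholders are hypothetical, while the CONFIG/CREDENTIALS shape follows the SingleStore docs:

-- Load CSV files from a hypothetical S3 bucket into an existing 'orders' table
CREATE PIPELINE s3_orders_pipeline
AS LOAD DATA S3 'my-bucket/orders/'
CONFIG '{"region": "us-east-1"}'
CREDENTIALS '{"aws_access_key_id": "<access_key_id>", "aws_secret_access_key": "<secret_access_key>"}'
INTO TABLE orders
FIELDS TERMINATED BY ',';

START PIPELINE s3_orders_pipeline;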

Pipelines are first-class citizens in SingleStore: just a few lines of code let you natively pull data from, and push data to, Kafka. Here's an example.

Steps:

  1. Spin up a SingleStore Cloud instance from our Cloud Portal for free.
  2. Assume you have Apache Kafka running, with data on a topic that you want to replicate to a SingleStore table.
  3. Using our SQL editor within the portal, write the following command:

-- Create a pipeline that consumes <Topic_name> from Kafka
-- and loads the records into <Table_Name>
CREATE PIPELINE <Pipeline_name>
AS LOAD DATA KAFKA 'public-kafka.xx.com:9092/<Topic_name>'
BATCH_INTERVAL 2500
INTO TABLE <Table_Name>
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '|\n';

-- Begin ingesting
START PIPELINE <Pipeline_name>;
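
To make this concrete, here is a hypothetical end-to-end sketch; the table, pipeline, topic and broker names are illustrative only, not from this blog:

-- Hypothetical target table for the incoming records
CREATE TABLE sensor_readings (
  sensor_id INT,
  reading DOUBLE,
  recorded_at DATETIME
);

-- Pull pipe-delimited records from the 'sensors' topic in batches
CREATE PIPELINE sensor_pipeline
AS LOAD DATA KAFKA 'kafka-broker.example.com:9092/sensors'
BATCH_INTERVAL 2500
INTO TABLE sensor_readings
FIELDS TERMINATED BY '|';

START PIPELINE sensor_pipeline;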

Various examples of the code are available in our SingleStore docs.

Pipelines run continuously, pulling any new incoming data on the topic and replicating it to the table in SingleStore. You can also stop and restart the replication at any time, simply by stopping and restarting the pipeline, as shown below.
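
For example, reusing the hypothetical pipeline name from the sketch above:

STOP PIPELINE sensor_pipeline;   -- pause ingestion; the pipeline's position in the topic is retained
START PIPELINE sensor_pipeline;  -- resume from where the pipeline left off
SHOW PIPELINES;                  -- check the current state of each pipeline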

Push to Kafka

Another extremely important piece of functionality SingleStore provides is its native capability to push data to Kafka. Here is the syntax:

SELECT col1, col2, col3 FROM <table_name>
ORDER BY col1
INTO KAFKA 'host.example.com:9092/test-topic'
FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY "\t"
LINES TERMINATED BY '}' STARTING BY '{';
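
Given the FIELDS and LINES clauses above, each row is emitted as one Kafka record wrapped in braces, with every field quoted. For a hypothetical row (1, 'a', 'b'), the record produced on test-topic would look roughly like this:

{"1","a","b"}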

More details and examples are available in SingleStore docs.

Advantages of SingleStore

Real-time ingestion through Kafka with SingleStore as your primary database delivers the following advantages:

  • TCO reduction. There's no ETL tool needed to pull or push data from/to Kafka.
  • No points of failure. Since Pipelines are objects within the database, there are no external points of failure and no extra hops.
  • No operational overhead. There's no specialized team needed to maintain the data replication from Kafka to SingleStore.
  • Simplicity. SingleStore delivers a clean, simple architecture, without any external third-party integrations.

So what are you waiting for? Try SingleStore for free, and let us know your experience!

