
Getting Started With CDC Replication from MongoDB

SingleStore's native data replication lets you take a one-time snapshot of a MongoDB® collection and then run continuous change data capture (CDC) from MongoDB® to SingleStoreDB. This provides a quick and easy way to replicate data and run analytics on MongoDB® data.

What you will learn in this notebook:

How to set up replication of a collection to SingleStore and watch live updates to the MongoDB® collection replicate to SingleStore.

Install libraries and import modules

In [1]:

!pip3 install pymongo --quiet
import pymongo
import random

Replicate a collection to SingleStore

In [2]:

%%sql
DROP DATABASE IF EXISTS cdcdemo;
CREATE DATABASE cdcdemo;

In [3]:

source_mongo_url = "mongodb+srv://mongo_sample_reader:SingleStoreRocks27017@cluster1.tfutgo0.mongodb.net/?retryWrites=true&w=majority"
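The connection string above embeds a username and password, so it can be worth masking them before printing or logging the URL. The following is a minimal sketch using only the Python standard library. `redact_mongo_url` is a hypothetical helper name (not part of pymongo or SingleStore), the example URL is made up, and the function assumes a single-host URL such as the `mongodb+srv` form used here.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_mongo_url(url: str) -> str:
    """Return the URL with any password replaced by '***'.

    Assumption: the URL names a single host (as mongodb+srv URLs do);
    comma-separated host lists are not handled by this sketch.
    """
    parts = urlsplit(url)
    if parts.password is None:
        return url  # no credentials to hide
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    netloc = f"{parts.username}:***@{host}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# Made-up URL for illustration (not the one used in this notebook)
example_url = "mongodb+srv://reader:s3cret@example.mongodb.net/?retryWrites=true&w=majority"
print(redact_mongo_url(example_url))
```

In the notebook, the same call could be applied to `source_mongo_url` before displaying it.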

Create a link to the source MongoDB

In [4]:

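# connection_url_kai is not defined in this notebook; it is assumed to be
# provided by the SingleStore notebook environment.
s2client = pymongo.MongoClient(connection_url_kai)  # Initialize the client for Kai
s2db = s2client["cdcdemo"]
# Create a link named "mongolink" that points at the source MongoDB
res = s2db.command("createLink", "mongolink", uri=source_mongo_url)
print(res, res["ok"])
if res["ok"] != 1:
    raise Exception("Failed to create link: %s" % res)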
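The status check above can be factored into a small reusable function. The sketch below assumes only what the cell above relies on: the command response behaves like a dict whose `ok` field equals 1 on success. `ensure_ok` is a hypothetical helper name, and plain dicts stand in for server responses.

```python
def ensure_ok(response: dict, action: str) -> dict:
    """Raise RuntimeError unless the response reports ok == 1."""
    if response.get("ok") != 1:
        raise RuntimeError(f"Failed to {action}: {response}")
    return response


# A successful response passes through unchanged (1.0 compares equal to 1)
ensure_ok({"ok": 1.0}, "create link")

# A failed response raises, with the full response in the message
try:
    ensure_ok({"ok": 0, "errmsg": "boom"}, "create link")
except RuntimeError as exc:
    print(exc)
```

With PyMongo's default `check=True`, `Database.command` already raises `OperationFailure` when the server reports a failure, so an explicit check like this mainly serves as a second safeguard and gives a clearer message.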

Specify the source database and collection and start replication

In [5]:

create_col_args = {"from": {"link": "mongolink", "database": "cdcdemo", "collection": "scores"}}
res = s2db.create_collection("scores", **create_col_args)
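When more than one collection needs replicating, the keyword arguments can be built by a small function. This is a sketch that assumes the `from` option takes exactly the three keys used in the cell above. `replication_options` is a hypothetical helper, and the second collection name is made up for illustration.

```python
def replication_options(link: str, database: str, collection: str) -> dict:
    """Build create_collection keyword arguments for a replicated collection."""
    if not (link and database and collection):
        raise ValueError("link, database and collection must all be non-empty")
    return {"from": {"link": link, "database": database, "collection": collection}}


# "scores" comes from this notebook; "students" is a hypothetical second collection
for name in ["scores", "students"]:
    opts = replication_options("mongolink", "cdcdemo", name)
    print(name, opts)
    # In the notebook this would become: s2db.create_collection(name, **opts)
```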

The following command waits until the entire collection from MongoDB has been synced to SingleStore

In [6]:

%%sql
USE cdcdemo;
SYNC PIPELINE scores;

Print some of the replicated documents

In [7]:

s2collection = s2db["scores"]
scores_cursor = s2collection.find().limit(5)
for scores in scores_cursor:
    print(scores)

Total document count

In [8]:

s2collection.count_documents({})

Insert a document in the source MongoDB collection

In [9]:

data = {
    "student_id": random.randint(0, 100),
    "class_id": random.randint(0, 500),
    "exam_score": random.uniform(0, 100),  # Random score between 0 and 100, as a double
}
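Wrapping the document construction in a function makes it easy to generate several test documents and, with a seeded generator, to reproduce them. This is a sketch: `make_score` is a hypothetical helper, and the field names and ranges are copied from the cell above.

```python
import random


def make_score(rng: random.Random) -> dict:
    """Return one score document with the same fields and ranges as above."""
    return {
        "student_id": rng.randint(0, 100),   # inclusive on both ends
        "class_id": rng.randint(0, 500),     # inclusive on both ends
        "exam_score": rng.uniform(0, 100),   # float score
    }


rng = random.Random(42)  # fixed seed so repeated runs give the same documents
batch = [make_score(rng) for _ in range(3)]
for doc in batch:
    print(doc)
```

A list like `batch` could be passed to `insert_many` on the source collection instead of inserting a single document.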

In [10]:

sourceclient = pymongo.MongoClient(source_mongo_url)
sourcecol = sourceclient["cdcdemo"]["scores"]
res = sourcecol.insert_one(data)

In [11]:

sourcecol.count_documents({})

The newly added document is now replicated to SingleStore, increasing the document count by 1 and demonstrating real-time sync

In [12]:

s2collection.count_documents({})
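Because replication is continuous rather than instantaneous, the count on the SingleStore side may lag briefly behind the source. One way to make the comparison robust is to poll until the expected count appears. The sketch below is an assumption about how such a wait could be written, not part of the SingleStore API. `wait_for_count` is a hypothetical helper, and an iterator of made-up readings stands in for the real collection.

```python
import time
from typing import Callable


def wait_for_count(get_count: Callable[[], int], expected: int,
                   timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll get_count() until it returns at least `expected` or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if get_count() >= expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for a target collection that catches up on the third poll
readings = iter([10, 10, 11])
caught_up = wait_for_count(lambda: next(readings), expected=11, interval=0.01)
print(caught_up)
```

In the notebook, the callable could be `lambda: s2collection.count_documents({})`, with `expected` set to the value returned by `sourcecol.count_documents({})`.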

This native replication capability from SingleStore makes it easy to set up and run continuous data replication from your MongoDB deployment, with no additional cost or infrastructure requirements.

Details

About this Template

Set up zero-ETL data replication from MongoDB to SingleStore

Tags

cdc, mongo, kai

License

This Notebook has been released under the Apache 2.0 open source license.