Getting Started With CDC Replication from MongoDB

SingleStore's native data replication lets you take a one-time snapshot of MongoDB® data and run continuous change data capture (CDC) from MongoDB® to SingleStoreDB. This provides a quick and easy way to replicate data and power analytics on MongoDB® data.

What you will learn in this notebook:

Set up replication of a collection to SingleStore and watch live updates to the MongoDB® collection replicate to SingleStore.

Install libraries and import modules

In [1]:

!pip3 install pymongo --quiet
import pymongo
import random

Replicate a collection to SingleStore

In [2]:

%%sql
DROP DATABASE IF EXISTS cdcdemo;
CREATE DATABASE cdcdemo;

In [3]:

source_mongo_url = "mongodb+srv://mongo_sample_reader:SingleStoreRocks27017@cluster1.tfutgo0.mongodb.net/?retryWrites=true&w=majority"

Create a link to the source MongoDB

In [4]:

# connection_url_kai is expected to be predefined by the SingleStore notebook environment
s2client = pymongo.MongoClient(connection_url_kai)  # Initializing the client for Kai
s2db = s2client["cdcdemo"]
res = s2db.command("createLink", "mongolink", uri=source_mongo_url)
print(res, res["ok"])
if res["ok"] != 1:
    raise Exception("Failed to create link: %s" % "mongolink")

Specify the source database and collection and start replication

In [5]:

create_col_args = {"from": {"link": "mongolink", "database": "cdcdemo", "collection": "scores"}}
res = s2db.create_collection("scores", **create_col_args)
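
The options passed to `create_collection` above are built as a dict and unpacked with `**` because `from` is a reserved word in Python and cannot be written as a keyword argument directly. As a minimal sketch, the dict could be produced by a small helper; `replication_source` is a hypothetical name introduced here for illustration and is not part of pymongo or SingleStore.

```python
def replication_source(link, database, collection):
    """Build the options dict describing where a collection is replicated from.

    Hypothetical helper: it only assembles the same dict shape used in the
    cell above and makes no database calls.
    """
    # "from" is a Python keyword, so it has to travel inside a dict that is
    # unpacked with ** at the call site.
    return {"from": {"link": link, "database": database, "collection": collection}}


create_col_args = replication_source("mongolink", "cdcdemo", "scores")
print(create_col_args)

# Usage, assuming the s2db handle from the earlier cell:
# s2db.create_collection("scores", **create_col_args)
```

Keeping the link, database, and collection names as parameters makes it easier to replicate further collections through the same link without retyping the nested structure.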

The following command waits until the entire collection from MongoDB has been synced to SingleStore

In [6]:

%%sql
USE cdcdemo;
SYNC PIPELINE scores;

Print some of the replicated documents

In [7]:

s2collection = s2db["scores"]
scores_cursor = s2collection.find().limit(5)
for scores in scores_cursor:
    print(scores)

Total document count

In [8]:

s2collection.count_documents({})

Insert a document in the source MongoDB collection

In [9]:

data = {
    "student_id": random.randint(0, 100),
    "class_id": random.randint(0, 500),
    "exam_score": random.uniform(0, 100),  # Generate random score between 0 and 100 as a double
}
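
If the same kind of test document needs to be generated more than once, the dict above could be wrapped in a function. This is a sketch only; `make_score_document` is a hypothetical name, and the optional seeded `random.Random` instance is an assumption added so that runs can be reproduced.

```python
import random


def make_score_document(rng=random):
    """Return one score document with the same fields as the cell above."""
    return {
        "student_id": rng.randint(0, 100),  # integer in [0, 100]
        "class_id": rng.randint(0, 500),    # integer in [0, 500]
        "exam_score": rng.uniform(0, 100),  # float between 0 and 100
    }


# A seeded generator yields the same document on every run.
rng = random.Random(42)
data = make_score_document(rng)
print(data)
```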

In [10]:

sourceclient = pymongo.MongoClient(source_mongo_url)
sourcecol = sourceclient["cdcdemo"]["scores"]
res = sourcecol.insert_one(data)

In [11]:

sourcecol.count_documents({})

The newly added document is now replicated to SingleStore, increasing the document count by 1 and demonstrating real-time sync

In [12]:

s2collection.count_documents({})
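
The replicated count may take a moment to catch up with the source, so a single check right after the insert can occasionally show the old value. As a sketch, a small polling helper can wait for the counts to match; `wait_for_count` and its `timeout` and `interval` defaults are hypothetical choices for illustration, not part of pymongo or SingleStore.

```python
import time


def wait_for_count(get_count, expected, timeout=30.0, interval=0.5):
    """Poll get_count() until it returns at least `expected`.

    Returns the observed count, or raises TimeoutError if the count has not
    reached `expected` within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        current = get_count()
        if current >= expected:
            return current
        if time.monotonic() >= deadline:
            raise TimeoutError(f"count is {current}, expected at least {expected}")
        time.sleep(interval)


# Usage, assuming the sourcecol and s2collection handles from the earlier cells:
# expected = sourcecol.count_documents({})
# wait_for_count(lambda: s2collection.count_documents({}), expected)
```

Passing the count as a callable keeps the helper independent of any particular client, so it can be tried out with a plain function before pointing it at a live collection.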

This native replication capability from SingleStore makes it easy to set up and run continuous data replication from your MongoDB deployment, with no additional cost or infrastructure requirements.

Details

Tags

#cdc #mongo #kai

License

This Notebook has been released under the Apache 2.0 open source license.