Looking for a Solution to NoSQL Analytics Limitations? Here’s How to Map NoSQL JSON to SingleStoreDB Tables
NoSQL has drawn a great deal of interest in the database world over the past 10 years, commonly associated with scalable databases and pure OLTP speed for lookup queries. But when it comes to running analytical queries, or more complex OLTP queries, NoSQL starts to fall short. Why? Because NoSQL queries are limited to key-value lookups. These are very fast, but they often force you to add an extra layer of computation on the application side to get the results you need, whereas with SQL you can simply query for the result you want. Today, with the rise of distributed SQL databases like SingleStore, it's easier to handle the scalability issues you might encounter with legacy NoSQL databases.

In this blog, we will go through best practices for moving from a NoSQL database to SingleStoreDB, including how to quickly import JSON data into SQL tables. But first, let's take a closer look at SingleStoreDB.

## What Is SingleStoreDB?

SingleStore is a distributed SQL database that handles both analytical (OLAP) and transactional (OLTP) workloads in the same table type. SingleStoreDB provides fast ingestion and high query speed for complex OLTP and OLAP queries. It offers a robust, scalable solution that is levels above what legacy single-node databases can do. There is also a managed service that can be deployed on AWS, GCP or Microsoft Azure.

## Moving Past NoSQL Limitations

NoSQL databases are more scalable than legacy SQL databases, handling hundreds of millions of transactions in high-concurrency environments for pure OLTP queries. But today, as data rules the world, you need the best insights from your database via analytics dashboards or complex OLTP queries. Unfortunately, a NoSQL database cannot deliver these insights properly, so users often add a new layer to handle analytics with a data warehouse, and then add yet another OLTP SQL database to handle more complex SQL queries.

All of that results in data movement, ETL processes and database sprawl, which leads to high latency and poor user experiences.
What we've seen is a progression from the SQL era to the NoSQL era, leading us to where SingleStore is today: the NewSQL era.

## Mapping NoSQL JSON to SingleStore Tables

SingleStoreDB provides excellent support for JSON, especially since our latest product improvements in version 8.0. Even better, you can ingest data directly from JSON files stored in cold storage, and high ingest speed is one of the strongest capabilities of SingleStoreDB. To ingest data, we'll use SingleStore Pipelines.

Let's move some data from a NoSQL database to an AWS S3 bucket, then ingest it into SingleStoreDB. Here is a basic .json document stored in a bucket called nosqlmigration (note that it includes a gender field to match the table definition that follows, and lowercase keys to match the pipeline mapping below):

```json
{
  "id": "1",
  "name": "John Smith",
  "job": "Director",
  "gender": "male",
  "address": {
    "street": "Anger street",
    "streetnum": "32 bis",
    "city": "London",
    "country": "United Kingdom"
  }
}
```

And here is the corresponding table in SingleStoreDB:

```sql
CREATE TABLE employee (
  id int,
  name varchar(32),
  job varchar(32),
  gender varchar(10),
  address JSON,
  SHARD KEY (id)
);
```

We can also use a better table definition that stores all the address information directly as columns in the table, and ingest this information later:

```sql
CREATE TABLE employee (
  id int,
  name varchar(32),
  job varchar(32),
  gender varchar(10),
  address JSON,
  street varchar(32),
  streetnum varchar(32),
  city varchar(32),
  country varchar(32),
  SHARD KEY (id)
);
```

Now that we have defined the equivalent table in SingleStoreDB, let's ingest the data via a pipeline:

```sql
CREATE PIPELINE pipeline_migration_nosql AS
LOAD DATA S3 'nosqlmigration'
CONFIG '{"region": "eu-west-1"}'
CREDENTIALS '{"aws_access_key_id": "your_aws_access_key_id", "aws_secret_access_key": "your_aws_secret_access_key", "aws_session_token": "your_aws_session_token"}'
INTO TABLE employee
(id <- id,
 name <- name,
 job <- job,
 gender <- gender,
 address <- address,
 street <- address::street,
 streetnum <- address::streetnum,
 city <- address::city,
 country <- address::country)
FORMAT JSON;
```

We want to extract the JSON object address and ingest its contents directly into table columns. That way, we can easily use these columns to run more advanced queries.
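A pipeline does not run as soon as it is created, so you need to start it explicitly. Here is a minimal sketch of how you might test, start and monitor the pipeline defined above, using SingleStore's TEST PIPELINE and START PIPELINE commands and the information_schema.PIPELINES_FILES view:

```sql
-- Dry run: pull a batch from S3 and show what would be
-- ingested, without writing anything to the table
TEST PIPELINE pipeline_migration_nosql LIMIT 1;

-- Start continuous ingestion in the background
START PIPELINE pipeline_migration_nosql;

-- Check which files have been loaded or are still pending
SELECT file_name, file_state
FROM information_schema.PIPELINES_FILES
WHERE pipeline_name = 'pipeline_migration_nosql';
```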
Now, let's see what we can do if we have a more deeply nested JSON document containing an array. It's pretty common for NoSQL databases to have a collection of items with an array as a field type.

## Handling a JSON Array

An array in JSON is an ordered list of values; in our case, each value is an object with its own keys and values. There are multiple options for importing it properly into SingleStoreDB tables. The best option depends on which types of operations (aggregations, lookup selects, etc.) you run, and how often you want to access the items in the array.

Let's use this nested JSON as an example:

```json
{
  "id": "1",
  "name": "John Smith",
  "job": "Director",
  "gender": "male",
  "address": {
    "street": "Anger street",
    "streetnum": "32 bis",
    "city": "London",
    "country": "United Kingdom"
  },
  "experience": [
    { "role": "Lead Engineer", "company": "Json", "yoe": 3 },
    { "role": "Senior Engineer", "company": "Avro", "yoe": 3 },
    { "role": "Junior Engineer", "company": "Parquet", "yoe": 4 }
  ]
}
```

### Option 1: Import the array as a JSON field

This option is very performant (and even better now with our 8.0 release!) if you want to run simpler operations like lookups or aggregations. The idea is to store the array in a column of type JSON and use that JSON column directly in your queries. The drawback of this method is that you lose the useful table structure of a SQL database.

Here is an example of the table definition, along with queries that access the specific values you are looking for:

```sql
CREATE TABLE employee (
  id int,
  name varchar(32),
  job varchar(32),
  gender varchar(10),
  address JSON,
  street varchar(32),
  streetnum varchar(32),
  city varchar(32),
  country varchar(32),
  experience JSON,
  SHARD KEY (id)
);
```

Query 1: Find the previous experiences of employee 1:

```sql
SELECT emp.name, emp.job,
       exp.table_col::$role AS 'role',
       exp.table_col::$company AS 'company'
FROM employee emp, TABLE(JSON_TO_ARRAY(emp.experience)) exp
WHERE emp.id = 1;
```

Query 2: Aggregate the total years of experience for employee 1:

```sql
SELECT emp.name, SUM(exp.table_col::yoe) AS 'Total YoE'
FROM employee emp, TABLE(JSON_TO_ARRAY(emp.experience)) exp
WHERE emp.id = 1;
```
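To get a feel for how TABLE(JSON_TO_ARRAY(...)) unnests an array before pointing it at a real table, you can call it on an inline JSON literal. This is a minimal sketch, assuming the string literal implicitly casts to JSON; each array element comes back as one row in a column named table_col, and the ::% operator extracts a value as a number:

```sql
-- Unnest an inline JSON array: one row per element
SELECT t.table_col::$role AS role,
       t.table_col::%yoe AS yoe
FROM TABLE(JSON_TO_ARRAY(
  '[{"role": "Lead Engineer", "company": "Json", "yoe": 3},
    {"role": "Senior Engineer", "company": "Avro", "yoe": 3}]'
)) t;
```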
### Option 2: Create a table to represent the JSON array

The second option is to fully use the relational capabilities of SQL and build a table that represents the array. Here is the second table, which represents the array of employee experiences:

```sql
CREATE TABLE experience (
  id_employee int,
  role varchar(32),
  company varchar(32),
  yoe int,
  SHARD KEY (id_employee)
);
```

This table structure makes your data more readable and usable for your application or tool. You will need to rewrite the pipeline that ingests the JSON file so it interprets the array and inserts into the corresponding tables. To make it work, we will write a stored procedure that inserts into the two tables, then redirect the pipeline's ingestion into this procedure.

Here is an example of a stored procedure that inserts into these two tables:

```sql
-- Stored procedure for ingesting a JSON array into multiple tables
DELIMITER //
CREATE OR REPLACE PROCEDURE employee_proc(batch QUERY(
  id varchar(32),
  name varchar(32),
  job varchar(32),
  gender varchar(32),
  address JSON,
  street varchar(32),
  streetnum varchar(32),
  city varchar(32),
  country varchar(32),
  experience JSON))
AS
DECLARE
  json_array ARRAY(json);
  e json;
BEGIN
  FOR batch_record IN COLLECT(batch) LOOP
    BEGIN
      -- Insert the flat fields, extracting the address sub-object
      INSERT INTO employee(id, name, job, gender, address, street, streetnum, city, country)
      VALUES(batch_record.id,
             batch_record.name,
             batch_record.job,
             batch_record.gender,
             batch_record.address,
             batch_record.address::$street,
             batch_record.address::$streetnum,
             batch_record.address::$city,
             batch_record.address::$country);
      -- Unnest the experience array: one row per element
      json_array = JSON_TO_ARRAY(batch_record.experience);
      FOR i IN 0 .. LENGTH(json_array) - 1 LOOP
        e = json_array[i];
        INSERT INTO experience(id_employee, role, company, yoe)
        VALUES(batch_record.id, e::$role, e::$company, e::yoe);
      END LOOP;
    END;
  END LOOP;
END //
DELIMITER ;
```

And here is the new pipeline SQL statement:

```sql
-- Pipeline for ingesting JSON into a stored procedure
CREATE PIPELINE pipeline_migration_nosql AS
LOAD DATA S3 'nosqlmigration'
CONFIG '{"region": "eu-west-1"}'
CREDENTIALS '{"aws_access_key_id": "your_aws_access_key_id", "aws_secret_access_key": "your_aws_secret_access_key", "aws_session_token": "your_aws_session_token"}'
INTO PROCEDURE employee_proc
FORMAT JSON;
```

Query 1: Find the experiences of employee 1:

```sql
SELECT emp.name, emp.job, exp.role, exp.company
FROM employee emp JOIN experience exp ON emp.id = exp.id_employee
WHERE emp.id = 1;
```

Query 2: Aggregate the total years of experience for employee 1:

```sql
SELECT emp.name, SUM(exp.yoe) AS 'Total YoE'
FROM employee emp JOIN experience exp ON emp.id = exp.id_employee
WHERE emp.id = 1;
```

## Performance Comparison Between Both Options

Both options can achieve very fast performance in SingleStoreDB, but each comes with drawbacks. Option one is a good fit if your queries are light on the JSON array. Option two makes your data schema more readable and offers more possibilities in terms of query shapes and computation, but it has an impact on ingest speed, and joins between two big tables can be costly in terms of CPU.

Here is a benchmark chart that shows the performance of both options for the two queries described above. The experience table has, on average, three times the number of rows of the employee table.
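The numbers in a chart like this depend heavily on your schema and data distribution, so it is worth reproducing the comparison on your own workload. As a minimal sketch, you can use SingleStoreDB's PROFILE facility, which records per-operator execution statistics for a query:

```sql
-- Run the query with instrumentation enabled
PROFILE SELECT emp.name, SUM(exp.yoe) AS 'Total YoE'
FROM employee emp JOIN experience exp ON emp.id = exp.id_employee
WHERE emp.id = 1;

-- Show per-operator timing for the most recently profiled query
SHOW PROFILE;
```

Profiling both the JSON-column version and the join version of the same question gives you a like-for-like view of where each one spends its time.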