Blogs + Resources
Blog
Loading half a billion records in 40 minutes
Adam Derewecki wrote a cool post about his experience loading half a billion records into MySQL. SingleStore is a MySQL-compatible drop-in replacement that is built from the ground up to run really fast in memory.

Disclaimer: this isn't an apples-to-apples comparison with derwik, since we don't have his dataset and we need a much beefier machine to load everything into memory. But I believe this experiment will get the point across. So without further ado, let's go through the steps.

Schema

Here is the schema for the table. Note that the table has three indexes; on top of those, SingleStore will automatically generate a primary index under the covers. We won't be disabling keys.

```sql
drop table if exists store_sales_fact;
create table store_sales_fact(
  date_key smallint not null,
  pos_transaction_number integer not null,
  sales_quantity smallint not null,
  sales_dollar_amount smallint not null,
  cost_dollar_amount smallint not null,
  gross_profit_dollar_amount smallint not null,
  transaction_type varchar(16) not null,
  transaction_time time not null,
  tender_type char(6) not null,
  product_description varchar(128) not null,
  sku_number char(12) not null,
  store_name char(10) not null,
  store_number smallint not null,
  store_city varchar(64) not null,
  store_state char(2) not null,
  store_region varchar(64) not null,
  key date_key(date_key),
  key store_region(store_region),
  key store_state(store_state)
);
```

Hardware

We have a very cool machine, with 64 cores and 512 GB of RAM, from Peak Hosting. You can rent one for yourself for a little under two grand a month; they were kind enough to give it to us to use for free. Here is the spec of one core:

```
vendor_id   : AuthenticAMD
cpu family  : 21
model       : 1
model name  : AMD Opteron(TM) Processor 6276
stepping    : 2
cpu MHz     : 2300.254
cache size  : 2048 KB
```

You read that correctly: this machine has sixty-four 2.3 GHz cores and 512 GB of RAM, almost 8 times the largest memory footprint available in the cloud today, all on dedicated hardware with no virtualization overhead or resource contention with unknown third parties.

Loading efficiently

Loading data efficiently is actually not trivial. The best way to do it with SingleStore is to use as much CPU as you can get. Here are a few tricks that can be applied.

1. Multi-inserts

We can batch inserts into 100-row multi-inserts. This reduces the number of network round trips: each round trip now accounts for 100 rows instead of one. Here is what a multi-insert looks like:

```sql
insert into store_sales_fact values
('1','1719','4','280','97','183','purchase','14:09:10','Cash','Brand #6 chicken noodle soup','SKU-#6','Store71','71','Lancaster','CA','West'),
('1','1719','4','280','97','183','purchase','14:09:10','Cash','Brand #5 golf clubs','SKU-#5','Store71','71','Lancaster','CA','West'),
('1','1719','4','280','97','183','purchase','14:09:10','Cash','Brand #4 brandy','SKU-#4','Store71','71','Lancaster','CA','West'),
```

2. Load in parallel

Our customer has a sample file of 510,593,334 records. We could use the command-line mysql client to pipe this file into SingleStore, but that would not leverage all the cores available in the system. So ideally we should split the file into at least as many chunks as there are CPUs (a sketch of one way to do the split follows).
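The post doesn't show the splitting step itself. Here is a minimal, hypothetical sketch of one way to do it, assuming the source file is named `data.sql` and holds one multi-insert statement per line; `split_file` is an illustrative helper, not part of the original tooling. It streams the file into equally sized chunks named to match the listing in the next step.

```python
# Hypothetical splitter (not shown in the original post): stream a file of
# one-statement-per-line multi-inserts into equally sized chunk files.
import os

def split_file(source, total_files, out_dir="data"):
    if not os.path.exists(out_dir):
        os.makedirs(out_dir)
    # First pass: count statements so every chunk ends up roughly equal.
    with open(source) as f:
        total_lines = sum(1 for _ in f)
    per_chunk = -(-total_lines // total_files)  # ceiling division
    # Second pass: write per_chunk statements into data0.sql, data1.sql, ...
    with open(source) as f:
        for i in range(total_files):
            with open(os.path.join(out_dir, "data%d.sql" % i), "w") as out:
                for _ in range(per_chunk):
                    line = f.readline()
                    if not line:
                        return
                    out.write(line)

split_file("data.sql", total_files=2000)  # 2,000 chunks, as in the next step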
3. Increase granularity

Splitting the file into just 64 big chunks would introduce data skew: the total load time becomes however long the slowest thread takes to finish its chunk. To address this problem we split the file into thousands of chunks and, every time a thread frees up, start loading another chunk. We went with 2,000 chunks:

```
-rw-r--r-- 1 35662844 2012-06-06 11:42 data10.sql
-rw-r--r-- 1 35651723 2012-06-06 11:42 data11.sql
-rw-r--r-- 1 35658433 2012-06-06 11:42 data12.sql
-rw-r--r-- 1 35663665 2012-06-06 11:42 data13.sql
-rw-r--r-- 1 35667480 2012-06-06 11:42 data14.sql
-rw-r--r-- 1 35659549 2012-06-06 11:42 data15.sql
-rw-r--r-- 1 35661617 2012-06-06 11:42 data16.sql
-rw-r--r-- 1 35650414 2012-06-06 11:42 data17.sql
-rw-r--r-- 1 35661625 2012-06-06 11:42 data18.sql
-rw-r--r-- 1 35667634 2012-06-06 11:42 data19.sql
-rw-r--r-- 1 35662989 2012-06-06 11:42 data1.sql
```

4. Load script

The load script uses the Python multiprocessing library to load data efficiently:

```python
import sys
import os
import multiprocessing
import optparse

parser = optparse.OptionParser()
parser.add_option("-D", "--database", help="database name")
parser.add_option("-P", "--port", help="port to connect. use 3306 to connect to memsql and 3307 to connect to mysql", type="int")
(options, args) = parser.parse_args()

if not options.database or not options.port:
    parser.print_help()
    sys.exit(1)

total_files = 2000

def load_file(filename):
    try:
        print("loading from cpu: %d" % os.getpid())
        query = 'mysql -h 127.0.0.1 -D %s -u root -P %d < %s' % (options.database, options.port, filename)
        print(query)
        os.system(query)
        print("done loading from cpu: %d" % os.getpid())
    except Exception as e:
        print(e)

# start from a clean table, then fan the chunks out across a worker pool
os.system('echo "delete from store_sales_fact" | mysql -h 127.0.0.1 -u root -P %d' % options.port)
p = multiprocessing.Pool(processes=2 * multiprocessing.cpu_count())
for j in range(0, total_files):
    p.apply_async(load_file, ['data/data%d.sql' % j])

p.close()
p.join()
```

5. Running it

I started loading by issuing the following command:

```
time python load.py -D test -P 3306
```

Once the load is running, let's start htop to check processor saturation. It looks pretty busy.
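After the load finishes, a quick sanity check (not part of the original post; the database name `test` and port 3306 mirror the load command above) is to confirm the table holds all 510,593,334 records from the sample file:

```python
# Hypothetical post-load sanity check: the count should come back as
# 510,593,334 if every chunk loaded successfully.
import os

os.system('echo "select count(*) from store_sales_fact"'
          ' | mysql -h 127.0.0.1 -D test -u root -P 3306')
```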
Read Now

Webinar
Anuvu Speed Tests MariaDB, Snowflake & SingleStoreDB
In this webinar, we'll discuss Anuvu's database journey with Senior Infrastructure Engineer Orlando Jimenez. We'll review the challenges Anuvu faced with their previous architecture, including MariaDB, MySQL, Elasticsearch and Snowflake, and the optimizations that led to 20X performance gains with SingleStoreDB.
Watch Now

Solution Brief
Supercharge your SaaS Applications
SingleStore powers fast modern applications for over 100 leading SaaS players and Tier-1 enterprises around the globe.
Read Now

Solution Brief
Data Warehouse Augmentation with SingleStore
SingleStoreDB Self-Managed surpasses the limits of data warehouses to drive 20-100x faster time-to-insight, powering low-latency analytics and applications.
Read Now

Solution Brief
SingleStoreDB Self-Managed with IBM
IBM provides a one-stop experience for the procurement, use, and management of SingleStoreDB Self-Managed. In addition, users will gain access to IBM’s global deployment capabilities and expert support, which includes client consultancy on individual database technologies and overall solution architectures.
Read Now