How to Migrate From PostgreSQL to SingleStoreDB


Sarung Tripathi

Global Head of Pre-Sales


The global Postgres community boasts tens of thousands of users and several thousand enterprises. That must say something good about the technology, right? 

Postgres has become very attractive to developers, thanks to benefits like its strong support for transactions, native JSON handling and ecosystem extensibility. The database is also widely adopted, since it is open source and available as a managed option on popular services like AWS RDS. Well-known applications including Instagram, FlightAware and Reddit run on Postgres.

Postgres developers looooooooove Postgres, so much so that it’s the #4 database on TrustRadius and on the DB-Engines ranking!

So, What’s the Catch?

Today’s data-intensive world has introduced far more complex customer demands than ever before. SaaS providers have gone from simply providing a service to providing data as a service, even if that was never part of their original business plan. For example, take Strava: once an app to help athletes track their runs and bike rides, Strava now offers a value-add analytics product that compares today’s workout to historical ones (and it can charge for it, too!).

As these demands for in-app analytics grow, developers managing these services have become hard-pressed to scale Postgres (and other databases, like MySQL). Analytics over trillions of records become painfully slow on single-node systems that were never built for large-scale aggregations, window functions and filters under tight SLAs.

Several providers, CitusDB among them, saw an opportunity to scale Postgres into a distributed SQL database. Users could then distribute their database across several machines and partition their data cleanly. It turns out this approach works fine if all you need is more compute for transactional queries, but the band-aid still falls off when it comes to complex analytics. Not to mention, there have been several anecdotal reports of reliability issues with distributed Postgres.

The Good News? There’s Another Way

As discussed so far, the demands of data-intensive applications are centered around support for transactions and analytics. SingleStoreDB is the only database that supports both transactions and analytics, while still maintaining all of the great features of open source databases like Postgres! SingleStoreDB is a multi-model, real-time distributed SQL database with strong JSON support, ecosystem integration through our MySQL wire compatibility and robust support for analytics with our patented Universal Storage — and SingleStoreDB is the #1 database on TrustRadius.

Many organizations have migrated their applications from Postgres to SingleStoreDB. The majority of these migrations were completed in weeks — some in just days. Here’s a great example:

Foodics is a leader in the restaurant and point-of-sale industry. One huge differentiator for Foodics’ business is their ability to provide advanced analytics on inventory, menus and overall restaurant operations. As Foodics added more analytics features to their offering, they experienced database-level challenges including: 

  • Ongoing service instability 
  • Constant re-balancing of data 
  • Low concurrency that only supported 200 users 

Foodics came to SingleStore looking to improve their analytics performance on transactional data, and engineers from both teams collaborated on a two-week sprint to change their database destiny. Some of the tests included loading 10 billion rows, using complex queries with wide date ranges and leveraging SingleStoreDB’s dbbench to simulate large concurrency loads. Data loads via S3 were seamless thanks to SingleStoreDB Pipelines. After experimenting with a few different shard keys and sort keys, Foodics saw fantastic results:

  • A performant analytics engine (with Columnstore) to democratize data access
  • High concurrency to support a large number of reports being generated simultaneously 
  • A fully managed database with low TCO that freed up engineering teams 

Watch the webinar: Supercharging SaaS Applications: The Foodics Story 

So How Do I Do It?

Bulk Data Load

In this example, we’ll take a table we’ve created in AWS RDS for PostgreSQL and migrate it to SingleStoreDB.

AWS RDS Postgres Table: 

CREATE TABLE scan (
scan_id BIGINT NOT NULL,
package_id BIGINT NOT NULL,
loc_id INT DEFAULT NULL,
loc_name CHAR(5) DEFAULT NULL,
PRIMARY KEY (package_id, scan_id)
);

Scan table in Postgres:

select count(*) from scan;
7340032

Simple export of the data as a CSV. On RDS for PostgreSQL, the aws_s3 extension exports query results directly to S3 (the function reports the rows, files and bytes uploaded):

CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;

SELECT * FROM aws_s3.query_export_to_s3(
'SELECT * FROM scan',
aws_commons.create_s3_uri('data-bucket-pb', 'Jan13/scan.csv', 'us-east-1'),
options := 'format csv'
);

Create a database and a table. Note the addition of a SHARD KEY and a COLUMNSTORE KEY in the SingleStore DDL. These enable optimal distribution and organization of the data to ensure lightning-fast queries. The SingleStoreDB documentation offers advice on how to select these keys for your tables:

create database mem_0113;
use mem_0113;

create table scan (
scan_id BIGINT NOT NULL,
package_id BIGINT NOT NULL,
loc_id INT,
loc_name CHAR(5),
KEY (scan_id) USING CLUSTERED COLUMNSTORE,
SHARD KEY (package_id)
);
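To see what these keys buy you, here is a hypothetical analytic query of the kind SingleStoreDB accelerates: the columnstore sort key on scan_id lets the engine prune segments for the range filter, while the shard key on package_id spreads the scan (and the aggregation work) evenly across partitions. The filter values here are illustrative only:

-- Hypothetical example: segment pruning on the scan_id sort key,
-- with the aggregation fanned out across all partitions
SELECT loc_name, COUNT(*) AS scans
FROM scan
WHERE scan_id BETWEEN 1000000 AND 2000000
GROUP BY loc_name
ORDER BY scans DESC;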

Create a SingleStore Pipeline to get data from S3. Pipelines are a super simple way to ingest data from several kinds of external sources:

CREATE PIPELINE pipe1_scan
AS LOAD DATA S3 'data-bucket-pb/Jan13/scan.csv'
CONFIG '{REDACTED}'
CREDENTIALS '{REDACTED}'
INTO TABLE mem_0113.scan
FORMAT CSV FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(scan_id, package_id, loc_id, loc_name);
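Before starting the pipeline, you can optionally dry-run it. TEST PIPELINE pulls a sample of data through the extract and transform steps without writing anything to the table, which is a quick way to catch CSV formatting mistakes:

TEST PIPELINE pipe1_scan;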

Start SingleStore Pipeline:

START PIPELINE pipe1_scan;

Check the table for records, and we have the same number of rows as we did in Postgres:

select count(*) from scan; --7340032
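Pipelines run asynchronously, so while the load is underway you can watch batch progress and surface any load errors through information_schema:

-- Check batch status and any rows that failed to load
SELECT * FROM information_schema.PIPELINES_BATCHES_SUMMARY;
SELECT * FROM information_schema.PIPELINES_ERRORS;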

SingleStoreDB Replicate Tool

SingleStoreDB offers lightweight migration tooling for the bulk data load needed in an initial migration. The same tool can also run incremental CDC after the initial load. Together, these two capabilities let users test their workload on SingleStoreDB, then perform a zero-downtime cutover when moving to production. Let’s look at another example of a table in RDS Postgres, starting with the bulk data load:

AWS RDS Postgres Table:

CREATE TABLE scan (
scan_id BIGINT NOT NULL,
package_id BIGINT NOT NULL,
loc_id INT DEFAULT NULL,
loc_name CHAR(5) DEFAULT NULL,
PRIMARY KEY (package_id, scan_id)
);

select count(*) from scan;
7340032

The scan table includes 7.3 million records.

Configuration Files:

To configure connectivity between RDS Postgres and SingleStoreDB, we simply populate two configuration files pointing to the respective databases. Here is the source Postgres config file (YAML):

type: POSTGRESQL

host: demo.cynodgz9a7ys.us-east-1.rds.amazonaws.com
port: 5432

database: my_pg_db
username: <redacted>
password: <redacted>

max-connections: 30
max-retries: 10
retry-wait-duration-ms: 1000

replication-slots:
  io_replicate:
    - wal2json
  io_replicate1:
    - wal2json
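Note the replication-slots section: it assumes logical replication slots named io_replicate and io_replicate1 already exist on the source, using the wal2json output plugin (on RDS, this also requires rds.logical_replication = 1 in the parameter group). As a sketch, the slots can be created like so:

-- Create the logical replication slots referenced in the config,
-- using the wal2json output plugin
SELECT * FROM pg_create_logical_replication_slot('io_replicate', 'wal2json');
SELECT * FROM pg_create_logical_replication_slot('io_replicate1', 'wal2json');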

And here is the target SingleStoreDB config file:

type: SINGLESTORE

host: svc-1732741a-f499-467c-a722-9887d73150c1-ddl.aws-virginia-2.svc.singlestore.com
port: 3306

username: <redacted>
password: <redacted>

#credential-store:
# type: PKCS12
# path: #Location of key-store
# key-prefix: "memsql_"
# password: #If password to key-store is not provided then default password will be used

max-connections: 30

max-retries: 10
retry-wait-duration-ms: 1000

Execute Replicate Command:

Now, we execute the replicate command in snapshot mode, based on the configuration files previously populated. Snapshot mode performs the one-time bulk load:

./bin/replicant snapshot conf/conn/postgres.yaml conf/conn/singlestore.yaml
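For the incremental CDC scenario mentioned earlier, the tool also supports a continuous mode. As a sketch with the same config files, full mode performs the initial snapshot and then streams ongoing changes, which is what enables the zero-downtime cutover:

./bin/replicant full conf/conn/postgres.yaml conf/conn/singlestore.yaml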

Verify that databases and tables exist in SingleStoreDB:

select count(*) from scan; --7340032

Summary

As you can see, there are a few different ways to easily migrate your existing Postgres database to SingleStoreDB. This simple migration will elevate your database from single-node and slow to distributed and lightning fast. Ingest data more easily, make your queries faster and improve support for concurrency with one of these migration options today.

SingleStore Helios offers $600 in free credits to get started with just a few clicks. The SingleStore Helios engineering team that contributed to this blog is always standing by to assist you with your Postgres migration, or any subsequent questions you may have about the platform.

Schedule your time to chat with SingleStore engineers today.

