Ready for 2023? Up Your Game With These Updates to Our Laravel and Python Connectors
Engineering

Ready for 2023? Up Your Game With These Updates to Our Laravel and Python Connectors

We’re bringing you all-new Laravel and Python connectors in SingleStoreDB. Here’s what’s new — and how you can try them out.

Laravel Connector Ready for Production!

We shipped the 1.0 release of our native Laravel connector back in June, but we haven't stopped improving it since then! In total we have had 12 contributions from six contributors since 1.0. For the uninitiated, Laravel is a PHP framework for building modern web applications. It's used by some amazing companies you may have heard of, including Twitch, Disney, The New York Times and Fathom Analytics. Laravel is focused on developer productivity and generally getting out of your way to let you build applications quickly. The official SingleStoreDB Laravel Connector extends Laravel's built-in ORM (Eloquent) to support SingleStoreDB-specific features, including shard keys and JSON. This allows you to do something like the following, which wouldn't work out of the box in Laravel:

```php
Schema::create('events', function (Blueprint $table) {
    $table->string('name')->unique()->shardKey();
    $table->json('properties');
    $table->datetime('created_at')->sortKey()->seriesTimestamp();
});
```

These functions are all extensions our connector adds to Eloquent to support the similarly named features in SingleStoreDB (a rough sketch of the equivalent raw SQL appears at the end of this excerpt). You can install our Laravel connector via Composer:

```bash
composer require singlestoredb/singlestoredb-laravel
```

You can also find the driver and its documentation on GitHub.

See more:
- Laravel and SingleStoreDB Quickstart Guide
- SingleStoreDB for Laravel Masterclass With Jack Ellis
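For readers who think in raw SQL, the migration in the excerpt above corresponds roughly to SingleStoreDB DDL like the following. This is an approximation for illustration only, not the exact statement the connector generates:

```sql
-- Approximate equivalent of the Laravel migration above (illustrative only;
-- the connector may emit slightly different types or key definitions).
CREATE TABLE events (
    name       VARCHAR(255) NOT NULL,
    properties JSON,
    created_at DATETIME(6) SERIES TIMESTAMP,
    SHARD KEY (name),
    SORT KEY (created_at),
    UNIQUE KEY (name)
);
```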
Read Post
[r]evolution Summer 2022: Wasm Space Program
Engineering

[r]evolution Summer 2022: Wasm Space Program

For some time now I have wanted to build a universe simulation. The idea of millions of spaceships each making their own real-time decisions in a massive universe sounds compelling. This idea isn't new — many video games and AI competitions have explored the same topic. But to my knowledge, none of them have tried to run the entire simulation within a database. So, I made this my goal and started building. To complete this project, I would need a unique database. To start with, it needs to support large volumes of transactions and analytics at the same time on the same data. This is known in the database community as "HTAP" (Hybrid transactional/analytical processing). Next, it must be able to handle a highly concurrent read and write workload to support thousands of clients. Finally, the database needs to be extensible with custom logic to run the custom AI powering each spaceship. Luckily for me, SingleStoreDB 7.9 satisfies all of these requirements and more. A bit of elbow grease and some long nights later, I put the finishing touches on what I call the Wasm Space Program. Simulation Walkthrough
Read Post
Databases and DevOps: How to Use SingleStore With GitHub Actions
Engineering

Databases and DevOps: How to Use SingleStore With GitHub Actions

It's very important to test your application with the same database that you run in production. In this blog post, we will explain how to set up SingleStoreDB to run on your machine and within your GitHub Actions workflows. This post is for developers who want to learn best practices for integrating databases into their development and deployment process.

What Are Technical Best Practices for Databases and DevOps?

First of all, let’s cover best practices for databases and DevOps:

- Test! This is a no-brainer. You should be testing your database schema and queries every time you make changes. If you don't test your application thoroughly, you can't be confident that the components housing your data won't compromise or lose it. I have seen databases neglected when it comes to testing — and often, it comes down to the job of a single developer who manually tests and deploys each build. It doesn’t need to be like that… Test!

- Developers need a way to easily create local databases. Right off the bat, it needs to be easy for everyone on the team to set up databases locally, in a cloud sandbox environment, or both! Here’s where containers come to the rescue. Containers are a good fit: they’re easy and cheap to set up, and most importantly, if something goes wrong you can throw everything out and start over again. Your team needs to be able to develop in a non-shared environment to ensure everything is working correctly.

- The database schema — including all indexes — needs to be in source control. If developers need to create local builds of the database, that also means that all components that shape the database or control how it performs business logic need to be maintained using source control. Maintaining these changes can be simplified by making sure all changes are performed using migrations (see the sketch after this list).

- Practice in a production-like environment. Everyone on the team should be able to develop and test out database code in a production-like database environment before pushing out changes. Trust me, you would rather have one of your developers topple a staging environment than the production environment. This environment should also be simple to take down and set up again. You need to test a change before applying it to a production environment. If the table data is huge — so huge that it would be costly to replicate it in a different environment from production — make sure you can at least simulate the change with a significant set of data. This will help ensure the change won’t take forever, and that you won’t be blocking a table for a long period of time.

- Be sure to monitor database systems for performance and compliance. Did you know you can automate this? Like any good CI/CD pipeline, all the important business logic should be thoroughly tested and automated. This ensures that any changes you make to your database environment won’t break the build, your users' trust or the law. Be sure that you are taking into account regional differences and regulatory requirements.

- Microservices are a good way to decouple the database. The only way other microservices should interact with the data is by using the methods the service exposes, rather than going directly to the database — even if it’s possible and “easier” to do it that way. This can make testing and deploying dangerous database changes (like alters) easier, since only one service needs to be directly tested against the database.
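As a small illustration of the migrations point above, here is a hedged sketch of what a schema change checked into source control might look like. The file path, table and index names are hypothetical, and the exact format depends on your migration tool:

```sql
-- migrations/2023_01_10_add_index_on_events_created_at.sql (hypothetical path)
-- Forward migration: add an index so time-range queries stay fast.
ALTER TABLE events ADD INDEX idx_events_created_at (created_at);

-- Matching rollback, kept alongside the forward step:
-- ALTER TABLE events DROP INDEX idx_events_created_at;
```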
Let’s get into the code

Now that we've discussed why you should integrate databases into your development and deployment process, let's talk about how to do it with SingleStoreDB.

Running SingleStoreDB locally

SingleStoreDB can be run locally using Docker or any other Docker-compatible container platform. For local development, we recommend using the singlestoredb-dev-image, which you can find on GitHub here. To use this image, just follow these steps:

1. Get a license key for SingleStoreDB. You can sign up for a license here and then get your license key from the Customer Portal.
2. Define environment variables for your license key and the password you want to use to log in to SingleStore:

```bash
SINGLESTORE_LICENSE="YOUR LICENSE KEY"
SINGLESTORE_PASSWORD="PASSWORD"
```

3. Run the Docker container:

```bash
docker run \
  -d --name singlestoredb-dev \
  -e SINGLESTORE_LICENSE="${SINGLESTORE_LICENSE}" \
  -e ROOT_PASSWORD="${SINGLESTORE_PASSWORD}" \
  --platform linux/amd64 \
  -p 3306:3306 -p 8080:8080 -p 9000:9000 \
  ghcr.io/singlestore-labs/singlestoredb-dev:latest
```

Assuming nothing went wrong, SingleStoreDB will now be running on your machine. You can check the status of the container by running:

```bash
docker ps --filter name=singlestoredb-dev
```

You can check the logs of the container by running:

```bash
docker logs singlestoredb-dev
```

To learn how to query SingleStoreDB and access the included SingleStore Studio UI, please read the documentation here.

Running SingleStoreDB in GitHub Actions

After setting up SingleStoreDB locally and getting it working with your application, it's time to also get it running in GitHub Actions to ensure that you are building, testing and deploying with the same database technology throughout. Using the same Docker image we used above, we can easily run it in GitHub Actions via these steps:

1. Get a license key for SingleStoreDB. You can sign up for a license here and get your license key from the Customer Portal. If you already have a license key for local development, you can re-use it here.
2. Next, create an encrypted secret in GitHub Actions called SINGLESTORE_LICENSE which contains your license key as the value.
3. Finally, add a service container definition to your GitHub Actions workflow.

As an example, here is a simple GitHub Actions workflow which runs SingleStore and then runs a simple query against it (SELECT 1):

```yaml
name: my-workflow
on: push

jobs:
  my-job:
    runs-on: ubuntu-latest
    needs: build-image
    services:
      singlestoredb:
        image: ghcr.io/singlestore-labs/singlestoredb-dev
        ports:
          - 3306:3306
          - 8080:8080
          - 9000:9000
        env:
          ROOT_PASSWORD: test
          SINGLESTORE_LICENSE: ${{ secrets.SINGLESTORE_LICENSE }}
    steps:
      - name: sanity check using mysql client
        run: |
          mysql -u root -ptest -e "SELECT 1" -h 127.0.0.1
```

And that is it! Now you can relax and commit your code and migrations as your CI/CD workflows are carried out automatically.

Resources:
- SingleStoreDB Dev Image
- Quickstart for GitHub Actions
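Once the service container is healthy, a typical next step in the same workflow is to create a scratch database and schema for the test suite. Here is a minimal sketch; the database and table names are hypothetical and not part of the original post:

```sql
-- Hypothetical CI bootstrap, run against the service container before the
-- test suite (all names here are illustrative).
CREATE DATABASE IF NOT EXISTS app_test;
USE app_test;

CREATE TABLE IF NOT EXISTS users (
    id    BIGINT AUTO_INCREMENT PRIMARY KEY,
    email VARCHAR(255) NOT NULL
);

-- Smoke test: the table exists and accepts writes.
INSERT INTO users (email) VALUES ('ci@example.com');
SELECT COUNT(*) AS user_count FROM users;
```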
Read Post
An Engineer's Guide to Building a Database for Data-Intensive Applications
Engineering

An Engineer's Guide to Building a Database for Data-Intensive Applications

As a developer you spend a lot of time making choices. Do you use React or Vue for your web app? Will your algorithm run faster if you use a hash table? Developers are acutely aware that these kinds of decisions are often not obvious or intuitive. In the case of React vs. Vue, you might spend some time reading docs or looking at open-source examples to get a feeling for each project. When deciding between data structures you will probably start muttering about "Big O" to the confusion of non-programmers around you. What do these choices all have in common? They depend on another set of tradeoffs made by the programmers at the next layer down the stack. The application developer depends on the framework engineer, the framework engineer on language designers, language designers on systems programmers, systems programmers on CPU architects... it's turtles all the way down (unless you are an electrical engineer, although perhaps silicon miners might even argue that point). One of the decisions that some developers have to make is which database to use. I don't envy someone in this position: you have hundreds of options and a lot of FUD (fear, uncertainty, and doubt) to dig through. It's no wonder that many of my friends simply pick what they have used before. What's that old idiom…? "Better the devil you know than the devil you don't." This blog post is for those of you who have to choose a database. I'm not here to convince you that SingleStore is the best database; I am going to explain some of the key tradeoffs we had to make, and what we ultimately decided to do. Enjoy!

TL;DR: Here is a quick and extremely concise summary for people who are already familiar with these problem spaces:

| Design decision | SingleStore's choices |
| --- | --- |
| Horizontal vs. vertical scalability | Horizontal scalability by partitioning data: hash(keys...) % num_partitions |
| Column-oriented vs. row-oriented storage | Universal storage: column-oriented LSM tree with OLTP optimizations. OLTP storage: row-oriented, in-memory storage engine |
| Physical storage choices | Column-oriented storage: hybrid LSM tree. Row-oriented storage: in-memory, lock-free skip list |
| How to protect from data loss? | Replication: high-speed synchronous commit. Unlimited storage: data can be tiered to blob storage. Incremental backup and restore |
| How to make queries go fast? | LLVM-based query accelerator. Automatic statistics collection for query planning. Interpreted execution during compilation, hot-swapped to a compiled plan during execution. Vectorized SIMD execution |

Tradeoff #1: Horizontal vs. vertical scalability

As applications require more data, it's getting harder to fit everything into single-server databases. SingleStore's goal is to support data-intensive applications which combine a continuously growing data footprint with the need to support many different kinds of workloads. This ultimately requires the ability to scale the database out into a cluster of servers, which drastically increases the available CPU, RAM and disk at the expense of coordination, network overhead and classic distributed-systems problems. Historically, some large data-intensive companies have tried to solve this problem by manually splitting their data across many single-server databases such as MySQL or Postgres. For example, putting each user along with all of their content on a single MySQL server. Assuming that an individual user's content never grows too large, this strategy works well for a portion of their transactional workload.
But what happens when you want to ask a question about all of the users, or look up the friends of a particular user? Eventually, companies with this architecture are forced to build custom "query proxies" which know how to split up and aggregate the results of these queries from multiple single-server databases. You can read more about the complexities of this architecture in this great post by Michael Rys. SingleStore has been designed to handle data-intensive applications from the very beginning. At the surface level, we look a lot like the final result of the architecture outlined above: a bunch of individual servers each storing a portion of the data and an intelligent query proxy that knows how to split up and aggregate the results to answer questions. But with anything this complex, the devil is in the details. Since SingleStore clusters natively know how to work together as a whole, we are able to move data during query execution in a highly efficient manner. As an example, here is how a distributed join might be executed in SingleStore.
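To make that concrete, here is a small, hypothetical example of how shard keys shape a distributed join in SingleStore. The schema is made up for illustration; the point is the one above, that co-locating rows which join on the same key lets each partition do its share of the work locally:

```sql
-- Hypothetical schema: both tables are sharded on user_id, so the join below
-- can be evaluated locally on each partition with no cross-node data movement.
-- If the shard keys differed, the engine would have to broadcast or
-- repartition one side of the join at query time.
CREATE TABLE users (
    user_id BIGINT NOT NULL,
    name    TEXT,
    SHARD KEY (user_id)
);

CREATE TABLE posts (
    post_id BIGINT NOT NULL,
    user_id BIGINT NOT NULL,
    body    TEXT,
    SHARD KEY (user_id)
);

-- "A question about all of the users": fanned out to every partition,
-- then aggregated by the query proxy (aggregator) node.
SELECT u.name, COUNT(*) AS post_count
FROM users u
JOIN posts p ON p.user_id = u.user_id
GROUP BY u.name;
```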
Read Post
Scaling Worldwide Parcel Logistics with SingleStore and Vectorized
Engineering

Scaling Worldwide Parcel Logistics with SingleStore and Vectorized

Learn how SingleStore and Redpanda can work together to solve the operational complexity of global logistics. In this blog post we present a reference architecture using SingleStore’s relational database and Redpanda’s streaming platform in combination to scale worldwide parcel shipping to never-before-seen volumes.

Scaling worldwide parcel logistics with SingleStore and Vectorized

Today, let's talk about how SingleStore and Redpanda can work together to solve the operational complexity of global logistics while handling 100x the number of packages delivered annually. SingleStore is a scale-out relational database built for data-intensive workloads. Redpanda is a Kafka API-compatible streaming platform for mission-critical workloads created by the team at Vectorized. In this blog post we present a reference architecture using these two systems to scale worldwide parcel shipping to never-before-seen volumes. Conveniently, logistics simulation has always been on my bucket list of things to build, so when the opportunity arose to create something with Redpanda and SingleStore it was an easy choice to make. Before we get into the reference architecture, let's talk about some of the challenges logistics companies face every day. On October 12th, 2020, the Pitney Bowes Parcel Shipping Index reported that 103 billion packages were delivered in 2019 alone. That's an average of over 282 million packages delivered each day, or about 3,200 packages per second, and we aren't slowing down! The same report forecasts those numbers will increase by 2-3x by 2026. These numbers get even more extreme when you consider peak load rather than average volume. For example, Amazon just announced that during the 48 hours of Prime Day 2021 over 250 million items were purchased. That's a peak rate of up to 5 million packages shipped per hour (roughly 1,400 packages per second), which on its own is nearly half of the global average rate. It's clear from these statistics that we need logistics systems that can scale to unprecedented peak activity in order to meet the growing demand for e-commerce worldwide. Scale isn't the only challenge in architecting a system for global logistics; complexity is another problem. As an example, let's consider one possible package life cycle.
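As a taste of what the database side of the reference architecture can look like, here is a hedged sketch of a SingleStore pipeline ingesting package-scan events from a Redpanda topic over the Kafka protocol. The broker address, topic and schema are assumptions made for illustration, not details from the post:

```sql
-- Sketch only: broker, topic and column names are hypothetical.
CREATE TABLE package_scans (
    package_id BIGINT NOT NULL,
    scanned_at DATETIME(6) NOT NULL,
    location   TEXT,
    status     TEXT,
    SHARD KEY (package_id),
    SORT KEY (scanned_at)
);

-- SingleStore Pipelines speak the Kafka protocol, which Redpanda implements,
-- so the stream of scan events can be loaded continuously into the table.
CREATE PIPELINE package_scans_pipeline AS
LOAD DATA KAFKA 'redpanda-broker:9092/package-scans'
INTO TABLE package_scans
FORMAT JSON (
    package_id <- package_id,
    scanned_at <- scanned_at,
    location   <- location,
    status     <- status
);

START PIPELINE package_scans_pipeline;
```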
Read Post
Enabling Coronavirus Research with SingleStore and SafeGraph
Case Studies

Enabling Coronavirus Research with SingleStore and SafeGraph

As part of the #singlestore4good initiative, hosted at SingleStore.org, SingleStore is excited to contribute to the SafeGraph COVID-19 Data Consortium. The Consortium provides free access to relevant datasets to researchers, non-profits, and governments. SingleStore has created a repository to help existing customers, and those who wish to get started with SingleStore for free, use our software to process coronavirus-related data. SafeGraph is a company that offers point of interest (POI), business listing, and foot traffic data. They have started the COVID-19 Data Consortium to enable access to free data for responses to the worldwide coronavirus crisis. SingleStore joins more than 1,000 organizations contributing to the data consortium, including the US Centers for Disease Control, the California Governor’s Office, and Johns Hopkins Hospital. It’s easy to get started with SafeGraph’s COVID-19 datasets today and to gain the benefits of speed, scalability, and SQL with SingleStore, for free. As CDC director Robert Redfield told the US Senate health committee, “There are a number of counties that are still doing this pen and pencil.” At SingleStore, we encourage data-led approaches to the coronavirus crisis.
Read Post
Introducing the SingleStore Kubernetes Operator
Product

Introducing the SingleStore Kubernetes Operator

Kubernetes has taken the world by storm, transforming how applications are developed, deployed, and maintained. For a time, managing stateful services with Kubernetes was difficult, but that has changed dramatically with recent innovations in the community. Building on that work, SingleStore is pleased to announce the availability of our SingleStore Kubernetes Operator, and our certification by Red Hat to run on the popular OpenShift container management platform. Kubernetes has quickly become one of the top three most-loved platforms by developers. Now, with the SingleStore Kubernetes Operator, technology professionals have an easy way to deploy and manage an enterprise-grade operational database with just a few commands. Note: The SingleStore Kubernetes operator is currently experimental, and in beta. It will reach general availability in the coming months. The new beta Operator is certified by Red Hat to run SingleStore software on Red Hat OpenShift, or you can run it with any Kubernetes distribution you choose. Running SingleStore on Kubernetes gives data professionals the highest level of deployment flexibility across hybrid, multi-cloud, or on-premises environments. As Julio Tapia, director of the Cloud Platforms Partners Ecosystem for Red Hat, put it in our press release, services in a Kubernetes-native infrastructure “‘just work’ across any cloud where Kubernetes runs.” As a cloud-native database, SingleStore is a natural fit for Kubernetes. SingleStore is a fully distributed database, deploys and scales instantly, and is configured quickly and easily using the native SingleStore API. SingleStore customers have requested the Kubernetes Operator, and several participated in testing prior to this release. The majority of SingleStore customers today deploy SingleStore on one or more public cloud providers. Now, with the Kubernetes Operator, they can deploy on any public or private infrastructure more easily.
Read Post
Psyduck: The SingleStore Journey to Containers
Engineering

Psyduck: The SingleStore Journey to Containers

One of the main themes at DockerCon 2017 was the challenge of migrating legacy applications to containers. At SingleStore, we’re early adopters. We are already into our third year of running Docker at scale in production for our distributed software testing regime, where the performance, isolation, and cost benefits of containers are very attractive.

The Challenge

Before I take you through our journey to containers, let me start by outlining some of the general challenges of testing a distributed database like SingleStore. Our solutions are built for real-time data warehousing. Databases are really difficult to test — especially when they are designed to be distributed, real-time, and scalable. At SingleStore, we have millions of unique queries to test, highly variable runtimes, and tests that take hours at 100 percent CPU usage. We have over 10,000 unique tests, not to mention the test transformations, which may multiply that number by one hundred again. Our tests also require gigabytes of RAM and multiple cores. That’s the kind of scale you have to think about for testing a platform like SingleStore. We also ran into some interesting new testing challenges because our product can take advantage of Intel’s AVX technology and vectorization to speed up SQL queries by orders of magnitude, processing more values per cycle. These modern architectures bring awesome advantages, but they also add new testing scenarios. We started with off-the-shelf tooling, but once you see all the things we’re doing, it's clear you can’t just throw this workload onto a common test platform. Commercial testing solutions are not designed for distributed applications. We paid our dues with a lot of experimentation and DIY.

Creating Psyduck

We started our first test cluster about five years ago, when I first arrived at SingleStore. We named it Psyduck after the Pokémon character. Like testing, Psyduck always has a headache. For me, it was a really fun effort to be a part of the test program because it’s key to how we continue to evolve SingleStore while maintaining its reliability. Initially we had a mixture of home-grown, bare-metal boxes in the office. There were 25 boxes that were basically Dell desktop machines, manually managed. Additionally, we had a bare-bones Amazon EC2 presence for bursty job requirements. That was our initial scaling strategy: manually managed VMs to take on additional load. From there we looked at operationalizing the whole stack. First we invested in operationalizing VMs on bare metal. We took the 25 machines, cleared them, and built an OpenStack cluster. We scaled that up to about 60 machines on-premises. This allowed us to eliminate EC2, which saved us a huge amount of money every month. But as we scaled that cluster we experienced a lot of pain and complexity with OpenStack. So we took a portion of the cluster and ran Eucalyptus instead. That ended up being interesting, but not very mature compared to OpenStack, and by that point we were a little burned out on VM-based infrastructure. We learned about Docker about three and a half years ago, when the company behind it was still called dotCloud. We tested it out and prototyped what Psyduck could look like with containers. Containers matched our use case really well — software testing is basically a bunch of short-lived, ephemeral jobs, and you need a way to run hundreds of thousands of tests per day in an isolated environment. Docker gave us the ability to do that and spin up containers in seconds (rather than the minutes of overhead that VMs required).
Basically we saw that Docker would give us a way to scale Psyduck in an operationally friendly way. From there, we took the plunge and over a couple of weeks we rebuilt Psyduck from the ground up using Docker. And we’ve been running on that base architecture ever since. For the last three years, we’ve been running with Docker and a home-grown scheduling system we built, because at that time Kubernetes and Apache Mesos didn’t exist.

Creating Pokedex

We also wrote our own key-value storage system that we call Pokedex — think bare-bones S3, bare-bones HDFS. We took the on-premises version of the Docker registry and wrote an adapter for Pokedex to provide the parallelism we required during image distribution. We have over 150 test machines: physical Dell servers, each running many containers. We have to deliver 5 GB of data to every machine per test run, so there’s tons of data to send around all at once. So Pokedex runs on a number of machines and takes advantage of leading performance optimizations for delivering files at scale. Pokedex backing the registry allowed us to deliver Docker images on the order of minutes, whereas with the old VM architecture we had to build out an arcane system based on torrents and other crazy technologies to deliver large VM image files. We’re also running appliances from a startup called Diamanti to help run the Psyduck control plane. Today, Diamanti makes use of Kubernetes, and over time we plan to expand the use of Kubernetes across our entire cluster. We expect Kubernetes to be a much better fit for orchestrating our container environment than our initial home-grown scheduler.

What We Learned

We’re very happy with the outcome of this journey. The container abstraction is solid for performance, isolation, ephemerality — everything that matters in software unit testing. We don’t care about persistence: the test runners come up as containers, do a bunch of work, write out to a distributed file system, and disappear. This abstraction allows us to not worry about what happens to containers, only that they complete their job and then go away. On the downside, there's the DIY nature of today's tooling for a lot of container production scenarios. In the early days, we hit issues with isolation across networking and storage. Over time, those things have become better and we have learned how to deal with Docker to make sure we eliminate noisy neighbors on the machines. A fundamental challenge that remains with containers is related to the Linux namespace abstraction — if you run a container as privileged, that container can do things on a machine that affect every other container. We want to be able to give our engineers the ability to do what they want with our machines, but with containers we have to remove some of that functionality. With VMs they could do anything — detach or add network devices. With VMs the test is completely isolated and we have very strong controls. With Docker, it’s the opposite — you have to tell engineers what they can and can’t do. Linux namespaces are gaining better semantics over time, but it’s still a challenge. It’s also tricky to debug and test software running in containers. Due to PID mapping, normal Linux tools like perf or GDB don’t know how to work properly with containerized software. This results in complex or impossible debugging scenarios that can make engineers very frustrated. In the end, Psyduck helped us alleviate most of the headaches around testing software at our scale.
Running containers on bare metal, we can afford to put our own products through the most demanding testing regime in our industry. After all, we’re the real-time data warehouse for Akamai, Kellogg, Pinterest, Uber, and other leading enterprises. Our customers run some of the largest distributed systems in the world. They know they can rely on us to keep their own customers happy. For more technical information, see our previous blog on Psyduck, or learn more from Matt Asay and this YouTube video.
Read Post
Run SingleStore in Minutes with Docker
Engineering

Run SingleStore in Minutes with Docker

Evaluating software infrastructure is important, but it should not be difficult. You should be able to try a piece of core software and see quickly whether it suits your needs. This is one of the many helpful use cases for Docker. Of course Docker has many more uses, including helping run a 107-node cluster with CoreOS, but this post focuses on the quick-start scenario. With an install of boot2docker.io for Mac or Windows, and a pre-configured 'cluster-in-a-box' Docker container, you can be on your way to interacting with a distributed system like SingleStore in a few minutes. If you are ready to jump in, head to our Quick Start with Docker documentation. In a nutshell, we have built a 'quickstart' container that comes installed with SingleStore Ops for management, a single-node SingleStore cluster, and some sample programs referenced in tutorials. Obviously, this is not the configuration to test-drive maximum performance. If that were the case, you would want to take advantage of the distributed architecture of SingleStore across several nodes. But it is a great way to get a sense of working with SingleStore, connecting with a MySQL client, and experiencing the database first hand. If you already have Docker installed, you can jump right in with a few simple commands.

Spin up a cluster:

```bash
docker pull memsql/quickstart
docker run --rm --net=host memsql/quickstart check-system
docker run -d -p 3306:3306 -p 9000:9000 --name=memsql memsql/quickstart
```

At this point you can create a database and interact with the 'cluster-in-a-box' install of SingleStore. For example, you can run a quick benchmark against SingleStore with the following Docker command:

```bash
docker run --rm -it --link=memsql:memsql memsql/quickstart simple-benchmark
```

For more information on working with SingleStore and the 'cluster-in-a-box' Docker container, visit our documentation at docs.singlestore.com/latest/setup/docker. We’ve also made the Dockerfile available on GitHub at github.com/memsql/memsql-docker-quickstart. And if you would like to try SingleStore in full, please visit singlestore.com/free. There we have a free, unlimited scale and capacity Community Edition and a free 30-day Enterprise Trial.
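To make the "create a database and interact" step concrete, here is a minimal SQL session you might run after connecting with any MySQL-compatible client. The database and table names are made up for illustration and are not part of the quickstart itself:

```sql
-- Illustrative session only; names are hypothetical.
CREATE DATABASE quickstart_demo;
USE quickstart_demo;

CREATE TABLE greetings (
    id      BIGINT AUTO_INCREMENT PRIMARY KEY,
    message VARCHAR(100)
);

INSERT INTO greetings (message) VALUES ('hello from the cluster-in-a-box');
SELECT * FROM greetings;
```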
Read Post
Running SingleStore’s 107 Node Test Infrastructure on CoreOS
Engineering

Running SingleStore’s 107 Node Test Infrastructure on CoreOS

At a recent CoreOS meetup, I gave a presentation on the design and implementation of Psyduck, the name for SingleStore’s test infrastructure. In this talk, I explain how SingleStore runs over 60,000 tests every day on our 107-machine cluster running CoreOS.
Read Post