Technical Deep Dive into SingleStore Streamliner
Engineering

SingleStore Streamliner, an open source tool available on GitHub, is an integrated solution for building real-time data pipelines using Apache Spark. With Streamliner, you can stream data from real-time data sources (e.g. Apache Kafka), perform data transformations within Apache Spark, and ultimately load data into SingleStore for persistence and application serving. Streamliner is a great tool for developers and data scientists since little to no code is required; users can build their pipelines instantly.

For instance, a non-trivial yet still no-code-required use case is pulling data in comma-separated value (CSV) format from a real-time data source, parsing it, then creating and populating a SingleStore table. You can do all of this within the Ops web UI, depicted in the image below. As you can see, we have simulated the real-time data source with a “Test” source that feeds in static CSV values; you can easily replace that with Kafka or a custom data source. The static data is then loaded into the hr.employees table in SingleStore.
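To make the pipeline concrete, here is a minimal standalone Spark sketch of the same extract-transform-load flow. This is not the Streamliner API itself, just the equivalent logic written against Spark's DataFrame and JDBC interfaces; the host name, input file, and credentials are hypothetical stand-ins.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object CsvToSingleStore {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("csv-to-singlestore").getOrCreate()

    // Stand-in for the "Test" source: read static CSV values from disk.
    val employees = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("employees.csv") // hypothetical input file

    // SingleStore speaks the MySQL wire protocol, so a plain JDBC write
    // works (requires a MySQL JDBC driver on the classpath).
    employees.write
      .mode(SaveMode.Append)
      .format("jdbc")
      .option("url", "jdbc:mysql://singlestore-master:3306/hr") // hypothetical host
      .option("dbtable", "employees")
      .option("user", "root")
      .option("password", "")
      .save()

    spark.stop()
  }
}
```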
How to Deploy SingleStore on the Mesosphere DCOS
Engineering

The Mesosphere Datacenter Operating System (DCOS) is a distributed operating system designed to span all machines in a datacenter. It provides mechanisms for deploying applications across the entire system with a few simple commands. SingleStore is a great fit for deployment on DCOS because of its distributed, memory-optimized design. For example, users can scale computation and storage capacity by simply adding nodes. SingleStore deploys across commodity hardware and cloud, giving users the flexibility to operate with existing infrastructure or build custom hardware solutions. SingleStore on DCOS can optimize and simplify your test and development projects; however, it is not a supported configuration and is not recommended for production deployments. In this blog post, we will illustrate how to deploy SingleStore for your development or test environment on a cluster of DCOS-configured machines.

Deploying SingleStore on DCOS

Users can quickly get started with DCOS by deploying a cluster on Amazon AWS. Mesosphere provides a DCOS template specifically for this purpose, which leverages the AWS CloudFormation infrastructure. Follow the steps on docs.d2iq.com to set up DCOS on AWS.

Deploying SingleStore on DCOS is simple with the DCOS command line. Once you have deployed a DCOS cluster and installed the DCOS command-line interface (check out the Mesosphere documentation for more information on this step), simply run the following command on the DCOS command line:

`$ dcos package install memsql`

At that point, if you check the DCOS web interface, you should see the SingleStore service running.
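Because SingleStore is wire-compatible with MySQL, you can also verify the deployed service from any JDBC client. A minimal sketch, assuming the master aggregator is exposed on the default port 3306 at a hypothetical host name:

```scala
import java.sql.DriverManager

object VerifyCluster {
  def main(args: Array[String]): Unit = {
    // Hypothetical host; use the address of the DCOS node running the
    // SingleStore master aggregator. SingleStore listens on the MySQL
    // wire protocol, port 3306 by default.
    val conn = DriverManager.getConnection(
      "jdbc:mysql://dcos-agent.example.com:3306", "root", "")
    try {
      // List the aggregator nodes the cluster knows about.
      val rs = conn.createStatement().executeQuery("SHOW AGGREGATORS")
      while (rs.next()) println(rs.getString(1))
    } finally conn.close()
  }
}
```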
Run Real-Time Applications with Spark and the SingleStore Spark Connector
Product

Apache Spark is one of the most powerful distributed computing frameworks available today. Its combination of fast, in-memory computing with an architecture that’s easy to understand has made it popular for users working with huge amounts of data. While Spark shines at operating on large datasets, it still requires a solution for data persistence. HDFS is a common choice, but while it integrates well with Spark, its disk-based nature can impact performance in real-time applications (e.g. applications built with the Spark Streaming libraries). Also, Spark does not have a native capability to commit transactions.

Making Spark Even Better

That’s why SingleStore is releasing the SingleStore Spark connector, which gives users the ability to read and write data between SingleStore and Spark. SingleStore is a natural fit for Spark because it can easily handle the high rate of inserts and reads that Spark often requires, while also having enough space for all of the data that Spark can create.
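Here is a minimal sketch of reading and writing through the connector. It assumes a recent connector release, where the data source is registered as "singlestore"; the endpoint, credentials, and table names are placeholders, and older releases registered the format under a different name, so check the connector documentation for your version.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object ConnectorExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("singlestore-connector-demo").getOrCreate()

    // Hypothetical connection settings; point these at your cluster.
    spark.conf.set("spark.datasource.singlestore.ddlEndpoint", "singlestore-master:3306")
    spark.conf.set("spark.datasource.singlestore.user", "root")
    spark.conf.set("spark.datasource.singlestore.password", "")

    // Read a SingleStore table into a DataFrame...
    val events = spark.read.format("singlestore").load("app.events")

    // ...transform it in Spark, then write the result back to SingleStore.
    events.groupBy("user_id").count()
      .write.format("singlestore")
      .mode(SaveMode.Overwrite)
      .save("app.event_counts")

    spark.stop()
  }
}
```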
Load Files from Amazon S3 and HDFS with the SingleStore Loader
Engineering

One of the most common tasks with any database is loading large amounts of data into it from an external data store. Both SingleStore and MySQL provide the LOAD DATA command for this task. This command is very powerful, but by itself it has a number of restrictions:

- It can only read from the local filesystem, so loading data from a remote store like Amazon S3 requires first downloading the files you need.
- Since it can only read from a single file at a time, loading from multiple files requires multiple LOAD DATA commands. If you want to perform this work in parallel, you have to write your own scripts.
- If you are loading multiple files, it’s up to you to make sure that you’ve deduplicated the files and their contents.

Why We Built the SingleStore Loader

At SingleStore, we’ve acutely felt all of these limitations. That’s why we developed SingleStore Loader, which solves all of the above problems and more. SingleStore Loader lets you load files from Amazon S3, the Hadoop Distributed File System (HDFS), and the local filesystem. You can specify all of the files you want to load with one command, and SingleStore Loader will take care of deduplicating files, parallelizing the workload, retrying files if they fail to load, and more.

Use a load command to load a set of files
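For contrast with the loader's single-command workflow, here is the baseline it replaces: one LOAD DATA statement per file, issued over JDBC. The endpoint, database, table, and file path are hypothetical stand-ins.

```scala
import java.sql.DriverManager

object LoadOneFile {
  def main(args: Array[String]): Unit = {
    // Hypothetical endpoint and schema. LOAD DATA LOCAL INFILE reads the
    // file from the client machine, one file per statement; the JDBC URL
    // must allow local-infile for this to work.
    val conn = DriverManager.getConnection(
      "jdbc:mysql://singlestore-master:3306/app?allowLoadLocalInfile=true",
      "root", "")
    try {
      conn.createStatement().execute(
        """LOAD DATA LOCAL INFILE '/data/events-0001.csv'
          |INTO TABLE events
          |FIELDS TERMINATED BY ','
          |LINES TERMINATED BY '\n'""".stripMargin)
    } finally conn.close()
  }
}
```

SingleStore Loader collapses the per-file loop, the parallelism, and the deduplication bookkeeping around statements like this into a single load command.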