“SingleStore’s Columnstore Blows All of the Free and Open Source Solutions Out of the Water” — Actual User

JL

Jacky Liang

Product Manager

“SingleStore’s Columnstore Blows All of the Free and Open Source Solutions Out of the Water” — Actual User

A columnstore database takes all the values in a given column – the zip code column in a customer database, for instance – and stores all the zip code values in a single row, with the column number as the first entry. So the start of a columnstore database’s ZIP code record might look like this: 5, 94063, 20474, 38654… The “5” at the beginning means that the ZIP code data is stored in the fifth column in the rowstore database of customer names and addresses that the original data comes from.

Columnstore databases make it fast and easy to execute reporting and querying functions. For instance, you can easily count how many customers you have living in each US zip code – or combine your customer data with a zip code marketing database.

SingleStore combines rowstore and columnstore data tables in a single, scalable, powerful database that features native SQL support. (See our blog post comparing rowstore and columnstore.) And, in addition to its fast-growing, paid enterprise offering, SingleStore also has a highly capable free option.

You can use SingleStore for free, with community support from our busy message board. SingleStore is free up to the point where you reach four nodes, or four separate server instances, with up to 32GB of RAM each – 128GB of RAM total. This large, free capacity is particularly useful for columnstore tables, where 128GB of RAM is likely to be enough to support a terabyte or so of data on disk, with excellent performance.

We have several existing customers doing important work using SingleStore for free. And when you need more nodes, or paid support, simply contact SingleStore to move to an enterprise license.

why-use-columnstoreWhy Use Columnstore?

The columnstore is used primarily for analytical applications where the queries mainly involve aggregations over datasets that are too large to fit in memory. In these cases, the columnstore performs much better than the rowstore.

A column-oriented store, or “columnstore,” treats each column as a unit and stores segments of data for each column together in the same physical location. This enables two important capabilities. The first is to scan each column individually – in essence, being able to scan only the columns needed for the query, with good cache locality during the scan. These features of columnstore get you excellent performance and low resource utilization – an important factor in the cloud, particularly, where every additional operational step adds to your cloud services bill.

The other capability is that columnstores lend themselves well to compression. For example, repeating and similar values can easily be compressed together. SingleStore compresses data up to about 90% in many cases, with very fast compression and decompression as needed. As with the data design of columnstore tables, compression delivers cache locality, excellent performance, low resource utilization, and cost savings.

In summary, you should use a columnstore database if you need great analytics performance. It also helps that SingleStore, as a scalable SQL database with built-in support for the MySQL wire protocol, natively supports popular analytic tools like Tableau, Looker, and Zoomdata.

A big advantage with SingleStore is that you get both rowstore and columnstore tables in a single database, with built-in SQL support. This gives you a number of advantages:

  • If you need to have rowstore table data duplicated and, in many cases, augmented in one or more columnstore tables, you can do this in a single database.
  • You can run queries that join, or otherwise operate on, data spread across multiple rowstore and columnstore tables.
  • You can make “game-time” price-performance decisions between storing your data in super-fast, in-memory rowstore tables vs. large, disk-based columnstore tables, then modify your decision as your business needs change.
  • The training and experience you gain in using SingleStore for one use case extends automatically to many other use cases, whether rowstore or columnstore. For some of our customers, SingleStore is the last new database they’ll ever need – and ends up replacing one or more competing database options.

You can also read more about the difference between rowstore and columnstore in our documentation.

some-current-columnstore-optionsSome Current Columnstore Options

There are several columnstore options out in the market. Here are a few of the more popular ones that we see.

Note. Most of these options are not fully comparable to SingleStore because they don’t support both rowstore and columnstore, in-memory and disk-based tables, as SingleStore does. (See the list of benefits to this converged capability above.) However, you should consider a range of options before choosing any database provider, including SingleStore.

click-houseClickHouse

ClickHouse is an open source columnstore database developed by Yandex specifically for online analytical processing (OLAP). ClickHouse allows for parallel processing of queries using multiple cores and very fast scanning of rows, while offering good data compression.

However, there are disadvantages to using ClickHouse. There is no real DELETE/UPDATE support and no support for transactions. ClickHouse also uses its own query protocol, which means limited SQL support. This also means your favorite SQL tools may not be supported if you choose to use ClickHouse. Also, if you are migrating from a SQL database, you will likely have to re-write all your queries which have joins – a common operation in SQL.

maria-db-columnstoreMariaDB Columnstore

MariaDB is an open source fork of MySQL. This fork was done by Michael “Monty” Widenius, the author of MySQL, after Oracle purchased Sun Microsystems.

MariaDB supports an open and vibrant community that has frequent updates and excellent community support. Additionally, MariaDB maintains high compatibility with MySQL, so it can be used as a drop-in replacement which supports library binary parity and exact matching with MySQL APIs. MariaDB also offers a columnstore engine for analytical use cases.

However, since MariaDB only supports storing data on disk, if query speed and latency is a priority, then you may not be too happy with the performance. Additionally, MariaDB’s columnstore product is also still quite new, so there is likely to still have work to be done.

pivotal-greenplumPivotal Greenplum

Greenplum is a columnar data warehouse based on PostgreSQL. Greenplum uses massively parallel processing (MPP) techniques, with each database cluster containing different nodes, such as the master node and the segment node. This allows for parallel processing of queries and storage of data. Greenplum is also fully SQL-compliant and ACID-compliant. Finally, unlike most columnstore databases – but like SingleStore – Greenplum also supports both row and columnstore data storage.

However, customers sometimes complain about the performance and usability of Greenplum. Many customers found the product to be difficult to tune, as Greenplum tends to use all the available system resources for every single query, which can lead to performance degradation when multiple queries are executed at the same time. Also, under high write load conditions, Greenplum would cause something called a Linux journaling error. Errors of this type may require rebuilding the entire database, which might take many hours to complete.

sap-hanaSAP HANA

HANA is an entirely in-memory columnstore database developed by SAP. A major strength with SAP HANA is that it’s built as a data platform — there are multiple “engines” that sit inside HANA columnstore. There are specialty engines built for calculations, spatial use cases, predictive algorithms, and more, allowing users to pick and choose the right engine for their specific use case without having to use materialized views.

However, common complaints among SAP HANA users are the specialized skills one may need to work with the product. Furthermore, since SAP HANA is entirely in-memory-only with no disk component, it can get fairly expensive just for the RAM to contain all your data. Finally, the licensing costs of SAP HANA can get fairly high as well.

where-single-store-shinesWhere SingleStore Shines

In November 2018, we launched SingleStoreDB Self-Managed 6.7. With SingleStoreDB Self-Managed 6.7, as well as later SingleStore releases, you can use SingleStore for free, within fairly robust limits. When using SingleStore for free, you can create clusters with up to four nodes, with no limit on the amount of data stored on disk. You also receive community support via online forums, rather than direct, paid support.

Since launching SingleStoreDB Self-Managed 6.7, we have been listening to how people using our software for free use SingleStore. Consistently in these conversations, our users – including those that run production workloads on the free tier – have consistently praised our columnstore.

Purcado helps people get the best deals, on the best shoes,from the best retailers, quickly and easily.

What makes it so good?

  • You can use it for free up to four nodes with unlimited disk. But that’s not all — it also consistently outperforms other free, open source, and even enterprise-grade columnstore databases. This conclusion came from a number of people that have tested many rival databases – sometimes, even ten or more – before finally arriving at SingleStore.
  • SingleStore has built-in support for ANSI SQL, so the query language is very familiar. We also support the MySQL wire protocol, meaning we support a wide range of tools in the data ecosystem.
  • SingleStore offers incredible compression in disk-space columnstore, allowing you to store more data and save precious storage space at the same time. Real customers like Pandora are able to reliably achieve 85–90% on-disk compression for columnar data.
  • SingleStore’s fully distributed nature means you can simply add affordable commodity hardware to increase query performance, concurrency, and ingest speed, and reduce data size.
  • Finally, unique to SingleStore, the ability to combine rowstore and columnstore data in one query means you get the benefits of real-time and historical data unified in one query! This means simplicity for your data engineering stack, lower maintenance costs, and improved performance, as SingleStore can, in many cases, be the one database to rule them all.

Did we mention you can use all this for free?

what-people-are-sayingWhat People Are Saying

We can tout the benefits of SingleStore all we want, but we think it’s even better to let people who are using SingleStore for free do the talking for us. These are testimonials we have received directly from software developers and data engineers, company founders, and others using SingleStore to run their applications, answer their queries, and drive their businesses forward.

Paul Moss, E-commerce Startup in the United Kingdom

“I use SingleStore primarily for its columnstore. Your columnstore blows all of the free / open source columnstores in the market currently out of the water — it’s just so fast. PostgreSQL and CitusDB are inferior to your product. It’s not even close, especially since I’m running SingleStore on a single CentOS workstation machine. Additionally, as a business owner, you want the simplest engineering stack possible.

SingleStore is one database to rule them all, replacing three to four different databases. It does it all well.”

Hajime Sano, Nikkei in Tokyo, Japan

“The performance of SingleStore free tier is just as good as the enterprise version, which means performance for each query is really fast, the fastest in the columnstore databases out there. That is the greatest thing. SingleStore also supports both rowstore and columnstore in one query. We’re now able to balance real-time query performance (in rowstore) with lower hardware cost (in columnstore). 24/7 operational data goes in in-memory, while archival data goes to disk.”

Nikkei Asian Review, with scores of bureaus and more than 1000 journalists throughoutAsia, delivers both business-focused and general coverage across the region.

Software Developer in Publishing Company in Germany

“The incredible compression and speed of the columnstore engine really is something, querying gigabytes of data in seconds was amazing to see. Also the possibility of combining rowstore and columnstore in one query is a very nice feature.”

Elad Levy, Entrepreneur in the Mobile Games Industry

“SingleStore in particular has columnstore, which is free, and it’s amazing. If you want to analyze data and get business insight, just go with SingleStore’s columnstore. You also get the ability to mix and match transactions (OLTP) and analytics (OLAP) in a single query, which saves us from deploying and querying another database. It’s a 2-in-1 solution.”

Peter Baylies, Purcado in Durham, NC

“I appreciate SingleStore’s speed even on modest, single-box hardware, as well as its storage efficiency on disk.”

next-stepsNext Steps

Don’t take our word for it — you can find out yourselves why our customers are saying such positive things about our columnstore and choose to run their businesses on SingleStore, both for free and paid, with support.

We have a tutorial on loading data into SingleStore and a webinar for building an analytics app using SingleStore’s columnstore. These resources show just how fast and easy it is to set up and use SingleStore.

To sum up, when using SingleStore for free, you can:

  • Use up to 4 nodes, with no specific limit on disk storage
  • Get rich community support at forums.singlestore.com
  • Deploy to production
  • Not face any time limits

Want 24/7 support and even more nodes? You can contact us to begin the conversation.


Share