
Engineering
SingleStore Pipelines: Real-Time Data Ingestion with Exactly-Once Semantics
Today we launched SingleStoreDB Self-Managed 5.5 featuring SingleStore Pipelines, a new way to achieve maximum performance for real-time data ingestion at scale. This implementation enables exactly-once semantics when streaming from message brokers such as Apache Kafka.
An end-to-end real-time analytics data platform requires real-time analytical queries and real-time ingestion. However, it is rare to find a data platform that satisfies both of these requirements. With the launch of SingleStore Pipelines as a native feature of our database, we now deliver an end-to-end solution from real-time ingest to analytics.
Real-Time Analytical Queries and Data Ingestion
Let’s define real-time analytical queries and real-time data ingestion separately.
A data platform that supports real-time analytical queries quickly returns results for sophisticated analytical queries, which are usually written in SQL with lots of complex JOINs. Execution of real-time analytical queries differentiates SingleStore from competitors. In the past year, Gartner Research recognized SingleStore as the number one operational data warehouse, and awarded the company Visionary placements in its Operational Database and Data Warehouse Magic Quadrants.
A data platform that supports real-time ingestion can instantly store streaming data from sources like web traffic, sensors on machines, or edge devices. SingleStore Pipelines ingests data at scale in three steps: first, performantly pulling from data sources (Extract); second, mapping and enriching the data (Transform); finally, loading the data into SingleStore (Load). This all occurs within a single pipeline in one database. The transactional nature of Pipelines sets it apart from other solutions. Streaming data is atomically committed in SingleStore, and exactly-once semantics are ensured by storing metadata about each pipeline in the database.
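The exactly-once guarantee comes from committing the ingested data and the source offset together. As a minimal illustration (a hypothetical Python sketch, not SingleStore's implementation), advancing a stored offset atomically alongside the ingested rows means a replay can never double-count a batch:

```python
# Hypothetical sketch: exactly-once ingest by committing records and the
# consumed offset in one atomic step, analogous to Pipelines storing its
# metadata in the database itself.

def ingest_batch(store, source, batch_size=2):
    """Commit a batch of records together with the offset consumed so far.
    In a real system the two writes below happen in one transaction, so a
    crash can never commit data without also advancing the offset."""
    offset = store["offset"]
    batch = source[offset : offset + batch_size]
    if not batch:
        return store
    store["rows"].extend(batch)          # write the data...
    store["offset"] = offset + len(batch)  # ...and the offset, atomically
    return store

store = {"rows": [], "offset": 0}
stream = ["a", "b", "c", "d", "e"]
while store["offset"] < len(stream):
    ingest_batch(store, stream)
print(store["rows"])  # each record lands exactly once
```

On restart, ingestion resumes from the committed offset, so records are neither lost nor duplicated.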
Read Post

Engineering
What is BPF and why is it taking over Linux Performance Analysis?
Performance analysis often gets bottlenecked by lack of visibility. At SingleStore, we architected our database to easily observe its inner workings. Observability allows our engineers to easily identify components that need to be faster, and faster components make our database's performance skyrocket. These tools also enable support engineers to react quickly and precisely to customer needs. In the spirit of using the best tools available, the performance team is currently evaluating next-generation tools only recently available in Linux.

The newest tool for observing the Linux operating system is the "Berkeley Packet Filter" (BPF). BPF allows users to run a small piece of code quickly and safely inside the operating system. Originally used for packet filtering, it has since been enhanced beyond its eponymous use case to support dynamic tracing of the Linux operating system. For example, it is possible to write a small BPF program that prints every time a particular file is accessed by a user.

The power of the Berkeley Packet Filter, when used with Userland Statically Defined Tracepoints (USDT), expands beyond the operating system to the database. USDT probes are well-defined locations in the database where BPF programs run, allowing engineers to ask questions that were previously unanswerable. For example, engineers can now examine the interactions between the database and the operating system by running BPF programs in each at the same time.

Adding a USDT static tracepoint is as easy as a single macro call, which declares the probe and its arguments. This probe fires when each query is executed and records the query string:

```c
DTRACE_PROBE1(memsqld, querystart, query);
```

To use this USDT probe, we need to attach a BPF program to it. We write our program in C and use the BPF Compiler Collection (BCC) to compile it to BPF and attach it to our probes.
The following BPF script traces queries and records their latencies:

```c
BPF_HASH(pid_to_start_hash, u32);
BPF_HISTOGRAM(latency);

// This function runs each time a query begins. It records the current time stamp
// (`start_ts`) and saves it in the `pid_to_start_hash` hash table.
int querystart(struct pt_regs *ctx)
{
    u64 start_ts = bpf_ktime_get_ns();
    u32 pid = bpf_get_current_pid_tgid();
    pid_to_start_hash.update(&pid, &start_ts);
    return 0;
}

// This function runs at the end of each query. Look up the saved start timestamp
// (`start_ts`) for the current thread's id (pid) using the hash table
// (`pid_to_start_hash`) and record the elapsed time (`delta_ms`) in the latency
// histogram.
int queryend(struct pt_regs *ctx)
{
    u32 pid = bpf_get_current_pid_tgid();
    u64 *start_tsp = pid_to_start_hash.lookup(&pid);
    // Edge case: this query began before we started tracing.
    if (!start_tsp)
        return 0;
    u64 delta_ms = (bpf_ktime_get_ns() - *start_tsp) / 1000 / 1000;
    // Take the log of the elapsed time to put into the logarithmic histogram.
    latency.increment(bpf_log2l(delta_ms));
    // Make sure to delete values from the hash table when they are no longer needed.
    pid_to_start_hash.delete(&pid);
    return 0;
}
```

We run `query_latency.py`, a script that wraps the above BPF program using the BCC toolchain, and get a nice histogram of query latencies:

```
$ sudo ./query_latency.py /var/lib/memsql/master-3306/memsqld --histogram
Tracing queries. ^C to exit.
latency (ms):
     value : count     distribution
    0 -> 1 : 9        |****************************************|
    2 -> 3 : 1        |****                                    |
    4 -> 7 : 1        |****                                    |
   8 -> 15 : 2        |*******                                 |
  16 -> 31 : 1        |****                                    |
  32 -> 63 : 0        |                                        |
 64 -> 127 : 0        |                                        |
128 -> 255 : 1        |****                                    |
```

Once engineers have the ability to trace when a thread is executing a SingleStore query, they can ask more interesting questions about how the database interacts with Linux. For example, engineers can investigate and determine how long queries spend acquiring locks.
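The `bpf_log2l` call places each latency into a power-of-two bucket, which is what produces the `0 -> 1`, `2 -> 3`, `4 -> 7` ranges in the output. A rough Python sketch of that bucketing (a simplification for illustration, not the exact kernel helper) looks like:

```python
from collections import Counter

def log2_bucket(ms):
    # Integer log2 bucketing: latencies in [2^(k-1), 2^k - 1] share bucket k,
    # with 0 folded into the first bucket (the "0 -> 1" range).
    return max(ms, 1).bit_length()

def bucket_range(b):
    # The value range a bucket covers when printed.
    lo = 0 if b == 1 else 1 << (b - 1)
    hi = (1 << b) - 1
    return lo, hi

latencies = [0, 1, 1, 5, 12, 12, 30, 140]  # made-up sample latencies in ms
hist = Counter(log2_bucket(ms) for ms in latencies)
for b in sorted(hist):
    lo, hi = bucket_range(b)
    print(f"{lo:>4} -> {hi:<4}: {hist[b]}")
```

Logarithmic buckets keep the histogram compact while still separating fast queries from multi-hundred-millisecond outliers.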
With BPF, engineers are able to instrument the start and end of queries as above, and additionally instrument the futex system call itself (used in Linux to acquire and release locks) to trace how long it takes to acquire locks while executing a query:

```
futex latencies (ms) for 'select count(distinct sid_1g) where...'
     value : count     distribution
    0 -> 1 : 0        |                                        |
    2 -> 3 : 2        |****                                    |
    4 -> 7 : 2        |****                                    |
   8 -> 15 : 1        |**                                      |
  16 -> 31 : 5        |***********                             |
  32 -> 63 : 17       |****************************************|
```

What about how a query spends its time? On- and off-CPU flamegraphs are helpful, but they are too coarse for query investigations. We instrumented the kernel scheduler tracepoints to conditionally collect information for the threads that queries run on. This tracing tells us how long a query's thread spends in various states (waiting, running, blocked, I/O, and sleeping).

The power of BPF allows us to inspect our database at runtime and ask precise questions. The increased observability provided by BPF improves the pace of performance work and helps us optimize how customers interact with the SingleStore database. Overall, BPF provides the observability necessary to build a transparent and easily accessible modern in-memory database.

Access scripts, documentation, and additional reference information on BCC and BPF [here](https://github.com/memsql/memsql-perf-tools).

After joining the SingleStore performance team this summer, Kyle Laracey will be returning to Brown University in the fall. At Brown, he studies computer science, is a teaching assistant for CS167: Operating Systems, and is expected to graduate in May 2017.
Read Post

Engineering
New Performance Benchmark for Live Dashboards and Fast Updates
Newest Upsert Benchmark showcases critical use case for internet billing with telcos, ISPs, and CDNs
SingleStore achieves 7.9 million upserts per second, 6x faster than Cassandra
Benchmark details and scripts now available on GitHub
The business need for fast updates and live dashboards
Businesses want insights from their data and they want it sooner rather than later. For fast-changing data, companies must rapidly glean insights in order to make the right decisions. Industry applications like IoT telemetry monitoring, mobile network usage, internet service provider (ISP) billing, and content delivery network (CDN) usage tracking depend upon real-time analytics with fast-changing data. Web traffic merits special attention since it continues to grow at an astounding rate. According to Cisco, Global IP traffic will increase nearly threefold over the next 5 years, and will have increased nearly a hundredfold from 2005 to 2020. Overall, IP traffic will grow at a compound annual growth rate (CAGR) of 22 percent from 2015 to 2020. Many businesses face the challenge of monitoring, analyzing, and monetizing large scale web traffic, so we will explore this use case.
Use case example
In particular, we dive into the example of a content delivery or distribution network (CDN). A CDN is a globally distributed network of web servers deployed in multiple data centers across different geographic regions and is relied upon by content providers such as media companies and e-commerce vendors to deliver content to end users. CDNs have a business need to monitor their system in real-time. In addition to logging customer usage for the purpose of billing, they want to be alerted to sudden increases and decreases in their workloads for load balancing as well as for detecting network events like “denial of service attacks”. The sheer volume of web traffic mandates a massive parallel processing (MPP) system that can scale out to support the load. The concurrent need for real-time analytics points to the direction of hybrid transaction/analytical processing, or HTAP. HTAP systems enable high speed ingest and sophisticated analytics simultaneously without data movement or ETL.
Background on the Upsert Benchmark
This benchmark demonstrates the raw horsepower of a database system capturing high-volume updates. Update, or upsert, is the operative word here. With a conventional `insert`, a new row is created for each new database entry. With an upsert, individual rows can be updated in place. This upsert capability allows for a more compact database table and faster aggregations, and it is particularly useful in areas such as internet billing. For more detail on this workload in use, take a look at this blog post, Turn Up the Volume With High-Speed Counters.
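As a toy illustration of the difference (hypothetical Python, not the benchmark code), an upsert keeps one partially-aggregated row per key instead of appending a new row for every event:

```python
# Sketch of upsert-style counters: each incoming event either inserts a
# new row or updates the existing row in place, so the table holds one
# running total per key instead of one row per event.
def upsert(table, key, bytes_, hits):
    if key in table:
        row = table[key]                     # duplicate key: update in place
        row["bytes"] += bytes_
        row["hits"] += hits
    else:
        table[key] = {"bytes": bytes_, "hits": hits}  # new key: insert

events = [("cust1", 100, 1), ("cust2", 50, 1), ("cust1", 200, 2)]
table = {}
for key, b, h in events:
    upsert(table, key, b, h)
print(table)  # {'cust1': {'bytes': 300, 'hits': 3}, 'cust2': {'bytes': 50, 'hits': 1}}
```

Reads then aggregate over far fewer rows, which is why the pattern suits billing-style counters.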
SingleStore delivers efficient upsert performance, achieving up to 8 million upserts per second on a 10 node cluster, using the following parameterized query:
Upsert query for SingleStore
```sql
insert into records (customer_code, subcustomer_id, geographic_region, billing_flag, bytes, hits)
values (?, ?, ?, ?, ?, ?)
on duplicate key update bytes=bytes+VALUES(bytes), hits=hits+VALUES(hits);
```

(The `?` placeholders stand in for the batched parameter values supplied at run time.)
Comparing Upsert performance
Legacy databases and data warehousing solutions are optimized for batch loading of data and subsequently are unable to handle fast data insertions along with ad-hoc analysis of freshly generated data. NoSQL databases like Cassandra can handle fast data insertions but have more challenges with upserts, which are critical for web traffic monitoring across end-customer behavior and tracking web requests. More importantly however, Cassandra does not provide native support for analytics and requires users to bring in additional components like SparkSQL in order to support meaningful querying of data.
We created the following query for Cassandra:
Upsert query for Cassandra
```sql
update perfdb.records set hits = hits + 1
where timestamp_of_data=1470169743185 and customer_code=25208
  and subcustomer_id='ESKWUEYXUKRB' and geographic_region=10
  and billing_flag=1 and ip_address='116.215.6.236';
```
The upsert benchmark is based on a simulated workload that logs web traffic across ten different geographic regions. SingleStoreDB Self-Managed 5.1 runs on a 10 node m4.10xlarge cluster on AWS, at \$2.394 per Hour (effective pricing with 1-year reserved instances), and is able to execute up to 8 million upserts per second and simultaneously run live queries on the latest data to provide a real-time window on the changing shape of traffic.
Cassandra running on an identical cluster achieves 1.5 million upserts per second. We tested the most recent 3.0.8 version of Apache Cassandra. In the Cassandra query, update means upsert.
As noted in the following chart, SingleStore scales linearly as we increase the number of machines with a batch size of 500. Cassandra, however, does not appear to support large batch sizes well. According to the Cassandra configuration file:

```
# Fail any batch exceeding this value. 50kb (10x warn threshold) by default.
# Caution should be taken on increasing the size of this threshold as it can lead to node instability.
```
So we set `batch_size_fail_threshold_in_kb: 5000` to support a 10,000 row batch size, but we encountered numerous errors that prevented the benchmark from running on Cassandra with these settings.
Read Post

Engineering
Massive Data Ingest and Concurrent Analytics with SingleStore
The amount of data created in the past two years surpasses all of the data previously produced in human history. Even more shocking is that for all of that data produced, only 0.5% is being analyzed and used. In order to capitalize on data that exists today, businesses need the right tools to ingest and analyze data.
At SingleStore, our mission is to do exactly that. We help enterprises operate in today’s real-time world by unlocking value from data instantaneously. The first step in achieving this is ingesting large volumes of data at incredible speed. The distributed nature of the SingleStore environment makes it easy to scale up to petabytes of data! Some customers use SingleStore to process 72TB of data a day, or over 6 million transactions per second, while others use it as a replacement for legacy data warehouse environments.
SingleStore offers several key features for optimizing data ingest, as well as supporting concurrent analytics:
High Throughput
SingleStore enables high throughput on concurrent workloads. A distributed query optimizer evenly divides the processing workload to maximize the efficiency of CPU usage. Queries are compiled to machine code and cached to expedite subsequent executions. Rather than cache the results of the query, SingleStore caches a compiled query plan to provide the most efficient execution path. The compiled query plan does not pre-specify values for the parameters, which allows SingleStore to substitute the values upon request, enabling subsequent queries of the same structure to run quickly, even with different parameter values. Moreover, due to the use of Multi-Version Concurrency Control (MVCC) and lock-free data structures, data in SingleStore remains highly accessible, even amidst a high volume of concurrent reads and writes.
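To sketch the plan-caching idea (illustrative Python only; `parameterize` is a stand-in for the real parameterization logic), the cache key is the query's shape with literal values stripped out, so queries that differ only in their parameters reuse a single compiled plan:

```python
import re

plan_cache = {}
compile_count = 0

def parameterize(sql):
    # Replace numeric literals with a placeholder (a simplification of
    # real parameterization, which also handles strings, etc.).
    return re.sub(r"\b\d+\b", "?", sql)

def execute(sql):
    global compile_count
    shape = parameterize(sql)
    if shape not in plan_cache:
        compile_count += 1                    # the expensive compile runs once per shape
        plan_cache[shape] = f"plan for: {shape}"
    return plan_cache[shape]                  # parameters are substituted at run time

execute("select * from t where id = 1")
execute("select * from t where id = 2")       # same shape: cache hit, no recompile
print(compile_count)  # 1
```

Caching the plan rather than the result means every execution sees fresh data while still skipping compilation.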
Query Execution Architecture
SingleStore has a two-tiered architecture consisting of aggregators and leaves. Aggregators act as load balancers or network proxies, through which SQL clients interact with the cluster. Aggregators store metadata about the machines in the cluster and the partitioning of the data. In contrast, leaves function as storage and compute nodes.
Read Post

Engineering
New Release of dbbench Streamlines Database Workload Testing
Our performance engineering team is committed to delivering high quality tools. Since we released `dbbench` 7 months ago, it has been widely adopted across our engineering and sales teams as the definitive tool for testing database workloads. Today we are announcing availability of a new version of `dbbench`, as well as a package of high level tools to enhance it.
In this latest release, we enhanced both the flexibility and ease of use of the tool. We augmented capabilities of `dbbench` and added a tutorial to help new sales engineers get started. In addition to these enhancements, we released a package of internal tools specifically designed to support high level workflows that use `dbbench`. These changes improve our technical proof-of-concept (POC) process for our customers and open up `dbbench` for new workloads and use cases. This version of `dbbench` not only increases the performance and power of this benchmark testing tool, but also makes it easily accessible to anyone interested in using it.
dbbench in Action
One of the new features of `dbbench` is the ability to display latency histograms during the final analysis to easily detect and understand outliers. In the example below, we can see that most of the queries executed between 0.2 and 0.5 ms, but there were outliers as high as 30ms:
Read Post

Engineering
Third Normal Form, Star Schema, and a Performance Centric Data Strategy
Keeping it Straight
Data value comes from sharing, so staying organized and providing common data access methods across different groups can bring big payoffs.
Companies struggle daily to keep data formats consistent across applications, departments, people, divisions, and new software systems installed every year.
Passing data between systems and applications requires ETL, which stands for Extract, Transform, and Load. It is the process everyone loves to hate. There is no glamour in reconfiguring data formats, such as dates, from one system to another, but there is glory in minimizing the amount of ETL needed to build new applications.
To minimize ETL friction, data architects often design schemas in third normal form, a database term that indicates data is well organized and unlikely to be corrupted due to user misunderstanding or system error.
Getting to Third Normal Form
The goal of getting to third normal form is to eliminate update, insertion, and deletion anomalies.
Take this employee, city, and department table as an example:
| employee_id | employee_name | employee_city | employee_dept |
| --- | --- | --- | --- |
| 101 | Sam | New York | 22 |
| 101 | Sam | New York | 34 |
| 102 | Lori | Los Angeles | 42 |
Update Anomalies
If Sam moves to Boston, but stays in two departments, we need to update both records. That process could fail, leading to inconsistent data.
Insertion Anomalies
If we have a new employee not yet assigned to a department and the ‘employee_dept’ field does not accept blank entries, we would be unable to enter them in the system.
Deletion Anomalies
If the company closed department 42, deleting rows with department 42 might inadvertently delete an employee's information, like Lori's.
First Normal Form to Start
First normal form specifies that table values should not be divisible into smaller parts and that each cell in a table should contain a single value.
So if we had a customer table with a requirement to store multiple phone numbers, the simplest method would look like this:
| customer_id | customer_name | customer_phone |
| --- | --- | --- |
| 101 | Brett | 555-459-8912 555-273-2304 |
| 102 | Amanda | 222-874-3567 |
However, this does not meet first normal form requirements, with multiple values in a single cell, so to conform we could adjust it to:
| customer_id | customer_name | customer_phone |
| --- | --- | --- |
| 101 | Brett | 555-459-8912 |
| 101 | Brett | 555-273-2304 |
| 102 | Amanda | 222-874-3567 |
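The same 1NF transformation can be sketched in a few lines of Python (illustrative data only): each multi-valued cell is flattened into one row per value.

```python
# Flattening a multi-valued column into first normal form: one row per
# (customer, phone) pair instead of several phone numbers in one cell.
raw = [
    (101, "Brett", ["555-459-8912", "555-273-2304"]),
    (102, "Amanda", ["222-874-3567"]),
]

first_normal_form = [
    (cid, name, phone)
    for cid, name, phones in raw
    for phone in phones          # one output row per phone number
]
print(first_normal_form)
```

Each cell in the result now holds a single, indivisible value.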
Second Normal Form
2nd normal form requires that:
- Data be in 1st normal form
- Each non-key column is dependent on the table's complete primary key
Consider the following example with the table STOCK and columns supplier_id, city, part_number, and quantity where the city is the supplier’s location.
| supplier_id | city | part_number | quantity |
| --- | --- | --- | --- |
| A22 | New York | 7647 | 5 |
| B34 | Boston | 9263 | 10 |
The primary key is (supplier_id, part_number), which uniquely identifies a part in a single supplier's stock. However, city depends only on supplier_id, not on the full primary key, so the table is not in 2nd normal form.
This causes the following anomalies:
Update Anomalies
If a supplier moves locations, every single stock entry must be updated with the new city.
Insertion Anomalies
The city has to be known at insert time in order to stock a part at a supplier, even though what really matters here is the supplier_id, not the city. Also, unless the city is stored elsewhere, a supplier cannot have a city without having parts, which does not reflect the real world.
Deletion Anomalies
If the supplier goes totally out of stock and its last row is deleted, the information about the city in which the supplier resides is lost; or, if it is stored in another table, city does not need to be in this table anyway.
Separating this into two tables achieves 2nd normal form.
| supplier_id | part_number | quantity |
| --- | --- | --- |
| A22 | 7647 | 5 |
| B34 | 9263 | 10 |

| supplier_id | city |
| --- | --- |
| A22 | New York |
| B34 | Boston |
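A quick Python sketch (illustrative only) shows that the 2NF split loses no information: joining the two tables on supplier_id reconstructs the original rows.

```python
# The two tables produced by the 2NF split.
stock = [("A22", 7647, 5), ("B34", 9263, 10)]   # (supplier_id, part_number, quantity)
suppliers = {"A22": "New York", "B34": "Boston"}  # supplier_id -> city

# Joining on supplier_id recovers the original denormalized rows.
joined = [
    (sid, suppliers[sid], part, qty)
    for sid, part, qty in stock
]
print(joined)
```

Meanwhile a supplier's city now lives in exactly one row, so a move is a single update.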
Third Normal Form
We’re almost there! With 1st normal form, we ensured that every column attribute only holds one value.
With 2nd normal form we ensured that every column is dependent on the primary key, or more specifically that the table serves a single purpose.
With 3rd normal form, we want to ensure that non-key attributes are dependent on nothing but the primary key. The more technical explanation involves “transitive dependencies” but for the purpose of this simplified explanation we’ll save that for another day.
In the case of the following table, zip is an attribute generally associated with only one city and state. So with the data model below, it is possible that zip could be updated without properly updating the city or state.
| employee_id | employee_name | city | state | zip |
| --- | --- | --- | --- | --- |
| 101 | Brett | Los Angeles | CA | 90028 |
| 102 | Amanda | San Diego | CA | 92101 |
| 103 | Sam | Santa Barbara | CA | 93101 |
| 104 | Alice | Los Angeles | CA | 90012 |
| 105 | Lucy | Las Vegas | NV | 89109 |
Splitting this into two tables, so there is no implied dependency between city and zip, solves the requirements for 3rd normal form.
| employee_id | employee_name | zip |
| --- | --- | --- |
| 101 | Brett | 90028 |
| 102 | Amanda | 92101 |
| 103 | Sam | 93101 |
| 104 | Alice | 90012 |
| 105 | Lucy | 89109 |

| zip | city | state |
| --- | --- | --- |
| 90028 | Los Angeles | CA |
| 92101 | San Diego | CA |
| 93101 | Santa Barbara | CA |
| 90012 | Los Angeles | CA |
| 89109 | Las Vegas | NV |
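A small Python sketch (illustrative only) shows the payoff: once city and state live in the zip table, a correction is a single update that every employee row picks up.

```python
# 3NF layout: employees reference a zip, and zip determines city/state.
employees = {101: ("Brett", "90028"), 104: ("Alice", "90028")}
zips = {"90028": ("Los Angeles", "CA")}

def employee_address(emp_id):
    name, zip_code = employees[emp_id]
    city, state = zips[zip_code]      # city/state come from the zip table
    return name, city, state, zip_code

zips["90028"] = ("LA", "CA")          # one update in one place...
print(employee_address(101))          # ...and every referencing row sees it
print(employee_address(104))
```

With the denormalized table, the same correction would need to touch every employee row in that zip, risking the update anomaly described above.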
Benefits of Normalization
Normalizing data helps minimize redundancy and maintain the highest levels of integrity. By organizing column attributes and the relations between tables, data administrators can design systems for efficiency and safety.
More specifically, normalization helps ensure:
- Data is not unnecessarily repeated within a database
- Inserts, modifications, and deletions only have to happen once in a database
Data Management with Star Schema
Star schema is an approach of arranging a database into fact tables and dimension tables. Typically a fact table records a series of business events such as purchase transactions. Dimension tables generally store fewer records than fact tables but may have more specific details about a particular record. A product attributes table is one example.
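As a minimal sketch of a star-schema query (illustrative Python with made-up data), a fact table of purchase events joins to a small product dimension table and is then aggregated:

```python
# Fact table: one row per business event (here, a purchase).
facts = [  # (product_id, amount)
    (1, 20.0),
    (2, 15.0),
    (1, 30.0),
]
# Dimension table: few rows, descriptive attributes per product.
products = {1: "widget", 2: "gadget"}

# A typical star-schema query: join facts to the dimension, then aggregate.
revenue = {}
for product_id, amount in facts:
    name = products[product_id]
    revenue[name] = revenue.get(name, 0.0) + amount
print(revenue)  # {'widget': 50.0, 'gadget': 15.0}
```

The large fact table is scanned once, while the small dimension table acts as a lookup for the attributes being grouped on.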
Star schemas are often implemented in a denormalized fashion, with typical normalization rules relaxed. The advantage of this can be simpler reporting logic and faster performance as data may be stored multiple ways to facilitate queries.
The disadvantage of this approach is that integrity is not necessarily enforced through the model leaving room for an update in one place that may not successfully propagate elsewhere.
Further, with normalization, a large variety of data analytics tools and approaches can be used to query data without explicit advanced knowledge. Without normalization, schemas tend to become isolated to specific functions and less flexible across a large organization.
Flexible Star Schema Deployments with SingleStore
Is it possible or desirable to merge normalization and star schemas? Sure.
While data management strategies can be very application specific, retaining data in the most universally accessible forms benefits larger organizations. With normalization, data organization transcends application use cases and database systems.
Star schemas often skip normalization for two reasons: simplicity of queries and performance.
Regarding query simplicity, this is a tradeoff between application-specific approaches and data ubiquity across an organization. Independent of the database, this tradeoff remains.
When it comes to performance, historical systems have had challenges with operations like fast aggregations and the large number of joins driven by third normal form. Modern database architectures have eliminated those performance challenges.
With a solution like SingleStore, a memory-optimized, relational, distributed database, it is possible to achieve normalization and performance. Even with the increased number of tables, and subsequent joins, often resulting from third normal form, SingleStore maintains stellar performance. And the core relational SQL model makes it easy to create or import a range of tables as well as maintain relations between tables.
In the next sections, we’ll explore table types in SingleStore and the associated benefits.
Using Multiple Table Types in SingleStore
SingleStore includes two table types:
- A rowstore table, where all data is retained in memory and all data is persisted to disk
- A columnstore table, where some data resides in memory and all data is persisted to disk
Using these two table types, it is possible to design a wide range of schema configurations. Contrary to popular belief, determining whether you use an all-memory or memory-plus-disk table has less to do with data size, and more to do with how you plan to interact with the data.
- Columnstores are useful when rows are added or removed in batches, and when queries touch all or many records but only a few columns. Aggregations like Sum, Average, and Count are good examples.
- Rowstores work well when operating over whole rows at a time. This includes updates to individual attributes or point lookups.

For more detail on rowstores and columnstores, check out Should You Use a Rowstore or a Columnstore? from SingleStore VP of Engineering Ankur Goyal.
Creating a Star Schema in SingleStore
Whether or not you lean towards normalization, SingleStore makes it easy to create a star schema within a single database across multiple table types.
Figure: Basics of a star schema with Fact and Dimension tables
Read Post

Engineering
SingleStoreDB Self-Managed 5.1 Enhances Security for Real-Time Enterprises
Enterprises seek real-time data and analytics solutions to stay current in competitive, fast-evolving markets. Companies dealing in private information, such as healthcare organizations, financial institutions, and the public sector have historically been limited in their pursuit of real-time results, given stringent security requirements. Today, we announce the availability of SingleStoreDB Self-Managed 5.1. This release adds Role-Based Access Control (RBAC) to the already powerful SingleStoreDB Self-Managed 5, unlocking the gateway to real-time for companies with comprehensive security requirements.
Industry Standard Security
Protecting data from malicious users is a top priority for SingleStore customers. Granting access to specific people is a key mechanism for data protection. But managing users one at a time is difficult when a large number of people use the system. Role-Based Access Control (RBAC) is an industry best practice used by many organizations for simplifying data access management.
With this release, SingleStore Enterprise Edition enables organizations to leverage the power of roles. To enable this feature, two new objects were added to the security model: roles and groups. Roles are collections of permissions and groups are collections of users. Roles are then applied to groups and users are put into groups. Customers can create their own roles such as:
- Security Officer – manages users and passwords
- Cluster Administrator – manages the SingleStore cluster
- Backup Operator – performs backups
- Application User – executes per-application DML statements
SingleStore also provides management functions that determine the set of permissions for a given user.
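The model can be sketched in a few lines of Python (illustrative only; role and group names are made up): a user's effective permissions are the union of the permissions of the roles applied to the groups the user belongs to.

```python
# Roles are collections of permissions; groups are collections of users.
roles = {
    "security_officer": {"manage_users", "manage_passwords"},
    "backup_operator": {"backup"},
}
group_roles = {                       # roles applied to groups
    "ops": ["backup_operator"],
    "admins": ["security_officer", "backup_operator"],
}
group_members = {"ops": {"dana"}, "admins": {"sam"}}  # users put into groups

def effective_permissions(user):
    """Union of permissions over every role of every group the user is in."""
    perms = set()
    for group, members in group_members.items():
        if user in members:
            for role in group_roles[group]:
                perms |= roles[role]
    return perms

print(effective_permissions("sam"))
```

Access changes then happen at the role or group level, instead of user by user.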
The RBAC feature was extensively tested with a rigorous set of functional and performance tests, including inside a FIPS 140-2 environment. With our performance tests, we validated that a cluster can have up to 30,000 roles and 30,000 groups without any significant degradation in query performance.
Read more about RBAC in SingleStore reference documentation.
Try SingleStore Today
Download SingleStore today and harness your real-time data. Build streaming data pipelines in minutes, taking your data ingest and analytics to unprecedented levels of scale and sophistication while keeping your data secure.
Read Post

Engineering
Monitoring A/B Experiments In Real Time
This post originally appeared on the Pinterest Engineering Blog by Bryant Xiao.
As a data driven company, we rely heavily on A/B experiments to make decisions on new products and features. How efficiently we run these experiments strongly affects how fast we can iterate. By providing experimenters with real-time metrics, we increase our chance to successfully run experiments and move faster.
We have daily workflows to compute hundreds of metrics for each experiment. While these daily metrics provide important insights about behavior, they typically aren’t available until the next day. What if the triggering isn’t correct so that Pinners are not actually logged? What if there’s a bug that causes a big drop in the metrics? What about imbalanced groups? Before the real-time dashboard, there was no way to tell until the next day. Also, any subsequent changes / corrections would require another day to see the effect, which slows us down.
The real-time experiment dashboard solves these problems. Here we’ll share how we build the real-time experiment metrics pipeline, and how we use it to set up experiments correctly, catch bugs and avoid disastrous changes early, including:
- Setting up the real-time data pipeline using SingleStore
- Building the hourly and on-demand metrics computation framework
- Use cases for real-time experiment metrics
Data Pipeline
Below is the high level architecture of the real-time experiment metrics.
Read Post

Engineering
Should You Use a Rowstore or a Columnstore?
The terms rowstore and columnstore have become household names for database users. The general consensus is that rowstores are superior for online transaction processing (OLTP) workloads and columnstores are superior for online analytical processing (OLAP) workloads. This is close but not quite right — we’ll dig into why in this article and provide a more fundamental way to reason about when to use each type of storage.
Background
One of the nice things about SQL-based databases is the separation of logical and physical concepts. You can express logical decisions as schemas (for example, the use of 3NF) and SQL code (for example, to implement business logic), and for the most part avoid thinking about physical implementation details or runtime considerations. Then, based on what your workload is trying to do, you can make a series of physical decisions which optimize performance and cost for your workload. These physical decisions include where to put indexes, what kind of indexes to use, how to distribute data, how to tune the database, and even which database product to use (if you use ANSI SQL). Importantly, making physical decisions does not require changing SQL code.
Until recently, indexes were almost always backed by B-Trees and occasionally hash tables. This started to change when Sybase IQ and then more popularly C-Store/Vertica hit the market and provided incredible cost-savings and performance for data-warehouse workloads with columnstore indexes. Columnstores have hit the mainstream and are the primary storage mechanism in modern data-warehouse technology (e.g. Redshift, Vertica, HANA) and are present in mainstream databases (Oracle, SQL Server, DB2, SingleStore). Nowadays, one of the key physical decisions for a database workload is whether to use a rowstore or columnstore index.
Performance Considerations
Let us frame the discussion about when to use a rowstore or columnstore by boiling down the fundamental difference in performance. It’s actually quite simple:
- Rowstores are better at random reads and random writes.
- Columnstores are better at sequential reads and sequential writes.
Feeling déjà vu? This is a fairly familiar concept in computer science, and it’s pretty similar to the standard tradeoff between RAM and disk. This reasoning also dispels several myths around rowstores and columnstores:
- Is a RAM-based rowstore faster than a disk-based columnstore? Not necessarily — if the workload has sequential reads (e.g. an analytical workload with lots of scans) a columnstore can be significantly faster.
- Are writes slow in a columnstore? Not necessarily — if the writes are mostly ordered and you don’t need to run updates, then a columnstore can be as fast or even faster to write into than a rowstore, even for relatively small batches.
- Are columnstores bad at concurrent writes? It depends on the type of disk. Both rotational and solid-state disks are good at sequential writes, but solid-state disks tend to be significantly faster to write into concurrently; therefore, columnstores running on SSDs can be very fast at concurrent writes.
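To make the layout difference concrete, here is a minimal sketch (all names and data invented for illustration) of the same table stored row-wise and column-wise. An analytical scan over one column reads a single contiguous array in the columnar layout, while the row-wise layout must touch every tuple even though it only needs one field:

```python
# Row-wise: one tuple per row, all columns stored together.
row_store = [
    (1, "alice", 42),
    (2, "bob", 17),
    (3, "carol", 99),
]

# Column-wise: one contiguous array per column.
col_store = {
    "id":    [1, 2, 3],
    "name":  ["alice", "bob", "carol"],
    "score": [42, 17, 99],
}

def scan_rowstore(rows):
    # Must walk every whole tuple just to extract a single column.
    return sum(r[2] for r in rows)

def scan_colstore(cols):
    # Reads only the one column it needs, laid out sequentially.
    return sum(cols["score"])

assert scan_rowstore(row_store) == scan_colstore(col_store) == 158
```

Both scans compute the same answer; the difference is how much data each one has to move, which is exactly what dominates on large sequential scans.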
Rowstores for Analytics, Columnstores for Transactions
Let’s look at a few use cases which violate the common belief that rowstores are superior for transactions and columnstores are superior for analytics. These are based on workloads that we’ve seen at SingleStore, but these observations are not specific to SingleStore.
In analytical workloads, a common design choice is whether to append (aka log-oriented insertion) or update/upsert. For operational analytics, the upsert pattern is especially common because by collapsing overlapping rows together with an update, you partially-aggregate the result set as you write each row, making reads significantly faster. These workloads tend to require single-row or small-batch random writes, so a rowstore is a significantly better choice as columnstores can’t handle this pattern of writes at any reasonable volume. As an aside, the read pattern is often still scan-oriented, so if the data were magically in a columnstore, reads could be a lot faster. It still makes sense to use a rowstore, however, because of the overwhelming difference in write performance.
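The upsert pattern described above can be sketched with standard SQL upsert syntax (shown here via Python’s built-in sqlite3 module; the schema and names are hypothetical). Each incoming event is a single-row random write that collapses into a partially-aggregated row, so reads see pre-aggregated counts:

```python
import sqlite3

# Partially-aggregated table: one row per page, updated in place.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE page_stats (
        page_id INTEGER PRIMARY KEY,
        views   INTEGER NOT NULL
    )
""")

def record_view(page_id):
    # Single-row random write: insert a new row, or bump the existing one.
    conn.execute("""
        INSERT INTO page_stats (page_id, views) VALUES (?, 1)
        ON CONFLICT (page_id) DO UPDATE SET views = views + 1
    """, (page_id,))

for pid in [7, 7, 7, 9]:
    record_view(pid)

rows = dict(conn.execute("SELECT page_id, views FROM page_stats"))
assert rows == {7: 3, 9: 1}
```

Every `record_view` call is exactly the kind of single-row random write that favors a rowstore, even though the eventual reads are scan-oriented.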
Another example of rowstores in analytical workloads is as dimension tables in a traditional star schema analytics workload. Dimension tables often end up on the inside of a join and are seeked into while scanning an outer fact table. We’ve seen a number of customer workloads where we beat columnstore-only database systems simply because SingleStore can back dimension tables with a rowstore and benefit from very fast seeks (SingleStore even lets you use a lock-free hashtable as a type of rowstore index, so you have a pre-built hash join). In this case, rowstores are the superior choice because dimension tables need to be randomly, not sequentially, read.
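The access pattern above can be sketched as a simple hash join (table contents and names are illustrative): the small dimension table is loaded into a hash table, and the large fact table is scanned sequentially with a constant-time seek per row:

```python
dim_products = {            # small dimension table, keyed by product_id
    1: "widget",
    2: "gadget",
}
fact_sales = [              # large fact table: (product_id, amount)
    (1, 10.0), (2, 5.0), (1, 2.5),
]

def hash_join(facts, dim):
    # Sequential scan of the fact table; O(1) random seek into the dimension.
    return [(dim[pid], amount) for pid, amount in facts if pid in dim]

result = hash_join(fact_sales, dim_products)
assert result == [("widget", 10.0), ("gadget", 5.0), ("widget", 2.5)]
```

The fact-table side is sequential (columnstore-friendly), while the dimension side is pure random reads, which is why backing it with a rowstore hash index pays off.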
Finally, columnstores can be used for transactional workloads as well, in particular workloads that are computationally analytic but have operational constraints. A common use case in ad-tech is to leverage a dataset of users and groups (which users belong to) to compute overlap on-demand, i.e. the number of users who are in both Group A and Group B. Doing so requires scanning every row for all users in both Group A and Group B, which can be millions of rows. This computation is significantly faster in a columnstore than a rowstore because the cost of executing the query is dominated by the sequential scan of user ids. Furthermore, sorting by group id not only makes it easy to find the matching user ids but also to scan them with high locality (since all user ids in a group end up stored together). With SingleStore, we were able to get this query to consistently return within 100 ms over 500 billion rows stored in the columnstore. When your workload is operational and fundamentally boils down to a sequential scan, then it can run significantly better in a columnstore. As a side benefit, columnstores offer exceptional data compression, so this workload has a relatively small hardware footprint of fewer than ten nodes on Amazon and fits on SSDs.
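A toy version of the overlap query (membership data and group names invented for illustration): with the membership rows sorted by group id, each group’s user ids are contiguous, so the query reduces to two short sequential scans plus a set intersection:

```python
from bisect import bisect_left, bisect_right

# (group_id, user_id) pairs sorted by group_id -- mimics a columnstore sort key.
memberships = sorted([
    ("A", 1), ("A", 2), ("A", 3),
    ("B", 2), ("B", 3), ("B", 4),
])
group_col = [g for g, _ in memberships]
user_col = [u for _, u in memberships]

def users_in(group):
    # Binary search locates the group's contiguous run of rows;
    # reading that run is a purely sequential scan.
    lo, hi = bisect_left(group_col, group), bisect_right(group_col, group)
    return set(user_col[lo:hi])

overlap = len(users_in("A") & users_in("B"))
assert overlap == 2   # users 2 and 3 belong to both groups
```

The sort key is doing the heavy lifting here: it turns what would be a scattered lookup into two dense ranges, which is exactly the locality argument made above.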
Because these workloads blur the traditional definitions of OLTP and OLAP, they are often referred to as Hybrid Transactional/Analytical Processing (HTAP) workloads.
Conclusions and Caveats
The main takeaway is pretty straightforward. Rowstores excel at random reads/writes and columnstores excel at sequential reads/writes. You can look at almost any workload and determine which bucket it falls into. Of course, there are still a few caveats:
- If you have both random and sequential operations, it’s usually better to go with a rowstore. Rowstores tend to be better at sequential operations than columnstores are at random operations. However, if storage space is a primary design constraint, you may want to still consider using a columnstore.
- Some databases, namely SingleStore and Oracle, have hybrid tables with both rowstore and columnstore functionality. In SingleStore, there is a small rowstore sitting alongside every columnstore table that is used for small-batch writes (e.g. singleton inserts and updates). Oracle redundantly stores rowstore data in an in-memory columnstore to transparently speed up reads which would be faster against column-oriented data (see A Case for Fractured Mirrors). Of course, this comes at the expense of write performance since you have to pay the cost of inserting into both stores every time, rather than whichever is cheapest.
- Sorting in columnstores is a big deal, especially for workloads with tight requirements on performance, because it enables consistent performance with low latency. Generally, commercial columnstores (Redshift, Vertica, SingleStore) support sorting but open source and SQL-on-Hadoop databases do not (Kudu is a notable exception).
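The hybrid-table idea can be sketched in a few lines (this is a deliberately simplified toy, not SingleStore’s actual implementation): small writes land in an in-memory rowstore buffer, and once the buffer fills it is sorted and flushed as an immutable columnstore segment, so the columnstore only ever sees sequential writes:

```python
class HybridTable:
    def __init__(self, flush_threshold=4):
        self.row_buffer = []          # small rowstore for random writes
        self.segments = []            # sorted, immutable "columnstore" segments
        self.flush_threshold = flush_threshold

    def insert(self, key, value):
        self.row_buffer.append((key, value))
        if len(self.row_buffer) >= self.flush_threshold:
            # Sort once, write sequentially: the columnstore's sweet spot.
            self.segments.append(sorted(self.row_buffer))
            self.row_buffer = []

    def scan(self):
        # A real system would merge the sorted segments; a sort suffices here.
        rows = [r for seg in self.segments for r in seg] + self.row_buffer
        return sorted(rows)

t = HybridTable(flush_threshold=2)
for kv in [(3, "c"), (1, "a"), (2, "b")]:
    t.insert(*kv)
assert t.scan() == [(1, "a"), (2, "b"), (3, "c")]
assert len(t.segments) == 1 and t.row_buffer == [(2, "b")]
```

Readers of LSM-tree-based systems will recognize the shape: the design converts a stream of random writes into periodic sequential writes, at the cost of merging on read.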
Thanks to Gary Orenstein, Andy Pavlo, and Nikita Shamgunov for reviewing drafts of this post.
This is a repost of an article by Ankur Goyal, VP of Engineering, published on Medium.
Read Post

Engineering
dbbench: Bringing Active Benchmarking to Databases
In my last blog post, I investigated a Linux performance issue affecting a specific customer workload. In this post, I will introduce the tool I created to drive that investigation.
Read Post

Engineering
Investigating Linux Performance with Off-CPU Flame Graphs
The Setup
As a performance engineer at SingleStore, one of my primary responsibilities is to ensure that customer Proof of Concepts (POCs) run smoothly. I was recently asked to assist with a big POC, where I was surprised to encounter an uncommon Linux performance issue. I was running a synthetic workload of 16 threads (one for each CPU core). Each one simultaneously executed a very simple query (`select count(*) from t where i > 5`) against a columnstore table.
In theory, this ought to be a CPU bound operation since it would be reading from a file that was already in disk buffer cache. In practice, our cores were spending about 50% of their time idle:
Read Post

Engineering
The Lambda Architecture Isn’t
The surest sign you have invented something worthwhile is when several other people invent it too. That means the creative pressure that gave birth to the idea is more general than your particular situation.
Read Post

Engineering
Introducing a Performance Boost for Spark SQL, Plus Python Support
This month’s SingleStore Ops release includes performance features for Streamliner, our integrated Apache Spark solution that simplifies creation of real-time data pipelines. Specific features in this release include the ability to run Spark SQL inside of the SingleStore database, in-browser Python programming, and NUMA-aware deployments for SingleStore.
We sat down with Carl Sverre, SingleStore architect and technical lead for Ops development, to talk about the latest release.
Q: What’s the coolest thing about this release for users?
I think the coolest thing for users is that we now support Python as a programming language for building real-time data pipelines with Streamliner. Previously, users needed to code in Scala, which is less popular, more constrained, and harder to use. In contrast, Python syntax is widely in use by developers, and has a broad set of programming libraries providing extensibility beyond Spark. Users can import Python libraries like NumPy, SciPy, and Pandas, which are easy to use and feature-rich compared to corresponding Java / Scala libraries. Python also enables users to prototype a data pipeline much faster than with Scala. To allow users to code in Python, we built SingleStore infrastructure on top of PySpark and also implemented a ‘pip’ command that installs any Python package across machines in a SingleStore cluster.
Read Post

Engineering
Technical Deep Dive into SingleStore Streamliner
SingleStore Streamliner, an open source tool available on GitHub, is an integrated solution for building real-time data pipelines using Apache Spark. With Streamliner, you can stream data from real-time data sources (e.g. Apache Kafka), perform data transformations within Apache Spark, and ultimately load data into SingleStore for persistence and application serving.
Streamliner is a great tool for developers and data scientists since little to no code is required – users can instantly build their pipelines.
For instance, a non-trivial yet still no-code-required use case is: pulling data in a comma-separated value (CSV) format from a real-time data source; parsing it; then creating and populating a SingleStore table. You can do all this within the Ops web UI, depicted in the image below.
As you can see, we have simulated the real-time data source with a “Test” that feeds in static CSV values. You can easily replace that with Kafka or a custom data source. The static data is then loaded into the hr.employees table in SingleStore.
Read Post