In my last blog post, I investigated a Linux performance issue affecting a specific customer workload. In this post, I will introduce the tool I created to drive that investigation.
Recently, a customer was running a test where data was loaded into SingleStore via `LOAD DATA`. The customer's third-party benchmarking tool found that SingleStore took twice as long to load the same amount of data as a competing database; however, the numbers reported by this tool did not make sense. Local tests had shown SingleStore could process ~200 MB/s of data on a single machine, but the third-party tool reported SingleStore was only processing ~60 MB/s. Looking at raw TCP socket data on both sides of the connection, I realized that the _benchmarking tool_ was inadvertently throttling itself when running against SingleStore. When I used the standard `mysql` client to run the `LOAD DATA` command directly, performance doubled: SingleStore saturated the network card and matched competitor performance at 128 MB/s.
The problem here was that the customer ran their tool passively and looked only at the final result. Instead, I follow a methodology known as _active benchmarking_, a technique for validating the results of a benchmark. The original description summarizes the active benchmarking process quite well:
> While the benchmark is running, analyze the performance of all components involved using other tools, to identify the true limiter of the benchmark.
Every active benchmarking session is a fully-fledged performance investigation that takes more time and effort. But it turns up the actual bottleneck and reveals the maximum possible performance.
When I first joined SingleStore, I found myself frequently writing one-off scripts to simulate complicated workloads for these active benchmarking investigations. Having previously used a similar tool to test filesystems, I wanted an equivalent for databases: a way to define a custom workload and run it efficiently. The existing tools turned out to be fairly restrictive.
In addition, I had concerns about the performance overhead of these thread-based workload drivers that generate queries or data. I commonly run the workload generator on the database server itself, where it is unacceptable for the generator to compete with the database for resources. I needed the tool to be as lightweight as possible.
Ultimately, these concerns motivated me to write a new tool: a fast, lightweight database workload generator that executes a workload defined in a flexible configuration file. It can behave like the other stand-alone testing tools, but it really shines when actively benchmarking complicated workloads. The following configuration file describes a common analytics pattern: streaming writes into a table, periodic deletion of old rows, and analytics queries running concurrently:
```ini
[streaming inserts]
; Insert 100 rows per second.
query=insert into t (value) select rand() * 1000
rate=100

[clean old rows from table]
; Every 100s, clean old rows from the table
query=delete from t where insert_date < now() - interval 1 hour
rate=0.01

[read nearest values]
; 8 clients reading as fast as possible.
query=select count(*) from t where value between 900 and 1000
queue-depth=8
```
Given this configuration file, the tool will run forever, logging simple statistics about each job. While it is running, I can easily observe interesting performance behaviors, such as how the performance of the read queries changes dramatically after the table is cleaned up. In the log snippet below, the average latency of the "read nearest values" job drops from 225ms to 56ms after the table is cleaned:
```
2016/01/20 02:56:06 streaming inserts: latency 3.078086ms±1.173286ms; 100 transactions (100.177 TPS); 100 rows (100.177 RPS)
2016/01/20 02:56:06 read nearest values: latency 225.831701ms±63.85481ms; 39 transactions (28.282 TPS); 39 rows (28.282 RPS)
2016/01/20 02:56:06 clean old rows from table: latency 7.824990134s±0; 2 transactions (0.155 TPS); 3259108 rows (253306.618 RPS)
2016/01/20 02:56:07 streaming inserts: latency 2.785273ms±931.039µs; 100 transactions (100.870 TPS); 100 rows (100.870 RPS)
2016/01/20 02:56:07 read nearest values: latency 56.268517ms±5.004933ms; 143 transactions (131.732 TPS); 143 rows (131.732 RPS)
```
I recently used a very similar configuration to test a customer workload and noticed that write query throughput dropped precipitously after the table cleanup executed. It turned out that the background log merger thread was misconfigured to be too aggressive on the SingleStore instance. When the table cleanup ran, the background thread kicked in and consumed so many disk IOPS that the transaction log was unable to keep up with incoming writes.
The tool was instrumental in this investigation because it provided a simple way to describe the workload and a reliable way to execute it. By monitoring and questioning the data throughout the execution of the benchmark, I was able to observe and fix a complicated performance anomaly.
Version 0.1 is publicly available under an Apache license. I'm actively improving and stabilizing the tool, and SingleStore is already using it for many customer proofs of concept and for performance testing. Try it out or contribute on GitHub today.