We wanted to evaluate and understand system performance on modern analytics applications.

To this end, our goal was to create a suitable test suite and use that test suite to benchmark databases that claim to provide exceptional performance for modern analytics applications.
In this blog, we describe the first part of our test suite — which focuses on join performance — and our efforts benchmarking SingleStore vs. ClickHouse using this test suite.
Data set and queries
We chose to use the TPC-DS dataset for our analysis. The TPC-DS dataset and benchmark exercise many features needed by modern analytical workloads, use a snowflake schema, and are based on a realistic use case.
In addition to a dataset, we needed a query workload. The full TPC-DS query set contains 99 queries that use joins, common table expressions and more. While all of these features are used by modern analytical workloads, not all systems can run all of the TPC-DS queries.
Thus, we created a custom query workload that includes selected TPC-DS queries and custom queries inspired by the TPC-DS queries. All queries in our workload use the TPC-DS schema. We kept the query set small so we could better understand the reasons behind each system's performance, rather than having to take a prescribed workload on trust, as one must with the full TPC-DS suite.
Analysis
We compared SingleStore and ClickHouse using this workload. Both systems claim to have excellent analytics performance and have significant overlap in their target use cases. Our aim was to tease apart the performance differences and identify where each database excels and why.
For our testing environment, we used:
- A 1TB TPC-DS dataset
- ClickHouse Cloud running on two replicas, each with 59 vCores and 236 GB RAM, for a total of 118 vCores and 472 GB RAM
- SingleStore Helios® with an S-8 configuration, 64 vCPUs, 512 GB RAM
- The SingleStore Helios and ClickHouse Cloud configurations were chosen to be as close as possible in compute power
Round 1. Is ClickHouse really that bad at joins?
In this round, we ran a set of 10 queries. These queries were designed to test a variety of query shapes, and included multiple types of aggregations and filters. All queries except one included a join.
We used the schema as specified by TPC-DS for both systems.
The results were initially surprising. The majority of the graphs looked similar to the following, which shows SingleStore significantly outperforming ClickHouse on TPC-DS Query 3 on a 1TB dataset (see Appendix for the queries associated with the graphs).
[Figure: runtime of TPC-DS Query 3 on the 1TB dataset, SingleStore vs. ClickHouse]
This second graph shows SingleStore significantly outperforming ClickHouse on a modified TPC-DS Query 52 on a 1TB dataset.
[Figure: runtime of the modified TPC-DS Query 52 on the 1TB dataset, SingleStore vs. ClickHouse]
The pattern in these two graphs was consistent across queries: SingleStore dramatically outperformed ClickHouse, often by orders of magnitude.
So, what was going on?
Round 2. Join order matters
After the admittedly boring results of Round 1, we tried again with an even smaller set of queries and some optimizations for ClickHouse.
For this round, our queries included:
- A "large join". The modified TPC-DS Query 52 from the first round, which joins multiple large tables.
- A "small join". A query that joins a large table with a very small table (see the sketch after this list).
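For reference, the small join has roughly the following shape. This is a minimal sketch, assuming the large store_sales fact table and the small store dimension table from the TPC-DS schema, with a placeholder date filter; the exact query templates we ran are listed in the Appendix.

```sql
-- Minimal sketch of the "small join" shape: a very large fact table
-- (store_sales) joined with a very small dimension table (store).
select s_store_id
      ,count(ss_quantity)  as store_sales_quantitycount
      ,avg(ss_sales_price) as store_sales_spriceave
from store_sales
join store on s_store_sk = ss_store_sk
where ss_sold_date_sk = 2451000   -- placeholder date key; the benchmark draws this at random
group by s_store_id;
```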
Large join: A lesson in the importance of join order
We learned that the ClickHouse optimizer has limited ability to reorder joins, a shortcoming that ClickHouse acknowledges, but one that still surprised us.
Thus, we rewrote the large join query for the ClickHouse tests to reorder the joins, putting the smallest table on the right-hand side of the join (as recommended by ClickHouse) and re-ran the query.
This rewritten query, shown as ClickHouse-Opt in the following graph, did significantly improve ClickHouse's performance.
[Figure: runtime of the large join query, SingleStore vs. ClickHouse vs. ClickHouse-Opt]
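To make the rewrite concrete, here is a minimal sketch of the idea using the tables from the modified Query 52. It uses explicit JOIN syntax and example filter constants for readability; the actual query templates, which use implicit comma joins and randomized parameters, are listed in the Appendix.

```sql
-- As written: the small date_dim table comes first, so the huge
-- store_sales table lands on the right-hand side of a join.
select dt.d_year, item.i_brand_id brand_id, sum(ss_ext_sales_price) ext_price
from date_dim dt
join store_sales on dt.d_date_sk = store_sales.ss_sold_date_sk
join item on store_sales.ss_item_sk = item.i_item_sk
where item.i_manager_id = 1 and dt.d_moy = 11 and dt.d_year = 2000  -- example constants
group by dt.d_year, item.i_brand_id
order by dt.d_year, ext_price desc
limit 100;

-- Rewritten for ClickHouse: the large fact table comes first and the
-- smaller dimension tables sit on the right-hand side of each join.
select dt.d_year, item.i_brand_id brand_id, sum(ss_ext_sales_price) ext_price
from store_sales
join date_dim dt on dt.d_date_sk = store_sales.ss_sold_date_sk
join item on store_sales.ss_item_sk = item.i_item_sk
where item.i_manager_id = 1 and dt.d_moy = 11 and dt.d_year = 2000  -- example constants
group by dt.d_year, item.i_brand_id
order by dt.d_year, ext_price desc
limit 100;
```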
A couple of remarks:
- We did not rewrite the query (to change the join order) for SingleStore as SingleStore's query optimizer automatically reorders joins — as is done by most modern query optimizers.
- Even with the rewritten query, SingleStore still outperformed ClickHouse on this query by approximately a factor of 10.
- Re-writing queries to reorder joins is possible in a benchmark, but is unrealistic in a workload where queries are automatically generated by business intelligence (BI) tools.
Small join: Similar results
The results for the small join were similar: even when joining a large table with a very small table, SingleStore demonstrated superior performance.
[Figure: runtime of the small join query, SingleStore vs. ClickHouse]
Yes, sometimes you really do need a join
It can be argued that data can be denormalized so joins aren't required. It is true that, in certain analytical use cases, a single table may suffice.
However, joins are valuable and in many cases, necessary. For example, joins are required for:
- Privacy-sensitive data separation. Storing privacy-sensitive data in a separate table so that access to it can be restricted.
- Dynamic dimension table updates. Updating a dimension table without rewriting the entire fact table (see the sketch after this list).
- Best-practice data modeling. Maintaining normalized data models or star or snowflake schemas for integrity, storage efficiency and ease of understanding by business analysts.
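As a minimal sketch of the second point, assuming the TPC-DS item dimension and store_sales fact table: correcting a brand name touches only the small dimension table, and every query that joins through the item key immediately sees the change. This is generic SQL (the exact update syntax varies by system), and the brand value and item key below are placeholders.

```sql
-- Correct a brand name in the small item dimension table.
-- The large store_sales fact table does not need to be rewritten.
update item
set i_brand = 'Corrected Brand'   -- placeholder value
where i_item_sk = 12345;          -- placeholder item key

-- Queries that join through the item key pick up the correction immediately.
select item.i_brand
      ,sum(ss_ext_sales_price) as ext_price
from store_sales
join item on store_sales.ss_item_sk = item.i_item_sk
group by item.i_brand;
```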
The join performance gap isn't just a benchmark artifact; modern applications require joins.
Key findings
- Join rewrites and tuning. ClickHouse requires users to manually specify join orders and, in general, requires tuning to perform well. We were surprised at the poor performance of untuned ClickHouse. Tuning requires significant resources, and:
- Distracts focus from delivering business value
- Presents a significant operational burden
- Increases total cost of ownership (TCO)
- Out-of-the-box performance. SingleStore's robust performance suggests lower operational complexity and faster time-to-value for organizations implementing analytical solutions.
- Join performance. SingleStore substantially outperformed untuned ClickHouse, and still outperformed ClickHouse even after the join rewrites.
- The performance differences we observed have direct implications for organizations building AI applications and analytical workloads.
- Join operations are fundamental to real-world analytical workloads, and ClickHouse's limitations in this area will force organizations to choose between query performance and data architecture best practices — a compromise that shouldn't be necessary in modern analytical systems.
Conclusion
While ClickHouse has earned recognition in the analytics space, our testing reveals performance gaps and operational complexity challenges that organizations should carefully consider. In short, SingleStore delivers superior out-of-the-box performance across diverse workload types, while maintaining the simplicity and reliability enterprises need for production AI and analytics applications.
When it comes to real-world analytical workloads, SingleStore provides the performance and operational simplicity that modern enterprises require for their AI and analytics initiatives — a simplified data architecture that provides single-shot retrieval for AI applications with real-time insights on operational data: transactions, analytics, search and more.
Want to test how SingleStore outperforms ClickHouse yourself? Start free today.
Benchmarking performed by benchANT.
Appendix
Methodology
These results used a 1TB TPC-DS dataset with the schema as defined by TPC-DS. The systems used were SingleStore Helios S-8 (v8.9.29) and ClickHouse Cloud (v24.12) with two replicas of 59 vCores and 236 GB RAM each. Both database instances were deployed on AWS in the Frankfurt region (eu-central-1).
The workload generator was implemented as an extension to the popular and widely known Yahoo Cloud Serving Benchmark (YCSB). More precisely, we added a custom workload implementation to execute the set of queries required for our experiments. For running the experiments, the workload generator was executed on a virtual machine (type c5.4xlarge) deployed in the same region as the database instance.
For all experiments discussed in this blog entry, we chose a set-up where queries are issued sequentially, i.e., one after the other. For each experiment, a set of queries was selected, and the workload generator chose the next query to issue uniformly at random from that set. So if query A, query B and query C were selected for an experiment, the workload generator would choose each of them with a probability of ⅓.
For round one, the workload was a mix of ten queries including TPC-DS query 3 (listed below) and analytics queries inspired by TPC-DS queries. For round two, we used three different queries. In several experiments, we evaluated all permutations of the JOIN order of all queries.
Key queries
Figure 1. TPC-DS Query 3
```sql
define AGGC = text({"ss_ext_sales_price",1},{"ss_sales_price",1},{"ss_ext_discount_amt",1},{"ss_net_profit",1});
define MONTH = random(11,12,uniform);
define MANUFACT = random(1,1000,uniform);
define _LIMIT = 100;

select dt.d_year
      ,item.i_brand_id brand_id
      ,item.i_brand brand
      ,sum([AGGC]) sum_agg /* sum(ss_ext_sales_price) sum_agg */
from date_dim dt
    ,store_sales
    ,item
where dt.d_date_sk = store_sales.ss_sold_date_sk
  and store_sales.ss_item_sk = item.i_item_sk
  and item.i_manufact_id = [MANUFACT]
  and dt.d_moy = [MONTH]
group by dt.d_year
        ,item.i_brand
        ,item.i_brand_id
order by dt.d_year
        ,sum_agg desc
        ,brand_id
limit 100;
```
Figure 2. Modified TPC-DS Query 52
```sql
define MONTH = random(11,12,uniform);
define YEAR = random(1998,2002,uniform);
define _LIMIT = 100;

select dt.d_year
      ,item.i_brand_id brand_id
      ,sum(ss_ext_sales_price) ext_price
from date_dim dt
    ,store_sales
    ,item
where dt.d_date_sk = store_sales.ss_sold_date_sk
  and store_sales.ss_item_sk = item.i_item_sk
  and item.i_manager_id = 1 /* reduces to 5000 from 300000 */
  and dt.d_moy = [MONTH]
  and dt.d_year = [YEAR]
group by dt.d_year
        ,item.i_brand_id
order by dt.d_year
        ,ext_price desc
limit 100;
```
Figure 3. Modified TPC-DS Query 52 with reordered joins
```sql
define MONTH = random(11,12,uniform);
define YEAR = random(1998,2002,uniform);
define _LIMIT = 100;

select dt.d_year
      ,item.i_brand_id brand_id
      ,sum(ss_ext_sales_price) ext_price
from store_sales
    ,date_dim dt
    ,item
where dt.d_date_sk = store_sales.ss_sold_date_sk
  and store_sales.ss_item_sk = item.i_item_sk
  and item.i_manager_id = 1 /* reduces to 5000 from 300000 */
  and dt.d_moy = [MONTH]
  and dt.d_year = [YEAR]
group by dt.d_year
        ,item.i_brand_id
order by dt.d_year
        ,ext_price desc
limit 100;
```
Figure 4. Small join query (store_sales joined with store)
```sql
define SDT = random(2450816,2452642,uniform);

select s_store_id
      ,count(ss_quantity) as store_sales_quantitycount
      ,avg(ss_quantity) as store_sales_quantityave
      ,stddev_samp(ss_quantity) as store_sales_quantitystdev
      ,stddev_samp(ss_quantity)/avg(ss_quantity) as store_sales_quantitycov
      ,sum(ss_sales_price) as store_sales_spricesum
      ,avg(ss_sales_price) as store_sales_spriceave
      ,avg(ss_wholesale_cost) as store_sales_wscostave
      ,avg(ss_wholesale_cost) as store_sales_wscostave
      ,sum(ss_list_price) as store_sales_lpricesum
      ,avg(ss_list_price) as store_sales_lpriceave
      ,sum(ss_ext_tax) as store_sales_taxsum
      ,max(ss_ext_sales_price) as store_sales_espricemax
      ,min(ss_ext_list_price) as store_sales_elpricemin
from store_sales
    ,store
where s_store_sk = ss_store_sk
  and ss_sold_date_sk = [SDT]
group by s_store_id;
```