We Spent a Bunch of Money on AWS And All We Got Was a Bunch of Experience and Some Great Benchmark Results


John Sherwood

Senior Software Engineer


Editor’s note: Running these benchmarks and documenting the results is truly a team effort. In addition to John Sherwood, Eric Hanson and Nick Kline authored this blog post.

If you’re anything like us, you probably roll your eyes at “company benchmarks its own software, writes a post about it.” This of course raises a question: if we’re cynical industry veterans, numb from the constant deluge of benchmarketing … why are we writing this? Simple: we wanted to prove to ourselves that we’re the only modern, scalable database that can do a great job on the three well-known database benchmarks, TPC-C, TPC-H, and TPC-DS, which cover both OLTP and data warehousing. Now, we hope to get you to suspend your skepticism long enough for us to prove our capabilities to you.

SingleStore’s primary focus is on what we call operational analytics (you might have heard this referred to as HTAP, translytical, or HOAP), running analytical queries across large, constantly changing datasets with consistently high performance. This performance is provided through the use of scale-out, compilation of queries to machine code, vectorized query execution, and use of single instruction, multiple data (SIMD) instructions. Our operational analytics capability blurs OLTP and data warehouse (DW) functionality, and different use cases take advantage of different slices of our spectrum of capabilities.

To test some of the new features in our upcoming 7.0 release, we used multiple TPC benchmarks to push SingleStore beyond what our normal everyday stress tests can reach. Our results show that we can do both transaction processing and data warehousing well, and we scale well as workload size increases. No other scale-out database running on industry-standard hardware can do this.

Playing Fair

TPC benchmarks have a certification process, and we did our best to follow the specifications, but we did not work with TPC to certify the benchmarks, so these are informal results. But we hope it goes without saying that we didn’t cheat. In fact, SingleStore does not have any “benchmark specials” (features designed just to make the benchmark result better, but that nobody would ever use for a real application) built into the code.

The TL;DR

This is a long post because we wanted to write an in-depth overview of how we ran the benchmarks so you can see how we achieved all these results. But if you want just a quick summary, here’s how we did on the benchmarks.

  • TPC-C: SingleStore scaled performance nearly linearly over a scale factor of 50x.
  • TPC-H: SingleStore’s performance against other databases varied on this benchmark, but was faster than multiple modern scale-out database products that only support data warehousing.
  • TPC-DS: SingleStore’s performance ranged from similar, to as much as two times faster than other databases. Expressed as a geometric mean, as is often done for benchmarking results, our performance was excellent.

What we accomplished was solid performance across three distinct benchmarks, establishing SingleStore as a top choice database for operational analytics and cloud-native applications. Now let’s move on to the details of the benchmarks.

TPC-C: Scale-out OLTP

On the massively transactional side of the spectrum, we have the TPC-C benchmark. To quote the 1997 SIGMOD presentation announcing it, a benchmark is a “distillation of the essential attributes of a workload,” and TPC-C distills the absolute hell out of a sharded transactional workflow. As the scale (defined by the number of warehouse facilities being emulated) increases, the potential parallelism increases as well. From our perspective, this is ideal for discovering any bottlenecks in our own code.

Our TPC-C database design used single tables for the large data sets, allowing us to use an existing driver targeting MySQL. Unlike some other official results published by major database vendors, we did not artificially split the data into multiple separate tables to reduce locking and contention. The way we ran the benchmark is much simpler, and shows how our scale-out architecture can make application development easier.

While experimenting with smaller datasets, we quickly discovered that driving significant load required us to nearly match the CPU of the aggregators with the leaves in the cluster. This configuration, strong all over (“very Schwarzenegger,” as one of our engineers put it), is quite unusual; it’s rare for customers to require such a ratio of aggregators to leaves.

In part, this is because our internal benchmarking service colocates the benchmark drivers on the aggregators, which required us to size up those boxes to provide the extra CPU. Additionally, under the pure OLTP TPC-C workload, aggregators primarily coordinate transaction state and are effectively concurrency bound.

As we would recommend for a real workload of this nature, we had redundancy and synchronous replication enabled. This means that there are two copies of every data partition on two different nodes for high availability (HA), and transaction commits are not acknowledged to the client until both the primary and secondary replicas are updated. This of course requires twice the memory compared to running without HA, and adds to transaction overhead, since row updates must be applied on the replica node before the commit. This cost is to be expected if HA with fast failover is required. If you want HA, you need to pay.

After experimenting with smaller datasets, we drove just over 800,000 transactions per minute (tpmC) against 10,000 warehouses, using a cluster with 18 leaves and 18 aggregators, all r3.8xlarge instances. This led us to scale up, moving onto TPC-C at the 50,000 scale. This required what we would term a formidable cluster; storing a single replica of the data (including indexes) required more than 9 TB of RAM.

For HA, we stored two copies of the data, which, after some fairly straightforward math, took us to almost 20 TB. We decided to use r5.metal instances, using 2x the leaf instances, for a total of 6x the cores compared to the 10,000 warehouse cluster. Notably, the only change we made in running the benchmark itself was in tuning the partition count and hash index buckets; from the perspective of the open source benchmark driver we were using, originally written for MySQL, everything just worked.

The last notable thing about our 50,000 warehouse run was that we ran out of hardware from AWS while trying to reach the limit of the cluster; as we had taken 36 hosts for leaves and 21 for aggregators, we suppose that’s pretty reasonable.

With 1,728 physical cores on the leaf nodes for the megacluster (to use the scientific term), it’s hard to argue with calling this “throwing hardware at the problem.” However, as discussed above, we were able to easily meet the same per-core performance as at smaller scales. We feel that our overall results are a strong validation of the performance improvements to our new intra-cluster replication implementation in SingleStoreDB Self-Managed 7.0.

Figure 1 – TPC-C results for SingleStore.

Scale (warehouses) | Leaf vCPUs
1,000 | 48
10,000 | 576
50,000 | 3,456

Table 1 – Leaf Cores, normalized to vCPUs. This table only includes leaf vCPUs for SingleStore, as we also had to use aggregator CPU to run the benchmark driver instances. Full details are included in Appendix A.

Table 1 lists only leaf vCPUs because, given the way our benchmark kit is designed, it was easiest to run the application threads driving the transaction work on the aggregator machines; that driver capacity is normally not accounted for in benchmark results.

The aggregator nodes were significantly over-provisioned for this test and were not fully utilized, though we did not determine the minimum amount of equipment needed to reach the performance shown in Figure 1. Despite these caveats, the results show the dramatic benefits of our distributed architecture, in-memory skiplist and hash storage structures, and compilation of queries to machine code.

While the above data is a beautiful (in our eyes) illustration of our ability to consume an OLTP firehose, these results are only possible by violating the rate limiting requirement found in the TPC-C specification. TPC-C is intended to emulate a limited number of physical terminals being interacted with by humans, and so more faithful drivers are limited to only pushing a handful of tpmC per warehouse. In a great example of why the TPC and the concept of audited results exist, we explicitly state here that these results are not comparable to others on that basis.

Per-Core Throughput for OLTP

It’s comforting to know that you’re getting your money’s worth as you scale a cluster and workload to larger sizes. Figure 2 illustrates the ability to maintain throughput per core for a transaction processing workload as data and cluster size increase. This translates to roughly linear scalability in throughput as the size of the transaction processing workload increases. Yet you can still run analytical queries across all the data, unlike the common approach of manually sharding large data sets across multiple database instances with a single-node DBMS, such as Postgres, MySQL, or SQL Server; SingleStore can run analytic-style queries on this data in the same instance that is handling transaction processing.

Figure 2 — Per vCPU transaction throughput (includes aggregator vCPUs). When increasing the dataset size and the hardware to match, SingleStore TPC-C throughput per vCPU held within a 30% band. We had plenty of headroom on the leaves at the 50,000 scale, and could have driven more throughput, but AWS ran out of r5.metal hosts for us to use as benchmark drivers, which was a pretty cool consolation prize in our opinion.

TPC-H and TPC-DS: DW Scalability and Then Some

On the opposite end of the spectrum, we have TPC-H (H) and TPC-DS (DS), categorized by the TPC as “decision support” benchmarks. H and DS use similar datasets, and DS is effectively the next-generation version of H. While H generates fairly straightforward queries and tends to be shard-friendly, DS thrills in its use of advanced SQL features and functions and exhilarates in its lopsided filters. Running DS is notoriously, purposefully difficult, and SingleStore can run all 99 DS queries, while many decision support-oriented databases cannot.

TPC-DS

At scale factor 10,000, the largest TPC-DS table contains just shy of 29 billion rows, with some 24 billion others spread out across the rest of the tables. TPC-DS then runs a set of 99 queries using a formidable array of operators and features against this large data set. Below, we show all published official TPC-DS benchmark results to date [AL19, AL19b, Cis19], comparing the total query runtimes with our unofficial SingleStore run. In addition, we sought out one more informal result on a well-known cloud data warehouse product, which we refer to as “Cloud DW Product.” Although SingleStore’s core strength is operational analytics, these comparisons show that we are also competitive with data warehouse and big data products on a very challenging pure data warehouse workload.

Figure 3 — comparative TPC-DS runtimes at 10TB scale factor — lower is better. For TPC-DS, we performed a ‘power run,’ where the sequential runtime of each query is summed. Please see the differences in benchmark versions and hardware used, described below.

Product | Physical Cores (not vCPUs) | TPC-DS Version | Certified Result | Power Run Time (s) | Geometric Mean (s)
SingleStore | 640 | 1.4.0 | no | 6,495 | 6.03
Cloud DW Product | 32 nodes, unknown size | 1.4.0 | no | 4,481 | 11.03
Alibaba AnalyticsDB | 512 | 2.10.1 | yes | 3,935 | 9.81
Alibaba Hadoop | 520 | 2.10.1 | yes | 16,207 | 81.96
Transwarp Data Hub | 1,024 | 2.7.0 | yes | 21,615 | 72.73

Table 2 – Hardware used for TPC-DS runs and results, Power Run times, and Geometric Mean times. We normalized to physical cores rather than vCPUs to match TPC usage.

The values displayed in Figure 3 and Table 2 don’t imply a strict ranking of the systems because of the hardware and version differences. Nevertheless, we believe that the overall results illustrate important relative strengths of the different systems, because the hardware for all the systems is of the same order of magnitude in scale, and the queries are largely equivalent between versions.

We’ve included the geometric mean of the results in Table 2. The geometric mean is the nth root of the product of n numbers; for TPC-DS, that is the 99th root of the product of the 99 query run times. The useful property of a geometric mean for benchmark results is that it de-emphasizes outliers, which is why many consider it the most useful single value when reporting results of this type.
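In formula form, for the n = 99 query run times t_1, ..., t_n:

\text{geometric mean} = \left( \prod_{i=1}^{n} t_i \right)^{1/n} = \exp\!\left( \frac{1}{n} \sum_{i=1}^{n} \ln t_i \right)

A single very long-running query therefore moves the geometric mean far less than it moves the summed power run time.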

For the result shown in Figure 3, where we spent about 6,500 seconds to run the queries, we used a cluster consisting of 19 i3.16xlarge leaves and a single i3.16xlarge aggregator instance. With over 600 leaf CPUs, this was certainly a sizable (and, we have been informed by our finance team, expensive) cluster, but still similar in scale to other clusters that successfully completed the benchmark.

TPC-H

Figure 4 — TPC-H total runtimes comparing SingleStore and SQL Server — smaller is better. We performed the same ‘power run’ with TPC-H as for DS, where the result is the sum of the run times of the sequentially executed queries that constitute the benchmark. This figure compares SingleStore, SQL Server 2017 at scale factor 10,000, and SQL Server 2016 at scale factor 30,000 (10 TB and 30 TB).

Compared to TPC-DS, TPC-H is a much friendlier (and older) benchmark. We ran H at the 10 TB scale factor on a cluster of 8 i3.8xlarge leaf nodes, and then scaled to run it at 30 TB using 24 i3.8xlarge leaf nodes. The 30 TB dataset contains a table with 180 billion rows, compressed to about 11 TB on disk in our columnstore, and took about three and a half hours at an average of 2.4 GB/s to load from S3 using our Pipelines feature.
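As a rough back-of-the-envelope check on that load rate (treating the 30 TB scale factor as roughly 30,000 GB of raw CSV input, which is how TPC-H scale factors are nominally defined):

30{,}000 \text{ GB} \div 2.4 \text{ GB/s} \approx 12{,}500 \text{ s} \approx 3.5 \text{ hours}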

While researching the benchmark, we found that every run submitted to the TPC at these larger scales was for SQL Server, the best of which we’ve graphed next to our results above. As shown in Figure 4, at a lower scale, we are outperformed by a quad-socket host running SQL Server with 224 vCPUs versus our 256. But at a larger size, scale-out catches up to scale-up – and running on commodity hardware versus specially-crafted quad socket monsters is a virtue of its own, as far as data center operators are concerned. Details of the SQL Server results are given in benchmark results published by Microsoft [MS16, MS17].

Figure 5 – Data volume processed per second per core for TPC-H . This chart shows a normalized throughput measure (DB_size_MB/cores/sum_of_query_runtimes_in_seconds) for each of TPC-H 10000 and TPC-H 30000. This is to illustrate how SingleStore scales for the benchmark as data size increases.

What we’re most looking for here is validation of our scale-out architecture. A fantasy, “perfectly scalable” database would process 3x the data with 3x the hardware in the same amount of time; as illustrated in Figure 5, we took about 30% longer than that ideal. Given that queries often require a final aggregation step, some falloff is expected. More importantly, additional hardware continues to deliver substantial returns and does not hit a ceiling in these tests: per-core throughput drops by about 30% while the total work done grows by roughly 300%.
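For reference, the power run times in Appendix A give:

\frac{2{,}371 \text{ s (30 TB)}}{1{,}789 \text{ s (10 TB)}} \approx 1.33

so tripling the data and the hardware cost us roughly a third more wall-clock time than the equal-runtime ideal.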

Figure 6 — comparative TPC-H results at 30 TB scale factor.

We again scoured the web for TPC-H results and found several for well-known cloud data warehouse products, including Amazon Redshift, Microsoft Azure SQL Data Warehouse, and Snowflake Data Warehouse in reports from GigaOm [GO18, GO19], as well as certified single-box results on the TPC web site. As shown in Figure 6, SingleStore provides competitive results compared to these products.

Cost and Value

At this point you might think the results are pretty impressive, but you’re probably also thinking there is some special-purpose hardware helping make these benchmarks run smoothly. Well, there isn’t. Just as we didn’t add any code tricks to make the benchmark run better, we didn’t use anything other than industry-standard hardware. We really wanted to make these benchmarks replicable by anyone who wanted to test them, and you can. We explain how to do that below.

The enterprise edition of SingleStore will cost you dramatically less than legacy vendors such as Oracle, plus we have a free version running on up to a 4 node cluster that supports all enterprise features. Beyond just price, we believe SingleStore offers new value to customers by enabling them to build operational analytics applications, mixing OLTP and analytics operations, in a way that has not been feasible before. This enables real-time dashboards and other analytics on just-born data, driving new business value.

By showing results for TPC-C, TPC-H, and TPC-DS running on one scale-out-capable SQL DBMS, we believe we demonstrate that SingleStore delivers new value. In addition, given the breadth SingleStore covers, once a group of application developers in an organization are trained to use it, they can apply their skills to a broad range of applications, giving flexibility to them and their managers. This of course is also valuable to the business.

Conclusion

We’ve shown that SingleStore can deliver strong results for the TPC-H and TPC-DS analytical benchmarks and the TPC-C transaction processing benchmark. To our knowledge, we’re the only SQL DBMS supporting true shared-nothing scale-out that can do this. Our hardware needs are comparable to competitors when running these benchmarks at similar scale factors, and our software licensing costs are significantly less than those of the established database vendors. SingleStore skills can be used to build a variety of applications, including systems for transaction processing, operational analytics, operational data stores, and data warehousing, without scale limits. This will give you and your team the ability to solve multiple problems at once, in less time, with one powerful tool.

This post was edited to remove a comparison to another company’s published data after we discovered that the comparison was technically inaccurate.

Appendix A: More Detailed Results

TPC-C

The hardware configurations we used for TPC-C for 1,000, 10,000, and 50,000 warehouse scale factors are given in the following table. The node types given are for AWS.

Warehouses | Leaf Nodes | Sum Leaf vCPUs | Sum Leaf RAM | Aggregator Nodes | tpmC
1,000 | 3 x r3.4xlarge | 48 | 366 GB | 3 x r3.4xlarge | 94,900
10,000 | 18 x r3.8xlarge | 1,152 | 4.29 TB | 18 x r3.8xlarge | 800,400
50,000 | 36 x r5.metal | 3,456 | 27 TB | 21 x r5.metal | 5,300,000

We used the open-source MySQL driver from Percona for TPC-C, with slight modifications to the DDL. Specifically, we made the item table into a reference table, the history table into a columnstore table, changed indexes where appropriate to be hashes instead of skiplists, and sized the bucket count appropriately for the size of the dataset. Our cluster was configured to use two replicas for data, replicated synchronously across paired hosts, with asynchronous writes to disk.
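To give a flavor of those changes, here is a minimal sketch of the kind of DDL involved. The column lists are abbreviated and the specific keys shown are illustrative assumptions; the exact types, keys, and bucket counts we used are in our fork of the Percona repository.

-- item is small and read-mostly, so replicate it to every leaf as a reference table
CREATE REFERENCE TABLE item (
  i_id INT NOT NULL,
  i_name VARCHAR(24),
  i_price DECIMAL(5,2),
  PRIMARY KEY (i_id)
);

-- history is insert-only and large, so store it as a disk-backed columnstore,
-- sharded by warehouse like the other large tables
CREATE TABLE history (
  h_c_id INT,
  h_w_id INT,
  h_date DATETIME,
  h_amount DECIMAL(6,2),
  SHARD KEY (h_w_id),
  KEY (h_date) USING CLUSTERED COLUMNSTORE
);

-- on hot rowstore tables, point-lookup indexes can be declared as hash indexes;
-- we sized the hash bucket counts separately for each scale factor
CREATE TABLE new_orders (
  no_o_id INT NOT NULL,
  no_d_id TINYINT NOT NULL,
  no_w_id INT NOT NULL,
  SHARD KEY (no_w_id),
  PRIMARY KEY (no_w_id, no_d_id, no_o_id),
  KEY (no_o_id) USING HASH
);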

Aggregator nodes ran the benchmark driver alongside the SingleStore aggregator instances, with the driver consuming roughly half of the CPU used. Although we used the same instance type for leaves and aggregators out of convenience, aggregators in the TPC-C workload also required very little RAM, so it should be easy to save on costs by running them on lower-end hosts.

We can’t speak highly enough of the driver from Percona and how useful it was in enabling us to run the benchmark. We generated our datasets using the utility from the same repo, albeit modified to generate pipe-separated value files. We then loaded these files in parallel with SingleStore Pipelines, instead of doing direct database inserts.

TPC-H

The hardware configurations we used on AWS for TPC-H are given in the following table.

Scale Factor | Leaf Nodes | Sum Leaf Phys. Cores | Sum Leaf RAM | Aggregator Nodes | Power Run (s)
10 TB | 8 x i3.8xlarge | 128 | 1.9 TB | 1 x m4.10xlarge | 1,789
30 TB | 24 x i3.8xlarge | 384 | 5.72 TB | 1 x m4.10xlarge | 2,371

We generated our datasets using the open-source dataset generator available from tpc.org. In calculating the power run, we performed a warmup query, then averaged the result of 1-5 executions. The total number of runs was sometimes limited due to the total runtime of all the queries and the time available to finish the runs.

TPC-DS

The hardware configuration we used on AWS for TPC-DS is shown in the table below.

Scale Factor | Leaf Nodes | Sum Leaf Phys. Cores | Sum Leaf RAM | Aggregator Nodes | Power Run (s)
10 TB | 19 x i3.16xlarge | 608 | 9.05 TB | 1 x i3.16xlarge | 6,495

We generated our datasets using the dataset generator available from tpc.org. In calculating the power run, we performed a warmup query, then averaged the result of 1-5 executions, depending on the runtime of query execution, in the same fashion as for our TPC-H runs.

Appendix B: How to Play Along at Home

If you’re interested in replicating our results, or experimenting with SingleStore and these benchmarks, we’ve included instructions on how we went about running C, H, and DS here.

TPC-C

To run the TPC-C benchmark, we relied heavily on Percona’s excellent work, which we’ve forked to github.com/memsql/tpcc-mysql. This driver allows both generating the dataset and driving load. As SingleStore is wire compatible with MySQL, we were able to use the driver binary in an effectively unaltered format; in fact, the only changes we made to the repository were as follows:

  • Fixed a units bug that caused queries to be judged as failing if they took longer than 5 milliseconds, rather than 5 seconds.
  • Changed the dataset generator to create pipe-separated value files rather than inserting directly into the database.
  • Altered the schema to have shard keys and to take advantage of SingleStore features such as reference tables and hash indices.

If you’d like to use this driver to verify our results, the README covers how to both generate data and run the benchmark.

Cluster Setup

For the purposes of the TPC-C benchmark, there are several configuration settings that will drastically impact your results. To match what we used, do the following:

  • SingleStoreDB Self-Managed 7.0: While we did recently make our latest and greatest 6.x release available, one of the major improvements in SingleStoreDB Self-Managed 7.0 (download here) is an entirely new system for handling intra-cluster replication more efficiently. While 6.x releases support replication, performance is dramatically improved in 7.0.
  • Enable synchronous replication.
  • Enable asynchronous durability.
  • Once SingleStore has the correct configuration, you will need to create a database, per the Percona README; a minimal sketch follows this list.
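Here is a minimal sketch of that last step. The database name and partition count are just examples (we tuned the partition count per cluster, as noted in the main post), and the replication and durability settings are configured separately.

-- Hypothetical example: create the TPC-C database with an explicit partition count.
-- Synchronous replication and asynchronous durability are cluster/database settings
-- configured per the items above and the SingleStore documentation, not shown here.
CREATE DATABASE tpcc PARTITIONS 128;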

Benchmark Execution

While running the benchmark is exceedingly simple, to fully stress a large SingleStore cluster you will need to spread the load across multiple aggregators. To get relevant results, the driver instances need to run in parallel, and only data from when all drivers are running is a valid representation of the cluster’s performance. We wrote a harness that used the database to coordinate across hosts; however, as it hooks in pretty deeply to our internal benchmarking system, we’re unable to meaningfully open source this extension.

TPC-H & DS

While the TPC-C specification is highly descriptive and gives a wide latitude in how results are achieved, TPC-H and DS are explicitly about executing SQL queries. As such, we were able to leverage the dataset and query generation tools available for free download from tpc.org: [H], [DS].

At a high level, running both benchmarks on SingleStore consists of the following steps:

  1. Data generation
  2. Create the table schema using our script (so that you use properly sized SingleStore data types and our shard key definitions and indexes) using create-tables.sql
  3. Load the data, using SingleStore Pipelines or a tool of your choice
  4. After loading the data, create statistics and optimize the tables; the commands are found in after-load.sql
  5. Run the queries by executing queries.sql

Sizing the SingleStore database cluster

We ran the benchmarks on AWS. For the DS benchmark at the 10 TB size, we used a 20-node cluster with one master aggregator (MA) and 19 leaves; for the H benchmark at 30 TB, we used a cluster with one m4.10xlarge aggregator and 24 i3.8xlarge leaves.

Generating the data

Use the standard data generator for each benchmark, found at http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp. Look under H & DS, register, download the latest version of the tools, and compile the C-based generator. Data generation can take a long time, especially at larger scales like the 10 and 30 TB sizes we used. The data size of the CSVs will be quite close to the nominal scale factor, and we highly recommend asynchronously compressing and uploading them to cloud storage; we stored the generated data files in AWS S3.

Creating the schema

You can find the table schema and queries in the SingleStore GitHub repo at memsql/benchmarks-tpc, under the TPC-H or TPC-DS subdirectory as appropriate. You can clone or copy them to your local machine by using the command

git clone git@github.com:memsql/benchmarks-tpc.git

In the benchmarks-tpc/tpcds and benchmarks-tpc/tpch directories you can find the files we used to create and run each benchmark: the DS and H versions of create-tables.sql, after-load.sql, and queries.sql. Two common ways to submit queries to SingleStore are the command-line MySQL client and the SingleStore tools. The SingleStore Studio tools, including the SQL editor, are described at https://docs.singlestore.com/db/latest/en/user-and-cluster-administration/cluster-management-with-tools/singlestore-db-studio-overview.html.
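As a taste of what create-tables.sql contains, a TPC-DS fact table ends up looking roughly like the sketch below. The column list is abbreviated and the particular shard and sort keys shown are illustrative assumptions; the repository has the exact definitions we used.

-- Illustrative sketch of a columnstore fact table for TPC-DS
CREATE TABLE store_sales (
  ss_sold_date_sk INT,
  ss_item_sk INT NOT NULL,
  ss_customer_sk INT,
  ss_ticket_number BIGINT NOT NULL,
  ss_quantity INT,
  ss_net_paid DECIMAL(7,2),
  -- spread rows evenly across partitions on a high-cardinality key
  SHARD KEY (ss_item_sk, ss_ticket_number),
  -- store as a columnstore, sorted by the date surrogate key for scan-heavy queries
  KEY (ss_sold_date_sk) USING CLUSTERED COLUMNSTORE
);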

Loading the data

Depending on the system your test cluster is running on, there are various alternatives for loading data. The SingleStore Pipelines feature loads data in parallel; see https://archived.docs.singlestore.com/v6.7/concepts/pipelines/pipelines-overview/. We used SingleStore Pipelines to load data from S3 for our tests. An example pipeline command follows.

CREATE PIPELINE catalog_sales AS
LOAD DATA S3 "<memsqlpath>/sf_10000/catalog_sales."
CONFIG '{"region":"us-east-1","disable_gunzip":false}'
CREDENTIALS '{"aws_access_key_id": "<key>", "aws_secret_access_key": "<secret>"}'
INTO TABLE catalog_sales
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '|\n';

After loading

The background flusher in SingleStore automatically and asynchronously reorganizes columnstore segment files (the internal files used to store columnstore tables) for compression and to create a total ordering across all segment files in a table. To make query execution faster and avoid asynchronous processes during query execution, we manually ran the command “optimize table”, as in OPTIMIZE TABLE call_center; see https://archived.docs.singlestore.com/v6.7/reference/sql-reference/data-definition-language-ddl/optimize-table/

SingleStore supports manually created histogram statistics. You can create them by running ANALYZE TABLE on each table, for example: ANALYZE TABLE call_center COLUMNS ALL ENABLE

We did this before running the queries. These two commands are provided in after-load.sql.
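Put together, the per-table entries in after-load.sql are just these two statements, repeated for every table in the schema (call_center shown here):

-- after-load.sql (excerpt)
OPTIMIZE TABLE call_center;
ANALYZE TABLE call_center COLUMNS ALL ENABLE;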

Running the queries

We ran the queries multiple times and used the average run time, not counting the initial run in which the query was compiled. We used columnstore tables for this benchmark, which are stored on disk, not in memory. The difference between the warm-up run and subsequent runs is just the time to compile a query, which was generally less than a second, but could be as long as a couple of seconds for very large and complex queries. The average run time varies a little across multiple runs of a query on a shared system like AWS, but we never saw much variation.

Many TPC benchmarks, including TPC-DS, have templatized queries that replace “placeholders” in an incomplete query text with specific values, such as numeric or string constants. We didn’t find much variation with different parameters, so we used a fixed parameterization for the ease of tracking performance across time. We include the query text with the parameters we used when running the queries in queries.sql.

Appendix C: TPC-H Raw Results

As described above, when running TPC-H and TPC-DS, there was an initial “cold” or warm-up run where SingleStore compiles the queries (optimization and code generation) in addition to running them. (See our documentation for the compilation pipeline.) The warm runs reuse that same query plan, but re-execute the entire query, re-reading all data. The cold run time is a single run, while the final result is an average of warm runs, as reported in Appendix A.

Generally the difference between final and cold times comes down to the compilation time as described above. However, as some results show, there can be other factors coming from the use of shared infrastructure.

Query | 10T (final) | 10T (cold) | 30T (final) | 30T (cold)
1 | 18.916 | 19.815 | 24.238 | 26.117
2 | 17.073 | 18.38 | 94.10 | 101.062
3 | 76.483 | 76.655 | 97.520 | 103.450
4 | 28.236 | 29.84 | 34.818 | 36.77
5 | 48.442 | 49.520 | 57.897 | 61.024
6 | 9.852 | 10.42 | 12.712 | 13.536
7 | 45.125 | 46.410 | 85.488 | 84.513
8 | 27.961 | 28.751 | 65.812 | 66.735
9 | 154.404 | 159.037 | 266.970 | 266.544
10 | 62.519 | 64.069 | 73.248 | 75.532
11 | 30.524 | 32.183 | 82.781 | 83.560
12 | 16.454 | 18.083 | 20.369 | 22.320
13 | 183.827 | 183.469 | 201.577 | 193.112
14 | 106.647 | 107.013 | 125.455 | 127.872
15 | 44.927 | 46.704 | 67.685 | 70.124
16 | 35.947 | 37.478 | 39.534 | 40.943
17 | 15.431 | 16.307 | 34.820 | 34.690
18 | 428.70 | 425.285 | 499.514 | 501.920
19 | 27.400 | 31.504 | 29.340 | 31.137
20 | 268.29 | 271.174 | 221.616 | 230.66
21 | 83.632 | 84.015 | 118.185 | 118.438
22 | 58.341 | 61.310 | 120.527 | 122.270

Appendix D: TPC-DS Raw Results

Refer to Appendix C for information on how to interpret these results.

Query | 10T (final) | 10T (cold)
1 | 7.933 | 9.232
2 | 16.367 | 17.324
3 | 0.774 | 1.633
4 | 105.199 | 106.541
5 | 23.133 | 27.435
6 | 149.987 | 150.346
7 | 2.033 | 3.293
8 | 2.624 | 3.558
9 | 9.375 | 17.654
10 | 8.231 | 11.261
11 | 46.006 | 45.239
12 | 0.836 | 3.395
13 | 2.327 | 4.435
14 | 321.497 | 329.853
15 | 7.681 | 5.129
16 | 60.115 | 58.161
17 | 2.421 | 6.716
18 | 10.289 | 13.983
19 | 2.059 | 4.071
20 | 0.955 | 3.061
21 | 1.292 | 2.353
22 | 1.509 | 3.311
23 | 590.080 | 598.040
24 | 1435.269 | 1530.016
25 | 2.390 | 7.031
26 | 1.510 | 2.930
27 | 2.245 | 4.688
28 | 27.215 | 34.892
29 | 120.356 | 124.340
30 | 8.656 | 10.561
31 | 18.996 | 19.429
32 | 0.541 | 2.212
33 | 7.459 | 9.128
34 | 2.730 | 4.494
35 | 21.369 | 25.379
36 | 1.017 | 2.043
37 | 10.826 | 12.178
38 | 79.474 | 81.756
39 | 0.466 | 1.686
40 | 1.526 | 4.418
41 | 0.946 | 1.544
42 | 0.385 | 0.972
43 | 3.816 | 5.118
44 | 3.320 | 5.727
45 | 5.890 | 7.475
46 | 4.012 | 5.758
47 | 10.873 | 12.565
48 | 1.554 | 3.780
49 | 3.285 | 9.361
50 | 1.942 | 4.884
51 | 3.006 | 5.080
52 | 0.894 | 1.508
53 | 0.741 | 1.818
54 | 4.990 | 8.529
55 | 0.724 | 1.403
56 | 5.036 | 7.167
57 | 6.109 | 8.491
58 | 3.430 | 4.521
59 | 26.509 | 27.612
60 | 7.257 | 9.501
61 | 3.056 | 6.141
62 | 2.779 | 4.038
63 | 1.879 | 4.765
64 | 11.704 | 28.918
65 | 3.920 | 5.362
66 | 1.629 | 8.578
67 | 1382.295 | 1379.958
68 | 3.045 | 5.198
69 | 7.111 | 10.689
70 | 5.992 | 6.805
71 | 3.560 | 6.693
72 | 38.113 | 39.607
73 | 1.799 | 2.840
74 | 32.279 | 33.930
75 | 21.565 | 29.151
76 | 1.550 | 4.514
77 | 3.917 | 8.667
78 | 1392.938 | 1431.461
79 | 18.727 | 21.594
80 | 6.433 | 13.057
81 | 9.889 | 11.282
82 | 28.270 | 29.588
83 | 4.068 | 5.807
84 | 1.807 | 3.201
85 | 2.433 | 5.647
86 | 0.684 | 1.599
87 | 63.438 | 64.373
88 | 24.798 | 27.863
89 | 0.927 | 1.940
90 | 0.648 | 2.492
91 | 1.976 | 3.140
92 | 0.414 | 2.424
93 | 0.372 | 1.183
94 | 32.121 | 35.912
95 | 152.090 | 152.829
96 | 0.765 | 1.766
97 | 8.628 | 9.905
98 | 1.443 | 3.888
99 | 6.189 | 7.854

References

[AL19] TPC Benchmark DS Full Disclosure Report, Scale Factor 10,000, Alibaba, http://www.tpc.org/results/fdr/tpcds/alibaba~tpcds~alibaba_cloud_e-mapreduce~fdr~2019-03-19~v01.pdf, March 19, 2019.

[AL19b] TPC Benchmark DS Full Disclosure Report, Scale Factor 10,000, Alibaba Cloud AnalyticDB, http://www.tpc.org/results/individual_results/alibaba/alibaba~tpcds~alibaba_cloud_analyticdb~es~2019-04-26~v01.pdf, April 26, 2019.

[DB17] J. Sompolski and R. Xin, Databricks, Benchmarking Big Data SQL Platforms in the Cloud, https://databricks.com/blog/2017/07/12/benchmarking-big-data-sql-platforms-in-the-cloud.html, 2017.

[Cis19] TPC Benchmark DS Full Disclosure Report, Scale Factor 10,000, Cisco, http://www.tpc.org/results/fdr/tpcds/cisco~tpcds~cisco_ucs_integrated_infrastructure_for_big_data_~fdr~2018-03-05~v01.pdf, March 5, 2018.

[Clo17] Apache Impala Leads Traditional Analytic Database, Cloudera, https://blog.cloudera.com/blog/2017/04/apache-impala-leads-traditional-analytic-database/, April 25, 2017.

[GO18] Data Warehouse in the Cloud Benchmark, GigaOm, https://gigaom.com/report/data-warehouse-in-the-cloud-benchmark/, 2018.

[GO19] Data Warehouse Cloud Benchmark, GigaOm, https://gigaom.com/report/data-warehouse-cloud-benchmark/, 2019.

[MS16] SQL Server 30T TPC-H Result, 2016.

[MS17] SQL Server 10T TPC-H Result, http://www.tpc.org/results/individual_results/lenovo/lenovo~tpch~10000~lenovo_thinksystem_sr950~es~2017-07-09~v01.pdf, 2017.

