This is Part 1 of a 5-part series on DW/Analytics Performance Benchmarking and its use in non-traditional ways, including DW/Analytics Engineering and Maintenance.
Part 1. Background
Riffing on an old quote commonly attributed to Mark Twain, though apparently not original to him [1] …
There are three kinds of lies: lies, damned lies, and benchmarks [sic]
We’ve all seen one-upmanship and “fastest” claims from vendors in the computer technology space using industry-standard benchmarks. From the fastest computers, to the fastest storage systems, to the fastest database systems, there are numerous blogs and papers touting the dominance of one platform or approach over another. In some cases, usually when it suits them, vendors adhere to benchmark specifications and standard reporting requirements. In others, only portions of benchmarks are run and often selectively referenced and co-opted for the desired message. Such is the state of our industry and it will likely not change.
This is the case in the world of Data Warehousing and Analytics. The Transaction Processing Performance Council (TPC) arose out of a need for standard performance metrics and well-defined configuration and pricing guidelines across a number of database vendors and platform offerings. Initially focused on OLTP systems, the organization broadened its work to data warehouse/analytics workloads and other related IT infrastructure areas. A sampling of the industry-standard benchmarks in this space:
| Benchmark class | Benchmarks and references |
|---|---|
| CPU, server, and system compute performance | SPEC, HPC Top500 |
| Storage system performance | SPC, SPEC SFS, HPC IO-500 |
| OLTP database performance | TPC-C, TPC-E |
| Ad hoc / decision support database performance | TPC-H, TPC-DS |
| Big Data analytics and data integration | TPCx-HS, TPCx-BB, TPC-DI |
| Virtualization and hyper-converged infrastructure | TPC-VMS, TPCx-HCI, TPCx-V |
In the Data Warehousing and Analytics space, TPC-DS is probably the most frequently referenced benchmark. But despite having been around since 2011, it has very few audited Full Disclosure Reports – only 9 in total, all on generic scale-out hardware running software from Chinese vendors, and 7 of those submitted by a single organization:
| TPC-DS submitter (result count) | Database / platform |
|---|---|
| Alibaba (3) | Cloud Analytic Database |
| Alibaba (4) | E-MapReduce (a Spark/Hadoop offering) |
| Supermicro (1) | Transwarp high performance Data Mart for Online Analysis, Version 1.2.1 |
| Cisco (1) | Transwarp Data Hub Professional, Version 5.1, on UCS |
Neither the traditional DW leaders (Teradata, IBM/Netezza, Oracle, and SAP) nor the more recent cloud DW offerings (Amazon Redshift, Snowflake, Azure Synapse, Google BigQuery) have published TPC-DS results, yet the blogosphere is rife with TPC-DS-like benchmark results that use the standard TPC-DS datasets and queries. These results often rely on unspecified platform-specific optimizations or query rewrites that are precluded in spec-compliant submissions.
This is certainly understandable. TPC-DS Full Disclosure Reports are weighty and costly efforts, and without critical mass in submitted results, most vendors don’t see the value in those formal submissions. And the formalism of the full process can be an impediment for products aimed at a portion of the end-to-end DW workflow, e.g., query acceleration.
TPC-DS-like benchmarks
Enter TPC-DS-like benchmarks. These are benchmark results based, sometimes with significant poetic license, on the TPC-DS schema, datasets, and query set. There are numerous such reports in the blogosphere, many touting the results of a single platform and/or comparing successive versions of that platform to one another.
In general, it is the relaxed formalism of "-like" that makes these published TPC-DS-like results so difficult to compare. Differences in schema, data ordering, and query rewrites can have significant effects on performance. And while there have been a number of attempts at more structured comparative studies, especially in the SQL-on-Hadoop world, these are often limited in scope: smaller datasets, a subset of queries and/or rewritten queries, and modified schemas.
Consider Fivetran's published comparison of Amazon Redshift, Snowflake, Azure Synapse (then called Azure SQL Data Warehouse), Presto, and Google BigQuery using TPC-DS scale factor (sf) 100 and 1000 datasets (100 GB and 1 TB of raw data, respectively). At a typical 3:1 compression rate, even the 1 TB dataset shrinks to roughly 330 GB, small enough these days to fit entirely in a single server's memory.
And Datamonad has published comparative SQL-on-Hadoop benchmark results for MR3, a purportedly simpler, more efficient alternative to Hive/LLAP on Hadoop/Kubernetes infrastructure, at sf 1000, 3000, and 10000 (1 TB, 3 TB, and 10 TB). Again, the underlying clusters were configured with enough aggregate memory for the entire database to fit in RAM on a relatively small number of nodes.
Moreover, vendor-provided TPC-DS-like summaries almost universally avoid reporting the Load Test, the Throughput Test, and the Data Maintenance Test – three components of the original spec with significant relevance to important operational considerations around data loading/unloading, data ordering/vacuuming, concurrency and workload management, and data maintenance.
The net result is that TPC-DS has diminishing utility in its original role as a comparative screening tool for solution procurement. So should we simply let it wither and die?
TPC-DS is Dead; Long Live TPC-DS
The answer is a definitive no. The formal TPC-DS process may be too onerous to encourage consistent and wide support for published Full Disclosure results. However, the benchmark itself includes some very useful tools for assessing the capabilities of a Data Warehouse / Analytics solution. And TPC-DS remains a representative test for analytics based on dimensional modeling as presented in Ralph Kimball's The Data Warehouse Toolkit, Third Edition. Dimensional modeling is a space-efficient, performant, and powerful modeling technique that supports BI/Analytics needs across a wide range of application areas. We can give TPC-DS new life with an expanded testing and comparative analysis methodology, and can extend that to better represent customer workloads.
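To make that concrete: the toolkit published with the spec includes a data generator (dsdgen) and a query generator (dsqgen), both of which are easy to drive from a scripting environment. Here is a minimal sketch in R of what that looks like; the install and output paths are assumptions, and the flag spellings follow the v2.x kit, so check your toolkit version before relying on them.

```r
# Minimal sketch: driving the TPC-DS kit's generators from R.
# Paths are hypothetical; dsdgen/dsqgen must already be built under toolkit_dir.
toolkit_dir <- path.expand("~/tpcds-kit/tools")            # assumed install location
tmpl_dir    <- path.expand("~/tpcds-kit/query_templates")  # templates shipped with the kit
data_dir    <- "/data/tpcds/sf1000"                        # assumed output location
sf          <- 1000                                        # scale factor (~1 TB of raw data)

# Generate the flat-file dataset in 8 parallel streams.
for (child in 1:8) {
  system2(file.path(toolkit_dir, "dsdgen"),
          args = c("-SCALE", sf, "-DIR", data_dir,
                   "-PARALLEL", 8, "-CHILD", child, "-FORCE"))
}

# Generate the 99 queries from the templates for one of the shipped SQL dialects.
system2(file.path(toolkit_dir, "dsqgen"),
        args = c("-DIRECTORY", tmpl_dir,
                 "-INPUT", file.path(tmpl_dir, "templates.lst"),
                 "-SCALE", sf,
                 "-DIALECT", "netezza",
                 "-OUTPUT_DIR", file.path(data_dir, "queries")))
```

Note that dsqgen writes its output as per-stream query files, so a little post-processing (splitting a stream into one file per query) may be needed before timing queries individually.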
In the remainder of this series, I will highlight the use of the published TPC-DS tools available with the spec in a cross-platform testing and analysis methodology that I’ve been using over the past couple of years. This approach has proven valuable not only in typical System A vs. System B comparatives (e.g. competitive scenarios), but also in a number of common development and maintenance scenarios – including evaluating and improving the effectiveness of:
- different schema variants (datatypes, distribution, cluster/sort “indexes”, and dimensional refactoring);
- different data loading and ordering approaches (manual and system-supported, e.g. vacuum); and
- different query rewrite strategies (manual or directive-based), with a minimal measurement harness sketched just after this list.
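The harness itself can be very simple. The following sketch in R assumes the generated queries have been split into one .sql file per query and that the two configurations being compared are reachable through DBI with the odbc package; the DSN names, directories, and labels are purely illustrative.

```r
library(DBI)   # generic database interface; pair with the odbc (or a vendor) driver package

# Time every query in a directory against one connection; returns a data frame.
run_suite <- function(con, query_dir, label) {
  files <- list.files(query_dir, pattern = "\\.sql$", full.names = TRUE)
  secs  <- vapply(files, function(f) {
    sql <- paste(readLines(f, warn = FALSE), collapse = "\n")
    unname(system.time(dbGetQuery(con, sql))["elapsed"])
  }, numeric(1), USE.NAMES = FALSE)
  data.frame(query = basename(files), variant = label, seconds = secs)
}

# Hypothetical connections to two configurations of the same warehouse.
con_a <- dbConnect(odbc::odbc(), dsn = "dw_variant_a")   # e.g. baseline schema
con_b <- dbConnect(odbc::odbc(), dsn = "dw_variant_b")   # e.g. re-sorted/refactored schema

results <- rbind(
  run_suite(con_a, "/data/tpcds/sf1000/queries", "variant_a"),
  run_suite(con_b, "/data/tpcds/sf1000/queries", "variant_b")
)
write.csv(results, "results.csv", row.names = FALSE)     # feed downstream analysis/plots

# Per-query speedup of variant B relative to variant A (ratio > 1 means B is faster).
speedup <- merge(subset(results, variant == "variant_a"),
                 subset(results, variant == "variant_b"),
                 by = "query", suffixes = c("_a", "_b"))
speedup$ratio <- speedup$seconds_a / speedup$seconds_b
speedup[order(-speedup$ratio), c("query", "seconds_a", "seconds_b", "ratio")]
```

The same harness applies whether the "variants" are two schema layouts, two load/sort strategies, or two sets of query rewrites; only the connection targets or the query directory change.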
The effect of such changes is often assessed directly by looking at selected query runtimes, perhaps coupled with an examination of query EXPLAIN plans. But the broader impact, whether on a large set of queries run concurrently by many users, on operational (ETL) processes, or on new real-time ingest flows, may be far less well understood.
A more formal approach to performance benchmarking, coupled with well-defined configuration management, can provide the necessary framework for evaluating and improving the effectiveness of a deployment, for both existing and future analytics requirements.
Over the next four parts of the series, we will apply this “whole workload” focus to some of these areas, coupled with some fairly straightforward analysis and visualization techniques in Excel and R. Together, these can deliver powerful insights across the DW/Analytics "stack" for a range of constituencies: procurement, warehouse development and maintenance, analytics, and even DW/Analytics product development.
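As a small preview of that analysis, here is a minimal sketch in R that turns per-query timings (such as those collected by the harness above) into a comparative view. The results.csv file and its column layout are assumptions for the example, not outputs of the TPC-DS kit.

```r
# Minimal sketch: comparing per-query runtimes from two configurations.
# Assumes a CSV with columns query, variant, seconds (as written by the harness
# sketched earlier); the file name and layout are illustrative only.
results <- read.csv("results.csv", stringsAsFactors = FALSE)

# Wide layout: one row per query, one runtime column per variant.
wide <- reshape(results, idvar = "query", timevar = "variant", direction = "wide")

# Log-scale scatter of variant A vs. variant B runtimes; points below the
# diagonal are queries where variant B ran faster.
plot(wide$seconds.variant_a, wide$seconds.variant_b, log = "xy",
     xlab = "Variant A runtime (s)", ylab = "Variant B runtime (s)",
     main = "Per-query runtimes: variant A vs. variant B")
abline(0, 1, lty = 2)   # y = x reference line
```

Base R graphics keep the example dependency-free; ggplot2 or an Excel pivot chart works just as well on the same data.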