Schema Matters: Data Distribution and Ordering

I continue my side trip from my 5-part series with a second short stop. Here, I test a schema-optimized TPC-DS run on a 30-node dc2.8xlarge configuration, the same configuration GigaOm specified in their original study.
This DC2 configuration shows the same 5.5x performance improvement over the original GigaOm results. That demonstrates how much schema and data-ordering optimizations matter for query performance, and it sheds at least a little light on the tradeoffs and use cases that could drive the choice of one configuration over the other.

A Picture is Worth 1000 Words REDUX

Cross of the Martyrs Park, Santa Fe, NM

I'm taking a slight side trip from the original outline of my 5-part series on Performance Benchmarking and its use in DW Analytics Engineering and Maintenance activities.

Here I recreate the analysis of part 2 of my series, A Picture is Worth 1000 Words, but with updated information for Redshift using the RA3 instances with managed storage that were first announced in late 2019. The difference is eye-opening.

Lies, Damned Lies, and Benchmarks

Part 1 of a 5-part series on Performance Benchmarking and its use in DW Analytics Engineering and Maintenance activities.

While formal database benchmarks with Full Disclosure Reports have fallen out of favor, vendors and practitioners alike are finding new and creative uses for benchmarking, including the old stalwart, TPC-DS. This is especially true in understanding how the new wave of DW/Analytics solutions — Cloud Data Warehouses, Spark and SQL-on-Hadoop, and GPU-accelerated SQL engines — augment traditional on-prem Data Warehouse appliances.

Part 1. Lies, Damned Lies, and Benchmarks
