What’s up with the random pictures?

While I enjoy the “hunt” for cute or pithy pictures that complement a catchy blog title, they really don’t add any value to the post, do they?  Nor does “themed” stock photography or artwork, at least for me.  Yeah, you feel “clever” for having paired the post and the image, and it probably looks a bit slicker, ok.

Instead, I thought I’d highlight photos I like — including those I’ve taken of things or places I love, or those that have been taken by others willing to have me show and reference their work.  And I don’t have to worry about copyright infringement or royalties.

Click through the photos for full resolution (new window).

Schema Matters: Data Distribution and Ordering

I continue my side trip from my 5-part series with a second short stop. Here, I test a schema-optimized TPC-DS run on a 30-node dc2.8xlarge configuration, the same configuration GigaOm specified in their original study.
This DC2 configuration shows the same 5.5x performance improvement when compared with the original GigaOm results, demonstrating the importance of schema and data ordering optimizations for query performance, and shedding at least a little light on the tradeoffs and use cases that could drive selection of one configuration over the other.

A Picture is Worth 1000 Words REDUX

Cross of the Martyrs Park, Santa Fe, NM

I'm taking a slight side trip from the original outline of my 5-part series on Performance Benchmarking and its use in DW Analytics Engineering and Maintenance activities.

Here I recreate the analysis of part 2 of my series, A Picture is Worth 1000 Words, but with updated information for Redshift using the RA3 instances with managed storage that were first announced in late 2019. The difference is eye-opening.

Lies, Damned Lies, and Benchmarks

Part 1 of a 5-part series on Performance Benchmarking and its use in DW Analytics Engineering and Maintenance activities.

While formal database benchmarks with Full Disclosure Reports have fallen out of favor, vendors and practitioners alike are finding new and creative uses for benchmarking, including the old stalwart, TPC-DS. This is especially true in understanding how the new wave of DW/Analytics solutions — Cloud Data Warehouses, Spark and SQL-on-Hadoop, and GPU-accelerated SQL engines — augment traditional on-prem Data Warehouse appliances.


Benchmarking: It’s not just for breakfast anymore.

Announcing a 5-part series on Performance Benchmarking and its use in DW Analytics Engineering and Maintenance activities.

Benchmarks have long been used by vendors to highlight their products’ performance or price-performance advantages, and by enterprise developers and procurement teams to select an appropriate solution for their needs.

But Performance Benchmarking can be used for so much more.

Series Introduction

Leap of Faith Redux

Well, maybe not so much a leap as a shove.

After 3 years at Yellowbrick Data in a variety of System Engineering and Performance Benchmarking roles, I’m back to the consulting lifestyle. I’m sad to be moving on after a 5+ year relationship that began with my technical due diligence and Seed/A-round investment recommendation while at Samsung, but I take with me a bunch of new “chops”, new perspectives, and, most importantly, a set of relationships with colleagues who have already been extremely supportive as I start a difficult transition.

So I’m very much looking forward to what lies ahead.

New Horizons

Moving forward, I’m planning to continue my work in scale-out computing and storage architectures applied to “big and fast” data applications: scalable databases, analytics, and machine learning.

I’ll do an initial blog series on Performance Benchmarking in Data Warehousing Analytics environments, sharing some of the insights I’ve developed over the last 5 years of performance engineering work. I’m hoping to redirect some of my skills to big compute / big data challenges in life sciences, possibly in conjunction with COVID research. I spent a number of years working on genomics and proteomics “big data” applications with pioneers Affymetrix, Perlegen Sciences, Celera Genomics, and PE Biosciences, and am looking forward to returning to that sphere.

I also hope to refresh my teaching skills, and to re-engage in STEM mentoring with underprivileged youth.

(And yes, that’s my son again in that photo).

Leap of Faith

Well, I’ve done it. Left a perfectly good job with Samsung/Stellus to do the consulting thing again … at least for a while.

I’m excited about the new opportunities at the confluence of non-volatile memory solutions (NAND, NVDIMM, SCM), new scalable database, analytics, and machine learning platforms, and an increasingly connected world (IoT) affording a myriad of opportunities with “big and fast” data. It’s an opportunity to “rethink” IO.

And no, that’s not me in the photo, but my son, who is an inspiration in pursuing his passion.
