Now that we are in the final stage of development for Harvest Data Warehouse, I can build a new Excel PowerPivot spreadsheet. I was able to replicate some earlier PowerPivot work in mere minutes. It even nicely pulled in related MUMD (Mark Up / Mark Down) data, which records where Walmart has specifically changed the price of a product.
My problem was that, even though the two data sets related through the Walmart item number, the sales data and the MUMD data couldn’t relate on the timeframe I was choosing. So I had to set up two different slicers. I looked for ways to sync the slicers and hide one of them, but I did not like this approach at all. It wouldn’t be user friendly.
I already had it in my mind that the multi-column primary keys in my data tables weren’t going to work, either in SQLDW or in PowerPivot, but I wasn’t sure how to solve it. Enter hashing. (Strictly speaking a hash is not encryption: it is a one-way function that collapses the key columns into a single fixed-length value.) Armed with these two blog posts, I rewrote all the table DDL and dynamic scripting to include hash columns.
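To illustrate the idea, here is a minimal sketch in Python rather than the actual SQLDW scripts. The column names (`item_nbr`, `store_nbr`, `wm_week`), the `|` delimiter, and the `hash_key` helper are all assumptions made up for the example, not the real schema.

```python
import hashlib

def hash_key(*parts, algorithm="md5"):
    """Collapse several key columns into one fixed-length hex string."""
    # Normalize each part so that formatting differences do not change the key.
    # The delimiter keeps ("12", "3") distinct from ("1", "23").
    normalized = "|".join(
        "" if part is None else str(part).strip().upper() for part in parts
    )
    return hashlib.new(algorithm, normalized.encode("utf-8")).hexdigest()

# Hypothetical rows from the sales table and the MUMD table.
sales_row = {"item_nbr": 1234567, "store_nbr": 42, "wm_week": 201730}
mumd_row = {"item_nbr": 1234567, "store_nbr": 42, "wm_week": 201730}

sales_key = hash_key(sales_row["item_nbr"], sales_row["store_nbr"], sales_row["wm_week"])
mumd_key = hash_key(mumd_row["item_nbr"], mumd_row["store_nbr"], mumd_row["wm_week"])

# Both tables now share a single column that can carry the relationship.
print(sales_key == mumd_key)
```

The point of the single column is that PowerPivot relationships are defined on one column per table, so a hash of item plus timeframe lets one relationship (and one slicer) cover both.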
I created a main hash key that the majority of my joins will be on, plus other hash keys for joins I might do later. I tested various hash algorithms, and in SQLDW MD5 consistently appeared to be the fastest. However, the first blog post listed mentions that SHA gives you the best distribution. I’ll be testing over the next few days to see how well joins on the hash perform in both SQLDW and PowerPivot.
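For anyone who wants to try a similar comparison outside the warehouse, a rough sketch of the kind of test I mean is below. It is Python, not SQLDW, so the timings say nothing about how SQLDW itself behaves; the sample key values and the `hash_all` helper are invented for the example.

```python
import hashlib
import timeit

# 10,000 made-up composite keys in "item|store|week" form.
rows = [
    f"{item}|{store}|{week}".encode("utf-8")
    for item in range(100)
    for store in range(10)
    for week in range(10)
]

def hash_all(algorithm):
    """Hash every sample key with the given algorithm."""
    return [hashlib.new(algorithm, row).digest() for row in rows]

timings = {}
for algorithm in ("md5", "sha1", "sha256"):
    # Total seconds for 5 passes over the sample keys.
    timings[algorithm] = timeit.timeit(lambda: hash_all(algorithm), number=5)

# Bytes of storage each hash column would need per row.
digest_sizes = {name: hashlib.new(name).digest_size for name in timings}

for name in timings:
    print(f"{name}: {timings[name]:.4f}s, {digest_sizes[name]} bytes per key")
```

Speed is only half of the trade-off: a longer digest also means a wider column to store and join on, which is worth weighing alongside the distribution argument.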