Skip to main content

Utility Dimensions in Tabular / PowerPivot

As my team having been porting a truck-load of hefty Excel reports over to PowerPivot, we've realized just how brilliant PowerPivot is in terms of fast turn-around time analytics. Unfortunately, Tabular doesn't have the maturity of a traditional OLAP engine - and one area where this is most evident is when working with time.

When source data presents fact records with multiple Date columns, a common trick used to avoid having multiple time dimensions is to implement a utility dimension, so that facts are filtered by a single date dimension - but the date column used for filtering is selected by the utility dimension.

PowerPivot/Tabular does not differentiate facts from dimensions, which creates something of a conundrum for report writers... how do you present an analytics model to end users without creating dozens of date dimensions?

In a nutshell, there are pretty much just 2 routes you can follow.... you either:

  1. UNPIVOT your fact data when populating the tabular model, or
  2. use DAX to implement "date selector aware" measures.
Let's explore both options...

UNPIVOTing facts


The idea here is to repeat each fact row for each date column that you want to select on.
Consider AdventureWorks' SalesHeader table:

OrderDate, DueDate and ShipDate are the 3 columns we want to select and filter on.
To achieve this, we repeat each record for each of these 3 columns, by using T-SQL's UNPIVOT operator:

Notice that we combine the name of the date column along with the date value to form a Date Key.
Because Date Key contains the name of the column under which the date value was found, we have the foundation for filtering dates that originated in different columns. Once the data is pivoted this way we bring it into PowerPivot as the Sales Orders table.

In order to filter Sales Orders, we'll need another three tables:
  • Date
    A stock-standard "date dimension" to represent the hierarchy of years, months and dates.
  • Date Type
    Contains 3 rows to present a selection of which column any Date filtering should be applied on:  Order Date, Ship Date, Due Date.
  • Date Filter
    This table is cross-join of Date Type and Date, bridging any combination of Dates + Date Type to a Date Key. 

The PowerPivot engine always implements filtering as "parent to child", meaning Date Type and Date impose filtering onto the Date Filter table. The Date Filter in turn imposes filtering down onto the Sales Orders table via the Date Key column.

Using DAX

The other method of achieving the same result is by implementing all measures as DAX calculations.
The following calculation does the job of filtering total sales based on the date and date type selected:

The above is hideous to read and hard to maintain, but it does get the job done without having to transform the source data.

Weighing up the options

There are some pros and cons of the above techniques:
UNPIVOT
DAX
Winner
Performs well because the formula engine can apply filters when retrieving rows.
Performs slower because all rows are traversed for each cell being calculated.
UNPIVOT
Easy to build & maintain.
Requires working knowledge of DAX. Complexity increases with every additional date column.
UNPIVOT
Produces incorrect results of no, all or multiple “Date Types” are selected.
DAX calculations can detect invalid filtering criteria and return more appropriate results.
DAX
Fact rows are multiplied by the number of date columns made selectable.
Fact rows retain original granularity, improving the reusability of the tabular data.
DAX

The importance of performance can't be overstated on large data sets - DAX calculations quickly slow down when the formula engine can't apply adequate filtering when retrieving rows from the storage engine.

If you're working with smaller data sets, then UNPIVOT has a clear disadvantage of producing wildly incorrect results when the users clears the filtering on Date Type.

My advice:  If you understand and can optimize DAX and you're dealing with mildly brain-dead  users, then implement the filters as DAX calculations. If, on the other hand, you are relatively new to PowerPivot, then go the UNPIVOT route - its faster and easier.

Download:   UNPIVOT version    vs.   DAX version



2 comments

Popular posts from this blog

Reading Zip files in PowerQuery / M

Being a fan of PowerBI, I recently looked for a way to read zip files directly into the Data Model, and found this blog which showed a usable technique. Inspired by the possibilities revealed in Ken's solution, but frustrated by slow performance, I set out to learn the M language and write a faster alternative.
UnzipContents The result of these efforts is an M function - UnzipContents - that you can paste into any PowerBI / PowerQuery report. It takes the contents of a ZIP file, and returns a list of files contained therein, along with their decompressed data:



If you're not sure how to make this function available in your document, simply:

Open up PowerQuery (either in Excel or in PowerBI)Create a new Blank Query.Open up the Advanced Editor  (found on the View tab in PowerBI).Copy-Paste the above code into the editor, then close the editor.In the properties window, rename the the function to UnzipContents Usage Using the function is fairly straight forward: Choose "New Quer…

Easily Move SQL Tables between Filegroups

Recently during a Data Warehouse project, I had the need to move many tables to a new file group. I didn't like any of the solutions that I found on Google, so decided to create on of my own. The result?

MoveTablesToFilegroupClick here for a nifty stored proc allows you to easily move tables, indexes, heaps and even LOB data to different filegroups without breaking a sweat. To get going, copy-paste the code below into Management Studio, and then run it to create the needed stored procedure.
Hopefully the arguments are self explanatory, but here are some examples:

1. Move all tables, indexes and heaps, from all schemas into the filegroup named SECONDARY:
EXEC dbo.sp_MoveTablesToFileGroup
@SchemaFilter = '%',-- chooses schemas using the LIKE operator
@TableFilter  = '%',-- chooses tables using the LIKE operator
@DataFileGroup = 'SECONDARY',-- The name of the filegroup to move index and in-row data to.
@ClusteredIndexes = 1,-- 1 means "Move all clustered inde…

SQL Server vs Azure Data Warehouse vs Netezza vs Redshift

The Great Database Shoot Out In Jan'17, I teamed up with Marc van der Zon (Data Scientist), to test and compare several database technologies for our organization's future analytics and BI platform. The technologies that made the shortlist were:
SQL Server, because it is the organization's existing BI platform.Azure Data Warehouse, because of its high similarity to SQL Server.Amazon's Redshift, because of its attractive cost, scalability and performance.Netezza, because it is anaffordable on-prem appliance that performs well. Considerations We were primarily looking for the best bang-for-buck option, so price and performance were viewed as more important than how feature complete any particular option was. That said, what we regarded as important included: A shallow learning curve for BI developers (i.e. no need for expert skills in order to get good performance)
Ability to re-use existing code  (i.e. ANSI-92 SQL compatibility)
Time-to-solution.   (i.e. does the platform …