
Revisiting Earned Premium


In a previous post about earned premium, I outlined how to use DAX to apportion a monetary value across the period of time over which it is earned.

Serendipitously, the next day a colleague forwarded Alberto Ferrari's paper on understanding DAX query plans, and after giving it a thorough read I fired up the query profiler and set out to optimize our calculated measure for earned premium.

Alberto's paper details a performant solution to the classic events-in-progress problem, of which earned premium is a close cousin. My excitement at lazily shoplifting Alberto's work came to a grinding halt when I discovered that his 40ms Jedi solution only works when data is queried at a specific granularity. That wasn't going to cut it... we need an earned premium measure that works at any level of aggregation. Back to the drawing board.

It turns out that much of Alberto's advice is (as always) really valuable. While I strongly recommend reading Alberto's paper, here's the cheat sheet for optimizing any DAX calculated measure:
  1. Help the Formula Engine (FE) to push the heavy lifting down to the Storage Engine (SE).
    The FE is single-threaded and non-caching, whereas the SE is multi-threaded and can cache results.
  2. Avoid complex / inequality predicates that cause SE to call back to the FE.
    This not only slows down data retrieval, but also prevents SE from caching results.
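
If you're following along in the profiler, these callbacks are easy to spot: the Storage Engine scan events contain CallbackDataID in their xmSQL query text.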
Our original measure used inequality predicates in the expression:
     'Premium'[Start Date] <= LASTDATE('Date'[Date])
  && 'Premium'[End Date] >= FIRSTDATE('Date'[Date])

... which forces the SE to call back to the FE, slowing down the calculation and preventing the SE from caching results.
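
For context, the original measure looked something like this. This is a reconstructed sketch rather than the exact formula from the earlier post; [Days In Current Period] and [Total Days of Cover] are hypothetical helper measures standing in for the proration logic:

-- Sketch only: FILTER's inequality predicates trigger the FE callback.
Earned Premium (slow):=
SUMX (
    FILTER (
        'Premium',
        'Premium'[Start Date] <= LASTDATE ( 'Date'[Date] )
            && 'Premium'[End Date] >= FIRSTDATE ( 'Date'[Date] )
    ),
    'Premium'[Amount Paid]
        * [Days In Current Period]    -- hypothetical helper measure
        / [Total Days of Cover]       -- hypothetical helper measure
)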

To recap:
Earned Premium = Amount Paid * Days In Current Period / Total Days of Cover
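
For example (illustrative figures): a policy with $1,200 paid, covering 1 January to 31 December of a 365-day year, earns $1,200 * 31 / 365 ≈ $101.92 when January is the period in context.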

Which we can calculate as follows:
Earned Premium:=
SUMX (
    SUMMARIZE (
        'Premium',
        'Premium'[Start Date],
        'Premium'[End Date],
        "Earned Premium",
        SUM ( 'Premium'[Amount Paid] )  -- all premium paid for this start/end combination
            * COUNTROWS (               -- days of cover visible in the current filter context
                CALCULATETABLE (
                    'Date',
                    KEEPFILTERS (
                        DATESBETWEEN (
                            'Date'[Date], 'Premium'[Start Date], 'Premium'[End Date]
                        )
                    )
                )
            )
            / COUNTROWS (               -- total days of cover, regardless of filtering
                DATESBETWEEN (
                    'Date'[Date], 'Premium'[Start Date], 'Premium'[End Date]
                )
            )
    ),
    [Earned Premium]
)


With this formula, we're able to calculate earned premium for 22 million records across 84 months in 2.2 seconds. Happy days.

To explain the calculation:
  • The SUMMARIZE function groups all premium by start & end date.
  • We project "Earned Premium" as the sum of all premium paid for each distinct start & end period.
  • That sum is multiplied by the number of days (records) in the distinct period that are also present in the period currently being observed in the resulting cell. KEEPFILTERS applies this filtering for us - it effectively says "filter out days that aren't in the current month / quarter / year filter applied to the Date table".
  • The result is divided by the number of days in the distinct period, regardless of filtering.
  • Finally, SUMX adds up all of the summarized values.
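
To sanity-check the measure at different grains, you can run a quick query in DAX Studio. Here's a minimal sketch - it assumes your Date table exposes 'Date'[Year] and 'Date'[Month] columns, so adjust the column names to suit your model:

-- Evaluate the measure at month grain; swap in other columns to test other grains.
EVALUATE
SUMMARIZECOLUMNS (
    'Date'[Year],
    'Date'[Month],
    "Earned Premium", [Earned Premium]
)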

