
Using SSIS with SQL partitions

Last month I blogged about the virtues of having separate full-load vs. incremental-load packages when doing data warehousing. Well... that isn't the end-all of that argument...

When you design your data warehouse, if you take table partitioning into account (which I highly recommend, since most BI data is chronological in nature), then you can usually get away with writing a single data flow task which achieves minimal logging even during incremental updates, simply by targeting a staging partition and then switching it in at the end of the data load.
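
For the switch to be cheap, the target table needs to be partitioned on a suitable (typically date-based) key up front. As a rough sketch of such a setup, where the function, scheme, table names and boundary values are all illustrative rather than an actual warehouse schema:

    -- Minimal sketch: a fact table partitioned by month on its date key.
    -- All names and boundary values here are illustrative.
    CREATE PARTITION FUNCTION pfSalesByMonth (date)
    AS RANGE RIGHT FOR VALUES ('2014-01-01', '2014-02-01', '2014-03-01');

    CREATE PARTITION SCHEME psSalesByMonth
    AS PARTITION pfSalesByMonth ALL TO ([PRIMARY]);

    CREATE TABLE dbo.FactSales
    (
        SaleID   bigint NOT NULL,
        SaleDate date   NOT NULL,
        Amount   money  NOT NULL
    ) ON psSalesByMonth (SaleDate);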

In one such data flow, I needed to normalize the incoming data into three tables. The challenge is that you don't want to perform lookups on every record when you're doing a full load (since this slows the data flow down), but you do need those lookups to avoid duplicates during incremental loads.

A great pattern for solving this is to check whether the target table is empty before starting the data flow. Below, the "Get MaxSaleID" task runs a SQL script to put the row count of the target table into a variable named MaxSaleID...
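
The script itself is trivial; something along these lines would do (the target table name here is an assumption, and a MAX over the key column works just as well as a row count):

    -- Sketch of the "Get MaxSaleID" query; the table name is illustrative.
    -- The scalar result is mapped to the package variable User::MaxSaleID,
    -- where zero means the target is empty and the lookup can be bypassed.
    SELECT COUNT_BIG(*) AS MaxSaleID
    FROM dbo.FactSales;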


... then in the data flow "Sales - Vouchers", a conditional split inspects the MaxSaleID variable and re-routes rows past the lookup when the lookup component is guaranteed to find no match...

Data Flow for full and incremental uploads

The "Scaffold Voucher Sales" destination bulk inserts into a staging table which is subsequently switched in onto the tail end of the live table.
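
The switch-in itself is a metadata-only operation. As a minimal sketch, assuming the staging table sits on the same filegroup with a matching schema, matching indexes and a check constraint covering the target date range (all names and the boundary date are illustrative):

    -- Switch the fully loaded staging table in as a partition of the live table.
    ALTER TABLE dbo.Stage_VoucherSales
    SWITCH TO dbo.FactSales PARTITION $PARTITION.pfSalesByMonth('2014-03-01');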

A quick note about partitions: If you know that the source system you are pulling data from will always append new records and will never (or rarely) update old records, then you can skip technologies like Change Data Capture or Change Tracking and simply re-import the last n-thousand records into a partition, swap out the old tail partition and swap in the new one. In general this results in a far leaner, simpler package and process than implementing complex data-change detection.
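
The only extra step compared to the switch-in shown above is evicting the stale tail partition first. A sketch, assuming an empty, identically structured holding table (again, all names and the boundary date are illustrative):

    -- Switch the stale tail partition out into an empty holding table, making
    -- room for the freshly re-imported staging table to be switched in.
    ALTER TABLE dbo.FactSales
    SWITCH PARTITION $PARTITION.pfSalesByMonth('2014-03-01') TO dbo.FactSales_OldTail;

    -- Discard the stale copy once the swap has completed.
    TRUNCATE TABLE dbo.FactSales_OldTail;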
