Using SSIS with SQL partitions

Last month I blogged about the virtues of having seperate full-load vs. incremental load packages when doing data warehousing. Well... that isn't the end-all of that argument...

When you design your data warehouse, if you take table partitioning into account (which I highly recommend, since most BI data is temporally chronological), then you can usually get away with writting a single data flow task which achieves minimal logging even during incremental updates, by simply targetting a staging partition and then switching it in at the end of the data load.

In one such data flow, I needed to be able to normalize the incomming data into three tables. The challenge comes in that you don't want to perform lookups on every record when you're doing a full load (since this slows the data flow down), but you do need to do those lookups to avoid duplicates during incremental loads.

A great pattern for solving this to check whether the target table is empty before starting the data flow. Below, the "Get MaxSaleID" task runs a SQL script to put the row-count of the target table into a variable named MaxSaleID...

... then in the data flow "Sales - Vouchers", a conditional split inspects the MaxSaleID variable, and re-routes rows to avoid the lookup if there is guaranteed to be no match from the lookup component...

Data Flow for full and incremental uploads

The "Scaffold Voucher Sales" destination bulk inserts into a staging table which is subseuently switched in onto the tail end of thew live table.

A quick note about partitions: If you know that the source system which you are pulling data from will always append new records and will never/rarely update old records, then you can skip technologies like Change Data Capture or Change Tracking and simply re-import the last n-thousand records into a partition, swap out the old tail-partition and swap in the new one. In generaly this results in a far leaner, simpler package and process than would result from implementing complex data-change detection.

Mark White's BI Blog

Search This Blog

Using SSIS with SQL partitions

Labels

Comments

Popular posts from this blog

Reading Zip files in PowerQuery / M

Power Query: Transforming YYYYMM dates (the quick way)

Loading data from an Oracle source