Skip to main content

Record Versions in ETL staging areas

Ever wanted to be able to use the OLEDB Destination component in SSIS to load staging tables and have the underlying data versioned for you? If so, here's a solution which is working very well herein a production environment.


Below are some scripts which implement automatic triggers on staging tables so that whenever a newly loaded record collides with the primary key of an existing record, the existing record is quietly moved out into an archive table. In practice, you create two schemas -
  • Staging   - contains all staging tables
  • Archive   - contains all archive (historic) tables
When two tables with the same name are detected in these schemas, then triggers are placed onto the table in the Staging schema to handle moving old data into the archive record. The caveat here is that data is only preserved for columns where the column name and type is identical in both the staging and archive table.

There are two scripts that make all of this work:
  1. A stored procedure, responsible for generating triggers for a staging table.
    It generates two triggers:
    a) an INSTEAD OF trigger, which detects any key collisions and deletes the offending record/s, and
    b) an AFTER DELETE trigger, which moves deleted staging data into an identically named archive table.

    If you are comfortable with T-SQL, then you should consider modifying this script to meet your particular needs.
  2. A DDL trigger, which reacts to the creation or alteration of staging & archive tables by executing the aforementioned stored procedure. Effectively this established a "self healing" mechanism, where triggers are automatically recalculated in response to any changes to staging or archive tables.
Example - create two simple sales tables:

Example - simulate two data loads, where the source record has changed:

The above example, will result in one record being in the Staging table, and the older (first) record being moved into the Archive table. Notice how the Archive table introduces the column RecordRetired and uses it as part of the primary key. This allows multiple records with the same SaleID to exist in the archive.

The Nitty Gritty
OK - so by now you've got the point of what the scripts below do for you. Be sure to read the very bottom of this post for information on how to make use this solution in conjunction with SSIS.

Part 1: The Trigger Generating Stored Procedure

Part 2: The DDL Trigger to react to table changes



Using SSIS Data Flows
The really good news is that you can still use SSIS OLEDB Destination's FAST_LOAD mode when loading in data from source systems, which results in good performance.
Just always follow these two rules which using the OLEDB Destination component:


  1. Include FIRE_TRIGGERS in the FastLoadOptions property
  2. Keep the FastLoadMaxInsertCommitSize to something below 100,000


n


Comments

Popular posts from this blog

Reading Zip files in PowerQuery / M

Being a fan of PowerBI, I recently looked for a way to read zip files directly into the Data Model, and found this blog which showed a usable technique. Inspired by the possibilities revealed in Ken's solution, but frustrated by slow performance, I set out to learn the M language and write a faster alternative. UnzipContents The result of these efforts is an M function - UnzipContents - that you can paste into any PowerBI / PowerQuery report. It takes the contents of a ZIP file, and returns a list of files contained therein, along with their decompressed data: If you're not sure how to make this function available in your document, simply: Open up PowerQuery (either in Excel or in PowerBI) Create a new Blank Query. Open up the Advanced Editor  (found on the View tab in PowerBI). Copy-Paste the above code into the editor, then close the editor. In the properties window, rename the the function to  UnzipContents Usage Using the function is fairly straight forw

Power Query: Transforming YYYYMM dates (the quick way)

Accountants. Their unit of work seems to be the month, as if individual days don't exists, or don't count somehow. Nowhere is this better seen than in the notion of the accounting period , which all too often follows the form YYYYMM.  Try converting this directly into a date and Power Query starts making excuses faster than a kid with his hand caught in the cookie jar. The quick solution to this is to understand what Power Query's Table.TransformColumns does, and then leverage this knowledge to transform your YYYYMM values into proper date type columns. Table.TransformColumns As it's name suggests, this handy function allows you to convert the contents of a column from one type to another. The basic syntax is: = Table.TransformColumns( #"Your Source Table", { A list of tuples, specifying the columns and what functions to apply to each value} ) Lists {denoted by curly braces} are something you need to get comfortable with if you

Easily Move SQL Tables between Filegroups

Recently during a Data Warehouse project, I had the need to move many tables to a new file group. I didn't like any of the solutions that I found on Google, so decided to create on of my own. The result? MoveTablesToFilegroup Click here for a nifty stored proc allows you to easily move tables, indexes, heaps and even LOB data to different filegroups without breaking a sweat. To get going, copy-paste the code below into Management Studio, and then run it to create the needed stored procedure. Hopefully the arguments are self explanatory, but here are some examples: 1. Move all tables, indexes and heaps, from all schemas into the filegroup named SECONDARY: EXEC dbo.sp_MoveTablesToFileGroup @SchemaFilter = '%', -- chooses schemas using the LIKE operator @TableFilter  = '%', -- chooses tables using the LIKE operator @DataFileGroup = 'SECONDARY', -- The name of the filegroup to move index and in-row data to. @ClusteredIndexes = 1, --