
Posts

Service for refreshing Excel PowerPivot files

I've just released a service which assists in refreshing Excel PowerPivot reports. Normally you need SharePoint in order to refresh server-side Excel reports. Businesses that don't want to adopt SharePoint, but do want to share Excel files, now have a means of refreshing PowerPivot cubes automatically. Head on over to CodePlex to find out more. Happy PowerPivoting.

Validating RSA ID Numbers in SQL

In this post we'll explore using SQLCLR integration to validate South African identity numbers, with the focus slightly more on the ins and outs of CLR integration than on how RSA ID numbers get validated. If you're after the validation algorithm, look here. So what's on the slab?... This solution makes RSA ID number validation possible by exposing a C# table-valued function (TVF) to SQL Server. The function takes an identity number as its input and returns a single record with the following columns:

IsValid - bit: Whether the input identity number is valid.
IsCitizen - bit: Whether the number belongs to a South African citizen.
Gender - char: The gender of the individual: 'M' for male, 'F' for female.
DateOfBirth - date: The date of birth of the individual...
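As a rough illustration, deploying and calling such a CLR TVF from T-SQL might look like the sketch below. The assembly, class, and method names (RsaIdValidation, IdFunctions.ValidateIdNumber) and the sample identity number are assumptions for illustration, not the actual project's names:

-- Hypothetical deployment sketch for a SQLCLR table-valued function.
-- CLR integration must be enabled first:
-- EXEC sp_configure 'clr enabled', 1; RECONFIGURE;
CREATE ASSEMBLY RsaIdValidation
FROM 'C:\Assemblies\RsaIdValidation.dll'
WITH PERMISSION_SET = SAFE;
GO

CREATE FUNCTION dbo.ValidateRsaId (@IdNumber NVARCHAR(13))
RETURNS TABLE
(
    IsValid     BIT,
    IsCitizen   BIT,
    Gender      CHAR(1),
    DateOfBirth DATE
)
AS EXTERNAL NAME RsaIdValidation.IdFunctions.ValidateIdNumber;
GO

-- Validate a single (illustrative) identity number:
SELECT * FROM dbo.ValidateRsaId(N'8001015009087');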

Record Versions in ETL staging areas

Ever wanted to be able to use the OLEDB Destination component in SSIS to load staging tables and have the underlying data versioned for you? If so, here's a solution which is working very well here in a production environment. Below are some scripts which implement automatic triggers on staging tables so that whenever a newly loaded record collides with the primary key of an existing record, the existing record is quietly moved out into an archive table. In practice, you create two schemas:

Staging - contains all staging tables
Archive - contains all archive (historic) tables

When two tables with the same name are detected in these schemas, triggers are placed onto the table in the Staging schema to handle moving old data into the archive table. The caveat here is that data is only preserved for columns where the column name and type are identical in both the staging and archive tables. There are two scripts that make all...
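One plausible shape for such a generated trigger is sketched below. The table and column names (Staging.Customer, Archive.Customer, ArchivedOn) are hypothetical, and this is a hand-written illustration of the pattern rather than the output of the actual scripts:

-- Hypothetical generated trigger: before new rows land in Staging.Customer,
-- move any existing rows that collide on the primary key into Archive.Customer.
CREATE TRIGGER Staging.trg_Customer_Archive
ON Staging.Customer
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Quietly copy colliding records out to the archive table,
    -- touching only the columns the two tables share.
    INSERT INTO Archive.Customer (CustomerId, Name, ArchivedOn)
    SELECT s.CustomerId, s.Name, GETDATE()
    FROM Staging.Customer AS s
    JOIN inserted AS i ON i.CustomerId = s.CustomerId;

    -- Remove the archived rows from staging...
    DELETE s
    FROM Staging.Customer AS s
    JOIN inserted AS i ON i.CustomerId = s.CustomerId;

    -- ...so the incoming rows can be inserted without key collisions.
    INSERT INTO Staging.Customer (CustomerId, Name)
    SELECT CustomerId, Name FROM inserted;
END;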

Conforming Primary Key Names

It's nice to be able to predict the name of a primary key constraint from the name of the table it constrains, but oftentimes during the design of a database we simply don't have time to worry about such things. Below is a quick T-SQL script which renames all primary key constraints so that they are named according to the table they belong to, e.g. the table Dim.Sales will have its primary key renamed to PK_Dim_Sales. To use the script, paste it into Management Studio, then:

1. Press CTRL+T to have the output go to text.
2. Press F5 to run the script.
3. Cut'n'paste the query results into a new query window.
4. Press F5 again to run the new query.
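The heart of the approach boils down to something like the following: a query that generates one sp_rename statement per primary key (a sketch of the technique, not the original script):

-- Generate sp_rename statements so that the primary key on
-- Schema.Table is renamed to PK_Schema_Table.
SELECT 'EXEC sp_rename N''' + QUOTENAME(s.name) + '.' + QUOTENAME(kc.name)
     + ''', N''PK_' + s.name + '_' + t.name + ''';'
FROM sys.key_constraints AS kc
JOIN sys.tables  AS t ON t.object_id = kc.parent_object_id
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE kc.type = 'PK';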

SSIS Script for handling bad dates

Oracle and SSIS both have the ability to handle dates which SQL Server's DateTime type won't accept. A reasonably simple solution is to leverage the script component to detect and repair the invalid dates... here's a step-by-step guide for doing exactly that:

1. Add a script component to your data flow, in transform mode.
2. Open the editor for the script component.
3. Under Input Columns, select all the date columns you want to repair, and set their usage type to READ/WRITE (this is vital, else the script will break).
4. Under Script, click the edit script button.
5. Select and delete all text.
6. Paste the C# code from here.
7. Close the source code window, click OK to dismiss the script editor.
8. Run the data flow.

The script will efficiently detect dates with bad years and attempt to correct them to something which SQL finds palatable. Give it a whirl and let me know what you think!
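The linked code is C#, but the guard it applies per row is easy to illustrate in T-SQL. This is an assumption about the general approach, not the script itself: dates whose year falls outside the range SQL Server's DATETIME accepts (1753-9999) get replaced.

-- Illustrative only: TRY_CONVERT (SQL Server 2012+) yields NULL for values
-- outside the DATETIME range of 1753-01-01 to 9999-12-31.
SELECT TRY_CONVERT(DATETIME, '0001-01-05') AS CleanDate;  -- NULL: year below 1753
SELECT TRY_CONVERT(DATETIME, '2011-06-01') AS CleanDate;  -- converts fine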

Loading data from an Oracle source

Building Business Intelligence using the Microsoft BI stack in an organization that has an Oracle-based transactional system is not uncommon, and here I'll outline a couple of tips and tricks that should ease the building of SSIS packages in that type of environment.

Attunity Components

Attunity offers several nice solutions for getting data out of Oracle and into SQL, one of which is their CDC (Change Data Capture) offering. If you don't have the budget or stomach for setting up CDC in an Oracle environment, then your next best bet is to use the free Attunity Oracle connectors for SSIS, which deliver a measurable performance boost.

The Oracle Index and Order By Gotcha

In previous posts I mention the performance benefits of loading sorted data into target tables. I'm currently loading data from Oracle 10, which exhibits a rather strange characteristic... Oracle does not make use of an index for ordering data. In other words, Oracle only uses...
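The practical consequence, sketched below with hypothetical table and column names, is that the Oracle source query needs an explicit ORDER BY; you can't assume the rows will come back in index order:

-- Hypothetical Oracle source query for an SSIS source component.
-- Don't rely on the index on SALE_ID to return ordered rows; request the
-- sort explicitly, then mark the source output as sorted in SSIS.
SELECT sale_id,
       sale_date,
       amount
FROM   sales
ORDER  BY sale_id;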

Using SSIS with SQL partitions

Last month I blogged about the virtues of having separate full-load vs. incremental-load packages when doing data warehousing. Well... that isn't the end-all of that argument... When you design your data warehouse, if you take table partitioning into account (which I highly recommend, since most BI data is chronological), then you can usually get away with writing a single data flow task which achieves minimal logging even during incremental updates, by simply targeting a staging partition and then switching it in at the end of the data load. In one such data flow, I needed to be able to normalize the incoming data into three tables. The challenge comes in that you don't want to perform lookups on every record when you're doing a full load (since this slows the data flow down), but you do need to do those lookups to avoid duplicates during incremental loads. A great pattern for solving this is to check whether the target table is empty before starting the data flow...
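Both halves of that pattern are cheap to express in T-SQL. In the sketch below, the table names, partition function (pfSalesDate), and boundary date are hypothetical:

-- 1. Decide between the full-load and incremental paths up front:
--    an empty target means the lookups can be skipped entirely.
IF NOT EXISTS (SELECT 1 FROM dbo.FactSales)
    PRINT 'Full load: skip the lookup transforms';
ELSE
    PRINT 'Incremental load: lookups required to avoid duplicates';

-- 2. After loading the staging table (same schema as the target, with a
--    check constraint on the partitioning column), switch it in.
--    The switch is a metadata-only operation, so logging stays minimal.
ALTER TABLE Staging.FactSales
    SWITCH TO dbo.FactSales PARTITION $PARTITION.pfSalesDate('20110601');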