Mark White's BI Blog

Posts

Showing posts from August, 2010

Using SSIS with SQL partitions

Last month I blogged about the virtues of having seperate full-load vs. incremental load packages when doing data warehousing. Well... that isn't the end-all of that argument... When you design your data warehouse, if you take table partitioning into account (which I highly recommend, since most BI data is temporally chronological), then you can usually get away with writting a single data flow task which achieves minimal logging even during incremental updates, by simply targetting a staging partition and then switching it in at the end of the data load. In one such data flow, I needed to be able to normalize the incomming data into three tables. The challenge comes in that you don't want to perform lookups on every record when you're doing a full load (since this slows the data flow down), but you do need to do those lookups to avoid duplicates during incremental loads. A great pattern for solving this to check whether the target table is empty before starting the data fl...

Threading in SSIS Script component

Alberto Ferrari writes many good blogs on SQL and SSIS. In one of his old posts he mentions the presence of a bug in the SSIS script component, where the output buffer is marked as completed as soon as the ProcessInput method returns from the last row of the last buffer. This is not a bug , and if you think about it, it's actually correct for the stream to be marked as completed once the component returns from processing it. The reason for this is because when synchronous outputs are used, the output buffer is the input buffer - i.e. they're the same block of RAM, so when the component returns from processing the buffer, that buffer is passed on to the next down-stream component... that is... SSIS does not wait for other threads in your component to complete their work before passing the buffer on. Not only is this not a bug, but it's actually a performance booster and, in general, a really good thing. The solution to Alberto's problem is reasonably simple, and s...

Much Needed Improvements To SSIS

SSIS has been my mid-night mistress for months now, and at the risk of sounding off-color I've compiled a short list of things which ought to be changed in the nest release of SSIS: 1) The diagram auto-layout desperately needs to be rethought and replaced. 2) The Data Viewer should at least allow "live vs. page-at-a-time" modes, and showing bottle-neck indicators on pipelines would be at least as useful as showing the row count. 3) Manual tuning of buffers should either go away (replaced by automated solution), or at very least be augmented with debug helpers and visual feedback. 4) Lookup and SCD components need to be binned and replaced with components that add more flexibility in terms of how lookup data is retrieved, how matches are performed (range-value matching would be a great start), allow for multiple joins (lookup), enable sliding window lookups (as opposed to partial caching). 5) Exposing only the most useless component properties to the data-flow'...

Hiding user tables in SQL Management Studio

I spotted a question over at Stack Overflow asking whether it is possible to hide specific SQL Server user-tables from end users. Strictly speaking it is not, but there is a trick to get superfluous tables out of the way of the "Tables" node of the object explorer. While, on first glance, this may seem like poor practise, it turns out that there's a bona-fide case for it: Garbage tables used in table partitions. A data warehouse that I recently delivered makes extensive use of SQL's partitioned tables feature - which enabled a data-load design which does incremental fast-loads... something which simply isn't possible without partitions. The down-side of using partitions is that, in order to keep a fact table on-line whilst loading in new data, you effectively need three tables: The fact table A staging table, into which new data is loaded and prepared A garbage table into which the old partition of the fact table can be switched The garbage table is of absolutely...