tag:blogger.com,1999:blog-59124062097403754182024-03-13T20:42:13.676+02:00Mark White's BI BlogBusiness Intelligence on Power BI, Azure and SQL ServerMark Whitehttp://www.blogger.com/profile/00931590930945665211noreply@blogger.comBlogger25125tag:blogger.com,1999:blog-5912406209740375418.post-42257007623999534612017-07-19T17:58:00.001+02:002017-07-19T17:58:22.564+02:00Power Query: Transforming YYYYMM dates (the quick way)<div class="tr_bq">
Accountants.</div>
<br />
Their unit of work seems to be <i>the month, </i>as if individual days don't exists, or don't count somehow. Nowhere is this better seen than in the notion of the <i>accounting period</i>, which all too often follows the form YYYYMM. Try converting this directly into a date and Power Query starts making excuses faster than a kid with his hand caught in the cookie jar.<br />
<br />
The quick solution to this is to understand what Power Query's Table.TransformColumns does, and then leverage this knowledge to transform your YYYYMM values into proper <i>date </i>type columns.<br />
<h3>
Table.TransformColumns</h3>
As it's name suggests, this handy function allows you to convert the contents of a column from one type to another. The basic syntax is:<br />
<blockquote>
= Table.TransformColumns(<br /> #"Your Source Table",<br /> { A list of tuples, specifying the columns and what functions to apply to each value} )</blockquote>
Lists {denoted by curly braces} are something you need to get comfortable with if you want to harness the under-the-hood power of the M language. The example below lists two tuples - for ReportedDate and LossDate respectively. On both columns, the function Date.From is invoked, which attempts to convert in input value to a <i>date </i>type.<br />
<blockquote>
= Table.TransformColumns( #"Your Source Table",{<br /> {"ReportedDate", Date.From}, {"LossDate", Date.From} })</blockquote>
This works fine for normal dates, but for YYYYMM dates we need to cook up our own function:<br />
<blockquote>
= Table.TransformColumns(#"Replaced Value",{<br /> { "Date",<br /> each Date.From(Text.From(_) & "01") }})</blockquote>
The secret here is in understanding what <i>each</i> does. <br />It is shorthand for creating a nameless (or <i>anonymous</i>) function, where the underscore character represents the single parameter/argument being passed in by the caller.<br />
<br />
The example above effectively says: "For <b>each</b> value in the <b>Date</b> column, convert the value to text (<i>Text.From</i>) then concatenate the string 01 (to create a text value of the form YYYYMM01) , then convert that text to a <i>date</i> type (<i>Date.From</i>) and put the result back into the Date column."<br />
<br />
Give it a try!<br />
<br />
<br />Mark Whitehttp://www.blogger.com/profile/00931590930945665211noreply@blogger.com2tag:blogger.com,1999:blog-5912406209740375418.post-53342567471321629102017-07-19T17:35:00.002+02:002017-07-19T17:35:26.654+02:00Power Query: Converting Monetary Values<div class="tr_bq">
Power Query has quickly become my favorite data transformation tool because of its fluid, interactive approach to describing data transformation. It isn't just easier to use that traditional ETL tools, it's also significantly more powerful and is a lot more forgiving. Perhaps too forgiving...</div>
<br />
Recently I worked on a solution for processing disparate spreadsheets that we receive from our partners. In theory it's simple enough: Combine the data tables, clean up the data, spit out the result.<br />
<br />
The problem came in when we did checked the resulting data, and discovered that the monetary totals were way out of whack, and Power Query didn't report any errors in the data it had processed.<br />
<br />
Consider the M query below, that imports the source data from a spreadsheet - take a close look at the ClaimAmount and VATOnClaimAmount columns. See anything wrong?<br /><br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjH2Vrtbmh-5tHm59dg15Gk2Hrgyw39F1_ZHfxRg5RgwVi2faBm80_KOLE90FGDI_Sb026iGTJVNC-13Q3xyRfrgUdqDba4qmkINy2GF5dXEAdRS0GExbCCrgyEdLm4cxTzGxDaufWcsaU/s1600/original+data.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="393" data-original-width="586" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjH2Vrtbmh-5tHm59dg15Gk2Hrgyw39F1_ZHfxRg5RgwVi2faBm80_KOLE90FGDI_Sb026iGTJVNC-13Q3xyRfrgUdqDba4qmkINy2GF5dXEAdRS0GExbCCrgyEdLm4cxTzGxDaufWcsaU/s1600/original+data.png" /></a></div>
<br />
At first, the values appear to be normal monetary values, but on closer inspection we see that the value 131.58 is actually 131.58000000000001. In excel this would amount to nothing more than a harmless rounding error, but in Power Query this can cause undesired results. Notice the type hint at the left side of the ClaimAmount column heading - it shows that Power Query is treating the column as the <i>any</i> type.<br />
<br />
The problem comes in when you transform the columns to the decimal <i>number</i> type. Depending on locale settings, the value may be interpreted in any of the following ways:<br />
<br />
<ol>
<li>Period denotes a decimal place. <br />The resulting value is what we expected. Yay.<br /><br />or<br /></li>
<li>Period denotes a thousands separator.<br />The resulting value is in the trillions.</li>
</ol>
<br />
<br />
Thankfully, the solution is fairly simple: Specify a locale during column conversion operations.<br />The column transform operation that the Power Query GUI creates for you will look something like this:<br />
<blockquote class="tr_bq">
= Table.TransformColumnTypes(Source,{<br /> {"ClaimAmount", type number}, {"VATOnClaimAmount", type number}<br /> })</blockquote>
.. all you need to do is type in the locale at the tail of the TransformColumnTypes column:<br />
<blockquote>
= Table.TransformColumnTypes(Source,{<br /> {"ClaimAmount", type number}, {"VATOnClaimAmount", type number} }, "en-US")</blockquote>
... and voila! Conversions treat the period as a decimal point, not a thousands separator.<br />
<br />Mark Whitehttp://www.blogger.com/profile/00931590930945665211noreply@blogger.com0tag:blogger.com,1999:blog-5912406209740375418.post-38219308459056596782017-02-01T12:49:00.000+02:002017-02-10T07:24:31.896+02:00SQL Server vs Azure Data Warehouse vs Netezza vs Redshift<h2>
The Great Database Shoot Out</h2>
In Jan'17, I teamed up with Marc van der Zon (Data Scientist), to test and compare several database technologies for our organization's future analytics and BI platform. The technologies that made the shortlist were:<br>
<ul>
<li>SQL Server, because it is the organization's existing BI platform.</li>
<li>Azure Data Warehouse, because of its high similarity to SQL Server.</li>
<li>Amazon's Redshift, because of its attractive cost, scalability and performance.</li>
<li>Netezza, because it is an<span style="font-family: sans-serif;">affordable</span> on-prem appliance that performs well.</li>
</ul>
<div>
<h3>
Considerations</h3>
</div>
<div>
We were primarily looking for the best bang-for-buck option, so price and performance were viewed as more important than how feature complete any particular option was. That said, what we regarded as important included:</div>
<div>
<ul>
<li>A shallow learning curve for BI developers (i.e. no need for expert skills in order to get good performance)<br>
</li>
<li>Ability to re-use existing code (i.e. ANSI-92 SQL compatibility)<br>
</li>
<li>Time-to-solution. (i.e. does the platform help bypass IT department bureaucracy)</li>
</ul>
</div>
<div>
<h3>
Testing Methodology</h3>
Important to modern insurers are <i>rating factors</i> - qualities about the insured items that signify a change in the risk of insuring that item. Our test was based on taking 1.8 billion rating factors, presented in 5th normal form using 5 relational tables, and transform them into a series of 5 flattened tables. In short, it is a massive denormalizing exercise which tends to lend itself to traditional relational database technologies.</div>
<div>
<h3>
Contenders</h3>
</div>
<div>
In the IT department's corner was an on-prem Dell 970 server, packed with 4 Xeon E7 4th generation quad core processors, teamed with 600GB of error correcting RAM, riding an EMC ExtremeIO solid-state disk SAN on dual 8GB/s HBAs. Including software licenses, the ticket to ride is roughly $551K (ZAR 7,5 million), given that the ExtremeIO is a shared resource.<br>
<br>
In the Data Scientist's corner was the other on-prem offering - an IBM Netezza - a 32TB analytics appliance with a solid reputation for performance, and a comparatively reasonable price tag of roughly 88K (ZAR 1.2 million).<br>
<br>
Next up was Microsoft's cloud based Azure Data Warehouse - a massively parallel appliance which splits data and workloads across 60 <i>distributions, </i>and is provisioned using DWUs (Data Warehouse units) - which translate into the amount of RAM and query concurrency the appliance will yield.<br>
<br>
We tested Azure at several price/performance points:<br>
<ul>
<li>DWU 600 - costing ~$24k / yr </li>
<li>DWU 1200 - costing ~$53k / yr</li>
<li>DWU 2000 - costing ~$88k / yr</li>
</ul>
Last up was Amazon's Redshift - a truly scalable MPP database based on the Prostgress database engine, in which you choose both the size (CPU & RAM) and number of nodes (logical machines) onto you would like your data and workloads distributed, and pay accordingly for the resources provisioned. The configurations we tested were:<br>
<ul>
<li>1 Node - costing ~$9k/yr (ZAR 122K)</li>
<li>12 Nodes - costing ~$12k/yr (ZAR 161K)</li>
<li>32 nodes - costing ~$31K/yr (ZAR 429K)</li>
</ul>
For all of the above configurations, we based our pricing on the provisioning of 13TB data, with at least 3129 hours of up time per year.</div>
<div>
<h3>
Results</h3>
</div>
<div>
To graphically represent the results, we created two measures:</div>
<div>
<ol>
<li>Affordability - being the inverse of cost.</li>
<li>Productivity - being the inverse of the query performance.</li>
</ol>
</div>
<div>
When plotted onto a Cartesian plane, the differences between these technologies really become apparent:</div>
<div>
<br></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjgTh8V9aiJXtUeAdzPWDtIk6H7jX2_v4sfXAvKzCrEXyxsnu7MQo2-JG1BKo-FI5-FXWtILZ1Ezz2_Yo-sjAeyC48VxaqNzGcypJ26WHoT2EocW6_Dr9BJHXoKbDcnHZyaNE9aReHe0L8/s1600/RedshiftComparison.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjgTh8V9aiJXtUeAdzPWDtIk6H7jX2_v4sfXAvKzCrEXyxsnu7MQo2-JG1BKo-FI5-FXWtILZ1Ezz2_Yo-sjAeyC48VxaqNzGcypJ26WHoT2EocW6_Dr9BJHXoKbDcnHZyaNE9aReHe0L8/s1600/RedshiftComparison.png"></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br></div>
<div class="separator" style="clear: both; text-align: left;">
<br></div>
<div class="separator" style="clear: both; text-align: left;">
Both Azure, Netezza and Redshift utterly annihilate on-prem SQL Server in terms of value proposition, and all of them are hands-down a better choice for building out future BI platforms.</div>
<div class="separator" style="clear: both; text-align: left;">
<br></div>
<div class="separator" style="clear: both; text-align: left;">
For me, Amazon's offering really stood out both on the performance-to-price-point curve, as well as in it's features and ease of use. The ability to scale while remaining on-line sounds like a nice-to-have, but our experience with Azure quickly revealed that an inability to scale online is a productivity killer.</div>
<div class="separator" style="clear: both; text-align: center;">
<br></div>
<div>
<table><tbody>
<tr> <td width="100"></td> <td width="150"></td> <td><b>Query Time</b></td> </tr>
<tr> <td rowspan="3">On Prem<br>
<br></td> <td>Existing Server</td> <td>0:59:00</td> </tr>
<tr> <td>New Server</td> <td>0:43:40</td> </tr>
<tr> <td>Netezza</td> <td>0:04:50</td> </tr>
<tr> <td rowspan="3">Azure<br>
<br><br>
<br></td> <td>DWU600</td> <td>0:43:30</td> </tr>
<tr> <td>DWU1200</td> <td>0:15:25</td> </tr>
<tr> <td>DWU2400</td> <td>0:09:28</td> </tr>
<tr> <td rowspan="3">Redshift<br>
<br></td> <td>1 node</td> <td>0:36:09</td> </tr>
<tr> <td>12 nodes</td> <td>0:04:45</td> </tr>
<tr> <td>32 nodes</td> <td>0:01:47</td></tr>
</tbody></table>
<br>
<br></div>
<div>
<h3>
Notes</h3>
<div style="font-size: medium; font-weight: normal;">
We encountered several minor annoyances during the set up and testing of Azure Data Warehouse - sometimes it would take upwards of 30 minutes for the database to pause, so subsequently scaling operations (which take the database offline) would take inordinately long to complete. When running below DWU600, some of the queries would terminate with errors or else take upwards of 4 hours to run - suggesting that the Azure offering doesn't handle low resource conditions particularly well. There were also significant inconsistencies in the elapsed time required for some queries to complete when run at different times, possibly the result of either congestion or contention? Azure DW's Polybase also proved to be a comparatively fussy affair, with lots of trial and error necessary to get the 5 CSV files loaded into tables.<br>
<br></div>
<div style="font-size: medium; font-weight: normal;">
The testing of Netezza went smoothly and query times were predictable, as one might expect from a dedicated appliance. There wasn't lots of configuration necessary to get things going, and the loading of data was a reasonably simple affair. </div>
<div style="font-size: medium; font-weight: normal;">
<br></div>
<div style="font-size: medium; font-weight: normal;">
Amazon's Redshift has a lot going for it. The documentation is 1st class, and the follow-along tutorials can be completed in a matter of minutes each. Some of the features that really set Redhisft apart from competitors were:</div>
<div>
<ol>
<li><span style="font-size: small;"><span style="font-weight: normal;">It stays online when scaling up or down, and it scales reasonably quickly.<br>
</span></span></li>
<li><span style="font-size: small;"><span style="font-weight: normal;">It is an incredibly simple, single step process to load CSV data into Redshift.<br>
</span></span></li>
<li><span style="font-size: small;"><span style="font-weight: normal;">It analyses data as it is loaded and intelligently determines the best compression algorithm to use for storing the data.<br>
</span></span></li>
<li><span style="font-size: small;"><span style="font-weight: normal;">It allows lookup tables to be present, in their entirety, on every compute node.<br>
This significantly improves the performance of relational joins.</span></span></li>
</ol>
</div>
<div style="font-size: medium; font-weight: normal;">
<br></div>
<div style="font-size: medium; font-weight: normal;">
Finally, there's on-premises SQL Server machine, which the server team and DBAs say has many, many CPU cores, and gigabytes per second of awesome performance, and will be ready next Friday. Probably. After the latest service pack has been installed.<br>
<br></div>
<div style="font-size: medium; font-weight: normal;">
...It also has column store indexes. Did we mention how many CPU cores it has?</div>
<div style="font-size: medium; font-weight: normal;">
<br></div>
<h3>
Conclusion</h3>
With no specific query tuning, Amazon's Redshift demonstrated performance that we were unable to obtain on any of the other platforms at a comparable price point. With respect to get-it-done productivity it stands head and shoulders above of the competition.<br>
<br>
Netezza makes for an attractive alternative for use cases that require an on-premise platform, and its important to remember that large volumes of data can be shipped into and out of an on-premise server for free, whereas data charges do apply for both Microsoft and Amazon's cloud offerings.<br>
<br>
Seriously, though... if like me you've been a SQL Server stalwart most of your career, take Redshift for a spin. It'll knock your socks off!<br>
<br></div>
Mark Whitehttp://www.blogger.com/profile/00931590930945665211noreply@blogger.com0tag:blogger.com,1999:blog-5912406209740375418.post-79443315869388981522016-06-01T11:29:00.003+02:002019-03-29T01:38:46.222+02:00Reading Zip files in PowerQuery / MBeing a fan of PowerBI, I recently looked for a way to read zip files directly into the Data Model, and found <a href="http://www.excelandpowerbi.com/?p=155">this blog</a> which showed a usable technique. Inspired by the possibilities revealed in Ken's solution, but frustrated by slow performance, I set out to learn the M language and write a faster alternative.<br />
<h3>
UnzipContents</h3>
The result of these efforts is an M function - UnzipContents - that you can paste into any PowerBI / PowerQuery report. It takes the contents of a ZIP file, and returns a list of files contained therein, along with their decompressed data:<br />
<br />
<iframe src="https://pastebin.com/embed_iframe/KJ8MyAPb" style="border: none; height: 400px; width: 100%;"></iframe><br />
<br />
If you're not sure how to make this function available in your document, simply:<br />
<br />
<ol>
<li>Open up PowerQuery (either in Excel or in PowerBI)</li>
<li>Create a new Blank Query.</li>
<li>Open up the Advanced Editor (found on the View tab in PowerBI).</li>
<li>Copy-Paste the above code into the editor, then close the editor.</li>
<li>In the properties window, rename the the function to <b>UnzipContents</b></li>
</ol>
<h3>
Usage</h3>
<div>
Using the function is fairly straight forward:</div>
<div>
<ol>
<li>Choose "New Query -> Blank Query".</li>
<li>Open the advanced editor.</li>
<li>Paste the following:<br /><blockquote class="tr_bq">
<span style="font-family: "courier new" , "courier" , monospace;">let</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> Source = File.Contents("</span><span style="color: red; font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">[Full path to your zip file]</span><span style="font-family: "courier new" , "courier" , monospace;">"),</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> Files = UnzipContents(Source)</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">in</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> Files</span></blockquote>
</li>
<li>Close the advanced editor.</li>
<li>In the Applied Steps, click the configure cog to the right of Source.</li>
<li>Select your Zip file.</li>
</ol>
<div>
<br /></div>
<div>
At this point you should see a table listing every file that is contained in the zip file you selected, along with a Binary blob of the file's contents. For many file types, like CSV and Excel, you can simply click in the Binary field and PowerQuery will intelligently build all of the steps needed to expose the contents of the file.</div>
</div>
<div>
<br /></div>
<div>
Good luck! </div>
<br />
<blockquote class="tr_bq">
<br /></blockquote>
Mark Whitehttp://www.blogger.com/profile/00931590930945665211noreply@blogger.comtag:blogger.com,1999:blog-5912406209740375418.post-14514664874790572332014-07-29T15:32:00.001+02:002014-07-29T16:05:19.726+02:00Virtuous Designs for Tabular Data ModellingMuch of the beauty of Microsoft's Tabular model is the apparent ability to escape weeks of star-schema modelling that are common place in OLAP cubes. While tabular can be blazingly fast both to develop models for as well as to use, the performance of the Vertipaq engine varies massively depending on how you present your data to it.<br />
<br />
Below are several data modelling patterns you are likely to encounter:<br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjSUmPB68k-MZkqcr_M_ymAHs-hkvul5kd9hBpvwV574V2xJ_m6Vp_yhkcxHYD-pSZhA5nJ6gHhGomd1BhXMlYGKL1HvxLv2jFCOgv78pNJ4J0Pugke6JllE6WQwRUrZvaArKmFiJ0YQBM/s1600/image001-770974.png"><img border="0" height="191" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjSUmPB68k-MZkqcr_M_ymAHs-hkvul5kd9hBpvwV574V2xJ_m6Vp_yhkcxHYD-pSZhA5nJ6gHhGomd1BhXMlYGKL1HvxLv2jFCOgv78pNJ4J0Pugke6JllE6WQwRUrZvaArKmFiJ0YQBM/s320/image001-770974.png" width="400" /></a><br />
The Monolithic table design involves joining all source tables together into a single denormalized representation. Tabular is able to group / aggregate and filter rows easily in this model, so while care needs to be taken when writing DAX expressions, the resulting cube will perform well.<br />
Pros: <br />
<ul>
<li>Easy to get started. </li>
<li>Performs well.</li>
</ul>
Cons:<br />
<ul>
<li>DAX expressions trickier to write. </li>
<li>Cube loading times may suffer. </li>
<li>Only similar-grained data can be accommodated.</li>
</ul>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj5CIVCcqHzX9HNw6aI3cnFuvAXxAKuNX2oEs5Jgxgj26OBRbiuMEXfxKzppBnH6xIy-A-c7v31q5Ud_7FptEBV_o_frljPQRzS4YBvBBV126BLvkRjbxFLHqh5SyT5NuaKHfi17357pWY/s1600/image002-773304.png"><img border="0" height="213" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj5CIVCcqHzX9HNw6aI3cnFuvAXxAKuNX2oEs5Jgxgj26OBRbiuMEXfxKzppBnH6xIy-A-c7v31q5Ud_7FptEBV_o_frljPQRzS4YBvBBV126BLvkRjbxFLHqh5SyT5NuaKHfi17357pWY/s320/image002-773304.png" width="400" /></a><br />
<br />
When facts are derived from disparate sources, a monolithic design is not practical. In this case, multiple fact tables can be conformed by presenting a façade of filtering tables. Unlike traditional OLAP dimensions, these table do not need to present surrogate keys – only the union of unique columns values that appear in the joined fact tables.<br />
<br />
Pros:<br />
<ul>
<li>Easy to implement.</li>
<li>Fact tables perform relatively well.</li>
</ul>
Cons:<br />
<ul>
<li>Quickly becomes messy due to all the small filter tables - User experience degraded.</li>
</ul>
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjttG_a9CMDqyxNfzPusb5dsqDYaL5nLQso72KNxvfCqz-61cKMuG3O4c6amyUJWp0glEBFqCb4ACQFd-laE-5_G20-13AbSS7tSuEaBnLrYGIowp2HwIAy-vADhkIK2R9vbDzxJLWwRBI/s1600/image003-775591.png"><img border="0" height="141" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjttG_a9CMDqyxNfzPusb5dsqDYaL5nLQso72KNxvfCqz-61cKMuG3O4c6amyUJWp0glEBFqCb4ACQFd-laE-5_G20-13AbSS7tSuEaBnLrYGIowp2HwIAy-vADhkIK2R9vbDzxJLWwRBI/s320/image003-775591.png" width="400" /></a><br />
<br />
The end user experience can, in some cases, be improved by hiding facts behind a chain of filtering tables. When this chain of tables present behavior that is consistent with an end-user’s understanding of the business, the model becomes easier t consume.<br />
<br />
Such chains usually perform <i>moderately </i>when compared to other virtuous patterns, but can provide an ideal end-user experience.<br />
<br />
Pros:<br />
<ul>
<li>Balance between performance and end user experience.</li>
</ul>
Cons:<br />
<ul>
<li>Harder to implement – careful planning needed to figure out how to push each filtering column into just one place upstream of its facts.</li>
<li>Is only useful when calculations only rely on one fact table at a time – i.e. where the formula engine can convert the entire expression into an “inner joined” SQL statement.</li>
</ul>
<div>
<br /></div>
<div>
There are also several design patterns that, with few exceptions, produce poorer performance than their virtuous counterparts:</div>
<div>
<br /></div>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiiZuScTzMGeXFUB3nW8z6WgLKdqzqxM08s-hfKVluEZAt2YVo0hRlNnpVt5_bE7BAOuAsUAV8ElSQntZXQXJ8x-22VJFu1r_vEBIDQnUqzcmb1Vn97cYWW9WOU_acC6H40Jef0avuKYrA/s1600/image004-777779.png"><img border="0" height="202" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiiZuScTzMGeXFUB3nW8z6WgLKdqzqxM08s-hfKVluEZAt2YVo0hRlNnpVt5_bE7BAOuAsUAV8ElSQntZXQXJ8x-22VJFu1r_vEBIDQnUqzcmb1Vn97cYWW9WOU_acC6H40Jef0avuKYrA/s320/image004-777779.png" width="400" /></a><br />
<br />
Avoid designs where the same filter(able) columns appear in multiple places. Inevitably, business will ask a question that involve both facts tables. The filter can only be (easily) applied to one table or the other, but not both.<br />
Options:<br />
<ul>
<li>Add a common filtering table, or</li>
<li>Combine (union) the facts into a monolithic table.</li>
</ul>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg629bfjyfov_8cGvvZs8o3YuhniyDhOe40gLAcqUjz0OWL0LDc3yARuyLyUc489ejhyphenhyphen3EtOnH1USM7DW6a5gP-DjMP8uLp686jVxFaF17qqYO_i9_smwzesqkCkLZxvJTaSHCjK1ta0WE/s1600/image005-779689.png"><img border="0" height="221" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg629bfjyfov_8cGvvZs8o3YuhniyDhOe40gLAcqUjz0OWL0LDc3yARuyLyUc489ejhyphenhyphen3EtOnH1USM7DW6a5gP-DjMP8uLp686jVxFaF17qqYO_i9_smwzesqkCkLZxvJTaSHCjK1ta0WE/s320/image005-779689.png" width="400" /></a><br />
While many-to-many relationships are possible in analysis services, they rarely perform well since they are often bridges between large tables millions of rows.<br />
<br />
Alternatives to consider:<br />
<ul>
<li>Combine all of the fact tables.</li>
<li>Combine two of the fact tables into a child of the remaining table.</li>
</ul>
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiDxI73a7dmdUkFM7MmDQenDhsoduf5rYInWW31fsFvt0Kr3wAhB0qLDykjXtDYw9M6z9LFcBGTG73kQh4ghasjuwaV7tlLJl__yBgV9QdB6v-wVreTxUNzb99XUpVKH0Z1R5gMF6_WJRE/s1600/image006-782136.png"><img border="0" height="193" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiDxI73a7dmdUkFM7MmDQenDhsoduf5rYInWW31fsFvt0Kr3wAhB0qLDykjXtDYw9M6z9LFcBGTG73kQh4ghasjuwaV7tlLJl__yBgV9QdB6v-wVreTxUNzb99XUpVKH0Z1R5gMF6_WJRE/s320/image006-782136.png" width="400" /></a> <br />
Patterns that rely on large (>1m rows) tables to facilitate joins between facts perform poorly.<br />
The reason for this is that the formula engine cannot resolve the joins in a single query to the storage engine, and so instead it:<br />
<ol>
<li>Uses the filter to query all keys from the large intermediate table.</li>
<li>Passes the keys back to the storage engine to get the first set of facts.</li>
</ol>
Characteristics of this pattern:<br />
<ul>
<li>Extremely slow measures (minutes instead of milliseconds)</li>
<li>Memory exhaustion failures.</li>
</ul>
Alternatives to consider:<br />
<ul>
<li>Combine the fact tables.</li>
<li>Create a third [combined] fact table.</li>
<li>Join the fact tables directly to the filtering table.</li>
</ul>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhlQ8eqm-CFeeG2iDYxpLsv09DZziUk01KqfXms0oo-TLWqUia3fOUp3DACEKFVJYsVGwAxk0tw1KJahp3sjwZPqazG6M_qes1dHDIPaDdScXl_tBf2Sywp6a6YGgRY6UPOL-Kyquar2mk/s1600/image007-784165.png"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhlQ8eqm-CFeeG2iDYxpLsv09DZziUk01KqfXms0oo-TLWqUia3fOUp3DACEKFVJYsVGwAxk0tw1KJahp3sjwZPqazG6M_qes1dHDIPaDdScXl_tBf2Sywp6a6YGgRY6UPOL-Kyquar2mk/s320/image007-784165.png" /></a><br />
For similar reasons to the “large intermediate table” design, using filtered child tables to filter parent tables can result in lack-lustre performance, stemming from how the formula engine resolves the query. <br />
Alternatives to consider:<br />
<ul>
<li>Link the parent table directly to the filtering of the small dimension.</li>
<li>Combine the child and parent table.</li>
</ul>
<!-- Blogger automated replacement: "https://images-blogger-opensocial.googleusercontent.com/gadgets/proxy?url=http%3A%2F%2F1.bp.blogspot.com%2F-qrJa0VwsmL4%2FU9eiiVWqiqI%2FAAAAAAAAQnM%2FK63AIHnYOCw%2Fs320%2Fimage004-777779.png&container=blogger&gadget=a&rewriteMime=image%2F*" with "https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiiZuScTzMGeXFUB3nW8z6WgLKdqzqxM08s-hfKVluEZAt2YVo0hRlNnpVt5_bE7BAOuAsUAV8ElSQntZXQXJ8x-22VJFu1r_vEBIDQnUqzcmb1Vn97cYWW9WOU_acC6H40Jef0avuKYrA/s320/image004-777779.png" --><!-- Blogger automated replacement: "https://images-blogger-opensocial.googleusercontent.com/gadgets/proxy?url=http%3A%2F%2F4.bp.blogspot.com%2F-_MzBcDQGSWc%2FU9eikFy2mXI%2FAAAAAAAAQnw%2FaMw9wR9xvgU%2Fs320%2Fimage007-784165.png&container=blogger&gadget=a&rewriteMime=image%2F*" with "https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhlQ8eqm-CFeeG2iDYxpLsv09DZziUk01KqfXms0oo-TLWqUia3fOUp3DACEKFVJYsVGwAxk0tw1KJahp3sjwZPqazG6M_qes1dHDIPaDdScXl_tBf2Sywp6a6YGgRY6UPOL-Kyquar2mk/s320/image007-784165.png" --><!-- Blogger automated replacement: "https://images-blogger-opensocial.googleusercontent.com/gadgets/proxy?url=http%3A%2F%2F2.bp.blogspot.com%2F-jl_AHrUhOF0%2FU9eihWwbb0I%2FAAAAAAAAQm0%2FqyAo2ncvqCk%2Fs320%2Fimage002-773304.png&container=blogger&gadget=a&rewriteMime=image%2F*" with "https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj5CIVCcqHzX9HNw6aI3cnFuvAXxAKuNX2oEs5Jgxgj26OBRbiuMEXfxKzppBnH6xIy-A-c7v31q5Ud_7FptEBV_o_frljPQRzS4YBvBBV126BLvkRjbxFLHqh5SyT5NuaKHfi17357pWY/s320/image002-773304.png" --><!-- Blogger automated replacement: "https://images-blogger-opensocial.googleusercontent.com/gadgets/proxy?url=http%3A%2F%2F1.bp.blogspot.com%2F-szHmwpD3TWs%2FU9eii888iUI%2FAAAAAAAAQnY%2FLP355ZO3uRY%2Fs320%2Fimage005-779689.png&container=blogger&gadget=a&rewriteMime=image%2F*" with "https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg629bfjyfov_8cGvvZs8o3YuhniyDhOe40gLAcqUjz0OWL0LDc3yARuyLyUc489ejhyphenhyphen3EtOnH1USM7DW6a5gP-DjMP8uLp686jVxFaF17qqYO_i9_smwzesqkCkLZxvJTaSHCjK1ta0WE/s320/image005-779689.png" --><!-- Blogger automated replacement: "https://images-blogger-opensocial.googleusercontent.com/gadgets/proxy?url=http%3A%2F%2F4.bp.blogspot.com%2F-SE6lNC7RjnU%2FU9eig2OPlJI%2FAAAAAAAAQmo%2FcAUICHHXxIM%2Fs320%2Fimage001-770974.png&container=blogger&gadget=a&rewriteMime=image%2F*" with "https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjSUmPB68k-MZkqcr_M_ymAHs-hkvul5kd9hBpvwV574V2xJ_m6Vp_yhkcxHYD-pSZhA5nJ6gHhGomd1BhXMlYGKL1HvxLv2jFCOgv78pNJ4J0Pugke6JllE6WQwRUrZvaArKmFiJ0YQBM/s320/image001-770974.png" --><!-- Blogger automated replacement: "https://images-blogger-opensocial.googleusercontent.com/gadgets/proxy?url=http%3A%2F%2F1.bp.blogspot.com%2F-w8gxBziOYcc%2FU9eijls1bvI%2FAAAAAAAAQnk%2F-rot8wAGWz0%2Fs320%2Fimage006-782136.png&container=blogger&gadget=a&rewriteMime=image%2F*" with "https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiDxI73a7dmdUkFM7MmDQenDhsoduf5rYInWW31fsFvt0Kr3wAhB0qLDykjXtDYw9M6z9LFcBGTG73kQh4ghasjuwaV7tlLJl__yBgV9QdB6v-wVreTxUNzb99XUpVKH0Z1R5gMF6_WJRE/s320/image006-782136.png" --><!-- Blogger automated replacement: "https://images-blogger-opensocial.googleusercontent.com/gadgets/proxy?url=http%3A%2F%2F3.bp.blogspot.com%2F-VrZuJepIIJ4%2FU9eih3qCSgI%2FAAAAAAAAQnA%2FLgcoPh6eg_k%2Fs320%2Fimage003-775591.png&container=blogger&gadget=a&rewriteMime=image%2F*" with "https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjttG_a9CMDqyxNfzPusb5dsqDYaL5nLQso72KNxvfCqz-61cKMuG3O4c6amyUJWp0glEBFqCb4ACQFd-laE-5_G20-13AbSS7tSuEaBnLrYGIowp2HwIAy-vADhkIK2R9vbDzxJLWwRBI/s320/image003-775591.png" -->Mark Whitehttp://www.blogger.com/profile/00931590930945665211noreply@blogger.com0tag:blogger.com,1999:blog-5912406209740375418.post-35810096977107202192013-09-18T16:48:00.002+02:002014-08-21T10:49:36.224+02:00Cloning SQL tablesPlenty of folks have blogged about various techniques for cloning tables in SQL server, and for good reason... during data loading and data processing its very useful to be able to build one table while simultaneously reporting off of another. When the processing of the new table is completed, it can be switched in to replace the data of the old table.<br />
<br />
To simplify the creation of a build table, I've written a stored procedure which will take any table and clone it and its indexes:<br />
<br />
<iframe src="http://pastebin.com/embed_iframe.php?i=rgKdvQUk" style="border:none;width:100%;height:400px"></iframe><br />
Mark Whitehttp://www.blogger.com/profile/00931590930945665211noreply@blogger.com0tag:blogger.com,1999:blog-5912406209740375418.post-545182539890249912013-07-19T16:46:00.002+02:002013-07-19T17:11:30.167+02:00Revisiting Earned Premium<br />
In a <a href="http://sql10.blogspot.com/2013/07/dax-and-insurances-earned-premium.html" target="_blank">previous post about earned premium</a>, I outlined how you could calculate a monetary value based on a period of time over which it was earned using DAX.<br />
<br />
Serendipitously, the next day a colleague forwarded Alberto Ferrari's paper on <a href="http://www.sqlbi.com/articles/understanding-dax-query-plans/" target="_blank">understanding DAX query plans</a>, and after giving it a thorough read I fired up the query profiler and set out to optimize our calculated measure for earned premium.<br />
<br />
Alberto's paper details a performant solution to the classic <i>events in progress</i> problem, of which <i>earned premium</i> is a close cousin. My excitement at lazily shoplifting Alberto's work came to a grinding halt when I discovered that his <i>40ms Jedi solution </i>only worked if data was queried at a specific granularity. This wasn't going to cut it... we need an earned premium measure that works at <i>any </i>level of aggregation. Back to the drawing board.<br />
<br />
It turns out that much of Alberto's advice is (as always) really valuable. While I strongly recommend reading Alberto's paper, here's the cheat sheet for optimizing any DAX calculated measure:<br />
<ol>
<li>Help the Formula Engine (FE) to push the heavy lifting down to the Storage Engine (SE).<br />
FE is single threaded and non-caching, whereas SE is multithreaded and can cache results.<br />
</li>
<li>Avoid complex / inequality predicates that cause SE to call back to the FE.<br />
This not only slows down data retrieval, but also prevents SE from caching results.</li>
</ol>
<div>
Our original measure used inequality predicates in the expression:</div>
<div>
<span style="background-color: white; color: #222222; font-family: monospace; font-size: 13px; line-height: 18px;"> 'Premium'[Start Date] <= LASTDATE('Date'[Date]) <br />
&& 'Premium'[End Date] >= FIRSTDATE('Date'[Date])</span></div>
<div>
<br /></div>
<div>
... which forces SE to callback to FE, slowing down the calculation somewhat.</div>
<div>
<br /></div>
<div>
To recap:<br />
<span style="background-color: white; color: #222222; font-family: monospace; font-size: 13px; line-height: 18px;">Earned Premium = Amount Paid * Days In Current Period / Total Days of Cover</span></div>
<br />
Which we can calculate as follows:<br />
<code>Earned Premium:=<br />
SUMX (<br />
SUMMARIZE ( <br />
'Premium',<br />
'Premium'[Start Date],<br />
'Premium'[End Date],<br />
"Earned Premium", <br />
SUM ('Premium'[Amount Paid] ) <br />
* COUNTROWS ( <br />
CALCULATETABLE ('Date', <br />
KEEPFILTERS( <br />
DATESBETWEEN ( <br />
'Date'[Date],'Premium'[Start Date], 'Premium'[End Date]<br />
)<br />
)<br />
)<br />
)<br />
/ COUNTROWS ( <br />
DATESBETWEEN (<br />
'Date'[Date], 'Premium'[Start Date], 'Premium'[End Date]<br />
)<br />
)<br />
),<br />
[Earned Premium]<br />
)</code><br />
<br />
With this formula, we're able to calculate earned premium for 22 million records across 84 months in 2.2 seconds. Happy days.<br />
<br />
To explain the calculation:<br />
<ul>
<li>The SUMMARIZE function groups all premium by start & end date. </li>
<li>We project "Earned Premium" as the sum of all premium for a distinct start & end period</li>
<li>The earned premium is multiplied by the number of days (records) in the distinct period that are also present in period currently being observed in the resulting cell. KEEPFILTERS applies this filtering for us - it effectively says "filter out days that aren't in the current month / quarter / year filtering being applied to the Date table" </li>
<li>The result is divided by the number of days occurring in the distinct period regardless of filtering. </li>
<li>Finally SUMX adds up all of the summarized values. </li>
</ul>
Mark Whitehttp://www.blogger.com/profile/00931590930945665211noreply@blogger.com1tag:blogger.com,1999:blog-5912406209740375418.post-70385960165516122382013-07-15T19:21:00.001+02:002013-07-16T09:47:44.972+02:00DAX and Insurance's Earned Premium problem<p>In the world of short term insurance, "Earned Premium" is a common BI metric. In its simplest form, an amount of money is earned as time elapses through a period of cover. On any given day, you will have earned some, all or none of the premium paid.</p><p>Using DAX calculations, solving earned premium turns out to be both easy and efficient, and in this post I'll show you how to do it. Before you start, download and open <a href="https://sites.google.com/site/ssiskeygenerator/Earned%20Premium.xlsx" target="_blank">this Excel 2013 file</a>. Make sure that you have the PowerPivot add-in enabled in Excel.</p><p>If you open the PowerPivot cube, you'll find two tables, the first being a regular "Date" table used to represent the hierarchy of days, months, years etc. The second "Premium" table contains the data that's of interest, and to keep things simple, I've only included 4 columns:</p><ul><li>Product Line - describes a type of insurance product</li><li>Amount - the amount of premium paid by a policy holder for a given period of cover.</li><li>Start Date - the date when insurance cover starts for the premium paid.</li><li>End Date - the date after which insurance cover ends for the premium paid.</li></ul><p>The DAX calculation, <em>Earned Premium</em> is where it get interesting. The layman's calculation for earned premium is:</p><p><code><span style="white-space: pre;"> </span>Earned Premium = Amount Paid * Days In Current Period / Total Days of Cover</code></p><p>For example, if you paid $100 for 1 year of insurance, then in the month of March you will earn $8.49:</p><p><code><span style="white-space: pre;"> </span>Earned Premium = <strong>$100</strong> * <strong>31 </strong>(Days in March) <strong>/</strong> <strong>365 </strong>(Days of insurance purchased)</code></p><p>To move into the world of DAX, the above equation is pseudo coded as follows:</p><p><code>Earned Premium = SUM(<br /> Amount<br /> * [Days in Date table overlapping period of cover]<br /> / [Days in Date table for total period of cover]<br />)</code></p><h4>Step 1: Calculate Days in Total Period Of Cover</h4><p>The DAX technique for calculating the number of days in a period is to produce a table of dates that fall within a period using DATESBETWEEN, and then counting the number of rows:<br /><code><br />COUNTROWS( DATESBETWEEN( <em>Dates</em>, <em>Start Date</em>, <em>End Date</em> ) ) <br /></code></p><p><code>Filling in the actual table and column names, the DAX formula becomes:<br /></code></p><p><code>COUNTROWS ( DATESBETWEEN( 'Date'[Date], 'Premium'[Start Date], 'Premium'[End Date] ) ) </code></p><h4>Step 2: Calculate Days in Current Period </h4><p><code></p><div>Determining which days fall into the current period of cover requires checking for overlap between the period currently being calculated, and the total period of cover. In practice, this means using the latest of the two start dates and the earliest of the two end dates:</div><p><code>COUNTROWS (<br /> DATESBETWEEN (<br /> 'Date'[Date],<br /> IF(FIRSTDATE('Date'[Date]) > 'Premium'[Start Date], FIRSTDATE('Date'[Date]), 'Premium'[Start Date] ) ,<br /> IF(LASTDATE ('Date'[Date]) < 'Premium'[End Date] , LASTDATE ('Date'[Date]), 'Premium'[End Date] )<br /> )<br />)<br /></code></p><h4>Step 3: Optimizing the input data</h4><p>No relationship is defined between the Date and Premium tables - so to improve the performance of our calculation, we give DAX a way of quickly eliminating records that are not applicable to our calculation. The logic for doing this is:</p><ul><li>Filter out records where cover ends <em>before</em> the period we're calculating starts.</li><li>Filter out records where the cover starts <em>after</em> the period we're calculating ends.</li><li>Group the remaining records by start and end date, projecting the sum total of premium for the given period.</li></ul><p>In DAX this looks somewhat inside out...</p><p><code>FILTER (<br /> SUMMARIZE (<br /> 'Premium',<br /> 'Premium'[Start Date],<br /> 'Premium'[End Date],<br /> "EarnedPremium",<br /> SUM('Premium'[Amount])<br /> ),<br /> 'Premium'[Start Date] <= LASTDATE('Date'[Date]) && 'Premium'[End Date] >= FIRSTDATE('Date'[Date])<br />) <br /></code><br />In T-SQL, this might look as follows:</p><p><code>SELECT [Start Date], [End Date], [EarnedPremium] = SUM(Amount)<br />FROM Premium<br />GROUP BY [Start Date],[End Date]<br />HAVING [Start Date] < {Current Cell in Excel's Period End}<br />AND [End Date] > {Current Cell in Excel's Period Start}</code></p><p>This calculation efficiently reduces the number of times our earned premium expression needs to be run inside of the SUM statement. The completed DAX calculation then:</p><p><code>Earned Premium:=SUMX (<br /> FILTER (<br /> SUMMARIZE (<br /> 'Premium',<br /> 'Premium'[Start Date],<br /> 'Premium'[End Date],<br /> "EarnedPremium",<br /> SUM('Premium'[Amount])<br /> ),<br /> 'Premium'[Start Date] <= LASTDATE('Date'[Date]) && 'Premium'[End Date] >= FIRSTDATE('Date'[Date])<br /> ),<br /> [EarnedPremium]<br /> *<br /> COUNTROWS (<br /> DATESBETWEEN (<br /> 'Date'[Date],<br /> IF(FIRSTDATE('Date'[Date]) > 'Premium'[Start Date], FIRSTDATE('Date'[Date]), 'Premium'[Start Date] ) ,<br /> IF(LASTDATE ('Date'[Date]) < 'Premium'[End Date] , LASTDATE ('Date'[Date]), 'Premium'[End Date] )<br /> )<br /> ) /<br /> COUNTROWS ( DATESBETWEEN( 'Date'[Date], 'Premium'[Start Date], 'Premium'[End Date] ) )<br />)</code></p><p></code></p>Mark Whitehttp://www.blogger.com/profile/00931590930945665211noreply@blogger.com0tag:blogger.com,1999:blog-5912406209740375418.post-26461280008047553842013-07-11T12:54:00.004+02:002016-10-04T13:40:10.232+02:00Easily Move SQL Tables between FilegroupsRecently during a Data Warehouse project, I had the need to move many tables to a new file group. I didn't like any of the solutions that I found on Google, so decided to create on of my own. The result?<br />
<br />
<h2>
MoveTablesToFilegroup</h2>
<div>
<a href="http://pastebin.com/raw.php?i=ZK3wXAqB">Click here</a> for a nifty stored proc allows you to easily move tables, indexes, heaps and even LOB data to different filegroups without breaking a sweat. To get going, copy-paste the code below into Management Studio, and then run it to create the needed stored procedure.</div>
<br />
Hopefully the arguments are self explanatory, but here are some examples:<br />
<br />
1. Move all tables, indexes and heaps, from all schemas into the filegroup named SECONDARY:<br />
<blockquote class="tr_bq">
EXEC dbo.sp_MoveTablesToFileGroup<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>@SchemaFilter = '%',<span class="Apple-tab-span" style="white-space: pre;"> </span>-- chooses schemas using the LIKE operator<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>@TableFilter = '%',<span class="Apple-tab-span" style="white-space: pre;"> </span>-- chooses tables using the LIKE operator<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>@DataFileGroup = 'SECONDARY',<span class="Apple-tab-span" style="white-space: pre;"> </span>-- The name of the filegroup to move index and in-row data to.<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>@ClusteredIndexes = 1,<span class="Apple-tab-span" style="white-space: pre;"> </span>-- 1 means "Move all clustered indexes" - i.e. table data where a primary key / clustered index exists<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>@SecondaryIndexes = 1,<span class="Apple-tab-span" style="white-space: pre;"> </span>-- 1 means "Move all secondary indexes"<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>@Heaps = 1,<span class="Apple-tab-span" style="white-space: pre;"> </span>-- 1 means "Move all heaps" - i.e. tables with no clustered index.<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>@ProduceScript = 1<span class="Apple-tab-span" style="white-space: pre;"> </span>-- Don't move anything, just produce a T-SQL script</blockquote>
<br />
2. Produce a script to move LOBS to the LOB_DATA filegroup, and move table data to the SECONDARY filegroup, for tables in the TEST schema only:<br />
<blockquote class="tr_bq">
EXEC dbo.sp_MoveTablesToFileGroup<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>@SchemaFilter = 'TEST',<span class="Apple-tab-span" style="white-space: pre;"> </span>-- Only tables in the TEST schema<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>@TableFilter = '%',<span class="Apple-tab-span" style="white-space: pre;"> </span>-- All tables<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>@DataFileGroup = 'SECONDARY',<span class="Apple-tab-span" style="white-space: pre;"> </span>-- Move in-row data to SECONDARY<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>@LobFileGroup = 'LOB_DATA',<span class="Apple-tab-span" style="white-space: pre;"> </span>-- Move LOB data to LOB_DATA fg.<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>@ClusteredIndexes = 1,<span class="Apple-tab-span" style="white-space: pre;"> </span>-- Move all clustered indexes<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>@SecondaryIndexes = 0,<span class="Apple-tab-span" style="white-space: pre;"> </span>-- Don't move all secondary indexes<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>@Heaps = 0,<span class="Apple-tab-span" style="white-space: pre;"> </span>-- Don't move tables with no PK<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>@ProduceScript = 1<span class="Apple-tab-span" style="white-space: pre;"> </span>-- Don't move anything, just produce a T-SQL script</blockquote>
Mark Whitehttp://www.blogger.com/profile/00931590930945665211noreply@blogger.com4tag:blogger.com,1999:blog-5912406209740375418.post-46163136152365758412013-04-24T16:57:00.003+02:002013-07-15T21:59:15.260+02:00Utility Dimensions in Tabular / PowerPivotAs my team having been porting a truck-load of hefty Excel reports over to PowerPivot, we've realized just how brilliant PowerPivot is in terms of fast turn-around time analytics. Unfortunately, Tabular doesn't have the maturity of a traditional OLAP engine - and one area where this is most evident is when working with time.<br />
<br />
When source data presents fact records with multiple Date columns, a common trick used to avoid having multiple time dimensions is to implement a utility dimension, so that facts are filtered by a single date dimension - but the date column used for filtering is selected by the utility dimension.<br />
<br />
PowerPivot/Tabular does not differentiate facts from dimensions, which creates something of a conundrum for report writers... how do you present an analytics model to end users without creating dozens of date dimensions?<br />
<br />
In a nutshell, there are pretty much just 2 routes you can follow.... you either:<br />
<br />
<ol>
<li>UNPIVOT your fact data when populating the tabular model, or</li>
<li>use DAX to implement "date selector aware" measures.</li>
</ol>
<div>
Let's explore both options...</div>
<h3>
UNPIVOTing facts</h3>
<div>
<br /></div>
<div>
The idea here is to repeat each fact row for each date column that you want to select on.</div>
<div>
Consider AdventureWorks' SalesHeader table:</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiFO6NMPgiyT2bPmHEKldf8ZjC1oooyIJqmIwlPlhbbdzSWaUCiQOzz_G0Xk_lH5rnnD3gftp9PeycOY1svyqVI1teO1UgQvuCcIDmGcgx7BQoq24W7FawxLne_pfkyvZ1eHrRvyt2ozCM/s1600/Sales+Header+1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiFO6NMPgiyT2bPmHEKldf8ZjC1oooyIJqmIwlPlhbbdzSWaUCiQOzz_G0Xk_lH5rnnD3gftp9PeycOY1svyqVI1teO1UgQvuCcIDmGcgx7BQoq24W7FawxLne_pfkyvZ1eHrRvyt2ozCM/s1600/Sales+Header+1.png" /></a></div>
<div>
<br /></div>
<div>
OrderDate, DueDate and ShipDate are the 3 columns we want to select and filter on.</div>
<div>
To achieve this, we repeat each record for each of these 3 columns, by using T-SQL's UNPIVOT operator:</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg1ckLaDRacvki8Jv4QbI5QsSmsTjLpbhlhwMAYDCz5FOdlj3ymrGF4fmM85zWNgxw8rBPfBZ14wZ50wnTn4JKFHRiugn4Vb-82pp_JRpWg2cV-fDuH1bZsZZv7KYj0FDUTSAYs6AsK0h4/s1600/Sales+Header+Unpivotted.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg1ckLaDRacvki8Jv4QbI5QsSmsTjLpbhlhwMAYDCz5FOdlj3ymrGF4fmM85zWNgxw8rBPfBZ14wZ50wnTn4JKFHRiugn4Vb-82pp_JRpWg2cV-fDuH1bZsZZv7KYj0FDUTSAYs6AsK0h4/s1600/Sales+Header+Unpivotted.png" /></a></div>
<div>
<br /></div>
<div>
Notice that we combine the <i>name</i> of the date column along with the date <i>value </i>to form a <b>Date Key</b>.</div>
<div>
Because Date Key contains the name of the column under which the date value was found, we have the foundation for filtering dates that originated in different columns. Once the data is pivoted this way we bring it into PowerPivot as the <i>Sales Orders</i> table.</div>
<div>
<br /></div>
<div>
In order to filter Sales Orders, we'll need another three tables:</div>
<div>
<ul>
<li><b><b>Date</b><br style="font-weight: normal;" /><span style="font-weight: normal;">A stock-standard "date dimension" to represent the hierarchy of years, months and dates.</span></b></li>
<li><b>Date Type</b><br />Contains 3 rows to present a selection of <i>which</i> column any Date filtering should be applied on: Order Date, Ship Date, Due Date.</li>
<li><b>Date Filter</b><br />This table is cross-join of Date Type and Date, bridging any combination of Dates + Date Type to a <i>Date Key. </i></li>
</ul>
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjaNWnSdM_fkxaz8WfMa08WVnNPTZT9vj-smxD3Zcz5qAtGTUPXkxgAQ6RAtTpkAjuDVtxz0YF7UzlAuLwuIwxdwgaumAFlOW0azmWyUpvdJsjOf8kUQqtQ3X3ucZczawP5rjcE36xGtkU/s1600/PowerPivot+ERD.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjaNWnSdM_fkxaz8WfMa08WVnNPTZT9vj-smxD3Zcz5qAtGTUPXkxgAQ6RAtTpkAjuDVtxz0YF7UzlAuLwuIwxdwgaumAFlOW0azmWyUpvdJsjOf8kUQqtQ3X3ucZczawP5rjcE36xGtkU/s1600/PowerPivot+ERD.png" /></a></div>
<div>
<br /></div>
<div>
The PowerPivot engine always implements filtering as "parent to child", meaning <i>Date Type </i>and <i>Date</i> impose filtering onto the <i>Date Filter </i>table. The <i>Date Filter </i>in turn imposes filtering down onto the <i>Sales Orders</i> table via the Date Key column.</div>
<div>
<br /></div>
<h3>
Using DAX</h3>
<div>
The other method of achieving the same result is by implementing all measures as DAX calculations.</div>
<div>
The following calculation does the job of filtering total sales based on the date and date type selected:</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj9MM_URp0Kn9OaPlgNwvMvMqcrkwrZb5wRR2rWcvrFGJBSqkXoJG_hCOGopBkxxc2bnqMZjnAuA4gx3fVVFM6N6tRTMKdm6bskVPk0i0FjL39hRLdRA3EW4W91QID2JiKNA61qcwLMi5c/s1600/Hideous+DAX.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj9MM_URp0Kn9OaPlgNwvMvMqcrkwrZb5wRR2rWcvrFGJBSqkXoJG_hCOGopBkxxc2bnqMZjnAuA4gx3fVVFM6N6tRTMKdm6bskVPk0i0FjL39hRLdRA3EW4W91QID2JiKNA61qcwLMi5c/s1600/Hideous+DAX.png" /></a></div>
<div>
<br /></div>
<div>
The above is hideous to read and hard to maintain, but it does get the job done without having to transform the source data.</div>
<div>
<br /></div>
<h3>
Weighing up the options</h3>
<div>
There are some pros and cons of the above techniques:</div>
<div>
<table border="1" cellpadding="0" cellspacing="0" class="MsoTable15List3Accent2" style="border-collapse: collapse; border: none; mso-border-alt: solid #ED7D31 .5pt; mso-border-themecolor: accent2; mso-padding-alt: 0cm 5.4pt 0cm 5.4pt; mso-yfti-tbllook: 1056;">
<tbody>
<tr>
<td style="background: #ED7D31; border-bottom: none; border-left: solid #ED7D31 1.0pt; border-right: none; border-top: solid #ED7D31 1.0pt; mso-background-themecolor: accent2; mso-border-left-alt: solid #ED7D31 .5pt; mso-border-left-themecolor: accent2; mso-border-left-themecolor: accent2; mso-border-top-alt: solid #ED7D31 .5pt; mso-border-top-themecolor: accent2; mso-border-top-themecolor: accent2; padding: 0cm 5.4pt 0cm 5.4pt; width: 150.25pt;" valign="top" width="200"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b><span style="color: white; mso-themecolor: background1;">UNPIVOT<o:p></o:p></span></b></div>
</td>
<td style="background: #ED7D31; border-top: solid #ED7D31 1.0pt; border: none; mso-background-themecolor: accent2; mso-border-top-alt: solid #ED7D31 .5pt; mso-border-top-themecolor: accent2; mso-border-top-themecolor: accent2; padding: 0cm 5.4pt 0cm 5.4pt; width: 150.25pt;" valign="top" width="200"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b><span style="color: white; mso-themecolor: background1;">DAX<o:p></o:p></span></b></div>
</td>
<td style="background: #ED7D31; border-bottom: none; border-left: none; border-right: solid #ED7D31 1.0pt; border-top: solid #ED7D31 1.0pt; mso-background-themecolor: accent2; mso-border-right-alt: solid #ED7D31 .5pt; mso-border-right-themecolor: accent2; mso-border-right-themecolor: accent2; mso-border-top-alt: solid #ED7D31 .5pt; mso-border-top-themecolor: accent2; mso-border-top-themecolor: accent2; padding: 0cm 5.4pt 0cm 5.4pt; width: 57.8pt;" valign="top" width="77"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b><span style="color: white; mso-themecolor: background1;">Winner<o:p></o:p></span></b></div>
</td>
</tr>
<tr>
<td style="border-right: none; border: solid #ED7D31 1.0pt; mso-border-bottom-alt: solid #ED7D31 .5pt; mso-border-bottom-themecolor: accent2; mso-border-left-alt: solid #ED7D31 .5pt; mso-border-left-themecolor: accent2; mso-border-themecolor: accent2; mso-border-top-alt: solid #ED7D31 .5pt; mso-border-top-themecolor: accent2; padding: 0cm 5.4pt 0cm 5.4pt; width: 150.25pt;" valign="top" width="200"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
Performs well because the formula engine can apply
filters when retrieving rows.</div>
</td>
<td style="border-bottom: solid #ED7D31 1.0pt; border-left: none; border-right: none; border-top: solid #ED7D31 1.0pt; mso-border-bottom-alt: solid #ED7D31 .5pt; mso-border-bottom-themecolor: accent2; mso-border-bottom-themecolor: accent2; mso-border-top-alt: solid #ED7D31 .5pt; mso-border-top-themecolor: accent2; mso-border-top-themecolor: accent2; padding: 0cm 5.4pt 0cm 5.4pt; width: 150.25pt;" valign="top" width="200"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
Performs slower because all rows are traversed for
each cell being calculated.</div>
</td>
<td style="border-left: none; border: solid #ED7D31 1.0pt; mso-border-bottom-alt: solid #ED7D31 .5pt; mso-border-bottom-themecolor: accent2; mso-border-right-alt: solid #ED7D31 .5pt; mso-border-right-themecolor: accent2; mso-border-themecolor: accent2; mso-border-top-alt: solid #ED7D31 .5pt; mso-border-top-themecolor: accent2; padding: 0cm 5.4pt 0cm 5.4pt; width: 57.8pt;" valign="top" width="77"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
UNPIVOT</div>
</td>
</tr>
<tr>
<td style="border-left: solid #ED7D31 1.0pt; border: none; mso-border-left-alt: solid #ED7D31 .5pt; mso-border-left-themecolor: accent2; mso-border-left-themecolor: accent2; padding: 0cm 5.4pt 0cm 5.4pt; width: 150.25pt;" valign="top" width="200"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
Easy to build & maintain.</div>
</td>
<td style="border: none; padding: 0cm 5.4pt 0cm 5.4pt; width: 150.25pt;" valign="top" width="200"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
Requires working knowledge of DAX. Complexity increases with every
additional date column.</div>
</td>
<td style="border-right: solid #ED7D31 1.0pt; border: none; mso-border-right-alt: solid #ED7D31 .5pt; mso-border-right-themecolor: accent2; mso-border-right-themecolor: accent2; padding: 0cm 5.4pt 0cm 5.4pt; width: 57.8pt;" valign="top" width="77"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
UNPIVOT</div>
</td>
</tr>
<tr>
<td style="border-right: none; border: solid #ED7D31 1.0pt; mso-border-bottom-alt: solid #ED7D31 .5pt; mso-border-bottom-themecolor: accent2; mso-border-left-alt: solid #ED7D31 .5pt; mso-border-left-themecolor: accent2; mso-border-themecolor: accent2; mso-border-top-alt: solid #ED7D31 .5pt; mso-border-top-themecolor: accent2; padding: 0cm 5.4pt 0cm 5.4pt; width: 150.25pt;" valign="top" width="200"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
Produces incorrect results of no, all or multiple “Date
Types” are selected.</div>
</td>
<td style="border-bottom: solid #ED7D31 1.0pt; border-left: none; border-right: none; border-top: solid #ED7D31 1.0pt; mso-border-bottom-alt: solid #ED7D31 .5pt; mso-border-bottom-themecolor: accent2; mso-border-bottom-themecolor: accent2; mso-border-top-alt: solid #ED7D31 .5pt; mso-border-top-themecolor: accent2; mso-border-top-themecolor: accent2; padding: 0cm 5.4pt 0cm 5.4pt; width: 150.25pt;" valign="top" width="200"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
DAX calculations can detect invalid filtering
criteria and return more appropriate results.</div>
</td>
<td style="border-left: none; border: solid #ED7D31 1.0pt; mso-border-bottom-alt: solid #ED7D31 .5pt; mso-border-bottom-themecolor: accent2; mso-border-right-alt: solid #ED7D31 .5pt; mso-border-right-themecolor: accent2; mso-border-themecolor: accent2; mso-border-top-alt: solid #ED7D31 .5pt; mso-border-top-themecolor: accent2; padding: 0cm 5.4pt 0cm 5.4pt; width: 57.8pt;" valign="top" width="77"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
DAX</div>
</td>
</tr>
<tr>
<td style="border-bottom: solid #ED7D31 1.0pt; border-left: solid #ED7D31 1.0pt; border-right: none; border-top: none; mso-border-bottom-alt: solid #ED7D31 .5pt; mso-border-bottom-themecolor: accent2; mso-border-bottom-themecolor: accent2; mso-border-left-alt: solid #ED7D31 .5pt; mso-border-left-themecolor: accent2; mso-border-left-themecolor: accent2; padding: 0cm 5.4pt 0cm 5.4pt; width: 150.25pt;" valign="top" width="200"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
Fact rows are multiplied by the number of date columns made
selectable.</div>
</td>
<td style="border-bottom: solid #ED7D31 1.0pt; border: none; mso-border-bottom-alt: solid #ED7D31 .5pt; mso-border-bottom-themecolor: accent2; mso-border-bottom-themecolor: accent2; padding: 0cm 5.4pt 0cm 5.4pt; width: 150.25pt;" valign="top" width="200"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
Fact rows retain original granularity, improving the reusability of
the tabular data.</div>
</td>
<td style="border-bottom: solid #ED7D31 1.0pt; border-left: none; border-right: solid #ED7D31 1.0pt; border-top: none; mso-border-bottom-alt: solid #ED7D31 .5pt; mso-border-bottom-themecolor: accent2; mso-border-bottom-themecolor: accent2; mso-border-right-alt: solid #ED7D31 .5pt; mso-border-right-themecolor: accent2; mso-border-right-themecolor: accent2; padding: 0cm 5.4pt 0cm 5.4pt; width: 57.8pt;" valign="top" width="77"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
DAX</div>
</td>
</tr>
</tbody></table>
</div>
<div>
<br /></div>
<div>
The importance of performance can't be overstated on large data sets - DAX calculations quickly slow down when the formula engine can't apply adequate filtering when retrieving rows from the storage engine.</div>
<div>
<br /></div>
<div>
If you're working with smaller data sets, then UNPIVOT has a clear disadvantage of producing wildly incorrect results when the users clears the filtering on <i>Date Type</i>.</div>
<div>
<br /></div>
<div>
My advice: If you understand and can optimize DAX <i>and</i> you're dealing with mildly brain-dead users, then implement the filters as DAX calculations. If, on the other hand, you are relatively new to PowerPivot, then go the UNPIVOT route - its faster and easier.</div>
<div>
<br /></div>
<div>
Download: <a href="https://doc-0o-c8-docs.googleusercontent.com/docs/securesc/emrip0ek3esvkltsald3pou0psmn9f4t/352sjcu0ki1c91g39crbvlmt866gj1a5/1366812000000/16516648889483475323/16516648889483475323/0B7kk-TP13zNjR1pGNklNRW4yc0E?e=download" target="_blank">UNPIVOT version</a> vs. <a href="https://doc-0o-c8-docs.googleusercontent.com/docs/securesc/emrip0ek3esvkltsald3pou0psmn9f4t/mq1c9h9vfmj50aog8v5usdbdp9cftl00/1366812000000/16516648889483475323/16516648889483475323/0B7kk-TP13zNjTVcycHI0d1VjTVU?e=download&nonce=evb25o9hon400&user=16516648889483475323&hash=s2n5bsjnjfvu9734qa870cs1qouh8c54" target="_blank">DAX version</a></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
Mark Whitehttp://www.blogger.com/profile/00931590930945665211noreply@blogger.com2tag:blogger.com,1999:blog-5912406209740375418.post-36114816556062683392012-10-25T17:15:00.000+02:002012-10-25T17:15:00.368+02:00Service for refreshing Excel PowerPivot filesI've just released a service which assists in refreshing Excel PowerPivot reports.<br />
<br />
Normally you need SharePoint in order to refresh server-side excel reports. Businesses that don't want to adopt SharePoint, but do want to share Excel files now have a means of refreshing PowerPivot cubes automatically.<br />
<br />
Head on over to <a href="https://excelrefresher.codeplex.com/">codeplex</a> to find out more. <br />
<br />
Happy PowerPivotting.Mark Whitehttp://www.blogger.com/profile/00931590930945665211noreply@blogger.com0tag:blogger.com,1999:blog-5912406209740375418.post-58986075325542748912012-10-17T17:30:00.002+02:002012-10-17T17:35:35.520+02:00Validating RSA ID Numbers in SQL<br />
<div class="MsoNormal"></div><div class="MsoNormal">In this post we’ll explore using SQLCLR integration to provide validation of South African Identity numbers – with the focus being slightly more on the ins and outs of CLR integration, than on how RSA ID numbers get validated. If you’re after the validation algorithm, <a href="http://geekswithblogs.net/willemf/archive/2005/10/30/58561.aspx">look here</a>.<o:p></o:p></div><div class="MsoNormal"><br />
</div><h3>So what’s on the slab?...</h3><div class="MsoNormal"><o:p></o:p></div><div class="MsoNormal">This solution makes RSA ID number validation possible by exposing a C# table valued function (TVF) to SQL server. The function takes an identity number as its input, and returns a single record with the following columns:<o:p></o:p></div><div class="MsoNormal"></div><ul><li>IsValid - bit: Whether the input identity number is valid.</li>
<li>IsCitizen - bit: Whether the number belongs to a South African citizen.</li>
<li>Gender - char: The gender of the individual - ‘M’ for male, ‘F’ for female.</li>
<li>DateOfBirth - date: The date of birth of the individual.</li>
</ul><o:p></o:p><br />
<div class="MsoNormal"><o:p></o:p></div><div class="MsoNormal"><o:p></o:p></div><div class="MsoNormal"><o:p></o:p></div><h3>Why a table valued function?</h3><div class="MsoNormal"><o:p></o:p></div><div class="MsoNormal">Glad you asked. (...just smile politely, nod and pretend you did!)<br />
Given the 4 output fields above, it is tempting to think that a smarter solution would be to implement a user defined type (UDT) – after all – an ID number can be thought of as a type, right? <o:p></o:p></div><div class="MsoNormal"><br />
</div><div class="MsoNormal">Well.. it can… and that’s exactly the problem. UDTs are possibly the most problematic and controversial part of SQL CLR integration, becauseT-SQL Dependencies on UDTs make it very difficult to upgrade the CLR assembly. In other words, if you use a UDT for a table column, then you cannot upgrade the UDT’s assembly without dropping the column. Also, properties exposed by UDTs have to be aliased in SELECT statements – i.e. the name of a UDT property does not translate into a column name/alias when used in a query.</div><div class="MsoNormal"><o:p></o:p></div><div class="MsoNormal"><o:p></o:p></div><div class="MsoNormal"><br />
</div><div class="MsoNormal">In contrast, implementing a table valued function gives us:<o:p></o:p></div><div class="MsoNormal"><br />
</div><div class="MsoNormal"><b>Upgradability</b>: You can unload and upgrade a TVF’s assembly any time, since you won’t unwittingly create a schema-bound dependency on a TVF.<o:p></o:p></div><div class="MsoNormal"><b>Aesthetics</b>: Columns returned by a TVF default to having sensible names.</div><div class="MsoNormal"><o:p></o:p></div><div class="MsoNormal"><b>Performance</b>:<b> </b>All columns for a row returned by a TVF are returned in single function call, whereas UDTs require a CLR call per property. <o:p></o:p></div><div class="MsoNormal"><br />
</div><h3>The solution…</h3><div class="MsoNormal"><o:p></o:p></div><div class="MsoNormal">The solution outlined here shows a couple of best practices when using CLR integration with SQL server, namely:<o:p></o:p></div><div class="MsoNormal"></div><ul><li>The assembly is marked as safe.<br />
(no naughty pointers, no external calls, no unverifiable code blocks)</li>
<li>The assembly is stateless.<br />
SQL unloads application domains when it detects resource constraints, and assemblies must deal with this gracefully.</li>
<li>Exceptions are never thrown.<br />
This means that queries never crash, no matter what data you feed into the TVF.</li>
</ul><o:p></o:p><br />
<div class="MsoNormal"><o:p></o:p></div><div class="MsoNormal">The code is reasonably straight forward... there are 3 important components:<br />
<ol><li>The ValidatedRSAID class - which encapsulated the logic of cracking the DateOfBirth, Gender, Citizenship and validity from an ID number.</li>
<li>The ValidateRSAID method, which takes a SqlString as it's input, and returns a ValidatedRSAID instance.</li>
<li>The ValidateRSAIDFillRow method, which SQL uses to decode the fields of the ValidatedRSAID instance into a column values in the resulting query.</li>
</ol></div><div class="MsoNormal"><br /><br />
</div><div class="MsoNormal">Here's the code for parts 2 and 3, which expose the ValidatedRSAID functionality to SQL Server:</div><div class="MsoNormal"><script src="http://pastebin.com/embed_js.php?i=vytEaJc2"></script><br />
</div><div class="MsoNormal"><br />
...and here's the code for validating an ID number. It's been carefully optimized, at the expense of readability:</div><div class="MsoNormal"><script src="http://pastebin.com/embed_js.php?i=3XLehjJe"></script><br />
</div><div class="MsoNormal"><br />
To download the full solution, head over to <a href="https://validatersaid.codeplex.com/">https://validatersaid.codeplex.com</a></div>Mark Whitehttp://www.blogger.com/profile/00931590930945665211noreply@blogger.com0tag:blogger.com,1999:blog-5912406209740375418.post-54538381987334476422011-08-18T16:50:00.000+02:002011-08-18T16:50:34.723+02:00Record Versions in ETL staging areasEver wanted to be able to use the OLEDB Destination component in SSIS to load staging tables <i>and</i> have the underlying data versioned for you? If so, here's a solution which is working very well herein a production environment.<br />
<br />
<div>
<br /></div>
<div>
Below are some scripts which implement automatic triggers on staging tables so that whenever a newly loaded record collides with the primary key of an existing record, the existing record is quietly moved out into an archive table. In practice, you create two schemas -</div>
<div>
<div>
<ul>
<li>Staging - contains all staging tables</li>
<li>Archive - contains all archive (historic) tables</li>
</ul>
</div>
<div>
When two tables with the same name are detected in these schemas, then triggers are placed onto the table in the Staging schema to handle moving old data into the archive record. The caveat here is that data is only preserved for columns where the column name and type is identical in both the staging and archive table.</div>
</div>
<div>
<br /></div>
<div>
There are two scripts that make all of this work:</div>
<div>
<ol>
<li>A stored procedure, responsible for generating triggers for a staging table.<br />It generates two triggers:<br />a) an INSTEAD OF trigger, which detects any key collisions and deletes the offending record/s, and<br />b) an AFTER DELETE trigger, which moves deleted staging data into an identically named archive table.<br /><br />If you are comfortable with T-SQL, then you should consider modifying this script to meet your particular needs.</li>
<li>A DDL trigger, which reacts to the creation or alteration of staging & archive tables by executing the aforementioned stored procedure. Effectively this established a "self healing" mechanism, where triggers are automatically recalculated in response to any changes to staging or archive tables.</li>
</ol>
<div>
Example - create two simple sales tables:<br />
<script src="http://pastebin.com/embed_js.php?i=tM1nEwAY">
</script>
</div>
<div>
<br />
Example - simulate two data loads, where the source record has changed:<br />
<script src="http://pastebin.com/embed_js.php?i=KqmeUipg">
</script>
</div>
</div>
<div>
<br />
The above example, will result in one record being in the Staging table, and the older (first) record being moved into the Archive table. Notice how the Archive table introduces the column RecordRetired and uses it as part of the primary key. This allows multiple records with the same SaleID to exist in the archive.<br />
<br />
<b>The Nitty Gritty</b><br />
OK - so by now you've got the point of what the scripts below do for you. Be sure to read the very bottom of this post for information on how to make use this solution in conjunction with SSIS.<br />
<br />
<b>Part 1: The Trigger Generating Stored Procedure</b><br />
<script src="http://pastebin.com/embed_js.php?i=SMjswZNe"></script><br />
<b>Part 2: The DDL Trigger to react to table changes</b><br />
<script src="http://pastebin.com/embed_js.php?i=1pD6tGdr"></script><br />
<br />
<br />
<b>Using SSIS Data Flows</b><br />
The really good news is that you can still use SSIS OLEDB Destination's FAST_LOAD mode when loading in data from source systems, which results in good performance.<br />
Just <i>always</i> follow these two rules which using the OLEDB Destination component:<br />
<br />
<br />
<ol>
<li>Include FIRE_TRIGGERS in the FastLoadOptions property</li>
<li>Keep the FastLoadMaxInsertCommitSize to something below 100,000</li>
</ol>
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhMwIoGP-ANaxXmgRFVMpp8J0QWXyLmXq6wPJL8vOdVHSD8r7KIkMkw7PR9Lh2uKKcaCH3WtYzxf-0W4ehEWF5v-X8CeUjk60_I2SHMSyEqNObTid9QtgrR0UPzoF3msi8D6f9YqvwZKyA/s1600/properties.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhMwIoGP-ANaxXmgRFVMpp8J0QWXyLmXq6wPJL8vOdVHSD8r7KIkMkw7PR9Lh2uKKcaCH3WtYzxf-0W4ehEWF5v-X8CeUjk60_I2SHMSyEqNObTid9QtgrR0UPzoF3msi8D6f9YqvwZKyA/s320/properties.png" width="176" /></a>n</div>
<br />
<br /></div>
Mark Whitehttp://www.blogger.com/profile/00931590930945665211noreply@blogger.com0tag:blogger.com,1999:blog-5912406209740375418.post-5988408428322147082011-08-18T15:47:00.000+02:002011-08-18T15:47:25.564+02:00Conforming Primary Key NamesIt's nice to be able to predict the name of a primary key constraint based on the name of the table it constrains, but often times during the design of a database, we simply don't have time to worry about such things.<br />
<br />
Below is a quick T-SQL script which helps to modify all primary key constraints so that they are named according with the table they belong to...<br />
<br />
e.g. the table <b>Dim.Sales</b> will have it's primary key renamed to<b> PK_Dim_Sales</b><br />
<br />
To use the script, paste it into management studio, then:<br />
<br />
<ol>
<li>press CTRL+T to have the output go to text</li>
<li>press F5 to run the script</li>
<li>Cut'n'Paste the query results into a new query window</li>
<li>press F5 again to run the new query</li>
</ol>
<br />
<br />
<iframe src="http://pastebin.com/embed_iframe.php?i=dbZyeznM" style="border:none;height:300px;width:100%"></iframe>Mark Whitehttp://www.blogger.com/profile/00931590930945665211noreply@blogger.com0tag:blogger.com,1999:blog-5912406209740375418.post-47748525103041617482011-06-08T13:08:00.000+02:002011-06-08T13:10:11.748+02:00SSIS Script for handling bad datesOracle and SSIS both have the ability to handle dates which SQL server's DateTime type won't accept. <br />
A reasonably simple solution is to leverage the script component to detect and repair the invalid dates... here's a step-by-step guide for doing exactly that...<br />
<img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXHCi005tNax7CeO0mLdBJR9DNt3uV7QMS24Aig-BXFKiFCYuVeagfbmTxBFiKrPDc3D51-JcMJIlJCGGxvMCiQ05QeRft3btXKGr6ip3NlfGHWSMcr9xPxrUUkXH1IY5yjDNdkKdFdgQ/s320/ScriptSetup.png" style="float: right;" />
<br />
<ol>
<li>Add a script component to your data flow, in <em>transform</em> mode.</li>
<li>Open the editor for the script component</li>
<li>Under <em>Input Columns, </em>select all the date columns you want to repair, and set their usage type to READ/WRITE (this is vital, else the script will break).</li>
<li>Under <em>Script</em>, click the edit script button</li>
<li>Select and delete all text</li>
<li>Paste the C# code from <a href="http://pastebin.com/raw.php?i=r9F3az6r">here</a></li>
<li>Close the source code window, click OK to dismass the script editor.</li>
<li>Run the data flow.</li>
</ol>
The script will efficiently detect date with bad years and attempt to correct them to something which SQL finds palettable. <br />
<br />
Give it a whirl and let me know what you think!Mark Whitehttp://www.blogger.com/profile/00931590930945665211noreply@blogger.com0tag:blogger.com,1999:blog-5912406209740375418.post-21721030918775061052011-06-08T12:54:00.000+02:002011-06-08T12:54:35.307+02:00Loading data from an Oracle sourceBuilding Business Intelligence using the Microsoft BI stack in an organization that has an Oracle based transactional system is not uncommon, and here I'll outline a couple of tips and tricks that should ease the building of SSIS packages in that type of environment.<br />
<br />
<div>
</div>
<strong>Attunity Components</strong><br />
Attunity offer several nice solutions for getitng data out of Oracle and into SQL - one of which is their <a href="http://www.attunity.com/oracle_cdc_for_ssis">CDC (Change Data Capture) offering</a>. If you don't have the budget or stomach for setting up CDC in an oracle environment, then your next best bet is to use the <a href="http://www.microsoft.com/downloads/en/details.aspx?FamilyID=d9cb21fe-32e9-4d34-a381-6f9231d84f1e&displaylang=en">free Attunity oracle connectors for SSIS</a>, which demonstrate a measurable performance boost.<br />
<br />
<div>
</div>
<strong>The Oracle Index and Order By Gotcha</strong><br />
In previous posts I mention the performance benefits of loading sorted data into target tables. I'm currently loading data from Oracle 10, which exhibits a rather strange characteristic... <strong>Oracle does not use make us of an index for ordering data</strong>. In other words, Oracle only uses the indexes on a table for the sake of solving <em>where</em> predicates and joins... if you simply want to select all records from a table then an <em>order by</em> clause will force a table sort operation every time the query is run. This will place significant strain on a production database, and thus represents a relatively unacceptable solution.<br />
<br />
<div>
</div>
The solution is to use SQL staging tables liberally, and to go with the loading strategy of:<br />
<ol>
<li>Load all <em>large</em> Oracle tables into SQL staging tables.</li>
<li>Use SQL queries to perform simple joins, lookups and sorting to transform the stage into the data warehouse target tables.</li>
</ol>
<strong>Performance tips for dataflows that stage data from an Oracle source</strong><br />
<div class="separator" style="clear: both; text-align: center;">
<img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEitvyRdnPrawBRUzeWrqo41pgUAgz_8Js08TNia1vYxpU-RiDIz-GfnFFZHU0cwJ9lNLxLW1zZgR4fHvSfUewsBKgt7aVnkb7yInHjO4mkO1H4qVw2R05_i8_k-oYVlefrViHKxeNHv5B4/s320/OracleDataflow.png" style="float: right;" />
</div>
<ol>
<li>Use the Attunity source component and set it's BatchSize property to match the data flow's DefaultBufferMaxRows property. </li>
<li>Set the FastLoadMaxInsertCommitSize of the OLE DB desgtination to the same value as the data flow's DefaultBufferMaxRows. </li>
<li>Consider using the Data Conversion component for changing the type and column name of the source columns. This keeps your Oracle queries neat and readable, whilst also allowing you to handle invalid values (we're looking at you, Oracle DateTime) within the data flow, instead of having a query that fails halfway.</li>
<li><div>
If you have the budget, grab a copy of Pragmatic Work's TaskFactory. Among many other great components, it offers a Null handler, and a Data Cleansing transform, both of which are huge time saves when dealing with Oracle source columns.</div>
</li>
</ol>
<br />
<div>
BTW - values that tend to offer consistently good performance are:</div>
<ul>
<li>DefaultBufferMaxRows = 1000 (on data flow task)</li>
<li>BatchSize = 1000 (on Attunity Oracle Source)</li>
<li>FastLoadMaxInsertCommitSize = 1000 (on OLE DB destination)</li>
</ul>
<strong>In Summary</strong><br />
<ul>
<li>Don't rely on indexes supporting any <em>order by</em> clause in Oracle.</li>
<li>Do use staging tables to support cleansing and transforming data.</li>
<li>Pay attention to matching the buffer & batch sizes for the source, data flow and destination.</li>
</ul>
<br />Mark Whitehttp://www.blogger.com/profile/00931590930945665211noreply@blogger.com0tag:blogger.com,1999:blog-5912406209740375418.post-88162520030862031452010-08-03T10:43:00.000+02:002011-02-21T06:53:47.624+02:00Using SSIS with SQL partitions<p>Last month I blogged about the virtues of having seperate full-load vs. incremental load packages when doing data warehousing. Well... that isn't the end-all of that argument...</p><p>When you design your data warehouse, if you take table partitioning into account (which I highly recommend, since most BI data is temporally chronological), then you can usually get away with writting a single data flow task which achieves minimal logging even during incremental updates, by simply targetting a staging partition and then switching it in at the end of the data load.</p><p>In one such data flow, I needed to be able to normalize the incomming data into three tables. The challenge comes in that you don't want to perform lookups on every record when you're doing a full load (since this slows the data flow down), but you do need to do those lookups to avoid duplicates during incremental loads.</p><p>A great pattern for solving this to check whether the target table is empty before starting the data flow. Below, the "Get MaxSaleID" task runs a SQL script to put the row-count of the target table into a variable named MaxSaleID...</p><p> </p><div class="separator" style="clear: both; text-align: center;"><a style="margin-left: 1em; margin-right: 1em;" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgJoRHZf1zPNfsFwx3ECmTkkUesCvR7Bb1aDL7Sby1CrHKdJzYD1gNyS-HIovbHG90WRHww0UQw9gVDt1AXXf8z-aO8R8cGrrIErKW6XoqzJh3qNSZediM1f54sFmBrF6lGjporf_Y8174/s1600/component.png"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgJoRHZf1zPNfsFwx3ECmTkkUesCvR7Bb1aDL7Sby1CrHKdJzYD1gNyS-HIovbHG90WRHww0UQw9gVDt1AXXf8z-aO8R8cGrrIErKW6XoqzJh3qNSZediM1f54sFmBrF6lGjporf_Y8174/s320/component.png" border="0" alt="" width="320" height="268" /></a></div><p>... then in the data flow "Sales - Vouchers", a conditional split inspects the MaxSaleID variable, and re-routes rows to avoid the lookup if there is guaranteed to be no match from the lookup component...<br /><br /></p><table class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;" cellspacing="0" cellpadding="0" align="center"><tbody><tr><td style="text-align: center;"><a style="margin-left: auto; margin-right: auto;" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgydTTHMRuxKIFwSfskNU1kbhyk-zQFBsij1mnM1bm7e8N-ECgCxhggk8nxCd_5S_vUZxdKj3cD8tQd9OQYbeNQ10NiWQkzfpRh51CTKp6_lrbo3fyQ3kkd-rGN03a0CetvXMwSRFNaSRc/s1600/data+flow.png"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgydTTHMRuxKIFwSfskNU1kbhyk-zQFBsij1mnM1bm7e8N-ECgCxhggk8nxCd_5S_vUZxdKj3cD8tQd9OQYbeNQ10NiWQkzfpRh51CTKp6_lrbo3fyQ3kkd-rGN03a0CetvXMwSRFNaSRc/s640/data+flow.png" border="0" alt="" width="640" height="443" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Data Flow for full and incremental uploads</td></tr></tbody></table><p>The "Scaffold Voucher Sales" destination bulk inserts into a staging table which is subseuently switched in onto the tail end of thew live table.</p><p>A quick note about partitions: If you know that the source system which you are pulling data from will always append new records and will never/rarely update old records, then you can skip technologies like Change Data Capture or Change Tracking and simply re-import the last n-thousand records into a partition, swap out the old tail-partition and swap in the new one. In generaly this results in a far leaner, simpler package and process than would result from implementing complex data-change detection.</p>Mark Whitehttp://www.blogger.com/profile/00931590930945665211noreply@blogger.com0tag:blogger.com,1999:blog-5912406209740375418.post-20111752621355734402010-08-03T10:18:00.000+02:002010-08-03T10:18:17.734+02:00Threading in SSIS Script componentAlberto Ferrari writes many good blogs on SQL and SSIS. In <a href="http://sqlblog.com/blogs/alberto_ferrari/archive/2008/09/14/ssis-living-in-a-parallel-world-not-yet.aspx?CommentPosted=true#commentmessage">one of his old posts</a> he mentions the presence of a bug in the SSIS script component, where the output buffer is marked as completed as soon as the ProcessInput method returns from the last row of the last buffer.<br />
<br />
<u>This is not a bug</u>, and if you think about it, it's actually correct for the stream to be marked as completed once the component returns from processing it. The reason for this is because when synchronous outputs are used, the output buffer <em>is</em> the input buffer - i.e. they're the same block of RAM, so when the component returns from processing the buffer, that buffer is passed on to the next down-stream component... that is... SSIS <em>does not wait</em> for other threads in your component to complete their work before passing the buffer on. Not only is this not a bug, but it's actually a performance booster and, in general, a really good thing.<br />
<br />
The solution to Alberto's problem is reasonably simple, and should be a rule for your SSIS development work:<br />
<strong> Always use asynchronous outputs when writting threaded components</strong><br />
<br />
By doing so, you are in complete control of how your output buffers are issued to downstream components, and no rows will be passed downstream while a thread is still calculating values for that row.<br />
<br />
<br />Mark Whitehttp://www.blogger.com/profile/00931590930945665211noreply@blogger.com0tag:blogger.com,1999:blog-5912406209740375418.post-65756215867834747522010-08-03T10:03:00.000+02:002010-08-03T10:03:38.606+02:00Much Needed Improvements To SSISSSIS has been my mid-night mistress for months now, and at the risk of sounding off-color I've compiled a short list of things which ought to be changed in the nest release of SSIS:<br />
<br />
1) The diagram auto-layout desperately needs to be rethought and replaced.<br />
2) The Data Viewer should at least allow "live vs. page-at-a-time" modes, and showing bottle-neck indicators on pipelines would be at least as useful as showing the row count.<br />
3) Manual tuning of buffers should either go away (replaced by automated solution), or at very least be augmented with debug helpers and visual feedback.<br />
<br />
4) Lookup and SCD components need to be binned and replaced with components that add more flexibility in terms of how lookup data is retrieved, how matches are performed (range-value matching would be a great start), allow for multiple joins (lookup), enable sliding window lookups (as opposed to partial caching).<br />
<br />
5) Exposing only the most useless component properties to the data-flow's expression engine, instead of the most useful (e.g. Source's SQL Query, Destinations MaxInsertCommitSize, etc.)<br />
6) The overall rigidity and resulting fragility of SSIS and the LineageID implementation brings. Most other ETL packages allow you to disconnect and reconnect pipelines without a litany of "missing field" dialogs. There should at least be a "take your best guess" toggle button on a toolbar.<br />
<br />
7) Overall UI layout is counter-productive. I want to be able to see and edit variables, expressions and key properties without a cluttered workspace. Ideally a call-out style popup when hovering over a component would do the trick.<br />
<br />
8) Inability to change scope of existing variables without deleting and recreating them.<br />
<br />
9) The merge and sort components. For real-world data warehousing, these two components are the ultimate pitfalls... both run out of steam long before swap-disk is depleted. The merge component in particular desperately needs to be rewritten to not swamp down-stream buffers. Sort needs to be rewritten to intelligently handle sub-sorting of streams (i.e. resort subset of sorted input), as well as ability to yield windowed sorts, which would enable a new class of downstream lookups.Mark Whitehttp://www.blogger.com/profile/00931590930945665211noreply@blogger.com0tag:blogger.com,1999:blog-5912406209740375418.post-38513784292430422812010-08-02T10:53:00.000+02:002011-02-21T06:54:26.055+02:00Hiding user tables in SQL Management Studio<p>I spotted <a href="http://stackoverflow.com/questions/1847728/hide-filter-tables-in-sql-server-management-studio/3386224#3386224">a question over at Stack Overflow</a> asking whether it is possible to hide specific SQL Server user-tables from end users. Strictly speaking it is not, but there is a trick to get superfluous tables out of the way of the "Tables" node of the object explorer. While, on first glance, this may seem like poor practise, it turns out that there's a bona-fide case for it: Garbage tables used in table partitions.</p><p>A data warehouse that I recently delivered makes extensive use of SQL's partitioned tables feature - which enabled a data-load design which does incremental fast-loads... something which simply isn't possible without partitions.</p><p>The down-side of using partitions is that, in order to keep a fact table on-line whilst loading in new data, you effectively need three tables:</p><ol><li>The fact table</li><li>A staging table, into which new data is loaded and prepared</li><li>A garbage table into which the old partition of the fact table can be switched</li></ol><p>The garbage table is of absolutely no value other than as a means for nuking a data partition, and so there's really very little value in cluttering Management Studio's Tables tree with these tables.</p><p>The answer? Mark the garbage tables as "database tools support" tables. This causes SSMS to hide the tables from normal view and instead list them under the "System Tables" node. Nice'n'neat!</p><p>Here's a script template which you can add to your template explorer:<br /><br /></p><blockquote>EXEC sp_addextendedproperty <br />@name = N'microsoft_database_tools_support', <br />@value = '', <br />@level0type ='schema', <br />@level0name ='',<br />@level1type = 'table', <br />@level1name = N'.<br /><br /> <>'<br /><br /><p>Either add it to your template explorer, or simply paste into a query window, then press CTRL+SHIFT+M to launch the "specify values..." dialog.</p><p>Enjoy!</p><table></table></blockquote>Mark Whitehttp://www.blogger.com/profile/00931590930945665211noreply@blogger.com0tag:blogger.com,1999:blog-5912406209740375418.post-4991415469285005012010-07-20T11:02:00.000+02:002011-02-21T06:55:14.535+02:00More on high performance data loading<p>Today I stumbled across what is possibly <a href="http://msdn.microsoft.com/en-us/library/dd425070(SQL.100).aspx">the most comprehensive, useful page on the net about bulk loading data into SQL Server 2008</a>. While it is packed with useful information, one of the pearls is about trace flag 610.</p><p>In a nutshell, trace flag 610 instructs the server to minimally log all operations that can be minimally logged. In practical terms, this means that MERGE and INSERT operations will minimally log <span style="text-decoration: underline;">new page allocations</span>. Without trace flag 610, these operations are fully logged. The link (above) goes into great detail about the usage of trace flag 610, so I'll simply summarize as follows:</p><p><span style="font-family: "Courier New", Courier, monospace;">DBCC TRACEON (610); -- enables trace flag 610</span><br /><span style="font-family: "Courier New", Courier, monospace;">MERGE INTO xxx USING yyy ON -- do minimally logged upsert</span><br /><span style="font-family: "Courier New", Courier, monospace;">DBCC TRACEOFF(610); -- restore normal mode.</span></p><p>Note that setting trace flag 610 will affect <em>all</em> connections to the <em>server</em> until it is reset, so avoid using it on main OLTP servers. For a usable T-SQL MERGE example, see <a href="http://blogs.msdn.com/b/sqlserverstorageengine/archive/2010/06/03/minimal-logging-and-merge-statement.aspx">Sunil's blog posting</a>.</p>Mark Whitehttp://www.blogger.com/profile/00931590930945665211noreply@blogger.com0tag:blogger.com,1999:blog-5912406209740375418.post-53088753360906802372010-07-06T21:49:00.001+02:002011-06-08T12:06:39.376+02:00SSIS - High Performance Data Warehouse Loading<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">In a <a href="http://sql10.blogspot.com/2010/07/ssis-check-list-for-minimally-logged.html"><span style="color: blue;">previous post</span></a> I mentioned how to achieve data
transfers at near file-copy speeds using SSIS. In this follow-on, I’m going to
show you how to put this information to practical use for quickly loading a
data-warehouse.<o:p></o:p></span></div>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">Generally speaking, the idea behind
data warehousing is to reduce the load on transactional systems by providing a
central repository of which has been normalized in a manner that is ideal for
drawing reports and analytics. Strategies for loading data warehouses typically
fall into one of two categories:<o:p></o:p></span></div>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<br /></div>
<div class="MsoListParagraphCxSpFirst" style="line-height: normal; margin: 0in 0in 0pt 0.5in; mso-add-space: auto; mso-list: l0 level1 lfo1; text-indent: -0.25in;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";"><span style="mso-list: Ignore;">1.<span style="font-size-adjust: none; font-stretch: normal; font: 7pt/normal "Times New Roman";">
</span></span></span><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">Periodically resetting and
repopulating the data warehouse during off-peak times, <o:p></o:p></span></div>
<div class="MsoListParagraphCxSpLast" style="line-height: normal; margin: 0in 0in 0pt 0.5in; mso-add-space: auto; mso-list: l0 level1 lfo1; text-indent: -0.25in;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";"><span style="mso-list: Ignore;">2.<span style="font-size-adjust: none; font-stretch: normal; font: 7pt/normal "Times New Roman";">
</span></span></span><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">Performing an initial data load,
followed by on-going incremental loads.<br style="mso-special-character: line-break;" />
<br style="mso-special-character: line-break;" />
<o:p></o:p></span></div>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">Significant changes to any
transactional systems usually necessitate changes to a data warehouse or, at
least, changes to how the data warehouse is populated, which means that no
matter which strategy you follow, you’ll benefit from being able to achieve
fast full-reloads of the data warehouse…</span></div>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<br /></div>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 10pt; mso-margin-bottom-alt: auto; mso-margin-top-alt: auto; mso-outline-level: 3;">
<b><span style="font-family: "Times New Roman","serif"; font-size: 13.5pt; mso-fareast-font-family: "Times New Roman";">Lobbying
for separate packages</span></b></div>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">A common challenge in data
warehousing is building a solution that performs well, can cope with increasing
data volumes, and is reasonably maintainable. While it is arguably a matter of
preference, I recommend creating <i>separate packages</i> for full reloads vs.
incremental loads, because there are glaring differences in how you handle data
in either scenario:<o:p></o:p></span></div>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<br /></div>
<table border="1" cellpadding="0" cellspacing="0" class="MsoNormalTable" style="border-collapse: collapse; mso-padding-alt: 0in 0in 0in 0in; mso-yfti-tbllook: 1184;">
<tbody>
<tr style="mso-yfti-firstrow: yes; mso-yfti-irow: 0;">
<td style="background-color: transparent; border-color: windowtext rgb(0, 0, 0); border-style: solid none; border-width: 1pt 0px; padding: 0in 5.4pt; width: 120.1pt;" valign="top" width="160"></td>
<td style="background-color: transparent; border-color: windowtext rgb(0, 0, 0); border-style: solid none; border-width: 1pt 0px; padding: 0in 5.4pt; width: 199.1pt;" valign="top" width="265">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<b><span style="color: #76923c; font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">Full Refresh</span></b><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";"><o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: windowtext rgb(0, 0, 0); border-style: solid none; border-width: 1pt 0px; padding: 0in 5.4pt; width: 159.6pt;" valign="top" width="213">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<b><span style="color: #76923c; font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">Incremental Load</span></b><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";"><o:p></o:p></span></div>
</td>
</tr>
<tr style="mso-yfti-irow: 1;">
<td style="background-color: transparent; border: rgb(0, 0, 0); padding: 0in 5.4pt; width: 120.1pt;" valign="top" width="160">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<b><span style="color: #76923c; font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">Type 1 Slowly
Changing Dimensions</span></b><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";"><o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border: rgb(0, 0, 0); padding: 0in 5.4pt; width: 199.1pt;" valign="top" width="265">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="color: #76923c; font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">Can be bulk loaded
with minimal logging.</span><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";"><o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border: rgb(0, 0, 0); padding: 0in 5.4pt; width: 159.6pt;" valign="top" width="213">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="color: #76923c; font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">Should be checked –
any change represents an error.</span><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";"><o:p></o:p></span></div>
</td>
</tr>
<tr style="mso-yfti-irow: 2;">
<td style="background-color: transparent; border: rgb(0, 0, 0); padding: 0in 5.4pt; width: 120.1pt;" valign="top" width="160">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<b><span style="color: #76923c; font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">Type 2 Slowly
Changing Dimensions</span></b><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";"><o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border: rgb(0, 0, 0); padding: 0in 5.4pt; width: 199.1pt;" valign="top" width="265">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="color: #76923c; font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">Can be bulk loaded
with minimal logging, with historic information (if inferable from source
data) being generated using common SSIS normalization tactics.</span><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";"><o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border: rgb(0, 0, 0); padding: 0in 5.4pt; width: 159.6pt;" valign="top" width="213">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="color: #76923c; font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">Must be verified
against existing data and handled accordingly.</span><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";"><o:p></o:p></span></div>
</td>
</tr>
<tr style="mso-yfti-irow: 3; mso-yfti-lastrow: yes;">
<td style="background-color: transparent; border-color: rgb(0, 0, 0) rgb(0, 0, 0) windowtext; border-style: none none solid; border-width: 0px 0px 1pt; padding: 0in 5.4pt; width: 120.1pt;" valign="top" width="160">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<b><span style="color: #76923c; font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">Fact Tables</span></b><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";"><o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: rgb(0, 0, 0) rgb(0, 0, 0) windowtext; border-style: none none solid; border-width: 0px 0px 1pt; padding: 0in 5.4pt; width: 199.1pt;" valign="top" width="265">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="color: #76923c; font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">Can be bulk loaded
with minimal logging.</span><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";"><o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: rgb(0, 0, 0) rgb(0, 0, 0) windowtext; border-style: none none solid; border-width: 0px 0px 1pt; padding: 0in 5.4pt; width: 159.6pt;" valign="top" width="213">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="color: #76923c; font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">Can require some
sort of “update if already loaded” logic (like SCD2). Can be bulk loaded, but
not minimally logged.</span><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";"><o:p></o:p></span></div>
</td>
</tr>
</tbody></table>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<br /></div>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">The remainder of this post discusses
tactics for achieving a full refresh as quickly as possible, with focus on
lessons learned <i>in the wild</i>…</span></div>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<br /></div>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 10pt; mso-margin-bottom-alt: auto; mso-margin-top-alt: auto; mso-outline-level: 1;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">My previous post about fast,
minimally logged data inserts mentioned having data sorted by a table’s primary
key before it arrives at the SSIS destination (OLEDB destination or SQL
Destination). Finding an appropriate means of sorting data really depends on
your creativity, but there are usually some clever short cuts if you look for
them…<o:p></o:p></span></div>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 10pt; mso-margin-bottom-alt: auto; mso-margin-top-alt: auto; mso-outline-level: 3;">
<b><span style="font-family: "Times New Roman","serif"; font-size: 13.5pt; mso-fareast-font-family: "Times New Roman";">Avoid
the Merge component<o:p></o:p></span></b></div>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">There are plenty of woeful tales
afloat on the net about the merge component. Particularly prominent in its list
of evils is the fact that it doesn’t scale well… meaning that your package
which works fine in the lab may choke and die in the wild. The reason behind
this comes from two rather ugly characteristics:<o:p></o:p></span></div>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: Symbol; font-size: 12pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">·</span><span style="font-size: 7pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";"><span style="font-family: Calibri;"> </span></span><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">Merge must see <i>at least</i> one row in both inputs before
it can release any output.<br />
This makes sense – the output is a sorted union – a comparison must be made
before a decision can be made on which row/s to output first.<o:p></o:p></span></div>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: Symbol; font-size: 12pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">·</span><span style="font-size: 7pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";"><span style="font-family: Calibri;"> </span></span><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">Merge does not respect down-stream throughput.<br />
Unforgivably, the component will consume data as rapidly as possible at both
inputs regardless of whether down-stream components are able to consume the
output fast enough.<o:p></o:p></span></div>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<br /></div>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">Both of the above create a situation
where large volumes of data cause resource exhaustion. Although SSIS does page
memory buffers out to disk, there are limits to how many memory buffers can be
allocated; Even with plenty of free disk space, a 40GB data load can easily
choke on a machine with 8GB RAM when a merge component is used.<o:p></o:p></span></div>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 10pt; mso-margin-bottom-alt: auto; mso-margin-top-alt: auto; mso-outline-level: 3;">
<b><span style="font-family: "Times New Roman","serif"; font-size: 13.5pt; mso-fareast-font-family: "Times New Roman";">Avoid
the Sort component<o:p></o:p></span></b></div>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">Sorting all but the smallest sets of
data inside the pipeline is madness, and leads to the same scalability problems
that the merge component does. Instead, consider ways to get the data into the
pipeline in an already-sorted (or semi-sorted) state,<o:p></o:p></span></div>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">Some tricks for achieving this
include:<br style="mso-special-character: line-break;" />
<br style="mso-special-character: line-break;" />
<o:p></o:p></span></div>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<b style="mso-bidi-font-weight: normal;"><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">1)</span></b><b style="mso-bidi-font-weight: normal;"><span style="font-size: 7pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";"><span style="font-family: Calibri;"> </span></span></b><b style="mso-bidi-font-weight: normal;"><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">Use ORDER BY at source<br />
</span></b><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">If the source system has an
appropriate index then simply use an ORDER BY clause when selecting data from
the source database. Be sure to set the pipeline meta-data in SSIS to let SSIS
know that the data is indeed sorted.<br style="mso-special-character: line-break;" />
<br style="mso-special-character: line-break;" />
<o:p></o:p></span></div>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<b style="mso-bidi-font-weight: normal;"><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">2)</span></b><b style="mso-bidi-font-weight: normal;"><span style="font-size: 7pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";"><span style="font-family: Calibri;"> </span></span></b><b style="mso-bidi-font-weight: normal;"><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">Use a Staging database<br />
</span></b><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">If you can’t sort the data at
source, consider moving it into an intermediate table, then let an RDBMS engine
perform the sort – this approach scales far better than using a Sort component
in SSIS.<br style="mso-special-character: line-break;" />
<br style="mso-special-character: line-break;" />
<o:p></o:p></span></div>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<b style="mso-bidi-font-weight: normal;"><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">3)</span></b><b style="mso-bidi-font-weight: normal;"><span style="font-size: 7pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";"><span style="font-family: Calibri;"> </span></span></b><b style="mso-bidi-font-weight: normal;"><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">Look for the natural data order<br />
</span></b><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">SSIS doesn’t actually check whether
your data is really sorted – it simply takes your word for it. It is common in
transactional systems to date fiscal records, but to give them an
auto-incrementing primary key (for SQL read: IDENTITY) – such that the natural
order of the primary key correlates with the order of the fiscal date. <o:p></o:p></span></div>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<br /></div>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">It is typical in data warehousing
need source data ordered by <i>date</i>, not identity, but adding “ORDER BY
Date” to the select query causes the source system to choke. In cases like
these, it’s fine to order the data by its primary key, but tell SSIS that the
data is ordered by date.<o:p></o:p></span></div>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 10pt; mso-margin-bottom-alt: auto; mso-margin-top-alt: auto; mso-outline-level: 3;">
<b><span style="font-family: "Times New Roman","serif"; font-size: 13.5pt; mso-fareast-font-family: "Times New Roman";">Avoid
using temporary files for staging<o:p></o:p></span></b></div>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">For reading and writing files, SSIS
supports two formats - RAW, which perform well, and Text/CSV which is more
readable. Various SSIS gurus have posted about the performance advantages of
Raw files over Text/CSV, but there tend to be relatively few postings about the
tactical advantages/disadvantages of using either one.<o:p></o:p></span></div>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<br /></div>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">In general, RAW files are easier to
work with at either end of a pipeline, and unlike using a database, no special
effort is required achieve very fast write speed for these files, making them a
very tempting solution for intermediate data storage. I recommend giving a
staging database preference to using temporary files, however, for the simple
reason that staging files are inflexible, which detracts from the
maintainability of your package… there is no SELECT/JOIN/WHERE/UNION capability
across RAW files, and while this sort of can be achieved inside of SSIS, you
will likely find scenarios (as pointed out above) where SSIS simply doesn’t
handle large data as easily as a database will.<o:p></o:p></span></div>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 10pt; mso-margin-bottom-alt: auto; mso-margin-top-alt: auto; mso-outline-level: 3;">
<b><span style="font-family: "Times New Roman","serif"; font-size: 13.5pt; mso-fareast-font-family: "Times New Roman";">Use
a staging database</span></b><span style="font-family: Calibri;"><b><span style="font-size: 13.5pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">
</span></b><b><span style="font-family: "Times New Roman","serif"; font-size: 13.5pt; mso-fareast-font-family: "Times New Roman";"><o:p></o:p></span></b></span></div>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">A common misconception in data
warehousing is that using a staging database is a necessary evil that slows
down the overall loading process. At first glance this makes sense… after all -
why move data twice when you can move it once?<o:p></o:p></span></div>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">The answer lies in the prerequisites
for fast minimally logged inserts. Given that sorted data allows us to
completely by-pass the transaction log and TempDB, it turns out that using a
staging database to combine and sort source data can actually <i>improve</i>
performance when compared to the alternatives – most particularly when any of
the following are true:<br style="mso-special-character: line-break;" />
<br style="mso-special-character: line-break;" />
<o:p></o:p></span></div>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: Symbol; font-size: 12pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">·</span><span style="font-size: 7pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";"><span style="font-family: Calibri;"> </span></span><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">Source data is not naturally sorted per the target table’s
key<o:p></o:p></span></div>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: Symbol; font-size: 12pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">·</span><span style="font-size: 7pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";"><span style="font-family: Calibri;"> </span></span><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">Multiple sources need to be combined<o:p></o:p></span></div>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: Symbol; font-size: 12pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">·</span><span style="font-size: 7pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";"><span style="font-family: Calibri;"> </span></span><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">Lookups/Joins are required on large data sets<o:p></o:p></span></div>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<br /></div>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">If this advice seems counter
intuitive, then consider the following performance characteristics of different
versions of a package which achieves exactly the same goal – to combine 40GB
from 5 tables and load them into a fact table with 13 dimension lookups:<o:p></o:p></span></div>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<br /></div>
<table border="1" cellpadding="0" cellspacing="0" class="MsoNormalTable" style="border-collapse: collapse; mso-padding-alt: 0in 0in 0in 0in; mso-yfti-tbllook: 1184;">
<tbody>
<tr style="mso-yfti-firstrow: yes; mso-yfti-irow: 0;">
<td style="background-color: transparent; border: 1pt solid windowtext; padding: 0in 5.4pt; width: 70.8pt;" valign="top" width="94">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">Merge Components<o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: windowtext windowtext windowtext rgb(0, 0, 0); border-style: solid solid solid none; border-width: 1pt 1pt 1pt 0px; padding: 0in 5.4pt; width: 70.8pt;" valign="top" width="94">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">Sort Components<o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: windowtext windowtext windowtext rgb(0, 0, 0); border-style: solid solid solid none; border-width: 1pt 1pt 1pt 0px; padding: 0in 5.4pt; width: 87.5pt;" valign="top" width="117">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">RAW Files<o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: windowtext windowtext windowtext rgb(0, 0, 0); border-style: solid solid solid none; border-width: 1pt 1pt 1pt 0px; padding: 0in 5.4pt; width: 56.7pt;" valign="top" width="76">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">Lookups Used?<o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: windowtext windowtext windowtext rgb(0, 0, 0); border-style: solid solid solid none; border-width: 1pt 1pt 1pt 0px; padding: 0in 5.4pt; width: 56.7pt;" valign="top" width="76">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">Staging Tables?<o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: windowtext windowtext windowtext rgb(0, 0, 0); border-style: solid solid solid none; border-width: 1pt 1pt 1pt 0px; padding: 0in 5.4pt; width: 56.7pt;" valign="top" width="76">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">Load Sorted Data?<o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: windowtext windowtext windowtext rgb(0, 0, 0); border-style: solid solid solid none; border-width: 1pt 1pt 1pt 0px; padding: 0in 5.4pt; width: 91.25pt;" valign="top" width="122">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">Execution Time<o:p></o:p></span></div>
</td>
</tr>
<tr style="mso-yfti-irow: 1;">
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext; border-style: none solid solid; border-width: 0px 1pt 1pt; padding: 0in 5.4pt; width: 70.8pt;" valign="top" width="94">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">0<o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); border-style: none solid solid none; border-width: 0px 1pt 1pt 0px; padding: 0in 5.4pt; width: 70.8pt;" valign="top" width="94">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">6<o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); border-style: none solid solid none; border-width: 0px 1pt 1pt 0px; padding: 0in 5.4pt; width: 87.5pt;" valign="top" width="117">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">8<o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); border-style: none solid solid none; border-width: 0px 1pt 1pt 0px; padding: 0in 5.4pt; width: 56.7pt;" valign="top" width="76">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">16<o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); border-style: none solid solid none; border-width: 0px 1pt 1pt 0px; padding: 0in 5.4pt; width: 56.7pt;" valign="top" width="76">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">0<o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); border-style: none solid solid none; border-width: 0px 1pt 1pt 0px; padding: 0in 5.4pt; width: 56.7pt;" valign="top" width="76">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">Yes<o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); border-style: none solid solid none; border-width: 0px 1pt 1pt 0px; padding: 0in 5.4pt; width: 91.25pt;" valign="top" width="122">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">(Out Of Memory)<o:p></o:p></span></div>
</td>
</tr>
<tr style="mso-yfti-irow: 2;">
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext; border-style: none solid solid; border-width: 0px 1pt 1pt; padding: 0in 5.4pt; width: 70.8pt;" valign="top" width="94">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">7<o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); border-style: none solid solid none; border-width: 0px 1pt 1pt 0px; padding: 0in 5.4pt; width: 70.8pt;" valign="top" width="94">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">0<o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); border-style: none solid solid none; border-width: 0px 1pt 1pt 0px; padding: 0in 5.4pt; width: 87.5pt;" valign="top" width="117">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">8<o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); border-style: none solid solid none; border-width: 0px 1pt 1pt 0px; padding: 0in 5.4pt; width: 56.7pt;" valign="top" width="76">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">16<o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); border-style: none solid solid none; border-width: 0px 1pt 1pt 0px; padding: 0in 5.4pt; width: 56.7pt;" valign="top" width="76">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">0<o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); border-style: none solid solid none; border-width: 0px 1pt 1pt 0px; padding: 0in 5.4pt; width: 56.7pt;" valign="top" width="76">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">Yes<o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); border-style: none solid solid none; border-width: 0px 1pt 1pt 0px; padding: 0in 5.4pt; width: 91.25pt;" valign="top" width="122">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">(Out Of Memory)<o:p></o:p></span></div>
</td>
</tr>
<tr style="mso-yfti-irow: 3;">
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext; border-style: none solid solid; border-width: 0px 1pt 1pt; padding: 0in 5.4pt; width: 70.8pt;" valign="top" width="94">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">3<o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); border-style: none solid solid none; border-width: 0px 1pt 1pt 0px; padding: 0in 5.4pt; width: 70.8pt;" valign="top" width="94">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">0<o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); border-style: none solid solid none; border-width: 0px 1pt 1pt 0px; padding: 0in 5.4pt; width: 87.5pt;" valign="top" width="117">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">2<o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); border-style: none solid solid none; border-width: 0px 1pt 1pt 0px; padding: 0in 5.4pt; width: 56.7pt;" valign="top" width="76"></td>
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); border-style: none solid solid none; border-width: 0px 1pt 1pt 0px; padding: 0in 5.4pt; width: 56.7pt;" valign="top" width="76">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">6<o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); border-style: none solid solid none; border-width: 0px 1pt 1pt 0px; padding: 0in 5.4pt; width: 56.7pt;" valign="top" width="76">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">Yes<o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); border-style: none solid solid none; border-width: 0px 1pt 1pt 0px; padding: 0in 5.4pt; width: 91.25pt;" valign="top" width="122">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">4:33:01<o:p></o:p></span></div>
</td>
</tr>
<tr style="mso-yfti-irow: 4;">
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext; border-style: none solid solid; border-width: 0px 1pt 1pt; padding: 0in 5.4pt; width: 70.8pt;" valign="top" width="94">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">0<o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); border-style: none solid solid none; border-width: 0px 1pt 1pt 0px; padding: 0in 5.4pt; width: 70.8pt;" valign="top" width="94">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">0<o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); border-style: none solid solid none; border-width: 0px 1pt 1pt 0px; padding: 0in 5.4pt; width: 87.5pt;" valign="top" width="117">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">8<o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); border-style: none solid solid none; border-width: 0px 1pt 1pt 0px; padding: 0in 5.4pt; width: 56.7pt;" valign="top" width="76">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">16<o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); border-style: none solid solid none; border-width: 0px 1pt 1pt 0px; padding: 0in 5.4pt; width: 56.7pt;" valign="top" width="76">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">0<o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); border-style: none solid solid none; border-width: 0px 1pt 1pt 0px; padding: 0in 5.4pt; width: 56.7pt;" valign="top" width="76">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">No<o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); border-style: none solid solid none; border-width: 0px 1pt 1pt 0px; padding: 0in 5.4pt; width: 91.25pt;" valign="top" width="122">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">6:15:32<o:p></o:p></span></div>
</td>
</tr>
<tr style="mso-yfti-irow: 5;">
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext; border-style: none solid solid; border-width: 0px 1pt 1pt; padding: 0in 5.4pt; width: 70.8pt;" valign="top" width="94">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">0<o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); border-style: none solid solid none; border-width: 0px 1pt 1pt 0px; padding: 0in 5.4pt; width: 70.8pt;" valign="top" width="94">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">0<o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); border-style: none solid solid none; border-width: 0px 1pt 1pt 0px; padding: 0in 5.4pt; width: 87.5pt;" valign="top" width="117">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">0<o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); border-style: none solid solid none; border-width: 0px 1pt 1pt 0px; padding: 0in 5.4pt; width: 56.7pt;" valign="top" width="76">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">16<o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); border-style: none solid solid none; border-width: 0px 1pt 1pt 0px; padding: 0in 5.4pt; width: 56.7pt;" valign="top" width="76">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">8<o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); border-style: none solid solid none; border-width: 0px 1pt 1pt 0px; padding: 0in 5.4pt; width: 56.7pt;" valign="top" width="76">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">No<o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); border-style: none solid solid none; border-width: 0px 1pt 1pt 0px; padding: 0in 5.4pt; width: 91.25pt;" valign="top" width="122">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">5:56:47<o:p></o:p></span></div>
</td>
</tr>
<tr style="mso-yfti-irow: 6; mso-yfti-lastrow: yes;">
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext; border-style: none solid solid; border-width: 0px 1pt 1pt; padding: 0in 5.4pt; width: 70.8pt;" valign="top" width="94">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<b><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">0</span></b><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";"><o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); border-style: none solid solid none; border-width: 0px 1pt 1pt 0px; padding: 0in 5.4pt; width: 70.8pt;" valign="top" width="94">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<b><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">0</span></b><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";"><o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); border-style: none solid solid none; border-width: 0px 1pt 1pt 0px; padding: 0in 5.4pt; width: 87.5pt;" valign="top" width="117">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<b><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">0</span></b><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";"><o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); border-style: none solid solid none; border-width: 0px 1pt 1pt 0px; padding: 0in 5.4pt; width: 56.7pt;" valign="top" width="76">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<b><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">7</span></b><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";"><o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); border-style: none solid solid none; border-width: 0px 1pt 1pt 0px; padding: 0in 5.4pt; width: 56.7pt;" valign="top" width="76">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<b><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">8</span></b><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";"><o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); border-style: none solid solid none; border-width: 0px 1pt 1pt 0px; padding: 0in 5.4pt; width: 56.7pt;" valign="top" width="76">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<b><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">Yes</span></b><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";"><o:p></o:p></span></div>
</td>
<td style="background-color: transparent; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); border-style: none solid solid none; border-width: 0px 1pt 1pt 0px; padding: 0in 5.4pt; width: 91.25pt;" valign="top" width="122">
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<b><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">2:01:58</span></b><span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";"><o:p></o:p></span></div>
</td>
</tr>
</tbody></table>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<br /></div>
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt;">
<span style="font-family: "Times New Roman","serif"; font-size: 12pt; mso-fareast-font-family: "Times New Roman";">The trick is to use a staging
database to eliminate sort and merge components, as well as to eliminate as
many lookups as possible and finally order data according to the target table’s
primary key. As you can see, the difference in performance is dramatic.<o:p></o:p></span></div>Mark Whitehttp://www.blogger.com/profile/00931590930945665211noreply@blogger.com0tag:blogger.com,1999:blog-5912406209740375418.post-50395757462256436692010-07-01T19:57:00.000+02:002011-05-31T08:04:20.603+02:00SSIS: Check List for Minimally Logged InsertsOver the coming weeks I’ll be presenting a series of posts on advanced techniques for achieving high performance SSIS data loads. <br />
In this post, we’ll focus on a brief check-list for achieving high speed data inserts into a SQL Server target table. There are many disparate sources, but I’ve yet to find a single checklist of everything you need to consider for achieving minimally logged insert operations.<br />
<br />
The prerequisite for achieving data loads at speeds comparable to a file copy ultimately come down to two main things:<br />
<br />
1. Insert operations need to be minimally logged.<br />
2. Inserted data needs to be sorted according to the target table’s clustered index (primary key).<br />
When insert operations are minimally logged, it means that the transaction log is bypassed during the insert operation – i.e. only the MDF (data) file is written to, instead of the MDF and LDF file/s. When the inserted data is sorted according to the target table’s primary key, it means that SQL server does not need to use TempDB to sort and build the clustered index. With heavy read/write operations to the transaction log and TempDB out of the way, the insert operation becomes extremely fast.<br />
<br />
So… enough of the theory… here’s the full check-list for achieving high performance bulk inserts:<br />
<strong>Target Table Checklist:</strong><br />
1. Database recovery model is either BULK LOGGED or SIMPLE.<br />
2. Account used to connect to SQL server is effectively granted the BULK OPERATIONS privilege on the server.<br />
3. Target table is empty.<br />
4. Ideally, non-clustered indexes are disabled.<br />
<br />
<strong>SSIS Checklist:</strong><br />
1. Pipeline data is ordered according to table’s clustered index.<br />
2. SQL Destination (faster) or OLEDB Destination (more flexible) component is used.<br />
3. Table Lock is checked on destination component<br />
4. Names of clustered index columns is provided in the Order box (BulkInsertTabLock = True)<br />
5. Complete entire load in a single operation (MaxInsertCommitSize = 0).<br />
<br />
<br />
<strong>Fallback Plan:</strong><br />
It isn’t always possible to meet all of the above criterion. In particular, if the target table is not empty, or the data stream cannot be sorted according to the clustered index, then it’s actually better not to attempt a single-operation bulk insert. This is because when the table is not empty, or when the data is not sorted according to the primary key, it means that SQL server can only begin to process the actual insert operation after the last row has been sent – which means that all of the data will end up in TempDB anyway. When this is the case, set MaxInsertCommitSize to a reasonably small number (thumb-suck around 10,000) to allow SQL to process the inserts in batches during the data streaming operation. This improves parallelism and reduces the size by which TempDB will grow to accommodate the operation.<br />
<br />
<strong>In summary:</strong><br />
1. If running SSIS on the target server, use the SQL destination component – it’s up to 15% faster. <u>Be sure to execute the package under a user account with sufficient privillages to open shared memory with SQL Server</u>.<br />
2. Whenever possible, lock the table during inserts.<br />
3. If you’re inserting sorted data into an empty table, then:<br />
<ul>
<li>For the SQL Destination, ensure:</li>
<ul>
<li>MaxInsertCommitSize = 0 </li>
<li>BulkInsertTabLock = TRUE</li>
<li>BulkInsertOrder = <em>columnname [, columnname, ...]</em></li>
</ul>
<li>For the OLEDB Destination, ensure:</li>
<ul>
<li>FastLoadMaxInsertCommitSize = 0 (or it's default - 2147483647)</li>
<li>FastLoadOptions = TABLOCK,ORDER(<em>columnname</em> ASC)</li>
</ul>
</ul>
4. If inserting unsorted data, or are inserting into a populated table, set MaxInsertCommitSize to 10,000 or less.<br />
<br />
<u>2011-05-30 Update</u><br />
To set up ordering for the OLE DB Destination component, edit <strong>FastLoadOptions</strong> property. It contains a comma seperated list of text, more details of which are available at <a href="http://msdn.microsoft.com/en-us/library/ms141237.aspx">http://msdn.microsoft.com/en-us/library/ms141237.aspx</a> and <a href="http://msdn.microsoft.com/en-us/library/ms177468.aspx">http://msdn.microsoft.com/en-us/library/ms177468.aspx</a>.Mark Whitehttp://www.blogger.com/profile/00931590930945665211noreply@blogger.com2tag:blogger.com,1999:blog-5912406209740375418.post-63478800620655167572008-05-01T16:04:00.001+02:002011-02-21T06:56:04.257+02:00Improving Performance: Stored Procedures and Table Valued Parameters<p><span> </span><br />Here's how to use table typed arguments to author a stored procedure that inserts multiple master-detail records into tables that use identity columns for primary key. Consider the following Accounts and Transactions tables:<br /><br /></p><div style="margin-left: 28pt;"><span style="font-size: 10pt;">CREATE TABLE Accounts (</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> AccountID int not null identity(1,1) PRIMARY KEY,</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> CustomerName NVARCHAR(100) not null,</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> AccountType NVARCHAR(10) not null </span><br /><div style="margin-left: 28pt;"><span style="font-size: 10pt;">) </span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> </span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;">CREATE TABLE [Transactions] (</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> AccountID int not null, </span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> TransactionID int not null identity(1,1),</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> TransactionDate date default (GETDATE()),</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> Amount money not null default (0.00),</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> CONSTRAINT PK_Transactions PRIMARY KEY (AccountID, TransactionID),</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> CONSTRAINT FK_TransactionAccount FOREIGN KEY (AccountID) REFERENCES Accounts(AccountID)</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;">)</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;">GO</span></div><p><span style="font-size: 10pt;">These tables represent a typical parent-child relationship where identity columns are used for the primary keys. Inserting records into such tables is usually problematic for performance, because client applications are required to perform multiple round-trips to the server... first inserting an account record, then retrieve its new primary key value, then using that value to insert a transaction record.</span></p><p><span style="font-size: 10pt;">By using table types, we can define arguments to pass to a stored procedure that allow multiple account and transaction records to be described, without the caller needing to know what the identity values of the inserted records will be. </span></p><p><span style="font-size: 10pt;">First, we'll define two table types which describe the records which we want our new stored procedure to handle:</span></p><div style="margin-left: 28pt;"><span style="font-size: 10pt;">CREATE TYPE AccountsTblType AS TABLE (</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> AccountID int not null PRIMARY KEY,</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> CustomerName NVARCHAR(100) not null,</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> AccountType NVARCHAR(10) not null </span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;">)</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;">CREATE TYPE TransactionsTblType AS TABLE (</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> AccountID int not null, </span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> TransactionDate date default (GETDATE()),</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> Amount money not null default (0.00)</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;">)</span></div><p><span style="font-size: 10pt;">Now we can create a stored procedure which takes arguments of these table types as follows:</span></p><div style="margin-left: 28pt;"><span style="font-size: 10pt;">CREATE PROCEDURE sp_CreateNewAccountsWithTransactions (</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> @Accounts AccountsTblType READONLY,</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> @Transactions TransactionsTblType READONLY )</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;">AS</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> ...</span></div><p><span style="font-size: 10pt;">The stored procedure we define will take both account and transaction table variables, and will perform the work of first inserting account records, then retrieving the new record's primary keys (the identity values), and then correlating those new identity values to the pseudo values given in the @Transactions argument. We will call this stored procedure as follows:</span></p><div style="margin-left: 28pt;"><span style="font-size: 10pt;">DECLARE @MyAccounts AS AccountsTblType;</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;">INSERT INTO @MyAccounts (AccountID, CustomerName, AccountType) </span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;">VALUES (-1, 'Mark','Savings')</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> , (-2, 'Mark','Checking')</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> , (-3, 'John','Savings');</span></div><p> </p><div style="margin-left: 28pt;"><span style="font-size: 10pt;">DECLARE @MyTransactions AS TransactionsTblType;</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;">INSERT INTO @MyTransactions (AccountID, Amount )</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;">VALUES (-1, 1000.00)</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> , (-1, 5.93)</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> , (-2, 5500.23)</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> , (-2, 33.99)</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> , (-3, 9050.00)</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;">exec sp_CreateNewAccountsWithTransactions </span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> @Accounts = @MyAccounts, </span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> @Transactions = @MyTransactions</span></div><p><span style="font-size: 10pt;">In the calling code (above), we define type typed variables and populate them with the records which we want inserted into the database Account & Transaction tables. Note that we are assigning pseudo values to the AccountID column, so as to describe which transactions correlate to which accounts.</span></p><p><span style="font-size: 10pt;">With the above in mind, here's our stored procedure:</span></p><div style="margin-left: 28pt;"><span style="font-size: 10pt;">CREATE PROCEDURE sp_CreateNewAccountsWithTransactions (</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> @Accounts AccountsTblType READONLY,</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> @Transactions TransactionsTblType READONLY )</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;">AS</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> SET NOCOUNT ON;</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> </span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> DECLARE @IdentityLink AS TABLE ( </span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> SubmittedKey int, </span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> ActualKey int, </span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> RowNumber int identity(1,1)</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> );</span></div><div style="margin-left: 28pt;"> </div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> INSERT INTO Accounts (CustomerName, AccountType)</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> OUTPUT inserted.AccountID INTO @IdentityLink (ActualKey)</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> SELECT CustomerName, AccountType FROM @Accounts;</span></div><div style="margin-left: 28pt;"> </div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> WITH OrderedRows As (</span></div><div style="margin-left: 36pt;"><span style="font-size: 10pt;"> SELECT AccountID, ROW_NUMBER () OVER (ORDER BY AccountID) As RowNumber </span></div><div style="margin-left: 36pt;"><span style="font-size: 10pt;"> FROM @Accounts</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> )</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> UPDATE @IdentityLink SET SubmittedKey = M.AccountID</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> FROM @IdentityLink L JOIN OrderedRows M ON L.RowNumber = M.RowNumber;</span></div><div style="margin-left: 28pt;"> </div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> INSERT INTO Transactions(AccountID, TransactionDate, Amount)</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> SELECT L.ActualKey, T.TransactionDate, T.Amount</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> FROM @Transactions T</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;"> JOIN @IdentityLink L ON L.SubmittedKey = T.AccountID;</span></div><div style="margin-left: 28pt;"><span style="font-size: 10pt;">GO</span></div><p><span style="font-size: 10pt;">Here's what the above stored procedure is doing:</span></p><ol><li><span style="font-size: 10pt;">@IdentityLink is declared as a table variable which store the inserted identity values for AccountID. We will use this table variable to later match transaction records to account records.</span></li><li><span style="font-size: 10pt;">We insert all account records, outputting the new identity values into the @IdentityLink variable.</span></li><li><div><span style="font-size: 10pt;">Update the @IdentityLink table variable so that every inserted AccountID value has a corresponding pseudo (submitted) AccountID value:</span></div><ol><li><span style="font-size: 10pt;">We use a common table expression to define OrderedRows, which provides a correlation between the submitted pseudo AccountID values and the row order in which they appear in the submitted table type variable @Accounts. </span></li><li><span style="font-size: 10pt;">We join the submitted AccountID to the actual inserted AccountID value by using this row numbering. Put differently we're using the order in which records where submitted to correlate the inserted AccountID values back to the pseudo AccountID values.</span></li></ol> </li><li><span style="font-size: 10pt;">Insert records into the transaction table, using the @IdentityLink table variable to cross walk from the pseudo (submitted) AccountID values over to the actual identity values of the inserted Account records.</span></li></ol><p><span style="font-size: 10pt;">Perhaps the most interesting part of this technique is the support that's available for it in ADO.NET. The SqlCommand object in .NET allows developers to assign in-memory DataTables to the Value property of a SqlParameter – meaning that the above stored procedure can be called by a client application without writing any T-SQL code to populate the table variables before passing them as arguments to the stored procedure.</span></p></div>Mark Whitehttp://www.blogger.com/profile/00931590930945665211noreply@blogger.com0tag:blogger.com,1999:blog-5912406209740375418.post-63210707781227875722008-05-01T16:02:00.001+02:002011-02-21T06:56:31.474+02:00Whether & When to use Stored Procedures<p><span> </span></p><p>It's a discussion that comes up time after time in training session... when and why to use stored procedures. There are many heated debates posted on the net about this, with people seeming to argue strongly either for or against. Rather than suggest that you should or shouldn't use stored procedures with SQL server, let's look at their merits and pit falls with a view to knowing <em>when</em> to use them.</p><p>The biggest misconception about stored procedures is that they offer significant performance advantages over submitted DML queries to the SQL engine. While it is true that the query plan for any stored procedures is precompiled, the SQL engine does a very good job of parameterizing DML queries and caching their plans. The real-world result is that stored procedures don't offer <em>a significant</em> improvement in performance – especially not in a world where disk I/O, no CPU, is the bottle neck. So then – if performance isn't a core criterion for choosing whether or not to use stored procedures, then what is?</p><p>Simple: <strong>Maintainability and security.</strong></p><p>Stored procedures offer a simple means of keeping client applications at arm's-length from your relational design, allowing you to accommodate changes to the storage of data without re-writing and re-deploying client applications. Moreover, stored procedures allow administrators to implement logic to perform custom logic and validity checks on the operations being requested.</p><p>Most interestingly, by using SQL 2008's new table types, database administrators can offer stored procedures to developers which enable multiple records to be inserted in a single round-trip to the server. See the this post for an example. With the advent of LINQ and the Entity Framework, the use of stored procedures as a rigid doctrine is becoming less attractive... LINQ DataContext classes allow developers to manipulate table data without writing a single SQL statement, and the queries which LINQ for SQL generates are impressively efficient. The point here is that enforcing the use of stored procedures will significantly slow the progress of a development team using technologies like LINQ or the Entity Framework.</p><p>If security and maintainability are your prime concerns, consider offering updatable views (instead of stored procedures) to your development team. Updatable views offer all of the same advantages as stored procedures, and are generally more flexible in the various ways you can use them. Views can also offer better query optimization that stored procedures, because when a "retrieve-data" stored procedure executes, it always returns all columns selected from the underlying table – regardless of whether or not the client application needed all those columns, but with an updatable view the query optimizer can offer performance gains by expanding the view into the query it is being used in, and then determining how to solve the outer query based solely on the columns requested by the client – regardless of how many columns the actual view defines.</p><p>If you aren't comfortable with updatable views or giving LINQ direct table access, or have a corporate policy of implementing all data access via stored procedures, that's OK... but there are a couple of common scenarios worth reviewing, where stored procedures are used, but shouldn't be...</p><h2>When NOT to use stored procedures</h2><ol><li><strong>Aggregating, moving or manipulating large volumes of table data.</strong><br />This is one of the top reasons administrators site for using stored procedures. The problem with using a stored procedure, or even straight DML queries, for manipulating large volumes of data is that it hits your transaction log, and as the volume of data increases, so the performance becomes exponentially worse. The result is a stored procedure which works in a couple of seconds on your test-set of data, but crawls for hours in a production environment.<br /><strong>Instead: </strong>Consider using SSIS – it is very effective at extracting large volumes of data, transforming into new "shapes", and then re-loading it into a destination... hence the term ETL (extract, transform, load). If you're a database administrator who's responsible for these sorts of year-end or month-end jobs, resist the temptation to write stored procedures and instead try SSIS – it's a great tool!</li><li><strong>Running queries across linked servers.</strong><br />Generally speaking, if you're serious about performance and system stability then you'll avoid using linked servers. Writing queries that joins data from multiple servers <em>usually</em> suggests that your organization needs coherent reports that gather data from disparate line-of-business systems. <br /><strong>Instead:</strong> Rally for the implementation of a simple datamart- a place where data from disparate systems can be viewed and reported on. At the very least, consider creating a new database, using SSIS to populate it with data from the source databases, then running the queries against this new database. With SQL 2008's new change data capture capabilities you can run incremental SSIS data loads frequently without significantly hurting the performance of your source databases.</li><li><strong>Implementing complex business rules.<br /></strong>Stored procedures can implement useful logic to manipulate and marshal data that's going to or from database tables, but that doesn't mean that a stored procedure is the best technology for implementing complex business logic. Generally speaking, the stored procedure is the "last mile" in the persistence layer of any good 3-tier design. Complex business rules are best implemented in high-level tools that deal with information at a business entity level. Stored procedures see data at the persistence level, which makes them a less-than-ideal choice for implementing complex business rules. <br /><strong>Instead:</strong> Consider implementing a good 3-tier design with the business-logic layer implemented in a rich language like VB.NET or C#. Also, ,consider learning to use the new Entity Framework, which helps transform persisted data to and from the business entities it represents.</li></ol><p>To keep perspective here, we're not saying that stored procedures are<em> bad</em> – they certainly have their place. Newer technologies like LINQ, Entity Framework, SSIS and updatable views allow you to achieve all of the same benefits (and more) that stored procedures offer, with fewer pitfalls and in significantly shorter development times.</p><p> </p>Mark Whitehttp://www.blogger.com/profile/00931590930945665211noreply@blogger.com0