
SSIS: Check List for Minimally Logged Inserts

Over the coming weeks I’ll be presenting a series of posts on advanced techniques for achieving high performance SSIS data loads.
In this post, we’ll focus on a brief check-list for achieving high speed data inserts into a SQL Server target table. There are many disparate sources, but I’ve yet to find a single checklist of everything you need to consider for achieving minimally logged insert operations.

The prerequisites for achieving data loads at speeds comparable to a file copy ultimately come down to two things:

1. Insert operations need to be minimally logged.
2. Inserted data needs to be sorted according to the target table’s clustered index (primary key).
When insert operations are minimally logged, only page and extent allocations are recorded in the transaction log rather than every inserted row – i.e. the data is written almost entirely to the MDF (data) file, with very little traffic to the LDF (log) file(s). When the inserted data is sorted according to the target table’s primary key, SQL Server does not need to use TempDB to sort the data and build the clustered index. With heavy read/write activity against the transaction log and TempDB out of the way, the insert operation becomes extremely fast.
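
If you want to sanity-check that a load really was minimally logged, a quick, rough way is to compare transaction log usage before and after the load. Here is a minimal T-SQL sketch, assuming a hypothetical target database named MyTargetDB and sufficient rights to run DBCC SQLPERF:

-- Capture log usage for all databases (DBCC SQLPERF(LOGSPACE) returns one row per database)
CREATE TABLE #LogSpace
(
    DatabaseName    sysname,
    LogSizeMB       float,
    LogSpaceUsedPct float,
    Status          int
);

INSERT INTO #LogSpace
EXEC ('DBCC SQLPERF(LOGSPACE)');

SELECT DatabaseName, LogSizeMB, LogSpaceUsedPct
FROM   #LogSpace
WHERE  DatabaseName = 'MyTargetDB';   -- hypothetical database name

-- Run the SSIS load, then repeat the capture and compare LogSpaceUsedPct:
-- a minimally logged load should barely move it, while a fully logged load
-- grows it roughly in line with the volume of data inserted.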

So… enough of the theory… here’s the full check-list for achieving high performance bulk inserts:
Target Table Checklist:
1. Database recovery model is either BULK_LOGGED or SIMPLE.
2. The account used to connect to SQL Server effectively holds the ADMINISTER BULK OPERATIONS permission on the server (granted directly or via the bulkadmin role).
3. Target table is empty.
4. Ideally, non-clustered indexes are disabled (a T-SQL sketch covering these four items follows below).
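
For reference, here is a minimal T-SQL sketch of the checklist above. The database, table, index and account names (MyTargetDB, dbo.FactSales, IX_FactSales_CustomerKey, DOMAIN\SsisLoadAccount) are placeholders for illustration only:

-- 1. Recovery model: BULK_LOGGED or SIMPLE
ALTER DATABASE MyTargetDB SET RECOVERY BULK_LOGGED;

-- 2. The load account needs the bulk operations permission
GRANT ADMINISTER BULK OPERATIONS TO [DOMAIN\SsisLoadAccount];

-- 3. Start with an empty target table
TRUNCATE TABLE dbo.FactSales;

-- 4. Disable non-clustered indexes for the duration of the load...
ALTER INDEX IX_FactSales_CustomerKey ON dbo.FactSales DISABLE;

-- ...and rebuild them once the load has completed
ALTER INDEX IX_FactSales_CustomerKey ON dbo.FactSales REBUILD;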

SSIS Checklist:
1. Pipeline data is ordered according to the table’s clustered index.
2. SQL Destination (faster) or OLEDB Destination (more flexible) component is used.
3. Table Lock is checked on the destination component (BulkInsertTabLock = True).
4. The names of the clustered index columns are provided in the Order box (BulkInsertOrder).
5. The entire load is completed in a single operation (MaxInsertCommitSize = 0) – a rough T-SQL equivalent is sketched below.
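
As a mental model, the fast-load settings above correspond roughly to a plain T-SQL BULK INSERT with table-lock and order hints. This is only an illustrative sketch – SSIS issues the equivalent bulk API calls for you, and the file, table and column names below are placeholders:

BULK INSERT dbo.FactSales
FROM 'C:\Loads\FactSales.dat'
WITH
(
    TABLOCK,                                    -- Table Lock / BulkInsertTabLock = True
    ORDER (SalesDateKey ASC, CustomerKey ASC),  -- Order box / BulkInsertOrder
    FIELDTERMINATOR = '|',
    ROWTERMINATOR = '\n'
    -- no BATCHSIZE: the whole load commits as one batch, like MaxInsertCommitSize = 0
);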


Fallback Plan:
It isn’t always possible to meet all of the above criteria. In particular, if the target table is not empty, or the data stream cannot be sorted according to the clustered index, then it’s actually better not to attempt a single-operation bulk insert. When the table is not empty, or the data is not sorted according to the primary key, SQL Server can only begin processing the actual insert after the last row has been sent – which means that all of the data will end up in TempDB anyway. When this is the case, set MaxInsertCommitSize to a reasonably small number (a thumb-suck figure is around 10,000) so that SQL Server processes the inserts in batches while the data is still streaming (the T-SQL counterpart is sketched below). This improves parallelism and reduces the amount by which TempDB has to grow to accommodate the operation.
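
In terms of the BULK INSERT sketch from the previous section, this is the equivalent of replacing the single-batch commit with an explicit batch size, so that each batch is committed as its own transaction while the rest of the data is still arriving (again, names are placeholders):

BULK INSERT dbo.FactSales
FROM 'C:\Loads\FactSales.dat'
WITH
(
    BATCHSIZE = 10000,      -- the counterpart of MaxInsertCommitSize = 10,000
    FIELDTERMINATOR = '|',
    ROWTERMINATOR = '\n'
);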

In summary:
1. If running SSIS on the target server, use the SQL Destination component – it’s up to 15% faster. Be sure to execute the package under a user account with sufficient privileges to open shared memory with SQL Server.
2. Whenever possible, lock the table during inserts.
3. If you’re inserting sorted data into an empty table, then:
  • For the SQL Destination, ensure:
    • MaxInsertCommitSize = 0
    • BulkInsertTabLock = TRUE
    • BulkInsertOrder = columnname [, columnname, ...]
  • For the OLEDB Destination, ensure:
    • FastLoadMaxInsertCommitSize = 0 (or its default, 2147483647)
    • FastLoadOptions = TABLOCK,ORDER(columnname ASC)
4. If you’re inserting unsorted data, or inserting into a populated table, set MaxInsertCommitSize to 10,000 or less.

2011-05-30 Update
To set up ordering for the OLE DB Destination component, edit the FastLoadOptions property. It contains a comma-separated list of options; more details are available at http://msdn.microsoft.com/en-us/library/ms141237.aspx and http://msdn.microsoft.com/en-us/library/ms177468.aspx.
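
For example, assuming a hypothetical clustered index on columns SalesDateKey and CustomerKey, the property value would look something like this:

FastLoadOptions = TABLOCK,ORDER(SalesDateKey ASC, CustomerKey ASC)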

Comments

Richard C said…
Thanks for the helpful post. One question: in step 3 of the fallback plan you say "add the names of key columns to the 'Order Columns' box (or BulkInsertOrder property)." Where is this "Order Columns" box? I can't find it in the SSIS designer (2008).

Thanks,
Richard.
thomas said…
"Order Columns" is in the advanced editor
