Skip to main content

Improving Performance: Stored Procedures and Table Valued Parameters

 
Here's how to use table typed arguments to author a stored procedure that inserts multiple master-detail records into tables that use identity columns for primary key. Consider the following Accounts and Transactions tables:

CREATE TABLE Accounts (
    AccountID int not null identity(1,1) PRIMARY KEY,
    CustomerName NVARCHAR(100) not null,
    AccountType NVARCHAR(10) not null    
)    
    
CREATE TABLE [Transactions] (
    AccountID int not null,
    TransactionID int not null identity(1,1),
    TransactionDate date default (GETDATE()),
    Amount money not null default (0.00),
    CONSTRAINT PK_Transactions PRIMARY KEY (AccountID, TransactionID),
    CONSTRAINT FK_TransactionAccount FOREIGN KEY (AccountID) REFERENCES Accounts(AccountID)
)
GO

These tables represent a typical parent-child relationship where identity columns are used for the primary keys. Inserting records into such tables is usually problematic for performance, because client applications are required to perform multiple round-trips to the server... first inserting an account record, then retrieve its new primary key value, then using that value to insert a transaction record.

By using table types, we can define arguments to pass to a stored procedure that allow multiple account and transaction records to be described, without the caller needing to know what the identity values of the inserted records will be.

First, we'll define two table types which describe the records which we want our new stored procedure to handle:

CREATE TYPE AccountsTblType AS TABLE (
    AccountID int not null PRIMARY KEY,
    CustomerName NVARCHAR(100) not null,
    AccountType NVARCHAR(10) not null    
)
CREATE TYPE TransactionsTblType AS TABLE (
    AccountID int not null,
    TransactionDate date default (GETDATE()),
    Amount money not null default (0.00)
)

Now we can create a stored procedure which takes arguments of these table types as follows:

CREATE PROCEDURE sp_CreateNewAccountsWithTransactions (
        @Accounts AccountsTblType READONLY,
        @Transactions TransactionsTblType READONLY )
AS
    ...

The stored procedure we define will take both account and transaction table variables, and will perform the work of first inserting account records, then retrieving the new record's primary keys (the identity values), and then correlating those new identity values to the pseudo values given in the @Transactions argument. We will call this stored procedure as follows:

DECLARE @MyAccounts AS AccountsTblType;
INSERT INTO @MyAccounts (AccountID, CustomerName, AccountType)
VALUES        (-1, 'Mark','Savings')
        ,    (-2, 'Mark','Checking')
        ,    (-3, 'John','Savings');

 

DECLARE @MyTransactions AS TransactionsTblType;
INSERT INTO @MyTransactions (AccountID, Amount )
VALUES        (-1, 1000.00)
        ,     (-1, 5.93)
        ,     (-2, 5500.23)
        ,     (-2, 33.99)
        ,     (-3, 9050.00)
exec sp_CreateNewAccountsWithTransactions
    @Accounts = @MyAccounts,
    @Transactions = @MyTransactions

In the calling code (above), we define type typed variables and populate them with the records which we want inserted into the database Account & Transaction tables. Note that we are assigning pseudo values to the AccountID column, so as to describe which transactions correlate to which accounts.

With the above in mind, here's our stored procedure:

CREATE PROCEDURE sp_CreateNewAccountsWithTransactions (
        @Accounts AccountsTblType READONLY,
        @Transactions TransactionsTblType READONLY )
AS
    SET NOCOUNT ON;
    
    DECLARE @IdentityLink AS TABLE (
        SubmittedKey int,
        ActualKey int,
        RowNumber int identity(1,1)
        );
 
    INSERT INTO Accounts (CustomerName,    AccountType)
    OUTPUT inserted.AccountID INTO @IdentityLink (ActualKey)
    SELECT CustomerName, AccountType FROM @Accounts;
 
    WITH OrderedRows As (
        SELECT AccountID, ROW_NUMBER () OVER (ORDER BY AccountID) As RowNumber
        FROM @Accounts
    )
    UPDATE @IdentityLink SET SubmittedKey = M.AccountID
    FROM @IdentityLink L JOIN OrderedRows M ON L.RowNumber = M.RowNumber;
 
    INSERT INTO Transactions(AccountID, TransactionDate, Amount)
    SELECT L.ActualKey, T.TransactionDate, T.Amount
        FROM @Transactions T
        JOIN @IdentityLink L ON L.SubmittedKey = T.AccountID;
GO

Here's what the above stored procedure is doing:

  1. @IdentityLink is declared as a table variable which store the inserted identity values for AccountID. We will use this table variable to later match transaction records to account records.
  2. We insert all account records, outputting the new identity values into the @IdentityLink variable.
  3. Update the @IdentityLink table variable so that every inserted AccountID value has a corresponding pseudo (submitted) AccountID value:
    1. We use a common table expression to define OrderedRows, which provides a correlation between the submitted pseudo AccountID values and the row order in which they appear in the submitted table type variable @Accounts.
    2. We join the submitted AccountID to the actual inserted AccountID value by using this row numbering. Put differently we're using the order in which records where submitted to correlate the inserted AccountID values back to the pseudo AccountID values.
  4. Insert records into the transaction table, using the @IdentityLink table variable to cross walk from the pseudo (submitted) AccountID values over to the actual identity values of the inserted Account records.

Perhaps the most interesting part of this technique is the support that's available for it in ADO.NET. The SqlCommand object in .NET allows developers to assign in-memory DataTables to the Value property of a SqlParameter – meaning that the above stored procedure can be called by a client application without writing any T-SQL code to populate the table variables before passing them as arguments to the stored procedure.

Post a Comment

Popular posts from this blog

Reading Zip files in PowerQuery / M

Being a fan of PowerBI, I recently looked for a way to read zip files directly into the Data Model, and found this blog which showed a usable technique. Inspired by the possibilities revealed in Ken's solution, but frustrated by slow performance, I set out to learn the M language and write a faster alternative.
UnzipContents The result of these efforts is an M function - UnzipContents - that you can paste into any PowerBI / PowerQuery report. It takes the contents of a ZIP file, and returns a list of files contained therein, along with their decompressed data:



If you're not sure how to make this function available in your document, simply:

Open up PowerQuery (either in Excel or in PowerBI)Create a new Blank Query.Open up the Advanced Editor  (found on the View tab in PowerBI).Copy-Paste the above code into the editor, then close the editor.In the properties window, rename the the function to UnzipContents Usage Using the function is fairly straight forward: Choose "New Quer…

Easily Move SQL Tables between Filegroups

Recently during a Data Warehouse project, I had the need to move many tables to a new file group. I didn't like any of the solutions that I found on Google, so decided to create on of my own. The result?

MoveTablesToFilegroupClick here for a nifty stored proc allows you to easily move tables, indexes, heaps and even LOB data to different filegroups without breaking a sweat. To get going, copy-paste the code below into Management Studio, and then run it to create the needed stored procedure.
Hopefully the arguments are self explanatory, but here are some examples:

1. Move all tables, indexes and heaps, from all schemas into the filegroup named SECONDARY:
EXEC dbo.sp_MoveTablesToFileGroup
@SchemaFilter = '%',-- chooses schemas using the LIKE operator
@TableFilter  = '%',-- chooses tables using the LIKE operator
@DataFileGroup = 'SECONDARY',-- The name of the filegroup to move index and in-row data to.
@ClusteredIndexes = 1,-- 1 means "Move all clustered inde…

SQL Server vs Azure Data Warehouse vs Netezza vs Redshift

The Great Database Shoot Out In Jan'17, I teamed up with Marc van der Zon (Data Scientist), to test and compare several database technologies for our organization's future analytics and BI platform. The technologies that made the shortlist were:
SQL Server, because it is the organization's existing BI platform.Azure Data Warehouse, because of its high similarity to SQL Server.Amazon's Redshift, because of its attractive cost, scalability and performance.Netezza, because it is anaffordable on-prem appliance that performs well. Considerations We were primarily looking for the best bang-for-buck option, so price and performance were viewed as more important than how feature complete any particular option was. That said, what we regarded as important included: A shallow learning curve for BI developers (i.e. no need for expert skills in order to get good performance)
Ability to re-use existing code  (i.e. ANSI-92 SQL compatibility)
Time-to-solution.   (i.e. does the platform …