
Validating RSA ID Numbers in SQL


In this post we’ll explore using SQLCLR integration to validate South African identity numbers, with the focus slightly more on the ins and outs of CLR integration than on how RSA ID numbers get validated. If you’re after the validation algorithm, look here.

So what’s on the slab?...

This solution makes RSA ID number validation possible by exposing a C# table-valued function (TVF) to SQL Server. The function takes an identity number as its input, and returns a single record with the following columns:
  • IsValid - bit: Whether the input identity number is valid.
  • IsCitizen - bit: Whether the number belongs to a South African citizen.
  • Gender - char: The gender of the individual - ‘M’ for male, ‘F’ for female.
  • DateOfBirth - date: The date of birth of the individual.
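
To make those four columns concrete, here is a minimal Python sketch of how they could be decoded from an ID number. It is not the post's C# code. It assumes the commonly documented 13-digit layout (YYMMDD birth date, a 4-digit sequence where 5000 and above means male, a citizenship digit where 0 is a citizen and 1 a permanent resident, one further digit, and a Luhn check digit). The names ValidatedId, luhn_ok and decode_rsa_id are made up for illustration, and the century rule (pick the most recent date that is not in the future) is an assumption of this sketch.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ValidatedId:
    """Hypothetical record mirroring the four output columns."""
    is_valid: bool
    is_citizen: Optional[bool]
    gender: Optional[str]
    date_of_birth: Optional[date]


INVALID = ValidatedId(False, None, None, None)


def luhn_ok(digits: str) -> bool:
    """Luhn check: double every second digit from the right, sum, test mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def _birth_date(yy: int, mm: int, dd: int, today: date) -> Optional[date]:
    # ASSUMPTION: the ID carries no century, so take the most recent
    # calendar date that is not in the future.
    for century in (2000, 1900):
        try:
            candidate = date(century + yy, mm, dd)
        except ValueError:
            continue  # e.g. month 13, or 29 Feb in a non-leap year
        if candidate <= today:
            return candidate
    return None


def decode_rsa_id(text: Optional[str], today: Optional[date] = None) -> ValidatedId:
    """Decode an ID number; bad input yields INVALID instead of an exception."""
    if text is None or len(text) != 13 or not (text.isascii() and text.isdigit()):
        return INVALID
    today = today or date.today()
    dob = _birth_date(int(text[0:2]), int(text[2:4]), int(text[4:6]), today)
    if dob is None or text[10] not in "01" or not luhn_ok(text):
        return INVALID
    gender = "M" if int(text[6:10]) >= 5000 else "F"
    return ValidatedId(True, text[10] == "0", gender, dob)
```

Note that every failure path returns the same INVALID record rather than raising, so a caller can feed it arbitrary text without guarding each call.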

Why a table valued function?

Glad you asked.     (...just smile politely, nod and pretend you did!)
Given the 4 output fields above, it is tempting to think that a smarter solution would be to implement a user-defined type (UDT). After all, an ID number can be thought of as a type, right?

Well.. it can…  and that’s exactly the problem. UDTs are possibly the most problematic and controversial part of SQL CLR integration, because T-SQL dependencies on UDTs make it very difficult to upgrade the CLR assembly. In other words, if you use a UDT for a table column, then you cannot upgrade the UDT’s assembly without dropping the column. Also, properties exposed by UDTs have to be aliased in SELECT statements, i.e. the name of a UDT property does not translate into a column name/alias when used in a query.

In contrast, implementing a table valued function gives us:

Upgradability: You can unload and upgrade a TVF’s assembly any time, since you won’t unwittingly create a schema-bound dependency on a TVF.
Aesthetics: Columns returned by a TVF default to having sensible names.
Performance: All columns for a row returned by a TVF are returned in a single function call, whereas UDTs require a CLR call per property.

The solution…

The solution outlined here shows a few best practices for using CLR integration with SQL Server, namely:
  • The assembly is marked as SAFE.
    (no naughty pointers, no external calls, no unverifiable code blocks)
  • The assembly is stateless.
    SQL Server unloads application domains when it detects resource constraints, and assemblies must deal with this gracefully.
  • Exceptions are never thrown.
    This means that queries never crash, no matter what data you feed into the TVF.

The code is reasonably straightforward... there are 3 important components:
  1. The ValidatedRSAID class - which encapsulates the logic of cracking the DateOfBirth, Gender, Citizenship and validity from an ID number.
  2. The ValidateRSAID method, which takes a SqlString as its input, and returns a ValidatedRSAID instance.
  3. The ValidateRSAIDFillRow method, which SQL uses to decode the fields of the ValidatedRSAID instance into column values in the resulting query.
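
If the split between parts 2 and 3 seems odd, it helps to remember how a CLR TVF is called: SQL Server asks the entry-point method for an enumerable of row objects, then calls the fill-row method once per object to turn it into column values. The Python sketch below only mimics that shape. It is not the post's C# code. The ValidatedRSAID class here is a deliberately trivial stand-in (it only checks for 13 digits), it returns a single IsValid column to keep things short, and run_tvf is a hypothetical helper playing the role of SQL Server.

```python
from typing import Iterable, List, Optional, Tuple


class ValidatedRSAID:
    """Stand-in for component 1. ASSUMPTION: the real class does full
    validation; this placeholder only checks that the input is 13 digits."""

    def __init__(self, text: Optional[str]) -> None:
        self.is_valid = (
            text is not None
            and len(text) == 13
            and text.isascii()
            and text.isdigit()
        )


def validate_rsa_id(text: Optional[str]) -> Iterable[ValidatedRSAID]:
    """Component 2: the TVF entry point. It hands back an enumerable of
    row objects; here there is always exactly one row."""
    return [ValidatedRSAID(text)]


def validate_rsa_id_fill_row(row: ValidatedRSAID) -> Tuple[bool]:
    """Component 3: unpack one row object into its column values."""
    return (row.is_valid,)


def run_tvf(text: Optional[str]) -> List[Tuple[bool]]:
    """Hypothetical harness doing what SQL Server does: enumerate the
    rows, then call the fill-row function on each one."""
    return [validate_rsa_id_fill_row(row) for row in validate_rsa_id(text)]
```

Keeping the parsing inside the row object means the fill-row step is a plain field copy, which is why all columns arrive in one pass per row.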


Here's the code for parts 2 and 3, which expose the ValidatedRSAID functionality to SQL Server:


...and here's the code for validating an ID number. It's been carefully optimized, at the expense of readability:


To download the full solution, head over to https://validatersaid.codeplex.com
