Loading data from an Oracle source

Building business intelligence on the Microsoft BI stack in an organization that has an Oracle-based transactional system is not uncommon. Here I'll outline a couple of tips and tricks that should ease the building of SSIS packages in that type of environment.

Attunity Components
Attunity offers several nice solutions for getting data out of Oracle and into SQL, one of which is their CDC (Change Data Capture) offering. If you don't have the budget or stomach for setting up CDC in an Oracle environment, then your next best bet is to use the free Attunity Oracle connectors for SSIS, which deliver a measurable performance boost.

The Oracle Index and Order By Gotcha
In previous posts I mentioned the performance benefits of loading sorted data into target tables. I'm currently loading data from Oracle 10, which exhibits a rather strange characteristic: Oracle does not make use of an index for ordering data. In other words, Oracle only uses the indexes on a table to resolve WHERE predicates and joins. If you simply want to select all records from a table, an ORDER BY clause will force a table sort operation every time the query is run. This places significant strain on a production database, and is therefore not an acceptable solution.

The solution is to use SQL staging tables liberally, and to go with the loading strategy of:
  1. Load all large Oracle tables into SQL staging tables.
  2. Use SQL queries to perform simple joins, lookups and sorting to transform the stage into the data warehouse target tables.
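
The two steps above can be sketched as follows. This is only an illustration, not the actual SSIS package: Python's built-in sqlite3 module stands in for the SQL staging database, and the table and column names (stg_orders, stg_customers, dw_orders) are hypothetical. The point is that the source is read without any ORDER BY, and the join and sort happen on the staging side.

```python
import sqlite3

def load_warehouse(source_orders, source_customers):
    """Stage unsorted source rows, then join and sort on the SQL side."""
    db = sqlite3.connect(":memory:")  # stand-in for the SQL staging database
    db.executescript("""
        CREATE TABLE stg_orders    (order_id INTEGER, customer_id INTEGER, amount REAL);
        CREATE TABLE stg_customers (customer_id INTEGER, name TEXT);
        CREATE TABLE dw_orders     (order_id INTEGER PRIMARY KEY, customer_name TEXT, amount REAL);
    """)
    # Step 1: bulk-load the source rows as-is (no ORDER BY against the source).
    db.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", source_orders)
    db.executemany("INSERT INTO stg_customers VALUES (?, ?)", source_customers)
    # Step 2: join, look up and sort in the staging database, not in Oracle.
    db.execute("""
        INSERT INTO dw_orders (order_id, customer_name, amount)
        SELECT o.order_id, c.name, o.amount
        FROM stg_orders AS o
        JOIN stg_customers AS c ON c.customer_id = o.customer_id
        ORDER BY o.order_id
    """)
    db.commit()
    return db.execute(
        "SELECT order_id, customer_name, amount FROM dw_orders ORDER BY order_id"
    ).fetchall()

# Hypothetical rows arriving from the source in arbitrary order.
rows = load_warehouse(
    [(3, 10, 7.5), (1, 20, 2.0), (2, 10, 4.25)],
    [(10, "Acme"), (20, "Globex")],
)
```

In a real package, step 1 would be a data flow from the Oracle source into the staging tables, and step 2 an Execute SQL task or a second data flow reading from the stage.
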
Performance tips for dataflows that stage data from an Oracle source
  1. Use the Attunity source component and set its BatchSize property to match the data flow's DefaultBufferMaxRows property.
  2. Set the FastLoadMaxInsertCommitSize of the OLE DB destination to the same value as the data flow's DefaultBufferMaxRows.
  3. Consider using the Data Conversion component for changing the type and column name of the source columns. This keeps your Oracle queries neat and readable, whilst also allowing you to handle invalid values (we're looking at you, Oracle DateTime) within the data flow, instead of having a query that fails halfway.
  4. If you have the budget, grab a copy of Pragmatic Works' Task Factory. Among many other great components, it offers a Null handler and a Data Cleansing transform, both of which are huge time savers when dealing with Oracle source columns.
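
To illustrate the idea behind tip 3, here is a rough Python sketch of what a conversion step does conceptually: it renames and retypes each column, and turns an invalid date into a null instead of letting the whole load fail. The column names (ORDER_ID, CREATED_DT) and the string date format are assumptions made for the example. The 1753 lower bound is that of SQL Server's datetime type, which Oracle dates can easily fall below.

```python
from datetime import datetime

SQL_DATETIME_MIN = datetime(1753, 1, 1)  # lower bound of SQL Server's datetime type

def convert_row(row):
    """Rename and retype one source row; invalid dates become None instead of raising."""
    try:
        created = datetime.strptime(row["CREATED_DT"], "%Y-%m-%d %H:%M:%S")
    except (TypeError, ValueError):
        # Missing or unparseable value: keep the row, null the date.
        created = None
    if created is not None and created < SQL_DATETIME_MIN:
        # Parseable, but outside the range the target column can store.
        created = None
    return {"OrderId": int(row["ORDER_ID"]), "CreatedDate": created}

# Hypothetical source rows: one valid, one out of range, one missing.
good = convert_row({"ORDER_ID": "7", "CREATED_DT": "2012-03-04 05:06:07"})
too_old = convert_row({"ORDER_ID": "8", "CREATED_DT": "1500-06-01 00:00:00"})
missing = convert_row({"ORDER_ID": "9", "CREATED_DT": None})
```

In SSIS the equivalent would be configuring the error or truncation output of the conversion component rather than writing code, but the row-by-row behaviour is the same: bad values are handled inside the data flow and the source query stays simple.
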

BTW - values that tend to offer consistently good performance are:
  • DefaultBufferMaxRows = 1000  (on data flow task)
  • BatchSize = 1000  (on Attunity Oracle Source)
  • FastLoadMaxInsertCommitSize = 1000  (on OLE DB destination)
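
As a loose analogy for why matching these three values helps, the sketch below copies rows so that every fetched batch becomes exactly one insert and one commit. It uses Python's sqlite3 module with hypothetical tables (src_rows, dst_rows) and says nothing about how SSIS manages buffers internally. It only shows the shape of a pipeline where the fetch size, the in-memory batch and the commit size line up.

```python
import sqlite3

BATCH_ROWS = 1000  # one value shared by the source fetch, the buffer and the commit

def copy_in_batches(src, dst, batch_rows=BATCH_ROWS):
    """Copy rows so that each fetched batch is exactly one insert and one commit."""
    cursor = src.execute("SELECT id, val FROM src_rows")
    commits = 0
    while True:
        batch = cursor.fetchmany(batch_rows)  # analogous to the source BatchSize
        if not batch:
            break
        dst.executemany("INSERT INTO dst_rows VALUES (?, ?)", batch)
        dst.commit()  # analogous to FastLoadMaxInsertCommitSize
        commits += 1
    return commits

# Hypothetical source with 2500 rows.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE src_rows (id INTEGER, val TEXT)")
src.executemany("INSERT INTO src_rows VALUES (?, ?)", [(i, f"v{i}") for i in range(2500)])
src.commit()

dst = sqlite3.connect(":memory:")
dst.execute("CREATE TABLE dst_rows (id INTEGER, val TEXT)")

commits = copy_in_batches(src, dst)
copied = dst.execute("SELECT COUNT(*) FROM dst_rows").fetchone()[0]
```

With 2500 source rows and a batch of 1000, the copy runs as three fetch, insert and commit cycles, with no batch being split or held back waiting for another.
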
In Summary
  • Don't rely on indexes supporting any order by clause in Oracle.
  • Do use staging tables to support cleansing and transforming data.
  • Pay attention to matching the buffer & batch sizes for the source, data flow and destination.

