Saving Indiana Jones (and your SSIS packages) From “Bad Dates”

I have been working on a project modifying the ETL for a data warehouse. The source is a Pervasive database which, through SSIS 2005 packages, populates a SQL Server 2005 data warehouse and associated SSAS 2005 cubes. I encountered an issue which reminded me of a scene in the first Indiana Jones film, Raiders of the Lost Ark. If you had a nickel for every time you heard of SSIS reminding someone of Indiana Jones... you’d now have... a nickel. Yeah. I’m kind of different.

It was a scene in Cairo where Indy and Sallah had taken the headpiece to the Staff of Ra to an old man whom they asked to translate the markings on the object. While they are talking, Indy tosses a date into the air to catch it in his mouth. Sallah, noticing a victim of the dates (poisoned moments before by a nefarious character) lying dead, snatches the date from the air just above Indy’s mouth with the line, “Bad dates.” Sallah saves Indy’s life by preventing the bad date from being consumed. Sounds a lot like an SSIS data flow to me, and you may be starting to see my connection.

In my situation, the source system had allowed invalid dates to populate columns with the Date data type. These dates had a year of 0000. My client wanted me to route the records with invalid dates to a separate table for later analysis. Ok. Cool. I would just use the ISDATE() function in SSIS to separate out the records with “bad dates.” Easy.

<<insert cool John Williams music as I proceed to attempt to use ISDATE in SSIS only to find that it does not natively exist>>

Now what was I going to do? “Don’t know. I’ll think of something.”

I played around with using the Data Conversion transformation, planning to route the invalid dates out the Error output. That was not working for me. Instead, it was seeing the year 0000 as 2000 and just letting all the records out the Success output. Suboptimal.

So, I reached out to the SQL community on Twitter (using the #sqlhelp and #ssis tags) asking if there was an elegant way to perform ISDATE() type operations in SSIS. I was thinking I may end up doing some scripting (not my strong suit to be sure) but was trying to see if there was a cool way to do it without scripting. Looking back at that, I should have embraced the opportunity to learn more about scripting instead of trying to stick with the tools I know better. Josef Richberg (blog: http://josef-richberg.squarespace.com/, Twitter: @sqlrunner) responded by suggesting I look at VB functions. That was the final little push I needed to dig into the Script Component (transformation). And I am glad for having received it.

I started by changing my data source query to convert the date field involved to a string before bringing it into the data flow. Here is my example to illustrate.

We will create tables to play with in the Adventureworks database, in the markvsql schema.

USE Adventureworks
GO

--Create the markvsql schema if it does not already exist
IF NOT EXISTS
(
    SELECT * FROM sys.schemas WHERE [name] = 'markvsql'
)
EXEC sp_executesql N'CREATE SCHEMA markvsql'
GO

CREATE TABLE markvsql.isdatessis2005_source
(
      checkdate_string varchar(30) NULL
)

CREATE TABLE markvsql.isdatessis2005_destination
(
      checkdate_string varchar(30) NULL
    , checkdate datetime NULL
)

CREATE TABLE markvsql.isdatessis2005_errorlog
(
      checkdate_string varchar(30) NULL
    , failreason varchar(100) NULL
)

/*insert the good date*/
INSERT INTO markvsql.isdatessis2005_source (checkdate_string)
VALUES('2010-01-01 00:00:00.000')

/*insert the bad date*/
INSERT INTO markvsql.isdatessis2005_source (checkdate_string)
VALUES('0000-01-01 00:00:00.000')

You will note that the second record has a checkdate_string value with a 0000 in the year position. If we take a moment to try converting our checkdate_string field to a date, we will see the problem.

SELECT
      checkdate_string
    , CONVERT(datetime, checkdate_string) AS checkdate
FROM markvsql.isdatessis2005_source

This query leads to the following error message:

Msg 242, Level 16, State 3, Line 1
The conversion of a char data type to a datetime data type resulted in an out-of-range datetime value.

If we were keeping this all in T-SQL, we could just use the ISDATE() function and be on our way. However, SSIS lacks this function natively. Thus, I used the Script Component (transformation) to check for invalid dates, populating a new Boolean output column with the result of the VB IsDate() function. I then used a Conditional Split to route the valid dates to my destination table and the invalid ones to my errorlog table based on the value of the new Boolean column. This worked quite well for me and I was astounded at how easy it was to implement.
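For comparison, the all-T-SQL route mentioned above might look something like the following. This is only a sketch against the sample tables created earlier, not part of the SSIS package. The 'Invalid date' text written to failreason is an assumed placeholder, and ISDATE() results can vary with the session's language and date format settings.

```sql
-- Sketch only: a set-based T-SQL equivalent of the split, for comparison.

-- See what ISDATE() reports for each source row (1 = valid, 0 = invalid)
SELECT
      checkdate_string
    , ISDATE(checkdate_string) AS checkdate_string_isdate
FROM markvsql.isdatessis2005_source

-- Route the valid dates to the destination table.
-- The CASE guards the CONVERT so it is only attempted on strings ISDATE() accepts.
INSERT INTO markvsql.isdatessis2005_destination (checkdate_string, checkdate)
SELECT
      checkdate_string
    , CASE WHEN ISDATE(checkdate_string) = 1
           THEN CONVERT(datetime, checkdate_string)
      END
FROM markvsql.isdatessis2005_source
WHERE ISDATE(checkdate_string) = 1

-- Route the invalid dates to the errorlog table.
-- 'Invalid date' is an assumed placeholder for failreason.
INSERT INTO markvsql.isdatessis2005_errorlog (checkdate_string, failreason)
SELECT
      checkdate_string
    , 'Invalid date'
FROM markvsql.isdatessis2005_source
WHERE ISDATE(checkdate_string) = 0
```

Note that ISDATE() returns 0 for NULL, so a NULL checkdate_string would land in the errorlog table under this sketch.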

Here is a screenshot of my data flow, which is pretty much self-documenting.

[Screenshot: data flow]

For those unfamiliar with the Script Component (as I was before doing this), you must choose whether the component will be used as a Source, Destination, or Transformation when you add it to your data flow. For the purposes of this process, you must choose Transformation. Here are screenshots of the Script Component (transformation) properties.

[Screenshot: Script Component input columns]

The only input column in our example is the checkdate_string field. The Usage Type defaults to ReadOnly and I left it that way.

[Screenshot: Script Component output columns]

The checkdate_string_isdate field was added as a new column (via the Add Column button) and assigned the Boolean data type since the VB IsDate() function returns a Boolean result.

[Screenshot: Script Component code]

Notice the single line of code:

Row.checkdatestringisdate = IsDate(Row.checkdatestring)

That is seriously the only line of code I had to write. And, thanks to IntelliSense, I bagged it on the first try. All it does is set the value of our new Boolean output field to the result of the IsDate() function. As the code shows, the column names appear on Row without their underscores, so checkdate_string becomes checkdatestring.

When we run our package, we see that our Script Component (transformation) and Conditional Split work just fine indeed.

[Screenshot: package execution results]
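To confirm the result from the database side as well, a row count check along these lines should do. It is a sketch that assumes the package has already run against the two-row sample above and wrote to the destination and errorlog tables created earlier. The tablename and row_count aliases are just illustrative.

```sql
-- Sketch only: compare row counts across the three sample tables after the package runs.
SELECT 'source' AS tablename, COUNT(*) AS row_count
FROM markvsql.isdatessis2005_source
UNION ALL
SELECT 'destination', COUNT(*)
FROM markvsql.isdatessis2005_destination
UNION ALL
SELECT 'errorlog', COUNT(*)
FROM markvsql.isdatessis2005_errorlog
```

If the split worked, the destination and errorlog counts together should account for every row in the source table.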

That is really all there is to it. This same technique also works in SSIS 2008, although to use the VB scripting, you will need to change the scripting language in the Script transform from C# (the default) to VB. I poked around with doing this in C# for this post and soon decided that, for the moment, I will leave that to others.

When there are bad dates to be dealt with in a house in Cairo, then Sallah is your man. For SSIS, you can use the Script Component (transformation).

I have included a zip of my SSIS 2005 solution which also includes the .sql file featuring my code snippets shown above. Note that you will need to change the SQL instance in the Adventure Works data source to match your own.

ISDATE Functionality In SSIS Solution Download

Categories: SQL, SSIS

5 replies »

  1. Mark,

    Two code-less solutions, that should be faster than any vb script:

    1. Use a Derived Column Transform. The Expression would be (DT_DATE)(checkdate_string)
    or
    2. Use a Data Conversion Transform. Convert checkdate_string to DB_Date

    On either solution, the key is to configure the error output on that column. So on the Derived Column method, click the “Configure Error Output” button. Set the On Error action to “Ignore”. Same process if you go with Data Conversion.

    Either way, you will now have nicely formatted dates in a proper strongly typed date column. Anything that could not be converted to a proper date will result in NULL in the dataflow. So you can still use your conditional branch to do something with it if you choose.

  2. AHA! Those are the kind of solutions I was originally looking for. I never considered using “Ignore” like this; definitely worth digging into. Thanks much for the comment.