Category: SSIS

SQL Server Integration Services related musings

An SSIS Training Video by Webucator

Greetings, noble readers!

A while back, Webucator, an organization that produces training videos on a number of topics, contacted me about doing a video walkthrough of one of my blog posts. The original post is Making Your SSIS Configuration Files More Readable. Their video walkthrough is here.

Feel free to check out this video and their others as well.

In the interests of full disclosure, I was in no way compensated for this and have no relationship with Webucator other than this video.

TARDIS – Time And Relative Distribution In SSIS

Fans of Dr. Who will immediately recognize the title of this post. In that classic science fiction series, the Doctor has a very special vessel that lets him travel anywhere in Time and Space, even across dimensions. That vessel is called the T.A.R.D.I.S. (Time And Relative Dimension In Space). As I was crafting the idea for this post, paying homage to Dr. Who just seemed like the perfect hook.

At my current client, I have been working on an ETL solution. It involves automating the execution of about 70 unique stored procedures, written by another team, each of which populates its own output table, and then consolidating all of that data into a dimensional model in a data warehouse. Many of these stored procedures do quite a bit of work and take several minutes to execute; some run more quickly. Each procedure is independent of the others, so I am able to run any of them in parallel without affecting the results. Doing so will, however, affect server resources, so running all 70 at once would be a great way to tick off the DBA. Running them serially, on the other hand, pretty much guarantees that I will exceed the three-hour window in which my process must execute.

I decided to start by assigning each procedure to one of five processing silos, which I called Process Groups. I made these assignments in a table to allow for total control and tuning for best results. For example, if one of the procedures took far longer than the others, I could give that a Process Group all to itself and divide the others evenly across the remaining Process Groups. I actually created a procedure to automate that, but I will keep that for another post. For this post, I chose to just do three silos instead of five.

Note: I have included the SSIS solution as well as a script that creates all necessary database objects and rows right here. I will only show code that directly impacts what I am trying to show here. Also, I am using SSIS 2014 for this post. At my client, I used SSIS 2012. While I am using a task that was introduced in SSIS 2012, namely the Expression Task, you can accomplish the same thing using a Script Task in prior versions of SSIS. So, I see no reason this could not work on 2008 R2 (and possibly earlier).

The TARDIS Child Package

I created a table called StoredProcedure with the rows shown in Figure 1.

Figure 1


The purpose of this table is to control the Process Group to which each stored procedure belongs. I also created each of those stored procedures. They each insert the Process Group value passed into them into their respective tables.

CREATE TABLE dbo.Table1
(
      ProcessGroup tinyint
    , RecordCreateDate datetime2 DEFAULT CONVERT(datetime2, GetDate())
);

I created Table1 thru Table6 with this exact same pattern.
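For reference, the control table itself boils down to something like the following sketch. The column names come from the stored procedure shown a bit later, but the exact data types, and the Process Group assignments beyond group 1, are illustrative here; the download script is the source of truth.

CREATE TABLE dbo.StoredProcedure
(
      ProcedureName nvarchar(256) NOT NULL
    , ProcessGroup tinyint NOT NULL
);

-- Process Group 1 holds dbo.InsTable1 and dbo.InsTable4 (you will see this in Figure 6);
-- the group 2 and 3 assignments here are just for illustration
INSERT INTO dbo.StoredProcedure (ProcedureName, ProcessGroup)
VALUES (N'dbo.InsTable1', 1)
     , (N'dbo.InsTable4', 1)
     , (N'dbo.InsTable2', 2)
     , (N'dbo.InsTable5', 2)
     , (N'dbo.InsTable3', 3)
     , (N'dbo.InsTable6', 3);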

My child package is pretty simple, as shown in Figure 2.

Figure 2


Figure 3 shows the Variables in the package.

Figure 3


 

The first step, marked with the 1 in Figure 2, is an Execute SQL Task that calls the GetStoredProcedureProcessGroup stored procedure which is shown below.

CREATE PROCEDURE dbo.GetStoredProcedureProcessGroup @ProcessGroup tinyint
AS
SELECT ProcedureName
FROM dbo.StoredProcedure
WHERE ProcessGroup = @ProcessGroup;
GO

The task is configured to pass the value of the User::ProcessGroup variable as the @ProcessGroup parameter for the stored procedure, as shown in Figures 4 and 5.

Figure 4


Figure 5


Just to show how brilliantly this procedure works, let’s execute it.

EXEC dbo.GetStoredProcedureProcessGroup @ProcessGroup = 1

It produces the results shown in Figure 6.

Figure 6


Hold your applause, please. Feel free to check this against Figure 1 to make sure these are the procedures configured for Process Group 1. We’ll wait.

We good? Cool.

When these results get returned to the package, they are stored in the User::StoredProcedureList variable, which is of the System.Object data type.

The next step in the package, marked with the 2 in Figure 2, is a For Each Loop that iterates through each row residing in the User::StoredProcedureList variable. That configuration is shown in Figures 7 and 8.

Figure 7


Figure 8


The first time through the loop, the User::StoredProcedure package variable will get the value ‘dbo.InsTable1’, which is the first row shown in Figure 6.

Package execution then moves on to the “SQL Execute Stored Procedure For Process Group” Execute SQL Task, shown by the 3 in Figure 2. This task calls the ExecuteSingleStoredProcedure stored procedure, which is shown below.

CREATE PROCEDURE dbo.ExecuteSingleStoredProcedure
      @StoredProcedure nvarchar(256)
    , @ProcessGroup tinyint
AS
DECLARE @SQLString nvarchar(1000)

SET @SQLString = N'EXEC ' + @StoredProcedure + ' @ProcessGroup = ' + CONVERT(nchar(1), @ProcessGroup) + ';'

EXEC sp_executesql @SQLString;
GO
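As an aside, since this procedure builds its EXEC statement through string concatenation, you could harden it a bit if you wanted. Here is a sketch (not the version I actually used) that quotes the two-part procedure name and passes the parameter through sp_executesql instead of converting it to a string:

-- Sketch only: assumes @StoredProcedure always arrives as a two-part name like 'dbo.InsTable1'
CREATE PROCEDURE dbo.ExecuteSingleStoredProcedureSafer
      @StoredProcedure nvarchar(256)
    , @ProcessGroup tinyint
AS
DECLARE @SQLString nvarchar(1000);

SET @SQLString = N'EXEC '
    + QUOTENAME(PARSENAME(@StoredProcedure, 2)) + N'.'
    + QUOTENAME(PARSENAME(@StoredProcedure, 1))
    + N' @ProcessGroup = @ProcessGroup;';

-- Pass @ProcessGroup as a typed parameter rather than concatenating it into the string
EXEC sp_executesql @SQLString, N'@ProcessGroup tinyint', @ProcessGroup = @ProcessGroup;
GO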

The configuration of the Execute SQL Task is shown in Figures 9 and 10.

Figure 9


Figure 10


The User::StoredProcedure variable was populated by the For Each Loop. The User::ProcessGroup variable will be passed from the Parent Package. For now, if we assume that value is 1, we end up with the following stored procedure call being passed to the database:

EXECUTE dbo.ExecuteSingleStoredProcedure @StoredProcedure = N'dbo.InsTable1' , @ProcessGroup = 1;
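Given the definition of dbo.ExecuteSingleStoredProcedure above, the @SQLString that actually gets executed resolves to:

EXEC dbo.InsTable1 @ProcessGroup = 1;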

This will insert a single row into dbo.Table1.

Once that execution is complete, the For Each Loop starts over again for the next row in the User::StoredProcedureList variable, the record for dbo.InsTable4, and the following stored procedure call passes to the database:

EXECUTE dbo.ExecuteSingleStoredProcedure @StoredProcedure = N'dbo.InsTable4' , @ProcessGroup = 1;

Once that execution is complete, the For Each Loop detects there are no more records in the User::StoredProcedureList variable, and it finishes its work, reporting Success.

To see the resulting effect on the tables, we can run the following query:

SELECT 'Table1' AS TableName, ProcessGroup, RecordCreateDate FROM dbo.Table1
UNION ALL
SELECT 'Table2' AS TableName, ProcessGroup, RecordCreateDate FROM dbo.Table2
UNION ALL
SELECT 'Table3' AS TableName, ProcessGroup, RecordCreateDate FROM dbo.Table3
UNION ALL
SELECT 'Table4' AS TableName, ProcessGroup, RecordCreateDate FROM dbo.Table4
UNION ALL
SELECT 'Table5' AS TableName, ProcessGroup, RecordCreateDate FROM dbo.Table5
UNION ALL
SELECT 'Table6' AS TableName, ProcessGroup, RecordCreateDate FROM dbo.Table6
ORDER BY RecordCreateDate

This produces the results shown in Figure 11.

Figure 11


Since the stored procedures dbo.InsTable1 and dbo.InsTable4 were run serially, with dbo.InsTable1 going first, the RecordCreateDate for Table1 is earlier than that of Table4. Later, when we run all of the process groups, the RecordCreateDate values will be the proof that the TARDIS solution works. That is because four of the six stored procedures, InsTable1 thru InsTable6, have some manner of delay built into their processing. For example, InsTable1 has the following definition:

CREATE PROCEDURE dbo.InsTable1 @ProcessGroup tinyint
AS
WAITFOR DELAY '00:00:20';

INSERT INTO dbo.Table1 (ProcessGroup) VALUES(@ProcessGroup);
GO

Here I use WAITFOR to simply pause execution. In the case of InsTable1, I am pausing for 20 seconds. I took this approach just for this post in order to prove the solution works. InsTable1 will actually be the first procedure to get called; I put in the delay to make sure it finishes AFTER some procedures that are called later. With serial processing, this would be impossible.
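The procedures without a delay follow the same pattern minus the WAITFOR, roughly like the sketch below (which procedures skip the delay, and the exact delay lengths, are in the download script; InsTable2 here is just for illustration):

CREATE PROCEDURE dbo.InsTable2 @ProcessGroup tinyint
AS
-- No WAITFOR here; this one inserts immediately
INSERT INTO dbo.Table2 (ProcessGroup) VALUES(@ProcessGroup);
GO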

I mentioned that the TARDIS Child package would get the value of the User::ProcessGroup variable from the TARDIS Parent package. This is accomplished via a Parent Package Variable configuration, as shown in Figure 12.

Figure 12


When the TARDIS Parent package tells the TARDIS Child package to execute, the TARDIS Child will look at the User::ProcessGroup variable on TARDIS Parent and take that value for its own User::ProcessGroup variable. The fact that the User::ProcessGroup variable of TARDIS Parent can only have a single value (at a time) meant I needed a way for each execution of TARDIS Child to get a different value. My solution ended up being pretty timey wimey.

The TARDIS Parent Package

This is where it gets really fun. First, I will just show you the TARDIS Parent package in Figure 13. Note that the numbers in Figure 13 denote the relative order in which I will discuss the tasks, not necessarily the order of execution.

Figure 13


Step 1 merely involves truncating all of the output tables, Table1 thru Table6. Nothing to see here.
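In other words, it is just a batch along these lines (whether the package issues it as one Execute SQL Task or several is not important here):

TRUNCATE TABLE dbo.Table1;
TRUNCATE TABLE dbo.Table2;
TRUNCATE TABLE dbo.Table3;
TRUNCATE TABLE dbo.Table4;
TRUNCATE TABLE dbo.Table5;
TRUNCATE TABLE dbo.Table6;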

In Step 2, I use the Expression Task to set the value of the User::ProcessGroup variable to 1. The configuration of that task is shown in Figure 14.

Figure 14


There isn’t really anything fancy about this. That said, I like the Expression Task. 🙂

Step 3 is the first Execute Package task calling TARDIS Child. Thanks to the Parent Package Variable configuration, TARDIS Child will get the value 1 for its User::ProcessGroup variable. The only configuration I did here was pointing the connection manager at the TARDIS Child.dtsx file. That connection manager is shown in Figure 15.

Figure 15


The Execute Package Tasks in Steps 3, 6, and 9 all use the same connection manager. Therefore, they all execute the same child package.

Step 4 is where the coolness comes in. When I originally started trying to solve this problem, I was attempting to come up with a way to have a single variable pass three different values to the child package. Then it hit me. There is NO reason I have to call the child package multiple times at EXACTLY the same moment. Like the TARDIS, it can exist in multiple dimensions, or time signatures, simultaneously. Therefore, all I really had to do was wait a bit, then change the value of the User::ProcessGroup variable before calling the next incarnation of the child package. Boom. This is accomplished using a Script Task. I got the code I needed for it from Mike Davis (Blog | Twitter) in this post on BIDN. The code for it is very simple, as shown in Figure 16.

Figure 16


Step A in the script adds System.Threading to the references. This enables the use of the System.Threading.Thread.Sleep() method, which will essentially pause execution for the number of milliseconds provided. In this case, I am pausing for 5000 milliseconds (5 seconds).
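Incidentally, if you would rather avoid scripting altogether, an Execute SQL Task running a plain WAITFOR would pause the control flow in much the same way. This is just an alternative sketch; the Script Task is what I used here.

-- Pause for 5 seconds without a Script Task
WAITFOR DELAY '00:00:05';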

Once the wait is over, Step 5 of the TARDIS Parent package can change the value of the User::ProcessGroup variable to 2.

Step 6 then calls the TARDIS Child package a second time. But this time, TARDIS Child gets a value of 2 for its User::ProcessGroup variable. As of that moment, TARDIS Child now essentially exists twice in Time and Space, just slightly offset.

Steps 7 thru 9 perform this same process, waiting 10 seconds instead of 5.

The wait is the key. What is also vital is that I do NOT have any precedence constraints between the separate calls of TARDIS Child. Thus, the second execution of TARDIS Child does not wait for the first. Nor does the third execution wait for the first or second. When the Sequence Container starts executing, the steps labeled 2, 4, and 7 all start at exactly the same time. This is shown in Figure 17, which took me several tries to capture properly.

Figure 17


The Tasks shown by the arrows all start at exactly the same time. Those waits in SCR Wait 5 Seconds and SCR Wait 10 Seconds cause my three executions of TARDIS Child to happen 5 seconds apart. Figure 18 was taken later in the execution.

Figure 18


Here you can see that, although the executions for Process Group 1 and Process Group 2 started before that of Process Group 3, the parallel nature of this solution meant that Process Group 3 could still finish first as it is independent of the other Process Groups. We can confirm this by querying the tables with the query we used earlier, ordering the results by RecordCreateDate.

SELECT 'Table1' AS TableName, ProcessGroup, RecordCreateDate FROM dbo.Table1
UNION ALL
SELECT 'Table2' AS TableName, ProcessGroup, RecordCreateDate FROM dbo.Table2
UNION ALL
SELECT 'Table3' AS TableName, ProcessGroup, RecordCreateDate FROM dbo.Table3
UNION ALL
SELECT 'Table4' AS TableName, ProcessGroup, RecordCreateDate FROM dbo.Table4
UNION ALL
SELECT 'Table5' AS TableName, ProcessGroup, RecordCreateDate FROM dbo.Table5
UNION ALL
SELECT 'Table6' AS TableName, ProcessGroup, RecordCreateDate FROM dbo.Table6
ORDER BY RecordCreateDate

This returns the results shown in Figure 19.

Figure 19


Note the ProcessGroup column. This proves that, although the TARDIS Child package for Process Group 3 was executed last, it was able to finish first.

There you have it. I hope this can give you some ideas if you are ever in a similar situation. Bow ties are cool.

PASS Summit 2013 Interview with Matt Masson

At PASS Summit 2013 in Charlotte, I had the opportunity to sit down with Matt Masson (Blog|Twitter), Senior Program Manager on the Integration Services Team at Microsoft. I was really honored when Matt explained how busy his week was and then offered me a half hour anyway. I want to give a tremendous THANK YOU to Matt for being so generous with his time.

I had no grand plan/agenda for my series of interviews of Microsoft folk at PASS Summit 2013. As such, I plan to just display the transcript of my conversation with Matt as it occurred. NOTE: With Matt’s permission, I have edited out “Um” and “Ah” and other byproducts of casual conversation so that it flows better in writing.

The Transcript

Mark V

With the way things are going, with Cloud, and everything else going on, what does the future of SSIS development look like 5 years down the road?

Matt

I think five years out is a bit too far. We’re seeing a lot of big changes, especially around Hadoop. I think Hadoop and Big Data processing have been a big disruptor to the ETL space. I think there’s still a lot of what we call “traditional” ETL work, what people do today with SSIS. That’s where SSIS’ strength is. But we’re getting more and more requests about Cloud processing. That’s actually one of the things I’m going to talk about at PASS today, at the SSIS Roadmap session. One of the interesting things is, say, go back two or three years ago, we had people asking, “Can I have SSIS running in the cloud? Can you make SSIS run in the cloud?” And we’re like, “Yeah, that’s a great idea. Let’s go build it.” And then we started asking, “What scenarios?” and “Why do you want to run SSIS in the cloud?” Customers didn’t know. OK. Where’s your data? Data is all on prem. If your data’s all on prem, running in the cloud doesn’t necessarily make sense, right? I think, as we’re seeing a shift of more and more data to cloud sources, so they’re landing in places like Azure, or even pulling in from remote sites or pulling in from different cloud providers like Salesforce.com or something like that. If your data’s already IN the cloud, then doing your ETL processing closer to that data makes a lot of sense. So, today, you can run SSIS in an Azure VM and we’re having a lot of customers do that. So, you’re using your traditional On-Prem tools. It’s just running in the Cloud.

Other things we’re considering and looking at is, basically, what if SSIS could run as a service? What if you didn’t need your VMs? You could just deploy your packages and run things like that?

In addition to traditional ETL, we’re also looking at other technologies. There’s other data movement technologies out there like Azure Data Sync, which is very simple: I want to keep my On-prem databases and my Azure databases in sync. So, you don’t need a full ETL framework. You don’t need an ETL developer. Sync just takes care of it for you automatically.

So that leads us to a couple of different angles. We’re trying to make ETL easier, more automatic. Just keep schemas in sync. While for the more advanced scenarios, your traditional ETL scenarios, SSIS still makes a lot of sense. We need to evolve SSIS to better fit in the “Cloud” world.

Then there’s Big Data and Big Data processing. You’re seeing an evolution of technologies on Hadoop, right? There’s a lot of different technologies, lots of things going on. You’re seeing lots of tools at different stages of maturity. It’s a really interesting space to see how it’s evolving. One of the things I’m going to talk about today is to show SSIS integration with HDInsight, for example. So, from SSIS, you can provision HDInsight clusters, you can run Hive jobs, Pig jobs. You basically orchestrate everything you want to do on Hadoop from SSIS. You get the nice visual experience which is lacking from Hadoop and the Big Data system today.

Mark V

So, when you think about Hadoop, and the Cloud, and the Democratization of data; bringing BI to the Masses; the revolution of Self-Serve, one of the things you have is Users looking at data that they may not know how to vet properly. So, when I think of tools like DQS (Data Quality Services) that are often integrated into ETL, what are some of the things that we could look for in the future? Not necessarily products, but just concepts for how Microsoft is going to help handle that with moving data around to enable that Self-Service, while still keeping it easy to get to.

Matt

So, Self-Service is an interesting space. We have Power Query coming out, which gives you self-service, light-weight ETL. I think our self-service vision has been resonating really well. We’re seeing more and more customers picking up on that. But, just like there’s a space for Self-Service BI, but also a need for traditional BI modelers to take that raw data into a model concept so that the “self-service” people can actually build their reports from there, I think the same thing applies in the ETL space as well. There’s Power Query for that light-weight, self-serve ETL, but there’s still the need for traditional ETL development as well for IT to automate these processes, make them reliable, do the complex transformations, apply business logic, apply filtering, etc. I think there’s going to be that “professional” or “corporate” ETL as well as self-serve ETL. That challenge for us is figuring out whether that is a single tool that does both; perhaps a single tool with different faces or personas, for different roles. I think we’re going to see a lot of convergence in our tools going forward. I think one of Microsoft’s strengths is the rapid time to results, making it as easy as possible to get it, and also have that functionality there that you can extend to do the more complex ETL scenarios as well.

Mark V

One of the other things you’re really known for is the BI Power Hour. Can you talk a little bit about how that was born and how it’s evolved and what it’s like to be a part of something like that?

Matt

Sure. The BI Power Hour is really interesting and I was nowhere near the beginning of it. I think it was Bob Baker who started the original Power Hour and it was focused around Office BI. And then the SQL folks eventually took over. But the idea was to let the Product Team have fun and show off the power of the products in your non-typical scenarios, with no business value whatsoever. And we’ve sort of made it more and more ridiculous as time goes on. There are certain teams, like Reporting Services, that have always been there since the beginning, and they always did a game. Every year they did a game. I think they did Tic Tac Toe, and then Hangman; the game got more and more complex as they went through the years. I think I saw my first Power Hour in 2009 and I immediately wanted to be a part of it. I had never seen one before and I just thought it was really exciting. And the next year, I asked the organizer, Pej Javaheri, if I could participate. He wasn’t sure; “SSIS doesn’t usually do a Power Hour” and “it’s not very interesting.” So, I decided to prove him wrong. Since Pej left Microsoft, I’ve taken over the Power Hour. I do most of the coordinating and stuff. It’s always really interesting to make sure there is a business message there. We’re not as explicit about it anymore. But, afterwards, we always have people coming up to us and saying, “I didn’t know the tools could do that” and “I want to know more.” That’s really the whole point, essentially. And if we can get laughs doing it, then that’s even better. We usually try to balance out presenters showing new technology, show off some valuable things. I typically just do ridiculous demos. I have a whole story that goes along with it. It’s a lot of fun. The hardest part is justifying the days of work that goes into a ten minute demo.

Mark V

It’s really exciting to see people who were involved in building the tools and are just so excited about features getting to go play with them.

Matt

With my demos, which usually revolve around cats, I had spent some time in SSIS and built some custom transformations. I’ve had someone ask me afterwards, “Why do you spend so much time on this? Why aren’t you doing work for the real product?” Yeah… it is a good point, but usually I limit Power Hour stuff to my “free time.” So flights, at home, things like that is usually when I work on those things. I try to really time box it, to justify to myself, devoting time to this really fun thing.

Mark V

When I saw you at TechEd and you were talking about the SSIS Catalog, one of the things you said was that there was some debate within Microsoft regarding the Package Deployment Model and the new Project Deployment Model. Even within the team, people were arguing about which way to go, and you were finally brought around to the Project Deployment Model. Is that something that is common when you are getting features ready for a product that you have that kind of debate? Is there a lot of that?

Matt

Yes, there’s a LOT of debate. The bigger the team, the more debate there is. 2012 was really interesting because that was as big as the SSIS team has really been. We actually had half our team located in Shanghai and they were really driving the Server components. And half our team located in Redmond. So, doing the coordination and making sure both teams agreed on the scenarios of what we were trying to go toward was really important. Doing development is all about resource constraints, right? You have a ton of stuff you want to do and you have to figure out, “Where is my time best spent?” Sometimes you’re making guesses. If you only do exactly what the customers want, you’re not necessarily moving your platform forward far enough. If we only focused on bug fixing, we probably wouldn’t have gotten a lot of the great functionality that we did out of 2012.

Mark V

…And the rounded corners…

Matt

Well, the rounded corners, yeah. Actually the rounded corners joke was just a random Power Hour joke that I just came up with on the fly. I’ve been using it since. Although I was in somebody’s session and they spent ten minutes building up that joke and it was really painful to watch. But the rounded corners was just WPF, that’s just the way it looked. But I made the joke about Interns coming in and sanding down the corners for three months. And I actually had an angry customer come up to me afterwards and say, “You guys spent three months working on rounded corners and yet you didn’t fix the Web Services Task” and storm off. “It was a JOKE!” At PASS, people usually get that something’s a joke. At Tech Ed, people expect Microsoft presenters to be more serious and jokes don’t always go over well.

Mark V

Even at a BI Power Hour?

Matt

When I did my first BI Power Hour at Tech Ed, I got a standing ovation when I did some of my lines, not because it was a great presentation, but I think the line was “I’m a programmer. What do I need real friends for when I can create them programmatically?” Standing ovation. And it wasn’t because it was funny. It was because the audience felt the same way. And I just felt really sad at that point. And the next day, I had people coming up to me offering to be my friend and saying, “I don’t have any friends on Facebook either. I had to stop using it.” And they just didn’t get that it was a joke. I did my Power Hour at the Boston user group and nobody laughed. There were some chuckles, but that was it. But then I realized afterwards, when I was talking with somebody else, that the audience actually thought it was real and that they felt sorry for me. So, they didn’t know they were supposed to laugh.

Back to planning. There are definitely different viewpoints on the team. One thing was related to Package Deployment versus Project Deployment. Every time you change functionality, but keep supporting a feature, your Test Matrix increases. So, the number of scenarios you have to test goes up. And we were really short on Test resources. And you can’t release something unless it’s properly tested. So, at one point, they wanted to say “No more Package Deployment Model; we’re just going to do Project because it means we can add more functionality because we’re not supporting these other things anymore.” It just did not make sense to take that approach. I think the thing I had mentioned at Tech Ed was Single Package Deployment versus Full Package deployment. Long debates. But it came down to the architectural difference. We showed how much it would cost to implement Single Package Deployment and how much it would cost without. If it’s an extra month in development time, how many bugs can we fix in a month? How many other improvements can we make in a month? So, it’s a balancing act. I still think it’s the right decision. At the same time that we’re making those decisions internally, we’re talking to our MVPs, getting their feedback. I know the MVPs felt really strongly about Project Deployment, keeping it all together. And we were trusting in that. They’re basically the voice of our customers.

Wrapping Up

With Matt being so busy, and prepping for a session, I left the interview off there.

I have only had the chance to use SSIS 2012 on one project. And even with that small taste of this fabulous tool, I was tempted to just give Matt some applause and call it a day. I really appreciate the work and time that went into making SSIS 2012 such a tremendous improvement over previous versions of Integration Services.

I think Matt made some really great points here. The Big Data revolution was certainly a “disruptor” to common ETL. When dealing with data that is aging too quickly or in quantities that make taking the time to bring it into a data warehouse impractical, that certainly would disrupt common thinking around traditional ETL. While, as Matt points out, the need for traditional ETL will remain, there is some need on the part of those of us in the industry to re-assess what ETL looks like in some cases. It’s not always going to be a series of SSIS packages running on a server and populating a data warehouse. Sometimes, it will be information workers using Power Query to bring data from many sources into Excel.

As far as the Power Hour goes, it holds so many features that I strive to put into my own presentations. Humor is a huge one. There is a lot of research that shows that people learn better when they are having fun. Not to mention that an audience that is having a good time is less likely to throw rotten tomatoes; they stain, you know. Combine that with using features of the tools in creative ways, and you’ve really got something. I love finding new and exciting uses for technology. I often think of Ed Harris’ great line as NASA’s Gene Krantz in Apollo 13, “I don’t care what anything was DESIGNED to do; I care what it CAN do.”

I liked hearing from Matt that there is often a lot of debate within the SSIS team when it comes to features. It should remind all of us of the time we have spent on project teams in our own work. The point this raises is that we need to remember that Microsoft, like any other organization, has finite resources that need to be spent in the best way they can be. I hope we can all keep that in mind when we wonder why certain features haven’t gotten much love or don’t work the way we would want them to.

Matt’s point about MVPs is an important one. Along with what prestige may come from receiving the MVP award, there is also responsibility to serve as a voice for the Community as a whole. Being an MVP is not about getting to wear that MVP ribbon at Summit or a pretty trophy; it’s about leadership, with benefits and obligations along with it.

That brings us to the end. Even though my second interview was with Kamal Hathi, that one also happens to be the longest. Since I have the typing skills of a rainbow trout, transcribing the audio for these interviews is a long process. As such, I will aim to have the post on my interview with Kasper de Jonge (Blog|Twitter) next week and the one with Kamal the week after. Thanks for your patience.

Making Your SSIS Configuration Files More Readable

When you first create a configuration file in your Integration Services project, the XML contents of the file are just laid out in one long line. This makes reading and/or modifying this file more challenging than it needs to be.

There is an easy way to format your file to alleviate this issue. In BIDS, go to File/Open/File, as in the image below:


Browse to your configuration file and select it to open within BIDS.

My brand new configuration file looks like this image below:


You see that the contents are all on one line. Bleh.

To fix that, you can go to Edit/Advanced/Format Document, as shown below:


I have done that to my config file and now it looks like this:


As you can see, this is much easier to deal with. Just save this document and close it. Now, even when you open it in Notepad, that formatting persists.


This little trick can make dealing with your configuration files a bit easier.

Saving Indiana Jones (and your SSIS packages) From “Bad Dates”

I have been working on a project modifying the ETL for a data warehouse. The source is a Pervasive database which, through SSIS 2005 packages, populates a SQL Server 2005 data warehouse and associated SSAS 2005 cubes. I encountered an issue which reminded me of a scene in the first Indiana Jones film, Raiders of the Lost Ark. If you had a nickel for every time you heard of SSIS reminding someone of Indiana Jones… you’d now have… a nickel. Yeah. I’m kind of different.

It was a scene in Cairo where Indy and Sallah had taken the headpiece to the Staff of Ra to an old man whom they asked to translate the markings on the object. While they are talking, Indy tosses a date into the air to catch it in his mouth. Sallah, noticing a victim of the dates (poisoned moments before by a nefarious character) lying dead, snatches the date from the air just above Indy’s mouth with the line, “Bad dates.” Sallah saves Indy’s life by preventing the bad date from being consumed. Sounds a lot like an SSIS data flow to me, and you may be starting to see my connection.

In my situation, the source system had allowed invalid dates to populate columns with Date data type. These dates had a year of 0000. My client wanted me to route the records with invalid dates to a separate table for later analysis. Ok. Cool. I would just use the ISDATE() function in SSIS to separate out the records with “bad dates.” Easy.

<<insert cool John Williams music as I proceed to attempt to use ISDATE in SSIS only to find that it does not natively exist>>

Now what was I going to do? “Don’t know. I’ll think of something.”

I played around with using the Data Conversion transformation, planning to route the invalid dates out the Error output. That was not working for me. Instead, it was seeing the year 0000 as 2000 and just letting all the records out the Success output. Suboptimal.

So, I reached out to the SQL community on Twitter (using the #sqlhelp and #ssis tags) asking if there was an elegant way to perform ISDATE() type operations in SSIS. I was thinking I may end up doing some scripting (not my strong suit to be sure) but was trying to see if there was a cool way to do it without scripting. Looking back at that, I should have embraced the opportunity to learn more about scripting instead of trying to stick with the tools I know better. Josef Richberg (blog: http://josef-richberg.squarespace.com/, Twitter: @sqlrunner) responded by suggesting I look at VB functions. That was the final little push I needed to dig into the Script Component (transformation). And I am glad for having received it.

I started by changing my data source query to convert the date field involved to a string before bringing it into the data flow. Here is my example to illustrate.

We will create tables to play with in the Adventureworks database, in the markvsql schema.

USE Adventureworks
GO

--Create the markvsql schema if it does not already exist
IF NOT EXISTS
(
    SELECT * FROM sys.schemas WHERE [name] = 'markvsql'
)
EXEC sp_executesql N'CREATE SCHEMA markvsql'
GO

CREATE TABLE markvsql.isdatessis2005_source
(
      checkdate_string varchar(30) NULL
)

CREATE TABLE markvsql.isdatessis2005_destination
(
      checkdate_string varchar(30) NULL
    , checkdate datetime NULL
)

CREATE TABLE markvsql.isdatessis2005_errorlog
(
      checkdate_string varchar(30) NULL
    , failreason varchar(100) NULL
)

/*insert the good date*/
INSERT INTO markvsql.isdatessis2005_source (checkdate_string)
VALUES('2010-01-01 00:00:00.000')

/*insert the bad date*/
INSERT INTO markvsql.isdatessis2005_source (checkdate_string)
VALUES('0000-01-01 00:00:00.000')

You will note that the second record has a checkdate_string value with a 0000 in the year position. If we take a moment to try converting our checkdate_string field to a date, we will see the problem.

SELECT
      checkdate_string
    , CONVERT(datetime, checkdate_string) AS checkdate
FROM markvsql.isdatessis2005_source

This query leads to the following error message:

Msg 242, Level 16, State 3, Line 1
The conversion of a char data type to a datetime data type resulted in an out-of-range datetime value.

If we were keeping this all in T-SQL, we could just use the ISDATE() function and be on our way. However, SSIS lacks this function natively. Thus, I used the Script Component (transformation) to check for invalid dates, populating a new boolean Output column with the result of the VB ISDATE() function. I then used the Conditional Split to route the valid dates to my destination table and the invalid ones to my errorlog table based on the new boolean field’s value. This worked quite well for me and I was astounded at how easy it was to implement.
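For comparison, this is what the check would look like in straight T-SQL against the sample source table:

SELECT
      checkdate_string
    , ISDATE(checkdate_string) AS checkdate_string_isdate -- returns 1 for the good date, 0 for the 0000 year
FROM markvsql.isdatessis2005_source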

Here is a screenshot of my data flow, which is pretty much self documenting.


For those unfamiliar with the Script Component (transformation), as I was before doing this, you must choose whether the component will be used as a Source, Destination, or Transformation when you add it to your data flow. For the purposes of this process, you must choose Transformation. Here are screenshots of the Script Component (transformation) properties.


The only input column in our example is the checkdate_string field. The Usage Type defaults to read only and I left it that way.


The checkdate_string_isdate field was added as a new column (via Add Column button) and assigned the Boolean Data Type since the VB ISDATE() function returns a boolean result.


Notice the single line of code:

Row.checkdatestringisdate = IsDate(Row.checkdatestring)

That is seriously the only line of code I had to write. And, thanks to Intellisense, I bagged it on the first try. All it does is set the value of our new boolean output field to the result of the IsDate() function.

When we run our package, we see that our Script Component (transformation) and Conditional Split work just fine indeed.


That is really all there is to it. This same technique also works in SSIS 2008, although to use the VB scripting, you will need to change the scripting language in the Script transform from C# (the default) to VB. I poked around with doing this in C# for this post and soon decided that, for the moment, I will leave that to others.

When there are bad dates to be dealt with in a house in Cairo, then Sallah is your man. For SSIS, you can use the Script Component (transformation).

I have included a zip of my SSIS 2005 solution which also includes the .sql file featuring my code snippets shown above. Note that you will need to change the SQL instance in the Adventure Works data source to match your own.

ISDATE Functionality In SSIS Solution Download

Incorrect Information in SSIS 2005 Send Mail Task Tooltip For Attachments Property

I was trying to dynamically send multiple attachments via the Send Mail Task in a package.

In the Send Mail Task Editor, when clicking on the Attachments property field, the tool tip tells you that the file names must be separated by semicolons:

“Specifies the files to include as attachments to the message. Separate the file names with semicolons.”

Alas, a semicolon did not work for me. It was not until I separated my file names with a | (pipe) that my package ran successfully and emailed the files.
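For example, a value along these lines (the paths are purely hypothetical) is the kind of thing that finally worked:

C:\Output\Report1.csv|C:\Output\Report2.csv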

So, use a | to separate files in the Attachments property of the Send Mail Task in SSIS 2005, not a semicolon.