At PASS Summit 2013 in Charlotte, I had the opportunity to sit down with Matt Masson (Blog|Twitter), Senior Program Manager on the Integration Services Team at Microsoft. I was really honored when Matt explained how busy his week was and then offered me a half hour anyway. I want to give a tremendous THANK YOU to Matt for being so generous with his time.
I had no grand plan/agenda for my series of interviews with Microsoft folk at PASS Summit 2013. As such, I plan to just display the transcript of my conversation with Matt as it occurred. NOTE: With Matt’s permission, I have edited out “Um” and “Ah” and other byproducts of casual conversation so that it flows better in writing.
With the way things are going, with Cloud, and everything else going on, what does the future of SSIS development look like 5 years down the road?
I think five years out is a bit too far. We’re seeing a lot of big changes, especially around Hadoop. I think Hadoop and Big Data processing have been a big disruptor to the ETL space. I think there’s still a lot of what we call “traditional” ETL work, what people do today with SSIS. That’s where SSIS’ strength is. But we’re getting more and more requests about Cloud processing. That’s actually one of the things I’m going to talk about at PASS today, at the SSIS Roadmap session. One of the interesting things is, say, two or three years ago, we had people asking, “Can I have SSIS running in the cloud? Can you make SSIS run in the cloud?” And we’re like, “Yeah, that’s a great idea. Let’s go build it.” And then we started asking, “What scenarios?” and “Why do you want to run SSIS in the cloud?” Customers didn’t know. OK. Where’s your data? Data is all on prem. If your data’s all on prem, running in the cloud doesn’t necessarily make sense, right? But now we’re seeing a shift of more and more data to cloud sources, so it’s landing in places like Azure, or being pulled in from remote sites or from different cloud providers like Salesforce.com or something like that. If your data’s already IN the cloud, then doing your ETL processing closer to that data makes a lot of sense. So, today, you can run SSIS in an Azure VM and we’re having a lot of customers do that. So, you’re using your traditional On-Prem tools. It’s just running in the Cloud.
Another thing we’re considering and looking at is, basically, what if SSIS could run as a service? What if you didn’t need your VMs? You could just deploy your packages and run them.
In addition to traditional ETL, we’re also looking at other technologies. There are other data movement technologies out there like Azure Data Sync, which is very simple: I want to keep my On-prem databases and my Azure databases in sync. So, you don’t need a full ETL framework. You don’t need an ETL developer. Sync just takes care of it for you automatically.
So that leads us to a couple of different angles. We’re trying to make ETL easier, more automatic. Just keep schemas in sync. Meanwhile, for the more advanced scenarios, your traditional ETL scenarios, SSIS still makes a lot of sense. We need to evolve SSIS to better fit in the “Cloud” world.
Then there’s Big Data and Big Data processing. You’re seeing an evolution of technologies on Hadoop, right? There’s a lot of different technologies, lots of things going on. You’re seeing lots of tools at different stages of maturity. It’s a really interesting space to see how it’s evolving. One of the things I’m going to talk about today is to show SSIS integration with HDInsight, for example. So, from SSIS, you can provision HDInsight clusters, you can run Hive jobs, Pig jobs. You basically orchestrate everything you want to do on Hadoop from SSIS. You get the nice visual experience which is lacking from Hadoop and the Big Data systems today.
So, when you think about Hadoop, and the Cloud, and the Democratization of data; bringing BI to the Masses; the revolution of Self-Serve, one of the things you have is Users looking at data that they may not know how to vet properly. So, when I think of tools like DQS (Data Quality Services) that are often integrated into ETL, what are some of the things that we could look for in the future? Not necessarily products, but just concepts for how Microsoft is going to help handle that with moving data around to enable that Self-Service, while still keeping it easy to get to.
So, Self-Service is an interesting space. We have Power Query coming out, which gives you self-service, light-weight ETL. I think our self-service vision has been resonating really well. We’re seeing more and more customers picking up on that. But, just like there’s a space for Self-Service BI, but also a need for traditional BI modelers to take that raw data into a model concept so that the “self-service” people can actually build their reports from there, I think the same thing applies in the ETL space as well. There’s Power Query for that light-weight, self-serve ETL, but there’s still the need for traditional ETL development as well for IT to automate these processes, make them reliable, do the complex transformations, apply business logic, apply filtering, etc. I think there’s going to be that “professional” or “corporate” ETL as well as self-serve ETL. The challenge for us is figuring out whether that is a single tool that does both; perhaps a single tool with different faces or personas, for different roles. I think we’re going to see a lot of convergence in our tools going forward. I think one of Microsoft’s strengths is the rapid time to results, making it as easy as possible to get it, and also have that functionality there that you can extend to do the more complex ETL scenarios as well.
One of the other things you’re really known for is the BI Power Hour. Can you talk a little bit about how that was born and how it’s evolved and what it’s like to be a part of something like that?
Sure. The BI Power Hour is really interesting and I was nowhere near the beginning of it. I think it was Bob Baker who started the original Power Hour and it was focused around Office BI. And then the SQL folks eventually took over. But the idea was to let the Product Team have fun and show off the power of the products in non-typical scenarios, with no business value whatsoever. And we’ve sort of made it more and more ridiculous as time goes on. There are certain teams, like Reporting Services, that have always been there since the beginning, and they always did a game. Every year they did a game. I think they did Tic Tac Toe, and then Hangman; the game got more and more complex as they went through the years. I think I saw my first Power Hour in 2009 and I immediately wanted to be a part of it. I had never seen one before and I just thought it was really exciting. And the next year, I asked the organizer, Pej Javaheri, if I could participate. He wasn’t sure; “SSIS doesn’t usually do a Power Hour” and “it’s not very interesting.” So, I decided to prove him wrong. Since Pej left Microsoft, I’ve taken over the Power Hour. I do most of the coordinating and stuff. It’s always really interesting to make sure there is a business message there. We’re not as explicit about it anymore. But, afterwards, we always have people coming up to us and saying, “I didn’t know the tools could do that” and “I want to know more.” That’s really the whole point, essentially. And if we can get laughs doing it, then that’s even better. We usually try to balance out presenters showing new technology, show off some valuable things. I typically just do ridiculous demos. I have a whole story that goes along with it. It’s a lot of fun. The hardest part is justifying the days of work that go into a ten minute demo.
It’s really exciting to see the people who were involved in building the tools, and who are so excited about the features, getting to go play with them.
With my demos, which usually revolve around cats, I had spent some time in SSIS and built some custom transformations. I’ve had someone ask me afterwards, “Why do you spend so much time on this? Why aren’t you doing work for the real product?” Yeah… it is a good point, but usually I limit Power Hour stuff to my “free time.” So on flights, at home, things like that; that’s usually when I work on those things. I try to really time-box it, to justify to myself devoting time to this really fun thing.
When I saw you at TechEd and you were talking about the SSIS Catalog, one of the things you said was that there was some debate within Microsoft regarding the Package Deployment Model and the new Project Deployment Model. Even within the team, people were arguing about which way to go, and you were finally brought around to the Project Deployment Model. Is it common to have that kind of debate when you are getting features ready for a product? Is there a lot of that?
Yes, there’s a LOT of debate. The bigger the team, the more debate there is. 2012 was really interesting because that was as big as the SSIS team has really been. We actually had half our team located in Shanghai and they were really driving the Server components. And half our team located in Redmond. So, doing the coordination and making sure both teams agreed on the scenarios of what we were trying to go toward was really important. Doing development is all about resource constraints, right? You have a ton of stuff you want to do and you have to figure out, “Where is my time best spent?” Sometimes you’re making guesses. If you only do exactly what the customers want, you’re not necessarily moving your platform forward far enough. If we only focused on bug fixing, we probably wouldn’t have gotten a lot of the great functionality that we did out of 2012.
…And the rounded corners…
Well, the rounded corners, yeah. Actually the rounded corners joke was just a random Power Hour joke that I just came up with on the fly. I’ve been using it since. Although I was in somebody’s session and they spent ten minutes building up that joke and it was really painful to watch. But the rounded corners was just WPF, that’s just the way it looked. But I made the joke about Interns coming in and sanding down the corners for three months. And I actually had an angry customer come up to me afterwards and say, “You guys spent three months working on rounded corners and yet you didn’t fix the Web Services Task” and storm off. “It was a JOKE!” At PASS, people usually get that something’s a joke. At Tech Ed, people expect Microsoft presenters to be more serious and jokes don’t always go over well.
Even at a BI Power Hour?
When I did my first BI Power Hour at Tech Ed, I got a standing ovation when I did some of my lines, not because it was a great presentation, but I think the line was “I’m a programmer. What do I need real friends for when I can create them programmatically?” Standing ovation. And it wasn’t because it was funny. It was because the audience felt the same way. And I just felt really sad at that point. And the next day, I had people coming up to me offering to be my friend and saying, “I don’t have any friends on Facebook either. I had to stop using it.” And they just didn’t get that it was a joke. I did my Power Hour at the Boston user group and nobody laughed. There were some chuckles, but that was it. But then I realized afterwards, when I was talking with somebody else, that the audience actually thought it was real and that they felt sorry for me. So, they didn’t know they were supposed to laugh.
Back to planning. There are definitely different viewpoints on the team. One thing was related to Package Deployment versus Project Deployment. Every time you change functionality, but keep supporting a feature, your Test Matrix increases. So, the number of scenarios you have to test goes up. And we were really short on Test resources. And you can’t release something unless it’s properly tested. So, at one point, they wanted to say “No more Package Deployment Model; we’re just going to do Project because it means we can add more functionality because we’re not supporting these other things anymore.” It just did not make sense to take that approach. I think the thing I had mentioned at Tech Ed was Single Package Deployment versus Full Package deployment. Long debates. But it came down to the architectural difference. We showed how much it would cost to implement Single Package Deployment and how much it would cost without. If it’s an extra month in development time, how many bugs can we fix in a month? How many other improvements can we make in a month? So, it’s a balancing act. I still think it’s the right decision. At the same time that we’re making those decisions internally, we’re talking to our MVPs, getting their feedback. I know the MVPs felt really strongly about Project Deployment, keeping it all together. And we were trusting in that. They’re basically the voice of our customers.
With Matt being so busy, and prepping for a session, I ended the interview there.
I have only had the chance to use SSIS 2012 on one project. And even with that small taste of this fabulous tool, I was tempted to just give Matt some applause and call it a day. I really appreciate the work and time that went into making SSIS 2012 such a tremendous improvement over previous versions of Integration Services.
I think Matt made some really great points here. The Big Data revolution was certainly a “disruptor” to common ETL. When dealing with data that is aging too quickly or in quantities that make taking the time to bring it into a data warehouse impractical, that certainly would disrupt common thinking around traditional ETL. While, as Matt points out, the need for traditional ETL will remain, there is some need on the part of those of us in the industry to re-assess what ETL looks like in some cases. It’s not always going to be a series of SSIS packages running on a server and populating a data warehouse. Sometimes, it will be information workers using Power Query to bring data from many sources into Excel.
As for the Power Hour, it has so many of the qualities that I strive to put into my own presentations. Humor is a huge one. There is a lot of research that shows that people learn better when they are having fun. Not to mention that an audience that is having a good time is less likely to throw rotten tomatoes; they stain, you know. Combine that with using features of the tools in creative ways, and you’ve really got something. I love finding new and exciting uses for technology. I often think of Ed Harris’ great line as NASA’s Gene Kranz in Apollo 13, “I don’t care what anything was DESIGNED to do; I care what it CAN do.”
I liked hearing from Matt that there is often a lot of debate within the SSIS team when it comes to features. It should remind all of us of time spent on project teams in our own work. The point this raises is that we need to remember that Microsoft, like any other organization, has finite resources that need to be spent in the best way they can. I hope we can all keep that in mind when we wonder why certain features haven’t gotten much love or don’t work the way we would want them to.
Matt’s point about MVPs is an important one. Along with what prestige may come from receiving the MVP award, there is also responsibility to serve as a voice for the Community as a whole. Being an MVP is not about getting to wear that MVP ribbon at Summit or a pretty trophy; it’s about leadership, with benefits and obligations along with it.
That brings us to the end. My second interview was with Kamal Hathi, but that also happens to be the longest one. Since I have the typing skills of a rainbow trout, transcribing the audio for these interviews is a long process. As such, I will aim to have the post on my interview with Kasper de Jonge (Blog|Twitter) next week and the one with Kamal the week after. Thanks for your patience.