Power Query Decathlon – Beginner 09: Loading Results

NOTE: This post was created using the November 2014 version of Power Query. In later versions, some of the screens or functionality could be different.

You can download the Excel 2013 file necessary to perform the steps in this post here. To get to the existing query, click the Workbook Queries button on the Power Query ribbon.

Once you have constructed your query, it is time to Close & Load it. This is the process of saving the query for use and also, optionally, executing it at the same time.

You have a total of 4 options for loading your query results:

  • Load it to the Excel Data Model (Power Pivot) in the same Excel workbook housing the query (This is the option I use most frequently and recommend highly)
  • Load it to a worksheet in the same Excel workbook housing the query (the Default)
  • Load it to the Excel Data Model AND a worksheet
  • Load it to neither of the above options (load it nowhere at all, just creating the query)

Note: Up to this point, I have kept the default name for the query we have used, Query1. I have closed and loaded it several times with this name. However, I advise you, with your own queries, to give them appropriate names before you Close & Load them. I just wanted to keep the focus of each post narrow and didn’t want to deal with it until now.

Load to Data Model

We will start this post by opening our existing query and giving it a proper name.

1. Double-click the query Query1 you see in the Workbook Queries pane of the Power Query ribbon, as shown in Figure 1.

Figure 1

image

This will open the Query Editor shown in Figure 2.

Figure 2

image

The name of the query is found in the PROPERTIES section of the Query Settings pane, just above the APPLIED STEPS. The query name will determine the name of the table when the data is loaded. For example, so far, I have been loading my query to the Data Model. So, if I open my Power Pivot management window in this file, we can see there is a single table there called Query1, as shown in Figure 3.

Note: I am not expecting you to take this particular step. I show this merely for illustration.

Figure 3

image

This is the final destination for the data that runs through the Query1 query.

2. Give your query an appropriate name in the PROPERTIES, such as Laureate. An example is shown in Figure 4.

Figure 4

image

3. Click the Close & Load button, shown by the arrow in Figure 4. This will execute the query and load its results once again to the Excel Data Model. This does not duplicate what was already there for this query. Rather, it discards the data that was there and populates it again. The results are shown in Figure 5.

Figure 5

image

Grand. The renaming of the query carried to the Workbook Queries, as you would expect. However, it did NOT carry through to the Data Model. Figure 6 shows my data model again, this time after the renaming of the query.

Figure 6

image

Hm. The name of the table is still Query1, the original name. At first, I was disappointed that the renaming did not carry into the data model. Then I realized it was a very good thing that the original table name remained. Imagine I had created lots of DAX Calculated Columns and Calculated Measures referencing this table. If the table then got renamed, those calculated columns and fields would be in danger of breaking since the table name they were written to use would no longer exist. That could break my entire data model. So, the fact that the table in the Data Model retained its original name is a very important safety feature.

This also speaks, once again, to my recommendation for naming your queries before you load them anywhere. If I had named my query Laureate before that first Close & Load way back in the first post, then my table here in the Data Model would be named Laureate and all would be well. As it stands, since we will not be doing anything with this Data Model, I will not worry about it.

Figure 7 shows a different view of the of the Excel file, allowing you to see the entire file.

Figure 7

image

Notice there is only a single worksheet, Sheet1, and that it has no data. This is because I have thus far been loading to the Data Model alone.

Load To Worksheet

We can change the loading destination for an existing query. There are a few options for how to accomplish this. For now, we will use the Right-Click method.

4. Right-Click the Laureate query. This will reveal the menu shown in Figure 8.

Figure 8

image

5. Choose the Load To option shown by the arrow in Figure 8. This will reveal the Load Options dialog shown in Figure 9.

Figure 9

image

6. Click the checkbox next to Load to Worksheet to select it.

7. Click the checkbox next to Load to Data Model to deselect it. The Load Options dialog should now resemble Figure 10.

Figure 10

image

8. Click OK. This will reveal the warning shown in Figure 11.

Figure 11

image

Power Query knows that we have previously loaded the data resulting from the query to the Data Model. Since we have just deselected the Data Model as an option in Step 7, Power Query is letting us know that the table in the Data Model corresponding to this query will be deleted. Yikes. This is a good safety measure. In our case, losing that table is fine.

9. Click Continue. This will execute the query and load it to a table in a new worksheet, as shown in Figure 12.

Figure 12

image

Load To Data Model AND a Worksheet

I first want to say that I have yet to come across a situation where it made sense to me to load data to the Data Model AND a Worksheet from a single query. However, it is an option, so I will show you how.

I mentioned earlier that there was more than one way to change where a query loads. Above, I showed the Right-Click option. There is another way to do it that does not require a right-click and is great for devices that do not support right-clicking.

10. Hover the mouse cursor of the Laureate query in the Workbook Queries pane. This will activate a little preview pane for the query, called the Peek, as shown in Figure 13.

Figure 13

image

11. Click the ellipsis shown by the arrow in Figure 13. This will reveal the menu shown in Figure 14.

Figure 14

image

12. Click the Load To option shown by the arrow in Figure 14. This will activate the Load Options dialog shown in Figure 15.

Figure 15

image

13. Click the checkbox for Load to Data Model. The Load Options box should now resemble
Figure 16.

Figure 16

image

14. Click Load. This will reveal the warning shown in Figure 17.

Figure 17

image

This warning is a bit different than the one we encountered above in Figure 11. It tells us that changing the load options to add the Data Model as a target will cause Power Query to rebuild the table in the Worksheet, thereby breaking references to that table in expressions, etc. This is not a concern in this example.

15. Click Continue. This will produce the result shown in Figure 18.

Figure 18

image

This is not a step you need to take here, but I want to show that there is now a Laureate table in the Data Model. This is shown in Figure 19.

Figure 19

image

Loading Nowhere

Loading to neither the Data Model nor to a Worksheet can be a fantastic option in some circumstances. It can be great when you want to turn a single large query into separate tables in a Data Model. This is also a fantastic option when you want to bring one column from a source into an existing query and have that data reside in a single table in your Data Model. There are other situations as well.

16. Right-click the Laureate query in the Workbook Queries pane to show the menu in Figure 20.

Figure 20

image

This will activate the Load Options dialog shown in Figure 21.

Figure 21

image

17. Click the checkboxes next to both options to deselect both. The Load Options dialog should resemble Figure 22.

Figure 22

image

18. Click Load. This will reveal the warning shown in Figure 23.

Figure 23

image

Once again, the text of the warning is contextually appropriate to the action being taken. That is very helpful. Not all Microsoft products have error messages that are this clear.

19. Click Continue. This will produce the result shown in Figure 24.

Figure 24

image

You will note that this operation was quicker than the others. The reason is that the query did not execute as there was no place to put the results. Thus, the query indicates “Load is disabled.” All of the steps in the query we created throughout the series are there. They just have no destination at the moment.

Changing Close & Load Default Action

By default when installing Power Query, at least as of the November 2014 release, the default behavior for the Close & Load button is to load to a Worksheet alone. The poor Data Model gets nothing. However, as a general best practice, I highly recommend loading to the Data Model and not a Worksheet. Reasons for this include the far better data compression you get when using the Data Model. Also, even in Excel 2013, the row limit for a Workbook is just over 1 million rows. The Data Model has no such limitation. There are factors that limit the amount of data overall, but no hard limit on rows. Add to that the fact that loading to the Data Model allows you to using DAX with your data and the choice is clear. As such, I find it convenient to change the default behavior of the Close & Load button to suit this preference.

The Power Query ribbon has a section called Machine Settings. This is where you can make this change.

Figure 25

image 

20. Click the Options button, shown by the arrow in Figure 25. This will open the Options dialog, shown in Figure 26.

Figure 26

image

Under the Default Query Load Settings, the default setting is to “Use standard Power Query load settings.”

21. Click the radio button to the left of “Specify custom default load settings.” This will enable the options just beneath it, as shown in Figure 27.

Figure 27

image

22. Click the checkbox next to “Load to worksheet” to deselect it.

23. Click the checkbox next to “Load to Data Model” to select it. The Default Query Load Settings should now resemble Figure 28.

Figure 28

image

24. Click OK. There is not an obvious change to show so I won’t bother with an image.

This is a perfect place to stop for this post. In the next and final post in the series, we will demonstrate various ways of managing queries.

Power Query Decathlon – Beginner 08: Error Handling

NOTE: This post was created using the November 2014 version of Power Query. In later versions, some of the screens or functionality could be different.

You can download the Excel 2013 file necessary to perform the steps in this post here. To get to the existing query, click the Workbook Queries button on the Power Query ribbon.

Dealing With Errors Found After Close & Load

In the last post in this series, we worked with Date columns, namely our Birth Date column. After some transformation examples, we ended up preserving just the Year part of the that column and replacing our Birth Date column with a Birth Year column. Grand. However, when that query was saved, that were two rows that generated errors. Figure 1 shows this.

Figure 1

image

When there are errors resulting from a query, the number of errors will appear to the right of the Rows Loaded information in the Workbook Queries pane of the Power Query tab in Excel. The error information is not just informative. It also serves as a hyperlink to a special query filtered for just the error rows involved.

1. Click the “2 errors” hyperlink. This should launch a new query called Errors in Query1, as shown in Figure 2.

Figure 2

image

2. Click the white space to the right of the top “Error” value in the Birth Year column. This will reveal the error for that column on the row selected. The result should resemble what is shown in
Figure 3.

Figure 3

image

When you click on a row in a particular column, the contents of that cell appear in a new box at the bottom of the query window. If the selected cell is an Error cell, then the result is that the details of the error are revealed.

Note: You can also click on the Error value itself to get this information. However, that creates a new query step that filters for just that one row, making it difficult to look at more than one row quickly. So, I prefer to click next to the Error value instead as it makes it easier to go from one row to another.

In this case, the error is that Power Query could not isolate the Year value from “1891-00-00” as that value is not a valid date and therefore has no Year component.

3. Click the white space to the right of the Error value on the second row. That is shown in
Figure 4.

Figure 4

image

That looks familiar. Both of these records have the same problem. For the Nobel Laureate in question, the prize committee’s records included only the Year of birth, but not the exact date.

The good news is that, for our purposes, we only want the Year anyway. However, we will need to take some action in order for Power Query to capture the Birth Year values for these two Laureates.

You will note that there are transformation options here in this Errors in Query1 query. If you make changes here to fix the errors and Close & Load this query, it will create a new query in your workbook that does not load anywhere but is accessible from other queries. The “Load is disabled” note will be explained in the next post on Loading Results. Figure 5 shows what it looks like after saving the Errors query.

NOTE: We will not be saving this Errors in Query1 query as part of this post. So, the next image is just for information and does not accompany specific actions you should take if you are following along.

Figure 5

image

This gives you the option of filtering the errors out of the Query1 query altogether (we will show that process later) and only dealing with them in the second, Errors in Query1 query. You could then append the cleaned up rows from the Errors in Query1 query to the Query1 query to bring them back into the dataset.

This ability is similar to the ability of SQL Server Integration Services to route Error rows along a different path so that the errors can be dealt with. While I have not done this much yet with Power Query, I very much appreciate having the option.

OK. Back to the example. Both of our errors have the issue that the proper Year value in their Birth Date is followed by “-00-00” which is certainly invalid for both the Month and the Day. The fact that their issues are identical is very convenient for us because it allows us to easily fix both errors with a single action.

4. If you still have the Errors in Query1 query open, as shown in Figure 4, please click the X in the upper right of the query window, as shown in Figure 6, to close it.

Figure 6

image

Power Query will know that the query has not been saved and will prompt you to do so, as shown in Figure 7.

Figure 7

image

5. Since we do not want to keep a separate query for these errors, choose Discard, shown by the arrow in Figure 7. This will take us back to the Workbook Queries pane of the Power Query tab, as shown in Figure 8.

Figure 8

image

6. Double-click the Query1 query to open it. This will open the Query Editor, as shown in Figure 9.

Figure 9

image

The query will open to the final step, Renamed Columns1. However, at this step, the Birth Date column had already been converted to a Date. In order for Power Query to properly deal with our invalid dates, we must change the values for our two errors BEFORE the conversion to Date occurs. Since we changed Birth Date to a Date in the Changed Type1 step, shown highlighted in Yellow in Figure 9, we will have to insert a new step just before that, just AFTER the Added Custom step.

7. Click the Added Custom step in the APPLIED STEPS, shown by the arrow in Figure 9. This will change our Results Pane to the state of the query AFTER that step has been applied but BEFORE any subsequent step. This is shown in Figure 10.

Figure 10

image

Note that, as of this step, the Birth Date column is still of the Any data type. This is the perfect spot in the query for us to change some invalid date values to valid ones.

8. With the Birth Date column selected, click the Replace Values button, shown in Figure 10. This will reveal the warning shown in Figure 11. Note: There is a Replace Values button on the Transform tab as well. They work the same.

Figure 11

image

Since the steps in a query build upon each other, and have a specified order, adding new steps in the middle can cause issues. For example, if I add a new step here that removes columns, those columns will still be referenced in subsequent steps and those steps will become invalid and broken and unhappy. However, in our case, we are not making structural changes to our query, we are just replacing values in the data itself. So, we will be fine inserting our Replace Values step in the middle.

9. Click the Insert button shown by the arrow in Figure 11. This will activate the Replace Values dialog shown in Figure 12.

Figure 12

image

10. Enter “-00-00” in the Value to Find and “-01-01” in the Replace With boxes, respectively. Make sure the Match Entire Cell Contents box is NOT checked. The reason is that we only want to change the –00-00 parts of the values with –01-01. If we check that box, then Power Query will search for rows where the Birth Date columns contains the value –00-00. That would find zero results.

11. Click OK. The result should resemble Figure 13.

Figure 13

image

The benefit of taking this step will not be immediately apparent in the Results Pane. However, if you click through the steps following this new Replaced Value1 step, you should see that they all return data in the Results Pane. This means that there are no structural problems with any of those steps.

12. Click each step after Replaced Value1 in the APPLIED STEPS, shown by the arrows in Figure 13. The result of the final step, Renamed Columns1, is shown in Figure 14.

Figure 14

image

To make sure we have properly dealt with the two error rows, it is necessary for us to Close & Load the query.

13. Click the Close & Load button shown by the arrow in Figure 14. This will execute query and return us to the Workbook Queries pane of the Power Query tab, as shown in Figure 15.

Figure 15

image

The absence of any mention of Errors shows that our fix worked.

Setting Up Errors

Sometimes, you will take steps that reveal errors right away in your dataset and you can actually see them in the Query Editor. This section will create some errors for us to use as examples.

14. Double-click the Query1 query to open it in the Query Editor, as shown in Figure 16.

Figure 16

image

In this example, we will take specific steps cause some errors to show a few of the options available for dealing with rows that have errors in them. Our first step will be to modify some values in the Award Year column that will fail to convert to a whole number.

15. With the Award Year column selected, click the Replace Values button shown by the arrow in Figure 16. This will activate the Replace Values dialog we have seen before, as shown in Figure 17.

Figure 17

image

16. Type “1903” into the Value To Find box and a alpha text string of your choice in the Replace With box.

17. Click OK. The result should resemble Figure 18.

Figure 18

image

Note the text string we chose now appears in the Award Year column for the rows that contained 1903 as the Award Year. Also note that the Award Year column, which no longer contains values that are clearly one particular data type, has been converted to an Any data type by Power Query. This is something to keep an eye on. Even if you make specific data type assignments, subsequent actions could override that. You might be tempted to just avoid data type assignments until the last step of the query. However, you need to remember that some transformations are only available on columns of a specific data type. So, in reality, this is just something you need to watch out for. Double-checking at the very end to make sure columns have the types you want is not a bad idea.

18. Click the Data Type button in the Transform section of the Home tab of the Ribbon, shown by the arrow in Figure 18. This will show the Data Type options in Figure 19.

Figure 19

image

19. Choose Whole Number, as shown by the arrow in Figure 19. The results should resemble Figure 20.

Figure 20

image

Our text values failed to convert to a Whole Number and are therefore creating an Error state on those rows for the Award Year column.

When you have Errors in rows, you have a few options. These are found in two different places, which might seem counter-intuitive. However, since they behave quite differently, I can see what the decision was made.

Remove or Keep the Error Rows

You have the option of removing rows with an error state or even filtering your dataset to return only the error rows. Both of these options are found on the Home tab of the Ribbon in the Reduce Rows section.

20. With the Award Year column selected, click the Remove Errors button, shown by the arrow in Figure 20. This will completely remove from the dataset any rows that have an Error in the Award Year column. The results should resemble Figure 21.

Figure 21

image

The error rows have been filtered out of the query and will no longer participate in any subsequent query steps or appear in the results that are loaded into Excel.

21. Click the X next to the shiny new Removed Errors query step, shown by the arrow in
Figure 22. This bring us back to the query we had in Figure 20, with the three error rows present.

Figure 22

image

Suppose we want to keep only the error rows and get rid of the rest of the rows.

22. Click the tiny triangle on the right-hand side of the Remove Errors button, shown by the arrow in Figure 23.

Figure 23

image

Instead of just removing the error rows, which is the default operation performed when clicking the Remove Errors button, this will reveal the options shown in Figure 24.

Figure 24

image

You have the option of still just removing the errors. However, you also have the option to keep the error rows and dump the rest.

23. Choose Keep Errors, as shown by the arrow in Figure 24. This will produce the results shown in Figure 25.

Figure 25

image

The only rows left in our dataset are those that have an Error in the Award Year column. This shows that we had not sorted our dataset chronologically by Award Year since several 1903 rows were not visible before. We don’t want to keep this result, though.

24. Click the X next to the Kept Errors step to delete it and return to the state shown in Figure 26.

Figure 26

image

Replace Errors

You also have the option to replace the Error state with a specific value that you specify. This is similar to the Replace Value option, but instead of specifying a Value to Find, Power Query just looks for the Error state. This option is on the Transform tab.

25. With the Award Year column selected, click the Replace Errors button in the Any Column section of the Transform tab. This will activate the Replace Errors dialog as shown in Figure 27.

Figure 27

image

From here, you can decide what single value to use in place of your errors. The value you choose will replace every instance of the Error on the selected column(s) and must match the data type of the column(s). For example, a blank (no value), is considered to be of the Text data type by Power Query. Since the Award Year column in this example has been converted to a Whole Number, a blank will not be accepted here.

26. Leave the Value textbox empty and click OK. The result is shown in Figure 28.

Figure 28

image

Today, the role of HAL 9000 will be played by Power Query: “I’m sorry, Dave. I’m afraid I can’t do that.”

This means that you will sometimes have to give some thought when deciding when and how to deal with errors.

27. Enter 0 for the Value instead of leaving it blank.

28. Click OK. The result should resemble Figure 29.

Figure 29

image

The errors rows remain, but their values have been changed to a value that does not produce and Error state. Grand.

Back in Step 13, we saved the Query1 after fixing those two error rows in Birth Year. Everything we have done since then was just to show how to deal with errors we specifically created. At this point, we can just close the query and discard changes and we will be ready for the next post in the series.

29. Click the X in the upper right corner of the Query Editor to close it.

30. Choose Discard from the screen that follows.

That takes care of dealing with Errors. In the next post, we will talking about Loading results to Excel.

Power Query Decathlon – Beginner 07: Date Columns

NOTE: This post was created using the November 2014 version of Power Query. In later versions, some of the screens or functionality could be different.

You can download the Excel 2013 file necessary to perform the steps in this post here. To get to the existing query, click the Workbook Queries button on the Power Query ribbon.

In this post, we will play with transformation options available for Date typed columns. Our starting point is shown in Figure 1.

NOTE: Because my regional settings for Windows are set to English – United States, my default format for dates is mm/dd/yyyy. So, in this post, my dates will follow that format. The functions in Power Query will work appropriately for your region.

Figure 1

image

The Birth Date column still has the Any data type. Our first course of action will be to change its type to Date.

1. With the Birth Date column highlighted, click the Data Type option on the ribbon (on either the Home or Transform tabs). This will reveal the options shown in Figure 2.

Figure 2

image

2. Choose Date. The result will resemble Figure 3.

Figure 3

image

Note that the values in the column have become right-justified, are now italic, and the hyphens between the date parts have changed to slashes. Also, you will notice that the final query step in the APPLIED STEPS section is still Changed Type1. Since the last step performed in the last post was to change the data type of the Award Percentage column to a decimal number, and the first of this one is to change the type of the Birth Date column to a Date, Power Query merged both of those changes into a single step.

Since we will want to keep the Birth Date column while we play with the various options, we will be adding new columns. New columns are added to the far right of the query. So, our next course of action will be to move the Birth Date column to the end to make it easier to view it alongside the new columns.

3. Right-click the Birth Date column to expose the menu shown in Figure 4. Note, if you are using a device with no Right-click option, you can also use the Move button on the Home tab of the ribbon.

Figure 4

image

4. Choose “To End” as shown in Figure 4. The result will resemble Figure 5.

Figure 5

image

The Birth Date column is now at the far right of the query, after Award Percentage.

5. With the Birth Date column selected, click the Date button in the From Date & Time section of the Add Column tab of the ribbon, as shown in Figure 5. This will reveal the options shown in Figure 6.

Note: It is important to use the Add Column tab here so that our transformation will create a new column instead of replacing the existing one.

Figure 6

image

As dates have multiple parts, there are separate functions you can use for dealing with each part.

Let’s create a new column that returns the Month part of the date.

6. From the Month menu shown in Figure 6, choose Month, shown by the arrow in Figure 7.

Figure 7

image

The result will be a new column, called Month by default, to the right of the Birth Date column, as shown in Figure 8.

Figure 8

image

Note that this returns the ordinal number for the month, not a name. In Figure 8, I am showing the Home tab in order to highlight the fact that the Month transformation also performed the added work of converting the resulting column to the Decimal Number data type.

7. Since we will not be keeping this column, click the X next to the Inserted Month query step, as shown in Figure 9.

Figure 9

image

This will take our query back to the way it was in Figure 6, shown again below as Figure 10.

Figure 10

image

In addition to finding the parts of a date, you can also find other values based on that date. For example, you can use a date value to find the last day of that month. This could be useful in situations where the number of days in a month will matter. This functionality is easier than building logic to determine which years were leap years (ah, February) and then building the End Date column from there.

8. From the Date menu on the Add Column tab, choose Month –> End of Month, as shown in Figure 11.

Figure 11

image

The result is shown in Figure 12.

Figure 12

image

Note that the transformation also correctly typed the resulting column as a Date.

We do not want to keep this column, so we will delete this step.

9. Click the X next to the Inserted End of Month step in the APPLIED STEPS, as shown by the arrow in Figure 13, to delete it.

Figure 13

image

This will, once again, take us back to the state of the query in Figure 6, as shown in Figure 14.

Figure 14

image

I used this dataset for a demo in which I wrote the result to the Excel Data Model (we will cover that in a future post). One thing I had not though about was that the earliest value supported for a Date column in Power Pivot is January 1, 1900. You can see from the data that there are many Nobel Laureates who made the the inconvenient decision to be born prior to 1900. The nerve of some people… 🙂 For that reason, I determined it was best to just use the Birth Year of each Laureate instead of the full date. To clarify, we want to replace the Birth Date column with a Birth Year column.

10. With the Birth Date column selected, click the Date button in the Date & Time Column section of the Transform tab of the Ribbon, as shown in Figure 14. This will reveal the familiar Date options shown in Figure 15.

Figure 15

image

11. Choose Year from the Year option. This will produce the result shown in Figure 16.

Figure 16

image

Figure 16 shows the Home tab in order to highlight the Data Type.

Since we used the Transform tab instead of the Add Column tab, the existing Birth Date column containing the full date was replaced with a Birth Date column containing just the Year part of the Birth Data. Note that the data type was changed to a Decimal Number. We have a few issues with this result. First, the name of the column no longer makes sense for the content it holds. And second, we will never ever want to perform math on these values, so a text data type would be more appropriate in order to prevent mathematical operations that have no hope of being valid.

12. On the Home or Transform tabs, change the data type of the Birth Date column to Text, as shown in Figure 17.

Figure 17

image

Notice that the values in the column are now left-justified.

13. Right-click the Birth Date column header, and choose Rename, as shown in Figure 18.

Figure 18

image

Note that you can also double-click the column heading or use the Rename button in the Manage Columns section of the Home tab to rename this column.

14. Name the column Birth Year. The result should resemble Figure 19.

Figure 19

image

If you want to save this query and file for use in the next post, go for it. If you do, you should notice that there are two errors found, as shown in Figure 20.

Figure 20

image

 

Those will be dealt with in the next post, which covers handling errors.

Power Query Decathlon – Beginner 06: Number Columns

NOTE: This post was created using the November 2014 version of Power Query. In later versions, some of the screens or functionality could be different.

An apology: I want start this post by apologizing for making it harder to follow this series than I should have. Rather than put the onus on you, dear reader, to maintain the working file for each post, I should have just supplied the necessary source file at the beginning of each post. I have hereby rectified that on all previous posts in this series and will provide the work file for each post from here on.

You can download the Excel 2013 file necessary to perform the steps in this post here. To get to the existing query, click the Workbook Queries button on the Power Query ribbon.

While I tend not to do much with number columns in Power Query other than formatting, I wanted to make sure to provide an example to show that you certainly CAN work with them. There are certainly occasions where doing math in Power Query makes sense. I have just not encountered many yet.

We start the the query shown in Figure 1.

Figure 1

image

Our sole number column in this query is the Awardees column, shown highlighted in Figure 1. This fields contains the number of people that shared each portion of a particular Nobel Prize. We will start out with something basic.

Information

The Information button, in the Number Column section of the Transform tab and the From Number section of the Add Column tab, has three options. The options are shown in Figure 2.

Figure 2

image

The Is Even option returns the value TRUE if the number is an even number and FALSE if the number is odd. The Is Odd option does the opposite, returning TRUE for odd numbers and FALSE for even ones. Contrary to what you might think, the Sign option is not related to Tesla OR the “Wormsign” cry from Dune. It actually returns a –1 for negative numbers and a 1 for positive numbers.

Our first example will be to use the Sign option. In order to make it worthwhile, we will first use the Replace Values transformation to make some of our values negative.

1. With the Awardees column highlighted, click the Replace Button on the Home tab of the ribbon, shown in Figure 3. Note: The Replace Values button on the Transform tab will work just fine as well.

Figure 3

image

This will activate the Replace Values dialog, as shown in Figure 4.

Figure 4

image

2. Enter 1 in the Value To Find and –8675309 in the Replace With boxes, respectively, as shown in
Figure 4. Thanks to Jenny for the number.

3. Click OK. The result will resemble Figure 5.

Figure 5

image

Notice the presence of –8675309 values where the 1 values used to be. Note also that doing this causes the Awardees column to change to the Any data type. Interesting. Even though the new value is still a whole number, the use of an Any Column transformation will convert that column to the Any data type. You can see this in Figure 6.

Figure 6

image

4. Making sure the Awardees column is selected, click the Data Type option and change it to Whole Number.

5. Click to select the Awardees column again.

6. Click the Sign option of the Information button on the Add Column tab of the ribbon, as shown in Figure 7.

Figure 7

image

Since we chose the Add Column tab from which to perform this action, the result will be a new column containing the results of the Sign transformation, as shown in Figure 8.

Figure 8

image

Notice that all of the rows with a negative value for Awardees have a –1 value for the Sign column.

Since we don’t want to keep the work from this example, we will delete these steps to return to our starting point.

7. Delete the last three steps in the APPLIED STEPS, as shown in Figure 9.

Figure 9

image

Like Math and Stuff

For this example, we will use the values in the Awardees column to calculate the percentage of the Nobel Prize won by each Laureate. Take a close look at Figure 10.

Figure 10

image

For the 1903 Nobel Prize for Physics, we see there are three Laureates. However, at first glance, the values seem a bit confusing. A little research cleared that up, though. Antoine Henri Becquerel won 1/2 of that prize while Pierre Curie and Marie Curie split the other half, getting 1/4 each. Interesting, eh? The More You Know.

This means we can use the value in the Awardees field to calculate what percentage of each Nobel Prize was awarded to each Laureate.

8. Click the Add Custom Column button on the Add Column tab of the ribbon, as shown in
Figure 11.

Figure 11

image

This will launch the dialog shown in Figure 12.

Figure 12

image

9. Name the new column something sensible, like Award Percentage.

10. Type “1/” in the Custom Column Formula box.

11. Scroll down in the Available Columns box until Awardees becomes visible.

12. Click on Awardees to select it.

13. Click the Insert button, shown by the arrow in Figure 13.

Figure 13

image

Once you have completed these steps, the Add Custom Column dialog should resemble Figure 14.

Figure 14

image

The formula is 1/[Awardees]. This means that, for each row, this new column will be equal to the result of dividing the value 1 by the value in the Awardees column in that row.

14. Click OK. The result is shown in Figure 15.

Figure 15

image

As we would expect, the values make sense as Mr. Becquerel has 50% and the Curies each have 25%. Grand.

The Award Percentage column defaults to the Any data type. Let’s make it something more appropriate.

15. Click the Award Percentage column to select it.

16. Click the Data Type option on either the Home or Transform tabs. The Transform tab option is shown in Figure 16.

Figure 16

image

17. Select Decimal Number as the data type.

The final query for this example will resemble Figure 17.

Figure 17

image

If you are following along with your own file, you can click Close & Load to save this query and then save the file for next time. Otherwise, you can just use the source file I will provide at the beginning of that next post, in which we will play with Date Column options.

Power Query Decathlon – Beginner 05: Text Columns

NOTE: This post was created using the November 2014 version of Power Query. In later versions, some of the screens or functionality could be different.

You can download the Excel 2013 file necessary to perform the steps in this post here. To get to the existing query, click the Workbook Queries button on the Power Query ribbon.

As discussed in the last post in this series, Power Query Decathlon – Beginner 04: General Transformations, the transformations on the Transform tab in Power Query are arranged in meaningful groups, largely driven by the data type of the column in question. If you have not done so, follow the steps in that post in order to be ready to follow along.

In this post, we will focus on the transformations available for Text columns, shown by the arrow in Figure 1.

Figure 1

image

Format

The Format transformations deal with capitalization, spaces, and non-printing characters. The options and their descriptions are as follows:

“lowercase”: makes all of the letters in the selected text column(s) lowercase

“UPPERCASE”: makes all of the letters in the selected text column(s) UPPERCASE

“Capitalize Each Word”: makes the first letter in each word of the selected text column(s) UPPERCASE and all other letters lowercase

“Trim”: removes any leading and trailing spaces from the values in the selected text column(s)

“Clean”: removes any non-printable characters from the values in the select text column(s)

The one we will demonstrate in this post is the Capitalize Each Word transformation. Figure 2 shows that the values in the Gender and Award Category columns are lowercase. While this is not an issue with understanding, it may be more appealing for users if the values were capitalized instead.

Figure 2

image

1. Select both the Gender and Award Category columns.

2. Click the Format button on the Ribbon.

3. Choose the “Capitalize Each Word” option, as shown in Figure 3.

Figure 3

image

The results of this transformation are shown in Figure 4.

Figure 4

image

Hazzah!

Merge Columns

Merging columns involves combining the values from more than one column into a single column. There are actually two different results you can achieve when merging columns. The first is to replace all of the merged columns with the new column. With this option, the original columns go away. The second is a create a brand new column with the merged values while leaving the original source columns intact. We will start with option 1.

4. Select both the First Name and Last Name columns.

5. Click the Merge Columns button on the Transform tab of the Ribbon, as shown in Figure 5. NOTE: The Merge Columns button is grayed out by default. It is only activated when more than one column is selected.

Figure 5

image

After clicking the Merge Columns button, the dialog shown in Figure 6 will appear.

Figure 6

image

In this dialog, you have the option to choose a separator for the values in the resulting column. You also have the ability to choose the name of that column.

6. Name the column something appropriate, like Full Name.

7. Click the Separator drop down. This will display the options shown in Figure 7.

Figure 7

image

8. Choose Space. This will, as you might expect, separator the values from each column with a space. The dialog should now look like Figure 8.

Figure 8

image

9 Click OK. This will result in what appears in Figure 9.

Figure 9

image

Note that the First Name and Last Name columns have been replaced by the new Full Name column. In this example, we will assume that is NOT what we want.

10. Delete the Merged Columns step, shown by the arrow in Figure 9.

This time, we will create the Full Name column while preserving the First Name and Last Name columns.

11. Select both the First Name and Last Name columns, just as we did before.

12. Click the Add Column tab of the Ribbon, shown by the arrow in Figure 10.

Figure 10

image

The Add Column tab has some options that are available on other tabs, and a few that are only available here. Figure 11 shows the Add Column tab.

Figure 11

image

While we will not go into them in this series, I will take a quick moment to discuss one of the options here. The Add Custom Column button allows you to create a new column manually with M code. As of the November 2014 release of Power Query, the dialog for the Add Custom Column is not very robust. But let’s give the team some time. They are doing great work.

13. Click the Merge Columns button on the Add Column tab, shown by the arrow in Figure 11.

14. The steps for creating the new column on the Add Column tab are identical to the steps on the Transform tab, other than the tab on which you click the button. Follow the same steps as before, specifying the Space for the separator and a meaningful name like Full Name, as shown in Figure 12.

Figure 12

image

15. Click OK. The result should resemble Figure 13.

Figure 13

image

Note that the new column is added to the end, the far right, of the query. I did scroll to the right before capturing Figure 13 in order to see the new column. We have ability to move it elsewhere, though.

16. Right-click on the Full Name column to reveal the context menu shown in Figure 14.

Figure 14

image

17. Click Move.

18. Click To Beginning, as shown in Figure 14. NOTE: If you are using a device without the option for right-clicking, you can also just select the column and use Move button in the Manage Columns section of the Home tab. However you accomplished the move, the result will appear in Figure 15.

Figure 15

image

I did scroll to the left before capturing Figure 15 to see the Full Name column in its new position. The Move options are great when you want to move something a long way, like when you have a lot of columns. When you want to move columns within your field of view, you can just click and drag them.

19. Click and drag the Full Name column until the indicator shows it can be dropped to the right of the Last Name column, as shown in Figure 16, and release it.

Figure 16

image

The result will be that the Full Name column is now positioned between the Last Name and Birth Date columns, as shown in Figure 17.

Figure 17

image

Note that Power Query still put both of the column moves into a single step, even though we used two different methods to accomplish them. Cool.

Split Column

The last transformation we will cover in this post is the Split Column. This allows you to divide a single column into two or more columns based on a delimiter you specify or a number of characters. In this example, we will split the Birth Date column into 3 columns, with the splits occurring on the “-“ between the components of the date value.

20. Select the Birth Date column. NOTE: Our Birth Date column is still of the Any data type at this point. However, the Split Column transformation will still work just fine.

21. Click the Split Column button on the Transform ribbon, shown by the arrow in Figure 18.

Figure 18

image

This will reveal the two options shown in Figure 19.

Figure 19

image

22. Click the By Delimiter option shown by the arrow in Figure 19. This will open the Spit Column dialog shown in Figure 20.

Figure 20

image

These are the default options. These options will look for commas and, due to the active option under Split, create a new column after each comma. In our case, the comma delimiter will not help.

23. Click the delimiter drop down, shown by the arrow in Figure 20. This will reveal the pre-populated options shown in Figure 21.

Figure 21

image

The hyphen, the separator between the parts of the value in our Birth Date column is not available in this list. No worries.

24. Click the “Custom” option, shown by the arrow in Figure 21. This will activate a textbox allowing you to enter the delimiter character(s) you need.

25. Enter in a hyphen, as shown in Figure 22.

Figure 22

image

The next choice we have to make is in the Split section. The options are pretty self-explanatory. For our example here, the default option, “At each occurrence of the delimiter,” is the one we want.

26. Click OK. This will produce the result shown in Figure 23.

Figure 23

image

There are a few things to note here. First, because we chose the Split Column button on the Transform ribbon, the original Birth Date column is gone. Second, the delimiter itself, the hyphen in our example, is not persevered. Third, for better or worse, Power Query has chosen not only to split the column, but also to change the data type of the resulting columns to Whole Number since they seem to match that pattern. So, this is a case where a single action generated TWO query steps. If the data type change is not appropriate, you can delete that step and still keep the split columns.

Note: The reason that final step is called “Changed Type 1” is that we already had a step called “Changed Type.” When there must be more than one instance of the same transformation, Power Query will merely add an increment number to the name since the names of steps MUST be unique within a query. The reason for this is that the steps behind the scenes in the M code reference previous steps by name. If there is more than one step with the same name, the M code will freak out as it could not know which one is correct.

The last thing we need to do in this post is to undo the split column, since we don’t need that going forward.

27. Delete both the Changed Type 1 and Split Column steps shown by the arrow in Figure 23. The result will be just like Figure 17 above, with the Full Name column having been created and moved into position just to the right of the Last Name column. This is shown in Figure 24.

Figure 24

image

That wraps up our work with Text Columns. In the next post in the series, we will start working with Number Columns. If you want to follow along, please use the Close & Load button to save this query and then save the Excel file. We will continue to use it in upcoming posts.

Power Query Decathlon – Beginner 04: General Transformations

NOTE: This post was created using the November 2014 version of Power Query. In later versions, some of the screens or functionality could be different.

You can download the Excel 2013 file necessary to perform the steps in this post here. To get to the existing query, click the Workbook Queries button on the Power Query ribbon.

It is time to start covering Transformations. These are what really help set Power Query apart from the capabilities offered by Power Pivot. Power Pivot has long had a respectable list of sources you could use. However, it has never provided the ability to transform/shape/modify data before pulling it into the data model. Sure, you could do that sort of thing in source queries, but that only helps when pulling data from a source that directly supports queries, like a database. With Power Query, once you have pulled data from a source, you have a lot more work you can do before sending it to its final destination (reference to terrible movie franchise not intended).

While there are Transformations available to you from the Home tab of the Query Editor, the full list of available options is available on the Transform tab. For this post, we will focus on the Transformation tab.

1. Click the Transform tab heading as shown in Figure 1.

Figure 1

image

This will expose the Transform tab of the Ribbon, as shown in Figure 2.

Figure 2

image

As you can see in Figure 2, the available Transformations are organized into the following groups:

  • Table
  • Any Column
  • Text Column
  • Number Column
  • Date & Time Column

For this post, we will show examples from the first two options, Table and Any Column.

Table Transformations

Table Transformation, as their name suggests, operate on the table as a whole. Figure 3 shows a closer look at those options.

Figure 3

image

The Use First Row as Headers option is pretty straightforward and we have already seen that in the post on importing data. We will not cover it again here.

For now, we will start with the Group By transformation. This is useful when you have a table of detail data that you want to summarize in some way. For this example, we will use the Group By transformation to return a list of the Nobel Laureates that have one more than one Nobel Prize.

We will start with the query we saved from last time, as shown in Figure 4. If you have not done so, you will need to follow the steps in the previous post, Power Query Decathlon – Beginner 03: Managing Columns and Rows, to create the query show in Figure 4.

Figure 4

image

2. Using whatever method you like, select both the First Name and Last Name columns. The main option in Windows is to hold down the Ctrl key while clicking each column in succession.

3. Once they are selected, click the Group By button on the Ribbon, as shown in Figure 5.

Figure 5

image

Doing this will activate the Group By dialog shown in Figure 6.

Figure 6

image

The Group By section shows the columns we will be grouping by, First Name and Last Name. Notice there is a + to the right of Group By. This is so that we can add additional columns do group by. Since we selected the two columns manually, they both added themselves for us.

The New Column Name toward the bottom shows the new aggregation that will be performed as part of this Group By. The default is to perform just a count of rows according to the Group By columns. In our case, each row represents and awarding of the Nobel Prize, so this default is just fine. Note that you can also choose NOT to add any new columns, and merely produce a distinct list. If you want to do that, you can click the – to the right of the Count column. This will remove that entirely. For now, we will just specify a different name for our new column.

4. Change the New Column Name value, shown by the arrow in Figure 6, to read Award Count.

5. Click OK. This will perform the transformation and produce a result like the one shown in    Figure 7.

Figure 7

image

6. Click the filter button for the Award Count column, shown by the arrow in Figure 7. This will produce the filter dialog for that column, as in Figure 8.

Figure 8

image

The filter dialog shows that no Laureate (person or organization) has ever won more than 3 Nobel Prizes (at the time of this writing, anyway). Interesting.

7. Click the checkbox next to 1, shown by the arrow in Figure 8, to deselect it. At this point, 2 and 3 should stay selected.

8. Click OK. This should produce a result similar to Figure 9.

Figure 9

image

Cool. The International Committee of the Red Cross has won 3 Nobel Prizes. Good on ya. Marie Curie is listed as well. She was actually the first person to win two. That rules.

That is the Group By transformation in a nutshell. Before we continue, we will want to go back to what we had in Figure 4.

9. Click the X next to both the Filtered Rows 1 and Grouped Rows steps, as shown by the arrow in Figure 9. Note: The X will appear for the Grouped Rows step once it has focus. This will bring us back to the beginning, as shown in Figure 10.

Figure 10

image

The Transpose transformation will basically pivot all rows into columns. There are enough rows currently in this dataset that we end up losing data while doing this. So, we will not be showing an example of it. But feel free to play around with Transpose on a smaller set of data.

Any Column Transformations

When data first comes into Power Query from a source that is not strongly typed, like a text file or the API that is the source for this query, columns will have the Any data type. This is not a type that you can assign. Rather, it is the one Power Query uses when it cannot be sure of the data type for a column. To see the data type for a column, highlight the column and look at the top of the Any Column transformation box, shown in Figure 11.

Figure 11

image

In fact, if you examine the data type for each column in the query, they are all Any at this point. One of the things that is often done during the transformation of data is to change the data type of columns. Perhaps you have dates stored as numbers in your source and need them to actually be Dates to enable calendar-based calculations. Or perhaps you have a numerical value stored as text in a file and you need to convert it to an actual number in order to perform mathematical calculations.

10. Select the First Name, Last Name, Gender, Award Year, and Award Category columns.

11. Click the Data Type drop-down and select Text, as shown in Figure 12.

Figure 12

image

In case you are wondering, Yes, I am making the Award Year a text column. Why? Well, when will it EVER make sense to do math with this column? Will I SUM it? Average it? Heck no. So, since it will never make sense as a participant in a mathematical calculation, I will prevent my end users of the resulting data model from attempting it by making it a Text column. Decisions like this are a good example of helping to save end users from themselves.

12. Click the Awardees column to select it.

13. Click the Data Type option and set it to Whole Number, like in Figure 14. This field contains the number of awardees that shared any particular awarding of a Nobel Prize. Since this is a Count, it will make sense to perform mathematical operations with it. Thus, we make it a Whole Number.

Figure 13

image

Since we have made these data type changes in succession, I want to call your attention to the fact that Power Query was smart enough to put them into a single step rather than one for each change. The APPLIED STEPS shown in Figure 15 show that effectively.

Figure 15

image

The last transformation we will cover in this post is the mighty Replace Values transformation. This allows you to search the values in a column and replace one particular value with a different value you specify. The new value must be of the same data type as the replaced value and will replace every instance of the search value with the new value specified. Since it works on every data type (as far as I have determined, anyway), it makes sense that Microsoft chose to place this in the Any Column transformations section of the Ribbon.

14. Click the filter button for the Birth Date column, shown in Figure 16.

Figure 16

image

This will activate the Filter dialog for that column, shown in Figure 17.

Figure 17

image

Note the value by the arrow. This value is not a valid date. So, when we try to convert this column to a date, we will get errors on any row with this value.

15. Click the Cancel button to close the filter dialog.

16. Make sure the Birth Date column is selected.

17. Click the Replace Values button on the Ribbon, shown by the arrow in Figure 18.

Figure 18

image

This will open the Replace Values dialog shown in Figure 19.

Figure 19

image

Enter “0000-00-00” into the Value To Find box and leave the Replace With box empty. The dialog should look like Figure 20.

Figure 20

image

18. Click OK.

19. Click the Filter button on the Birth Date column again. It should resemble Figure 21.

Figure 21

image

In place of the (0000-00-00) value, we have a blank. This will play much better when we convert this to a Date column later. NOTE: There is a good reason we are not converting the Birth Date column to be a Date type in this post. So, if you want to continue to follow along, please leave the Birth Date column as the Any data type for now.

20. Click the Home tab heading to return to that tab.

21. Click the Close & Load button, shown in Figure 22, to save this progress.

Figure 22

image

In the next post, we will show examples of what you can do with Text columns.

Power Query Decathlon – Beginner 03: Managing Columns and Rows

NOTE: This post was created using the November 2014 version of Power Query. In later versions, some of the screens or functionality could be different.

You can download the Excel 2013 file necessary to perform the steps in this post here. To get to the existing query, click the Workbook Queries button on the Power Query ribbon.

Once you have a query pointed at a source in Power Query, it is a good time to select just the columns and rows you actually want to work with. I would strongly advise against the “just pull everything” philosophy. Any processing you do on data you are just going to get rid of later is wasted processing and wasted time.

Managing Columns

If you have not done so, and wish to perform the steps in this post, please follow the steps in the From Web section of the previous post, Power Query Decathlon – Beginner 02: Importing Data. This example will start there as a foundation. Figure 1 shows our starting point.

Figure 1

image

There are a few different methods for selecting columns. The first we will show is the Choose Columns option.

1. Click the Choose Columns button on the Query Editor Ribbon, shown by the arrow in Figure 1. This will activate the column chooser, as shown in Figure 2.

Figure 2

image

Even in this one screen, there are a few options. If you only want a small subset of the columns available, you can click the checkbox next to (Select All Columns) to deselect all columns and then click the checkbox for the columns you want. Or you can click the checkbox to deselect the columns you don’t want.

2. With the method of your choice, ensure that only the columns shown in Figure 3 are selected and click OK. We will not be using the others.

Figure 3

image

Once you have clicked OK, your Results pane should resemble Figure 4.

Figure 4

image

Note that the columns that were NOT checked have been removed. We also have a new step listed in the APPLIED STEPS.

In the previous post in the series, I recommended that you rename objects as soon as you can. Now the we have removed the columns we don’t want, this is a good time to make sure all the remaining columns have the name we want them to have.

3. Right-click on the “firstname” column to reveal the menu shown in Figure 5.

Figure 5

image

4. Choose Rename. This will put the column name in edit mode, as shown in Figure 6.

Figure 6

image

5. Change the name to First Name and hit the <Enter> key. The result will resemble Figure 7.

Figure 7

image

6. Rename the other columns until the have the names you want. The names I chose are shown in Figure 8.

Figure 8

image

As long as you take no other actions in between the renaming of the columns, Power Query will combine all of the renaming actions into a single query step. Schweet.

You can also rename columns by highlighting the column and clicking the Rename button on the Query Editor ribbon, as shown in Figure 9.

Figure 9

image

This is a good option when you are using a device where right-clicking is challenging or impossible. I pretty much always have the option to right-click so I just use that.

We can also change our column selection by using the Remove Columns button. That button and its underlying options are shown highlighted in Figure 10.

Figure 10

image

The first option, Remove Columns, will remove all currently select columns. This is also the default behavior of the Remove Columns button itself.

7. Use Ctrl-Click to select the First Name and Last Name columns, as shown in Figure 11.

Figure 11

image

8. Click the upper half of the Remove Columns button. The result will be that the First Name and Last Name columns will be removed from the query, as in Figure 12.

Figure 12

image

Great. Those columns are gone. However, we will actually want them. So, we will need to undo that action. Notice the Remove Columns step that is shown by the arrow in Figure 12. This is the step that removed those columns.

9. Click the X-shaped thingy-bobber for Removed Columns to delete that step. The result should resemble Figure 13.

Figure 13

image

You also have the option to remove all columns EXCEPT the ones you have selected. This is a great option when your source has a lot of columns and you only want a few of them.

10. Ctrl-click the First Name and Last Name columns to select them both, as shown in Figure 14.

Figure 14

image

11. Click the bottom half of the Remove Columns button, as shown in Figure 14. This will expose the Remove Other Columns option shown in Figure 15.

Figure 15

image

12. Click the Remove Other Columns option. This will remove all columns EXCEPT the First Name and Last Name, as shown in Figure 16.

Figure 16

image

Since we actually DO want those removed columns, we can undo this action by clicking the X next to Removed Other Columns shown by the arrow in Figure 16. This will return us to situation shown in Figure 17.

Figure 17

image

That is all we will cover for Managing Columns. We will cover moving columns around in a future post. Let’s move on to Rows.

Managing Rows

In Power Query, you have the ability to to filter rows based on the values that live in columns. For example, in the dataset we are using for the example, There are some rows that don’t have a value at all in the First Name column. That absence of a value appears in the dataset as (null). In that case, we want to remove the row entirely.

13. Click the triangle to the right-hand side of the column header for First Name, as shown by the arrow in Figure 18.

Figure 18

image

This will expose the Filter dialog for that column, shown in Figure 19.

Figure 19

image

To verify that the rows with the (null) in the First Name are actually rows we want to exclude, we can temporarily filter for just those rows.

14. Click the (Select All) to deselect all of the values.

15. Click the checkbox next to (null) so that ONLY that value is selected. Your filter dialog should resemble Figure 20.

Figure 20

image

16. Clock OK to apply that filter to the dataset. This will only keep rows that have a (null) for the First Name field. All other rows will be removed. This should resemble Figure 21.

Figure 21

image

It is pretty obvious that these rows provide no value to us. Therefore, we can conclude that filtering them from the dataset is probably a good idea.

We have a few options for doing this. We can delete the Filter Rows step like we have deleted steps before. However, we could also merely reconfigure the filter on that step.

Note: Query Steps build on each other. Renaming a step in the middle is likely to result in subsequent steps breaking as they would reference the original name of the step and not the new name. In this example, we have no subsequent steps, however, I feel it is important to show you how to change an existing step so you know you have that option when you need it.

17. Click the gear icon to the right-hand side of the Filtered Rows step, shown by the arrow in Figure 21. This will launch a different filter dialog than we have seen before. That new dialog is shown in Figure 22.

Figure 22

image

Right now, our query is filtered to only return rows where the First Name column EQUALS null. We want to make the exact opposite choice, ie we only want to keep rows where the First Name column  DOES NOT EQUAL null.

18. Click the triangle to the right-hand side of the equals operator, shown by the arrow in Figure 22. This will expand to show all of the operators we have available to us, as in Figure 23.

Figure 23

image

23. Click the “does not equal” value from the list so that the dialog now looks like Figure 24.

Figure 24

image

24. Click OK to commit the change to the Filtered Rows step. Your query should not resemble Figure 25.

Figure 25

image

If you like, you can click the arrow icon (now shown with a filter icon as well) in the First Name column header to see that all values EXCEPT (null) are selected.

Sometimes, you want to keep/remove rows based on their position instead of by a filter on values in columns.

Suppose you want to just keep the top 10 rows. To do that, we use the Keep Rows button, shown in Figure 26.

Figure 26

image

25. Click the Keep Rows Button.

26. Click the Keep Top Rows option. This will activate the dialog shown in Figure 27.

Figure 27

image

27. Enter 10 and then hit your <Enter> key or click OK. The results should resemble Figure 28. The Keep Bottom Rows would work in a very similar way.

Figure 28

image

28. Delete the Keep First Rows step shown by the arrow in Figure 28, returning the results to what appears in Figure 29.

Figure 29

image

If the rows you want to keep are in the middle, you can do that as well. Suppose we want to keep just the two rows for Marie Curie, rows 6 and 7.

29. Click the Keep Rows button and then the Keep Range of Rows option. This will bring up the dialog shown in Figure 30.

Figure 30

image

30. Since the first row we want is row 6, the first Marie Curie row, we enter 6 for the First Row. Since there are two rows we want, we enter 2 for the number of rows.

31. With those values entered, as in Figure 30, hit your <Enter> key or click OK. This will produce the result shown in Figure 31.

Figure 31

image

32. Delete the Row Range step to undo the last action and return the dataset to what you see in Figure 32.

Figure 32

image

The Remove Rows options work very much the same way, just removing them instead of keeping them. I encourage you to play around with them. We won’t walk through them here.

33. Click the Close and Load button to save this query as we will use this as a base for the remainder of this series.

34. Save the Excel file so you won’t have to do all this again… unless you want to. It is pretty fun. Smile

That’s it for this post. In the next one, we will start doing some transformations on the data.

Power Query Decathlon – Beginner 02: Importing Data

NOTE: This post was created using the November 2014 version of Power Query. In later versions, some of the screens or functionality could be different.

In this post, we will demonstrate a few of the options available for importing data into Excel with Power Query. There are a lot of options available, many with their own options. An exhaustive examination of all of them is outside the scope of this post and this series. Since a common use for self-service BI tools in Excel is to mash up data from your Enterprise data sources with data from elsewhere, we will focus on a few sources common in that situation.  For now, we will be covering the following:

1. From Table

2. From File (Text File)

3. From Database (From SQL Server Database)

4. From Web

If we are not covering a source you are wanting to learn about, please follow this link to the Office Online documentation from Microsoft. Also, feel free to comment on this post as well. A series that walks through different sources and how to use them is certainly not out of the question.

1. From Table

The From Table source allows you to take data residing in an Excel worksheet and import it into the Excel Data Model where you can easily relate it to other data using Power Pivot.

In this example, we have a list of the Nobel Prize categories along with ID values that have been assigned to them matching our internal database. This list is formatted as a table, but it does not have to be as the From Table source allows for specifying a range of cells as well. Figure 1-1 shows the table of data in Excel.

Figure 1-1

image

Since the data in this table is a list of Categories, I named the table Category. I highly recommend naming objects appropriately right away. The sooner you give objects the names you want them to have, the easier your life will be (as it relates to that data anyway).

1-1. With a cell in the table selected, click the Power Query tab of the Excel Ribbon, as shown in Figure 1-2.

Figure 1-2

image

1-2. On the Power Query Ribbon, click the From Table button, as shown in Figure 1-3.

Figure 1-3

image

This will launch a new query based on that table and open the Query Editor, as shown in Figure 1-4.

Figure 1-4

image

It’s that simple. Note that the name of the Query matches the name of the Table in Excel. Bonus.

2. From File (Text File)

There are a handful of options for using the From File source. Those options are shown in Figure 2-1.

Figure 2-1

image

For this example, we will use the From Text option to use a .txt file as a source.

2-1. Click the From Text option as shown in Figure 2-1. This will open a file browser window for selecting the source file as shown in Figure 2-2.

Figure 2-2

image

We will be using a text file I created listing out the Nobel Prize winners, but you can use any text file you want that has tabular data in it. By “tabular” I mean that a file containing paragraphs of text is not a great choice. Rather, you want something where the information is laid out in columns and rows.

2-2. To select the file and continue, you have a few options.

a. Double-click the file

b. Click the file and hit the <Enter> key on your keyboard

c. Click the file and click the Open button

Whichever option you choose, Power Query will launch a new query and open the Query Editor for it, as shown in Figure 2-3.

Figure 2-3

image

Notice that the name of the Query is Query1. Unlike the From Table source, the From File source does not, as of the November 2014 release, automatically use the file name as the Query name. Also notice that there are three steps in the APPLIED STEPS pane. The From File source WILL automatically detect (with great accuracy in my experience) whether the first row in the file makes suitable column headers. If so, it will apply that transformation so you do not have to. The Change Type step looked for numeric data and converted any columns with just numbers into an appropriate numeric format.

3. From Database (From SQL Server Database)

Power Query can extract data from many different database systems. The full list is shown in Figure 3-1.

Figure 3-1

image

In this example, we will pull data from a SQL Server database.

3-1. Click the From SQL Server Database Option. This will launch the dialog shown in Figure 3-2.

Figure 3-2

image

You will need the name of a SQL Server database engine instance. In my case, I have a default instance running on my laptop, so I can use “localhost” as my server name. You have the option to enter a database name as well. If not, you will have to navigate to the database you want in a future step. You also have the option of entering a Transact SQL query to extract your data. For this example, we will start by just entering the server name and continuing.

3-2. Enter the name of the database server (or localhost if you are working on a machine with the database engine instance you want) and either hit your <Enter> key or click OK. This will launch the window shown in Figure 3-3.

Figure 3-3

image

Connecting to a SQL Server database requires authentication. The default is to just use the credentials of the user currently logged on. However, if you have specific credentials for SQL Server, you can click the Database option shown in Figure 3-3 and enter that username and password there. In this example, I will use my Windows login so I can just click Save.

3-3. Click Save to use the method you have chosen.

If you are connecting to a SQL Server instance that has not been configured to use encrypted connections, you should receive the message shown in Figure 3-4 the first time you connect to that server.

Figure 3-4

image

If you do see this message, and are not going to use data that would put privacy at risk, you can click OK and continue. In my case, I am connecting to a database engine instance on my laptop and using public data, so this is not an issue.

If the server you are connecting to has been configured to use encrypted connections, then this message will not appear. Whether you get this message and click OK or you do not get this message at all, you should see a Navigator pane at the right listing the databases on the server, as shown in Figure 3-5.

Note: The instructions for configuring your SQL Server database engine to allow encrypted connections can be found here.

Figure 3-5

image

On my server, I will use the AdventureWorksDW2014 database. You can use whatever database you like.

3-4. Click the triangle to the left of the database you want to use, as shown by the arrow in Figure 3-5 above. This will expand that node to reveal the tables and views in that database, which, for the AdventureWorksDW2014 database will resemble Figure 3-6.

Figure 3-6

image

3-5. Hover the cursor over the object of your choice to reveal the Peek for that object. The Peek for the DimDate table is shown in Figure 3-7.

Figure 3-7

image

3-6. Click the Edit button, shown by the arrow in Figure 3-7, to open the query editor for a new query based on the selected object. A new query based on the DimDate table is shown in
Figure 3-8.

Figure 3-8

image

Notice that Power Query automatically used the name of the source object as the name of query. Cool. You can also see that there are two query steps in this query so far, the Source and the Navigation.

Clicking on the gear icon to the right of the Source in the APPLIED STEPS will show the server connection screen shown above in Figure 3-2 and repeated here as Figure 3-9.

Figure 3-9

image

Clicking the gear icon to the right of the Navigation step will reveal the object explorer from
Figure 3-6 repeated here as Figure 3-10.

Figure 3-10

image

Steps with gears are really helpful as they allow you to see the specific choices that were made in that step. Many steps perform simple enough actions that a gear icon is not necessary.

4. From Web

Before I get into this one, I think a quick note is in order. I used this as the foundation for my PASS Summit 2014 presentation, Power Query: Data Chemistry for the Masses. Less than two weeks before Summit, I followed (largely by accident) the following steps to consume a API using JSON,  without really knowing how to explain what JSON even was, with ZERO code. I was amazed. Right then and there, I revamped my demo to have this as a foundation. I had the great pleasure of sharing my joy about this with members of the Power Query team during Summit. That was awesome to let people who work extremely hard to produce great products know that their work is appreciated. I recommend you do that same if you get the opportunity.

NOTE: Revamping a demo really close to when the presentation is scheduled is not a great idea, particularly for an inexperienced speaker. If I may borrow a line from Charlie in Top Gun when analyzing flight footage of Maverick performing a split S, “The encounter was a victory, but I think we have shown it as an example of what not to do.”

The From Web source is useful when we already have a URL for a web page from which we want to pull data. It is through this source that we can utilize data sources that are not in the list of sites indexed for use by the Online Search source. The list of public data sources used by Online Search can be found here.

The following is the URL for the API from the Nobel Prize website allowing us to pull information directly from the authority on this data. The URL will communicate with the API using JSON (Java Script Object Notation). Feel free to follow the link in your browser and see the raw data.

http://api.nobelprize.org/v1/laureate.json

4-1. Click the From Web button on the Power Query Ribbon, shown with the arrow in Figure 4-1.

Figure 4-1

image

This will open the From Web dialog shown in Figure 4-2.

Figure 4-2

image

4-2. Enter the URL as shown in Figure 4-2 then hit the <Enter> key or click OK. This will establish a new query using that data returned as a source, as shown in Figure 4-3.

Figure 4-3

image

I know what you’re thinking. “Mark, dude, what the heck kind of dataset is that?” JSON is hierarchical. What we see right here is just the top level of the hierarchy, ie the List itself.

4-3. Click the Into Table button shown with the arrow in Figure 4-3. This will convert that JSON List object into a Table for easier use in Power Query and result in what is shown in Figure 4-4.

Figure 4-4

image

That’s not much better. However, we now see a handy little “Expand” button, shown by the arrow in Figure 4-4. This will “expand” our list items into Records to make them easier to use.

4-4. Click the Expand button. This will reveal the Records, as shown in Figure 4-5.

Figure 4-5

image

OK. We’re getting there.

4-5. Click the Expand button again to expand another level. This will produce a new pane as shown in Figure 4-6.

Figure 4-6

image

There we go. That will give us far more useful columns than we have seen thus far. If we click OK with the “Use original column name as prefix” option checked, each column name will be prefaced with “Value.” As such, the firstname column would be imported as Value.firstname. As a matter of personal preference, I prefer NOT to have that happen here, even though I do appreciate having the option. So, for this example, we will uncheck that option.

4-6. Click the “Use original column name as prefix” checkbox, shown by the arrow in Figure 4-6, to deselect it.

4-7. Hit your <Enter> key or click OK. This will produce what is shown in Figure 4-7.

Figure 4-7

image

We are starting to see some meaningful data now. We have a single row for each Nobel Laureate.

4-8. Scroll over all the way to the right to see the columns that are currently on visible. The result should resemble Figure 4-8.

Figure 4-8

image

We see our friendly Expand button again. The reason there is another level here is that some Laureates have one more than one Nobel Prize. Imagine that. “Oh, you only have ONE Nobel Prize? How cute.”

4-9. Click the Expand button, resulting in a separate record for each awarding of a Nobel Prize. The result should resemble Figure 4-9.

Figure 4-9

image

Now we have a row for each awarding of each prize. Grand.

4-10. Click the Expand button again to reveal additional columns about each awarding of each prize, as shown in Figure 4-10.

Figure 4-10

image

Well, how cool is that? Power Query remembered our preference from last time and left the “Use original column name as prefix” deselected.

4-11 Click OK of hit the <Enter> key to add the new columns to our dataset. The result is shown in Figure 4-11.

Figure 4-11

image

We now have all the columns we need. We could expand the affiliations column but I think we have made the point here.

Just to recap, we just consumed a JSON API using just a few clicks of the mouse and ZERO code. That’s right. We didn’t write a single line of code ourselves. That is just awesome in my humble opinion. We will be using the dataset create in steps 4-1 through 4-11 for much of this series, so if you want to save this by clicking the Close & Load button, shown by the arrow in Figure 4-11, you can do so. You can then save the Excel file and the query will be saved as well.

For now, we will move on to managing columns and rows.

Power Query Decathlon – Beginner 01: Installation and Tour

NOTE: This post was created using the November 2014 version of Power Query. In later versions, some of the screens or functionality could be different.

Power Query, a free add-in for Excel 2010 and Excel 2013 is easily installed. Just search for “power query download” in your search engine of choice. Or click on Figure 1.

Figure 1

image

It is important that you download the version that matches your version of Excel, 32-bit or 64-bit.

Figure 2

SNAGHTML7152be

To determine which version of Office you are running, see this link.

The installation wizard will walk you through the process. The steps are clear so I will not go into it here. Once the installation is complete, there will be a new option for you in Excel.

Figure 3

SNAGHTML53341c

Clicking this option will activate the Power Query ribbon. This is where the journey of working with Power Query begins.

Figure 4

SNAGHTML598210

Figure 5 shows the basic options for sourcing data. We will go into these options, and demonstrate a few, and the next post in this series.

Figure 5

SNAGHTML6263fd

Figure 6

SNAGHTML64ad9eThe options in Figure 6 are used for combining queries that already exist, Merge and Append. We will cover those later in the series as well.

 

 

 

 

 

Figure 7

SNAGHTML683696

The option shown in Figure 7, Workbook Queries, is used to turn the Workbook Queries pane on or off.

 

 

 

 

 

Turning it on allows you see the queries that already exist in the workbook, as shown in Figure 8.

Figure 8

SNAGHTML6a3266

Once you have started a query by choosing a data source, or you have opened an existing query, you will have access to the Query Editor, as shown in Figure 9.

Figure 9

image

The sections of the Query Editor are as follows (corresponding to the labels in Figure 9):

1. Query Editor Toolbar/Ribbon: This is where we find the various buttons we can use for taking actions on our data. For a while, it was just called the Query Editor Toolbar. Lately, I have been seeing documentation from Microsoft referring to it as the Query Editor Ribbon. I prefer to call it the Ribbon since the similar structure in the rest of Office is called the Ribbon.

2. Formula Bar: All actions taken in the Power Query interface actually generate code using the Power Query Formula Language, known informally as M. In the Formula Bar, you can see the M code the currently selected query step. This bar is turned off by default. When you are first getting started, leaving it off is just fine. Once you feel ready to start learning more about M, then turning it on can help you with that by showing you the M code for each step you take.

3. Results Pane: This window shows you what your data looks like AFTER the execution of the currently selected query step.

4. Query Settings: This window holds a few properties of the query as a whole. The first the name of the query. I encourage you to choose an appropriate name for each query you make. Not only will it make more sense when you or someone else looks at it later, but it will also determine the name of the resulting table that gets created once you have loaded the results of the query to Excel.

This concludes this first post on the series. In the next post, we will start putting Power Query to use for importing data.

Power Query Decathlon – Beginner: Introduction

Power Query is an add-in for Excel that helps us easily import data from many different sources into Excel. During that process, we have the option to create new columns, change existing ones, filter out rows, and generally shape the data before it arrives in Excel. This is different from the capabilities we have had in Power Pivot as Power Query allows us to work with data BEFORE it arrives in our data model. With Power Pivot, you must import the data before you can do anything else with it. I don’t mean this as a dig against Power Pivot. Power Pivot is an amazing tool and I absolutely love it. Power Query merely fills a gap in the user story. If you are familiar with SQL Server Integration Services, you can think of Power Query as the End User version. That is not totally fair to Power Query, but if that helps with your foundation, then go for it.

A Power Query query is a job that imports data from one or more sources, optionally modifies or shapes that data, then loads the results to Excel. A query is composed of Steps arranged in a specific order. Each action you take in Power Query will produce steps. Sometimes multiple similar actions will be combined into a single step. Occasionally, a single action will result in multiple steps.

This series will go through various capabilities of Power Query and show how to use some of them. As this is meant for beginners, I will not be covering the Power Query Formula Language (informally known as “M”). That is the language behind the scenes in Power Query. Everything you do in the tool generates M code that will be executed against your data. M is tremendously powerful and takes Power Query to exciting places. However, I think that is best left for a deeper dive. Since the vast majority of what people will need to do with Power Query is possible directly in the user interface provided, we will focus there.

Being a Decathlon, this series will have 10 installments:

Power Query Decathlon – Beginner 01: Installation and Tour

Power Query Decathlon – Beginner 02: Importing Data

Power Query Decathlon – Beginner 03: Managing Columns and Rows

Power Query Decathlon – Beginner 04: General Transformations

Power Query Decathlon – Beginner 05: Text Columns

Power Query Decathlon – Beginner 06: Number Columns

Power Query Decathlon – Beginner 07: Date Columns

Power Query Decathlon – Beginner 08: Error Handling

Power Query Decathlon – Beginner 09: Loading Results

Power Query Decathlon – Beginner 10: Query Management

By the end of this series, you should have a solid foundation with which to start working with Power Query.

Note: At the time of this writing, automating the refresh of data in Power Query is only possible if you have a Power BI subscription. As such, it is difficult to consider Power Query to be ready as an Enterprise resource. It is still a great tool for end users and for targeted solutions where total automation is not necessary. That said, it is safe to assume that Microsoft is aware of this limitation and we will likely see things improve as Power Query matures as a product.

Power Query is not bound to the development cycle of any other tool. As such, it is updated pretty frequently, sometimes monthly. This series was put together using the November 2014 version of Power Query. In other versions, screens and functionality could be different (likely better).