Power Query Decathlon – Beginner 08: Error Handling

27 January, 2015 (10:00) | Decathlon, Power Query | By: Mark V

NOTE: This post was created using the November 2014 version of Power Query. In later versions, some of the screens or functionality could be different.

You can download the Excel 2013 file necessary to perform the steps in this post here. To get to the existing query, click the Workbook Queries button on the Power Query ribbon.

Dealing With Errors Found After Close & Load

In the last post in this series, we worked with Date columns, namely our Birth Date column. After some transformation examples, we ended up preserving just the Year part of the that column and replacing our Birth Date column with a Birth Year column. Grand. However, when that query was saved, that were two rows that generated errors. Figure 1 shows this.

Figure 1

image

When there are errors resulting from a query, the number of errors will appear to the right of the Rows Loaded information in the Workbook Queries pane of the Power Query tab in Excel. The error information is not just informative. It also serves as a hyperlink to a special query filtered for just the error rows involved.

1. Click the “2 errors” hyperlink. This should launch a new query called Errors in Query1, as shown in Figure 2.

Figure 2

image

2. Click the white space to the right of the top “Error” value in the Birth Year column. This will reveal the error for that column on the row selected. The result should resemble what is shown in
Figure 3.

Figure 3

image

When you click on a row in a particular column, the contents of that cell appear in a new box at the bottom of the query window. If the selected cell is an Error cell, then the result is that the details of the error are revealed.

Note: You can also click on the Error value itself to get this information. However, that creates a new query step that filters for just that one row, making it difficult to look at more than one row quickly. So, I prefer to click next to the Error value instead as it makes it easier to go from one row to another.

In this case, the error is that Power Query could not isolate the Year value from “1891-00-00” as that value is not a valid date and therefore has no Year component.

3. Click the white space to the right of the Error value on the second row. That is shown in
Figure 4.

Figure 4

image

That looks familiar. Both of these records have the same problem. For the Nobel Laureate in question, the prize committee’s records included only the Year of birth, but not the exact date.

The good news is that, for our purposes, we only want the Year anyway. However, we will need to take some action in order for Power Query to capture the Birth Year values for these two Laureates.

You will note that there are transformation options here in this Errors in Query1 query. If you make changes here to fix the errors and Close & Load this query, it will create a new query in your workbook that does not load anywhere but is accessible from other queries. The “Load is disabled” note will be explained in the next post on Loading Results. Figure 5 shows what it looks like after saving the Errors query.

NOTE: We will not be saving this Errors in Query1 query as part of this post. So, the next image is just for information and does not accompany specific actions you should take if you are following along.

Figure 5

image

This gives you the option of filtering the errors out of the Query1 query altogether (we will show that process later) and only dealing with them in the second, Errors in Query1 query. You could then append the cleaned up rows from the Errors in Query1 query to the Query1 query to bring them back into the dataset.

This ability is similar to the ability of SQL Server Integration Services to route Error rows along a different path so that the errors can be dealt with. While I have not done this much yet with Power Query, I very much appreciate having the option.

OK. Back to the example. Both of our errors have the issue that the proper Year value in their Birth Date is followed by “-00-00” which is certainly invalid for both the Month and the Day. The fact that their issues are identical is very convenient for us because it allows us to easily fix both errors with a single action.

4. If you still have the Errors in Query1 query open, as shown in Figure 4, please click the X in the upper right of the query window, as shown in Figure 6, to close it.

Figure 6

image

Power Query will know that the query has not been saved and will prompt you to do so, as shown in Figure 7.

Figure 7

image

5. Since we do not want to keep a separate query for these errors, choose Discard, shown by the arrow in Figure 7. This will take us back to the Workbook Queries pane of the Power Query tab, as shown in Figure 8.

Figure 8

image

6. Double-click the Query1 query to open it. This will open the Query Editor, as shown in Figure 9.

Figure 9

image

The query will open to the final step, Renamed Columns1. However, at this step, the Birth Date column had already been converted to a Date. In order for Power Query to properly deal with our invalid dates, we must change the values for our two errors BEFORE the conversion to Date occurs. Since we changed Birth Date to a Date in the Changed Type1 step, shown highlighted in Yellow in Figure 9, we will have to insert a new step just before that, just AFTER the Added Custom step.

7. Click the Added Custom step in the APPLIED STEPS, shown by the arrow in Figure 9. This will change our Results Pane to the state of the query AFTER that step has been applied but BEFORE any subsequent step. This is shown in Figure 10.

Figure 10

image

Note that, as of this step, the Birth Date column is still of the Any data type. This is the perfect spot in the query for us to change some invalid date values to valid ones.

8. With the Birth Date column selected, click the Replace Values button, shown in Figure 10. This will reveal the warning shown in Figure 11. Note: There is a Replace Values button on the Transform tab as well. They work the same.

Figure 11

image

Since the steps in a query build upon each other, and have a specified order, adding new steps in the middle can cause issues. For example, if I add a new step here that removes columns, those columns will still be referenced in subsequent steps and those steps will become invalid and broken and unhappy. However, in our case, we are not making structural changes to our query, we are just replacing values in the data itself. So, we will be fine inserting our Replace Values step in the middle.

9. Click the Insert button shown by the arrow in Figure 11. This will activate the Replace Values dialog shown in Figure 12.

Figure 12

image

10. Enter “-00-00” in the Value to Find and “-01-01” in the Replace With boxes, respectively. Make sure the Match Entire Cell Contents box is NOT checked. The reason is that we only want to change the –00-00 parts of the values with –01-01. If we check that box, then Power Query will search for rows where the Birth Date columns contains the value –00-00. That would find zero results.

11. Click OK. The result should resemble Figure 13.

Figure 13

image

The benefit of taking this step will not be immediately apparent in the Results Pane. However, if you click through the steps following this new Replaced Value1 step, you should see that they all return data in the Results Pane. This means that there are no structural problems with any of those steps.

12. Click each step after Replaced Value1 in the APPLIED STEPS, shown by the arrows in Figure 13. The result of the final step, Renamed Columns1, is shown in Figure 14.

Figure 14

image

To make sure we have properly dealt with the two error rows, it is necessary for us to Close & Load the query.

13. Click the Close & Load button shown by the arrow in Figure 14. This will execute query and return us to the Workbook Queries pane of the Power Query tab, as shown in Figure 15.

Figure 15

image

The absence of any mention of Errors shows that our fix worked.

Setting Up Errors

Sometimes, you will take steps that reveal errors right away in your dataset and you can actually see them in the Query Editor. This section will create some errors for us to use as examples.

14. Double-click the Query1 query to open it in the Query Editor, as shown in Figure 16.

Figure 16

image

In this example, we will take specific steps cause some errors to show a few of the options available for dealing with rows that have errors in them. Our first step will be to modify some values in the Award Year column that will fail to convert to a whole number.

15. With the Award Year column selected, click the Replace Values button shown by the arrow in Figure 16. This will activate the Replace Values dialog we have seen before, as shown in Figure 17.

Figure 17

image

16. Type “1903” into the Value To Find box and a alpha text string of your choice in the Replace With box.

17. Click OK. The result should resemble Figure 18.

Figure 18

image

Note the text string we chose now appears in the Award Year column for the rows that contained 1903 as the Award Year. Also note that the Award Year column, which no longer contains values that are clearly one particular data type, has been converted to an Any data type by Power Query. This is something to keep an eye on. Even if you make specific data type assignments, subsequent actions could override that. You might be tempted to just avoid data type assignments until the last step of the query. However, you need to remember that some transformations are only available on columns of a specific data type. So, in reality, this is just something you need to watch out for. Double-checking at the very end to make sure columns have the types you want is not a bad idea.

18. Click the Data Type button in the Transform section of the Home tab of the Ribbon, shown by the arrow in Figure 18. This will show the Data Type options in Figure 19.

Figure 19

image

19. Choose Whole Number, as shown by the arrow in Figure 19. The results should resemble Figure 20.

Figure 20

image

Our text values failed to convert to a Whole Number and are therefore creating an Error state on those rows for the Award Year column.

When you have Errors in rows, you have a few options. These are found in two different places, which might seem counter-intuitive. However, since they behave quite differently, I can see what the decision was made.

Remove or Keep the Error Rows

You have the option of removing rows with an error state or even filtering your dataset to return only the error rows. Both of these options are found on the Home tab of the Ribbon in the Reduce Rows section.

20. With the Award Year column selected, click the Remove Errors button, shown by the arrow in Figure 20. This will completely remove from the dataset any rows that have an Error in the Award Year column. The results should resemble Figure 21.

Figure 21

image

The error rows have been filtered out of the query and will no longer participate in any subsequent query steps or appear in the results that are loaded into Excel.

21. Click the X next to the shiny new Removed Errors query step, shown by the arrow in
Figure 22. This bring us back to the query we had in Figure 20, with the three error rows present.

Figure 22

image

Suppose we want to keep only the error rows and get rid of the rest of the rows.

22. Click the tiny triangle on the right-hand side of the Remove Errors button, shown by the arrow in Figure 23.

Figure 23

image

Instead of just removing the error rows, which is the default operation performed when clicking the Remove Errors button, this will reveal the options shown in Figure 24.

Figure 24

image

You have the option of still just removing the errors. However, you also have the option to keep the error rows and dump the rest.

23. Choose Keep Errors, as shown by the arrow in Figure 24. This will produce the results shown in Figure 25.

Figure 25

image

The only rows left in our dataset are those that have an Error in the Award Year column. This shows that we had not sorted our dataset chronologically by Award Year since several 1903 rows were not visible before. We don’t want to keep this result, though.

24. Click the X next to the Kept Errors step to delete it and return to the state shown in Figure 26.

Figure 26

image

Replace Errors

You also have the option to replace the Error state with a specific value that you specify. This is similar to the Replace Value option, but instead of specifying a Value to Find, Power Query just looks for the Error state. This option is on the Transform tab.

25. With the Award Year column selected, click the Replace Errors button in the Any Column section of the Transform tab. This will activate the Replace Errors dialog as shown in Figure 27.

Figure 27

image

From here, you can decide what single value to use in place of your errors. The value you choose will replace every instance of the Error on the selected column(s) and must match the data type of the column(s). For example, a blank (no value), is considered to be of the Text data type by Power Query. Since the Award Year column in this example has been converted to a Whole Number, a blank will not be accepted here.

26. Leave the Value textbox empty and click OK. The result is shown in Figure 28.

Figure 28

image

Today, the role of HAL 9000 will be played by Power Query: “I’m sorry, Dave. I’m afraid I can’t do that.”

This means that you will sometimes have to give some thought when deciding when and how to deal with errors.

27. Enter 0 for the Value instead of leaving it blank.

28. Click OK. The result should resemble Figure 29.

Figure 29

image

The errors rows remain, but their values have been changed to a value that does not produce and Error state. Grand.

Back in Step 13, we saved the Query1 after fixing those two error rows in Birth Year. Everything we have done since then was just to show how to deal with errors we specifically created. At this point, we can just close the query and discard changes and we will be ready for the next post in the series.

29. Click the X in the upper right corner of the Query Editor to close it.

30. Choose Discard from the screen that follows.

That takes care of dealing with Errors. In the next post, we will talking about Loading results to Excel.

Write a comment