NOTE: This post was created using the November 2014 version of Power Query. In later versions, some of the screens or functionality could be different.
You can download the Excel 2013 file necessary to perform the steps in this post here. To get to the existing query, click the Workbook Queries button on the Power Query ribbon.
It is time to start covering Transformations. These are what really help set Power Query apart from the capabilities offered by Power Pivot. Power Pivot has long had a respectable list of sources you could use. However, it has never provided the ability to transform/shape/modify data before pulling it into the data model. Sure, you could do that sort of thing in source queries, but that only helps when pulling data from a source that directly supports queries, like a database. With Power Query, once you have pulled data from a source, you have a lot more work you can do before sending it to its final destination (reference to terrible movie franchise not intended).
While there are Transformations available to you from the Home tab of the Query Editor, the full list of available options is available on the Transform tab. For this post, we will focus on the Transformation tab.
1. Click the Transform tab heading as shown in Figure 1.
This will expose the Transform tab of the Ribbon, as shown in Figure 2.
As you can see in Figure 2, the available Transformations are organized into the following groups:
- Any Column
- Text Column
- Number Column
- Date & Time Column
For this post, we will show examples from the first two options, Table and Any Column.
Table Transformation, as their name suggests, operate on the table as a whole. Figure 3 shows a closer look at those options.
The Use First Row as Headers option is pretty straightforward and we have already seen that in the post on importing data. We will not cover it again here.
For now, we will start with the Group By transformation. This is useful when you have a table of detail data that you want to summarize in some way. For this example, we will use the Group By transformation to return a list of the Nobel Laureates that have one more than one Nobel Prize.
We will start with the query we saved from last time, as shown in Figure 4. If you have not done so, you will need to follow the steps in the previous post, Power Query Decathlon – Beginner 03: Managing Columns and Rows, to create the query show in Figure 4.
2. Using whatever method you like, select both the First Name and Last Name columns. The main option in Windows is to hold down the Ctrl key while clicking each column in succession.
3. Once they are selected, click the Group By button on the Ribbon, as shown in Figure 5.
Doing this will activate the Group By dialog shown in Figure 6.
The Group By section shows the columns we will be grouping by, First Name and Last Name. Notice there is a + to the right of Group By. This is so that we can add additional columns do group by. Since we selected the two columns manually, they both added themselves for us.
The New Column Name toward the bottom shows the new aggregation that will be performed as part of this Group By. The default is to perform just a count of rows according to the Group By columns. In our case, each row represents and awarding of the Nobel Prize, so this default is just fine. Note that you can also choose NOT to add any new columns, and merely produce a distinct list. If you want to do that, you can click the – to the right of the Count column. This will remove that entirely. For now, we will just specify a different name for our new column.
4. Change the New Column Name value, shown by the arrow in Figure 6, to read Award Count.
5. Click OK. This will perform the transformation and produce a result like the one shown in Figure 7.
6. Click the filter button for the Award Count column, shown by the arrow in Figure 7. This will produce the filter dialog for that column, as in Figure 8.
The filter dialog shows that no Laureate (person or organization) has ever won more than 3 Nobel Prizes (at the time of this writing, anyway). Interesting.
7. Click the checkbox next to 1, shown by the arrow in Figure 8, to deselect it. At this point, 2 and 3 should stay selected.
8. Click OK. This should produce a result similar to Figure 9.
Cool. The International Committee of the Red Cross has won 3 Nobel Prizes. Good on ya. Marie Curie is listed as well. She was actually the first person to win two. That rules.
That is the Group By transformation in a nutshell. Before we continue, we will want to go back to what we had in Figure 4.
9. Click the X next to both the Filtered Rows 1 and Grouped Rows steps, as shown by the arrow in Figure 9. Note: The X will appear for the Grouped Rows step once it has focus. This will bring us back to the beginning, as shown in Figure 10.
The Transpose transformation will basically pivot all rows into columns. There are enough rows currently in this dataset that we end up losing data while doing this. So, we will not be showing an example of it. But feel free to play around with Transpose on a smaller set of data.
Any Column Transformations
When data first comes into Power Query from a source that is not strongly typed, like a text file or the API that is the source for this query, columns will have the Any data type. This is not a type that you can assign. Rather, it is the one Power Query uses when it cannot be sure of the data type for a column. To see the data type for a column, highlight the column and look at the top of the Any Column transformation box, shown in Figure 11.
In fact, if you examine the data type for each column in the query, they are all Any at this point. One of the things that is often done during the transformation of data is to change the data type of columns. Perhaps you have dates stored as numbers in your source and need them to actually be Dates to enable calendar-based calculations. Or perhaps you have a numerical value stored as text in a file and you need to convert it to an actual number in order to perform mathematical calculations.
10. Select the First Name, Last Name, Gender, Award Year, and Award Category columns.
11. Click the Data Type drop-down and select Text, as shown in Figure 12.
In case you are wondering, Yes, I am making the Award Year a text column. Why? Well, when will it EVER make sense to do math with this column? Will I SUM it? Average it? Heck no. So, since it will never make sense as a participant in a mathematical calculation, I will prevent my end users of the resulting data model from attempting it by making it a Text column. Decisions like this are a good example of helping to save end users from themselves.
12. Click the Awardees column to select it.
13. Click the Data Type option and set it to Whole Number, like in Figure 14. This field contains the number of awardees that shared any particular awarding of a Nobel Prize. Since this is a Count, it will make sense to perform mathematical operations with it. Thus, we make it a Whole Number.
Since we have made these data type changes in succession, I want to call your attention to the fact that Power Query was smart enough to put them into a single step rather than one for each change. The APPLIED STEPS shown in Figure 15 show that effectively.
The last transformation we will cover in this post is the mighty Replace Values transformation. This allows you to search the values in a column and replace one particular value with a different value you specify. The new value must be of the same data type as the replaced value and will replace every instance of the search value with the new value specified. Since it works on every data type (as far as I have determined, anyway), it makes sense that Microsoft chose to place this in the Any Column transformations section of the Ribbon.
14. Click the filter button for the Birth Date column, shown in Figure 16.
This will activate the Filter dialog for that column, shown in Figure 17.
Note the value by the arrow. This value is not a valid date. So, when we try to convert this column to a date, we will get errors on any row with this value.
15. Click the Cancel button to close the filter dialog.
16. Make sure the Birth Date column is selected.
17. Click the Replace Values button on the Ribbon, shown by the arrow in Figure 18.
This will open the Replace Values dialog shown in Figure 19.
Enter “0000-00-00” into the Value To Find box and leave the Replace With box empty. The dialog should look like Figure 20.
18. Click OK.
19. Click the Filter button on the Birth Date column again. It should resemble Figure 21.
In place of the (0000-00-00) value, we have a blank. This will play much better when we convert this to a Date column later. NOTE: There is a good reason we are not converting the Birth Date column to be a Date type in this post. So, if you want to continue to follow along, please leave the Birth Date column as the Any data type for now.
20. Click the Home tab heading to return to that tab.
21. Click the Close & Load button, shown in Figure 22, to save this progress.
In the next post, we will show examples of what you can do with Text columns.
Categories: Decathlon, Power Query
1 reply »