NOTE: This post was created using the November 2014 version of Power Query. In later versions, some of the screens or functionality could be different.
You can download the Excel 2013 file necessary to perform the steps in this post here. To get to the existing query, click the Workbook Queries button on the Power Query ribbon.
Once you have a query pointed at a source in Power Query, it is a good time to select just the columns and rows you actually want to work with. I would strongly advise against the “just pull everything” philosophy. Any processing you do on data you are just going to get rid of later is wasted processing and wasted time.
If you have not done so, and wish to perform the steps in this post, please follow the steps in the From Web section of the previous post, Power Query Decathlon – Beginner 02: Importing Data. This example will start there as a foundation. Figure 1 shows our starting point.
There are a few different methods for selecting columns. The first we will show is the Choose Columns option.
1. Click the Choose Columns button on the Query Editor Ribbon, shown by the arrow in Figure 1. This will activate the column chooser, as shown in Figure 2.
Even in this one screen, there are a few options. If you only want a small subset of the columns available, you can click the checkbox next to (Select All Columns) to deselect all columns and then click the checkbox for the columns you want. Or you can click the checkbox to deselect the columns you don’t want.
2. With the method of your choice, ensure that only the columns shown in Figure 3 are selected and click OK. We will not be using the others.
Once you have clicked OK, your Results pane should resemble Figure 4.
Note that the columns that were NOT checked have been removed. We also have a new step listed in the APPLIED STEPS.
In the previous post in the series, I recommended that you rename objects as soon as you can. Now the we have removed the columns we don’t want, this is a good time to make sure all the remaining columns have the name we want them to have.
3. Right-click on the “firstname” column to reveal the menu shown in Figure 5.
4. Choose Rename. This will put the column name in edit mode, as shown in Figure 6.
5. Change the name to First Name and hit the <Enter> key. The result will resemble Figure 7.
6. Rename the other columns until the have the names you want. The names I chose are shown in Figure 8.
As long as you take no other actions in between the renaming of the columns, Power Query will combine all of the renaming actions into a single query step. Schweet.
You can also rename columns by highlighting the column and clicking the Rename button on the Query Editor ribbon, as shown in Figure 9.
This is a good option when you are using a device where right-clicking is challenging or impossible. I pretty much always have the option to right-click so I just use that.
We can also change our column selection by using the Remove Columns button. That button and its underlying options are shown highlighted in Figure 10.
The first option, Remove Columns, will remove all currently select columns. This is also the default behavior of the Remove Columns button itself.
7. Use Ctrl-Click to select the First Name and Last Name columns, as shown in Figure 11.
8. Click the upper half of the Remove Columns button. The result will be that the First Name and Last Name columns will be removed from the query, as in Figure 12.
Great. Those columns are gone. However, we will actually want them. So, we will need to undo that action. Notice the Remove Columns step that is shown by the arrow in Figure 12. This is the step that removed those columns.
9. Click the X-shaped thingy-bobber for Removed Columns to delete that step. The result should resemble Figure 13.
You also have the option to remove all columns EXCEPT the ones you have selected. This is a great option when your source has a lot of columns and you only want a few of them.
10. Ctrl-click the First Name and Last Name columns to select them both, as shown in Figure 14.
11. Click the bottom half of the Remove Columns button, as shown in Figure 14. This will expose the Remove Other Columns option shown in Figure 15.
12. Click the Remove Other Columns option. This will remove all columns EXCEPT the First Name and Last Name, as shown in Figure 16.
Since we actually DO want those removed columns, we can undo this action by clicking the X next to Removed Other Columns shown by the arrow in Figure 16. This will return us to situation shown in Figure 17.
That is all we will cover for Managing Columns. We will cover moving columns around in a future post. Let’s move on to Rows.
In Power Query, you have the ability to to filter rows based on the values that live in columns. For example, in the dataset we are using for the example, There are some rows that don’t have a value at all in the First Name column. That absence of a value appears in the dataset as (null). In that case, we want to remove the row entirely.
13. Click the triangle to the right-hand side of the column header for First Name, as shown by the arrow in Figure 18.
This will expose the Filter dialog for that column, shown in Figure 19.
To verify that the rows with the (null) in the First Name are actually rows we want to exclude, we can temporarily filter for just those rows.
14. Click the (Select All) to deselect all of the values.
15. Click the checkbox next to (null) so that ONLY that value is selected. Your filter dialog should resemble Figure 20.
16. Clock OK to apply that filter to the dataset. This will only keep rows that have a (null) for the First Name field. All other rows will be removed. This should resemble Figure 21.
It is pretty obvious that these rows provide no value to us. Therefore, we can conclude that filtering them from the dataset is probably a good idea.
We have a few options for doing this. We can delete the Filter Rows step like we have deleted steps before. However, we could also merely reconfigure the filter on that step.
Note: Query Steps build on each other. Renaming a step in the middle is likely to result in subsequent steps breaking as they would reference the original name of the step and not the new name. In this example, we have no subsequent steps, however, I feel it is important to show you how to change an existing step so you know you have that option when you need it.
17. Click the gear icon to the right-hand side of the Filtered Rows step, shown by the arrow in Figure 21. This will launch a different filter dialog than we have seen before. That new dialog is shown in Figure 22.
Right now, our query is filtered to only return rows where the First Name column EQUALS null. We want to make the exact opposite choice, ie we only want to keep rows where the First Name column DOES NOT EQUAL null.
18. Click the triangle to the right-hand side of the equals operator, shown by the arrow in Figure 22. This will expand to show all of the operators we have available to us, as in Figure 23.
23. Click the “does not equal” value from the list so that the dialog now looks like Figure 24.
24. Click OK to commit the change to the Filtered Rows step. Your query should not resemble Figure 25.
If you like, you can click the arrow icon (now shown with a filter icon as well) in the First Name column header to see that all values EXCEPT (null) are selected.
Sometimes, you want to keep/remove rows based on their position instead of by a filter on values in columns.
Suppose you want to just keep the top 10 rows. To do that, we use the Keep Rows button, shown in Figure 26.
25. Click the Keep Rows Button.
26. Click the Keep Top Rows option. This will activate the dialog shown in Figure 27.
27. Enter 10 and then hit your <Enter> key or click OK. The results should resemble Figure 28. The Keep Bottom Rows would work in a very similar way.
28. Delete the Keep First Rows step shown by the arrow in Figure 28, returning the results to what appears in Figure 29.
If the rows you want to keep are in the middle, you can do that as well. Suppose we want to keep just the two rows for Marie Curie, rows 6 and 7.
29. Click the Keep Rows button and then the Keep Range of Rows option. This will bring up the dialog shown in Figure 30.
30. Since the first row we want is row 6, the first Marie Curie row, we enter 6 for the First Row. Since there are two rows we want, we enter 2 for the number of rows.
31. With those values entered, as in Figure 30, hit your <Enter> key or click OK. This will produce the result shown in Figure 31.
32. Delete the Row Range step to undo the last action and return the dataset to what you see in Figure 32.
The Remove Rows options work very much the same way, just removing them instead of keeping them. I encourage you to play around with them. We won’t walk through them here.
33. Click the Close and Load button to save this query as we will use this as a base for the remainder of this series.
34. Save the Excel file so you won’t have to do all this again… unless you want to. It is pretty fun.
That’s it for this post. In the next one, we will start doing some transformations on the data.