Power Query Decathlon – Beginner 10: Query Management

29 January, 2015 (10:00) | Decathlon, Power Query | By: Mark V

NOTE: This post was created using the November 2014 version of Power Query. In later versions, some of the screens or functionality could be different.

You can download the Excel 2013 file necessary to perform the steps in this post here. To get to the existing query, click the Workbook Queries button on the Power Query ribbon.

For the final post in this series, we cover Query Management. There are several options available. We begin with our Workbook Queries pane.

1. Right-click the Laureate query to reveal the menu shown in Figure 1.

Figure 1

image

The query management options on this menu are as follows:

  • Edit – This allows you to open the query in the Query Editor
  • Delete – um…. this lets you delete the query. Note: If you delete the query, any tables loaded from this query will also be deleted.
  • Refresh – This will execute this query, loading fresh data to the configured Load destination. The Laureate query in Figure 1 is set to Loan nowhere, so the Refresh button is grayed out.
  • Load To – This option lets you change the destination to which this query loads.
  • Duplicate – This makes a full duplicate of this query including the source and all the steps.
  • Reference – This lets you make a new query that uses this query as its source.
  • Merge – This lets you create a new query that combines columns from this query with another.
  • Append – This lets you create a new query to adds the rows from a second query onto this one, rather like a UNION in T-SQL.
  • Send To Data Catalog – This allows you to save this query into the Power BI Data Catalog so you can share it with other Power BI users.
  • Move To Group – This lets you create folders within the Workbook Queries pane to organize your queries.
  • Move Up/Down – This lets you alter the position the query occupies in the Workbook Queries list.
  • Show the Peek – This allows you to bring up the Peek (preview) pane for this query.
  • Properties – This allows you to change the Name and/or Description for this query.

We will not cover all of these in this post. We will just do a few of them. The options we will cover are Reference, Merge, and Append.

Reference

To have one query use another query as its source, we use Reference.

2. From the menu in Figure 1, choose the Reference option, as shown in Figure 2.

Figure 2

image

This will create and execute a new query, using the selected query as its Source and using the default Load options. The result should resemble Figure 3.

Figure 3

image

The new query takes the name of the first query with an ordinal added. Figure 4 shows the Power Pivot table that was created as a result.

Figure 4

image

I just wanted to show this to make you aware that you may need to change names in your data model manually when you use this technique.

3. Double-click the Laureate (2) query to open it in the Query Editor, as shown in Figure 5.

Figure 5

image

You can see that the Source for the query is Laureate, not the Nobel Prize api that we originally used.

We will modify Laureate (2) to return only the Individuals (not organizations) that won prizes and name it accordingly.

4. Click the filter button in the Gender query header.

5. Click the Org option to deselect it, as shown in Figure 6.

Figure 6

image

6. Change the name of the query, in the PROPERTIES box to Individual Laureates, as shown in Figure 7.

Figure 7

image

7. Use Ctrl-Click to select both the Award Year and Award Category columns, as shown in
Figure 8.

Figure 8

image

8. Click the Group By button, shown by the arrow in Figure 8. This will launch the Group By dialog shown in Figure 9.

Figure 9

image

9. Click OK. The results should resemble Figure 10.

Figure 10

image

10. Click the Close & Load to save this query. This will save the changes to this query and execute it. The Workbook Queries pane should now resemble Figure 11.

Figure 11

image

We now have a summarized count of Individual Laureates by Award Year and Award Category.

Note: When I first started playing with Reference, I thought it would be a way to break a single source into multiple tables and minimize the impact on the source by only hitting that source once. My testing has shown me that that is not how it works. In the example above, executing (Refreshing) the Individual Laureates query will actually just include all the steps from the Laureate query when it runs. So, rather than execute the Laureate query and then use those results as a source, Power Query takes the steps from the Laureate query and includes them when executing any query Referencing the Laureate query. I tested this by creating a single query against a SQL Server table. I then created 4 other queries via the Reference method above. Tracing the SQL Server instance confirmed that the query was actually querying the source for each and every query, not just once. It seems to me, though, that there could be some advantage if you need to use the same source table multiple times and only want to build the steps to clean up that table once. So, in that sense, the Reference method could be a way to modularize your steps so you can maintain them in one place.

11. Create another Reference query based on Laureate. This will create Laureate (2) again.

12. Edit that query to return ONLY the Org values from the Gender column, as shown in Figure 12.

Figure 12

image

13. Give the query an appropriate name like Organization Laureates, as in Figure 13.

Figure 13

image

14. Use Ctrl-Click to select both the Award Year and Award Category columns, as shown in
Figure 14.

Figure 14

image

15. Click the Group By button, shown by the arrow in Figure 14. This will launch the Group By dialog shown in Figure 15.

Figure 15

image

16. Click OK. The results should resemble Figure 16.

Figure 16

image

17. Click the Close & Load to save this query. This will save the changes to this query and execute it. The Workbook Queries pane should now resemble Figure 17.

Figure 17

image

Note: When a Reference is made to a query by another, Laureate in this case, any changes to the referenced query can break the queries that reference it. So, in our example, making changes to Laureate could cause the other queries to become invalid. Just bear that in mind when deciding to use Reference.

Merge

Merge allows you to combine columns from one query into another, rather like joining tables together in a SQL query.

18. Right-click on the Individual Laureates query to reveal the menu shown in Figure 18.

Figure 18

image

19. Click the Merge option to launch the Merge dialog shown in Figure 19.

Figure 19

image

The query we chose, Individual Laureates, appears in the top section. This will be out main table. In the bottom section, we will choose the table from which to add columns to Individual Laureates.

20. Click the drop-down shown by the arrow in Figure 19. This will allow us to select the marge table, as shown in Figure 20.

Figure 20

image

21. Choose Organization Laureates. The dialog should now resemble Figure 21.

Figure 21

image

This is where we tell Power Query which columns to use for merging the tables so that rows in Individual Laureates can find matching rows in Organization Laureates.

22. Click the Award Year column heading under the Individual Laureates query, shown by the 1 in Figure 21.

23. Click the Award Year column heading under the Organization Laureates query, shown by the 2 in Figure 21. The dialog should now resemble Figure 22.

Figure 22

image

Note: It is not strictly necessary to always choose the column from the top query first. I just find it more intuitive. You can choose the Award Year column from Organization Laureates and THEN the one from Individual Laureates and this will still work just fine.

Once at least one join column has been specified for both queries, Power Query will indicate how many rows from the lookup query (Organization Laureates in this case) match rows in the main query. The highlighted message at the bottom of Figure 22 indicates that 110 or the 550 rows in Individual Laureates found matching rows in Organization Laureates when joining on Award Year.

For our merge example, we will join not just on Award Year, but on Award Category as well. I realize this is a pretty silly join since it will only match on rows where a specific award was shared by at least one Individual and an Organization. But, the mechanics of joining on more than one column are important.

24. Ctrl-Click the Award Category column heading under the Individual Laureates query, shown by the 1 in Figure 22. You must use Ctrl-Click so that Power Query knows that you are specifying an additional join column and not changing the join column selection you initially made.

25. Ctrl-Click the Award Category column heading under the Organization Laureates query, shown by the 2 in Figure 22. The dialog should now resemble Figure 23.

Figure 23

image

Here we can see both join columns selected in both queries. Since it is hard to see in Figure 23, Figure 24 shows a closer view of the column headings for the merge.

Figure 24

image

The 1 in the Award Year column header and the 2 in the Award Category column header indicate that we are joining first on Award Year and then on Award Category.

26. Click OK. The query should resemble Figure 25.

Figure 25

image

Our next step is to rename the Count column for the Individual Laureates query, shown by the arrow in Figure 25. The reason is that both queries contain a column named Count and and we cannot have duplicate columns names in a single query. Beside, Count is a pretty terrible name anyway. :)

27. Double-click the Count column header. It was pointed out to me in a comment on a previous post that I did not mention this method for renaming columns. Therefore, we shall use it here.

28. Give it a meaningful name like “Individual Count.” The query should now resemble Figure 26.

Figure 26

image

29. Click the Expand button, shown by the arrow in Figure 26. This will reveal the columns available from the Organization Query, as shown in Figure 27.

Figure 27

image

30. Change the selection so that only the Count column is selected, as in Figure 28.

Figure 28

image

Note the “Use original column name as prefix” checkbox. With this checked, the default, the new columns pulled from this query will be prefaced with “NewColumn” since that is the name of the merged column. I typically turn that off but will leave it on here in order to show what that looks like.

31. Click OK. The query should resemble Figure 29.

Figure 29

image

32. Rename the NewColumn.Count column to something more appropriate like “Organization Count.”

33. Change the name of the query in the PROPERTIES area to Merge Example. The query should not resemble Figure 30.

Figure 30

image

34. Close & Load. The Workbook Query pane should now resemble Figure 31.

Figure 31

image

 

Note: The Merge option is available when you editing an existing query. In that case, it will bring the new columns into THAT query instead of creating a brand new query that is the end result of the merging.

Append

The Append option allows you to basically stack all the rows from one query underneath the rows of another query. The result is a single query with all of the rows from both. It is important that the queries to be merged have the same structure.

35. Right-click the Individual Laureates query to bring up the menu shown in Figure 32.

Figure 32

image

36. Choose Append, shown by the arrow in Figure 32. This will bring up the Append dialog shown in Figure 33.

Figure 33

image

37. Use the drop-down shown by the arrow in Figure 33 to select Organization Laureates as the one to append. The dialog should now resemble Figure 34.

Figure 34

image

38. Click OK. The query should resemble Figure 35.

Figure 35

image

39. Name the query Append Example.

40. Close & Load. The Workbook Queries pane should now resemble Figure 36.

Figure 36

image

The Append Example query has 573 rows (550 from Individual Laureates and 23 from Organization Laureates).

Note: You can use the Append option when you are editing an existing query. In that case, the rows from the second table will be appended to the rows in THAT query instead of creating a new query.

Conclusion

That brings this Power Query Decathlon – Beginner series to an end. I hope you found it helpful and that you see what an exciting tool Power Query is.

Comments

Comment from Larry
Time February 17, 2015 at 11:38 am

Question re SQLSaturday Pre-con . . . will this event get into advantages / disadvantages of various data sources like Relational or Dimensional DB’s, or perhaps using SSAS? Or perhaps mixing Excel with Sharepoint – – – or is this session treat these issues as more or less transparent? Thanks.

Write a comment