Power Query Decathlon – Beginner 05: Text Columns

13 January, 2015 (10:00) | Decathlon, Power Query | By: Mark V

NOTE: This post was created using the November 2014 version of Power Query. In later versions, some of the screens or functionality could be different.

You can download the Excel 2013 file necessary to perform the steps in this post here. To get to the existing query, click the Workbook Queries button on the Power Query ribbon.

As discussed in the last post in this series, Power Query Decathlon – Beginner 04: General Transformations, the transformations on the Transform tab in Power Query are arranged in meaningful groups, largely driven by the data type of the column in question. If you have not done so, follow the steps in that post in order to be ready to follow along.

In this post, we will focus on the transformations available for Text columns, shown by the arrow in Figure 1.

Figure 1

image

Format

The Format transformations deal with capitalization, spaces, and non-printing characters. The options and their descriptions are as follows:

“lowercase”: makes all of the letters in the selected text column(s) lowercase

“UPPERCASE”: makes all of the letters in the selected text column(s) UPPERCASE

“Capitalize Each Word”: makes the first letter in each word of the selected text column(s) UPPERCASE and all other letters lowercase

“Trim”: removes any leading and trailing spaces from the values in the selected text column(s)

“Clean”: removes any non-printable characters from the values in the select text column(s)

The one we will demonstrate in this post is the Capitalize Each Word transformation. Figure 2 shows that the values in the Gender and Award Category columns are lowercase. While this is not an issue with understanding, it may be more appealing for users if the values were capitalized instead.

Figure 2

image

1. Select both the Gender and Award Category columns.

2. Click the Format button on the Ribbon.

3. Choose the “Capitalize Each Word” option, as shown in Figure 3.

Figure 3

image

The results of this transformation are shown in Figure 4.

Figure 4

image

Hazzah!

Merge Columns

Merging columns involves combining the values from more than one column into a single column. There are actually two different results you can achieve when merging columns. The first is to replace all of the merged columns with the new column. With this option, the original columns go away. The second is a create a brand new column with the merged values while leaving the original source columns intact. We will start with option 1.

4. Select both the First Name and Last Name columns.

5. Click the Merge Columns button on the Transform tab of the Ribbon, as shown in Figure 5. NOTE: The Merge Columns button is grayed out by default. It is only activated when more than one column is selected.

Figure 5

image

After clicking the Merge Columns button, the dialog shown in Figure 6 will appear.

Figure 6

image

In this dialog, you have the option to choose a separator for the values in the resulting column. You also have the ability to choose the name of that column.

6. Name the column something appropriate, like Full Name.

7. Click the Separator drop down. This will display the options shown in Figure 7.

Figure 7

image

8. Choose Space. This will, as you might expect, separator the values from each column with a space. The dialog should now look like Figure 8.

Figure 8

image

9 Click OK. This will result in what appears in Figure 9.

Figure 9

image

Note that the First Name and Last Name columns have been replaced by the new Full Name column. In this example, we will assume that is NOT what we want.

10. Delete the Merged Columns step, shown by the arrow in Figure 9.

This time, we will create the Full Name column while preserving the First Name and Last Name columns.

11. Select both the First Name and Last Name columns, just as we did before.

12. Click the Add Column tab of the Ribbon, shown by the arrow in Figure 10.

Figure 10

image

The Add Column tab has some options that are available on other tabs, and a few that are only available here. Figure 11 shows the Add Column tab.

Figure 11

image

While we will not go into them in this series, I will take a quick moment to discuss one of the options here. The Add Custom Column button allows you to create a new column manually with M code. As of the November 2014 release of Power Query, the dialog for the Add Custom Column is not very robust. But let’s give the team some time. They are doing great work.

13. Click the Merge Columns button on the Add Column tab, shown by the arrow in Figure 11.

14. The steps for creating the new column on the Add Column tab are identical to the steps on the Transform tab, other than the tab on which you click the button. Follow the same steps as before, specifying the Space for the separator and a meaningful name like Full Name, as shown in Figure 12.

Figure 12

image

15. Click OK. The result should resemble Figure 13.

Figure 13

image

Note that the new column is added to the end, the far right, of the query. I did scroll to the right before capturing Figure 13 in order to see the new column. We have ability to move it elsewhere, though.

16. Right-click on the Full Name column to reveal the context menu shown in Figure 14.

Figure 14

image

17. Click Move.

18. Click To Beginning, as shown in Figure 14. NOTE: If you are using a device without the option for right-clicking, you can also just select the column and use Move button in the Manage Columns section of the Home tab. However you accomplished the move, the result will appear in Figure 15.

Figure 15

image

I did scroll to the left before capturing Figure 15 to see the Full Name column in its new position. The Move options are great when you want to move something a long way, like when you have a lot of columns. When you want to move columns within your field of view, you can just click and drag them.

19. Click and drag the Full Name column until the indicator shows it can be dropped to the right of the Last Name column, as shown in Figure 16, and release it.

Figure 16

image

The result will be that the Full Name column is now positioned between the Last Name and Birth Date columns, as shown in Figure 17.

Figure 17

image

Note that Power Query still put both of the column moves into a single step, even though we used two different methods to accomplish them. Cool.

Split Column

The last transformation we will cover in this post is the Split Column. This allows you to divide a single column into two or more columns based on a delimiter you specify or a number of characters. In this example, we will split the Birth Date column into 3 columns, with the splits occurring on the “-“ between the components of the date value.

20. Select the Birth Date column. NOTE: Our Birth Date column is still of the Any data type at this point. However, the Split Column transformation will still work just fine.

21. Click the Split Column button on the Transform ribbon, shown by the arrow in Figure 18.

Figure 18

image

This will reveal the two options shown in Figure 19.

Figure 19

image

22. Click the By Delimiter option shown by the arrow in Figure 19. This will open the Spit Column dialog shown in Figure 20.

Figure 20

image

These are the default options. These options will look for commas and, due to the active option under Split, create a new column after each comma. In our case, the comma delimiter will not help.

23. Click the delimiter drop down, shown by the arrow in Figure 20. This will reveal the pre-populated options shown in Figure 21.

Figure 21

image

The hyphen, the separator between the parts of the value in our Birth Date column is not available in this list. No worries.

24. Click the “Custom” option, shown by the arrow in Figure 21. This will activate a textbox allowing you to enter the delimiter character(s) you need.

25. Enter in a hyphen, as shown in Figure 22.

Figure 22

image

The next choice we have to make is in the Split section. The options are pretty self-explanatory. For our example here, the default option, “At each occurrence of the delimiter,” is the one we want.

26. Click OK. This will produce the result shown in Figure 23.

Figure 23

image

There are a few things to note here. First, because we chose the Split Column button on the Transform ribbon, the original Birth Date column is gone. Second, the delimiter itself, the hyphen in our example, is not persevered. Third, for better or worse, Power Query has chosen not only to split the column, but also to change the data type of the resulting columns to Whole Number since they seem to match that pattern. So, this is a case where a single action generated TWO query steps. If the data type change is not appropriate, you can delete that step and still keep the split columns.

Note: The reason that final step is called “Changed Type 1” is that we already had a step called “Changed Type.” When there must be more than one instance of the same transformation, Power Query will merely add an increment number to the name since the names of steps MUST be unique within a query. The reason for this is that the steps behind the scenes in the M code reference previous steps by name. If there is more than one step with the same name, the M code will freak out as it could not know which one is correct.

The last thing we need to do in this post is to undo the split column, since we don’t need that going forward.

27. Delete both the Changed Type 1 and Split Column steps shown by the arrow in Figure 23. The result will be just like Figure 17 above, with the Full Name column having been created and moved into position just to the right of the Last Name column. This is shown in Figure 24.

Figure 24

image

That wraps up our work with Text Columns. In the next post in the series, we will start working with Number Columns. If you want to follow along, please use the Close & Load button to save this query and then save the Excel file. We will continue to use it in upcoming posts.

Comments

Comment from Chuck Whittemore
Time March 5, 2015 at 9:26 am

With regards to Custom Delimiters: do you know if its possible to delimit by Carriage Return?

Comment from Mark V
Time March 5, 2015 at 2:10 pm

The short answer is yes. :) If you use the UI to specify some other delimiter, you can then edit the M in the Formula Bar or Advanced Editor and replace that delimiter with #(lf). I played with this and it works, with a few drawbacks that you can get around with more M. A blog post on this with an example is forthcoming. Thanks for the question.

Comment from Mark V
Time March 5, 2015 at 2:25 pm

With some advice from Microsoft, I tried using #(cr) instead and it did not work for me. Only #(lf) did. So, you have a situation where you have CR with no corresponding LF?

Comment from Mark V
Time March 5, 2015 at 2:26 pm

That said, they did provide an M function that I am about to try out. It may work that way. That will be included in the upcoming blog post. Feel free to email me: my first name at my blog url.

Write a comment