Category: M

Power Query Free Training Webinar Follow-Up Questions

By Mark on April 17, 2015

On April 7th, I gave my Power Query: Data Chemistry for the Masses presentation as part of Pragmatic Works Free Training on the T’s series. You can find the recording of that webinar here. I want to take a moment to thank the folks at Pragmatic Works for organizing and hosting this series that is such a great asset to the community.

At the end of the session, there were more questions than we had time to handle. So, they emailed me the questions in order for me to answer them in a post like this. Behold!

What is the maximum number of rows allowed in an Excel worksheet?

This question was likely in response to my recommendation of loading Power Query results to the Excel Data Model (Power Pivot) instead of to a worksheet. According to this info from Microsoft, the maximum row count in a worksheet is 1,048,576.

When combining files in a folder can the workbook have multiple tabs?

As far as combining files using the From Folder source, I have only succeeded with text files. However, there is a post here on the Dutch Data Dude blog that explains how to combine that source with a little custom M to make this happen.

When passing a parameter to an M function, can you select from a pick list of choices?

This question was in response to my converting a power query on Nobel Prize Laureates into a function that prompts for a prize category and then returns the Laureates for that category. Typically, instead of users interacting directly with the function, the parameter is passed to the function via a custom column. There is great post here on the Data Chix blog on how to do this.

Could we get a copy of the file being used in the demo?

You can get all of the materials for the presentation, including the files used, on the page for that presentation on this very blog.

Is the option to load to data model available for Excel 2010 or later versions?

I just learned over the weekend at SQL Saturday Madison that Excel 2010 does NOT have the option to load directly to the Data Model. Bummer. So, if you are using Excel 2010, you will need to load to a worksheet and then import to Power Pivot from that worksheet.

How do you know a web page is in JSON?

Well, web pages are not really in JSON. JSON is actually a “lightweight document interchange format.” I think of it more like a simpler XML. This question comes from my use of the Nobel Prize API to download Laureate data in the JSON format as part of this presentation. This is also detailed here as part of my Power Query Decathlon – Beginner blog series. The URL for consuming the API using JSON is http://api.nobelprize.org/v1/laureate.json. You can also do it as a CSV using http://api.nobelprize.org/v1/laureate.csv which will actively download a CSV file with all of the same data.

What do the different separators mean? [] i guess means column, but what is {} for?

I assume this question is related to the appearance of [] and {} in M query syntax. You can read about these in the Query Formula Language Specification document. This is a large resource and not recommended for “light” reading. 🙂 According to the specification, “We can use the lookup operator ([]) to access the fields of a record by name.” Also, the brackets [] can be used for referring to Records by surrounding the elements within the Record. the {} curly braces are used to referring to a List. A great, and easy to follow, resource is Chris Webb’s great book Power Query for Power BI and Excel. I am still very much a student of M, so I will not go much deeper into this at the moment.

Any plan from MS to add the ability to export the resulting data to a place other than XL, Data Model, eg text file, db?

I would LOVE the ability to use Power Query to load data into a database or a file, etc. Note that Power Query is used outside of Excel for loading data in the new Power BI preview as well as the Power BI Designer, which is also currently in preview. I have yet to see public information regarding any detailed plans for using Power Query to load to a larger number of targets. However, given the investment Microsoft is making in Power Query, it seems likely to me that we will see it popping up in more places in the future.

Could we use an Itunes folder as a source and consolidate music metadata?

Yup.

Here is the M code for a query I used to do just that:

let
    Source = Folder.Files("C:\Users\Mark\Music\iTunes\iTunes Media\Music"),
    #"Removed Other Columns" = Table.SelectColumns(Source,{"Name", "Folder Path"}),
    #"Replaced Value" = Table.ReplaceValue(#"Removed Other Columns","C:\Users\Mark\Music\iTunes\iTunes Media\Music\","",Replacer.ReplaceText,{"Folder Path"}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Replaced Value","Folder Path",Splitter.SplitTextByEachDelimiter({"\"}, null, false),{"Folder Path.1", "Folder Path.2"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Folder Path.1", type text}, {"Folder Path.2", type text}}),
    #"Replaced Value1" = Table.ReplaceValue(#"Changed Type","\","",Replacer.ReplaceText,{"Folder Path.2"}),
    #"Split Column by Position" = Table.SplitColumn(#"Replaced Value1","Name",Splitter.SplitTextByPositions({0, 2}, false),{"Name.1", "Name.2"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Position",{{"Name.1", Int64.Type}, {"Name.2", type text}}),
    #"Split Column by Delimiter1" = Table.SplitColumn(#"Changed Type1","Name.2",Splitter.SplitTextByEachDelimiter({"."}, null, false),{"Name.2.1", "Name.2.2"}),
    #"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"Name.2.1", type text}, {"Name.2.2", type text}}),
    #"Renamed Columns" = Table.RenameColumns(#"Changed Type2",{{"Name.1", "Track"}, {"Name.2.1", "Song"}, {"Name.2.2", "File Type"}, {"Folder Path.1", "Artist"}, {"Folder Path.2", "Album"}})
in
    #"Renamed Columns"

In the Code above, I highlighted the path to my Itunes folder. Just replace that with yours and past into a blank Power Query window. Here is the query once you have done so.

There is more data in the Itunes Music Library.xml file, but I haven’t figured out how to use it effectively yet.

Alright. That wraps up the questions. Thanks to all who tuned in to the webinar and provided such great questions.

Advanced Column Splitting in Power Query

By Mark on March 9, 2015 • ( 18 Comments )

A friend of mine left a question on one of my Power Query Decathlon posts asking if it was possible to split a column using a carriage return as a delimiter. After a little poking around myself, I was not finding the answer I sought. So, I reached out for help. Curt Hagenlocher, a developer at Microsoft was kind enough to help me out. As it turns out, it appears that splitting a column based on carriage returns does not work. However, you CAN split columns based on the line feed character.

Note: M is actually informal name for the Power Query Formula Language.

The sample data

First, I created a collection of lines of text separated by using the <Enter> key. I did this in NotePad++, a free but powerful utility. I activated the feature which allows viewing of all non-printing characters. That is shown in Figure 1.

Figure 1

In Figure 1, you can clearly see the CR and LF characters after lines 1 and 2.

I then highlighted this block of text and pasted it into each of the cells in Field2 in the table in Excel shown in Figure 2. Not, I had to paste this into each cell using the formula bar, not into the cells directly.

Figure 2

The result is a very simple table where all of the values in Field2 contain <CR><LF> characters, hidden by default.

I then added a query with this table as its source by using the From Table button on the Power Query Ribbon. The result is the query shown in Figure 3.

Figure 3

This is our starting point for two different solutions.

M-less Solution

The first, and easiest solution starts with a simple Split Column transformation on the Field2 column. Figure 4 shows how to configure the Split Column for this example.

Figure 4

We start by creating a basic Split Column making sure of the following:

Select whichever delimiter you want. However, it is best if you can find one that actually does not appear in the data.
Make sure the “At each occurrence of the delimiter” option is selected. This is the default, anyway.
Since there are a total of 3 lines of text in our Field2 domain of values, we need to set the “Number of columns to split into” option. You would think that the option indicating to split on each delimiter would mean this step would not be necessary. Unfortunately, as least as of the March 2015 update of Power Query, it is.

When you click OK, the result is shown in Figure 5.

Figure 5

Since there are no commas in this text, the result is a total of three columns, but only one of them has data. However, this is a necessary first step. The use of non-printing characters as delimiters is not supported by the Power Query UI. So, we need to manually edit the M for the Split Column by Delimiter step. Yes, I know I said this was M-less. However, the edit that is required is tiny and you are not really writing any M.

Figure 6 shows the unmodified M code for the Split Column by Delimiter step.

Figure 6

Notice the delimiter shown by the arrow. It is the comma since that is what we chose for this step initially. In order to change that to the <LF> character, we replace that comma with the #(lf) value as show in Figure 7.

Figure 7

The result of this change is shown in Figure 8.

Figure 8

Great, right? Awesome? Well, sort of. While this worked in our very simple situation in which we knew how many line of text were present, this was OK. And if that is your situation, then hazzah! However, reality is typically more complicated than that.

I saved this query as M-less in the Excel file provided with this post.

M-ful Solution

The more elegant and flexible solution requires M. And, I am still very much a student of M, and a beginner at that. The M code featured in this solution was provided by Microsoft’s Curt Hagenlocher, mentioned above, who was tremendously helpful in answering this question and very patient with my M noobness.

With this solution, the first step after the source is to change the data type of Field2 to Text. While I tried it without this step and it worked just fine on my simple sample data, it is not a bad precaution to ensure the Field2 column is text before proceeding. So, I left it in. The result is shown in Figure 9. Note that this is the last step in this solution that can be accomplished in the UI. It is all M from here.

Figure 9

There isn’t much to see here. But if you click on the View tab of the ribbon and then click on the Advanced Editor, you will see the M code for the entire query. That is shown in Figure 10.

Figure 10

The next step is to paste in the M provided by Curt. That code replaces the bottom section of the query, shown in the rectangle. Once the new code is pasted in, we will highlight a few parts. The final query is shown in Figure 11.

Figure 11

The entire query is pasted below to make it easy for you to copy. Note that, following the pattern of how M works, each step in the function, shown highlighted, has a name that is then referenced by subsequent steps.

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Field2", type text}}),
    SplitByDelimiter = (table, column, delimiter) =>
        let
            Count = List.Count(List.Select(Text.ToList(Table.Column(table, column){0}), each _ = delimiter)) + 1,
            Names = List.Transform(List.Numbers(1, Count), each column & "." & Text.From(_)),
            Types = List.Transform(Names, each {_, type text}),
            Split = Table.SplitColumn(table, column, Splitter.SplitTextByDelimiter(delimiter), Names),
            Typed = Table.TransformColumnTypes(Split, Types)
        in
            Typed,
    #"Split Column by Delimiter" = SplitByDelimiter(#"Changed Type","Field2","#(lf)")
in
    #"Split Column by Delimiter"

The steps in the function, as clearly as I can understand/explain them, are as follows:

Count = This step takes in the parameters for the table and the column and examines the contents in the column to discover how many times the delimiter passed in the parameter occurs in that column. This allows the function to dynamically create the correct number of columns instead of having to hard code it like we had to do in the M-less Solution.

Name = This step creates column names for as many columns as were determined in the Count step. The name for each column is prefaced with the original column name passed into the function.

Types = This step creates a list object containing all of the column names created by the Name step and appends to each the code necessary to converting them to text. This column will be used in a subsequent step to make it easy to convert all of the columns to text before exiting the function. This is cool. 🙂

Split = This step uses the Splitter.SplitTextByDelimiter function to perform the splits of the column into the columns provided in the Name step above using the specified delimiter.

Typed = This step uses the list of column names and the strings from the Types step to convert all of the newly created columns to Text. The results of this step are then returned from the function.

That completes the function walkthrough. Pretty neat, if you ask me.

The final step of the query is to invoke this function. This is done via the section of code in the red rectangle in Figure 11. For the table parameter of the function, this step uses the Changed Type query step we performed in the UI. This is a great example of how steps in queries, at least in M, are not limited to referencing only the single step that immediately preceded them.

The final result is shown in Figure 12.

Figure 12

This query was saved a M-ful in the working file provided.

NOTE: Neither of these solutions work for the <CR> character. So, replacing the #(lf) above for #(cr) does not produce the desired result. I even tried using #(cr)#(lf) and that does not work either. Just #(lf) worked for me.

There you have it. You can find the working file I used for this post right here.

One of the aspects I like best about blogging is the learning I get out of it. In order to try to explain things clearly in my posts, I need to study them and understand them. Through a previous post, I got a great question I had not thought about. Through the investigation, I found yet another great and helpful person within Microsoft in Curt Hagenlocher. In putting this post together, I obtained a far greater understanding of M and it capabilities than I had before.

Mark Vaillancourt

I drink coffee and know things