Using List.Contains To Filter Dimension Tables

In order to make a clean user experience for Power BI report consumers, I will limit the data in my Dimension tables to only include data that is in the Fact tables. For example, a customer dimension table might have 1,000 customers in it, but the sales fact table may only have 400 customers in it. This can happen because the sales data is limited to recent years, or specific regions, for a given report.

I don’t like loading up a slicer with dozens or hundreds of items that have no corresponding records. The same would apply if there was no slicer, but the consumer wanted to filter using the Filter pane. So I’ll filter the customer table so it only includes what I would call “active customers” that are shown in the sales table.

The most straight forward way to do this is by doing an Inner Join between the tables, but there is another way, using the powerful List.Contains() feature of Power Query. And what makes it so powerful is not just it’s utility, but when you run it against data in a SQL Server or similar server, Power Query will fold the statement.

Let me walk you through both methods so it is clear. I’ll be using the WideWorldImportersDW database. If you are a Power BI developer and don’t have a local copy of SQL Server and some test databases on your machine, you are doing yourself a disservice. See this video for quick instructions on getting SQL Server and some test databases up and running. All of the products are free.

There is a Power BI file attached below but you can easily follow along in a new PBIX file for these examples. For both example, you’ll need to connect to the Fact Sales and Dimension Stock Item tables. There are 627 items in the Stock Item table, but only 227 unique items in the Fact Sales table. So any slicers or filters would have 400 unnecessary items. Those are what we want to get rid of. First, let’s do the Inner Join method.

Inner Join to Limit Records in the Dimension Stock Item Table

  • First, we need a unique list of items from the Fact Sales table to work with. To do this, right-click on the Fact Sales table, and create a reference. Right-click on that newly created query and make sure it “Enable Load” is not checked. This table will not be loaded to the data model.

  • Right-click on the Stock Item Key field in this query and select Remove Other Columns. This step isn’t necessary, but it makes the query logic a bit easier to understand.

  • Right-click on the Stock Item Key and select Remove Duplicates. Power Query will probably tell you in the lower left you have 227 rows.

  • From the Dimension Stock Item Table, click on the Merge Queries button on the Home Ribbon.

  • Select the Stock Item Key column in the window.

  • Select the Fact Sale (2) table in the dropdown. This is the one we created a few steps ago. Then select the Stock Item Key field.

  • Select “inner” for the Join Kind. Your screen should look like this. You’ll notice it found 227 matches out of the total 627 items in the Dimension Stock Item table

2020-05-20 08_01_56-.png
  • Press Ok, then expand the field.

  • Right-click on the newly expanded Stock Item key.1 field and remove it. If you were to delete this field before you expanded it, it would have broken query folding, which means all data would have to be brought back to your Power Query engine and processed locally, or in the on-prem gateway if this were refreshing in the service. But by expanding and then deleting the field, it has the SQL Server do all of the work for you.

Now your Dimension Item Table has 227 records, which is the same number of unique items sold in the Fact Sales table. And it all folds.

There is another way to do this though, and it has some advantages.

List.Contains() to Limit Records in the Dimension Stock Item Table

  • The first few steps are the same:

    • Right-click on the Fact Sales table, and create a reference. Make sure it is not set to load to the data model. I’m going to rename this query ActiveItemList to make things easier, because we’ll be manually typing it in later.

    • Remove all but the Stock Item Key fields.

    • Remove duplicates. You should have 227 records.

  • Now we need to convert this to a list. This is actually simple. Right-click on the Stock Item Key field, and select Drill Down. This is necessary because List.Contains operates on lists as its name implies, not tables, fields, etc.

2020-05-20 08_10_28-.png
  • Now you have the 227 items in a list, not a table. The Drill Down command just added the field name in brackets to the table name, so you will see something like this in the formula bar:

= #"Removed Duplicates"[Stock Item Key]
  • Now we are ready to filter our Dimension Stock Item table. Go back to that table. You should delete the Inner Join merge and expansion steps from the previous example.

  • Filter the Stock Item Key by unchecking any item. Doesn’t matter which one. We just want Power Query to get the formula started for us. You will have something like this in the formula bar:

= Table.SelectRows(#"Dimension_Stock Item", each ([Stock Item Key] <> 2))
  • We are going to remove the ([Stock Item Key] <> 2) part, and make the entire formula look like this:

= Table.SelectRows(#"Dimension_Stock Item", each List.Contains(ActiveItemList, [Stock Item Key]))

List.Contains() takes our list named ActiveitemList and compares it to each value, the [Stock Item Key] field, in the Dimension Stock Item table and only keeps those that return true, i.e., a match was found.

You may think that this method is not too efficient. And you would be right, except for what Power Query does with this statement when folding is going on with a SQL Server. If you used this logic on a few hundred thousand rows from a CSV file, SharePoint list, or other source that didn’t fold, this would take forever. The Inner Join method is much faster for those scenarios, but the Inner Join method has more steps, it takes a bit more effort to set up, and the logic of why you are merging a table then deleting what you merged isn’t immediately clear when your future self is reviewing and modifying the code.

Power Query takes the List.Contains() logic and turns it into a SQL IN operator and lets the server do the work. If you right-click on this step in the Query Properties list, and select View Native Query, you’ll see something like this, which is the SQL statement sent to the server.

2020-05-20 08_28_46-.png

So which is better? It depends. If you have relatively small data set, say data from Excel, a CSV file, or a smaller SharePoint List, then List.Contains() will do fine. Anything under 10K records or perhaps a bit more would be ok. A dataset of any size will work fine with a source that will fold your statement, like SQL Server.

There are two reasons I like List.Contains() though and prefer it over an Inner Join when the data set allows it:

  1. The logic is cleaner. I am just filtering a column by a list. Not merging another table then deleting the merge. It is just one step in the Query Properties, a Filtered Rows operation. It is immediately clear what this step is doing.

  2. I can easily create a manual list to filter off of if I just want a few items. For example, if I only wanted a few products in the example above, my ActiveItemList could be one of the manual lists like below rather than a list generated from another table. The first example is few manually chosen numbers, the latter all items from 200 through 250. Creating a normal filter

= {200, 201, 202, 210, 254}

= {200..250}

So there you have it. Another way to filter one table from another set of data. See my Power BI file here.

Edit Oct 27, 2020 - to make this run much faster, do the following:

  1. In your list, right-click on the last step of the list and select “Insert Step After”

  2. Whatever is in the formula bar, wrap it with List.Buffer().

This will cause the list to be processed by Power Query much faster and create the native query almost instantly. The server will run the query at the same speed, which will be very fast. This will radically cut down on the time it takes Power Query to generate the SQL statement though, especially on larger lists.

Thanks to Nathan Watkins on Twitter for this suggestion.

Case Insensitive Merges In Power Query

Power Query is notorious for being case sensitive. Even its language is case sensitive. Often though you get data from users where they are using different cases for the same data. Some never use the shift key, and others CAPSLAP everything. It results in data that looks like this:

2020-04-26 09_49_43-Book1 - Excel.png

There are at least 2 ways to fix this before merging:

  1. Manually fix the data at the source. {shudder}

  2. Use one of the Power Query functions that will adjust the case - Text.Proper, Text.Lower, or Text.Upper.

The first is just unacceptable, unless you are a manager and can tell whomever entered it that way to go fix it. 😉

The second isn’t ideal because neither upper nor lower is right, and proper case will not correctly adjust names like Joe McGuinness. Text.Proper(“Joe McGuinness”) returns Joe Mcguinness.

Unfortunately, in the Table.NestedJoin() function documentation, there is no obvious way to do a join that ignores case. There is an optional parameter called keyEqualityComparers but believe it or not, that is never defined anywhere on Microsoft’s site. If you search for it, it will only show up as options for some functions, but there is no guidance on how to use it, and it may not help even if there were.

The good new is, in Power Query for both Power BI and Excel, there is a relatively new feature called “Fuzzy Merge.” It allows you to match names like Will, Bill, and William in merges. You have to tweak the sensitivity or the matches either don’t work, or return way too many matches.

But there is a way to use Fuzzy Merge to do exact matches, yet ignore case sensitivity.

You can see below if I merge the normal case table to the CAPSLAPPER table, I get no matches. It shows 0 of 10 records matched.

2020-04-26 10_02_52-.png

But if you do the following, you’ll get a perfect match on all records:

  1. Check “Use fuzzy matching…” at the bottom of the Join Kind section

  2. Expand the Fuzzy Matching options. You must do this. If you don’t, you will get a 10 out of 10 match in this sample data, but you will also get other matches that are not exact in a larger data set. For example, If I added a “MERLE BERBER” to the table, it would return as a match for “Merle Barber” - which we do not want.

  3. Change the Similarity Threshold to 1. The default is 0.8, and that is why in step 2 above you’ll get more matches than you want. By setting it to 1, it becomes an exact match.

  4. Check the “Ignore Case” box. This is where the magic happens.

  5. Uncheck the “Match by Combining Parts” box, and ignore the remaining boxes in the Fuzzy Matching Options. They aren’t needed for this.

2020-04-26 10_09_20-.png

Now your merge will show a 10 out of 10 match, without having to change the source data or change the case within Power Query.

Use Power BI's Enter Data Feature In Excel

EDIT: This article is no longer necessary as Excel’s version of Power Query now has an Enter Data feature built in. But the article below will show you how the Enter Data feature works, so still has some value. You do need Excel 2022 or Office 365. If you have an earlier version of Excel, you may still need these steps.

One of the cool things about Power BI is you have a nice “Enter Data” button on the home ribbon where you can quickly add data to your model. The table isn’t dynamic, but sometimes you just need a small table to finish your model to add descriptions or some other bit of data without creating another table in a source that needs to be connected to. Enter Data is perfect for this.

2020-02-23 18_04_10-Query1 - Power Query Editor.png

It looks like the image above. You just start typing the column names, some data, and name your table. The merge it with other queries or bring it directly into the data model.

Except that isn’t Power BI. I created that image in Excel! Notice how it is green instead of Power BI’s yellow? How did I do that? With just a little bit of trickery.

First, I created a table using Enter Data in Power BI with just a single column and record. Then I copied the M Code Power BI created. It looks like this:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSlSKjQUA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}})
in
    #"Changed Type"

Now create a new Blank query in Excel’s Power Query, and copy the Power BI code, then paste it in the Advanced Editor. Now in that query, select the little gear icon next to the Source statement, just like you would in Power BI to edit the table.

2020-02-23 18_08_42-Query1 - Power Query Editor.png

That’s it. The Create Table dialog shows up and you can add and change data and records however you like. I keep that code snippet in Evernote where I can do this in Excel anytime I need to without firing up Power BI to get the code. Power Query is just storing this data as a compressed binary string of characters, and can usually store up to 3,000 cells.

Why would you do this though? Unlike Power BI, Excel has a place for this kind of data, in a Table in the worksheet you import into Power Query. That is probably why the Excel team didn’t bother to expose this functionality anywhere in Excel. The underlying Power Query tool is very similar to the one in Power BI though, so all of that Create Table code is still there.

Maybe you have some data you want to be hidden, or at least difficult to find or modify, and don’t want to fool with hiding sheets, locking workbook properties, or setting worksheet’s hidden property in VBA to xlVeryHidden. All of those work well, but it also makes it harder for you to get at to edit if you need to in the future. This method is easy because it stays in the Power Query UI, where you are comfortable working.

Avoid Using Excel XLS Files As A Data Source

Power Query in both Power BI and Excel can use all Excel file formats as a data source, including the old 2003 format that ends in XLS. Most of the time, this works fine, but I recently hit an issue that consistently returned incorrect results.

Microsoft replaced the XLS format over a decade ago when Office 2007 launched with the new .XLSX format, but there are still companies that use the older XLS format. I frequently get data exported from a national payroll service that offers the XLS file format as the only spreadsheet based format.

It turns out that if data is formatted a certain way in the XLS file, Power Query will return the wrong value. Look at the data below.

2020-02-15 14_51_55-Negative Number Format Error  -  Compatibility Mode - Excel.png

Column A has integers 1-4, and then a negative 5. Column B is simply a list of formulas pointing to column A, but it is formatted with the number format 0.00;[Red]0.00 which is a normal format in the Number section. It just shows the number in red with no negative sign. I’m not sure why anyone would format a number this way. A negative number needs to either have a negative sign or be in parenthesis, but I’ve seen a lot of people use this, and this national payroll provider’s systems are programmed to use for all numerical data.

2020-02-15 14_55_16-Negative Number Format Error  -  Compatibility Mode - Excel.png

If the file is a 2007 or later format, this is no problem. But oddly enough, Power Query doesn’t see it that way in the XLS format. Power Query sees the below data when you connect to the file. Here is what it looks like when you start the import process:

In the preview window

In the preview window

In the editor

In the editor

You can see this is not just a visual issue. I added a column called “Addition” that just adds the values in [Regular Number] to [Special Format]. The positive numbers 1-4 added correctly, but -5 was added to 5 (instead of -5) returning the incorrect result of 0. It should be -10.

There is no fix for this. I’ve contacted Microsoft’s Power BI team and they confirmed it but understandably weren’t going to commit to spending a lot effort to fix a file format that was discontinued years before Power BI was first released.

Those of you that are a bit more savvy in working with XLS files might be thinking “Well, of course. To fully support XLS files you need to have the 2010 MIcrosoft Access Data Engine files installed.” If you’ve ever worked with Microsoft Access files in Power BI you have these files. And yes, the ACE engine does have some addition support for XLS beyond what Power Query supports natively, but it does not fix this issue. There are at least two workarounds for this:

  1. Save the file as an XLSX file or move the tab with the data you need into an existing XLSX file.

  2. Change the format, or better yet, remove all formatting. As long as the negative sign is visible to the human eye, then Power Query will see it as well.

It took me about an hour to figure out why a report I recently wrote was returning bad data. As you can imagine, the number format was the last thing I would consider to be an issue. I don’t know if this is the only issue with XLS files, but going forward, I am putting filters into my queries that will filter out XLS files before any File/Combine operations are done with Excel files, and fully documenting processes to save all files as XLSX before storing them in network, SharePoint, or Azure DataLake folders.

If you have XLS files in your workflow, you should consider doing the same. You never know what data formatting landmine lurks in that ancient file format that Power Query will read incorrectly.

Making Sense Of Subtotals Settings In The Power BI Matrix Visual

For showing data, the Matrix visual is one of the most popular visuals, and most comfortable for Excel users since it looks a lot like a Pivot Table. Showing subtotals the way you want though can be a bit of a challenge simply because there are so many settings in the Formatting pane in different sections.

Lets quickly look at a few examples before doing a deep dive. This is how a default matrix looks with just a few fields. It has two rows (Year and Month Name), one column (Category), and one value (Total Sales).

20190421 - Default Matrix.png

Ideally, there would be one place to change all of the settings for a Matrix visual for the subtotals, but there isn’t. The settings are split among at least three sections in the formatting pane, and changing some settings in one area will automatically change other settings in other sections. So it can be a bit tedious to get it to look like you want. The good news is the Power BI Desktop editor gives you a lot of control. You just have to know where to look.

When you add a Matrix visual, it defaults to a stepped value, putting the subtotals next to the first row, or Year, in the example we’ll work through. (See the first image in this post.)

If you prefer the totals at the bottom, select the formatting pane for the visual, scroll down to the Subtotals section, and looks for “Row subtotal position” and change it to “bottom.”

20190421 - Default Matrix subtotals on Bottom.png

This leaves a blank row though next to the Year row, and adds a new “Total” row description. It isn’t the most efficient use of space, but it makes more sense to many people to always have totals at the bottom.

If you don’t like the stepped format where the months are indented a bit, you can turn your matrix into more of a table format turning Stepped Layout off. Go to the Rows Headers section, and turn Stepped Layout off. Note that when you do this, it will automatically change the Subtotals/Row subtotal position setting to Bottom. You can change it back to top if desired.

20190421 - Matrix in Table Format.png

Now the subtotals are at the bottom, but you no longer have the blank row by the Year field as it moves the Months Names to the next column. This is similar to switching a Pivot Table view in Excel to Table Format.

Let’s step back for a minute though and look at the major places in the formatting pane to change how subtotals can be displayed.

  • Row Header section

    • Stepped Layout - turn on or off. Think of this in Excel terms as as a table format (off) or classic Pivot Table format (on). Examples are shown above in the first two images.

    • +/- icons - turn on or off. Here you can turn the +/- expand/collapse icons on or off to allow your users to drill down at each row subtotal. This provides very granular control for your end user. See image 1 below to see how this looks.

      • As of the April 2019 build of Power BI, the +/- only affects rows. The Matrix visual cannot yet collapse/expand columns. Go to this UserVoice request to vote for this feature.

  • Values section

    • Show on rows - turn on or off. This only applies if you have two or more fields or measures in the Values section. See Image 2a and 2b below for an example of this. It changes how rows are displayed, which also impacts how subtotals are shown.

  • Subtotals section

    • Row subtotals - turn off or on. This will globally turn subtotals off or on for the entire matrix.

    • Row subtotal position - top or bottom. This will show the subtotals above (top) or beneath (bottom) the detailed data. The Row Header/Stepped Layout setting may change this setting automatically.

    • Per row level - this will allow you to turn off or on subtotals for each level. See image 3a and 3b below for examples.

Image 1 - +/- icons

Image 1 - +/- icons

In images 2a and 2b below, there are multiple values in the Values section. This example has both Total Sales and Total Units. If you just have two values, showing in columns might work but it is often better to switch them to rows as shown in 2b below. If you have three or more fields in the Values sections, showing on rows is usually the better way to go.

Image 2a - multiple values in columns

Image 2a - multiple values in columns

Image 2b - multiple values in rows - Values/Show On Rows turned on

Image 2b - multiple values in rows - Values/Show On Rows turned on

If you want some subtotals, but not at every level, change the setting in the Subtotals section, Per row level settings. If Per row level is set to off, you’ll have subtotals on or off for every row you have data in. So in this example, it means subtotals at the Year, Month, and Group level.

  • Image 3a - Subtotals are on, Per row level is off. This means for every field you have in the Rows section, you’ll have a subtotal

  • Image 3b - Per row level is on, and I’ve turned off the “Group” level, so there are subtotals for Year and Month.

  • Image 3c - Per row level is on, and I’ve turned all subtotals off except the “Group” level.

Image 3a - Subtotals at every level

Image 3a - Subtotals at every level

Image 3b - Subtotals off for Group

Image 3b - Subtotals off for Group

Image 3c - Subotals on for Group, off for Year and Month

Image 3c - Subotals on for Group, off for Year and Month

The Matrix visual is very flexible. It is likely possible to get it to show the subtotals you want, but you have to play with several different sections in the Formatting panel and understand how they interact with each other.

PS. There is another UserVoice request to show more that just the word “Total” when there is just one field in the Values section. Go here to read more about it and vote for it.

Create A Dynamic Date Table In Power Query

Most Power BI and Power Pivot (Excel) reports require a date table so time intelligence functions can calculate correctly. I’m going to show you how to quickly set up a date table in Power Query, one that will be dynamic. The starting and ending dates will move with your data over time. And because it is in Power Query, it will be an imported table which is what Power BI loves most, being more efficient than a table created in DAX with calculated columns.

You can download the Excel sample data and Power BI file at the bottom of this article if you want to follow along step by step.

EDIT: I have a big block of M code you can use to create your own date table at this link - it is an Evernote page. The contents of this blog post though are still important is it will show you how to make that block of code dynamic.

First of all, we’ll create a static table to get the query set up, then modify it so it is dynamic. The fastest way I’ve seen to create a dates table in Power Query is to start with the one line statement shown below, which we’ll put in a Blank Query. In Power Query in Power BI, it is on the home menu, New Source, Blank Query. In Excel, it depends on the version you have.

Office 365 versions and Excel 2019:

  1. Data tab in Excel

  2. Get & Transform Data section of the ribbon

  3. Get Data dropdown menu

  4. From Other Sources

  5. Blank Query

In Excel 2016 and earlier, you will need to open the Power Query window first. This also works in Office 365 and Excel 2019 if you are already in Power Query.

  1. On the Home tab, select New Source

  2. Other Sources

  3. Blank Query

20190317 - Static Date Table.png

Now you have a blank query called Query1, with a Source in the applied steps (#1 in the above image), but nothing is there. In the formula bar, type the following (#2) and press enter:

={Number.From(#date(2018,1,1))..Number.From(#date(2019,12,31))}

Let’s break that down:

  • Lists are defined in Power Query by enclosing them in what I call squiggly brackets, but are also known as braces or curly brackets. { }

  • When used with numerical data, if you have two periods between the first and second number, it will generate a list of all numbers between those numbers. So {1..10} would generate a list of numbers from 1 to 10.

  • My first number is Number.From(#date(2018,1,1)). This returns 43101. That is the same thing You’d get in Excel if you typed 1/1/2018 in a cell, then formatted it as a number.

  • My second number is 43830, the numerical value of December 31, 2019.

  • The total list has 730 numbers, 43101 - 43830, which is two years of data. Perfect. No skipped dates.

Now we need this to be a table of dates, not a list of numbers. Let’s fix that.

  1. Click the “To Table” icon (#4) in the tool bar. In the To Table dialog box, leave the defaults of “None” for the delimiter and “Show as errors” for extra columns. There are no delimiters or extra columns here, so there won’t be any issues.

  2. For the data type where it says ABC/123, click that and change it to Date. This is the tiny box below with the arrow pointing to it in the image below.

  3. Rename the column to Date.

  4. Rename the query to Date.

  5. Right Click on the query and make sure Enable Load is checked.

20190317 - Date Column.png

At this point, it is a valid date table, but quite spartan, and it isn’t dynamic. First let’s make it dynamic so the date ranges move with your data. I’ve created some sample data that I’ll use for this. I want my date table to encompass the earliest data from my Sales table so all of my sales history is covered, and I want the latest dates in the table to handle dates in the Open Purchase Orders table, which has expected receipts of merchandise well into the future.

A couple of rules about Date tables first:

  • The dates must be contiguous. There can be no gaps. Even if your business doesn’t operate on the weekends or holidays, those dates cannot be missing from your table. The way the data was created using a List ensures no dates are skipped.

  • You should almost always include the full year for any date in your table. So if your data has Jan 1, 2019 in it, you should have all 365 days of 2019 in your date table. I am working with calendar years in this example. If you have a fiscal year, that is fine, but your data should still encompass a full fiscal year. So if your year is July 1 through June 30, if you have July 1, 2019 in your dates table, you should have all dates through June 30, 2020, a full fiscal year.

So how do we make this dynamic? We need to figure out what the earliest date is in the sales table, since that table is going to have the oldest data for sales history, and we need to know the latest date will be based on the last date in the OpenPOs table.

20190317 - Earliest Date.png
  1. Right-click on the Sales table make sure “Enable Load” is not checked. You’ll see why in a minute.

  2. Right-click again on the Sales table and select “Reference”. We now have a second query that is identical to the Sales table, because it is simply pointing to the Sales table. Any changes to the Sales table will affect this new query, which is probably called Sales (2) at this point.

  3. Rename Sales (2) to pmEarliestDate. I recommend you put no spaces in the name as that complicates things later on.

  4. Right-Click and ensure “Enable Load” is not checked. Not only is there no need to load this, it will create an error as lists cannot be loaded into Power BI or Power Pivot.

  5. Select the Date column (#1 in the above image)

  6. On the Transform tab, select Date (#2), then Earliest (#3).

Now we have the earliest date as a list. A single item list. In my sample data, it is Jan 2, 2015. Jan 1 was a holiday, so I didn’t expect any data there, but remember, I need to make this a full year, so let’s tweak the formula that was generated by Power Query.

  1. Power Query used the List.Min() function to the earliest date. List.Min(Source[Date])

  2. We need to extract the year first, so wrap that in Date.Year(). Now it is just 2015.

  3. Now convert it back to a date, starting with January 2. Wrap step 2 with #date(year,1,1). The final formula is:

= #date(Date.Year(List.Min(Source[Date])),1,1)

If you were on a fiscal year, you’d need to do a bit more math to get this to July 1, 2014, for example, but still pretty easy to do. You’ve probably figured out by now that #date() is the same things as DATE() in Excel, taking year, month, and day for the arguments.

Now, we need the last date. I’ll repeat the steps above on my OpenPOs table’s date column, but instead of the Earliest date, I want the Latest date. My final formula is:

= #date(Date.Year(List.Max(Source[Expected Receipt Date])),12,31)

This will give me a date of Dec 31, 2020 since the latest date in that table is Mar 31, 2020. Note this time Power Query used the List.Max function to get the latest date in the table. Call this query pmLatestDate

Now I know my first and last date, and both are calculated dynamically. If dates in my source tables change, so to these two parameters.

So how do we make the Date table we created use these dates? Remember our original Date table started with this Source line:

={Number.From(#date(2018,1,1))..Number.From(#date(2019,12,31))}

We need to edit that formula a bit. We just get rid of the hard coded dates, and replace with our dynamically calculated dates. It becomes this:

={Number.From(pmEarliestDate)..Number.From(pmLatestDate)}

Now we have a date table with 2,192 consecutive dates from Jan 1, 2015 to Dec 31, 2020, with leap years accounted for automatically for us.

So why did I create a reference from both the Sales and OpenPOs table, and ensure those original table are not loaded? Now I can tweak my date ranges in those original tables, and the dynamic dates will account for it. So if my actual sales table has history going back to 1995, I can filter that in the original sales table to be 2015 and later if I want, and my actual data and Date table will reflect that.

Same thing with my OpenPOs table. I’ve seen companies put fake data, say Dec 31, 2999, for things like blanket POs, or some future forecast, or whatever. I don’t want a date table or actual data going out that far, so I limit it to realistic date ranges in the OpenPOs table with a filter on the Date column.

To load relevant data, I need to create two more references:

  1. Create a reference from Sales, and call it Sales Data.

  2. Create a reference from OpenPOs, and call it Open Purchase Orders.

  3. Makes sure “Enable Load” is checked for both of these.

  4. At this point, in both tables, you can do any further transformations, such as removing columns, grouping data, or whatever you want. Because these transformations are on references to original Sales and OpenPOs table, the original tables are unaffected, and the Date table will be unaffected, which is what you want.

It can help to see how the queries are related to each other. Click on the View tab in Power Query, then Query Dependencies. You should see something similar to this:

20190317 - Query Dependencies.png

You can see how the calculation of the earliest date, pmEarliestDate, and the Sales Data query both come from the original Sales query, but changes in the Sale Data query will not affect the date calculation. You can also see which queries are loaded into Power BI from here, and which have the load disabled.

Finally, I need to enhance the Date table. Right now, it is just dates. But to have rich reporting capabilities, we need things like month names and numbers, quarters, years, etc. I’ll create a few as a place to get started:

  1. Select the Date column in the Date table, then on the Add Columns tab, select Date, Month, Month. This gives us the month number.

  2. Select the Date column, Add Columns, Date, Year, Year.

  3. Select the Date column, Add Columns, Date, Month, Name of Month

You get the idea. Now let’s add a few things that you cannot do through the tool bar. We are going to add a column that will give us the Month and Year in the format of MMM-YY, so Jan-15, Feb-15, Mar-15, etc. This is handy for a number of visuals.

  1. Add Column

  2. Custom Column

  3. Name the column MMM-YY

  4. Type in the following formula:

=Text.Start([Month Name],3) & "-" & Text.End(Text.From([Year]),2)

Make sure to set the type to Text so it isn’t the Any data type (the ABC/123 type)

Now, if you know anything about creating charts or tables in Power BI and Excel, that column will sort alphabetically, which is useless, unless you want April to be before January. Of course you don’t. So we need to create a sorting column for this. Add a new column, call it “MMM-YY Sort” and add this formula:

=[Year]*100 + [Month]

This will create a column that will have 201501, 201502, 201503, etc to correspond to Jan-15, Feb-15, Mar-15, etc. You can use this column to sort your MMM-YY sort. Lastly, change this column to a Whole Number format.

At this point you can add dozens of more columns depending on your date needs. Quarters, weeks, ISO 8601 formatted dates, etc. But in our simple example, this is a really short query, and is fully dynamic. In the Advanced Editor, it would look something like this:

let
    Source = {Number.From(pmEarliestDate)..Number.From(pmLatestDate)},
    #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Changed Type" = Table.TransformColumnTypes(#"Converted to Table",{{"Column1", type date}}),
    #"Renamed Columns" = Table.RenameColumns(#"Changed Type",{{"Column1", "Date"}}),
    #"Inserted Month" = Table.AddColumn(#"Renamed Columns", "Month", each Date.Month([Date]), Int64.Type),
    #"Inserted Year" = Table.AddColumn(#"Inserted Month", "Year", each Date.Year([Date]), Int64.Type),
    #"Inserted Month Name" = Table.AddColumn(#"Inserted Year", "Month Name", each Date.MonthName([Date]), type text),
    #"Added MMM-YY" = Table.AddColumn(#"Inserted Month Name", "MMM-YY", each Text.Start([Month Name],3) & "-" & Text.End(Text.From([Year]),2)),
    #"Added MMM-YY Sort" = Table.AddColumn(#"Added MMM-YY", "MMM-YY Sort", each [Year]*100 + [Month]),
    #"Changed Type1" = Table.TransformColumnTypes(#"Added MMM-YY Sort",{{"MMM-YY Sort", Int64.Type}, {"MMM-YY", type text}})
in
    #"Changed Type1"

The final step is once this is loaded, close Power Query and set this as your date table for Power BI:

  1. Make sure automatic date logic is off in Power BI. Select File, Options and Settings, Options, Data Load. Uncheck “Auto Date/Time”. Leaving this checked will create all sorts of unnecessary hidden tables and columns. You’ve created a perfect Date table. Don’t let Power BI’s AI mess this up.

  2. In the Visual view, right-click on the Date table, and select Mark as Date Table, then select the Date column.

  3. It will validate it by ensuring you have no skipped or duplicate dates. Press Ok.

Now you can start creating your visuals, measures, etc. in Power BI, or pivot tables in Excel. As data is updated from your source system, your Date table will dynamically expand. This is handy as you would never have to edit the date range in the table and republish the report. The Power BI service will refresh all of the data, including calculating the ever-expanding date range as sales and purchase orders continue to happen into the future.

Below are the two files I used in my example, the PBIX file and the Excel XLSX file with the sample source data.

Power BI Dynamic Date Table

Excel Sample Source Data

Calculate Last Twelve Months Using DAX

One of the more common calculations a company uses is the last twelve months, or LTM, of data. This can be tricky if your date table always has a full year of dates for the current year, which it generally should. So if today is March 3, 2019, my date table will have dates through December 31, 2019. This is usually necessary for the date intelligence functions in DAX to work properly, and companies may have data beyond today in their model. For example, budget and forecast data will generally extend through the end of the year, or at least beyond today.

However, it often is challenging when you are trying to hide these future dates for specific measures. I’ve seen solutions that use functions like LASTNONBLANK() that get the last date with sales data in it, and that can work, but depending on how your data is laid out, it can make for larger and more complex measures with multiple FILTER() functions. For a visual you can sometimes use the relative filtering feature, but that won’t change the underlying value of the measure if you reuse it in another visual or refer to it from another measure.

Marco Russo recently wrote an excellent post on hiding future dates or calculations in DAX. The concept is brilliantly simple. Just add a column to your date table that returns TRUE if the date is today or earlier, or FALSE if it is after today, then use the CALCULATETABLE() function to return just a table of dates that fall in that TRUE range of dates.

That wouldn’t work for me though exactly as it was presented. I needed to create dates that were in the previous 12 calendar months, and I was working with a Power BI Dataset, which is a Live Query, and you cannot add columns to Live Query models.

20190303 - No New Column SSAS.png


So I opted to create two measures. First, I needed to create the date logic in my dates table. I wanted the previous 12 full calendar months, not the last 365 days of data. Note that my date table is named ‘Calendar’.

LTM Dates = 
VAR EndDate =
    EOMONTH ( TODAY (), -1 )
VAR StartDate =
    EDATE ( EOMONTH ( TODAY (), -1 ), -12 ) + 1
RETURN
    IF (
        MAX ( 'Calendar'[Date] ) >= StartDate
            && MAX ( 'Calendar'[Date] ) <= EndDate,
        TRUE (),
        FALSE ()
    )

This measure has two variables:

  1. EndDate - This calculates the last day of the month for the previous month based on TODAY().

  2. StartDate - This calculates the month 12 months prior to the EndDate, then adds one day to move to the first day of the next month.

Finally the measure uses a basic IF() statement, with some AND logic. If today is March 3, 2019, it will return TRUE for the dates March 1, 2018 through February 28, 2019. For dates before March 1, 2018, and after February 28, 2019, it returns FALSE. It will do the for the entire month of March. On April 1, the LTM range becomes April 2018 - March 2019.

I could have used the AND() function instead of the double ampersand, but I use the double ampersand as I can use multiple conditions, like condition1 && condition2 && condition3, whereas AND() is limited to two conditions. By getting in the habit of using &&, I never have to remove an AND() function and redo the syntax. Side note: Use double pipes to allow multiple conditions for OR logic. Condition1 || condition2 || condition3, as OR() is also restricted to two conditions.

Now I needed to calculate sales for LTM. I already had the [Total Sales] measure below:

Total Sales = 
CALCULATE(
    SUM(Sales[Sales]),
    Sales[Type]="ACT"
) 

The measure for [Sales LTM] then is:

Sales LTM = 
CALCULATE(
    [Total Sales],
    CALCULATETABLE(
        'Calendar',
        FILTER(
            'Calendar',
            [LTM Dates] = TRUE()
        )
    )
)

You could combine my first measure with the second measure, replacing [LTM Dates] with the full measure, after tweaking the date logic in the FILTER() section a bit in this [Sales LTM2] measure.

Sales LTM2 =
VAR EndDate =
    EOMONTH ( TODAY (), -1 )
VAR StartDate =
    EDATE ( EOMONTH ( TODAY (), -1 ), -12 ) + 1
RETURN
    CALCULATE (
        [Total Sales],
        CALCULATETABLE (
            'Calendar',
            FILTER (
                'Calendar',
                'Calendar'[Date] >= StartDate
                    && 'Calendar'[Date] <= EndDate
            )
        )
    )

However, this measure is both a bit more complex, and if you wanted to have other LTM measures, such as units sold, or cost of goods over the last year, dollars purchased, etc., you’d have to repeat the date logic in each measure. If you wanted to change the LTM logic, say switch from previous 12 completed calendar months to last 365 days, or last 12 calendar months but starting with this month, you’d have to edit every measure calculating the date range. By breaking it into two parts as I’ve done above, I can edit just the [LTM Dates] measure and all other measures that use it will automatically recalculate accordingly.

Also note that unlike Marco’s solution, my date measure will not behave as a calculated column.

  1. You could not use it in an iterator function such as SUMX(), AVERAGEX(), and so on, as iterators use row context, and measures generally do not have row context. Well, iterator measures do, but they have to have row context to start with. They cannot create it out of thin air.

  2. You also cannot use measures in slicers or filters in your report. For those, you must use either a calculated column, or bring the column in through Power Query.

  3. You cannot use it as the date column in a date intelligence function, because it isn’t a column.

Calculated columns, and better yet imported columns via Power Query, can be a better choice for the above secenario, but that is not always an option if your source data is from SSAS or a Power BI Dataset where adding columns isn’t permitted.


Quickly Pad Columns in Power Query with Text

When working with data in Power Query, you often will need to pad either the beginning or ending of a column with specific text, especially if the final result will be data you export to a CSV file for uploading into a different system. Common examples include:

  • Some sort of “number” that needs to be padded with leading zero’s.

  • Labels that need to be padded with trailing spaces to ensure the cell contents are a specific length.

This can even be useful in Power BI if a report is designed to be exported by the end user to upload or otherwise use for input into a different system.

In Excel, you can do this in a worksheet cell with a formula. This will create a string 15 characters wide with trailing periods, assuming the text you want to convert to a 15 character string is in cell A1.

=LEFT(A1&REPT(".",15),15)

You can do the same in Power Query with the following formula, but this is not the easiest way.

Text.Start([Labels] & Text.Repeat(".", 15), 15)

The Text.Start() function is equivalent to Left() in Excel, and Text.Repeat() corresponds to REPT().

But there is a faster way to do this and the Power Query function name tells you exactly what it is doing.

=Text.PadEnd([Labels],15,".")

Text.PadEnd() will cause the text value, [Labels] in this example, to be padded until it is 15 characters long. I used a period so you could see the results, though you’ll most often use a space. You can see the results are the same for Excel and Power Query.

20190224 - TextPad Start.png

I changed the font in the Power Query window to monospace so you could see the text is the same length with both methods. You can change to monospace on the View tab of Power Query.

Text.PadEnd() has 3 arguments:

  1. The text you are manipulating. This can be a single filed, or a formula that generates text.

  2. The length you want your field to be. Note that if the text is longer than the padded length, Text.PadEnd() will not truncate your data to your desired length. It will return the full length of your text, but with no padding as the string is already longer than the length you set.

  3. The text to pad with, which is optional. If you leave this off, it will use spaces, ASCII code 32. If you use two or more characters, it will result in an error. It must be a single character.

You can use Text.PadStart() to pad at the beginning. It works exactly the same way as its sister function Text.PadEnd().

=Text.PadStart([Labels],15,"0")

As you can see, you only need to change from Text.PadEnd() to Text.PadStart(). Using the Excel method, you not only have to switch from the LEFT() to RIGHT() function, but change the order of the concatenation of the text in the formula, making sure not to mess up the Text.Repeat() function. The Excel method would be:

=Text.End(Text.Repeat("0", 15) & [Labels], 15)
20190224 - TextPad Start.png

In this way, Power Query is very much like Excel in that there is usually more than one way to do the same thing. I think using the Text.Pad* functions makes your code more readable, and easier to edit without having to nest multiple functions.

Intellisense in Power BI's Power Query Formula Bar

Late in 2018, the Power BI team added Intellisense to the Advanced Editor for queries in Power Query, only in Power BI. This has not yet come to Power Query in Excel.

In the February 2019 version of Power BI, that Intellisense now comes to the formula bar.

20190214 - Intellisense In M Language.png

To see this you have to enable two things:

  1. Turn on the Formula Bar in the Layout section of the View ribbon in Power Query. That should always be on by the way. Working without this is like disabling the formula bar in Excel so you cannot see the cell contents.

  2. In Power BI, go to File|Options & Settings|Options, and scroll down to the Preview Features. Check the “M Intellisense” checkbox. You’ll need to restart Power BI for this to take effect.

It still doesn’t have Intellisense in the Custom Column formula box, but we are getting closer. I suspect this feature came as a result of the recently announced “Improved Python & R Script Editor” in Power BI this month.