Always Use Explicit Measures - Wait, What Is An Explicit Measure?

All of your analytics in Power BI are done with measures, and there are two ways to create measures. It is important to understand the difference between the two methods. If you drag a field from the data pane directly into a visual, that is an implicit measure. The model usually wraps it with SUM(), but other aggregations are possible, like COUNT() or AVERAGE().

Creating an implicit measure

In the above image, I am dragging the Profit field to the table, and Power BI is using the SUM() aggregation. The query Power BI uses to actually generate this table visual is below:

Implicit measure in the DAX query

You can see it wrote SUM(‘financials’[Profit]) and wrapped it with CALCULATE(). That CALCULATE() is important later. Side note: If you’ve ever heard someone say they don’t use CALCULATE() in their reports, that is just nonsense. Whether you use the CALCULATE() function in a measure or not, Power BI uses it all of the time.

An explicit measure is when you explicitly write a measure using the DAX functions you need. For example, Total Profit = SUM(‘financials’[Profit]) is an explicit measure. Once created, I would then drag [Total Profit] to my visuals. The DAX query when using an explicit measure might look like this:

Explicit measure in a DAX Query

Here it is using the [Total Profit] measure. Notice there is no CALCULATE() function in the query. That is because when you refer to a measure, there is an implicit CALCULATE() already. It is as if the above said CALCULATE(‘Financials’[Total Profit]). Got it?


  • When you use an explicit measure, the DAX query for your visual will have an implicit CALCULATE().

  • When you use an implicit measure, the DAX query for your visual will have an explicit CALCULATE().


Don’t worry. That is as complex as this article gets.😉 It just helps to understand what goes on under the hood.
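
To make that concrete, here is a simplified sketch of what those two generated queries might look like. This is not the exact code Power BI writes, and the 'financials'[Country] grouping column is just for illustration:

// Implicit measure - the generated query wraps the aggregation in an explicit CALCULATE()
EVALUATE
SUMMARIZECOLUMNS (
    'financials'[Country],
    "Profit", CALCULATE ( SUM ( 'financials'[Profit] ) )
)

// Explicit measure - the measure reference already carries an implicit CALCULATE()
EVALUATE
SUMMARIZECOLUMNS (
    'financials'[Country],
    "Total Profit", 'financials'[Total Profit]
)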

Note that Power BI always puts the table names in front of measures within the queries it generates. This is a bad practice for you and me. Never do this when writing measures, calculated columns, or calculated tables. The rule is:

  • When referencing a measure, always exclude the table name.

  • When referencing a column, always include the table name.
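
For example, assuming the Total Profit measure from earlier (the [Sales] column and the Profit Margin measure are only for illustration), this is the pattern to follow:

// Follows the rule - bare measure reference, table-qualified column reference
Profit Margin = DIVIDE ( [Total Profit], SUM ( 'financials'[Sales] ) )

// Breaks the rule - table-qualified measure, bare column reference
Profit Margin = DIVIDE ( 'financials'[Total Profit], SUM ( [Sales] ) )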

What Power BI does when building its own queries is its business. You do need to be able to read that code though; in the query above it looks like it is referencing a Total Profit column, but it isn’t. Ok, enough of that tangent.

So, what difference does it make whether you use implicit or explicit measures? Here are the differences:

Advantages of implicit measures:

  1. Easier when adding a field to a visual.

Disadvantages of implicit measures: (not comprehensive)

  1. You cannot edit the measure to change the calculation. Say you dragged the ‘financials’[sales] field into a visual and it created the implicit SUM(‘financials’[sales]) measure; you cannot change it, other than the aggregation type (COUNT, AVERAGE, etc.). If you need sales to be a net sales calculation that subtracts cash discounts, you cannot do it. You now have to write an explicit measure that might look like Net Sales = SUM(‘financials’[sales]) - SUM(‘financials’[cash discounts]). Then you will need to remove your implicit sales measure from every visual that was using it and replace it with the new business logic. That is a lot more work than just editing an explicit measure had you created one in the first place. Be kind to your future self.

    1. This particular problem magnifies itself if you have added conditional formatting to that field. Once you swap out the implicit measure with an explicit measure, you have to redo all of the conditional formats!

  2. You cannot build off of it. The beauty of measures is you can create new measures that use other measures, and if you change the original measure, the dependent measures will automatically take in that new logic.

  3. Implicit measures do not work with Calculation Groups.

The Calculation Group thing is HUGE. What is the problem here?

I’m not going to do a deep dive into Calculation Groups. Others have already done a fantastic job here, so go read their blogs.

Consider the following visual. It has two measures.

The Quantity column is an implicit measure. I just dragged the Sales[Units] field into the visual and called it Quantity. The Total Units column is the explicit measure Total Units = SUM(Sales[Units]).

Now, I want to have another set of columns that will show me the units in the same period last year. I could create a new measure for this, or I could use a Calculation Group. The latter is more versatile, so I’ll do that. Using Tabular Editor I created a Time Intelligence Calculation Group with Current Period and Prior Year Calculation Items. Here is the full DAX for how that will work. Again, this isn’t an article to teach you how to create Calculation Groups. I’m focused on the downsides of implicit measures.

Logic in the Calculation Group
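
The Calculation Items in that image boil down to something roughly like this - a sketch only, assuming a 'Date' table marked as a date table, with SAMEPERIODLASTYEAR standing in for whatever time intelligence pattern the image actually uses:

// Calculation Item: Current Period
SELECTEDMEASURE ()

// Calculation Item: Prior Year
CALCULATE ( SELECTEDMEASURE (), SAMEPERIODLASTYEAR ( 'Date'[Date] ) )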

When I drop that Calculation Group to the Column section of the matrix, I get this visual:

Do you see the problem? The Current Period columns are fine. But look at the Prior Year columns. The Quantity column is showing the same as the Current Period. That is wrong. But the Total Units measure correctly shows the units from the prior year. Why is that?

It is the SELECTEDMEASURE() function in the Calculation Items shown above. That is how Calculation Groups work their magic. They have but one flaw: they cannot see implicit measures, so they ignore them. Power BI protects you from this though. When you create a Calculation Group, the model flips a switch that blocks the creation of additional implicit measures. Notice the DiscourageImplicitMeasures property. By default this is set to FALSE, but when you add a Calculation Group to a model, it gets set to TRUE.

Discourage Implicit Measures enabled

Now if you drag a field to a visual expecting a quick calculation, you won’t get one. It will either just show the values in that field or block you from adding it to the visual outright. The property is called “Discourage Implicit Measures” and not “Block Implicit Measures” because the tabular model cannot foresee every third-party client that might work with the model, but for Power BI Desktop, or when authoring a report in the service, it is effectively a block.

So we’ve seen implicit measures are fast and easy to create, but have a number of shortcomings. I know it takes an extra 30-45 seconds to create an explicit measure, but trust me, creating explicit measures is a best practice. It will save you additional work in the future, and if you introduce Calculation Groups to your model, will help ensure they are returning the expected results.

Tips for Debugging your DAX Code

When trying to get your DAX measures to work correctly, there are a number of tools you can use to debug, which I will briefly mention, but not go into depth on. Instead, this post is about how to think about debugging your code. I am focusing on DAX here, but many of the methods would apply to any type of code.

First, the tools you can use. This is not a comprehensive list, just something to get you started.

  1. Use Variables. Variables have several benefits:

    1. Variables are processed one time and can be reused later in the measure in multiple places. This can speed up your code.

    2. You can return a variable result to the visual itself. This allows you to see what each component is doing to ensure each part is calculating correctly.

  2. CONCATENATEX - If a variable returns a table, you cannot return that variable directly to the visual. A measure requires a scalar (single) value. CONCATENATEX will convert a table to a scalar value so you can see everything in that table (there is a small sketch of this after the list). You can also use the newer TOCSV or TOJSON functions, which do similar work in different ways. TOCSV and TOJSON are the underlying functions that help EVALUATEANDLOG return results, and aren’t really meant to be used in a production model, but they are handy in debugging.

  3. EVALUATEANDLOG - This is a relatively new function that will generate a ton of output for each expression you wrap with this function. There is a massive 4 part series on EVALUATEANDLOG here, including links to download the free DAXDebugOutput tool that will be necessary to read what this function returns. Note: Be sure to remove EVALUATEANDLOG once you are done. It will slow your code down. You should never publish a report that has this function in it.

  4. Tabular Editor 3’s Debugger - This is simply the luxury vehicle for debugging DAX. More about this tool is here.
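
As a rough illustration of the first two items, a measure structured for debugging might look like this. The table and column names are assumptions borrowed from the examples later in this post, not something you must have in your model:

Total Sales Debug =
VAR FilteredSales = FILTER ( Sales, Sales[Units] > 0 ) // the intermediate table I want to inspect
VAR RowsFound = COUNTROWS ( FilteredSales ) // a scalar, safe to return directly
VAR UnitList = CONCATENATEX ( FilteredSales, Sales[Units], ", " ) // the table flattened to text
VAR Result = SUMX ( FilteredSales, Sales[Units] * Sales[Price] )
RETURN
    Result // temporarily return RowsFound or UnitList instead to check each piece

Changing only the value after RETURN lets you inspect each variable in the visual without touching the rest of the logic.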

Instead of an in-depth look at these tools though, I want to look at the thought process when debugging and offer some tips.

First, when debugging DAX, be sure to do it in a Table or Matrix visual. You need to be able to see what numbers are being returned for each value in the columns and rows. This is impossible to do efficiently in a bar or line chart. To quickly convert your visual to a table or matrix, make a copy of it in Power BI, then while it is selected, select the Table or Matrix visual icon in the Visualization Pane. You might need to move fields around to get the correct fields in the Rows, Columns, and Values section.

Next, make a note of any filters you have set in the Filter Pane, then clear or remove them. You need to understand what the core calculation your measure is doing before any filters are applied. This may not be the result you want, but you have to start with the unfiltered result. For example, you might only be interested in getting an average 12 months of sales for Texas, but if the [Average 12 Months of Sales] measure is wrong at the global level, having your report filtered for United States is only muddying the issue. Make sure that figure is correct globally first. Then you can apply filters. If you apply the United States filter and Texas is still wrong, it is possible your measure is fine, but you have a problem in the data. Zip codes are assigned to the wrong state for example. But you will never figure that out if you have no confidence the measure itself is correct. When debugging DAX measures and queries, we take for granted that our data is good. That isn’t always the case, so keep that in mind.

We’ve all built reports where we create 3-4 measures, then create several more measures with new logic that reference those original measures, and before you know it, you have 40 measures with a complex web of dependencies. And the measure you are debugging references 8 of those measures.

dependency view from Tabular Editor

As you can see above, a single measure can have many, perhaps dozens, of fields and measures it depends on. When this happens, I generally tear the measure down to its component parts, even replicating code from other measures into the one I am debugging. Often I find some code in one of those dependencies that works great for that particular measure in a specific chart, but it doesn’t work well when used inside the measure I am debugging. Context transition can be invoked at the wrong granularity (the subtotal or total level) and return undesirable results, or the dependency has a particular filter applied or removed using CALCULATE. Start simple, and rebuild. Once you get it working, you can replace the duplicated code with the measure it came from. Sometimes, though, you might find your new code runs much faster. Iterators like SUMX and AVERAGEX often benefit from streamlined code vs using measures, and, as noted, using measures in iterators can produce undesirable results at the total level. Look at these two examples:

Average Sales per KG = AVERAGEX(Sales, Sales[Units] * Sales[Price] / RELATED(Products[KG]))

Average Sales per KG = AVERAGEX(Sales, [Total Sales] / [Total KG])

If you put both of those measures in a table that is by invoice number, they will both produce the correct results at the detail level. However, the second example, the one that uses measures, will not produce correct results at any subtotal or grand total level. It will total the sales for the entire table and then divide that by the total of the kilograms - a useless result. (Ignore that I am using the / symbol vs DIVIDE(). I am going for clarity here, not best practice.)

So here, reusing measures actually causes a problem. That is not often the case. I am a huge fan of reusing measures, but it isn’t always the answer, and it can make debugging more challenging. Never assume your dependent measures are correct, or at least correct for your current needs.

Next, take a step back and try to understand the question being asked. Have the question phrased in plain business language. For example, here are two ways you can be presented with a question from a user for the report:

  1. I need to get the total sales where the Relative Date is 0 through -7.

  2. I need to get the total sales from today going back 6 days.

Those are not even the same question, and there are several ways to interpret #2.

  1. You’ll notice that #1 is a range of 8 days. So if you are getting that type of request from a user or co-worker, you might not think to push back. Instead, you may give them a simple measure with a filter predicate like ‘Date’[Relative Date] <= 0 && ‘Date’[Relative Date] >= -7. That filter matches what was asked, but it is likely answering the question in the wrong way.

  2. When the person says “today going back 6 days” what does “today” mean? Today’s calendar date, or the latest date in the model, which might be a few days ago. Using TODAY() in a measure and then walking back 6 days will give the most recent 7 days, but the last 2-3 days might have no values if there are weekends or holidays involved. That might be correct. It might not be.
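
To make the difference concrete, here are two hedged sketches of those interpretations. The 'Date' table, the Sales[Order Date] column, and the [Total Sales] measure are assumptions, not something from a specific report:

Sales Last 7 Calendar Days =
CALCULATE ( [Total Sales], DATESINPERIOD ( 'Date'[Date], TODAY (), -7, DAY ) )

Sales Last 7 Days With Data =
VAR LastDataDate = CALCULATE ( MAX ( Sales[Order Date] ), REMOVEFILTERS () )
RETURN
    CALCULATE ( [Total Sales], DATESINPERIOD ( 'Date'[Date], LastDataDate, -7, DAY ) )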

The point is, understand in plain English what your user is needing in the visual. Then you can start to craft the code to return that value. But if they start talking all DAXy to you, you might be debugging the wrong logic.

Do it in Excel first. Or have the user do it in Excel. Sometimes pointing directly to cells will allow you to understand the logic. You can rarely convert Excel formulas to DAX, but once you understand the logic of what is happening, writing the DAX becomes much easier. And you will also have the benefit of the correct answer in Excel you are trying to match.

You can also write it out on paper, or better yet, if you are on site, use the biggest whiteboard you can find and draw the logic out there.

Sometimes you just need a break. Go for a walk. I have solved tons of problems a few miles from home on my morning walks. Staring at the same code can reinforce that bad code in your head until you cannot see a way around it. Stepping away removes that barrier.

Ask for help. I cannot stress this enough. Innumerable times I’ve asked for help and, just by talking through the problem, found a new path myself without the other person having spoken a word. Some people call this “talking to the rubber duck.” Put a rubber duck on your desk and when you get stuck, tell the duck what the issue is. Talking out loud forces you to go much slower than thinking through it, and that alone helps.

Of course, asking someone can reveal the answer or a path forward. Ask a co-worker. Many organizations have Power BI or data analytics groups in Teams, Yammer, Slack, or another collaboration tool. Don’t be afraid to jump in with the question or ask for a meeting where you can do a screen share. I have helped others with super basic questions a person new to Power BI would ask, and I’ve been on 2-3 hour screen shares where a co-worker and I untangled a gnarly problem. Sometimes it was my problem, sometimes theirs.

There are also public forums, like the Power BI Community. In those communities it is especially important to give adequate sample data in a usable format - CSV or XLSX - and a mock-up of the desired results. Don’t take shortcuts with your question. The people you are talking to know nothing about your scenario; you can take shortcuts with co-workers who know the data and the company, but forum members don’t have that context, so carefully explain the issue.

One of the latest fads is to ask ChatGPT. That is ok as long as you don’t trust the code for the correct answer. I’ve found ChatGPT doesn’t always return code that even works, let alone code that returns an accurate result. However, it can help directionally, and suggest functions, or different ways to shape and filter the data you might not have thought of.

Regardless of where you might get an answer, verify it is correct. It may return a number, but verify it is the correct number.

Hopefully the above tips will help you rethink how you approach a problem you are stuck on. Of course, learn the debugging tools and techniques mentioned above. The best time to learn those tools is when you don’t need them. Have a complex measure that already works? Awesome. Throw a table visual with that measure into Tabular Editor 3 and see how the debugger works, or wrap some of the variables in EVALUATEANDLOG and see what the DAXDebugOutput tool generates. Then when you do have a problem the output from these tools will be familiar to you.

Why You Should Avoid Calculated Columns in Power BI

First off, I want to make it clear that I am not saying “never use calculated columns.” I am not. What I am saying is that they should be avoided, and there are several reasons. I am going to focus on model size, and thus performance. Note that the data I am going to show is a contrived set built purely for demonstration purposes.

Power BI, which includes Power Pivot models and SSAS Tabular models, compresses the data in tables as it is loaded to the service. It never uncompresses that data. All calculations are done on the compressed data. If you want to know how this works, I suggest you read Chapter 17 in The Definitive Guide to DAX, Second Edition. It does a thorough job of explaining how the VertiPaq engine works.

Here is the scenario. I have a 200M record table with three columns:

  1. A list of numbers from 0 to 100,000,000 in 0.5 increments. This has 200M unique values.

  2. A list of numbers that rounds those numbers to whole numbers. It has 100M unique numbers. 0.5 and 1.0 from the first column are 1 in the second, 1.5 and 2.0 are 2 in the second, etc.

  3. A third column is a text value of “Even” or “Odd” that is based on the second column. This field only has 2 values.

The column to pay attention to is the [Even or Odd] column. I created this column two different ways. First in Power Query and then as a Calculated Column in DAX. This is important to understand.
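
As a rough sketch of the two approaches (the Source step, the 'Numbers' table name, and the column names are assumptions), the imported version adds the column in Power Query before the data is loaded, while the Calculated Column version adds it in DAX after the load:

// Power Query (M) - the column exists at import time, so VertiPaq can consider it when compressing
= Table.AddColumn(Source, "Even or Odd", each if Number.Mod([Integer], 2) = 0 then "Even" else "Odd", type text)

// DAX Calculated Column - computed after the compression order has already been chosen
Even or Odd = IF ( MOD ( 'Numbers'[Integer], 2 ) = 0, "Even", "Odd" )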

All data in the model is compressed, but only data that is imported is taken into account when determining the optimal sort order for the best possible compression. So when I loaded the table with [Even or Odd] as an imported column, the VertiPaq engine considered that column when compressing the data. When I created it as a Calculated Column, it was still compressed, but only the [Index] and [Integer] columns were taken into account when choosing the compression. Compression optimization happens at import and load. The table does not get re-optimized and re-compressed when you add new columns.

Using the VertiPaq Analyzer from DAX Studio, we can see how much data each of these columns consumes.

Focus on the Total Size column. There are only 2 values in the [Even or Odd] column - “Even” or “Odd.” Note: The scenario gets a bit more complex here as these text values create a dictionary, which is a further optimization I am not going to get into here because the effect is the same on both tables - again, see the Definitive Guide to DAX for more on this.

On the bottom table, called “Imported”, the [Even or Odd] column consumes roughly 18KB. Remember, there are 200M records here. However, the VertiPaq engine looked at this column, realized it had a low cardinality - only 2 - and sorted by that column first. This means theoretically it could represent the column with just two values - “Even” for the first 100M records and “Odd” for the second 100M records - and that is, in effect, what it did. At 18KB that is certainly more than 2 text values, but again, this gets deeper into how VertiPaq splits the data, and this 200M record table is broken up into 191 segments in Power BI. Again, read the book.

Now look at the top table - Calculated Column. In this table, the VertiPaq engine only looked at the [Index] and [Integer] columns when optimizing for compression, and there isn’t a lot of compression going on here. The [Index] isn’t compressed at all. There are 200M unique values. The [Integer] column can be compressed, but not by much, as there are still 100M unique values, which is why it is still more than half the size of the [Index] column - roughly 0.86GB vs 1.49GB.

But the [Even or Odd] column had to settle for whatever compression order VertiPaq chose for the imported columns. As a result, even with only 2 distinct values, it is almost 25MB in size, or roughly 1,338 times larger than the imported version!

What does that mean in practical terms? As one example, if you do any calculations based on the [Even or Odd] column, the computer hosting the model has to work through 25MB for the Calculated Column version vs 18KB for the imported version. Calculations will be faster on the smaller set of data. Again, this specific example is contrived, so the difference in speed between 18KB and 25MB will not be very large, or perhaps even noticeable.

Note too that when processing a model, the entire model is loaded in RAM. This is very different to how relational databases like SQL Server or Oracle work. They only load what they need from disk to do the necessary calculations. VertiPaq doesn’t work that way. It loads it all.

Now consider what happens if you add 10 more columns to each of these tables. Doing it using DAX will greatly increase the size of the table vs doing it in Power Query or in your data source.

And that is the point. Calculated Columns will increase the model size, which in turn increases the amount of RAM needed to hold the model, and the time it takes to run calculations on it.

There are other reasons to transform your data in Power Query or further upstream. In this example, I could have removed the [Integer] column from the table before loading, keeping only two columns. You cannot do that with DAX. You can add to the model, but you cannot take away without destroying data and any Calculated Columns that depend on it. This is why I avoid at all costs any LEFT/RIGHT/MID logic in a Calculated Column. I get what I need in Power Query, then remove the source column. That saves space, refreshes are faster, and calculations are faster.

So when should you use a Calculated Column? Here are the questions I ask myself:

  1. Am I transforming data or adding data based on other columns? Do this in Power Query or earlier. This is often possible. Exceptions are when the DAX is looking at multiple groupings of data via a SUMMARIZE() or data in other tables. Power Query is inefficient here, but it may still be possible in an earlier source like SQL Server. This is especially important if you don’t need the original data you are transforming. You can delete unnecessary columns in Power Query when loading data. You cannot delete columns in the model without breaking Calculated Columns that rely on that data.

  2. Does this need to be in a slicer or filter? This has to be a column - it cannot be a measure. But if it can be done in PQ or earlier, do it there.

  3. Is the formula really complex and won’t work in a measure? This usually isn’t a good reason. I’ve yet to see a DAX formula in a Calculated Column that could not be reworked into a measure. Calculated Columns have row context and not filter context. Measures are the opposite: they have filter context and not row context. This is where it becomes necessary to really understand evaluation context, how to create row context in a measure using iterators, and when to invoke context transition via CALCULATE (see the sketch after this list). It can be done, it just requires some work. Reach out for help in the Power BI community or talk to co-workers.

  4. If the formula in a measure is inefficient, then a Calculated Column helper column may be necessary. I had a particularly gnarly measure that used PERCENTILE.INC() three times over three groupings of data with SUMMARIZE(). The measure worked, but was horribly slow. PERCENTILE.INC() required the formula engine, and that was simply inefficient over 20M records as users filtered data. It was also impractical to attempt to do this in Power Query or the source systems, some of which were just CSV files. So I let a Calculated Column do the heavy lifting by moving all of the PERCENTILE.INC logic there, reducing the visual rendering from over 8,000ms to under 500ms when done. But it was a last resort, not the first thing I reached for.

  5. Am I just developing right now? I use Calculated Columns every day during development. They are super easy for analyzing data and sketching out the logic of the DAX code. But I make it a point to convert them to imported columns or measures and remove the Calculated Columns before I even push a model to the Test environment.
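
As a small sketch of the point in item 3 above, a Calculated Column holding a per-row margin (the Sales[Amount] and Sales[Cost] columns here are hypothetical) can usually be replaced by an iterator-based measure that creates its own row context:

Total Margin = SUMX ( Sales, Sales[Amount] - Sales[Cost] )

The iterator supplies the row context, and wrapping other measures or CALCULATE inside it is where context transition comes into play.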

Remember, Calculated Columns aren’t inherently bad. They are there for a reason. But they aren’t meant to be used like you’d use a column in Excel. Most of the time, I find they are not needed and have adverse impacts on the model, but there are definitely use cases for them and some of my models have them for very particular reasons. The key as a good Power BI developer is knowing when to use them and when to avoid them.

Enhancing Power BI Apps with Emoji

Your report page names, and in turn the Power BI app, can be enhanced with the judicious use of emoji. I was surprised to find that the characters come through in full color, and that can help your users find the important pages faster. This can be especially useful in a large Power BI app with dozens of reports and potentially hundreds of pages.

Adding emoji is relatively straight-forward in Windows 10 and 11. Below are the steps for Windows 11.

  1. Double-click on the report page name in Power BI Desktop and position your cursor at the end (or beginning if you prefer).

  2. Press and hold the Windows key 🪟, then press the period key, and the emoji picker comes up.

  3. In the search box, type the name or a description of the emoji you are looking for.

  4. Double-click it to add to the tab.

  5. Press enter to save it.

Windows 10 is slightly different. When you launch the emoji picker, there will be a magnifying glass. 🔎Click that, then type the word you are searching for in the report page tab itself. It doesn’t have its own search box. But once you type “star” for example, when you double-click the star, your typed text will be replaced by the emoji.

Note that once you create the app, you can also add emoji to the report names themselves in the “Content” section of the app creation.

A published app might look like this:

This will help draw users’ attention to the most important pages or reports in the app, but as I said at the beginning, you want to be judicious about this. Less is more here. If every page has an emoji, then it is just clutter and harder to read, and then the emoji to express the look of the app would be 💩.

Additionally, do not carry this into the data model itself. I’ve seen people add emoji to table and field names, and that is a train wreck. It makes it harder to use intellisense when typing table/field names, and emoji may not carry their beautiful look through to the DAX editor or M editor, nor through to external tools like Analyze in Excel, DAX Studio, or Tabular Editor. Sometimes the image remains but is just black and white, or worse, you just get a generic ⏹️ icon.

The one exception to this is when I am in Power Query, I will often put an emoji in a step name to remind me to go back and fix that step later. I’d never push that nonsense to production though. 😉

So, spruce up your Power BI Apps, but just a tiny bit!

Working With Multiple Windows in Tabular Editor 3

If you have started using Tabular Editor 3, and I highly recommend you do if you do any serious development work with Power BI/Tabular models, you should spend some time working with the tabs and making the most of the screen real estate.

This doesn’t really deserve a full blog post, but the animated GIF I created to demonstrate what I am talking about is too big for Twitter, so here is a small post explaining it.

Below there are 3 tabs: Expression Editor, a DAX Query, and a Diagram. I was working on a laptop where screen real estate was precious so wanted to toggle between the diagram and the DAX query using the Windows ALT-TAB shortcut.

Simply drag the Diagram tab out until it is free floating. Now it is part of the Windows ALT-TAB process. When done, drag it back into Tabular Editor until it is in the middle of that tiny control the red arrow is pointing to.

Of course, if you have multiple monitors or one of those crazy 49” monitors, you could drag tabs out to different areas so you can look at several of them at the same time.

Working With Sparklines In Power BI - All About The Filter Context

With the December 2021 version of Power BI Desktop, as well as authoring reports in the Power BI Service, we can now add Sparklines to our Table and Matrix visuals. With just a few clicks you can make visuals that look like this:

This isn’t a tutorial on how to create sparklines, as Microsoft has you covered there. What I want to discuss though is how sparklines are impacted by the filter context in a visual. First though, you need to enable this feature. It is in preview at launch, so go to File, Options and settings, Options, and look for the Preview features section. Check Sparklines, then close and restart Power BI Desktop.

In the visual above, I added the “Month” field to the Sparkline settings and that created an additional filter on the visual. Before I added the Sparkline, Year was the only filter affecting the numbers. By adding the month to the sparkline, shown below, it breaks out the Total Net Sales figure by month.

But what if I don’t want the sparkline to cover the full range of data? For example, I just want the sparkline to show the last 3 months of the year, so 3 data points, not 12. As a measure, this is simple.

Sales Last 3 Months =
CALCULATE(
    [Total Net Sales],
    'Date'[Month] IN { 10, 11, 12 }
)

As you can see, this returns the correct value in the matrix, but the sparkline created for it using months is flat. Each of the data points is the same, and there are 12 of them, not 3. What is going on?

This is context transition. When you alter a measure within CALCULATE(), it does several things. This article isn’t a deep dive into context transition, but the key issue here is that when I told the measure above to only show me rows where the ‘Date’[Month] column is between 10 and 12, that filter replaced the existing filter coming from the Date table, which for months would be the values 1-12. You can see that here if I add the Month field to the table.

When only the Year is showing in the visual, you don’t see this. But adding the Months field makes it clear what the model is doing. When I added the Month field to the sparkline setting, the exact same thing happened. Now, if you are pretty good at DAX, you’ve already figured out how to fix this. What we need to do is restore the filters on the Sales Last 3 Months measure so the total remains $2.7M, but the individual monthly amounts for October - December are retained. There are several ways to do this. First I’ll do it by restoring the values for month within the measure, and the VALUES() function is perfect for this.

Sales Last 3 Months =
CALCULATE(
    [Total Net Sales],
    'Date'[Month] IN { 10, 11, 12 },
    VALUES( 'Date'[Month] )
)

Now for each row in the matrix, it looks at the month. For January, there is a 1. VALUES(‘Date’[Month]) returns 1, but the predicate ‘Date’[Month] IN {10, 11, 12} is returning only 10 - 12. Remember that filters in DAX are tables, so it is a mini-table of 10, 11, and 12. And multiple filters in CALCULATE are combined using AND logic. So the table of 10, 11, 12, AND the table of 1 from the VALUES(‘Date’[Month]) function, returns nothing, because they don’t intersect at all.

This process repeats, and gives us blanks for months 1-9. But in October, VALUES(‘Date’[Month]) returns 10. That with AND logic for 10, 11, and 12 returns 10. So now it will show October sales. The same for months 11 and 12. We get this:

Now we have the correct value for months 10-12, and nothing for months 1-9. The total is still $2.7M, so it is working. Let’s remove the Months field from the visual and take a look at the sparklines now.

Bingo. That is exactly what we were looking for. Using VALUES is the easiest way for me to visualize what is going on in the model, but there is a better way. There is another function in DAX called KEEPFILTERS that is most often used within CALCULATE. Change the measure to this:

Sales Last 3 Months =
CALCULATE(
    [Total Net Sales],
    KEEPFILTERS( 'Date'[Month] IN { 10, 11, 12 } )
)

KEEPFILTERS() will keep CALCULATE() from overwriting the filter for the ‘Date’[Month] column, and, well, keep the filters from that column, which is being supplied by the sparkline settings.

Sparklines have been a long-requested feature, and I’m glad to see they have finally arrived. If you see unexpected results though, you have to step back and think about the filter that the sparkline added, and see if you have a CALCULATE in that field impacting how the filters work. KEEPFILTERS() is likely the way to fix it.

Add an Animated GIF to Your Power BI Reports

It is easy to add an animated GIF to your Power BI Reports. However, if you just add it as an image, it won’t animate. You’ll just get a static image.

Animated GIFs can be useful in reports to explain new features to users, or on hidden developer pages showing how certain things were done.

To add a GIF so it animates, do the following:

  1. Add a blank button.

  2. Make it huge.

  3. Select the Fill section for the button and turn it on.

  4. Select Add Image, and select the image from your computer.

  5. Set the Transparency level to 0%.

  6. Change Image Fit to “Fit” - this will allow you to resize the button and scale the image. Leaving it at “Normal” will require the button to be the size of the image, or it will truncate or use extra space on the page.

  7. Resize the button as desired to scale the image appropriately.

Then publish.

I have an example report here that shows this in use, displaying the animation below in a report.

I use ScreenToGif to create my animations. Note: I had already posted this on Twitter, and wanted to add it to my blog so it would be more easily searchable. Since then I’ve also found out you can add animated GIF images to page backgrounds which are ideal for tool tip pages.

Be Careful When Filtering for Blanks in DAX

It is often the case in Power BI you need to filter your data to include or exclude rows where there are blanks. Be careful though, because you might get unexpected results. Consider this table:

Blanks0.png

It has 11 records, 2 of which are blank, and 3 that have $0 sales. Why would I have both blanks and 0’s in the same table? It depends on the data. Maybe blanks represent orders that are currently not finished or were improperly filled out, whereas $0 represents orders where a customer ordered an item, but we were out of stock, so sales are $0 and we want that recorded. Let’s assume the data is correct, but we want to report on it in different ways.

Here is how you might try to count the number of records that are not complete - the blanks:

COUNTROWS( FILTER( 'Sales', 'Sales'[Sales] = BLANK() ) )

However, this may return different results than expected.

Blanks1.png

Instead of the expected 2, it returns 5. Why? If you want a deep dive into how Power BI handles blanks, see this excellent SQLBI article. The quick answer though is that when DAX evaluates BLANK() like this, it considers it to be zero. So it is combining the 2 blank and 3 zero-value records, thus 5 records.

The proper way to filter for blanks in this case is to use the ISBLANK() function.

Blanks2.png

Conversely, if you want to exclude only blanks, you wouldn’t write <>BLANK() in your filter, but you’d use the NOT operator.

Blanks3.png
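
In measure form, those two patterns might look something like this (the measure names are mine):

Incomplete Orders = COUNTROWS ( FILTER ( 'Sales', ISBLANK ( 'Sales'[Sales] ) ) )

Orders Excluding Blanks = COUNTROWS ( FILTER ( 'Sales', NOT ISBLANK ( 'Sales'[Sales] ) ) )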

If you run across a model where someone has used =BLANK() or <>BLANK(), be it in a measure or calculated column, be sure those results are correct. It is possible they knew it would also include or exclude zero values, but more than likely, they were unaware of this. When I use =BLANK() in my measures knowing it will include zero values, I put a comment in the measure explaining this is the expected result, mainly so future me doesn’t see that and think past me messed up somehow. 😀

You can download a copy of the PBIX file I used here.

Quickly Format All DAX Code in Tabular Editor

If you aren’t using Tabular Editor to format all of your DAX code for you, you are missing out. I’ve written a super short script that will format all of your DAX code in measures, and make it look picture perfect in Power BI Desktop.

First, you need to download Tabular Editor. The free version is here. There is a commercial version, which I highly recommend, available here. If you don’t have Tabular Editor yet, start with the free version for now. This is the code you need:

/* Cycle over all measures in the model, format using DAX Formatter with Short Lines,
   then add 1 line feed to the start of the measure */
FormatDax(Model.AllMeasures, true);

foreach (var m in Model.AllMeasures)
{
    m.Expression = '\n' + m.Expression;
}

In Tabular Editor 2.x, paste that code in the advanced scripting window as shown. Tabular Editor 3 users just click on the New Script button on the ribbon and paste the code there. For both, press the green Run button.

TabularEditor2.png

This will do 2 things.

The FormatDax() function will take all of your DAX code for measures only, and send it to DAXFormatter.com for formatting, then get the results. It does this in one big batch to lessen the load on the DAX Formatter servers, vs hitting it 100 times if you have 100 measures. The ‘true’ parameter will use the Short line format. If you like long lines, just get rid of the “, true” part. But really, who likes long lines? 😉 In both versions it will tell you how many changes were made to the model in the status bar at the bottom.

The ‘foreach’ loop will then go through all of the measures and add a line feed. An example of the result as seen in Power BI Desktop is below.

FormatDAXTabularEditor1.png

Without the line feed added, it will start the measure on the top row, and it looks like this:

FormatDAXTabularEditor2.png

It still works, it just doesn’t look as nice. The line feed forces the measure to start on the row under the measure name.

Now you need to save these changes back to Power BI Desktop, or in the case of TE/3, to a dataset via an XMLA endpoint in a premium workspace, or SSAS/AAS models depending on your license. Just click the icon that has the floppy disk on top of a cube. (It is 2021. Will we ever get rid of the floppy disk icon for save operations?)

FormatDAXTabularEditor4.png

To save this script in Tabular Editor 2, either keep a copy of the text somewhere like Evernote or OneNote, or save it to a file. Tabular Editor 3 users can save it as a macro so it is always available to you. This is one of the productivity enhancements of TE/3.

FormatDAXTabularEditor3.png

You’ll notice I said it does this for measures. It will not format DAX in Calculated Columns nor in Calculated Tables. That is currently a limitation of what parts of the model Tabular Editor is able to change.

Now you have no excuse for having poorly formatted measures.

This is just one of the many features of Tabular Editor, but it makes it worth getting even if you are new to Power BI. Formatted DAX code is so much easier to read, understand, and edit, and now it is a simple button press away.

Working With Multiple Row Headers From Excel in Power Query

It is fairly common for users to format Excel reports with headers that are comprised of two or more rows for the header, rather than using a single cell with word wrap on. I’ve seen text files with similar issues as well. Consider the following example:

ExcelMultiRowHeader1.png

Getting this info into Power Query can sometimes be a challenge. I’m going to show you two ways to do it. The first way is to do it manually, mostly using the Power Query user interface. The second way is a custom function that will do all of the work for you. For simplicity’s sake, I’ll use Power Query in Excel since my data is in Excel already, but the same logic holds if you are importing the data into Power BI. We do not want to consolidate the headers in Excel itself - you’d just have to do it manually again the next time you get a file from someone. Remember - never pre-transform your data before you transform it in Power Query.

The Manual Method Using the Power Query Interface

If you are using Excel, just put your cursor somewhere on the data, and then on the Data ribbon select “From Sheet” if you are using Office 365. If you have an older version of Excel, it may say From Table or Range. Make sure you uncheck “My table has headers,” because otherwise it will make only the top row the header, and we don’t want that. Once you click OK, your data in the Power Query window should look something like this:

Original table

If your Applied Steps to the right of the table have a Promoted First Row to Header and Change Type step, delete both of them. We just want everything to be in the data area, with columns named Column1, Column2, etc.

This example has two rows that need to be made the header. The process below though can handle any number of header rows. Just change the 2 to whatever it needs to be.

We need to break this table into two pieces - the first two rows, and everything but the first two rows. First, on the Power Query Home ribbon, select the Keep Rows button, Keep Top Rows, then type 2.

ExcelMultiRowHeader3.png

Now go to the Transform ribbon, and select Transpose. Your data should look like this:

So we’ve kept the top two rows, and rotated it 90 degrees so it is on its side. Now we just need to combine those two columns, transpose them again, then we need to go back and get the actual data. Here is how we will do that.

First, select both columns, then on the Transpose ribbon, select Merge Columns, and call it New Header. I chose a space as my separator.

ExcelMultiRowHeader5.png

Now, to ensure there are no extra spaces, right-click on the new column and select Transform, Trim, and then go back to the Transform ribbon and select the Transpose button again. Now we have this:

ExcelMultiRowHeader6.png

We are almost there. We just need our data that we got rid of. It is still there in the Source line of our applied steps. This will require a little bit of modification of the M code in the formula bar.

To make this easier in a few minutes, you probably have a step called “Transposed Table1” in your applied steps. Rename that “NewHeader” with no spaces.

Next, right-click on NewHeader and select Insert Step After. In your formula bar it will say =NewHeader. Change that to read =Source. You’ll see we have our entire table back. Now, go to the Home ribbon, and select Remove Rows, Remove Top Rows, and type 2 - the same number of header rows our original Excel table had. Now we should just have our data with no headers at all.

I would rename this step “DataOnly” (again, no spaces) rather than the Removed Top Rows name. Here is where it gets interesting.

Go to the Home Ribbon and select Append Queries. Select the same query you are in. We are going to append this query to itself. It will show (Current) in the table name.

Right now our data is doubled up and we have no headers. Look at the formula bar. It should show this:

=Table.Combine({DataOnly, DataOnly})

Change the first DataOnly parameter to NewHeader. Now it looks more reasonable, and we have one header row at the top.
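
The formula bar should now read:

=Table.Combine({NewHeader, DataOnly})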

Now, on the Home ribbon, select Use First Row as Headers and change your data types. This is the final result:

At this point, you would probably get rid of the Total Change column and unpivot the Population columns to get the data into a nice fact table for Power BI’s data model, but that is another article. You can use this method for headers of any number of rows. I’ve seen some ERP systems export data with 2, 3, or 4 header rows. Just change the number of rows you keep or skip above to correspond to that.

The Custom Function Method

Now that you understand the logic, here is a custom function you can use to do all of that work for you. I am not going to walk through the code, but it is commented. I wrote it quite differently so it is dynamic: it handles any number of header rows, different data types in the column headers, and any number of columns in the original data set. See below for how to put this in your model.

(OriginalTable as table, HeaderRows as number, optional Delimiter as text) =>
let
    DelimiterToUse = if Delimiter = null then " " else Delimiter,
    HeaderRowsOnly = Table.FirstN(OriginalTable, HeaderRows),
    /* Convert the header rows to a list of lists. Each row is a full list with the number of items
       in the list being the original number of columns */
    ConvertedToRows = Table.ToRows(OriginalTable),
    /* Counter used by List.TransformMany to iterate over the lists (row data) in the list. */
    ListCounter = {0..(HeaderRows - 1)},
    /* For each list (row of headers), iterate through each one and convert everything to text.
       This can be important for Excel data where it is pulled in from an Excel Table and is kept
       as the Any data type. You cannot later combine numerical and text data using Text.Combine */
    Transformation = List.TransformMany(
        ListCounter,
        each {ConvertedToRows{_}},
        (Counter, EachList) => List.Transform(EachList, Text.From)
    ),
    /* Convert the list of lists (rows) to a list of lists (each column of headers is now in a list -
       so you'll have however many lists you originally had columns, and each list will have the same
       number of elements as the number of header rows you give it in the 2nd parameter */
    ZipHeaders = List.Zip(Transformation),
    /* Combine those lists back to a single value. Now there is just a list of the actual column
       header, combined as one value, using a space, or the chosen delimiter. */
    CombineHeaders = List.Transform(
        ZipHeaders,
        each Text.Trim(Text.Combine(_, DelimiterToUse))
    ),
    /* Convert this list back to a single column table. */
    BackToTable = Table.FromList(CombineHeaders, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    /* Transpose this table back to a single row. */
    TransposeToRow = Table.Transpose(BackToTable),
    /* Append the original data from the source table to this. */
    NewTable = Table.Combine(
        {
            TransposeToRow,
            Table.Skip(OriginalTable, HeaderRows)
        }
    ),
    /* Promote the new first row to the header. */
    PromoteHeaders = Table.PromoteHeaders(NewTable, [PromoteAllScalars = true])
in
    PromoteHeaders

To add this to Power Query do the following:

  1. Create a new blank query. In Excel’s Power Query it is New Source, Other, Blank Query. In Power BI it is New Source, Blank Query.

  2. Right-click on that new query and select Advanced Editor

  3. Remove all of the code there so it is 100% empty.

  4. Copy the code above - everything from the first row with “(OriginalTable” to the last row that says “PromoteHeaders” - and paste it into the Advanced Editor.

  5. Press Done to save it

  6. Rename it. It is probably called something useless like Query1. I usually use “fn” as a prefix to custom functions, and you want no spaces in it. So fnConsolidateHeaders as an example.

It should look like this in the Power Query window.

Now to use this. It is very simple. You want to start at the point where your data in Power Query looks like the image above called “Original Table.” Way at the top. Yes, that is correct - it is immediately after getting your Excel grid of data into Power Query, and should be either the Source step or possibly the Navigation step depending on how your data is coming in. Please note: if you have a Changed Type or Promoted Headers step after the Source row, I recommend you delete them.

Now, right-click on that last step and select Insert Step After. Whatever is showing in the formula bar is the name of the table you need to modify. I will assume it is Source for this. In the formula bar, wrap that with this new function.

Here is the syntax of the function:

fnConsolidateHeaders(Table as table, Number of Header Rows as number, optional Delimiter as text)
  • Table - this is the table name - so Source, or whatever the name of the immediate step above is. Again, right-click that step and select insert Step After and the table name will be there in the formula bar for you.

  • Number of Header Rows - type in an integer of how many header rows there are in the data. 2, 3, whatever.

  • optional Delimiter - this is what you want to be between the header rows when they are concatenated. This is optional. The formula will use a space if you put nothing there, but you can use any character(s) you want. You can even have no delimiter if you type two quotes with nothing in between - "". This is a text field, so you must enclose your delimiter in quotes.

That’s it. It should return this:

Now you have one final step. Change all of the columns to the correct data types.

Hopefully this will help you when dealing with Excel reports, or even some CSV/Text files, where there are multiple rows used for the header.
Finally - here is the Excel file with the sample data and full code for doing this manually or with the custom function.

Use Data Profiling in Power Query

I often give demos or training in how to use Power BI, and when I am in Power Query, my screen usually looks something like this:

2021-05-08_5-06-11.png

One of the first questions I get asked is “How did you get those bars and graphs across the top?” Those are called data profiling tools and they can be incredibly helpful when you are cleaning your data. Let’s take a look.

By default, you usually just have the thin bar shown below that gives you a little info on your data.

2021-05-08_5-16-47.png

A solid green bar is usually best. It means there are no issues in that column, as shown in the Discounts field. On the far right in the COGS field, there is a tiny bit of gray. That means there is no data, or a null, in one or more records. The bigger the gray bar, the higher percentage of nulls there are. In the middle though we have a problem. Anything in red is an error and must be fixed. These may cause more errors in further transformations downstream, and will usually cause refresh errors.

Before we address this, let’s get a bit more information about our data. Go to the View ribbon, and there are 3 settings we want to look at.

2021-05-08_5-24-07.png
  1. Column Quality - this will give us some additional statistics about the column

  2. Column Distribution - this will give us the bar graph with info on how many distinct and unique values there are

  3. Column Profile - this will open up a large pane that will have a ton of stats about the column, like the min and max values, average value, value distribution, and more.

Let’s look at them one at a time. I’m going to go ahead and check both Column Quality and Column Distribution for now.

2021-05-08_5-32-45.png

For the Discounts column we can see now that the data is 100% valid, meaning no errors, and no empty or null records. There are 515 distinct values, 384 of which are unique. If you are working with a field that will be the 1 side of a 1 to many relationship in the DAX model, you want to make sure your data profile looks like this:

2021-05-08_5-36-32.png

You want the distinct and unique values to be the same, and there can be no empty values. This is a small dataset with only 700 records. When you are working with more data, it is likely it will tell you you have 1,000 distinct and unique values. Data profiling by default only works with the first 1,000 records to avoid a performance hit. More on that later.

For the Sales column with an error, you’ll need to figure out what the error is and fix it. The fastest way to do this is to hover over the thin bar with the red in it and you’ll get this popup:

2021-05-08_5-42-10.png

Click on the ellipses menu and you can take action:

2021-05-08_5-42-29.png

Unless I already know what the issue is, I will almost always select “Keep Errors” and investigate to see if I can safely delete that record. Then I’ll decide if “Remove Errors” is appropriate, or see if I need to fix an issue with the source data. Note that Remove Errors will remove the entire row. You might want to use the “Replace Errors” feature and replace with null or 0. To see exactly what the error is, click to the right of the word “Error” - not on the word itself - in the cell and Power Query will bring up a window at the bottom of your screen that will show you what the error is for that specific cell.

2021-05-08_5-45-27.png

In this case, there was some text in the Sales field. So whether I replace it with 0 or null, remove the row entirely, or fix the source data, it has to be fixed.

As for the tiny gray bar for the COGS column, that means there are some empty, or null, values. Those may or may not be ok in your data. Some fields like “Date Completed” may have a lot of nulls where items aren’t finished, but things like Sales or COGS would seem to indicate an error of some sort if there were missing data there, perhaps indicating someone hadn’t set up the standard cost or sales price in the source system correctly. You have to determine how to proceed here. Power Query is just helping to bring the issue up.

The final checkbox in the View ribbon is the Column Profile. I generally leave this off and turn it on when I need it. Here it is turned on, and I’ve selected the Sales column.

2021-05-08_5-56-37.png

This gives a tremendous amount of information, but it also takes up half of the screen. This is why I turn it on and off as needed. You can see there are 700 records, 6 of which are distinct. There are no unique values, and the prices are distributed as shown in the graph to the right.

Remember though that no matter how large your data source is, this will only profile the first 1,000 records. If you want to see the stats for your entire dataset, you can. In the lower left corner of your screen, there is a small text box that says “Column profiling based on top 1000 rows.” Click it and you’ll see this:

2021-05-08_6-01-26.png

You can change it to look at your entire dataset, but be forewarned, this can be a severe performance hit. If your data has a few million records, every time you make a change to your data, Power Query will rescan the entire dataset and generate the data profile statistics. That can take several minutes with each click. For me, it is slow enough that I would just rather load what I have into the data model and use DAX to return the stats I need.

One more thing. Take a look at this column:

2021-05-08_6-05-55.png

You’ll notice there are no bars analyzing the data. This is because there is no data type for Power Query to work with. The data type is set to Any, or ABC/123. You should never let an Any data type column make it to the DAX model and this is a good indicator that you have a column data type not properly set.

Hopefully that helps. Every time I get a fresh install of Power BI or Excel, I always go to the View ribbon and make sure that Column Quality and Column Distribution are on. And while you are there, go ahead and make sure “Formula Bar” is checked as well. I believe fresh installations in early 2021 and later have this on by default, but if you installed your version of Power BI or Excel before that, the Formula Bar will be off, and it will not get turned on automatically by monthly updates.

Change The Day Your Week Starts In Power Query

By default, Power BI starts its week on Sunday, but that doesn’t work for every situation. Some companies prefer their week start on Monday, and I recently helped someone that needed their week to start on Thursday. Fortunately, by tweaking your date table in Power Query, you can make this change with no calculations at all. Here is what the first few days of 2021 looks like by default:

DayOfWeek.png

You can see that Sunday is the start of the week, being 0 for the week day. The formula to return the day of the week is:

= Date.DayOfWeek([Date])

The Date functions in Power Query that involve day and week calculations have an optional firstDayOfWeek parameter at the end. Simply add the desired day as the second parameter of Date.DayOfWeek, so changing the start of the week to Monday looks like this:

= Date.DayOfWeek([Date], Day.Monday)

Now our Day of Week column looks like this:

DayOfWeek2.png

To change it to start on Thursday, simply use Day.Thursday and it will adjust accordingly. Day.Sunday, Day.Monday, etc. are just easier ways to tell Power Query which day to start on. Day.Sunday evaluates to 0, so you could use the number directly in your formula, but that is not as easy to read. Date.DayOfWeek([Date], 3), for example, doesn’t make it immediately obvious that the week is starting on Wednesday.

This is only meaningful in week-based functions. Having your week start on Monday has no bearing on monthly, quarterly, or yearly functions. But functions like Date.DayOfWeek, Date.StartOfWeek, Date.EndOfWeek, and Date.WeekOfMonth can all use this first-day-of-week setting.
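For example, if your week starts on Monday, you would pass Day.Monday into each of these. A quick sketch, assuming a [Date] column:

= Date.DayOfWeek([Date], Day.Monday)
= Date.StartOfWeek([Date], Day.Monday)
= Date.EndOfWeek([Date], Day.Monday)
= Date.WeekOfMonth([Date], Day.Monday)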

In the Power Query date table I linked to above, I have a variable called varFirstDayOfWeek that is preset to Day.Sunday. If you change that line, every relevant column in the Date table will adjust to your preferred day of the week.

DayOfWeek3.png

This will also make your DAX calculations easier, since you can just use the week number, day number, start of week date, etc. from the Date table rather than having to remember to add or subtract days to adjust the calendar to your needs.

Working with Unix Epoch Time In Power BI

You may need to write a Power BI report that works with Unix Epoch Time, converting either to or from it. Epoch Time is simply the number of seconds that have passed since January 1, 1970, at 12:00 am UTC. So at 1 am of that date, Epoch Time would be 3600, since 3,600 seconds have transpired. Today Epoch Time is around 1610644465, or 1,610,644,465 to make it easier to read, though you will never see the commas in a database.

Converting to and from this time format is deceptively simple, as long as you understand a few functions in Power Query and the concept of Duration. To convert a date to Epoch Time you need to use one of the Duration functions. We want to:

  1. Find the difference between a normal datetime value and Jan 1, 1970, which is #datetime(1970,1,1,0,0,0) in Power Query.

  2. Return that difference in seconds.

This is the result:

2021-01-14 09_18_37-Untitled - Power Query Editor.png

#datetime(1970,1,1,0,0,0) is Jan 1, 1970 at 12am. In other words, this is #datetime(Year, Month, Day, Hour, Minute, Seconds)

Without the Duration.TotalSeconds() function, you will get 18641.09:05:00, which is technically correct. That is 18,641 days, 9 hours and 5 minutes based on my sample time stamp shown in the image. Duration.TotalSeconds() converts that to seconds, which gives you a valid Epoch Time.

The interesting thing is your results in the desktop may differ from what is in the service. If you use something like DateTime.LocalNow() to return the current time, then convert that to Epoch Time using the following formula:

= Duration.TotalSeconds(DateTime.LocalNow() - #datetime(1970,1,1,0,0,0))

Your PC will use your local time with the proper offset from UTC. However, once you publish to the service, DateTime.LocalNow() will calculate at UTC +0, which is actually what you want here. If you need to adjust for a time zone, simply add or subtract 3,600 seconds from the result for each hour you are offset from UTC. So if you are in the Pacific time zone, which is UTC -8 (I’m ignoring DST adjustments - that is a different post entirely!), you could subtract 28,800 seconds. In this case you should not do that, as Epoch Time and DateTime.LocalNow() are both UTC +0 in the service. But if you are converting dates from a data source that are fixed to your time zone, you may need to adjust them back to UTC.
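For example, if a hypothetical [LocalTimeStamp] column were stored at a fixed UTC -8, a sketch of that adjustment might look like this - adding 28,800 seconds shifts the value back to UTC +0 before it becomes an Epoch value:

= Duration.TotalSeconds([LocalTimeStamp] - #datetime(1970,1,1,0,0,0)) + (8 * 3600)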

Let’s go the other direction now, from Epoch Time to a normal time stamp. Use the following formula:

= #datetime(1970,1,1,0,0,0) + #duration(0,0,0,[TimeStamp])

This takes the date/time of Jan 1, 1970 at 12am and adds the total number of seconds from the [TimeStamp] field, which holds the Epoch Time.

2021-01-14 09_39_41-.png

The #duration() here resolves to 18641.17:04:58, so it adds 18,641 days, 17 hours, 4 minutes, and 58 seconds to Jan 1, 1970 at 12am. That becomes Jan 14, 2021 at just after 5pm, in UTC. In my case, I would need to convert this back to my local time, so I’d use the more complete formula below.

= DateTimeZone.SwitchZone(
     #datetimezone(1970,1,1,0,0,0,0,0) + #duration(0,0,0,[TimeStamp]),
     -8
)

I added two more zeros in the #datetimezone() function, which represent the hours and minutes offset from UTC - I want no offset initially - and then in DateTimeZone.SwitchZone() I adjust the final calculation by -8 hours to get to the Pacific Time Zone.

Remember, when dealing with time zones, Epoch Time is always UTC +0. The only conversions you need are to get from a local time zone to UTC +0 before converting to Epoch Time, or, after you’ve converted Epoch Time to a regular time, to shift that regular time into a local time zone. In many cases, you never need to do any conversion at all. Talk to your DBA about the time zone your data is stored in. Many systems store all times in UTC +0 to keep conversions to a minimum.

2021-01-14 09_50_22-Untitled - Power Query Editor.png

In my case though, I wanted the time local, so my DateTimeZone.SwitchZone() function adjusted the time back 8 hours to just after 9am here.

One amusing side note: Epoch Time is traditionally stored as a signed 32 bit integer. That means on Jan 19, 2038, the value will grow larger than a 32 bit integer can hold. Don’t worry. Power Query uses a 64 bit integer for this calculation. So Unix systems all over the world may crash and burn, but your Power BI report will continue to refresh without error. 😉

Replace Power BI Alerts with Power Automate Alerts for SQL Server Data

If you’ve used Power BI for a bit, you have probably realized you can create alerts that will email you when a value hits a certain level, or you can use Power Automate to handle the alert and trigger a workflow that does whatever you want. You can read more about that process here if you like, but the basic procedure is:

  • Create a Card visual in your report. This works with a Gauge visual too.

  • Publish your report.

  • Pin the card to a dashboard.

  • Go to the tile in the dashboard, select the ellipses menu, and select Manage Alerts.

  • Set your alert as desired, or click the Power Automate link at the bottom to start a workflow.

That is all fine and good, but if your data is in SQL Server, you can avoid creating the report and generate the alert within Power Automate. What I am going to show you how to do is create an HTML table and then send that data to a user in Teams. Why would you want to do that though? A few reasons:

  1. It is one less report to publish and maintain.

  2. Power BI alerts only work on dashboard tiles, so you have to pin a visual to a dashboard just to enable them.

  3. To do anything fancy with the alert, you still have to use Power Automate, so Power BI here is extra work to create and maintain.

The type of alert this is ideal for is one where, if a query returns any records, something is wrong and needs to be dealt with. Examples:

  • Number of orders past due

  • Number of inventory items with negative quantities

  • Number of invoices for customers with past due amounts

I’m going to show you how to do this in Power Automate with just a few steps. Let’s get started. In my example, I am going to return a table when a customer has placed an order where the order quantity does not divide evenly by the case pack quantity. So if they order 100 units and the cases contain 24 each, I want to alert the order entry person to tell them the customer has effectively ordered 4.1667 cases, which isn’t allowed. They will need to order either 96 units or 120 units to get 4 or 5 full cases.
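The exact query depends on your schema, but the heart of it is just a remainder check. Here is a hedged Power Query sketch, assuming an Orders query with hypothetical [Order Quantity] and [Case Pack] columns:

= Table.SelectRows(Orders, each Number.Mod([Order Quantity], [Case Pack]) <> 0)

Any rows this returns are orders that do not break down into whole cases, which is exactly what we want to alert on.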

1) You need to create a new flow, and it should usually be scheduled. That is the easiest to deal with. So navigate to Power Automate in your Office 365 portal and add a new Scheduled Cloud Flow.

2021-01-06 15_36_45-Epic Pen Content Surface_ __._DISPLAY1.png

Just put anything in the opening box it shows and click Next. Once there, you will see a Recurrence bar. Click that and it will look something like this.

2021-01-06 15_38_43-Epic Pen Toolbar.png

Here you set how often it runs. The example above shows this flow running every day in the Pacific time zone at 8:15am, 12:15pm, and 4:15pm. This dialog box is dynamic, so if you change Frequency to Week, a list of days will also show up, and you could tell it to run only on Monday, Wednesday, and Friday, for example. Save when done.

2) Add the SQL Server Power Query connector. Search for Power Query and you should see results similar to below. Even though Power Query in Excel and Power BI can access a wide variety of sources, in Power Automate, we are currently limited to SQL Server.

2021-01-06 15_39_49-Epic Pen Content Surface_ __._DISPLAY1.png

Click Transform Data using Power Query. You’ll note that it is a premium connector. This means you need the appropriate Power Automate license. You might already have it based on your Office 365 subscription. You can read more here on licensing.

Now, at this point, click Edit Query and build your query. You should see a list of tables in your SQL database. Select as many as are needed to get the result you want. I am assuming you have a connection to a gateway; setting up gateways and connections goes beyond the scope of this article.

I will say that the online query editor is pretty good, but if you are editing multiple queries, sometimes the UI just returns an error. I’ve always been able to fix that by saving the query, then going back in. If you are good with M code, you can do most of your transformations in Power Query in Power BI, then copy the relevant M code to Power Automate’s Power Query using the Advanced editor.

Note that you can connect to multiple tables and perform merges, appends, groupings, etc. The key thing to remember is that you must have one and only one query with “Enable Load” checked. This is your final query, the one that will return the records you need to alert your users about, or no records at all if there are no problems.

Now is where the fun begins. Power Query has returned some data, so how do we get that info to the end user?

3) Add a new step and search for the Compose action. We want the Compose - Data Operation action. Here we will count how many records Power Query returned. Sadly, this is not as simple as you would think. Here is what you do:

3a) In the Compose box will be an Inputs field. Click in that and the Dynamic Content/Expression editor will open up.

3b) In the expression box, type length(). The length() function is how Power Automate counts rows. It isn’t Table.RowCount like in Power Query or COUNTROWS like in DAX, but it does the same thing. Just like Power Query on your desktop, it is case sensitive, so it is length, not Length or LENGTH.

3c) Click inside the parentheses, then go back to the Dynamic Content tab and select the body/value object. Do not click body. It must be body/value.

2021-01-07 10_11_51-Epic Pen Content Surface_ __._DISPLAY1.png

In the image above you’ll notice I have been renaming my steps. I recommend you do this as well so you can tell what each step is really doing. You can do this via each step’s ellipses menu. Your final formula should look like this, depending on what your SQL Power Query step is named.

length(outputs('Get_Shipments_Out_of_Case_Pack')?['body/value'])

When you clicked on the body/value object, it created the outputs('Get_Shipments_Out_of_Case_Pack')?['body/value'] code. Whatever you named the Power Query step is what appears between the single quote marks, with all spaces replaced by underscores. The length() function around it will return an integer for the number of records in the table. If there are no records, it will return zero.

4) Now we need to decide what to do with this table. In our example, 0 records is good, 1+ is bad. So we need a condition that does nothing if the result is 0, or sends an alert if there are one or more records. Let’s add a new step and look for the Condition step.

In the Choose Value box, select the Outputs step from the RecordCount step in the Dynamic Content window - assuming that is what you renamed the purple Data Operation step. Then set your filter as desired. Here it is simply greater than 0. If it is greater than 0, then the steps we add to the green If Yes box will run. If it is 0, the steps we add to the orange If No box run.

Notice in the Condition box is an Add button. You can add more conditions and use all kinds of AND/OR logic here. It can be quite powerful.

2021-01-06 16_27_10-Epic Pen Content Surface_ __._DISPLAY1.png

For the If No box, we will do nothing. Just leave it empty. If there are no records, the flow will simply end.

For the If Yes box, we need a few more steps to get that data into a table and over to Teams. Here is how to build out the If Yes box for this flow:

5) Add a step to the If Yes box by looking for the HTML table operation. In the From field, click on the same body/value object from the Dynamic Content for your Power Query data.

6) Add a Teams @Mention token action. This will allow Power Automate to mention that user. Unfortunately, there is currently no way to mention a channel or team. If you need to alert multiple users, just keep adding a new @mention token for each. I strongly recommend you rename each step to include the user’s name so you don’t have 3 boxes, for example, that just say “Get @mention token for user/user2/user3.” In the user field of the @mention step, just put the user’s email address.

7) Now to send it to Teams. Get the “Post a message (v3)” action. This will post a message to the team and channel of your choosing.

Select the Team and Channel from the dropdowns, then compose your message in the message box. Here I am mentioning the user. Then I tell them how many records were returned by selecting the Outputs object from the RecordCount operation. Finally, I include the table itself by selecting the Output object from the HTML table operation.

That’s it. The final flow might look like this:

2021-01-06 16_48_55-Epic Pen Content Surface_ __._DISPLAY1.png

Now, if you play around with Power Automate, you will find you can send this to a person in Teams via a message, email it to one or more people or groups, add it as a task in Planner, add the data to an Excel table in a spreadsheet, or even post it to Twitter. You are mostly limited by your imagination.

At this point, you want to save and test the flow.

I know this seems like more work than just creating a report, adding a tile to a dashboard and using that to trigger an alert, and initially, it may be. But now you can see that the entire alert process is in one place - a single flow. You do not need a dataset, report, dashboard, alert, and flow like you do with the Power BI Alert solution.

And once you’ve done this a few times, you will realize this makes a good template for future flows. Since discovering this pattern I have deleted half a dozen Power BI reports that were nothing but record-monitoring reports, making my Power BI environment a bit cleaner. It is also easier for the user. They don’t have to click on links to see the actual error records. They are right there with the alert, in Teams, in the email, or however you have chosen to send it.

Renaming A Column In Power Query Based On Position

The easiest way to rename a column in Power Query is to do it the same way you do in Excel - just double-click it and rename it. Sometimes though the column you are renaming can have different names with each refresh, so this method won’t work. We can use a List in Power Query to make this dynamic. Consider this very simple example:

2020-11-09 16_03_18-Dynamic Column Renaming Data - Excel.png

You receive this file every day, and the column name starting in Row 6, Column B, constantly changes. We are going to extract that date and make it a column, as it should be, and then rename both columns regardless of what they are called.

The first thing we do after importing the Excel file into Power Query is to remove the first 5 rows. Use the Remove Rows button on the Home ribbon, and remove the top 5 rows. If your data has rows that move around a bit because someone inserts or deletes rows above it, see this blog post on how to dynamically find that first row and remove the appropriate number of rows. Now we have something that looks like this:

2020-11-09 16_09_43-Untitled - Power Query Editor.png

Ok, let’s get to work. NOTE: If you had any “Changed Type” step created by Power Query, delete it. That will cause problems in the refresh. We’ll add it back at the end when our column names are predictable.

1) We need that date in a new Date column so we can relate it to a Date table later. If you have a date in your data model, you always need a good date table.

2) On the Add Column ribbon click Add Custom Column. Name the column “Date” and use this formula:

try 
    Date.FromText(
        Text.BeforeDelimiter([Column2], " ")
        )
otherwise null

You don’t have to include the line breaks and indents; they just make it easier to read. Let’s break this down from the inside out:

  • Text.BeforeDelimiter([Column2], " ") - in the first row of data, this will return the “11/20/2020” text. In all other rows it will return errors, because there are no spaces in the other rows, and those values are numbers, so there is no text to extract. That isn’t a problem, and you’ll see why in a minute.

  • Date.FromText() will take the date in the first row and convert “11/20/2020” to a real date. In all other rows, it returns errors.

  • The try/otherwise construct says “Try this formula. If it works, give me the result. Otherwise, return null.” It is like IFERROR() in Excel, and it is why the errors don’t matter - try/otherwise takes care of them.

Now our data looks like this:

3) Right-click on the Date column and select the Fill menu, then Down. Now the date will fill the column.

4) On the Home ribbon, click Use First Row as Headers. Again, remove any automatically added “Changed Type” step if Power Query added one. Now we have this:

Both the 2nd and 3rd columns need to be renamed, and we cannot double-click to rename them, otherwise the refresh will fail tomorrow when the date changes to Nov 21, 2020. If you did double-click though, this is the formula Power Query would generate:

= Table.RenameColumns(
    #"Promoted Headers",
    {
        {"11/20/2020 Production Quantity", "Production Quantity"},
        {"11/20/2020", "Date"}
    }
)

We need to replace the first part of each renaming pair (the list is in the curly brackets). We can use a function called Table.ColumnNames() for this. To see how it works, remove your manual renaming step and type in this formula, and you will get the list shown below. It returns each column name in the table from the Promoted Headers step.

= Table.ColumnNames(#"Promoted Headers")

To use this we need to get the column names of the 2nd and 3rd columns. To get items from a list, you can refer to the index of an item using curly brackets again, so change your formula to this, and you’ll get the column name shown below.

= Table.ColumnNames(#"Promoted Headers"){1}
2020-11-09 16_29_14-Untitled - Power Query Editor.png

You’ll notice I used {1} to get the second item in the list. Power Query indexes at zero, so {0} would return the first item. Now you see where this is going?

5) Delete any steps you have after Promoted Headers. Once again, manually rename the headers. This is so Power Query will generate the Table.RenameColumns() step for us, and then we will make minor tweaks. It should be this again:

= Table.RenameColumns(
    #"Promoted Headers",
    {
        {"11/20/2020 Production Quantity", "Production Quantity"},
        {"11/20/2020", "Date"}
    }
)

6) Replace the “11/20/2020 Production Quantity" with Table.ColumnNames(#"Promoted Headers"){1} and replace “11/20/2020” with Table.ColumnNames(#"Promoted Headers"){2}. It should now look like this:

= Table.RenameColumns(
    #"Promoted Headers",
    {
      {Table.ColumnNames(#"Promoted Headers"){1}, "Production Quantity"},
      {Table.ColumnNames(#"Promoted Headers"){2}, "Date"}
     }
)

It is dynamically finding the column names of the 2nd and 3rd columns and replacing them with specific text!

2020-11-09 16_42_54-Untitled - Power Query Editor.png

As always, set the data types for all columns at this point, now that we know what our column names will be.

Use List.PositionOf() To Find The First Row of Data In An Excel File

Excel can be one of the worst data sources for Power BI. Its greatest strength for users is also its greatest weakness for Power BI developers getting data out of it via Power Query - it is almost unlimited in its flexibility. So end users will do things that cause your refreshes to fail. Things like inserting columns, rows, merging cells, etc. I’m going to show you how to find data in a spreadsheet when users add or remove rows above the table you need. Consider the spreadsheet data below:

2020-11-06 07_52_45-Using List PositionOf - Excel.png

This is a simple grid of data. Ideally this would be in an Excel table, but often a Power BI developer cannot control how the data comes in, and most users still do not use Excel tables. So I call this a grid of data. The problem with a grid of data is that when it moves and you use Power Query to get the data, you will end up with missing or misplaced information. For example, when you bring this Excel file into Power Query, it looks like this:

2020-11-06 08_02_13-Untitled - Power Query Editor.png

Fixing this is simple:

  1. On the Home ribbon in Power Query, select Remove Rows, then Remove Top Rows, then type in the number 5.

  2. On the Home ribbon, select “Use first row as headers”

Now it looks like this:

Perfect! We have good data. Next week, however, the user accumulating the production data adds a few rows at the top of the file to include some notes about one of the facilities being down for maintenance on one of the days.

2020-11-06 08_08_10-Using List PositionOf - Excel.png

Now the grid of data starts on row 8, not row 6 like before. When Power Query does the refresh, you get this:

2020-11-06 08_09_05-Untitled - Power Query Editor.png

Now you have a mess. The column names will load into Power BI as Column1, Column2, and Column3, if the data loads at all. More likely you’ll just get a refresh error. If you had changed the data types (as you should), it would have broken right off the bat when it tried to change the “Material” column to text. The Material column no longer exists! It is just Column1. The Production Date column is worse, because if you tried to change that to a Date data type, it would generate errors since there is text in that column. In other words, this is a total mess. You could edit your Power Query code to skip the first 7 rows instead of the first 5, but that just becomes a cat and mouse game when the user adds or removes rows again in the future.

The better solution is to dynamically find the first row of data. Power Query has a function that will help with this: List.PositionOf(). Basically, it takes a list and tells you what position your search item occupies in that list. Let’s walk through it with our new production data with the extra rows.

First, delete the Promoted Headers and Removed Top rows steps you created earlier so we are back to the below image. If there are any Changed Type steps, get rid of those as well. You should only have the Source and Navigation steps in the Applied Steps at the right side of the Power Query window.

2020-11-06 08_16_54-Untitled - Power Query Editor.png

We want to find the word “Material” here. Right-click on the Navigation step in the Applied Steps pane and click Insert Step After. It will create a Custom1 step that will simply have this in the formula bar: = Sheet1_Sheet - which is just pointing to the Navigation step. Change that to this:

= Sheet1_Sheet[Column1]

We told it to only show us [Column1] from the Sheet1_Sheet step. Sheet1_Sheet is just the name of the table; confusingly, Power Query hides this in the Navigation step, but here we can see it clearly. Note: If you are using this technique at any point after the Navigation step, the table name is just the name of that Applied Step. For example, if you had selected the first three columns and then clicked Remove Other Columns on the ribbon, your step would be “Removed Other Columns” and the table name would be #"Removed Other Columns" - the # and quotes are necessary whenever a table name has spaces in it.
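So, as a hedged example, if your most recent step happened to be named Removed Other Columns, the List.PositionOf() call we are about to build would be written like this:

= List.PositionOf(#"Removed Other Columns"[Column1], "Material")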

Most Applied Steps in Power Query are just tables, but this one is not. You’ll notice that after appending [Column1], the name of the first column, the result is now a list.

2020-11-06 08_29_56-Untitled - Power Query Editor.png

Now we are getting somewhere. Time to use List.PositionOf. Edit the formula above to this, and you’ll get the following screen.

= List.PositionOf(Sheet1_Sheet[Column1], "Material")
2020-11-06 08_32_00-Untitled - Power Query Editor.png

It found the word “Material” on row 7. Wait a minute… on the image above with the list, it shows Material on row 8. Power Query indexes at 0, so the first row isn’t 1, but 0. So Material is in position 7 in the list. This actually helps us with the next step.

Right-click on the Custom1 step and rename it something useful, like PositionOfMaterial. I recommend you use no spaces for this step. It makes referencing it later easier.

Right-click on PositionOfMaterial again and Insert Step After. Now we need to get our original table and skip the first 7 rows, right? When you use the Remove Top Rows feature, Power Query creates a step using the Table.Skip function. We are going to do that manually. In the formula bar, you probably see = PositionOfMaterial, and it still shows the value. Get rid of that and type this:

= Table.Skip(Sheet1_Sheet, PositionOfMaterial)

Remember, our original table was Sheet1_Sheet. Table.Skip has this syntax: Table.Skip(TableName, RowsToSkip). TableName is Sheet1_Sheet in this case, and instead of hardcoding the RowsToSkip, we are using a variable, our PositionOfMaterial step. Now our data looks like this:

2020-11-06 08_40_18-Untitled - Power Query Editor.png

Now you can promote the first row as headers, change your data types, and continue on. Next week when our production supervisor turns in the report, it won’t matter whether Material is on row 1 or row 100, List.PositionOf() will find it for us and dynamically set the top of the table. Even if all of the rows above the header are deleted, List.PositionOf() will return 0 for the position of Material, and Table.Skip will effectively do nothing, skipping 0 rows. So it will still work for us.
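Pulling it all together, a minimal sketch of the whole query might look like the following. The file path and sheet name here are placeholders for your own source:

let
    Source = Excel.Workbook(File.Contents("C:\Data\Production.xlsx"), null, true),
    Sheet1_Sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
    // Find which row contains the header word "Material"
    PositionOfMaterial = List.PositionOf(Sheet1_Sheet[Column1], "Material"),
    // Skip everything above the header row, however many rows that turns out to be
    #"Removed Top Rows" = Table.Skip(Sheet1_Sheet, PositionOfMaterial),
    #"Promoted Headers" = Table.PromoteHeaders(#"Removed Top Rows", [PromoteAllScalars=true])
in
    #"Promoted Headers"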

This method will work with any kind of data, but Excel is the most common use for it. Most other data sources, like databases, SharePoint Lists, true Excel tables, and even CSV files, don’t have data moving up and down on you.

Group By In Power Query When Data In Column Changes

If you’ve used Power Query long enough, you’ve needed to group data and provide some sort of summary. For example, grouping by customer and summarizing sales. The normal way the Table.Group function works is to group the entire table by the field(s) you are grouping by, then provide whatever aggregations you need.

However, you can alter that behavior and group data by a column, and have it provide a separate group at each change in the data. Let me illustrate using some data a user posted in the Power BI forum:

2020-10-13 08_03_05-Untitled - Power Query Editor.png

They wanted to find the start and end date of each section in the OnOff column. So in the first group, the start and end dates would be 1/1/2020 and 1/11/2020. The first On period would run 1/12/2020 through 1/27/2020. Then the next blank section would start at 1/28/2020.

A normal grouping would look like this in Power Query:

  1. Select the OnOff column and press Group By in the ribbon.

  2. Add a new column called AllRows and use the “All Rows” operation.

  3. Group By will return the table shown. OnOff has two values, blank and “On” and then two nested tables.

  4. If you click on the small white area next to the yellow word “Table” you can see what is in that nested table in the bottom view. It has all records, or all rows, for the blank value. As you can see, grouped this way there is no way to separate the first period that ends 1/11/2020 from the next blank period that starts 1/28/2020.

2020-10-13 08_07_36-Untitled - Power Query Editor.png

The formula that is generated by Power Query uses Table.Group, and it looks like this:

= Table.Group(#"Changed Type", {"OnOff"}, {{"AllRows", each _, type table [Start of Month=nullable date, EmpID=nullable text, Fleet=nullable text, OnOff=nullable text, Rotation Date=nullable date]}})

If you look at the documentation for Table.Group though, you will see the 4th parameter is an optional setting called groupKind. It then proceeds to give you no info on what that means. If you search around a bit, you will find two settings that might work. One is GroupKind.Global, and the other is GroupKind.Local. Looking at the documentation for these two provides absolutely no help.

2020-10-13 08_20_38-GroupKind.Local - PowerQuery M _ Microsoft Docs and 59 more pages - Work - Micro.png

Well, that’s why I am writing this. I learned about this several months ago but forgot it. If you look at the Power BI Community thread above, you’ll notice I did the Excel thing:

  • Bring in one of the columns into the table but offset by one row

  • Do some if/then analysis to determine when data changes

  • Do a grouping

  • Get the start/end dates.

But Jimmy801 posted the more elegant solution, reminding me of this optional and rarely used parameter, and that is the main reason for this blog post. To remind me of this parameter. 😁

By default, Table.Group uses GroupKind.Global as the 4th parameter. It scans the whole table and groups by each distinct value in the grouping columns, the OnOff field in our example. However, add GroupKind.Local to the function as shown below (I’ve added line breaks for readability) and you’ll see a very different table:

Table.Group(
    #"Changed Type",
    {"OnOff"},
    {{"AllRows", each _, type table [Start of Month=nullable date, EmpID=nullable text, Fleet=nullable text, OnOff=nullable text, Rotation Date=nullable date]}},
    GroupKind.Local
    )
2020-10-13 08_26_14-Untitled - Power Query Editor.png

Now, every time the OnOff field changes, it does a grouping. So in the sample data, there are two sections that are blank, and two that are “On” and you can see that in the first blank section, it correctly shows 1/1/2020 through 1/11/2020. You can then use the Table.Min() and Table.Max() functions to get the start and end dates from each local grouping, and finally re-expand the necessary columns from the AllRows field.

2020-10-13 08_32_50-Untitled - Power Query Editor.png

The only caveat here is once you add the GroupKind.Local parameter, the little gear icon next to the Grouped Rows step goes away and you can no longer alter the grouping using the Group By dialog box. You would either need to edit the M code manually, or remove GroupKind.Local, then edit the group settings, then add GroupKind.Local back to the formula manually.

2020-10-13 08_36_08-Untitled - Power Query Editor.png

You can see the full PBIX file here, or paste the full M code below into a blank query of your own to play with.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("jdOxDYMwGEThXVwj2b7f4KTNAhkAsQL7l1iCQ1TJkyy78Nc93bqmmmtWUUlTij6P+9NbG88499c2/XGCLqBr0M3QLdB16F7QvaGrhUJQ5LufFEQxBV1MQRpTUMcUBDIFjUxBJlNQ6qICrUx5LfFa4rXEa4nXEq8luijRSYluKnineHTSL6rMpbAMLJ1zOwA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Start of Month" = _t, EmpID = _t, Fleet = _t, OnOff = _t, #"Rotation Date" = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Start of Month", type date}, {"Rotation Date", type date}}),
    #"Grouped Rows" = 
        Table.Group(
            #"Changed Type",
            {"OnOff"},
            {{"AllRows", each _, type table [Start of Month=nullable date, EmpID=nullable text, Fleet=nullable text, OnOff=nullable text, Rotation Date=nullable date]}},
            GroupKind.Local
            ),
    #"Added Start Date" = Table.AddColumn(#"Grouped Rows", "Start Date", each Table.Min([AllRows], "Rotation Date")[Rotation Date], type date),
    #"Added End Date" = Table.AddColumn(#"Added Start Date", "End Date", each Table.Max([AllRows], "Rotation Date")[Rotation Date], type date),
    #"Expanded AllRows" = Table.ExpandTableColumn(#"Added End Date", "AllRows", {"Start of Month", "EmpID", "Fleet", "Rotation Date"}, {"Start of Month", "EmpID", "Fleet", "Rotation Date"}),
    #"Reordered Columns" = Table.ReorderColumns(#"Expanded AllRows",{"Start of Month", "EmpID", "Fleet", "OnOff", "Rotation Date", "Start Date", "End Date"})
in
    #"Reordered Columns"

Conditional Merge in Power Query

Doing a merge operation in Power Query is pretty simple: just click the Merge Queries button, select your tables and field(s), and away you go. But what if you want to merge Table A to a subset of Table B? You can do it, but you need to tweak your M code manually in the Advanced Editor.

In this example I will use the WideWorldImportersDW sample database Microsoft offers. You should have a copy of SQL Server installed to play with for Power BI. The Developer version of SQL Server is 100% functional and free forever. You just cannot use it in production.

I am going to use two tables for this example, Fact.Sale and Dimension.Customer. I only want my Fact Sale table to have customer info for the Buying Group Tailspin Toys. That info is in the Dimension Customer table. I can do this several ways.

  1. Do the merge, expand the Buying Group column from the Customer table, then filter to only show those rows.

  2. Pre-filter the Customer table for Tailspin Toys in the Buying Group column, then do the merge. It would need to be an Inner Join, otherwise you will get nulls in the Sales table when you expand, and then you have to filter those out. Which you can do, but it is more steps.

  3. You can bypass all of that and do it in one step.

I’m not going to bore you with how to do methods 1 and 2. Let’s do method 3, a conditional join!

First, create the merge as you normally would. In the Sales table, select Merge Queries from the ribbon, then select the Customer table. Select the Customer Key fields in both. Also make sure this is an inner join so it will remove all records from the Sale Table that are not Tailspin Toys. Your Merge dialog box will look like this:

Inner join dialog box

Now click OK and it will do the merge. Now we have to fix this so it is a conditional join. When you did the merge, Power Query wrote the M code shown below for you, which you can see in the Advanced Editor. Note that I put in the line feeds to make this more readable, and I have already expanded the Buying Group column. The “Merged Queries” step is what we want to dig into.

let
    Source = Sql.Databases("localhost"),
    WideWorldImportersDW = Source{[Name="WideWorldImportersDW"]}[Data],
    Fact_Sale = WideWorldImportersDW{[Schema="Fact",Item="Sale"]}[Data],
    #"Merged Queries" = 
        Table.NestedJoin(
            Fact_Sale, 
            {"Customer Key"}, 
            #"Dimension Customer", 
            {"Customer Key"}, 
            "Dimension Customer", 
            JoinKind.Inner
        ),
    #"Expanded Dimension Customer" = Table.ExpandTableColumn(#"Merged Queries", "Dimension Customer", {"Buying Group"}, {"Buying Group"})
in
    #"Expanded Dimension Customer"

Table.NestedJoin has the following parameters per MS documentation:

  1. table1 as table,

  2. key1 as any,

  3. table2 as any,

  4. key2 as any,

  5. newColumnName as text,

  6. optional joinKind as nullable number,

  7. optional keyEqualityComparers as nullable list) as table

We want to tweak the 3rd parameter - table2, which is our Customer table. We don’t want the full table2; we only want table2 filtered for Tailspin Toys in the Buying Group field.

So in the code above, we need to replace the 3rd parameter which is just a reference to #”Dimension Customer” - the name of the Customer Table. (It is just Dimension Customer - the #”name here” syntax is just how Power Query references objects with spaces and special characters.)

We want to replace that with this code:

Table.SelectRows(#"Dimension Customer", each [Buying Group] = "Tailspin Toys")

So the full M code in the advanced editor becomes this:

let
    Source = Sql.Databases("localhost"),
    WideWorldImportersDW = Source{[Name="WideWorldImportersDW"]}[Data],
    Fact_Sale = WideWorldImportersDW{[Schema="Fact",Item="Sale"]}[Data],
    #"Merged Queries" = 
        Table.NestedJoin(
            Fact_Sale, 
            {"Customer Key"}, 
            Table.SelectRows(#"Dimension Customer", each [Buying Group] = "Tailspin Toys"),
            {"Customer Key"}, 
            "Dimension Customer", 
            JoinKind.Inner
        ),
    #"Expanded Dimension Customer" = Table.ExpandTableColumn(#"Merged Queries", "Dimension Customer", {"Buying Group"}, {"Buying Group"})
in
    #"Expanded Dimension Customer"

That’s it. You are done. And if you are doing this on SQL Server, you can right-click the Expanded Dimension Customer step - the final step where you expand the Buying Group column - and see that View Native Query is available. This means query folding is working. It generates the following SQL code. (I’ve removed some columns from the Sale table to make the code shorter.)

select [$Outer].[Sale Key] as [Sale Key],
    [$Outer].[City Key] as [City Key],
    [$Outer].[Customer Key2] as [Customer Key],
    [$Outer].[Bill To Customer Key] as [Bill To Customer Key],
    [$Outer].[Stock Item Key] as [Stock Item Key],
    [$Outer].[Invoice Date Key] as [Invoice Date Key],
    [$Outer].[Delivery Date Key] as [Delivery Date Key],
    [$Outer].[Salesperson Key] as [Salesperson Key],
    [$Outer].[WWI Invoice ID] as [WWI Invoice ID],
    [$Outer].[Description] as [Description],
    [$Inner].[Buying Group] as [Buying Group]
from 
(
    select [Sale Key] as [Sale Key],
        [City Key] as [City Key],
        [Customer Key] as [Customer Key2],
        [Bill To Customer Key] as [Bill To Customer Key],
        [Stock Item Key] as [Stock Item Key],
        [Invoice Date Key] as [Invoice Date Key],
        [Delivery Date Key] as [Delivery Date Key],
        [Salesperson Key] as [Salesperson Key],
        [WWI Invoice ID] as [WWI Invoice ID],
        [Description] as [Description]
    from [Fact].[Sale] as [$Table]
) as [$Outer]
inner join 
(
    select [_].[Customer Key],
        [_].[WWI Customer ID],
        [_].[Customer],
        [_].[Bill To Customer],
        [_].[Category],
        [_].[Buying Group],
        [_].[Primary Contact],
        [_].[Postal Code],
        [_].[Valid From],
        [_].[Valid To],
        [_].[Lineage Key]
    from [Dimension].[Customer] as [_]
    where [_].[Buying Group] = 'Tailspin Toys'
) as [$Inner] on ([$Outer].[Customer Key2] = [$Inner].[Customer Key])

Since 100% of the work is done on the SQL Server, it will be very fast compared to how it would run if the Power Query mashup engine on your PC or on-premises gateway had to process it.

2020-09-04 16_37_41-.png

If you are using some other source, like text files or Excel files, folding won’t happen of course, but it should still perform well, as this is a single filter applied once, not a filter applied for every row of either table.

I cannot say that this method is any faster than doing it the longer ways (methods #1 and #2 above), but it is less code. Plus, you can get fancier with the Table.SelectRows() function we used by filtering on multiple fields using and/or criteria. You may break folding if you get too fancy and the Power Query engine cannot figure out the SQL, so be careful. If performance is a must, methods #1 or #2 are more likely to fold with many filters in the condition, but you won’t know until you try.
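As a hedged example, a two-condition filter (the Category value here is just illustrative) would look like this:

Table.SelectRows(#"Dimension Customer", each [Buying Group] = "Tailspin Toys" and [Category] = "Novelty Shop")

Happy Querying!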

Add a Refresh Time Stamp To Your Power BI Reports

It is often useful to tell your users when the report refreshed so they have some idea of how current, or stale, the data is. Unfortunately, you cannot just add a TODAY() formula to your report as that will change each time they open the report. This method will give you the refresh time stamp and compensate for Daylight Savings Time.

First, let’s get the time and date in Power Query. This time will only change on the report refresh, which is what you want.

  1. Open Power Query. You can do this using Get Data on the main Power BI menu, and we want a Blank Query.

  2. Click on Advanced Editor on the Home ribbon of Power Query.

  3. Remove the default text.

  4. Replace it with this code

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("VZDLCQAhDAV78Sxonv9axP7bWLNPMN4mDMYhc7oUBAFRmvNOJBTy8r8RnTpNJh8TdRo0iUzT94BIIeTzRBdAaBr5GF0A0FTyMZqGdNM2mwDkG7CZZuhQKEA2ZdWI+pQ1U9ae/7v5v9vTYNzTYNiyFG/Z5rU+", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [dtDSTStart = _t, dtDSTEnd = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source, {{"dtDSTStart", type date}, {"dtDSTEnd", type date}}, "en-US"),
    varCurrentDate = DateTime.Date(DateTimeZone.SwitchZone(DateTimeZone.LocalNow(),-8)),
    #"Filtered Rows" = Table.SelectRows(#"Changed Type", each [dtDSTStart] < varCurrentDate and [dtDSTEnd] > varCurrentDate),
    varDSTOffset = Table.RowCount(#"Filtered Rows"),
    #"Last Refresh Date" = #table(
        type table
            [
                #"RefreshDate"=datetimezone
            ],
        {
            {DateTimeZone.SwitchZone(DateTimeZone.LocalNow(),-8 + varDSTOffset,0)}
        }
        )
in
    #"Last Refresh Date"

Wow. What is all of that? Just to get a refresh timestamp? Yes. Let’s look at the steps.

Source: This is the Daylight Savings Time Start and End dates for the US. You will need to change this for your region as applicable. This goes through 2030 right now, and is subject to change at the whims of politicians.

You can manually edit this table by clicking on the gear icon in the Source step.

You can also get rid of all of that and point the source to a table you have somewhere. A table on your SQL server, a SharePoint List, Excel File, whatever. Swapping out the code for that is beyond the scope of this article. I wanted to provide a complete solution in one query without having to rely on external data sources. I know there are websites with this data you can link to, but I don’t like relying on websites for my reports, and don’t want to add those sites to my on-prem gateway just for a DST table.

Changed Type: This just changes the data keyed in the Source line to dates. It is always text by default. Those are in the US Date format and I told Power Query to use the en-US culture setting to do the type conversion.

varCurrentDate: This gets today’s date, with an offset for my UTC timezone, which is -8 - Pacific Time. You’d need to use Google for yours, or use this handy article on Wikipedia. You can use a decimal number here, so if you are in the Marquesas Islands, your offset is -9:30, which means you would use -9.5 here.
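For instance, that Marquesas Islands offset would change the varCurrentDate line in the query above to this:

varCurrentDate = DateTime.Date(DateTimeZone.SwitchZone(DateTimeZone.LocalNow(), -9.5)),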

Filtered Rows: This returns 1 row if varCurrentDate is greater than the DST start date and less than the DST end date, and 0 rows otherwise.

varDSTOffset: This variable counts the rows returned by the Filtered Rows step, so it is either 1 or 0. It will add 1 or 0 hours to the time calculated next, adjusting for DST.

Last Refresh Date: This uses the formula below to get the current date and time of the refresh, adjusting for the varDSTOffset calculated above. It then puts it into a table and changes the type to “datetimezone” so it can actually be loaded into the DAX data model. Note, again, you’ll need to adjust that -8 to your UTC offset.

=DateTimeZone.SwitchZone(DateTimeZone.LocalNow(),-8 + varDSTOffset,0)

The final result is a date with a timezone offset that looks like this. Remember, I am in the -8 Pacific Timezone, and DST is in effect, so it added 1, making the net UTC Offset -7.

Note: If you do not observe DST in your area, just remove the + varDSTOffset part of the formula. You can also strip out about 90% of this code; a stripped-down sketch is shown below. Leaving the DST code in and simply removing varDSTOffset from your formula will not impact performance at all.
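That stripped-down version, with no DST handling at all and still assuming a UTC -8 offset, could be as short as this:

let
    #"Last Refresh Date" = #table(
        type table [RefreshDate = datetimezone],
        {{DateTimeZone.SwitchZone(DateTimeZone.LocalNow(), -8, 0)}}
        )
in
    #"Last Refresh Date"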

The DAX model understands the datetimezone type and will adjust the time shown to the user. Finally, rename the query to “Refresh Time Stamp” and make sure it is set to load. Close Power Query and apply your changes.

In DAX, create a new measure:

Last Update = 
     "Last Update on " 
     & UNICHAR(10) 
     & FORMAT(
         MAX('Refresh Time Stamp'[RefreshDate]),
         "MMM DD, YYYY H:MM AM/PM"
     )

Now drop that measure into a card visual. It will look something like this:

Resize and position the card as desired. If you do not like the forced line feed before the date, get rid of the UNICHAR(10) function. You can also customize the format of the time shown using the codes from this article by Microsoft.

Finally, you can right-click on the Refresh Time Stamp table in Power BI and hide it. The measure will still work, but the table won’t clutter up your list of measures, tables, and fields.

One final bit of minutia. The varCurrentDate variable itself does not adjust itself for DST, so if your report runs between midnight and 3am on the day DST switches, it may be off one hour for that refresh. It will adjust though if it is run after the 3am switch.

Return Row Based on Max Value From One Column when Grouping

When you group data in Power Query, you are able to do a number of aggregations, like count, sum, min, max, etc. The problem is that each of those aggregations operates only on its own column. Let me show you what I mean. Suppose you have this data:

The Data

You need to return a table to the DAX model (or an Excel table) that only has one record per product, and you want the latest date, and the sales person associated with that date. So for product A, it is June 18, 2020 and Suresh, for B it is June 14, 2020 and Ana, and so on.

The Goal

If you do a Group By in Power Query and group on the product, then select the Max aggregation for the date and sales person, with the Group By dialog box configured as shown below, you get the following table:

20200715 - Incorrect Grouping.png

The first row is correct, but that is just a coincidence. The max value for that row for product A is “Suresh” because that is alphabetically the highest name, starting with the letter S. The second row, though, you can see is wrong. It picked up Sally, the alphabetically last name in the Sales Person column. Sally is not associated with the June 14 record, though; Ana is.

So, how do we fix this? There are two ways. Let’s do the long way first. Change the Group By dialog box by removing the Max aggregation for the name and replacing it with the All Rows aggregation as shown. It will return a new column called “All Rows” that contains a nested table.

20200715 - Correct Grouping.png

Looking at the image below, if you click to the right of the word Table you can see at the bottom of your screen the contents of that field. Do not click directly on the word Table itself as that will cause the table to expand. If you do that, just delete the last step Power Query generated to get back to this view. You can see below it has all 3 sales records for product B, and we want to return Ana.

20200715 - All Rows Table Contents.png

You need to add a new column at this point, so go to the Add Column ribbon and click Add Custom Column. Enter this formula:

= Table.Max([All Rows], "Sale Date")

Table.Max will take the table you give it, which is [All Rows] in this case, and return the record that has the maximum value for the field you specify, which is “Sale Date”. You now have a new column that will have the word Record in yellow. You can see the formula at location 1 below. If you click to the right of it on the Product B row (2), at the bottom of the screen you will see it returned the June 14 record and has Ana’s name (3). We are getting there!

20200715 - the Table Max Result.png

Now just click the Expansion icon in the upper right of the “Max Record” field and select Sales Person.

20200715 - Table Max almost final.png

Now each product record has the correct sales person that corresponds to the latest sales date. At this point, just remove the All Rows column and change the data type of Sales Person to text.

But that is 6 steps. Why can’t we do this in a single grouping step and radically cut down all of the extra steps? We can! But that requires a bit of manual M code. Let’s see how that works. First, let’s take a look at the entire M code for what we just did through the expansion of the final record:

20200715 - M Code Before Tweaking.png

You can ignore the Source line. That is just how Power Query stores data when you use the Enter Data function. I pasted the data in from Excel rather than linking directly to the Excel file so I could share the PBIX file at the bottom of this post.

The second step is “Grouped Rows” and I’ve circled that in red. I added a lot of line feeds and indentation to make it easier to read. Remember, it isn’t M code unless it is formatted. What we are interested in is the part starting on line 9, marked “All Rows.” This is how that nested table gets built. We don’t want the nested table though. We just need someone’s name. So, that little bit of code is shown below:

{"All Rows", each _, type table [Product=nullable text, Sale Date=nullable text, Sales Person=nullable text]}

What this is doing is saying: for each “_”, return a table. What does that mean though, the “each _” part? Well, if you want to do a deep dive, you can read more about that at Ken Puls’ site. For our purposes, the _ character is a temporary variable that stores the entire table for the current grouping being passed by the Table.Group function, which is at the Product field level. It is the same as:

{"All Rows", each _}

The rest of the statement is just classifying the table fields. If you want to go down that rabbit hole, be my guest. You will see though that we don’t need that table formatting. What we want to do is take that temporary _ variable with our table in it, and return the right sales person for the maximum date. All we have to do is wrap the variable with Table.Max, as follows:

{"All Rows", each Table.Max(_, "Sale Date"), type table [Product=nullable text, Sale Date=nullable text, Sales Person=nullable text]}

Now, instead of a nested table, we’ve jumped straight to the single record we want within the Table.Group function, as shown below. Now we just need to get the sales person’s name.

20200715 - M Code 02 with Table Max Record Showing.png

To get a single value from a field in a record, just append the field name in square brackets, as follows:

{"All Rows", each Table.Max(_, "Sale Date")[Sales Person], type table [Product=nullable text, Sale Date=nullable text, Sales Person=nullable text]}

Our final tweak will be to rename the “All Rows” field to “Sales Person”, then get rid of the table format code. That leaves you with this very short statement below. You’ll notice that instead of the type table with the field formatting, I told it to just set this value to text.

{"Sales Person", each Table.Max(_, "Sale Date")[Sales Person], type text}

20200715 - Final Result - Just Needs Data Type.png

That’s it. The table above is what is returned, and you’ll see that Ana was properly returned for product B. Just set the Sales Person data type to text, and you are done. The full M code is below. You’ll notice that there are only two steps here, the Source step, and the Table.Group step.

20200715 - Final M Code.png

The beauty of this is, let’s say you have four columns you need to pull out this way, perhaps including the sales person’s last name, the region, and their commission rate. With the long method above, you have to add a new column with the Table.Max function, then extract the field you need, then expand the record for each one. You’d be well over 20 steps for the entire query.

With this method, you just keep adding columns to the Table.Group function, like this:

{"Last Name", each Table.Max(_, "Sale Date")[Last Name] type text},
{"Region", each Table.Max(_, "Sale Date")[Sales Region] type text},
{"Commission Rate", each Table.Max(_, "Sale Date")[Commission Rate], Percentage.Type}

That is still just two steps! I have confirmed this runs much faster than the multiple-step process. I came up with this when I had a 20K record table that I thought was taking way too long to finish. This updated method took a fraction of the time; it just required a bit of work on my part.

I hope that helps you. The entire example can be found in this PBIX file.