Root cause analysis in Power BI
Microsoft Power BI has some great AI visuals which can provide an in-depth analysis of your data. In our last post we talked about an AI visual – Key Influencer visual. This visual helps in identifying factors that can impact an outcome. In that post, we analyzed what factors influence employee attrition. We also deep-dived into segments and clusters contributing to employee attrition with graphs and charts.
In this post, we will analyze and play with another AI visual – Decomposition tree.
The decomposition tree breaks down a numerical measure into parts and shows which factors cause the measure to be high or low.
From Microsoft documentation:
The decomposition tree visual in Power BI lets you visualize data across multiple dimensions. It automatically aggregates data and enables drilling down into your dimensions in any order. It is also an artificial intelligence (AI) visualization, so you can ask it to find the next dimension to drill down into based on certain criteria. This makes it a valuable tool for ad hoc exploration and conducting root cause analysis. (Microsoft)
Let’s take the well-known example of employee attrition and understand why attrition is high. From the decomposition tree visual we want an answer to the following question:
What causes employee attrition to be high?
At the end of this post you will have an idea of how to use this visual for exploratory and visual analysis, decomposition of values by factors, and how you can use AI splits to dynamically split and understand the next factor for drill down.
Our final output could look like:
We install the latest version of Power BI Desktop, and click on the decomposition tree visual.
You see two input fields: “Analyze” and “Explain by”. In the Analyze field we put “Attrition %”, and in Explain by we put several other fields, such as “OverTime”, “Department”, “BusinessTravel”, “MaritalStatus” and “Gender”. How do you choose these fields in the first place? That’s a tricky question, and we will answer it later.
Our decomposition tree when we drag Attrition % looks like:
Attrition % overall is 16.12%. Our next step once we have added our metric is to understand:
- Which of the factors cause attrition % to be high?
- Which of the factors cause attrition % to be low?
Remember that we dragged several fields into the “Explain by” section? Let’s click on the “+” sign next to the Attrition % bar.
You see the fields you have dragged. In addition, you see two more fields – High value and Low value.
Exploratory and Ad-hoc analysis
We begin the exploratory analysis by breaking down Attrition % by OverTime. Attrition % is 30.53% when OverTime is Yes, almost double the overall 16.12%, so employees who work overtime are considerably more likely to leave.
Let’s expand this level: when OverTime is Yes, what is the next factor that contributes to Attrition %? Let’s explore Marital Status.
Among employees who work overtime, attrition is highest for single (unmarried) employees. Let’s add another level to this analysis, say Department.
Single employees in the Sales department who work overtime have an attrition rate of 65.31%! We can verify this number by adding tooltips.
Out of 49 employees in Dept Sales with Marital Status Single and OverTime Yes, 32 of them left the company.
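The tooltip figures can be checked by hand: 32 leavers out of 49 employees is 32 / 49, or about 65.31%. The sketch below reproduces that arithmetic outside Power BI. The `Employee` record, the `attrition_pct` helper and the sample rows are hypothetical; the rows are built only to match the two counts quoted above and are not the real dataset.

```python
from collections import namedtuple

# Hypothetical record type; field names are illustrative, not the dataset's columns.
Employee = namedtuple(
    "Employee", ["department", "marital_status", "overtime", "attrition"]
)


def attrition_pct(employees, **filters):
    """Attrition % among employees that match every field=value filter."""
    matched = [
        e for e in employees
        if all(getattr(e, field) == value for field, value in filters.items())
    ]
    if not matched:
        return None  # no employees on this path of the tree
    leavers = sum(1 for e in matched if e.attrition == "Yes")
    return round(100 * leavers / len(matched), 2)


# Synthetic rows that reproduce only the counts from the tooltip:
# 49 employees on the path, 32 of whom left.
employees = (
    [Employee("Sales", "Single", "Yes", "Yes")] * 32
    + [Employee("Sales", "Single", "Yes", "No")] * 17
)

path_pct = attrition_pct(
    employees, department="Sales", marital_status="Single", overtime="Yes"
)
print(path_pct)
```

Each extra keyword filter plays the role of one more level in the tree: the measure is simply recomputed on a narrower group of employees.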
What if we don’t start our analysis with OverTime? Let’s pick monthly income as the starting factor.
The visual flow is quite different here! Attrition is highest when monthly income is low, the department is Sales, and OverTime is Yes.
With the decomposition tree, you can perform root cause and exploratory analysis by playing with the multiple factors and dimensions. You not only get a deep understanding of what’s happening in your data set, but you can also visually understand the data in a tree format.
We started analyzing the factors based on our domain knowledge and understanding of the dataset. What was our rationale for choosing OverTime as the starting point of our tree?
The decomposition tree comes with another option to split the tree using AI algorithms. Remember we had two more options in our tree “High value” and “Low value”? It’s time to utilize them.
Let’s start with a blank slate and this time instead of selecting OverTime, let’s select “High value”.
As we keep selecting High value at each level of the tree, the algorithm identifies the next level on its own. In the example above, the levels chosen were Monthly Income, followed by OverTime, Education Field and JobSatisfaction. Attrition % is high when monthly income is in the 0-2800 bin, and so on.
In AI splits you see a bulb icon next to the level name. Once you hover on the bulb icon you get to see why this level was chosen.
You can also select “Low value”. Once you do, you will see that the chosen factors and the resulting analysis change.
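To build some intuition for what a “High value” or “Low value” split does, here is a minimal sketch of one plausible reading: look at every candidate field, compute the measure for each of its values, and pick the field and value where the measure is highest (or lowest). This is an illustration under that assumption, not Power BI’s actual implementation; the `best_split` function and the six sample rows are hypothetical.

```python
from collections import defaultdict


def attrition_pct(rows):
    """Share of rows with Attrition == 'Yes', as a percentage."""
    return 100 * sum(r["Attrition"] == "Yes" for r in rows) / len(rows)


def best_split(rows, candidate_fields, mode="high"):
    """Return (field, value, pct) with the highest or lowest attrition %."""
    pick = max if mode == "high" else min
    candidates = []
    for field in candidate_fields:
        # Group the rows by the values of this candidate field.
        groups = defaultdict(list)
        for r in rows:
            groups[r[field]].append(r)
        for value, members in groups.items():
            candidates.append((field, value, attrition_pct(members)))
    return pick(candidates, key=lambda c: c[2])


# Tiny made-up sample, only to exercise the function.
rows = [
    {"OverTime": "Yes", "Department": "Sales", "Attrition": "Yes"},
    {"OverTime": "Yes", "Department": "Sales", "Attrition": "Yes"},
    {"OverTime": "Yes", "Department": "R&D", "Attrition": "No"},
    {"OverTime": "No", "Department": "Sales", "Attrition": "Yes"},
    {"OverTime": "No", "Department": "R&D", "Attrition": "No"},
    {"OverTime": "No", "Department": "R&D", "Attrition": "No"},
]

high = best_split(rows, ["OverTime", "Department"], mode="high")
low = best_split(rows, ["OverTime", "Department"], mode="low")
print(high, low)
```

In this toy sample, Department separates leavers from stayers more sharply than OverTime, so both the high and the low split land on Department. Repeating the call on the rows under the chosen node would give the next level of the tree.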
How to choose fields in “Explain by”?
Should we choose AI split or manual split?
How do we choose the fields in “Explain by”?
The best way to start analyzing the tree is using manual split based on the domain context and your understanding of the data. After 2-3 levels of manual split, you can then split the tree further using AI splits and understand the factors responsible for making a metric high or low.
There’s also a smart alternative. You can use the Key Influencer visual to understand which factors lead to Attrition = Yes. The visual lists the top factors impacting an outcome (Attrition = Yes), and you can put those factors in the “Explain by” section of the decomposition tree. When you run the key influencer analysis on the employee attrition data set, you will get the results explained and shown in the previous blog post.
You can put Age, OverTime, JobLevel, MonthlyIncome, YearsInCompany and others in the Explain by section of the decomposition tree visual and start drilling down the data.
The decomposition tree is a smart visual for breaking down a numerical measure into components. This AI visual aids root cause and deeper analysis, as shown above. You can perform ad-hoc analysis for the problem in question, understand the breakdown of values using manual and AI splits, and combine it with other Power BI AI visuals to strengthen your analysis.
One last note: to get the best results from this visual, you may want to convert numerical attributes such as age and income into categorical values, or bins. In the example above, monthly income is broken down into bins such as 0-2800 and 2800-5000.
PS: AI splits in the decomposition tree come with two analysis modes: absolute and relative. We will cover these in detail in the next blog post.
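If you prepare the data before loading it into Power BI, the binning can be done with a few lines of code. The sketch below is one way to do it in Python, under some assumptions: only the 0-2800 and 2800-5000 bins come from this post, while the 10000 edge, the open-ended last bin, and the `income_bin` helper are hypothetical choices for illustration.

```python
from bisect import bisect_right

# 0, 2800 and 5000 come from the bins mentioned in the post;
# 10000 is a hypothetical extra edge added for illustration.
EDGES = [0, 2800, 5000, 10000]


def income_bin(income, edges=EDGES):
    """Map a numeric income to a label such as '0-2800'.

    The lower edge of each bin is inclusive and the upper edge is exclusive.
    Values at or above the last edge fall into an open-ended bin.
    """
    if income < edges[0]:
        raise ValueError("income below the first bin edge")
    i = bisect_right(edges, income)
    if i == len(edges):
        return f"{edges[-1]}+"
    return f"{edges[i - 1]}-{edges[i]}"


labels = [income_bin(x) for x in (2000, 2800, 4999, 12000)]
print(labels)
```

The resulting label column can then be used in “Explain by” like any other categorical field.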
If you are looking to explore the possibility of applying AI in your dataset or looking to evaluate the use of Power BI in your organization, don’t hesitate to contact us today.