Analytics Needs Explanation: Helping Users to Understand Underlying Data and Processes

The data available for analytics is growing rapidly and, simultaneously, methods are becoming more sophisticated. This leads to more extensive insights than ever, but at the same time, large data pools, complicated processing algorithms, and complex data models make analytics harder to understand than ever. For instance, if a prediction model is based on a proprietary algorithm or its training data is unknown, it is very hard for a casual user to reconstruct how a certain prediction was calculated [1]. Moreover, there are usually only a few experts in a company who really know how insights are computed; all other users simply have to trust them.

Figure 1. Understanding the data versus getting the message

Figure 1 illustrates the difference between “getting the message,” which means that a user grasps the insights (e.g., sales will go up), and “understanding the data,” which means a user actually comprehends how the insights were produced (e.g., sales will go up because customer feedback shows that the target audience is expanding). Hence, the problem is not the outcome, as a user can easily arrive at the right conclusion and make great decisions without actually understanding the underlying data. However, a lack of proof and understanding may lead to doubts and decrease acceptance of the system. Consequently, Business Intelligence and Analytics (BIA) systems should ensure that users not only get the message but also understand the underlying data and processes.

Interestingly, there is extensive work on the “getting the message” part. Companies use visual analytics, information design guidelines, or storytelling to communicate the results in an easy and comprehensible way [2,3]. The explanation of underlying data and processes, however, is often neglected. This is probably because delving into the inner core of insights can be tedious, and many users simply don’t care about the process as long as the results are positive.

Nevertheless, an understanding of underlying data and processes is an important factor for the long-term success of analytics in a company. A casual reporting user should not only get the meaning of a KPI, but also know how it was fundamentally derived, in order to spot incorrect data or to get a feeling for other consequences or biases. However, the increasing complexity of data and analytics makes it harder to trust one’s gut about data. And it gets even harder with the rise of prescriptive analytics, where computer-based methods not only provide data for a decision but also make the decision. It is not without reason that users become wary of non-comprehensible insights from black boxes that they don’t understand. This is why modern BI solutions need to incorporate adequate ways to explain the underlying data and processes to users in order to ensure long-term acceptance of and trust in analytics.

Five Pillars of Understanding Analytics

Figure 2 shows five factors that influence the understanding of analytics data and processes. Here it becomes obvious that this definition of understanding is not just about comprehension but also about trust and transparency. The factors, along with ways to implement them, are explained in more detail in the following sections.

Figure 2. Factors that help users "understand the data"


1. Understandability: Explain KPIs and Business Terms in a Glossary

An initial step toward better understanding is a common language. A way to achieve this is a comprehensive glossary that explains all KPIs and related business terms. Here, a KPI explanation should encompass at least the following three parts:

  • Meaning: What does the KPI represent?
  • Composition: What data or sub-KPIs are used to come up with this KPI?
  • Usage: Where and for what decisions is this KPI used?

Moreover, the glossary should also explain business terminology because many terms have different meanings in different domains or departments. Usually, such a glossary is part of an overarching data governance or data strategy [4].
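As an illustration, the three-part KPI explanation described above could be modeled as a simple data structure. The following is a minimal Python sketch; the KPI name, fields, and example values are hypothetical, not taken from any particular tool:

```python
from dataclasses import dataclass

@dataclass
class KpiEntry:
    """One glossary entry covering the three parts: meaning, composition, usage."""
    name: str
    meaning: str              # What does the KPI represent?
    composition: list[str]    # Which data or sub-KPIs feed into it?
    usage: str                # Where and for what decisions is it used?

# Hypothetical example entry
glossary = {
    "customer_churn_rate": KpiEntry(
        name="customer_churn_rate",
        meaning="Share of customers lost during a period.",
        composition=["customers_at_period_start", "customers_lost"],
        usage="Retention planning in the marketing department.",
    )
}

def explain(kpi: str) -> str:
    """Render a glossary entry as a one-line explanation for end users."""
    e = glossary[kpi]
    return (f"{e.name}: {e.meaning} "
            f"Derived from: {', '.join(e.composition)}. "
            f"Used for: {e.usage}")
```

In practice, such entries would live in a governed business glossary tool rather than in application code, but the three-part structure carries over directly.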

2. Traceability: Track and Visualize What Happens to Data over Time

Besides the semantic meaning of values, it is also important to track those values over time. Data lineage (or data provenance) tracks data across its life cycle, which includes everything that happens to the data set over time. This is often necessary in compliance and auditing processes, but it also increases the transparency of analytics. For instance, it becomes possible to trace a certain insight back to its origins and thereby understand it more fully.
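To make the idea of tracing an insight back to its origins concrete, here is a minimal sketch of an append-only lineage log in Python. The event fields and dataset names are illustrative assumptions; real lineage tooling records far richer metadata:

```python
import datetime
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class LineageEvent:
    dataset: str               # dataset the event applies to
    operation: str             # e.g. "extracted", "aggregated", "joined"
    source: Optional[str]      # upstream dataset, if any
    timestamp: datetime.datetime

class LineageLog:
    """Append-only log; tracing walks 'source' links back to the origin."""

    def __init__(self):
        self.events: list[LineageEvent] = []

    def record(self, dataset: str, operation: str, source: Optional[str] = None):
        self.events.append(LineageEvent(
            dataset, operation, source,
            datetime.datetime.now(datetime.timezone.utc)))

    def trace(self, dataset: str) -> list[str]:
        """Return the chain of datasets from `dataset` back to its origin."""
        chain = [dataset]
        current = dataset
        while True:
            ev = next((e for e in reversed(self.events)
                       if e.dataset == current and e.source), None)
            if ev is None:
                return chain
            chain.append(ev.source)
            current = ev.source
```

A user who questions a number in a report can then follow the chain from the report back through every aggregation step to the raw source system.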

3. Credibility: Be Honest about the Reliability of Data

The increasing complexity in analytics has rendered the vision of a “single point of truth” (SPOT) less realistic. As a matter of fact, most companies come up with two or more varying answers to the same question. This is not surprising as values stem from different systems with varying aggregation rules, different batch periods, divergent terminology, and many other uncontrollable factors.

This is why it is easier to be honest about the credibility of datasets and tell users if a value might not be 100% accurate, instead of trying to create a SPOT. This approach is also more efficient, as not all scenarios require total accuracy and consistency.
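One simple way to be honest about reliability is to carry a confidence label alongside each value and surface it in reports. The following Python sketch uses hypothetical labels and sources; the point is only that the qualification travels with the number:

```python
from dataclasses import dataclass

@dataclass
class QualifiedValue:
    """A value together with an honest statement about its reliability."""
    value: float
    source: str
    confidence: str      # e.g. "audited", "estimated", "stale"
    note: str = ""

def render(v: QualifiedValue) -> str:
    """Render the value for a report, flagging anything that is not audited."""
    if v.confidence == "audited":
        return f"{v.value:,.0f}"
    return f"{v.value:,.0f} ({v.confidence}: {v.note})"
```

Shown this way, a user immediately sees that a figure from a lagging batch feed should not be treated with the same certainty as an audited one.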

4. Responsibility: Link Data to People When Possible

There are always questions that can’t be answered with the information in databases and information systems. Here, a very simple but powerful solution approach is to link data to appropriate people in an organization, who can provide deeper insights and additional information.

Usually this is done by designating data stewards [5] who are responsible for certain data sources, data sets, or processes. They are able to explain how things work and can help users when necessary. In more complicated scenarios, it also helps to distinguish between technical stewards, who can answer questions about systems and processes, and domain stewards, who handle business-domain-related issues.
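The mapping between datasets and stewards can be kept very simple. A minimal sketch in Python, with hypothetical dataset names and contacts:

```python
# Dataset -> responsible people, split by technical vs. domain questions.
# All names and addresses here are made up for illustration.
stewards = {
    "sales_mart": {
        "technical": "jane.doe@example.com",   # systems and processes
        "domain": "john.smith@example.com",    # business-domain questions
    },
}

def who_to_ask(dataset: str, question_type: str) -> str:
    """Return the right contact for a dataset; raises KeyError if unregistered."""
    return stewards[dataset][question_type]
```

Even this trivial lookup, exposed next to a report or dashboard, turns an anonymous number into something a user can ask a real person about.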

5. Transparency: If You Can’t Explain What Happens, Be Transparent

Lastly, some use cases simply can’t be explained, for instance, predictions created by self-learning neural networks. There is currently extensive research on this topic, but an elegant and simple way to illustrate what happens inside such a black box is unlikely to appear soon [6,7].

However, one thing that helps garner acceptance is to be as transparent as possible and tell users what inputs and methods are used and how they work on an abstract level, just like Facebook tells users why a certain ad might be interesting to them with a simple two-liner. Admittedly, this does not enable users to check the actual correctness of an output, but it provides some understanding and a mild feeling of control.
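Such an abstract "two-liner" explanation can be generated mechanically from metadata about the model. A minimal Python sketch, with hypothetical method and input names; it describes the black box rather than opening it:

```python
def explain_prediction(method: str, inputs: list[str], purpose: str) -> str:
    """Produce a short, abstract explanation of a black-box output,
    in the spirit of a 'Why am I seeing this?' notice."""
    return (f"This {purpose} was produced by a {method}, "
            f"based on: {', '.join(inputs)}.")
```

The output does not prove the prediction is correct, but it gives users the inputs and method at a glance, which is exactly the mild sense of control described above.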

Conclusion

This article showed the fundamental difference between “getting the message” and “understanding the data,” and also how important actual understanding is to ensure long-term success of analytics. Even if getting the message might be enough for many of today’s users, understanding underlying data and processes becomes increasingly important as modern analytics solutions become more sophisticated and harder to see through.

As a consequence, BI systems and engineers should provide as much explanation as possible to engender user acceptance and trust. The five pillars introduced in this article can be a starting point for a more transparent and understandable world of analytics.

References

[1] (2013) Bostrom, N. & Yudkowsky, E.: “The Ethics of Artificial Intelligence”
https://intelligence.org/files/EthicsofAI.pdf

[2] (2017) Wells, D.: “Beyond Data Visualization: The Power of Data Storytelling”
https://www.eckerson.com/articles/beyond-data-visualization-the-power-of-data-storytelling

[3] (2015) Eckerson, W. W.: “See, Know, Act: How Visual Design Standards Improve Analytical Literacy”
https://www.eckerson.com/articles/see-know-act-how-visual-design-standards-improve-analytical-literacy-700345e0-46ba-47d0-8ec5-e2f79da8ecfe

[4] (2014) Eckerson, W. W.: “Data Governance Part II: How to Create a Common Data Vocabulary”
https://www.eckerson.com/articles/data-governance-part-ii-how-to-create-a-common-data-vocabulary

[5] Wikipedia: “Data steward”
https://en.wikipedia.org/wiki/Data_steward

[6] (2013) Muehlhauser, L.: “Transparency in Safety-Critical Systems”
https://intelligence.org/2013/08/25/transparency-in-safety-critical-systems/

[7] https://optimizingmind.com

Julian Ereth

Julian Ereth is a researcher and practitioner in the field of business intelligence and data analytics.
