Explainable ML Models: what are explanations and why do we need them? – Part I
Interpretability is a key element of trust for AI models. An explanation is an interpretable description of a model's behavior. For an explanation to be valid, it needs to be faithful to the model and understandable to the user.

This series of posts is a digest of the AAAI 2021 Tutorial: Explaining Machine Learning Predictions: State-of-the-art, Challenges, and Opportunities by Julius Adebayo (MIT), Hima Lakkaraju (Harvard University), and Sameer Singh (UC Irvine).
*Post image by Daniele Levis Pelusi on Unsplash*

Machine learning (ML) and Artificial Intelligence (AI) are widespread and applied in almost any area we can think of, for example: predicting the outcome of elections, recommending what book to read next or what products to buy, and determining the approval or rejection of loans.
Given the sensitive nature of the fields in which ML and AI models assist humans in decision-making, the importance of understanding and interpreting these models has become clear: in many cases, black-box predictive models have led to serious societal problems, deeply affecting health, freedom, and safety, and reinforcing racial bias. It is now widely agreed that interpretability is a key element of trust for AI models.
What is an explanation?
An interpretable description of a model's behavior.
For an explanation to be valid, it needs to be faithful to the model (i.e., it must accurately reflect what the model actually does, not something completely off) and it needs to be understandable to the user.
Types of Explanations
- Local. An interpretable description of the model's behavior in a local neighborhood of a single input; local explanations explain individual predictions.
- Global. A description of the model's complete behavior across the entire input space. A minimal sketch contrasting the two follows this list.
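To make the distinction concrete, here is a minimal sketch (assuming scikit-learn; the breast-cancer dataset and the linear model are illustrative stand-ins for any tabular task, not part of the tutorial). With a linear model, both views are easy to read off: the coefficients describe the model globally, while per-feature contributions for one input act as a simple local explanation.

```python
# Minimal sketch: global vs. local explanations for a linear model.
# Assumes scikit-learn; the dataset and model are illustrative stand-ins.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)

# Global explanation: one coefficient per feature summarizes how the model
# behaves over the whole input space.
coefs = model.named_steps["logisticregression"].coef_[0]
global_view = sorted(zip(X.columns, coefs), key=lambda t: -abs(t[1]))
print("Global (whole model):", global_view[:3])

# Local explanation: per-feature contributions (coefficient * scaled value)
# for a single input only describe the model's behavior around that input.
x0 = model.named_steps["standardscaler"].transform(X.iloc[[0]])[0]
local_view = sorted(zip(X.columns, coefs * x0), key=lambda t: -abs(t[1]))
print("Local (one prediction):", local_view[:3])
```

For genuinely complex models, local explanations typically come from dedicated post-hoc methods rather than directly from the weights; those are the topic of the next post in this series.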
Why model understanding?
- Facilitate debugging. In many cases, models make the correct decision but based on the wrong features. For example, given an image, a model can correctly predict whether a specific animal is present; however, its prediction is based on the surroundings in the image (e.g., snow, water) rather than on the features of the animal itself. If we understand how the model works, we can fix it so it focuses on the right features.
- Facilitate bias detection. In situations such as approving a loan or deciding whether a defendant is ready for release, the decision-maker (e.g., a judge or financial advisor) needs the whole picture: not only the final prediction but also the features on which that prediction was based. For example, there have been cases in which a model deemed an individual not ready for release or not deserving of a loan purely based on race or gender. If these features are surfaced, the judge or financial advisor gains a more complete picture of the individual and can see whether the decision rests on race and/or gender rather than merely on their financial history or credit details.
- If and when to trust models when making decisions. Understanding a model's process and the features its predictions are based on can help experts decide if and when to base their decisions on those predictions. For example, some models might make better predictions for certain populations (e.g., African-American females) than for others. Such information can also help experts decide whether a model is suitable for wider deployment in the real world.
- Recourse. Knowing how and on what basis a model makes its predictions is useful not only for the decision-maker but also for the person affected by the decision. For example, if a person whose loan has been denied knows what to improve based on the features the model considers relevant (e.g., income, employment conditions, etc.), they can prepare better and improve their chances of approval the next time they apply.
How to understand a model?
- Build models that are deemed interpretable from the beginning. Linear models or shallow decision trees are considered interpretable because they are easy for humans to understand.
- Explain complex models in a post-hoc manner. Models that are too complex for humans to interpret directly can be passed through explainers, which produce explanations in the form of linear models, shallow decision trees, or visualizations that help stakeholders understand how the model works (see the sketch after this list).
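As a rough sketch of both strategies (again assuming scikit-learn; the random forest, the surrogate-tree approach, and the dataset are illustrative choices, not methods prescribed by the tutorial), the example below trains a shallow decision tree that is interpretable by design, then explains a black-box random forest post hoc by fitting a shallow surrogate tree to its predictions.

```python
# Sketch: interpretable-by-design vs. post-hoc explanation via a surrogate.
# Assumes scikit-learn; the models and dataset are illustrative stand-ins.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# 1) Interpretable by design: a shallow tree a human can read end to end.
interpretable = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(export_text(interpretable, feature_names=list(X.columns)))

# 2) Post-hoc: train a complex black box, then fit a shallow "surrogate"
#    tree to mimic its predictions. The surrogate is only an approximation,
#    so its fidelity to the black box should always be reported.
black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
surrogate = DecisionTreeClassifier(max_depth=3).fit(X, black_box.predict(X))
fidelity = (surrogate.predict(X) == black_box.predict(X)).mean()
print(export_text(surrogate, feature_names=list(X.columns)))
print(f"Surrogate agrees with the black box on {fidelity:.0%} of inputs")
```

The fidelity check matters: a surrogate that disagrees with the black box too often is not a faithful explanation in the sense defined above.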
TL;DR
Interpretability is a key element of trust for AI models. An explanation is an interpretable description of a model's behavior. For an explanation to be valid, it needs to be faithful to the model and understandable to the user. There are two types of explanations: local and global. Models can be explained either by building them to be interpretable from the beginning (simple, shallow models) or by passing them through explainers that produce explanations in a post-hoc manner.
In the next post of this series we will cover local post-hoc explanations.