A key barrier to broad AI/ML adoption is that users often cannot see the reasons behind the insights a model generates. People, especially SMEs, are naturally skeptical of AI/ML results when they first encounter them; model interpretability (also called explainability) is critical to driving change management and adoption.
Furthermore, interpretability helps evaluate and troubleshoot machine learning models. Exposing model interpretability helps users to understand why a model is predicting certain outcomes and how input features influence predictions.
In general, the more complex a machine learning model is, the harder it is for a human to interpret its results. For example, deep learning models are neural networks with many hidden layers. With current AI approaches, it is generally not possible to identify what the nodes in each layer really represent or what their relative importance is. In contrast, simpler models like regressions or trees support clearer interpretability because it is possible to determine the relative importance of each decision element for every predicted output.
In many cases, a more complex model delivers only a marginal improvement in performance. In those cases, managers may want to explicitly consider whether the more complex model is “worth it” or whether a simpler model with better explainability works best within the overall business process. In some situations, the organization can start with a simpler model while it builds trust in AI/ML techniques. Once that foundation of trust is in place, more complex models can be deployed to capture the additional business benefits.
In some cases, more complex models like deep neural networks may be needed to achieve the required performance. While it is still possible to interpret these models to some extent, there are significant limitations given current understanding and capabilities.
The following section presents design elements of interpretability and how these can be incorporated into a new machine learning effort.
Earlier chapters explain that machine learning models identify the feature weights, θ, that minimize a training loss function. Interpretability techniques inspect θ to determine the relative importance of each feature to the model.
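As a minimal sketch of what inspecting θ can look like for a simple linear model, the snippet below turns a set of learned weights into relative importances by normalizing their absolute values. The feature names and weight values are hypothetical, and the approach assumes the inputs were standardized to comparable scales before training; without that assumption, weight magnitudes are not directly comparable.

```python
def relative_importance(theta):
    """Return each weight's share of the total absolute weight.

    Assumes a linear model trained on standardized features, so that
    weight magnitudes are comparable across features.
    """
    total = sum(abs(w) for w in theta.values())
    if total == 0:
        return {name: 0.0 for name in theta}
    return {name: abs(w) / total for name, w in theta.items()}


# Hypothetical learned weights (theta) for a customer attrition model.
theta = {
    "days_since_last_login": 0.8,
    "support_tickets_last_90_days": 0.6,
    "account_age_years": -0.4,
    "newsletter_opt_in": 0.2,
}

importance = relative_importance(theta)
# Print features from most to least important.
for name, share in sorted(importance.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {share:.0%}")
```

Note that the sign of a weight is dropped here: a large negative weight is as influential as a large positive one, it simply pushes the prediction in the opposite direction.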
When performed on the aggregate trained model, we consider the outputs “global” interpretability. This contrasts with “local” interpretability, which is performed on a specific model prediction (e.g., a specific customer attrition score). In addition, interpretability techniques can be either specific to one type of machine learning model or model-agnostic. These techniques are rapidly evolving in scope and function, but they already open up the algorithm “black box” to give users guidance on what the model deems important, both globally and locally.
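To make the global versus local distinction concrete, here is a sketch for a linear model, where a prediction decomposes exactly into per-feature terms. The local explanation for one record is each feature's weight times that record's deviation from the dataset average; the global view averages the absolute size of those terms across all records. The weights, records, and function names are illustrative assumptions, and this decomposition is specific to linear models rather than a general-purpose method.

```python
from statistics import mean

# Hypothetical weights and a tiny hypothetical dataset (one dict per customer).
THETA = {"late_payments": 0.5, "tenure_years": -0.2}
CUSTOMERS = [
    {"late_payments": 0, "tenure_years": 10},
    {"late_payments": 2, "tenure_years": 5},
    {"late_payments": 4, "tenure_years": 0},
]

# Baseline: the average value of each feature across the dataset.
BASELINE = {f: mean(c[f] for c in CUSTOMERS) for f in THETA}


def local_contributions(record):
    """Per-feature contribution to one prediction, relative to the baseline."""
    return {f: THETA[f] * (record[f] - BASELINE[f]) for f in THETA}


def global_importance(records):
    """Mean absolute local contribution of each feature across all records."""
    per_record = [local_contributions(r) for r in records]
    return {f: mean(abs(c[f]) for c in per_record) for f in THETA}


print("local :", local_contributions(CUSTOMERS[2]))
print("global:", global_importance(CUSTOMERS))
```

In this toy data, the third customer's score is pushed up both by an above-average number of late payments and by a below-average tenure, which is the kind of record-level detail a global ranking alone would not show.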
Some machine learning frameworks include interpretability packages that expose the feature contributions for each model. Feature contribution percentages tell you the relative importance of the inputs that are used by the model to generate predictions.
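Framework packages differ in how they compute these percentages, so the following is only a from-scratch sketch of one common model-agnostic approach, permutation importance: shuffle one input column at a time, measure how much the prediction error grows, and report each feature's share of the total increase. The `model` function, the data, and all helper names are hypothetical stand-ins, not any particular framework's API.

```python
import random


def mean_squared_error(model, rows, targets):
    """Average squared difference between model outputs and targets."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance_pct(model, rows, targets, n_features, seed=0):
    """Share (in percent) of total error increase from shuffling each feature."""
    rng = random.Random(seed)
    base_error = mean_squared_error(model, rows, targets)
    increases = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        # Rebuild the rows with only feature j shuffled.
        shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        increase = mean_squared_error(model, shuffled, targets) - base_error
        increases.append(max(increase, 0.0))
    total = sum(increases)
    if total == 0:
        return [0.0] * n_features
    return [100.0 * inc / total for inc in increases]


# Hypothetical stand-in for any trained model; it ignores its third input.
def model(row):
    return 3.0 * row[0] + 1.0 * row[1] + 0.0 * row[2]


rows = [[float(i), float((i * 7) % 10), float((i * 3) % 10)] for i in range(10)]
targets = [model(r) for r in rows]

percentages = permutation_importance_pct(model, rows, targets, n_features=3)
print([round(p, 1) for p in percentages])
```

Because the third input has no effect on the stand-in model, shuffling it leaves the error unchanged and its contribution is zero. In practice the shuffle would typically be repeated several times and averaged to reduce noise.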
A more in-depth treatment of interpretability is beyond the scope of this guide, and we would direct readers to other references. However, some of the techniques we use to provide interpretability as a part of model prototyping and ongoing operations include:
When evaluating models, it is best practice to review the local interpretability for model outputs across true positives, false positives, and false negatives, where possible. A business user with context should be able to read the interpretability outputs and understand how they would use the information to make an informed decision based on the AI insights provided.
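One way to organize such a review, sketched below under stated assumptions, is to label each scored record by outcome and list its strongest local drivers, so a reviewer can compare what drove correct alerts against what drove false alarms and misses. The record structure, feature names, and contribution values are all hypothetical.

```python
from collections import defaultdict


def outcome(predicted, actual):
    """Label a binary prediction as TP, FP, FN, or TN."""
    if predicted and actual:
        return "TP"
    if predicted and not actual:
        return "FP"
    if not predicted and actual:
        return "FN"
    return "TN"


def top_drivers(contributions, k=2):
    """Names of the k features with the largest absolute contribution."""
    ranked = sorted(contributions, key=lambda f: abs(contributions[f]), reverse=True)
    return ranked[:k]


# Hypothetical scored records: prediction, ground truth, local contributions.
scored = [
    {"id": "A", "predicted": True, "actual": True,
     "contributions": {"late_payments": 0.6, "tenure_years": -0.1, "balance": 0.2}},
    {"id": "B", "predicted": True, "actual": False,
     "contributions": {"late_payments": 0.1, "tenure_years": -0.05, "balance": 0.7}},
    {"id": "C", "predicted": False, "actual": True,
     "contributions": {"late_payments": 0.2, "tenure_years": -0.5, "balance": 0.1}},
    {"id": "D", "predicted": False, "actual": False,
     "contributions": {"late_payments": 0.0, "tenure_years": -0.3, "balance": 0.1}},
]

# Group records by outcome, skipping true negatives.
review = defaultdict(list)
for rec in scored:
    label = outcome(rec["predicted"], rec["actual"])
    if label != "TN":
        review[label].append((rec["id"], top_drivers(rec["contributions"])))

for label in ("TP", "FP", "FN"):
    print(label, review[label])
```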
Figure 35: Example of a risk score from an AI/ML model charted over time. Details of the local feature contribution at a specific point in time appear in the table below.
In addition to exposing the feature contribution percentages, model interpretability can be improved by using human-interpretable feature names. Data scientists may be tempted to use shorthand in their code to label features with names like “X1” and “X2,” but this shortcut makes the model results harder to understand. Instead, encourage the use of descriptive names like “Days Since Last Interest Rate Change” or “Value of Credit Transactions in Last 30 Days.”
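Where shorthand names already exist in code, a small lookup applied at the point of display is one low-effort remedy. The sketch below assumes a hypothetical mapping from “X1” and “X2” to the descriptive names above; the pairing and the contribution values are purely illustrative.

```python
# Hypothetical mapping from shorthand feature keys to descriptive names.
DISPLAY_NAMES = {
    "X1": "Days Since Last Interest Rate Change",
    "X2": "Value of Credit Transactions in Last 30 Days",
}


def label_contributions(contributions, display_names=DISPLAY_NAMES):
    """Swap shorthand keys for descriptive names; unknown keys are kept as-is."""
    return {display_names.get(key, key): value for key, value in contributions.items()}


# Hypothetical feature contribution shares keyed by shorthand names.
raw = {"X1": 0.35, "X2": 0.55, "X3": 0.10}

labeled = label_contributions(raw)
for name, share in labeled.items():
    print(f"{name}: {share:.0%}")
```

Keeping unmapped keys visible, rather than dropping them, makes it obvious which features still need a descriptive name.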