Chapter 5 Local Surrogate (LIME)
Local Interpretable Model-agnostic Explanations (LIME) [7] is a model-agnostic method for explaining the predictions of black-box models. It is based on the idea that a model’s prediction can be explained by a local linear approximation of the model around the instance of interest. This approximation is obtained by fitting a linear model to datapoints around the instance of interest. These datapoints are sampled from the training data and weighted by their proximity to the instance of interest. The fitted linear model is then used to explain the prediction of the black-box model.
Here is the basic process of LIME. Suppose we need to explain why a complex black-box model (the original model) makes a particular prediction for an instance of interest. Instead of probing the internal structure of the model, LIME uses a local linear approximation of the model to explain the prediction.
LIME is given three pieces of information: an instance of interest \(x_i\), the black-box model’s prediction \(y_i\) on \(x_i\), and a training dataset \(X\). To train the surrogate we need a new dataset, which is sampled from \(X\) and then weighted by the proximity of the sampled instances to \(x_i\). The black-box model is then used to predict labels for the new dataset; these predictions serve as the “ground-truth” labels for the “simple” interpretable model, which in turn is chosen from the chapter on intrinsically interpretable models. LIME then fits the “simple” model to these “ground-truth” labels.
Mathematically, LIME is formulated as follows: \[\text{explanation}(x)=\arg\min_{g\in{}G}L(f,g,\pi_x)+\Omega(g)\] where \(x\) is the instance of interest and \(f\) is the black-box model. \(g\) is the “simple” interpretable model chosen from the family of potential explanations \(G\) (e.g., linear models, decision trees). \(\pi_x\) is the proximity measure that defines the neighborhood around \(x\): it determines how strongly each perturbed point is weighted according to its closeness to \(x\). \(L\) is the loss function (e.g., mean squared error), the goal of which is to make the “simple” model mimic the predictions of the black-box model in that neighborhood. \(\Omega\) is the regularization term that restricts the complexity of the “simple” model.
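As an illustration, one concrete instantiation of this objective can be written out. It is an assumption here, not the only possible choice: the loss is a weighted squared error over the perturbed samples \(z\) in the new dataset (called \(Z\) below), and \(\pi_x(z)\) is taken to be an exponential kernel on some distance \(D\) with width \(\sigma\). The symbols \(Z\), \(D\), and \(\sigma\) are introduced only for this sketch.
\[L(f,g,\pi_x)=\sum_{z\in Z}\pi_x(z)\,\bigl(f(z)-g(z)\bigr)^2,\qquad \pi_x(z)=\exp\!\left(-\frac{D(x,z)^2}{\sigma^2}\right)\]
For a linear \(g\), \(\Omega(g)\) could for example count the non-zero coefficients, so that the explanation relies on only a few features.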
In summary, the LIME procedure is:
- Choose the instance of interest whose black-box prediction you want explained.
- Generate a perturbed dataset around the instance of interest and get the black-box model’s predictions on the perturbed dataset.
- Weight the perturbed samples by their proximity to the instance of interest.
- Train a “simple” model on the weighted perturbed dataset.
- Use the “simple” model to explain the prediction of the black-box model on the instance of interest.
It is worth noting that the “simple” model should be a good approximation of the black-box model locally rather than globally.
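The steps above can be sketched for tabular data in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the reference LIME implementation: `black_box` is a hypothetical stand-in model, perturbations are drawn as Gaussian noise around the instance (rather than sampled from a training set), proximity is an exponential kernel, and the “simple” model is an unregularized weighted linear regression. The names `lime_tabular`, `scale`, and `kernel_width` are made up for this sketch.

```python
import numpy as np


def black_box(Z):
    """Hypothetical black-box model: maps rows of Z to a numeric prediction."""
    return np.sin(Z[:, 0]) + Z[:, 1] ** 2


def lime_tabular(f, x, n_samples=2000, scale=0.1, kernel_width=0.25, seed=0):
    """Fit a proximity-weighted linear surrogate of f around the instance x."""
    rng = np.random.default_rng(seed)

    # Step 2: perturb the instance and query the black-box model.
    Z = x + rng.normal(0.0, scale, size=(n_samples, x.size))
    y = f(Z)

    # Step 3: weight each perturbed sample by its proximity to x.
    dist = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(dist ** 2) / kernel_width ** 2)

    # Step 4: weighted least squares, done by scaling rows with sqrt(w).
    A = np.hstack([np.ones((n_samples, 1)), Z - x])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)

    # Step 5: the slopes are the local explanation for x.
    return coef[0], coef[1:]


x = np.array([0.0, 1.0])
intercept, slopes = lime_tabular(black_box, x)
print("local intercept:", intercept)
print("local feature weights:", slopes)
```

Because the surrogate is fitted only near `x`, its slopes should be read as a local description: for this toy function they approximate the gradient of `black_box` at `x`, and they would differ at another instance.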
5.0.1 Examples
LIME can be applied to tabular data, text, images, and time series data. We focus on text data here. The black-box model is a deep decision tree. We randomly pick an instance of interest:
title | label |
---|---|
House Dem Aide: We Didn’t Even See Comey’s Letter Until Jason Chaffetz Tweeted It | 1 |
The perturbed versions of this text, the black-box predictions, and the weights are:
House | Dem | Aide | We | Didn’t | Even | Comey’s | prob | weight |
---|---|---|---|---|---|---|---|---|
1 | 0 | 1 | 1 | 0 | 0 | 1 | 0.17 | 0.57 |
0 | 1 | 1 | 1 | 1 | 0 | 1 | 0.17 | 0.71 |
1 | 0 | 0 | 1 | 1 | 1 | 1 | 0.99 | 0.71 |
1 | 0 | 1 | 1 | 1 | 1 | 1 | 0.99 | 0.86 |
0 | 1 | 1 | 1 | 0 | 0 | 1 | 0.17 | 0.57 |
Each column is a feature, in this case a word in the text. Each row is a perturbation, in which 1 means the word is present in the text and 0 means it is absent. For instance, the first row represents “House Aide We Comey’s”. The “prob” column is the black-box model’s prediction on the perturbed text, and the “weight” column is the proximity weight of the perturbed text. In this table the weight equals the fraction of the seven listed words that are retained (4/7 ≈ 0.57 for the first row), so perturbations closer to the original text receive larger weights.
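The following sketch shows how the rows of the table map back to perturbed texts and how the listed weights can be reproduced. It assumes the weight is simply the fraction of the seven listed words that are kept, which is consistent with the values shown; the helper names `to_text` and `proximity_weight` are hypothetical.

```python
words = ["House", "Dem", "Aide", "We", "Didn’t", "Even", "Comey’s"]

# Binary masks copied from the table: 1 = word kept, 0 = word removed.
rows = [
    [1, 0, 1, 1, 0, 0, 1],
    [0, 1, 1, 1, 1, 0, 1],
    [1, 0, 0, 1, 1, 1, 1],
    [1, 0, 1, 1, 1, 1, 1],
    [0, 1, 1, 1, 0, 0, 1],
]


def to_text(mask):
    """Rebuild the perturbed text from a binary mask over the words."""
    return " ".join(word for word, keep in zip(words, mask) if keep)


def proximity_weight(mask):
    """Assumed weighting: fraction of the listed words that are retained."""
    return sum(mask) / len(mask)


texts = [to_text(mask) for mask in rows]
weights = [round(proximity_weight(mask), 2) for mask in rows]

for text, weight in zip(texts, weights):
    print(f"{weight:.2f}  {text}")
```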
Next, we train a linear model on the weighted perturbed dataset and use this regression model to explain the prediction of the black-box model on the instance of interest.
case | label_prob | feature | feature_weight |
---|---|---|---|
1 | 0.170 | good | 0.000 |
1 | 0.170 | a | 0.000 |
1 | 0.170 | is | 0.000 |
2 | 0.994 | channel! | 6.181 |
2 | 0.994 | For | 0.000 |
2 | 0.994 | ;) | 0.000 |
The label_prob column is the prediction of the white-box model on the perturbed text, and feature_weight is the weight that the white-box model assigns to the feature for that label_prob. A weight of 0 indicates that the word does not affect the prediction of the white-box model.
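As a rough sketch of the fitting step, the five perturbations of the headline example can be passed to a weighted linear fit. Several things here are assumptions: a small ridge penalty stands in for the regularization term (with only five rows and seven features an unregularized fit is not unique), the function `weighted_ridge` and the value of `lam` are hypothetical, and the code does not reproduce the feature weights listed in the table above.

```python
import numpy as np

words = ["House", "Dem", "Aide", "We", "Didn’t", "Even", "Comey’s"]

# Perturbation masks, black-box predictions and weights from the first table.
X = np.array([
    [1, 0, 1, 1, 0, 0, 1],
    [0, 1, 1, 1, 1, 0, 1],
    [1, 0, 0, 1, 1, 1, 1],
    [1, 0, 1, 1, 1, 1, 1],
    [0, 1, 1, 1, 0, 0, 1],
], dtype=float)
prob = np.array([0.17, 0.17, 0.99, 0.99, 0.17])
weight = np.array([0.57, 0.71, 0.71, 0.86, 0.57])


def weighted_ridge(X, y, w, lam=1e-3):
    """Weighted ridge regression with an unpenalized intercept."""
    w_norm = w / w.sum()
    x_mean = w_norm @ X          # weighted feature means
    y_mean = w_norm @ y          # weighted mean of the target
    Xc, yc = X - x_mean, y - y_mean
    gram = Xc.T @ (w[:, None] * Xc) + lam * np.eye(X.shape[1])
    coef = np.linalg.solve(gram, Xc.T @ (w * yc))
    intercept = y_mean - x_mean @ coef
    return intercept, coef


intercept, coef = weighted_ridge(X, prob, weight)
for word, c in sorted(zip(words, coef), key=lambda pair: -abs(pair[1])):
    print(f"{word:10s} {c: .3f}")
```

In these five rows the black-box probability is high exactly when “Even” is present, so the surrogate puts nearly all of its weight on that word, while words present in every perturbation (“We”, “Comey’s”) carry no information and get a weight of essentially zero. With so few samples this is only an illustration of the mechanics; a real explanation would use many more perturbations.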