RBX: REGION-BASED EXPLANATIONS OF PREDICTION MODELS

Abstract

We introduce region-based explanations (RbX), a novel, model-agnostic method that uses only query access to quantify the sensitivity of the scalar predictions from a black-box model to local feature perturbations. RbX is based on a greedy algorithm for building a convex polytope that approximates a region of feature space where predictions are close to the prediction at some target point x_0. The geometry of the learned polytope, specifically the change in each coordinate necessary to escape the polytope, then explains the local importance of each feature in changing the model predictions. In particular, these "escape distances" can be standardized and ordered to rank the features by local importance. The RbX method is informed by a goal of detecting as many relevant features as possible in locally sparse prediction models, without including any features that do not enter into the model. We provide a real data example and synthetic experiments to illustrate the encouraging performance of RbX in this respect.

1. INTRODUCTION

Suppose we have a prediction model f(x) to estimate a scalar outcome y given a set of features x ∈ R^d. We assume we do not have any knowledge about the functional form of f. Rather, we consider f as a black-box function to which we only have query access, meaning we may compute the value of f(x) for any desired input x. After making a prediction at a target point x_0, we seek to quantify the local importance of each feature on the prediction. Perturbations in certain features will be more influential than others in changing predictions from f near x_0, and we would like a systematic way of identifying these features. We distinguish our problem, which we call local prediction importance, from the questions of feature selection and feature importance. Feature selection methods, such as the LASSO for linear models (Tibshirani, 1996) and modern extensions like LassoNet for black-box models (Lemhadri et al., 2021), aim to select a small subset of features to generate a predictive model with greater accuracy and/or interpretability. In our setting, the prediction model f is fixed, and we seek only to faithfully explain the predictions of that model, without regard to the unknowable data-generating process that created the features and response. Feature importance methods include the popular permutation-based approaches introduced by Breiman (2001) for random forests, which were extended to generic black-box models by Fisher et al. (2019) and to a local method by Casalicchio et al. (2018). These approaches also fix the prediction function f, but provide importance measures based on changes in the predictive performance of f as various features are ignored, permuted, or otherwise perturbed. That is, they consider the extent to which changing features impacts the ability of f to approximate the underlying ground-truth function.
By contrast, the term prediction importance emphasizes the singular role of the numerical outputs of f, setting aside how well these predictions approximate reality. The distinction between local prediction importance and local feature importance is not always made in the literature. However, it is relevant for a user who only cares about understanding the output of a given black-box model, and does not want prediction explanations conflated with the underlying signal the model is trying to approximate. Our proposed approach to local prediction importance is via region-based explanations (RbX). The method is "model-agnostic," meaning it does not use any knowledge about the structure of f, relying only on query access. We defer a detailed description of the algorithm to Section 3.1, but the main idea is to construct a polytope that approximates the region in feature space with prediction values "close" to the prediction at a target point x_0 (Section 2.1). We then argue that distances from x_0 to the boundaries of this polytope in directions parallel to the coordinate axes inform the local sensitivities of f to each feature in desirable ways, according to the evaluation properties defined in Section 2.
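The full polytope construction is deferred to Section 3.1. As a rough illustration of the escape-distance idea only (not the actual RbX algorithm), the sketch below bisects along each coordinate axis for the smallest perturbation that moves the prediction outside a closeness band of width eps around f(x_0); it assumes query access to f and, for the bisection to find the boundary, that escape along each ray is monotone. The function name and parameter defaults here are illustrative choices, not part of the paper's method.

```python
import numpy as np

def axis_escape_distances(f, x0, eps, max_step=10.0, tol=1e-6):
    """For each coordinate j, find the smallest |t| such that
    |f(x0 + t * e_j) - f(x0)| > eps, searched in both directions.
    A crude, axis-aligned stand-in for RbX's polytope geometry."""
    d = len(x0)
    f0 = f(x0)
    dists = np.full(d, np.inf)
    for j in range(d):
        for sign in (+1.0, -1.0):
            step = np.zeros(d)
            step[j] = sign * max_step
            # If the prediction never leaves the band within max_step,
            # this direction contributes no (finite) escape distance.
            if abs(f(x0 + step) - f0) <= eps:
                continue
            lo, hi = 0.0, max_step  # band holds at lo, escapes at hi
            while hi - lo > tol:    # bisect for the boundary crossing
                mid = 0.5 * (lo + hi)
                step[j] = sign * mid
                if abs(f(x0 + step) - f0) > eps:
                    hi = mid
                else:
                    lo = mid
            dists[j] = min(dists[j], hi)
    return dists  # small distance => locally influential feature
```

For example, with f(x) = 3*x_0 (ignoring the second feature) and eps = 1, the escape distance along the first axis is 1/3, while the second feature never escapes and receives an infinite distance, i.e., zero importance.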

1.1. PREVIOUS WORK

Existing approaches to local prediction importance can be broadly divided into two categories: surrogate methods and gradient-based methods. Surrogate methods locally approximate f by fitting a simpler prediction model that treats the predictions of f in a region near x_0 as the response. The weights assigned to each feature in this model are then used for local importance. For instance, LIME (Ribeiro et al., 2016) draws feature instances from a density centered at the target point x_0 and uses a linear surrogate. Lundberg & Lee (2017) propose Kernel SHAP (hereafter just SHAP), which they showed is an algorithmic approximation to fitting an additive surrogate model with weights corresponding to Shapley values. Gradient-based methods consider infinitesimal regions on the decision surface and use the resulting first-order approximation to derive local feature importance. For example, Baehrens et al. (2010) provide local prediction importances based on the absolute values of the components of the gradient vector ∇f(x_0); they estimate this gradient by fitting a global surrogate model using Parzen windows. Integrated gradients (IG; Sundararajan et al., 2017) considers the line integral of the components of the gradient of f over a straight-line path in feature space from a baseline point to x_0. Other gradient methods are not model-agnostic. For instance, DeepLIFT (Shrikumar et al., 2017) relies on backpropagation to estimate gradients in neural networks.
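To make the gradient-based family concrete, the following sketch approximates integrated gradients for a scalar black-box f using only query access: the path integral is replaced by a midpoint Riemann sum, and each gradient along the path by a central finite difference. The step counts and difference width are illustrative defaults, and this is a simplified stand-in rather than any reference implementation of the cited method.

```python
import numpy as np

def integrated_gradients(f, x0, baseline, n_steps=64, h=1e-5):
    """Approximate integrated gradients of a scalar black-box f along the
    straight-line path from `baseline` to `x0`, via a midpoint Riemann sum
    with central finite-difference gradients (query access only)."""
    x0 = np.asarray(x0, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    delta = x0 - baseline
    total = np.zeros_like(x0)
    for k in range(n_steps):
        alpha = (k + 0.5) / n_steps       # midpoint rule on [0, 1]
        x = baseline + alpha * delta
        for j in range(len(x0)):
            e = np.zeros_like(x0)
            e[j] = h
            total[j] += (f(x + e) - f(x - e)) / (2 * h)
    return delta * total / n_steps        # per-feature attribution
```

On a linear model the approximation is exact: with f(x) = 3*x_0 + x_1, baseline (0, 0), and target (2, 1), the attributions are (6, 1), and they sum to f(x_0) - f(baseline) = 7, matching the completeness property of IG.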

2. WHY REGION-BASED EXPLANATIONS?

In general, the "ground truth" local prediction explanation for a given model f and target point x_0 is ill-defined. While the explanations from procedures like SHAP and IG are derived from particular sets of axioms, evidently there is no consensus as to which axioms are more "desirable". Thus, we choose to develop RbX based on two less restrictive but likely less controversial properties, which we call sparsity and detection power:

Property 1. (Sparsity) A feature not involved in the prediction model f is assigned no importance.

Property 2. (Detection power) The locally relevant features for f are assigned highest importance.

Sparsity requires that any feature that cannot change the predictions from f is assigned no importance. Of course, sparsity is not sufficient for a good local prediction importance method, though we view it as necessary: a method that always assigns zero importance to every feature trivially satisfies sparsity, but fails to identify any potentially important features, hence the need for detection power. In designing RbX, we seek to maximize detection power while preserving sparsity. Ideas similar to Properties 1 and 2 are not new. Indeed, the authors of LIME and of L2X (Chen et al., 2018), a method that computes local feature scores by maximizing a variational relaxation of the mutual information between y and the features x encoded by a classifier f, use similar criteria to evaluate their methods. For instance, the experiments in Ribeiro et al. (2016) show that LIME does a better job than some baseline methods in finding the features used in sparse logistic regression models and decision trees. SHAP, IG, and other gradient methods satisfy sparsity axiomatically, while LIME and L2X do not. While there is some subjectivity in the definition of locally relevant, we believe a reasonable sufficient condition for local relevance of feature j is that the j-th component of ∇f(x_0), the gradient of the prediction at x_0, is nonzero.
In the case that f is an additive regression model, Property 1 corresponds to assigning zero importance to all features with zero coefficients, while Property 2 means assigning nonzero importance to all other features. A simple gradient-based method using finite differences would then always perfectly satisfy both properties, as the set of features with nonzero gradients would be precisely the relevant features. By contrast, LIME only does this 90%-92% of the time in the experiments from Ribeiro et al. (2016).
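The simple finite-difference gradient method referred to above can be sketched as follows (a minimal illustration, with a hypothetical function name and an illustrative difference width h):

```python
import numpy as np

def gradient_importance(f, x0, h=1e-5):
    """Local prediction importance as |df/dx_j| at x0, estimated by
    central finite differences using only query access to f."""
    x0 = np.asarray(x0, dtype=float)
    grad = np.zeros_like(x0)
    for j in range(len(x0)):
        e = np.zeros_like(x0)
        e[j] = h
        grad[j] = (f(x0 + e) - f(x0 - e)) / (2 * h)
    return np.abs(grad)
```

On a sparse linear model such as f(x) = 2*x_0 - 5*x_2 in four features, this assigns importances (2, 0, 5, 0): exactly zero for the features with zero coefficients (sparsity) and nonzero for the two relevant ones (detection power).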

