RBX: REGION-BASED EXPLANATIONS OF PREDICTION MODELS

Abstract

We introduce region-based explanations (RbX), a novel, model-agnostic method that uses only query access to quantify the sensitivity of the scalar predictions of a black-box model to local feature perturbations. RbX is based on a greedy algorithm for building a convex polytope that approximates a region of feature space in which predictions are close to the prediction at some target point x_0. The geometry of the learned polytope, specifically the change in each coordinate necessary to escape the polytope, then explains the local importance of each feature in changing the model's predictions. In particular, these "escape distances" can be standardized and ordered to rank the features by local importance. The RbX method is designed to detect as many relevant features as possible in locally sparse prediction models, without including any features that do not enter into the model. We provide a real-data example and synthetic experiments to illustrate the encouraging performance of RbX in this respect.
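To make the ranking step concrete, the following is a minimal sketch, not the authors' implementation: it assumes we are already handed hypothetical per-coordinate escape distances from such a polytope, standardizes them by a per-feature scale, and orders features so that a small standardized escape distance (a small move suffices to change the prediction) signals high local importance. The function name and inputs are illustrative only.

```python
import numpy as np

def rank_features_by_escape(escape_dist, scale):
    """Rank features by local prediction importance.

    escape_dist[j]: hypothetical distance along coordinate j needed to
    escape the "closeness" polytope around x_0 (smaller => more important).
    scale[j]: a per-feature scale (e.g. the feature's standard deviation)
    used to standardize distances across features with different units.
    """
    standardized = np.asarray(escape_dist, dtype=float) / np.asarray(scale, dtype=float)
    # Sort ascending: features with the smallest standardized escape
    # distance come first, i.e. are ranked most locally important.
    return np.argsort(standardized)

# Toy example: only a tiny move in feature 1 escapes the polytope,
# so it is ranked first; feature 0 needs the largest move and ranks last.
order = rank_features_by_escape([5.0, 0.2, 3.0], [1.0, 1.0, 1.0])
print(order)  # -> [1 2 0]
```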

1. INTRODUCTION

Suppose we have a prediction model f(x) that estimates a scalar outcome y given a set of features x ∈ R^d. We assume no knowledge of the functional form of f. Rather, we treat f as a black-box function to which we have only query access, meaning we may compute the value of f(x) for any desired input x. After making a prediction at a target point x_0, we seek to quantify the local importance of each feature on the prediction. Perturbations in certain features will be more influential than others in changing predictions from f near x_0, and we would like a systematic way of identifying these features. We distinguish our problem, which we call local prediction importance, from the questions of feature selection and feature importance. Feature selection methods, such as the lasso for linear models (Tibshirani, 1996) and modern extensions like LassoNet for black-box models (Lemhadri et al., 2021), aim to select a small subset of features that yields a predictive model with greater accuracy and/or interpretability. In our setting, the prediction model f is fixed, and we seek only to faithfully explain the predictions of that model, without regard to the unknowable data-generating process that produced the features and response. Feature importance methods include the popular permutation-based approach introduced by Breiman (2001) for random forests, which was extended to generic black-box models by Fisher et al. (2019) and to a local method by Casalicchio et al. (2018). These approaches also fix the prediction function f, but provide importance measures based on changes in the predictive performance of f as various features are ignored, permuted, or otherwise perturbed. That is, they consider the extent to which changing features affects the ability of f to approximate some underlying ground-truth function.
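The query-access setting described above can be illustrated with a short sketch (the black-box function here is a stand-in chosen for illustration, not an example from the paper): any fitted model is treated as an opaque callable, and all information about it must come from evaluating it at chosen inputs near x_0.

```python
import numpy as np

def make_black_box():
    """Return an opaque prediction function f: R^d -> R.

    Its internals are hidden from the explanation method; here it is a
    fixed, locally sparse function used purely for illustration.
    """
    return lambda x: 2.0 * x[0] - 0.5 * x[2]

f = make_black_box()
x0 = np.array([1.0, 4.0, 2.0])

# Query access: we may evaluate f at any input we like, and nothing more.
base = f(x0)
for j in range(3):
    x = x0.copy()
    x[j] += 0.1  # small local perturbation of coordinate j
    # Coordinates 0 and 2 move the prediction; coordinate 1 does not,
    # so it is irrelevant to the prediction near x0.
    print(j, f(x) - base)
```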
By contrast, the term prediction importance emphasizes the singular role of the numerical outputs of f, setting aside how well those predictions approximate reality. The distinction between local prediction importance and local feature importance is not always made in the literature. However, it matters for a user who only cares about understanding the output of a given black-box model and does not want prediction explanations conflated with the underlying signal the model is trying to approximate. Our proposed approach to local prediction importance is via region-based explanations (RbX). The method is "model-agnostic," meaning it does not use any knowledge about the structure of f, relying

