A SAMPLE-BASED METHOD FOR SEMANTIC UNDERSTANDING OF NEURAL NETWORK DECISIONS

Anonymous authors
Paper under double-blind review

Abstract

Interpretability in deep learning is one of the largest obstacles to its more widespread adoption in critical applications. A variety of methods have been introduced to understand and explain decisions made by deep models. One class of these methods highlights which features are most influential to model predictions, but these methods have key weaknesses. First, most are applicable only to the atomic elements that make up raw inputs to the model (e.g., pixels or words). Second, they generally do not distinguish between the importance of features individually and their importance due to interactions with other features. As a result, it is difficult to explore high-level questions about how models use features during decision-making. We tackle these issues by proposing Sample-Based Semantic Analysis (SBSA). We use Sobol variance decomposition as our sample-based method, which allows us to quantify the importance of semantic combinations of raw inputs and to highlight the extent to which these features are important individually as opposed to through interactions with other features. We demonstrate the ability of Sobol-SBSA to answer a richer class of questions about the behavior of deep learning models by exploring how CNN models from AlexNet to DenseNet use regions when classifying images. We present three key findings: 1) the architectural improvements from AlexNet to DenseNet manifested themselves in CNN models utilizing greater levels of region interactions for predictions; 2) these same architectural improvements increased the importance that CNN models placed on the background of images; 3) adversarially robust CNNs reduce the reliance of modern CNNs on both interactions and image background. Our proposed method generalizes to a wide variety of network and input types and can help provide greater clarity about model decisions.

1. INTRODUCTION

Deep learning models are becoming ubiquitous across applications. As models are increasingly used for critical tasks such as detecting lung nodules in medicine (Schultheiss et al., 2021) or autonomous driving (Li et al., 2021), it is important either to create interpretable models or to make opaque models human-interpretable. This paper focuses on the latter. Existing methods developed over the last decade for doing this can be divided into model-agnostic and model-dependent approaches. Model-agnostic methods, such as Shapley values (Kononenko et al., 2013) and Integrated Gradients (Sundararajan et al., 2017), weigh the importance of input features without relying on the structure of the model. In contrast, methods such as Grad-CAM (Selvaraju et al., 2017) and Grad-CAM++ (Chattopadhay et al., 2018) are heavily dependent on model architecture. While these methods yield valuable information about models, they share common gaps. First, they do not distinguish between features in input space that are individually important and features that are important because of their interactions with other features. Second, these methods are generally applied to inputs at their most granular level (pixels, words, etc.). The combination of these gaps limits the conclusions that machine learning practitioners can draw about the behavior of models as a whole. We address these limitations in two key ways. First, we introduce a two-part framework called Sample-Based Semantic Analysis (SBSA). The first part of the framework is a function that generates semantic representations of inputs and associates these semantic representations with real numbers. The second part of the framework is a black-box sample-based sensitivity method; in this case, the Sobol method, which reports the importance of individual features and their interactions.
Second, we demonstrate the ability of Sobol-SBSA to answer a richer set of questions than standard interpretability methods by applying it to CNN models in the context of ImageNet. The key results and contributions of this paper are as follows:
1. We present a general-purpose framework for using sample-based sensitivity methods to analyze the importance of semantic representations of inputs, and test it using a variety of black-box methods.
2. We demonstrate that the Sobol method outperforms other popular black-box methods (Integrated Gradients, Shapley values via KernelSHAP, and LIME) at selecting both the most and least important regions for CNN predictions.
3. We show, through direct measurement, that the main impacts of the evolution of CNN architectures were to increase the extent to which they use region interactions and the extent to which they rely on background information in images. Similarly, we show that adversarially robust versions of CNNs reduce both of these effects for modern CNNs. To our knowledge, Sobol-SBSA is the first pipeline to facilitate the direct measurement of such trends, and to do so within a single pipeline.
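To make the Sobol variance decomposition underlying these results concrete, the following is a minimal NumPy sketch of the standard Saltelli and Jansen estimators for first-order and total Sobol indices on a toy function with a small interaction term. The function, sample size, and estimator choice are illustrative assumptions, not the paper's actual pipeline.

```python
# Estimate first-order (S_i) and total (ST_i) Sobol indices with
# pure-NumPy Monte Carlo estimators. S_i measures a variable's
# individual contribution to output variance; ST_i additionally
# includes its interactions with the other variables.
import numpy as np

def sobol_indices(f, d, n, rng):
    A = rng.random((n, d))
    B = rng.random((n, d))
    fA, fB = f(A), f(B)
    var = np.var(np.concatenate([fA, fB]))
    S, ST = np.empty(d), np.empty(d)
    for i in range(d):
        ABi = A.copy()
        ABi[:, i] = B[:, i]                            # A with column i from B
        fABi = f(ABi)
        S[i] = np.mean(fB * (fABi - fA)) / var         # Saltelli estimator
        ST[i] = 0.5 * np.mean((fA - fABi) ** 2) / var  # Jansen estimator
    return S, ST

# Toy function with a mild x0*x1 interaction, so ST_i exceeds S_i.
f = lambda X: X[:, 0] + 2 * X[:, 1] + X[:, 0] * X[:, 1]
S, ST = sobol_indices(f, d=2, n=20000, rng=np.random.default_rng(0))
```

For this function with inputs uniform on [0, 1], the analytic first-order indices are roughly S_0 ≈ 0.26 and S_1 ≈ 0.73, with the remaining ~0.01 of the variance attributable to the interaction; the estimates above should land close to these values.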

2. METHODOLOGY

In this section, we describe the two components of SBSA and specify how we use them to analyze the importance of image regions in ImageNet. In particular, we describe how we associate image regions with quantities that can be analyzed by a sampling-based method, and the specifics of Sobol as a sampling-based sensitivity method.

2.1. SAMPLE-BASED SEMANTIC ANALYSIS (SBSA)

Let us define the following variables: x ∈ R^d is an input to a model; f : x → y ∈ R^s is a model that takes x as an argument and produces y; x[i] ∈ R^d is a sample of x; and N ∈ Z is a prescribed integer that helps determine the number of x[i] samples generated. Most sample-based sensitivity methods operate by generating a number of samples that is some function of N and d. The model is then evaluated on these samples, and the resulting model outputs are used by sensitivity analysis methods, such as Sobol, to determine the importance of components of x to the model output y. It immediately becomes clear that for deep learning applications with high-dimensional inputs, such as images, videos, and long documents, applying this process naively is prohibitively expensive. This issue can be greatly mitigated by turning instead to semantic representations of inputs. In this paper, a semantic representation of an input x is defined as some combination of the raw components of that input which yields a human-recognizable higher-order feature, such as the colors in an image, image regions, or grammatical parts of sentences. We denote this semantic representation as {S_1, ..., S_l}, S_k ∈ R^m, where m < d. Recalling that most sample-based sensitivity methods operate on real numbers, we define three mapping objects:

G : x → {S_1, ..., S_l},  G^{-1} : {S_1, ..., S_l} → x̃ ≈ x,  x ∈ R^d, S_k ∈ R^m, l < d, m < d  (1)

H : {S_1, ..., S_l} → {r_1, ..., r_l},  S_k ∈ R^m, r_k ∈ R  (2)

R : {(r_1, S_1), ..., (r_l, S_l)} → {S*_1, ..., S*_l},  S*_k ∈ R^m  (3)

G maps the raw input x to l semantic representations S_k; H associates the semantic representations with a lower-dimensional vector of real numbers, r ∈ R^l; and R creates new semantic representations based on r_k and S_k. G is invertible. SBSA generates samples of r, [r[1], ..., r[n]].
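One possible instantiation of the mappings G and R for image regions is sketched below; the grid size, the mean-patch baseline, and all function names are hypothetical choices for illustration, and H is left implicit (each region is simply indexed by the scalar weight the sampler perturbs).

```python
# Hypothetical G / R / G^{-1} for image regions: G cuts an (H, W, C)
# image into an s x s grid of patches (the semantic representations
# S_k); R blends each patch toward a baseline (its own mean) with
# weight (1 - r_k); G_inv reassembles patches into an image.
import numpy as np

def G(x, s=4):
    H_, W_ = x.shape[:2]
    h, w = H_ // s, W_ // s
    return [x[i*h:(i+1)*h, j*w:(j+1)*w] for i in range(s) for j in range(s)]

def R(r, patches):
    # r_k = 1 keeps region S_k intact; r_k = 0 replaces it with its mean.
    return [rk * p + (1 - rk) * p.mean() for rk, p in zip(r, patches)]

def G_inv(patches, shape, s=4):
    H_, W_ = shape[:2]
    h, w = H_ // s, W_ // s
    x = np.empty(shape)
    for k, p in enumerate(patches):
        i, j = divmod(k, s)
        x[i*h:(i+1)*h, j*w:(j+1)*w] = p
    return x

img = np.random.default_rng(0).random((32, 32, 3))
r = np.ones(16)                       # all r_k = 1: every region intact
x_rec = G_inv(R(r, G(img)), img.shape)
```

With all weights set to 1, the round trip G_inv(R(r, G(x))) reproduces the original image, matching the requirement that G be invertible.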
From these samples, R is used to generate samples of the original semantic representations: R(r_k[i], S_k) = S_k[i].
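The sampling loop can be sketched end to end as follows: draw samples r[i] of the region weights, map them through R and G^{-1} to perturbed inputs, and collect model outputs for the sensitivity estimator. The "model" here is a stand-in (mean brightness), and all names and sizes are illustrative assumptions.

```python
# End-to-end sampling sketch in the paper's notation: r samples ->
# perturbed images -> model outputs y[i] for sensitivity analysis.
import numpy as np

def perturb(img, r, s=4):
    # Apply R with weights r to the s x s grid regions, then reassemble.
    H_, W_, C = img.shape
    h, w = H_ // s, W_ // s
    out = img.copy()
    for k, rk in enumerate(r):
        i, j = divmod(k, s)
        patch = img[i*h:(i+1)*h, j*w:(j+1)*w]
        out[i*h:(i+1)*h, j*w:(j+1)*w] = rk * patch + (1 - rk) * patch.mean()
    return out

rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))
model = lambda x: x.mean()            # stand-in for the black-box f
l, n = 16, 64                         # regions (4 x 4 grid), sample count
r_samples = rng.random((n, l))        # [r[1], ..., r[n]]
y = np.array([model(perturb(img, r)) for r in r_samples])
```

The resulting vector y plays the role of the model outputs that a sensitivity method such as Sobol consumes to score each region's individual and interaction importance.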

