BEYOND SINGLE PATH INTEGRATED GRADIENTS FOR RELIABLE INPUT ATTRIBUTION VIA RANDOMIZED PATH SAMPLING

Abstract

Input attribution is a widely used explanation method for deep neural networks, especially in visual tasks. Among various attribution methods, Integrated Gradients (IG) (Sundararajan et al., 2017) is frequently used because of its model-agnostic applicability and desirable axioms. However, previous work (Smilkov et al., 2017; Kapishnikov et al., 2019; 2021) has shown that such methods often produce noisy and unreliable attributions when integrating gradients over a path defined in the input space. In this paper, we tackle this issue by estimating the distribution of possible attributions induced by the choice of integration path. We show that attribution noise can be reduced by aggregating attributions from multiple paths instead of relying on a single path. Inspired by the Stick-Breaking Process (Sethuraman, 1991), we propose a random process that generates a rich variety of gradient-integration paths. Using the multiple input attributions obtained from the randomized paths, we propose a novel attribution measure based on the distribution of attributions at each input feature. We show qualitatively that the proposed method produces less noisy, object-aligned attributions, and we validate its feasibility through quantitative evaluations.

1. INTRODUCTION

Along with the steep improvement and real-world deployment of deep learning models (Caruana et al., 2015; Yurtsever et al., 2020), discovering the evidence behind a black-box model's decision is considered important for debugging malfunctions (Lapuschkin et al., 2019) and for ensuring the safety and fairness (Doshi-Velez & Kim, 2017) of the models. Within the vast literature on explaining the decisions of deep models, input attribution (Simonyan et al., 2013; Bach et al., 2015; Shrikumar et al., 2016; Sundararajan et al., 2017) is one of the most widely used approaches for quantifying the relative contribution of each feature to the model output. Input attribution provides the explanation in the form of a heatmap, which is useful for indicating the spatial location of evidence, especially in visual tasks. Among the various approaches to computing input attributions, Integrated Gradients (IG) (Sundararajan et al., 2017), one of the most widely used methods, and its variants (Pan et al., 2021; Kapishnikov et al., 2021) are of particular interest in our work. These methods explore the input space along a predefined path and integrate gradients along it to provide reliable attributions. The integration path of such methods is defined by a baseline, which represents the missingness of features, and a line connecting the input and the baseline. Depending on the desired properties, various paths can be used to compute the attribution. For example, Guided IG (Kapishnikov et al., 2021) proposes an adaptive path to alleviate high and noisy gradients unrelated to the prediction. The selection of the baseline can also affect the attribution results (Štrumbelj & Kononenko, 2014). While the above methods address the importance of selecting an appropriate integration path, in this paper we claim that a single path is not reliable enough to interpret the decision of neural networks.
We provide a simple example showing that the attribution computed from a single path exhibits high variance across different path selections. For better reliability, we propose a novel attribution method that takes the expectation of the path-integrated attribution over a distribution of possible paths. To sample from the vast variety of possible paths, we adopt the notion of the Stick-Breaking Process, a stochastic process that samples a probability distribution. The main contributions of our work are summarized as follows:

• We address the inconsistency of attributions with respect to the selection of the integration path, and propose a novel attribution method that takes the expectation over a distribution of random paths to retain the reliability of the attribution.

• We propose a sampling method, inspired by the Stick-Breaking Process, to generate random integration paths. With the proposed method, we can generate a vast number of integration paths efficiently.

• We evaluate the attributions with qualitative and quantitative measures to validate the reliability of the proposed method on various network architectures.
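To make the stick-breaking idea concrete, the following is a minimal sketch of the classic stick-breaking construction (Beta(1, α) fractions broken off a unit stick). The function name `stick_breaking_weights` and the use of the cumulative sum of the normalized weights as a monotone interpolation schedule are illustrative assumptions, not the paper's exact construction:

```python
import numpy as np

def stick_breaking_weights(n, alpha=1.0, seed=None):
    """Sample n weights by repeatedly breaking off Beta(1, alpha)
    fractions of the remaining stick; the weights sum to (almost) 1."""
    rng = np.random.default_rng(seed)
    betas = rng.beta(1.0, alpha, size=n)
    # Length of the stick remaining before each break.
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining

w = stick_breaking_weights(20, alpha=2.0, seed=0)
# Cumulative sums of the normalized weights form a random monotone
# schedule in [0, 1], usable as an interpolation path between a
# baseline and an input.
schedule = np.cumsum(w / w.sum())
```

Because each break removes a random fraction of what is left, drawing fresh Beta variates yields a new schedule each time, which is what makes it cheap to generate many distinct paths.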

2. RELATED WORK

Attribution methods Input attribution methods aim to measure the relative sensitivity of the model output with respect to the input features. The Saliency method (Simonyan et al., 2013) is a simple approach that uses the gradient as the attribution. The Grad*Input method (Shrikumar et al., 2016) multiplies the input with the gradient for better input alignment. FullGrad (Srinivas & Fleuret, 2019) uses the bias gradients in addition to Grad*Input. Guided Backpropagation (Springenberg et al., 2014) considers only the features that contribute positively to the prediction by ignoring negative backpropagated gradients. Layer-wise Relevance Propagation (LRP) (Bach et al., 2015; Nam et al., 2020) is another method that modifies backward propagation; it proposes relevance propagation rules based on the Taylor decomposition. There is also a family of attribution methods that do not require access to the internal properties of the model (e.g., gradients, parameters). LIME (Ribeiro et al., 2016) trains a surrogate linear model that mimics the original model on data whose features are masked out of the input to be explained. RISE (Petsiuk et al., 2018) computes the attribution by aggregating the model outputs over multiple randomly masked inputs. Path-based attribution methods Integrated Gradients (IG) (Sundararajan et al., 2017) is one of the most widely used input attribution methods. It is built upon a game-theoretic notion of pay-off distribution, the Aumann-Shapley value (Aumann & Shapley, 2015). IG satisfies several desirable properties, called axioms, which support the reliability of the attribution. IG is calculated by integrating the gradients along the path from the baseline to the input. Based on this work, several extensions have been proposed. To reduce the noise in the attribution, SmoothGrad (Smilkov et al., 2017) averages the attributions of multiple inputs perturbed with random noise, and NoiseGrad (Bykov et al., 2021) injects noise into the model weights.
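The IG computation described above (integrating gradients along the straight line from the baseline to the input) can be sketched as a Riemann sum. The toy quadratic model and the function name `integrated_gradients` are illustrative; real use would call a framework's autograd for `grad_fn`:

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Midpoint-rule approximation of Integrated Gradients along the
    straight-line path from `baseline` to the input `x`."""
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x, dtype=float)
    for a in alphas:
        point = baseline + a * (x - baseline)  # point on the path
        total += grad_fn(point)
    # (x - baseline) times the average gradient along the path.
    return (x - baseline) * total / steps

# Toy model f(x) = sum(x**2), whose gradient is 2x in closed form.
grad_fn = lambda z: 2.0 * z
x = np.array([1.0, 2.0])
baseline = np.zeros_like(x)
attr = integrated_gradients(grad_fn, x, baseline)
```

For this toy model the completeness axiom is easy to check: the attributions sum to f(x) - f(baseline) = 5.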



Figure 1: An illustration of Stick-breaking Path Integration (SPI) for a given input x. Using the distribution G realized from the SBP, we randomly generate integration paths in the input domain by taking the CDF of each distribution (colored lines in the bottom left). From the sampled paths, we apply gradient integration along each path to gather multiple attribution samples. By taking their average, the SPI attribution is obtained.
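The pipeline in the caption (sample several monotone paths, integrate gradients along each, then average) can be sketched as follows. This is not the paper's implementation: the random monotone schedules here are built from normalized cumulative sums of random increments as a stand-in for the SBP-derived CDFs, and all function names are hypothetical:

```python
import numpy as np

def path_attribution(grad_fn, x, baseline, schedule):
    """Integrate gradients along a monotone per-feature schedule.
    Each row of `schedule` gives one interpolation step per feature."""
    attr = np.zeros_like(x, dtype=float)
    prev = baseline.astype(float)
    for alphas in schedule:
        point = baseline + alphas * (x - baseline)
        # Midpoint rule: gradient at the segment center times the step.
        attr += grad_fn((prev + point) / 2.0) * (point - prev)
        prev = point
    return attr

def averaged_attribution(grad_fn, x, baseline, n_paths=8, steps=32, seed=0):
    """Average path attributions over several random monotone paths,
    in the spirit of aggregating multiple integration paths."""
    rng = np.random.default_rng(seed)
    attrs = []
    for _ in range(n_paths):
        inc = rng.exponential(size=(steps, x.size))
        schedule = np.cumsum(inc, axis=0) / inc.sum(axis=0)
        attrs.append(path_attribution(grad_fn, x, baseline, schedule))
    return np.mean(attrs, axis=0)

grad_fn = lambda z: 2.0 * z  # toy model f(x) = sum(x**2)
x, baseline = np.array([1.0, 2.0]), np.zeros(2)
attr = averaged_attribution(grad_fn, x, baseline)
```

Each random schedule moves every feature from the baseline to the input at its own random pace, so the sampled paths differ while all attributions remain valid path integrals that can be averaged.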

