BEYOND SINGLE PATH INTEGRATED GRADIENTS FOR RELIABLE INPUT ATTRIBUTION VIA RANDOMIZED PATH SAMPLING

Abstract

Input attribution is a widely used explanation method for deep neural networks, especially in visual tasks. Among various attribution methods, Integrated Gradients (IG) (Sundararajan et al., 2017) is frequently used because of its model-agnostic applicability and desirable axioms. However, previous work (Smilkov et al., 2017; Kapishnikov et al., 2019; 2021) has shown that such methods often produce noisy and unreliable attributions when integrating gradients over a path defined in the input space. In this paper, we tackle this issue by estimating the distribution of possible attributions induced by the choice of integration path. We show that attribution noise can be reduced by aggregating attributions from multiple paths instead of using a single path. Inspired by the Stick-Breaking Process (Sethuraman, 1991), we propose a random process that generates a rich variety of gradient-integration paths. Using the multiple input attributions obtained from these randomized paths, we propose a novel attribution measure based on the distribution of attributions at each input feature. Qualitatively, the proposed method yields less noisy, object-aligned attributions; we further demonstrate its feasibility through quantitative evaluations.

1. INTRODUCTION

Along with the steep improvement and real-world application of deep learning models (Caruana et al., 2015; Yurtsever et al., 2020), discovering the evidence behind black-box model decisions is considered important for debugging malfunctions (Lapuschkin et al., 2019) and ensuring the safety and fairness (Doshi-Velez & Kim, 2017) of the models. Within the vast literature on explaining the decisions of deep models, input attribution (Simonyan et al., 2013; Bach et al., 2015; Shrikumar et al., 2016; Sundararajan et al., 2017) is one of the most widely used approaches to quantify the relative contribution of each feature to the model output. Input attribution provides explanations in the form of heatmaps, which are useful for indicating the spatial location of evidence, especially in visual tasks. Among the various approaches to computing input attributions, Integrated Gradients (IG) (Sundararajan et al., 2017) and its variants (Pan et al., 2021; Kapishnikov et al., 2021) are of particular interest in our work. These methods explore the input space along a predefined path and integrate gradients along it to provide reliable attributions. The integration path consists of a baseline, which represents the missingness of features, and a line connecting the input and the baseline. Depending on the desired properties, various paths can be used to compute the attribution. For example, Guided IG (Kapishnikov et al., 2021) proposes an adaptive path to alleviate high and noisy gradients unrelated to the prediction. The selection of the baseline can also affect the attribution results (Štrumbelj & Kononenko, 2014). While the above methods address the importance of selecting an appropriate integration path, in this paper we claim that a single path is not reliable enough to interpret the decision of neural networks.
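To make the path-integration procedure concrete, the following is a minimal NumPy sketch of the standard straight-line IG estimator (Sundararajan et al., 2017), approximated with a midpoint Riemann sum. The `model_grad` argument is a hypothetical interface, not part of the original text: a function returning the gradient of the target-class score with respect to the input.

```python
import numpy as np

def integrated_gradients(model_grad, x, baseline, steps=64):
    """Approximate IG along the straight line from `baseline` to `x`.

    model_grad: hypothetical gradient oracle mapping an input array to the
    gradient of the target output w.r.t. that input.
    """
    # Midpoint Riemann approximation of the path integral of gradients.
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x, dtype=float)
    for a in alphas:
        point = baseline + a * (x - baseline)  # point on the straight path
        total += model_grad(point)
    avg_grad = total / steps
    # Scale averaged gradients by the input-baseline difference, so that the
    # attributions sum (approximately) to f(x) - f(baseline) (completeness).
    return (x - baseline) * avg_grad
```

As a sanity check, for the toy model f(x) = sum(x^2) with gradient 2x and a zero baseline, the attributions come out to x^2 elementwise, and their sum equals f(x) - f(0), illustrating the completeness axiom.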
We provide a simple example in which the attribution computed from a single path exhibits high variance across different path selections. For better reliability, we propose a novel attribution method that takes the expectation of the path-integrated attribution over a distribution of possible paths. To sample from the distribution over the vast variety of possible paths, we adopt the notion of the Stick-Breaking

