QUANTIFYING STATISTICAL SIGNIFICANCE OF NEURAL NETWORK REPRESENTATION-DRIVEN HYPOTHESES BY SELECTIVE INFERENCE

Anonymous

Abstract

In the past few years, various approaches have been developed to explain and interpret deep neural network (DNN) representations, but it has been pointed out that these representations are sometimes unstable and not reproducible. In this paper, we interpret these representations as hypotheses driven by a DNN (called DNN-driven hypotheses) and propose a method to quantify the reliability of these hypotheses within a statistical hypothesis testing framework. To this end, we introduce the Selective Inference (SI) framework, which has received much attention in the past few years as a new statistical inference framework for data-driven hypotheses. The basic idea of SI is to make inference on a selected hypothesis conditional on the event that it was selected. In order to apply the SI framework to DNN representations, we develop a new SI algorithm based on the homotopy method, which enables us to derive the exact (non-asymptotic) conditional sampling distribution of the DNN-driven hypotheses. We demonstrate the proposed method on computer vision tasks as practical examples. Through experiments on both synthetic and real-world datasets, we offer evidence that the proposed method successfully controls the false positive rate, has decent computational efficiency, and provides good results in practical applications.

1. INTRODUCTION

The remarkable predictive performance of deep neural networks (DNNs) stems from their ability to learn appropriate representations from data. In order to understand the decision-making process of DNNs, it is thus important to be able to explain and interpret DNN representations. For example, in image classification tasks, knowing the attention region from a DNN representation allows us to understand the reason for the classification. In the past few years, several methods have been developed to explain and interpret DNN representations (Ribeiro et al., 2016; Bach et al., 2015; Doshi-Velez & Kim, 2017; Lundberg & Lee, 2017; Zhou et al., 2016; Selvaraju et al., 2017); however, some of them have turned out to be unstable and not reproducible (Kindermans et al., 2017; Ghorbani et al., 2019; Melis & Jaakkola, 2018; Zhang et al., 2020; Dombrowski et al., 2019; Heo et al., 2019). Therefore, it is crucially important to develop a method to quantify the reliability of DNN representations.

In this paper, we interpret these representations as hypotheses that are driven by a DNN (called DNN-driven hypotheses) and employ a statistical hypothesis testing framework to quantify the reliability of DNN representations. For example, in an image classification task, the reliability of an attention region can be quantified based on the statistical significance of the difference between the attention region and the rest of the image. Unfortunately, however, traditional statistical tests cannot be applied to this problem because the hypothesis (the attention region in the above example) is itself selected by the data. A traditional statistical test is valid only when the hypothesis is non-random. Roughly speaking, if a hypothesis is selected by the data, the hypothesis will over-fit to the data, and this selection bias needs to be corrected when assessing the reliability of the hypothesis.
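The selection bias described above can be seen in a minimal simulation (this toy setup is ours for illustration, not the paper's method): on pure-noise "images" we pick the brightest pixels as a data-driven "attention region" and then naively test whether that region differs from the rest. Even though there is no true signal, the naive test rejects almost always.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n_trials, n_pixels, k = 0.05, 2000, 100, 10

rejections = 0
for _ in range(n_trials):
    x = rng.standard_normal(n_pixels)   # pure noise: no true "tumor" signal
    idx = np.argsort(x)[-k:]            # data-driven "attention region": top-k pixels
    sel = x[idx]
    rest = np.delete(x, idx)
    # naive two-sample t-test that ignores how the region was chosen
    _, p = stats.ttest_ind(sel, rest)
    rejections += (p < alpha)

fpr = rejections / n_trials
print(f"naive false positive rate: {fpr:.3f}")  # far above the nominal 0.05
```

Because the tested region is exactly the set of largest values, the naive p-value is tiny on nearly every trial, so the empirical false positive rate is close to 1 rather than the nominal 5%. This is the bias that the conditional (selective) inference developed in this paper is designed to remove.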
Our main contribution in this paper is to introduce the Selective Inference (SI) approach for testing the reliability of DNN representations. The basic idea of SI is to perform statistical inference under the condition that the hypothesis is selected. The SI approach has been demonstrated to be effective in the context of feature selection methods such as the Lasso. In this paper, in order to introduce SI for DNN representations, we develop a novel SI algorithm based on the homotopy method, which enables us to derive the exact (non-asymptotic) conditional sampling distribution of the DNN-driven hypothesis. We use the p-value as a criterion to quantify the reliability of a DNN representation. In the literature, p-values are often misinterpreted, and various sources of misinterpretation have been discussed (Wasserstein & Lazar, 2016). In this paper, by using SI, we address one of these sources: p-values are biased when the hypothesis is selected after looking at the data (often called double-dipping or data dredging). We believe our approach is a first significant step toward providing valid p-values for assessing the reliability of DNN representations. Figure 1 shows an example that illustrates the importance of our method.

Related works. Several recent approaches have been developed to visualize and understand a trained DNN. Many of these post-hoc approaches (Mahendran & Vedaldi, 2015; Zeiler & Fergus, 2014; Dosovitskiy & Brox, 2016; Simonyan et al., 2013) have focused on developing visualization tools for the activation maps and/or the filter weights within trained networks. Others have aimed to identify the discriminative regions in an input image, given a trained network (Selvaraju et al., 2017; Fong & Vedaldi, 2017; Zhou et al., 2016; Lundberg & Lee, 2017).
In parallel, some recent studies have shown that many popular methods for explanation and interpretation are not stable with respect to perturbations of, or adversarial attacks on, the input data and the model (Kindermans et al., 2017; Ghorbani et al., 2019; Melis & Jaakkola, 2018; Zhang et al., 2020; Dombrowski et al., 2019; Heo et al., 2019). However, there are no previous studies that quantitatively evaluate the stability and reproducibility of DNN representations within a rigorous statistical inference framework. In the past few years, SI has been actively studied for inference on the features of linear models selected by feature selection methods, e.g., the Lasso (Lee et al., 2016; Liu et al., 2018; Duy & Takeuchi, 2020). The basic idea of SI is to make inference conditional on the selection event, which allows us to derive the exact (non-asymptotic) sampling distribution of the test statistic. Besides, SI has also been applied to various other problems (Bachoc et al., 2014; Fithian et al., 2015; Choi et al., 2017; Tian et al., 2018; Chen & Bien, 2019; Hyun et al., 2018; Bachoc et al., 2018; Loftus & Taylor, 2014; Loftus, 2015; Panigrahi et al., 2016; Tibshirani et al., 2016; Yang et al., 2016; Suzumura et al., 2017; Duy et al., 2020). However, to the best of our knowledge, there is no existing study that provides SI for DNNs, which is technically challenging. This study is partly motivated by Tanizaki et al. (2020), where the authors provide a framework to compute p-values for image segmentation results obtained by graph cut and threshold-based segmentation algorithms. As we demonstrate in this paper, our method can also be used to assess the reliability of DNN-based segmentation results.
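The conditioning idea behind SI can be illustrated in a one-dimensional toy problem (this sketch and the threshold `c` are ours for illustration; it is far simpler than the selection events handled in this paper). Observe X ~ N(mu, 1) and report the hypothesis "mu > 0" only when X > c. Under H0: mu = 0, the naive p-value P(Z >= x) is biased, while the selective p-value conditions on the selection event, P(Z >= x | Z >= c), which for a standard normal is the ratio of survival functions and is exactly uniform given selection.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
c, alpha, n = 1.0, 0.05, 200_000

x = rng.standard_normal(n)        # H0 is true everywhere: X ~ N(0, 1)
sel = x[x > c]                    # selection event: hypothesis reported only if X > c

naive_p = stats.norm.sf(sel)                          # ignores the selection
selective_p = stats.norm.sf(sel) / stats.norm.sf(c)   # truncated-normal (conditional) p-value

naive_fpr = np.mean(naive_p < alpha)
selective_fpr = np.mean(selective_p < alpha)
print(f"naive: {naive_fpr:.3f}, selective: {selective_fpr:.3f}")
```

Among the reported (selected) cases, the naive test rejects far more often than 5%, whereas the selective p-value rejects at almost exactly the nominal level, because conditioning on the selection event restores the correct null distribution. The paper's homotopy-based algorithm computes this kind of conditional distribution exactly for the far more complex selection events induced by a DNN.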



Figure 1: Examples of the proposed method on brain tumor image classification. Given a CNN trained in advance to classify tumor versus non-tumor brain images, our method provides the statistical significance of the attention region for each test image in the form of p-values by comparing the pixel information in the attention and non-attention regions. Since the attention region is selected based on the input image, the p-value obtained by a naive comparison of the two regions (naive p-value) is highly biased. In the left-hand figure, where there is no brain tumor, the naive p-value is nearly zero (indicating a false positive, i.e., incorrectly identifying a tumor region), while the proposed selective p-value is large (indicating a true negative). On the other hand, in the right-hand figure, where a brain tumor actually exists, both the naive p-value and the selective p-value are very small (indicating a true positive). The proposed selective inference method can provide valid, exact (non-asymptotic) p-values for DNN representations such as attention regions.

