VALID P-VALUE FOR DEEP LEARNING-DRIVEN SALIENT REGION

Abstract

Various saliency map methods have been proposed to interpret and explain predictions of deep learning models. Saliency maps allow us to interpret which parts of the input signals have a strong influence on the prediction results. However, since a saliency map is obtained by complex computations in deep learning models, it is often difficult to know how reliable the saliency map itself is. In this study, we propose a method to quantify the reliability of a salient region in the form of p-values. Our idea is to consider a salient region as a selected hypothesis by the trained deep learning model and employ the selective inference framework. The proposed method can provably control the probability of false positive detections of salient regions. We demonstrate the validity of the proposed method through numerical examples in synthetic and real datasets. Furthermore, we develop a Keras-based framework for conducting the proposed selective inference for a wide class of CNNs without additional implementation cost.

1. INTRODUCTION

Deep neural networks (DNNs) have exhibited remarkable predictive performance in numerous practical applications in various domains owing to their ability to automatically discover the representations needed for prediction tasks from the provided data. To ensure that the decision-making process of DNNs is transparent and easy to understand, it is crucial to effectively explain and interpret DNN representations. For example, in image classification tasks, obtaining salient regions allows us to explain which parts of the input image strongly influence the classification results. Several saliency map methods have been proposed to explain and interpret the predictions of DNN models (Ribeiro et al., 2016; Bach et al., 2015; Doshi-Velez & Kim, 2017; Lundberg & Lee, 2017; Zhou et al., 2016; Selvaraju et al., 2017). However, the results obtained from saliency methods are fragile (Kindermans et al., 2017; Ghorbani et al., 2019; Melis & Jaakkola, 2018; Zhang et al., 2020; Dombrowski et al., 2019; Heo et al., 2019). Therefore, it is important to develop a method for quantifying the reliability of DNN-driven salient regions.

Our idea is to interpret salient regions as hypotheses driven by a trained DNN model and employ a statistical hypothesis testing framework. We use the p-value as a criterion to quantify the statistical reliability of the DNN-driven hypotheses. Unfortunately, constructing a valid statistical test for DNN-driven salient regions is challenging because of the selection bias. In other words, because the trained DNN selects the salient region based on the provided data, the post-selection assessment of importance is biased upwards. To correct the selection bias and compute valid p-values for DNN-driven salient regions, we introduce a conditional selective inference (SI) approach. The selection bias is corrected by conditional SI, in which we consider the test statistic conditional on the event that the hypotheses (salient regions) are selected using the trained DNNs. Our main technical contribution is to develop a method for explicitly deriving the exact (non-asymptotic) conditional sampling distribution of the salient region for a wide class of convolutional neural networks (CNNs), which enables us to conduct conditional SI and compute valid p-values. Figure 1 presents an example of the problem setup.

Related works. In this study, we focus on statistical hypothesis testing for post-hoc analysis, i.e., quantifying the statistical significance of the salient regions identified in a trained DNN model when a test input instance is fed into the model. Several methods have been developed to visualize and understand trained DNNs. Many of these post-hoc approaches (Mahendran & Vedaldi, 2015; Zeiler & Fergus, 2014; Dosovitskiy & Brox, 2016; Simonyan et al., 2013) have focused on developing visualization tools for saliency maps given a trained DNN.
Other methods have aimed to identify the discriminative regions in an input image given a trained network (Selvaraju et al., 2017; Fong & Vedaldi, 2017; Zhou et al., 2016; Lundberg & Lee, 2017). However, some recent studies have shown that many of these saliency methods are not stable against perturbations of, or adversarial attacks on, the input data and the model (Kindermans et al., 2017; Ghorbani et al., 2019; Melis & Jaakkola, 2018; Zhang et al., 2020; Dombrowski et al., 2019; Heo et al., 2019). To the best of our knowledge, no study to date has succeeded in quantitatively evaluating the reproducibility of DNN-driven salient regions with a rigorous statistical inference framework. In recent years, conditional SI has emerged as a promising approach for evaluating the statistical reliability of data-driven hypotheses. It has been actively studied for making inferences on the features of linear models selected by various feature selection methods, such as Lasso (Lee et al., 2016). The main concept behind conditional SI is to make inferences based on the sampling distribution of the test statistic conditional on a selection event. This approach allows us to derive the exact sampling distribution of the test statistic. After the seminal work of Lee et al. (2016), conditional SI has also



Figure 1: Examples of the problem setup and the proposed method on the brain tumor dataset. By applying a saliency method called CAM (Zhou et al., 2016) to a query input image, we obtain the salient region. Our goal is to provide the statistical significance of the salient region in the form of a p-value by considering a two-sample test between the salient region and the corresponding region in the reference image. Note that, since the salient region is selected based on the data, the degree of saliency in the selected region is biased upward. In the upper image, where there is no true brain tumor, the naive p-value, which is obtained without accounting for the selection bias, is nearly zero, indicating a false positive finding of the salient region. On the other hand, the selective p-value, which is obtained by the proposed conditional SI approach, is 0.43, indicating that the selected salient region is not statistically significant. In the lower image, where there is a true brain tumor, both the naive p-value and the selective p-value are very small, which indicates a true positive finding. These results illustrate that the naive p-value cannot be used to quantify the reliability of DNN-based salient regions. In contrast, with the selective p-values, we can successfully identify false positive and true positive detections with a desired error rate.
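
The contrast between naive and selective p-values described in the caption can be reproduced in a toy setting. The sketch below is not the proposed method for CNNs. It assumes independent standard normal "pixels", replaces the DNN-driven selection with a simple argmax, and uses a one-sided test; the names normal_sf, naive_and_selective_p and false_positive_rates are hypothetical helpers introduced only for illustration. Under these assumptions, conditioning on the selection event turns the null distribution of the selected value into a normal truncated at the runner-up value, which gives a closed-form selective p-value.

```python
import math
import random


def normal_sf(z):
    """Upper-tail probability P(Z >= z) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def naive_and_selective_p(x):
    """Select the largest entry of x, then test H0: its mean is 0 (one-sided)."""
    ordered = sorted(x)
    top, runner_up = ordered[-1], ordered[-2]
    # Naive p-value: treats the selected entry as if it had been fixed in advance.
    p_naive = normal_sf(top)
    # Selective p-value: given that this entry is the maximum (and given the
    # other entries), it follows N(0, 1) truncated to [runner_up, inf) under H0,
    # so the tail probability is renormalised by the truncation mass.
    p_selective = normal_sf(top) / normal_sf(runner_up)
    return p_naive, p_selective


def false_positive_rates(n_trials=2000, n_pixels=50, alpha=0.05, seed=0):
    """Fraction of pure-noise trials rejected at level alpha: (naive, selective)."""
    rng = random.Random(seed)
    naive_hits = selective_hits = 0
    for _ in range(n_trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_pixels)]
        p_naive, p_selective = naive_and_selective_p(x)
        naive_hits += p_naive < alpha
        selective_hits += p_selective < alpha
    return naive_hits / n_trials, selective_hits / n_trials


if __name__ == "__main__":
    naive_fpr, selective_fpr = false_positive_rates()
    print(f"naive FPR: {naive_fpr:.3f}, selective FPR: {selective_fpr:.3f}")
```

Because every trial here is pure noise, any rejection is a false positive. In this toy setting the naive test should reject in the large majority of trials, whereas the selective test should reject at a rate close to the nominal level alpha, mirroring the behaviour shown in the upper row of Figure 1.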

