VALID P -VALUE FOR DEEP LEARNING-DRIVEN SALIENT REGION

Abstract

Various saliency map methods have been proposed to interpret and explain predictions of deep learning models. Saliency maps allow us to interpret which parts of the input signals have a strong influence on the prediction results. However, since a saliency map is obtained by complex computations in deep learning models, it is often difficult to know how reliable the saliency map itself is. In this study, we propose a method to quantify the reliability of a salient region in the form of p-values. Our idea is to consider a salient region as a selected hypothesis by the trained deep learning model and employ the selective inference framework. The proposed method can provably control the probability of false positive detections of salient regions. We demonstrate the validity of the proposed method through numerical examples in synthetic and real datasets. Furthermore, we develop a Keras-based framework for conducting the proposed selective inference for a wide class of CNNs without additional implementation cost.

1. INTRODUCTION

Deep neural networks (DNNs) have exhibited remarkable predictive performance in numerous practical applications in various domains owing to their ability to automatically discover the representations needed for prediction tasks from the provided data. To ensure that the decision-making process of DNNs is transparent and easy to understand, it is crucial to effectively explain and interpret DNN representations. For example, in image classification tasks, obtaining salient regions allows us to explain which parts of the input image strongly influence the classification results. Several saliency map methods have been proposed to explain and interpret the predictions of DNN models (Ribeiro et al., 2016; Bach et al., 2015; Doshi-Velez & Kim, 2017; Lundberg & Lee, 2017; Zhou et al., 2016; Selvaraju et al., 2017) . However, the results obtained from saliency methods are fragile (Kindermans et al., 2017; Ghorbani et al., 2019; Melis & Jaakkola, 2018; Zhang et al., 2020; Dombrowski et al., 2019; Heo et al., 2019) . Therefore, it is important to develop a method for quantifying the reliability of DNN-driven salient regions. Our idea is to interpret salient regions as hypotheses driven by a trained DNN model and employ a statistical hypothesis testing framework. We use the p-value as a criterion to quantify the statistical reliability of the DNN-driven hypotheses. Unfortunately, constructing a valid statistical test for DNN-driven salient regions is challenging because of the selection bias. In other words, because the trained DNN selects the salient region based on the provided data, the post-selection assessment of importance is biased upwards. To correct the selection bias and compute valid p-values for DNN-driven salient regions, we introduce a conditional selective inference (SI) approach. The selection bias is corrected by conditional ⇤ Equal contribution † Corresponding author 1

