OUTPUT DISTRIBUTION OVER THE ENTIRE INPUT SPACE: A NOVEL PERSPECTIVE TO UNDERSTAND NEURAL NETWORKS

Abstract

Understanding the input-output mapping relationship over the entire input space contributes a novel perspective to a comprehensive understanding of deep neural networks. In this paper, we focus on binary neural classifiers and propose to first uncover a histogram of the number of inputs that are mapped to each output value, and then scrutinize representative inputs from an output range of interest, such as the positive-logit region that corresponds to one of the classes. A straightforward solution is uniform sampling (or exhaustive enumeration) over the entire input space, but when the inputs are high-dimensional, it can take almost forever to converge. We connect the output histogram to the density of states in physics by making an analogy between the energy of a system and the neural network output. Inspired by the Wang-Landau algorithm designed for sampling the density of states, we propose an efficient sampler that is driven to explore under-explored output values through a gradient-based proposal. Compared with the random proposal in the Wang-Landau algorithm, our gradient-based proposal converges faster, as it can propose inputs corresponding to the under-explored output values. Extensive experiments have verified the accuracy of the histogram generated by our sampler and have also revealed interesting findings; for example, the models map many human-unrecognizable images to very negative logit values. These properties of a neural model are revealed for the first time through our sampled statistics. We believe that our approach opens a new gate for neural model evaluation and shall be further explored in future works.

1. INTRODUCTION

Understanding the input-output mapping relationship over the entire input space contributes a novel perspective to a comprehensive understanding of deep neural networks. Existing methods approximate this mapping through evaluation on certain subsets of the input space, such as measuring accuracy on in-distribution test sets (Dosovitskiy et al., 2021; Tolstikhin et al., 2021; Steiner et al., 2021; Chen et al., 2021; Zhuang et al., 2022; He et al., 2015), out-of-distribution (OOD) test sets (Liu et al., 2020; Hendrycks & Gimpel, 2016; Hendrycks et al., 2019; Hsu et al., 2020; Lee et al., 2017; 2018), and adversarial test sets (Szegedy et al., 2013; Rozsa et al., 2016; Miyato et al., 2018; Kurakin et al., 2016). However, none of these evaluations offers a comprehensive understanding that covers the entire input space, including all of the input types mentioned above and even human-unrecognizable inputs such as those shown in Fig. 1a.

As a pilot study, we focus on binary classification: given a trained binary classifier, we aim to uncover a histogram that counts how many samples in the entire input space are mapped to each logit value, i.e., the distribution of the output values, as shown in Fig. 1b. A straightforward solution is uniform sampling (or exhaustive enumeration) over the entire input space, but when the inputs are high-dimensional, it can take almost forever to converge. This calls for a novel, efficient sampling method over a neural model's output space. Note that, as a by-product of the sampling procedure, this histogram also offers fine-grained information, such as representative input samples corresponding to a given range of output values. With the help of this new sampler, we can reveal new understanding of models over the entire input space. First, in our experiments on a real-world dataset, the dominant output values are very negative and correspond to human-unrecognizable inputs.
This indicates that the models may map an overwhelmingly large number of unrecognizable images to overconfident prediction probabilities. Second, we can derive the relative difference between the dominant peak of output values and the other output values, especially those to which the in-distribution inputs correspond. Even the output values to which the in-distribution inputs correspond are dominated by human-unrecognizable inputs. This result presents significant challenges for OOD detection. Third, we observe a clear trend in the representative samples of a CNN model and speculate that it simply utilizes the background to predict the labels of the digits.

Our contributions are summarized as follows.

• We work on the challenging problem of uncovering the output distribution over the entire input space. Such output distributions offer a novel perspective for understanding deep neural networks.

• We connect this output distribution problem to the density-of-states problem in physics and successfully tailor the Wang-Landau algorithm with a gradient-based proposal, a must-have component for covering as much of the output space as possible, which improves sampling efficiency.

• We conduct extensive experiments on toy and real-world datasets to confirm the correctness of our proposed sampler and to discover novel and interesting findings.

We believe that our approach opens a new gate for neural model evaluation and shall be further explored in future works. For example, one can utilize our sampler to estimate the intrinsic ratio of in-distribution samples given a range of interest with human evaluation, as shown in Sec. 5.3.
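To make the flat-histogram idea concrete, the following is a minimal, self-contained sketch of a Wang-Landau-style sampler over a classifier's scalar logit with a gradient-steered proposal. All names (`model_logit`, `grad_logit`), the proposal form, and every hyper-parameter are illustrative assumptions for exposition, not the paper's exact algorithm or settings; in practice the gradient would come from autodiff on the trained network.

```python
import numpy as np

def wang_landau_gradient(model_logit, grad_logit, input_shape, bins,
                         n_steps=20_000, step_size=0.05,
                         f_init=1.0, f_min=1e-3, flat_tol=0.8, seed=0):
    """Simplified Wang-Landau sampling of a logit's density of states.

    model_logit(x) -> scalar logit; grad_logit(x) -> d(logit)/dx.
    Returns log_g, an (unnormalized) log estimate of how much of the
    input space maps into each logit bin.
    """
    rng = np.random.default_rng(seed)
    n_bins = len(bins) - 1
    log_g = np.zeros(n_bins)   # running log density-of-states estimate
    visits = np.zeros(n_bins)  # visit histogram for the flatness check
    f = f_init                 # modification factor, halved when flat

    def bin_of(y):
        return int(np.clip(np.searchsorted(bins, y) - 1, 0, n_bins - 1))

    x = rng.uniform(0.0, 1.0, size=input_shape)
    b = bin_of(model_logit(x))
    for _ in range(n_steps):
        # Gradient-based proposal: step along +/- the logit gradient
        # (plus small noise) so the walker can reach rarely visited
        # logit values instead of diffusing randomly.
        direction = 1.0 if rng.random() < 0.5 else -1.0
        x_new = np.clip(x + direction * step_size * grad_logit(x)
                        + 0.01 * rng.standard_normal(input_shape), 0.0, 1.0)
        b_new = bin_of(model_logit(x_new))
        # Wang-Landau acceptance: min(1, g[b] / g[b_new]) in log space,
        # which favors bins with a smaller current density estimate.
        if np.log(rng.random()) < log_g[b] - log_g[b_new]:
            x, b = x_new, b_new
        log_g[b] += f
        visits[b] += 1
        # When the visit histogram is roughly flat, refine f.
        if visits.min() > 0 and visits.min() > flat_tol * visits.mean():
            f *= 0.5
            visits[:] = 0
            if f < f_min:
                break
    return log_g - log_g.max()
```

With a true network, replacing the toy `grad_logit` by backpropagation through the model is the only substantive change; the flat-histogram bookkeeping is unchanged.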




Figure 1: Input types and an example output histogram when the task is binary classification between digits 0 and 1. The entire input space covers all possible gray-scale images of the same shape. y is the output (logit) of input x.
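For contrast with the sampler above, the naive uniform-sampling baseline mentioned in the introduction can be sketched as below. The function name, the assumption that inputs are gray-scale images with pixel values in [0, 1], and the batch scheme are all illustrative; the point is that uniform draws almost never land in the narrow logit ranges of interest, which is why this estimator converges impractically slowly in high dimensions.

```python
import numpy as np

def output_histogram_uniform(model_logit, input_shape, bins,
                             n_samples=100_000, batch=1000, seed=0):
    """Monte-Carlo estimate of the output (logit) histogram under
    uniform sampling of the input space.  `model_logit` is assumed to
    map a batch of inputs in [0, 1]^d to a vector of scalar logits."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(len(bins) - 1)
    for _ in range(n_samples // batch):
        # Draw a batch of inputs uniformly from the input space.
        x = rng.uniform(0.0, 1.0, size=(batch, *input_shape))
        # Accumulate the binned logit counts.
        c, _ = np.histogram(model_logit(x), bins=bins)
        counts += c
    return counts / counts.sum()  # normalized histogram estimate
```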

