OUTPUT DISTRIBUTION OVER THE ENTIRE INPUT SPACE: A NOVEL PERSPECTIVE TO UNDERSTAND NEURAL NETWORKS

Abstract

Understanding the input-output mapping over the entire input space offers a novel perspective for a comprehensive understanding of deep neural networks. In this paper, we focus on binary neural classifiers and propose to first uncover the histogram of the number of inputs that are mapped to each output value, and then scrutinize representative inputs from an output range of interest, such as the positive-logit region that corresponds to one of the classes. A straightforward solution is uniform sampling (or exhaustive enumeration) over the entire input space, but when the inputs are high-dimensional, it can take prohibitively long to converge. We connect the output histogram to the density of states in physics by drawing an analogy between the energy of a system and the output of a neural network. Inspired by the Wang-Landau algorithm, which was designed to sample the density of states, we propose an efficient sampler that is driven to explore under-explored output values through a gradient-based proposal. Compared with the random proposal in the Wang-Landau algorithm, our gradient-based proposal converges faster, as it can propose inputs corresponding to the under-explored output values. Extensive experiments verify the accuracy of the histogram generated by our sampler and reveal interesting findings; for example, the models map many human-unrecognizable images to very negative logit values. Such properties of a neural model are revealed for the first time through our sampled statistics. We believe that our approach opens a new direction for neural model evaluation and should be further explored in future work.
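To make the flat-histogram idea concrete, the following is a minimal sketch of the classic Wang-Landau procedure that the proposed sampler builds on, applied to a scalar-logit model over the input domain [0, 1]^dim. This is an illustration, not the paper's implementation: the toy model, bin edges, and hyperparameters are assumptions, and the paper's gradient-based proposal would replace the random-walk `propose` step below.

```python
import numpy as np

def wang_landau_histogram(model, dim, edges, n_steps, rng, step=0.1,
                          log_f_init=1.0, log_f_min=1e-3):
    """Estimate log-counts of inputs per output bin via Wang-Landau sampling.

    `model` maps an input in [0, 1]^dim to a scalar logit; `edges` are the
    output-bin edges.
    """
    n_bins = len(edges) - 1
    log_g = np.zeros(n_bins)   # running log density-of-states estimate
    visits = np.zeros(n_bins)  # visit histogram for the current modification factor
    log_f = log_f_init

    def bin_of(v):
        # index of the output bin containing logit value v (clipped to range)
        return int(np.clip(np.searchsorted(edges, v) - 1, 0, n_bins - 1))

    def propose(x):
        # random-walk proposal, clipped to the input domain [0, 1]^dim;
        # the paper replaces this with a gradient-based proposal
        return np.clip(x + rng.normal(scale=step, size=dim), 0.0, 1.0)

    x = rng.uniform(0.0, 1.0, size=dim)
    i = bin_of(model(x))
    for _ in range(n_steps):
        x_new = propose(x)
        j = bin_of(model(x_new))
        # accept with prob min(1, g_i / g_j): under-explored bins are favored
        if np.log(rng.uniform()) < log_g[i] - log_g[j]:
            x, i = x_new, j
        log_g[i] += log_f
        visits[i] += 1
        # when the visit histogram is roughly flat, refine the modification factor
        if visits.min() > 0.8 * visits.mean():
            log_f /= 2.0
            visits[:] = 0
            if log_f < log_f_min:
                break
    return log_g  # exponentiate and normalize to recover the output histogram

# usage on a toy "classifier" whose logit is x.sum() - dim/2 (an assumption)
rng = np.random.default_rng(0)
log_g = wang_landau_histogram(lambda x: x.sum() - 4.0, dim=8,
                              edges=np.linspace(-4.0, 4.0, 21),
                              n_steps=20_000, rng=rng)
```

Because acceptance is weighted by the inverse of the current density estimate, the walk is pushed toward output values it has visited least, which is what makes this family of samplers converge far faster than uniform sampling on rare output ranges.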

1. INTRODUCTION

Understanding the input-output mapping over the entire input space offers a novel perspective for a comprehensive understanding of deep neural networks. Existing methods approximate this mapping by evaluating the model on certain subsets of the input space, such as measuring accuracy on in-distribution test sets (Dosovitskiy et al., 2021; Tolstikhin et al., 2021; Steiner et al., 2021; Chen et al., 2021; Zhuang et al., 2022; He et al., 2015), out-of-distribution (OOD) test sets (Liu et al., 2020; Hendrycks & Gimpel, 2016; Hendrycks et al., 2019; Hsu et al., 2020; Lee et al., 2017; 2018), and adversarial test sets (Szegedy et al., 2013; Rozsa et al., 2016; Miyato et al., 2018; Kurakin et al., 2016). However, none of the existing evaluations offers a comprehensive understanding that covers the entire input space, including all kinds of inputs mentioned above and even human-unrecognizable inputs such as those shown in Fig. 1a. As a pilot study, we focus on binary classification: given a trained binary classifier, we aim to uncover a histogram that counts how many samples in the entire input space are mapped to each logit value, i.e., the distribution of the output values, as shown in Fig. 1b. A straightforward solution is uniform sampling (or exhaustive enumeration) over the entire input space, but when the inputs are high-dimensional, it can take prohibitively long to converge. This calls for a novel, efficient sampling method over a neural model's output space. Note that, as a by-product of the sampling procedure, this histogram also offers fine-grained information, such as representative input samples corresponding to a given range of output values.
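The uniform-sampling baseline described above can be sketched in a few lines; this is an illustration under stated assumptions, not the paper's code. The toy linear "classifier", the input domain [0, 1]^dim, and the bin count are all assumptions for the example, and any model mapping an input batch to scalar logits could be substituted for `toy_logit`.

```python
import numpy as np

def toy_logit(xs, w, b):
    # stand-in binary classifier: a single linear unit mapping each
    # input row to a scalar logit (assumption for illustration)
    return xs @ w + b

def uniform_output_histogram(f, dim, n_samples, bins, rng):
    # draw inputs uniformly from the input space [0, 1]^dim and
    # histogram the resulting logit values; this is the baseline that
    # becomes intractably slow for rare output ranges in high dimensions
    xs = rng.uniform(0.0, 1.0, size=(n_samples, dim))
    logits = f(xs)
    counts, edges = np.histogram(logits, bins=bins)
    return counts, edges

rng = np.random.default_rng(0)
dim = 8
w = rng.normal(size=dim)
counts, edges = uniform_output_histogram(lambda xs: toy_logit(xs, w, 0.0),
                                         dim, 100_000, 50, rng)
# counts approximates the (unnormalized) output distribution over the input space
```

The limitation motivating the paper is visible here: bins whose logit values are realized by an exponentially small fraction of the input space receive essentially no samples, no matter how large `n_samples` is made.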

