OF DEEP NEURAL NETWORKS WITH APPLICATIONS TO SAFETY

Abstract

To apply an algorithm in a sensitive domain it is important to understand the set of input values that result in specific decisions. Deep neural networks suffer from an inherent instability that makes this difficult: different outputs can arise from very similar inputs. We present a method to check that the decisions of a deep neural network are as intended by constructing the exact, analytical preimage of its predictions. Preimages generalize verification in the sense that they can be used to verify a wide class of properties, and answer much richer questions besides. We examine the functioning and failures of neural networks used in robotics, including an aircraft collision avoidance system, related to sequential decision making and extrapolation. Our method iterates backwards through the layers of piecewise linear deep neural networks. Uniquely, we compute all intermediate values that correspond to a prediction, propagating this calculation through layers using analytical formulae for layer preimages.
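The backward, layer-by-layer computation described above can be illustrated for the affine part of a piecewise linear network. This is a minimal sketch, not the paper's implementation: the function name `affine_preimage` and the example matrices are our own, and a full treatment would additionally case-split ReLU layers on their activation patterns. The key fact is that the preimage of a polytope under an affine map is again a polytope, with an analytical formula:

```python
import numpy as np

def affine_preimage(A, c, W, b):
    """Pull the polytope {y : A y <= c} back through the affine layer y = W x + b.

    Substituting y = W x + b gives {x : A (W x + b) <= c},
    i.e. {x : (A W) x <= c - A b} -- again a polytope, described
    analytically by the pair (A W, c - A b).
    """
    return A @ W, c - A @ b

# Example: preimage of the half-space {y : y_0 <= 1} under y = W x + b.
W = np.array([[2.0, 0.0],
              [0.0, 3.0]])
b = np.array([1.0, -1.0])
A = np.array([[1.0, 0.0]])   # selects coordinate y_0
c = np.array([1.0])

A_pre, c_pre = affine_preimage(A, c, W, b)
# The preimage constraint is 2 * x_0 <= 0, i.e. x_0 <= 0.
```

Iterating this formula from the output layer back to the input, interleaved with the piecewise (ReLU) case analysis, yields the exact preimage of a prediction.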

1. INTRODUCTION

Folk wisdom holds that although deep neural networks (DNNs) can achieve excellent predictive accuracy, reasoning about their performance is difficult, even for experts. Our goal is to enable non-expert stakeholders, such as clinical health workers, investors, or military commanders, to build trust in a statistical model in high-stakes environments. To do this, we posit that decision-makers want to understand a model in both directions: from inputs to outputs, but also starting from hypothetical outputs and understanding the inputs that lead to them. In this paper, we develop an equivalent, but much simpler, representation of a certain class of DNN classifiers. This representation, which requires only basic numeracy to interact with productively, can be used by domain experts to build intuition and trust. We apply this method to a reinforcement learning agent trained to solve the cart-pole problem, and find that a DNN implementing a successful policy makes a particular type of mistake on 24% of the mass of the 1/8th of the state space for which we know the optimal action (Section 3.2). We also show how using the preimage in place of verification can yield a more efficient and interpretable end-to-end system for analyzing aircraft collision avoidance systems (Section 3.3).

1.1. PREVIOUS WORK

DNNs have the property that knowing the output tells us very little about the input it corresponds to. This is most apparent in image classifiers, where totally different outputs can arise from inputs that are visually indistinguishable (Szegedy et al. (2014)). We build upon the mathematical framework developed for verifying DNNs, which grew out of a desire to prove the absence of adversarial examples, for example Tjeng et al. (2017) and Wong & Kolter (2017). However, like Katz et al. (2017), we depart from these studies in being more oriented towards small DNNs that map to and from low-dimensional spaces with considerable structure. These DNNs arise especially in systems that interoperate with the physical world, for example mapping measurements of positions and velocities to movements. Table 1 orients our work relative to the literature.

