OF DEEP NEURAL NETWORKS WITH APPLICATIONS TO SAFETY

Abstract

To apply an algorithm in a sensitive domain it is important to understand the set of input values that result in specific decisions. Deep neural networks suffer from an inherent instability that makes this difficult: different outputs can arise from very similar inputs. We present a method to check that the decisions of a deep neural network are as intended by constructing the exact, analytical preimage of its predictions. Preimages generalize verification in the sense that they can be used to verify a wide class of properties, and answer much richer questions besides. We examine the functioning and failures of neural networks used in robotics, including an aircraft collision avoidance system, related to sequential decision making and extrapolation. Our method iterates backwards through the layers of piecewise linear deep neural networks. Uniquely, we compute all intermediate values that correspond to a prediction, propagating this calculation through layers using analytical formulae for layer preimages.

1. INTRODUCTION

Folk wisdom holds that although deep neural networks (DNNs) can achieve excellent predictive accuracy, reasoning about their performance is difficult, even for experts. Our goal is to enable nonexpert stakeholders, such as clinical health workers, investors, or military commanders, to build trust in a statistical model in high-stakes environments. To do this, we posit that decision-makers want to understand a model in both directions: from inputs to outputs, but also from hypothetical outputs back to the inputs that lead to them. In this paper, we develop an equivalent, but much simpler, representation of a certain class of DNN classifiers. This representation, which requires only basic numeracy to interact with productively, can be used by domain experts to build intuition and trust. We apply this method to a reinforcement learning agent trained to solve the cart-pole problem, and find that a DNN implementing a successful policy makes a particular type of mistake on 24% of the mass of the 1/8th of the state space for which we know the optimal action (Section 3.2). We also show how using the preimage in place of verification can yield a more efficient and interpretable end-to-end system for analyzing aircraft collision avoidance systems (Section 3.3).

1.1. PREVIOUS WORK

DNNs have the property that knowing the output tells us very little about the input it corresponds to. This is most apparent in image classifiers, where totally different outputs can arise from inputs that are visually indistinguishable (Szegedy et al., 2014). We build upon the mathematical framework developed for verifying DNNs, which grew out of a desire to prove the absence of adversarial examples, for example Tjeng et al. (2017) and Wong & Kolter (2017). However, we depart from these studies, along with Katz et al. (2017), in being oriented towards small DNNs that map to and from low-dimensional spaces with considerable structure. These DNNs arise especially in systems which interoperate with the physical world, for example mapping measurements of positions and velocities to movements.

Table 1: A taxonomy of previous work on inversion and verification.

  Verification   (f, X, Y) → 1_{f^{-1}(Y) ∩ X = ∅} (= 1_{f(X) ∩ Y = ∅})   Wong & Kolter (2017)
  Reachability   (f, X)    → f(X)                                          Yang et al. (2020)
  Inversion      (f, y)    → f^{-1}({y})                                   Carlsson et al. (2017)
  Preimage       (f, Y)    → f^{-1}(Y)                                     (this work)

Here f : R^{n_1} → R^{n_L} is a DNN, X ⊆ R^{n_1}, x ∈ R^{n_1}, Y ⊆ R^{n_L}, and y ∈ R^{n_L}. f^{-1} is its inverse in the sense that f^{-1}(Y) = {x : f(x) ∈ Y}.

We have phrased verification in this unusual fashion to facilitate comparison with the other entries. Stated in the familiar application to image classifiers, X would be an epsilon ball around an input, and Y would be the halfspace where one coordinate is higher than all others. Verification ultimately amounts to a simple yes or no, so answering higher-level questions typically requires many verifications: for example, Katz et al. (2017) describes a suite of 45 tests, and for image classifiers one often wishes to verify the absence of adversarial examples around the entire training set. Yang et al. (2020) is an interesting extension of verification in that it computes the entire image of, say, an epsilon ball around a data point, and not just whether that image intersects a decision boundary.
Reasoning forward, about the outputs that can arise from inputs, gives only half of the picture. Carlsson et al. (2017) analyzes the preimage of a single point through the repeated application of a nonlinearity, purely theoretically. Our paper looks at the preimage of non-singleton subsets of the codomain, which is much more practically useful, and requires considerable extension of their approaches.
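To make the relationship between verification and preimages concrete, consider a hypothetical 1-D piecewise-linear function (an illustrative sketch, not the paper's method): for f(x) = relu(x) − 0.5, the preimage of Y = {y : y ≥ 0} is the interval [0.5, ∞), and the verification query "does any x in X reach Y?" reduces to an emptiness check on f^{-1}(Y) ∩ X.

```python
# Hypothetical 1-D illustration: f(x) = relu(x) - 0.5 is piecewise linear
# and nondecreasing, so its preimage of an upper level set is an interval.

def preimage_lower_bound(t):
    # For thresholds t >= -0.5 (where the relu is active):
    # f(x) >= t  <=>  max(x, 0) >= t + 0.5  <=>  x >= t + 0.5
    return t + 0.5

lo = preimage_lower_bound(0.0)   # f^{-1}({y : y >= 0}) = [0.5, inf)
X = (-1.0, 0.0)                  # input region of interest
verified = X[1] < lo             # empty intersection => property holds on X
print(verified)                  # prints True: no x in X maps into Y
```

Once the preimage interval is in hand, verifying against any other input region X is a constant-time comparison, which is one sense in which preimages generalize verification.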

2. METHOD

Our method is easily stated: build up the preimage of a DNN from the preimage of its layers, using simple analytical formulae. We start by developing some properties of the preimage operator, then we describe the class of sets that we compute the preimage of, and finally we discuss the class of DNNs that our algorithm addresses.

2.1. PROPERTIES OF PREIMAGES

Lemma 1 shows how to build up the preimage of a DNN from the preimages of its constituent layers.

Lemma 1 (Preimage of composition is reversed composition of preimages). For functions f_j : R^{n_j} → R^{n_{j+1}},

  (f_{j+k} ∘ f_{j+k-1} ∘ ... ∘ f_j)^{-1} = f_j^{-1} ∘ ... ∘ f_{j+k-1}^{-1} ∘ f_{j+k}^{-1}.

Secondly, we mention an intuitive property of f^{-1} that is handy for building up the preimage of any set from the preimages of any partition of that set.

Lemma 2 (Preimage of union is union of preimages). f^{-1}(∪_{i=1}^{N} S_i) = ∪_{i=1}^{N} f^{-1}(S_i).
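The order reversal in Lemma 1 can be sanity-checked numerically. A minimal sketch with two invertible 1-D affine "layers" (hypothetical functions, chosen only to exhibit the reversal; in general a layer preimage is set-valued, not a function):

```python
# Two invertible 1-D affine layers and their (here: functional) preimages.
f1 = lambda x: 2.0 * x + 1.0          # layer 1
f2 = lambda y: -0.5 * y + 3.0         # layer 2
f1_inv = lambda y: (y - 1.0) / 2.0    # preimage of layer 1
f2_inv = lambda z: (z - 3.0) / -0.5   # preimage of layer 2

z = f2(f1(4.0))                       # forward pass: (f2 ∘ f1)(4.0)
x = f1_inv(f2_inv(z))                 # preimages applied in reverse order
print(x)                              # prints 4.0: the input is recovered
```

In the set-valued case the same pattern holds: the preimage of the output set is pulled back through the last layer first, then through each earlier layer in turn.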

2.2. POLYTOPES

Our method is not applicable to arbitrary sets Y, but rather to sets that, roughly, have piecewise linear boundaries. The basic building blocks of these sets are polytopes.
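For an affine layer y = Wx + c, the preimage of a polytope in halfspace representation {y : Ay ≤ b} is again a polytope, {x : (AW)x ≤ b − Ac}. This is a standard identity rather than a quotation of the paper's implementation; a minimal sketch:

```python
import numpy as np

def affine_preimage(A, b, W, c):
    """Preimage of the polytope {y : Ay <= b} under the affine map
    y = Wx + c; substituting y gives the polytope {x : (AW)x <= b - Ac}."""
    return A @ W, b - A @ c

# Polytope Y = [-1, 1]^2 written as Ay <= b
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.ones(4)

W = np.array([[1.0, 2.0],
              [0.0, 1.0]])            # layer weights (hypothetical)
c = np.array([0.5, -0.5])             # layer bias (hypothetical)

A_pre, b_pre = affine_preimage(A, b, W, c)

# Any x satisfying the preimage constraints maps into Y under the layer:
x = np.zeros(2)
assert np.all(A_pre @ x <= b_pre)
assert np.all(A @ (W @ x + c) <= b)
```

A ReLU layer, by contrast, splits the input space into activation patterns, each contributing one polytope to the preimage, which is where Lemma 2 (preimage of a union) becomes useful.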



Carlsson et al. (2017) and Behrmann et al. (2018) are oriented backwards: they attempt to reconstruct the inputs that result in an output. These related papers study the statistical invariances that nonlinear layers encode. Behrmann et al. (2018) examines the preimage of a single point through a single ReLU layer, analyzing stability via an approximation-based experiment.

Table 1 orients our work to the literature.


