PROVABLE ROBUSTNESS BY GEOMETRIC REGULARIZATION OF RELU NETWORKS

Abstract

Recent work has demonstrated that neural networks are vulnerable to small, adversarial perturbations of their input. In this paper, we propose an efficient regularization scheme inspired by convex geometry and barrier methods to improve the robustness of feedforward ReLU networks. Since such networks are piecewise linear, they partition the input space into polyhedral regions (polytopes). Our regularizer is designed to minimize the distance between training samples and the analytical centers of their respective polytopes so as to push points away from the boundaries. The regularizer provably maximizes a lower bound on the minimum adversarial perturbation required to change an example's label. The addition of a second regularizer that encourages linear decision boundaries improves robustness while avoiding over-regularization of the classifier. We demonstrate the robustness of our approach with respect to ℓ∞ and ℓ2 adversarial perturbations on multiple datasets. Our method is competitive with state-of-the-art algorithms for learning robust networks while involving fewer hyperparameters. Moreover, applying our algorithm in conjunction with adversarial training boosts the robustness of classifiers even further.

1. INTRODUCTION

Neural networks have been very successful in tasks such as image classification and speech recognition. However, recent work (Szegedy et al., 2014; Goodfellow et al., 2015) has demonstrated that neural network classifiers can be arbitrarily fooled by small, adversarially chosen perturbations of their inputs. Notably, Su et al. (2017) demonstrated that neural network classifiers which correctly classify "clean" images may be vulnerable to targeted attacks, e.g., they may misclassify those same images when only a single pixel is changed. This fragility of neural network classifiers to adversarial noise has motivated the development of many heuristic defenses, including adversarial training (Madry et al., 2018), as well as certifiably robust classifiers such as randomized smoothing (Cohen et al., 2019; Salman et al., 2019), which characterize the robustness of a classifier according to its smoothness. The intrinsic relationship between smoothness, i.e., Lipschitz continuity (and its local variants), and robustness has motivated a variety of techniques that encourage uniform and local smoothness by explicitly regularizing approximations of the global and local Lipschitz constants (Zhang et al., 2018; Weng et al., 2018a;b).

In this work, we propose a novel regularizer for feedforward piecewise-linear neural networks, including convolutional neural networks, to increase their robustness to adversarial perturbations. Our Geometric Regularization (GR) method is based on the fact that ReLU networks define continuous piecewise-affine functions and is inspired by classical techniques from convex geometry and linear programming. We provide a novel robustness certificate based on the local polytope geometry of a point and show that our regularizer provably maximizes this certificate. We evaluate the efficacy of our method on three datasets.
Notably, our method works regardless of the perturbation model and relies on fewer hyperparameters than related approaches. We demonstrate that our regularization term leads to classifiers that are empirically robust and comparable to state-of-the-art algorithms with respect to clean and robust test accuracy under ℓ1- and ℓ∞-norm adversarial perturbations.

2. PRELIMINARIES

In this section, we briefly present background terminology pertaining to polytopes and their characterizations, adversarially robust classification, and the polytope decomposition of the domain induced by a ReLU network together with its linearization over a given polytope.

2.1. PIECEWISE-LINEAR NETWORKS

A ReLU network is a neural network in which all nonlinear activations are ReLU functions, where we denote the ReLU activation by σ : R → R, σ(x) = max{0, x}. With a slight abuse of notation, we define σ : R^d → R^d componentwise by σ(x) = [σ(x_1), . . . , σ(x_d)]. Let f : R^d → [0, 1]^k be a feedforward ReLU network with L hidden layers; for example, f may map a d-dimensional image to a k-dimensional vector of likelihoods for k classes. Let n_l be the number of hidden units at layer l, with the input layer of size n_0 = d, and let W^(l) ∈ R^{n_l × n_{l-1}} and b^(l) ∈ R^{n_l} denote the weight matrix and bias vector at layer l, respectively. Since f may be represented as the composition of L + 1 linear transformations and L continuous piecewise-affine functions, f is necessarily continuous and piecewise-affine (for brevity, we will say piecewise-linear). The half-space representation, or H-representation, of convex polytopes is defined as follows:

Definition 2.1 (Convex polytope). A convex polytope K is the convex hull of finitely many points. Alternatively, a convex polytope may be expressed as an intersection of m half-spaces. The H-representation of a polytope is the solution set of a system of linear inequalities Ax ≤ b:

K = {x : a_j · x ≤ b_j for all j ∈ [m]}.

Following Definition 3.1 of Croce et al., a function is piecewise-linear if there exists a finite set of convex polytopes {Q_r}_{r=1}^m (referred to as linear regions of f) such that ∪_{r=1}^m Q_r = R^d and f is affine when restricted to each Q_r, i.e., f can be expressed on Q_r as f(x) = Vx + a. Given a feedforward ReLU network f and an input x ∈ R^d, we intend to recover the polytope Q containing x and the linear restriction of f on Q. Therefore, we need to find A and b such that Q = {z : Az ≤ b} is the intersection of finitely many half-spaces, i.e., a polytope, together with V and a corresponding to the linearization of f within this polytope, so that f(x) = Vx + a when restricted to Q.
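To make the H-representation of Definition 2.1 concrete, the following NumPy sketch (illustrative only, not part of the paper's method) encodes the unit square [0, 1]^2 as an intersection of four half-spaces and tests membership; the polytope and the helper `in_polytope` are our own illustrative constructions.

```python
import numpy as np

# Unit square [0, 1]^2 in H-representation K = {x : Ax <= b}, built from
# four half-spaces: x1 <= 1, -x1 <= 0, x2 <= 1, -x2 <= 0.
A = np.array([[ 1.0,  0.0],
              [-1.0,  0.0],
              [ 0.0,  1.0],
              [ 0.0, -1.0]])
b = np.array([1.0, 0.0, 1.0, 0.0])

def in_polytope(x, A, b, tol=1e-9):
    """Membership test: x is in K iff every inequality a_j . x <= b_j holds."""
    return bool(np.all(A @ x <= b + tol))

assert in_polytope(np.array([0.5, 0.5]), A, b)       # interior point
assert not in_polytope(np.array([1.5, 0.5]), A, b)   # violates x1 <= 1
```

Every linear region Q_r of a ReLU network admits exactly this kind of description, with one inequality per hidden unit whose activation sign is fixed on the region.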
We follow the formulation and notation of Croce et al. For x ∈ R^d, let g^(0)(x) = x and recursively define the pre- and post-activation outputs of every layer:

f^(l)(x) = W^(l) g^(l-1)(x) + b^(l),
g^(l)(x) = σ(f^(l)(x)).

The resulting classifier is then f^(L+1)(x) = W^(L+1) g^(L)(x) + b^(L+1).
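The recursion above can be unrolled to recover both the polytope Q containing x and the linearization f(z) = Vz + a on Q: fixing the activation pattern at x makes each post-activation g^(l) an affine function of the input, and each hidden unit contributes one half-space constraint. The following NumPy sketch illustrates this for a small random network; the layer sizes, random seed, and helper names (`forward`, `local_linearization`) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny feedforward ReLU network with L = 2 hidden layers.
# Widths: n0 = 3 (input), n1 = 4, n2 = 4, k = 2 (output).
sizes = [3, 4, 4, 2]
W = [rng.standard_normal((sizes[l + 1], sizes[l])) for l in range(3)]
b = [rng.standard_normal(sizes[l + 1]) for l in range(3)]

def forward(x):
    """Plain forward pass: g^(l) = sigma(W^(l) g^(l-1) + b^(l))."""
    g = x
    for Wl, bl in zip(W[:-1], b[:-1]):
        g = np.maximum(0.0, Wl @ g + bl)
    return W[-1] @ g + b[-1]                   # f^(L+1)(x)

def local_linearization(x):
    """Return (V, a, A, bvec) with f(z) = Vz + a and A z <= bvec on Q(x)."""
    V, a = np.eye(len(x)), np.zeros(len(x))    # g^(0)(z) = z is affine in z
    rows, offs = [], []
    for Wl, bl in zip(W[:-1], b[:-1]):
        # Pre-activation as an affine map of the input: f^(l)(z) = M z + c.
        M, c = Wl @ V, Wl @ a + bl
        s = (M @ x + c > 0).astype(float)      # activation pattern at x
        # Each unit j gives one half-space: active => -(M_j z + c_j) <= 0,
        # inactive => M_j z + c_j <= 0.
        sign = 1.0 - 2.0 * s
        rows.append(sign[:, None] * M)
        offs.append(-sign * c)
        # On Q(x) the ReLU is linear: g^(l)(z) = diag(s)(M z + c).
        V, a = s[:, None] * M, s * c
    V, a = W[-1] @ V, W[-1] @ a + b[-1]        # final linear layer
    return V, a, np.vstack(rows), np.concatenate(offs)

x = rng.standard_normal(3)
V, a, A, bvec = local_linearization(x)
assert np.allclose(forward(x), V @ x + a)      # linearization agrees with f at x
assert np.all(A @ x <= bvec + 1e-9)            # x lies inside its own polytope
```

Note that A here stacks one row per hidden unit (n_1 + n_2 = 8 constraints for this toy network), so Q(x) is exactly the H-representation polytope on which the activation pattern, and hence the affine map (V, a), stays fixed.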



Recently, Lecuyer et al. (2019); Li et al. (2018); Cohen et al. (2019); Salman et al. (2019) proposed and extended a simple, scalable technique, randomized smoothing, to transform arbitrary functions (e.g., neural network classifiers) into certifiably robust classifiers under ℓ2 perturbations. Alternatively, previous work has addressed adversarial robustness in the context of piecewise-linear classifiers (e.g., feedforward neural networks with ReLU activations). Wong & Kolter (2018); Jordan et al. (2019) propose to certify the robustness of a network f at an example x by bounding the radius of the maximum ℓp-norm ball contained within a union of polytopes over which f predicts the same class. Closely related to our work, Croce et al.; Liu et al. (2020) propose maximum margin regularizers (MMR), which quantify the robustness of a network at a point according to the local region in which it lies and its distance to the classification boundary. Recent work also includes the recovery and analysis of the piecewise-linear function learned by a ReLU neural network during training (Arora et al., 2018; Montúfar et al., 2014; Croce & Hein, 2019). Work in this area typically centers on studying the complexity, interpretation, and the improvement of stability and robustness of neural networks. For example, Montúfar et al. (2014); Serra et al. (2017) studied piecewise-linear representations of neural networks and proposed the "activation profile" to characterize the linear regions.

