PROVABLE ROBUSTNESS BY GEOMETRIC REGULARIZATION OF RELU NETWORKS

Abstract

Recent work has demonstrated that neural networks are vulnerable to small, adversarial perturbations of their input. In this paper, we propose an efficient regularization scheme inspired by convex geometry and barrier methods to improve the robustness of feedforward ReLU networks. Since such networks are piecewise linear, they partition the input space into polyhedral regions (polytopes). Our regularizer minimizes the distance between training samples and the analytical centers of their respective polytopes, pushing points away from region boundaries. It provably optimizes a lower bound on the adversarial perturbation required to switch an example's label. A second regularizer that encourages linear decision boundaries improves robustness further while avoiding over-regularization of the classifier. We demonstrate the robustness of our approach with respect to ℓ∞ and ℓ2 adversarial perturbations on multiple datasets. Our method is competitive with state-of-the-art algorithms for learning robust networks while involving fewer hyperparameters. Moreover, applying our algorithm in conjunction with adversarial training boosts the robustness of classifiers even further.
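To make the geometric picture concrete, the sketch below shows, for a one-hidden-layer ReLU network, how a point's activation pattern yields the linear constraints of its polytope, and how an analytical center can be approximated by gradient descent on the log-barrier. This is a minimal illustration under simplifying assumptions (single hidden layer, bounded region); the function names are ours and do not denote the paper's implementation.

```python
import numpy as np

def region_constraints(W1, b1, x):
    """Constraints A @ x + c >= 0 describing the polytope (linear region)
    of a one-hidden-layer ReLU net that contains the point x."""
    s = np.where(W1 @ x + b1 > 0, 1.0, -1.0)  # activation pattern at x
    return s[:, None] * W1, s * b1            # flip rows of inactive units

def analytic_center(A, c, x0, steps=500, lr=1e-2):
    """Approximate the analytic center of {x : A @ x + c > 0} by gradient
    descent on the log-barrier -sum_i log(a_i^T x + c_i), starting from a
    strictly feasible point x0.  Assumes the region is bounded."""
    x = x0.copy()
    for _ in range(steps):
        slack = A @ x + c                         # strictly positive at x
        grad = -(A / slack[:, None]).sum(axis=0)  # gradient of the barrier
        step = x - lr * grad
        if np.all(A @ step + c > 0):              # stay strictly feasible
            x = step
        else:
            lr *= 0.5                             # backtrack otherwise
    return x
```

For instance, a hidden layer whose active constraints carve out the box [-1, 1]^2 has its analytic center at the origin, so descending the barrier from any interior point drifts toward (0, 0), which is exactly the "push points away from the boundaries" effect the regularizer encourages during training.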

1. INTRODUCTION

Neural networks have been very successful in tasks such as image classification and speech recognition. However, recent work (Szegedy et al., 2014; Goodfellow et al., 2015) has demonstrated that neural network classifiers can be arbitrarily fooled by small, adversarially chosen perturbations of their inputs. Notably, Su et al. (2017) demonstrated that neural network classifiers which correctly classify "clean" images may still be vulnerable to targeted attacks, e.g., misclassifying those same images when only a single pixel is changed. This fragility to adversarial noise has motivated the development of many heuristic defenses, including adversarial training (Madry et al., 2018), as well as certifiably robust classifiers such as randomized smoothing (Cohen et al., 2019; Salman et al., 2019), which characterize the robustness of a classifier according to its smoothness (Arora et al., 2018; Montúfar et al., 2014; Croce & Hein, 2019).

The intrinsic relationship between smoothness, or Lipschitz continuity (and their corresponding local variants), and robustness has motivated a variety of techniques that encourage uniform and local smoothness through explicit regularization of approximations of the global and local Lipschitz constants (Zhang et al., 2018; Weng et al., 2018a;b). Recently, Lecuyer et al. (2019); Li et al. (2018); Cohen et al. (2019); Salman et al. (2019) proposed and extended a simple, scalable technique, randomized smoothing, to transform arbitrary functions (e.g., neural network classifiers) into certifiably robust classifiers under ℓ2 perturbations. Alternatively, previous work has addressed adversarial robustness in the context of piecewise-linear classifiers (e.g., feedforward neural networks with ReLU activations). Wong & Kolter (2018); Jordan et al. (2019) propose to certify the robustness of a network f at an example x by bounding the radius of the maximum ℓp-norm ball contained within a union of polytopes over which f predicts the same class. Most closely related to our work, Croce et al.; Liu et al. (2020) propose maximum margin regularizers (MMR), which quantify the robustness of a network at a point according to the local linear region in which it lies and its distance to the classification boundary. Recent work also includes recovering and analyzing the piecewise-linear function learned by a ReLU neural network during training.
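The local-region quantity used by MMR-style methods can be sketched directly: within a point's linear region, the distance to the nearest facet is the smallest ℓp distance to any activation-switching hyperplane, computed with the dual norm of each hyperplane's normal. The snippet below illustrates this for a one-hidden-layer network; it is an assumed simplification for exposition, not the regularizer of any cited paper.

```python
import numpy as np

def region_margin(W1, b1, x, p=2):
    """l_p distance from x to the nearest facet of the ReLU linear region
    containing x (one hidden layer).  This lower-bounds the perturbation
    needed to change x's activation pattern, the kind of quantity that
    MMR-style regularizers push up during training."""
    z = W1 @ x + b1                             # pre-activations at x
    q = 1.0 if np.isinf(p) else p / (p - 1.0)   # dual norm exponent
    dual = np.linalg.norm(W1, ord=q, axis=1)    # row-wise dual norms
    return float(np.min(np.abs(z) / dual))      # nearest facet distance
```

Note that this margin only certifies that the activation pattern is unchanged; a certificate for the predicted label additionally requires the distance to the classification boundary inside (or across) the region, which is what the union-of-polytopes analyses above account for.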

