SOAR: SECOND-ORDER ADVERSARIAL REGULARIZATION

Abstract

Adversarial training is a common approach to improving the robustness of deep neural networks against adversarial examples. In this work, we propose a novel regularization approach as an alternative. To derive the regularizer, we formulate the adversarial robustness problem under the robust optimization framework and approximate the loss function using a second-order Taylor series expansion. Our proposed second-order adversarial regularizer (SOAR) is an upper bound, based on this Taylor approximation, of the inner maximization in the robust optimization objective. We empirically show that the proposed method improves the robustness of networks against ℓ∞- and ℓ2-bounded perturbations on CIFAR-10 and SVHN.

1. INTRODUCTION

Adversarial training (Szegedy et al., 2013) is the standard approach for improving the robustness of deep neural networks (DNN), or any other model, against adversarial examples. It is a data augmentation method that adds adversarial examples to the training set and updates the network with the newly added data points. Intuitively, this procedure encourages the DNN not to make the same mistakes against an adversary. By adding sufficiently many adversarial examples, the network gradually becomes robust to the attack it was trained on. One of the challenges with such a data augmentation approach is the tremendous amount of additional data required for learning a robust model. Schmidt et al. (2018) show that, under a Gaussian data model, the sample complexity of robust generalization is √d times larger than that of standard generalization. They further suggest that current datasets (e.g., CIFAR-10) may not be large enough to attain high adversarial accuracy.

A data augmentation procedure, however, is an indirect way to improve the robustness of a DNN. Our proposed alternative is to define a regularizer that penalizes DNN parameters prone to attacks. Minimizing the regularized loss function leads to estimators that are robust to adversarial examples.

Adversarial training and our proposal can both be formulated within the robust optimization framework for adversarial robustness (Ben-Tal et al., 2009; Madry et al., 2018; Wong & Kolter, 2018; Shaham et al., 2018; Sinha et al., 2018). In this formulation, one seeks to improve the worst-case performance of the model, where the performance is measured by a particular loss function ℓ. Adversarial training can be understood as approximating such a worst-case loss by finding the corresponding worst-case data point, i.e., x + δ, with some specific attack technique. Our proposed method is more direct. It is based on approximating the loss function ℓ(x + δ) by its second-order Taylor series expansion,

ℓ(x + δ) ≈ ℓ(x) + ∇ₓℓ(x)⊤δ + ½ δ⊤∇ₓ²ℓ(x)δ,

and then upper bounding the worst-case loss using the expansion terms. By incorporating both the gradient and the Hessian of the loss function with respect to (w.r.t.) the input, the second-order expansion provides a more accurate approximation of the worst-case loss than a purely first-order one. In our derivations, we consider both ℓ2 and ℓ∞ attacks.
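Written out, the robust optimization objective and the resulting bound take the following form, where ε denotes the perturbation budget, the perturbation set is the ℓp ball of radius ε, and ‖·‖q is the dual norm of ‖·‖p (q = 1 for p = ∞, q = 2 for p = 2); the notation for ε and the explicit dual-norm step are ours, and the inequality applies to the second-order approximation of the worst-case loss:

$$
\min_{\theta}\; \mathbb{E}_{(x,y)}\Big[\max_{\|\delta\|_{p}\le\varepsilon} \ell(x+\delta;\theta)\Big],
\qquad
\max_{\|\delta\|_{p}\le\varepsilon} \ell(x+\delta)
\;\lessapprox\;
\ell(x) + \varepsilon\,\|\nabla_{x}\ell(x)\|_{q} + \frac{1}{2}\max_{\|\delta\|_{p}\le\varepsilon} \delta^{\top}\nabla_{x}^{2}\ell(x)\,\delta.
$$

The first-order term is handled exactly by the dual norm; bounding the remaining quadratic term is where the derivation of SOAR comes in.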
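To make the expansion concrete, the following is a minimal PyTorch sketch (the framework choice and the helper names second_order_taylor and worst_case_linear_term are ours, purely for illustration) that evaluates the second-order Taylor estimate of ℓ(x + δ) for a fixed δ. The Hessian-vector product is computed by double backpropagation, so the d × d Hessian is never formed explicitly. This illustrates the approximation itself, not the exact SOAR regularizer derived later in the paper.

```python
import torch

def second_order_taylor(model, loss_fn, x, y, delta):
    """Evaluate l(x) + grad_x l(x)^T delta + 0.5 * delta^T H delta for a
    fixed perturbation delta. Autograd supplies the input gradient and the
    Hessian-vector product H @ delta (double backprop); the full Hessian
    is never materialized."""
    x = x.detach().requires_grad_(True)
    loss = loss_fn(model(x), y)                                 # l(x)
    grad = torch.autograd.grad(loss, x, create_graph=True)[0]   # grad_x l(x)
    linear = (grad * delta).sum()                               # grad^T delta
    hvp = torch.autograd.grad(linear, x, create_graph=True)[0]  # H @ delta
    quadratic = 0.5 * (delta * hvp).sum()                       # 0.5 * delta^T H delta
    return loss + linear + quadratic

def worst_case_linear_term(grad, eps, norm="linf"):
    """Exact worst case of the first-order term over a norm ball, via dual
    norms: eps * ||grad||_1 for an l_inf ball of radius eps, and
    eps * ||grad||_2 for an l_2 ball."""
    return eps * (grad.abs().sum() if norm == "linf" else grad.norm(p=2))
```

Turning such an estimate into a regularizer, i.e., adding an upper bound of it over the attack's norm ball to the training loss, is the idea pursued in the remainder of the paper.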
We call the method Second-Order Adversarial Regularizer (SOAR), not to be confused with the Soar cognitive architecture (Laird, 2012). In the course of developing SOAR, we make the following contributions:

• We show that an over-parameterized linear regression model can be severely affected by an adversary, even though its population loss is zero. We robustify it with a regularizer that
