SOAR: SECOND-ORDER ADVERSARIAL REGULARIZATION

Abstract

Adversarial training is a common approach to improving the robustness of deep neural networks against adversarial examples. In this work, we propose a novel regularization approach as an alternative. To derive the regularizer, we formulate the adversarial robustness problem under the robust optimization framework and approximate the loss function using a second-order Taylor series expansion. Our proposed second-order adversarial regularizer (SOAR) is an upper bound, based on the Taylor approximation, on the inner maximization in the robust optimization objective. We empirically show that the proposed method improves the robustness of networks against ℓ∞- and ℓ2-bounded perturbations on CIFAR-10 and SVHN.

1. INTRODUCTION

Adversarial training (Szegedy et al., 2013) is the standard approach for improving the robustness of deep neural networks (DNN), or any other model, against adversarial examples. It is a data augmentation method that adds adversarial examples to the training set and updates the network with the newly added data points. Intuitively, this procedure encourages the DNN not to make the same mistakes against an adversary. By adding sufficiently many adversarial examples, the network gradually becomes robust to the attack it was trained on. One of the challenges with such a data augmentation approach is the tremendous amount of additional data required for learning a robust model. Schmidt et al. (2018) show that under a Gaussian data model, the sample complexity of robust generalization is √d times larger than that of standard generalization. They further suggest that current datasets (e.g., CIFAR-10) may not be large enough to attain high adversarial accuracy.

A data augmentation procedure, however, is an indirect way to improve the robustness of a DNN. Our proposed alternative is to define a regularizer that penalizes DNN parameters prone to attacks. Minimizing the regularized loss function leads to estimators robust to adversarial examples. Adversarial training and our proposal can both be formulated within the robust optimization framework for adversarial robustness (Ben-Tal et al., 2009; Madry et al., 2018; Wong & Kolter, 2018; Shaham et al., 2018; Sinha et al., 2018). In this formulation, one seeks to improve the worst-case performance of the model, where performance is measured by a particular loss function ℓ. Adversarial training can be understood as approximating such a worst-case loss by finding the corresponding worst-case data point, i.e., x + δ computed by a specific attack technique. Our proposed method is more direct.
It is based on approximating the loss function ℓ(x + δ) by its second-order Taylor series expansion, i.e., ℓ(x + δ) ≈ ℓ(x) + ∇_x ℓ(x)ᵀ δ + (1/2) δᵀ ∇²_x ℓ(x) δ, and then upper bounding the worst-case loss using the expansion terms. By considering both the gradient and the Hessian of the loss function with respect to (w.r.t.) the input, we obtain a more accurate approximation of the worst-case loss. In our derivations, we consider both ℓ2 and ℓ∞ attacks. We call the method Second-Order Adversarial Regularizer (SOAR), not to be confused with the Soar cognitive architecture (Laird, 2012). In the course of developing SOAR, we make the following contributions:

• We show that an over-parameterized linear regression model can be severely affected by an adversary, even though its population loss is zero. We robustify it with a regularizer that exactly mimics adversarial training. This suggests that regularization can be used instead of adversarial training (Section 2).

• Inspired by this possibility, we develop a regularizer that upper bounds the worst-case effect of an adversary under an approximation of the loss. In particular, we derive SOAR, which approximates the inner maximization of the robust optimization formulation based on the second-order Taylor series expansion of the loss function (Section 4).

• We study SOAR in the logistic regression setting and reveal challenges with regularization using the Hessian w.r.t. the input. We develop a simple initialization method to circumvent the issue (Section 4.1).
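To make the approximation concrete, the sketch below compares the first- and second-order Taylor expansions of a logistic loss w.r.t. the input against the true perturbed loss; including the Hessian term tightens the approximation. This is an illustrative numpy sketch, not the paper's code, and all variable names are our own:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def loss(x, w, y):
    # logistic loss of a linear model at input x
    return np.log1p(np.exp(-y * (w @ x)))

def grad_x(x, w, y):
    # gradient of the loss w.r.t. the input x
    z = y * (w @ x)
    return -sigmoid(-z) * y * w

def hess_x(x, w, y):
    # Hessian of the loss w.r.t. the input x (rank-one for a linear model)
    z = y * (w @ x)
    return sigmoid(z) * sigmoid(-z) * np.outer(w, w)

rng = np.random.default_rng(0)
d = 5
x = rng.normal(size=d)
w = rng.normal(size=d)
y = 1.0
delta = 0.01 * rng.normal(size=d)  # a small input perturbation

exact = loss(x + delta, w, y)
first = loss(x, w, y) + grad_x(x, w, y) @ delta
second = first + 0.5 * delta @ hess_x(x, w, y) @ delta
# the second-order expansion tracks the perturbed loss more closely
```

The gap between `second` and `exact` is third order in the perturbation size, whereas the first-order gap is second order, which is why the derivation keeps the Hessian term.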

2. LINEAR REGRESSION WITH AN OVER-PARAMETRIZED MODEL

This section shows that for over-parameterized linear models, gradient descent (GD) finds a solution that has zero population loss, but is prone to attacks. It also shows that one can avoid this problem by defining an appropriate regularizer. Hence, we do not need adversarial training to robustify such a model. This simple illustration motivates the development of our method in the next sections. We only briefly report the main results here, and defer the derivations to Appendix A.

Consider a linear model f_w(x) = ⟨w, x⟩ with x, w ∈ R^d. Suppose that w* = (1, 0, 0, . . . , 0)ᵀ and the distribution of x ∼ p is such that it is confined to the 1-dimensional subspace {(x_1, 0, 0, . . . , 0) : x_1 ∈ R}. This setup can be thought of as using an over-parameterized model that has many irrelevant dimensions, with data that only covers the relevant dimension of the input space. It is a simplified model of the situation when the data manifold has a dimension lower than the input space. We consider the squared error pointwise loss ℓ(x; w) = (1/2)|⟨x, w⟩ − ⟨x, w*⟩|². Denote the residual by r(x; w) = ⟨x, w − w*⟩, and the population loss by L(w) = E[ℓ(X; w)].

Suppose that we initialize the weights as w(0) = W ∼ N(0, σ²I_{d×d}), and use GD on the population loss, i.e., w(t + 1) ← w(t) − β∇_w L(w(t)). It is easy to see that the partial derivatives w.r.t. w_{2,...,d} are all zero, i.e., no weight adaptation happens in those dimensions. With a proper choice of learning rate β, the asymptotic solution is w̄ = lim_{t→∞} w(t) = (w*_1, w_2(0), w_3(0), . . . , w_d(0))ᵀ. That is, the initial random weights on dimensions 2, . . . , d do not change.

We make two observations. The first is that L(w̄) = 0, i.e., the population loss is zero. So from the perspective of training under the original loss, we are finding the optimal solution. The second observation is that this model is vulnerable to adversarial examples. An FGSM-like attack that perturbs x by Δx = (0, Δx_2, Δx_3, . . . , Δx_d)ᵀ with Δx_i = ε sign(w_i(0)) (for i = 2, . . . , d) yields a population loss of E_{X,W}[ℓ(X + ΔX; w̄)] ≈ O(ε²d²σ²) at the asymptotic solution w̄. When the dimension is large, this loss is quite significant.

The culprit is obviously that GD does not force the initial weights to go to zero when there is no data from the irrelevant and unused dimensions. This simple problem illustrates how the optimizer and an over-parameterized model might interact and lead to a solution that is prone to attacks. An effective solution is to regularize the loss so that the weights of the irrelevant dimensions go to zero. Generic regularizers such as ridge and lasso lead to a biased estimate of w*_1, and thus one is motivated to define a regularizer that is specially designed for improving adversarial robustness. Bishop (1995) showed the close connection between training with random perturbations and Tikhonov regularization. Inspired by this idea, we develop a regularizer that mimics the adversary itself. For this FGSM-like adversary, the population loss at the perturbed point is



L_robustified(w) ≜ E[ℓ(X + ΔX; w)] = L(w) + ε E[r(X; w)] ‖w_{2:d}‖₁ + (ε²/2) ‖w_{2:d}‖₁².    (1)

Minimizing L_robustified(w) is equivalent to adversarial training of the model at the perturbed point x + Δx. The regularizer ε E[r(X; w)] ‖w_{2:d}‖₁ + (ε²/2) ‖w_{2:d}‖₁² in (1) incorporates the effect of the adversary in exact form. Nonetheless, there are two limitations of this approach. The first is that it is designed for a particular choice of attack, an FGSM-like one. We would like a regularizer that is robust to a larger class of attacks.
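Because the pointwise loss is quadratic, the expansion of the loss at the attacked point is exact at the sample level, which can be checked numerically. A minimal sketch under this section's setup (variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
d, eps = 10, 0.1
w = rng.normal(size=d)               # arbitrary current weights
w_star = np.zeros(d); w_star[0] = 1.0
x = np.zeros(d); x[0] = rng.normal() # data lives on the first coordinate

r = x @ (w - w_star)                 # residual r(x; w)
dx = np.zeros(d)
dx[1:] = eps * np.sign(w[1:])        # FGSM-like perturbation on dims 2..d

# loss at the attacked point
lhs = 0.5 * ((x + dx) @ w - x @ w_star) ** 2
# quadratic expansion: loss + eps * r * ||w_{2:d}||_1 + (eps^2 / 2) * ||w_{2:d}||_1^2
l1 = np.abs(w[1:]).sum()
rhs = 0.5 * r ** 2 + eps * r * l1 + 0.5 * eps ** 2 * l1 ** 2
# lhs and rhs agree exactly for the squared error loss
```

Taking expectations of both sides over X and the perturbation recovers the robustified population loss above.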

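The full story of this section, GD leaving the irrelevant weights at their random initialization, zero clean loss, and a large loss under the FGSM-like attack, can also be simulated directly. The following is an illustrative numpy sketch, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, sigma, eps = 100, 1000, 1.0, 0.1

# data is confined to the first coordinate; w* = (1, 0, ..., 0)
x1 = rng.normal(size=n)
X = np.zeros((n, d)); X[:, 0] = x1
w_star = np.zeros(d); w_star[0] = 1.0
y = X @ w_star

w = rng.normal(scale=sigma, size=d)   # random initialization W ~ N(0, sigma^2 I)
w0 = w.copy()
lr = 0.5
for _ in range(200):
    grad = X.T @ (X @ w - y) / n      # gradient of the empirical squared loss
    w -= lr * grad                    # columns 2..d of X are zero, so w[1:] never moves

clean_loss = 0.5 * np.mean((X @ w - y) ** 2)   # ~0: optimal under the original loss

# FGSM-like attack on the irrelevant dimensions only
dx = np.zeros(d); dx[1:] = eps * np.sign(w[1:])
adv_loss = 0.5 * np.mean(((X + dx) @ w - y) ** 2)  # large: ~ (eps ||w_{2:d}||_1)^2 / 2
```

With `d = 100` and `sigma = 1`, the adversarial loss is orders of magnitude above the clean loss even for a small `eps`, matching the O(ε²d²σ²) scaling discussed above.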
We empirically show that SOAR significantly improves the adversarial robustness of the network against ℓ∞ and ℓ2 attacks on CIFAR-10 and SVHN. Specifically, we evaluate using PGD1000 white-box attacks (Madry et al., 2018), transferred PGD1000 attacks, AutoAttack (Croce & Hein, 2020), and SimBA (Guo et al., 2019).
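The evaluation details (architectures, step sizes, restarts) are deferred to the experiments, but the core of an ℓ∞ PGD white-box attack can be sketched on a simple differentiable model. All names below are illustrative; an actual evaluation runs many more steps (e.g., PGD1000) against a DNN:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def loss(x, w, y):
    # logistic loss of a linear model, standing in for a network's loss
    return np.log1p(np.exp(-y * (w @ x)))

def grad_x(x, w, y):
    # gradient of the loss w.r.t. the input
    return -sigmoid(-y * (w @ x)) * y * w

def pgd_linf(x, w, y, eps, alpha, steps):
    """Projected gradient ascent inside the l_inf ball of radius eps around x."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_x(x_adv, w, y))  # signed ascent step
        x_adv = x + np.clip(x_adv - x, -eps, eps)             # project onto the ball
    return x_adv

rng = np.random.default_rng(0)
d = 20
x = rng.normal(size=d)
w = rng.normal(size=d)
y = 1.0

x_adv = pgd_linf(x, w, y, eps=0.1, alpha=0.02, steps=40)
# the attack increases the loss while staying within the eps-ball
```

Transferred attacks reuse `x_adv` computed against a different (source) model, and black-box methods such as SimBA query the target model without gradients; the white-box PGD loop above is the strongest of the three when gradients are reliable.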

