SOAR: SECOND-ORDER ADVERSARIAL REGULARIZATION

Abstract

Adversarial training is a common approach to improving the robustness of deep neural networks against adversarial examples. In this work, we propose a novel regularization approach as an alternative. To derive the regularizer, we formulate the adversarial robustness problem under the robust optimization framework and approximate the loss function using a second-order Taylor series expansion. Our proposed second-order adversarial regularizer (SOAR) is an upper bound, based on the Taylor approximation, of the inner maximization in the robust optimization objective. We empirically show that the proposed method improves the robustness of networks against $\ell_\infty$- and $\ell_2$-bounded perturbations on CIFAR-10 and SVHN.

1. INTRODUCTION

Adversarial training (Szegedy et al., 2013) is the standard approach for improving the robustness of deep neural networks (DNN), or any other model, against adversarial examples. It is a data augmentation method that adds adversarial examples to the training set and updates the network with the newly added data points. Intuitively, this procedure encourages the DNN not to make the same mistakes against an adversary. By adding sufficiently many adversarial examples, the network gradually becomes robust to the attack it was trained on. One of the challenges with such a data augmentation approach is the tremendous amount of additional data required for learning a robust model. Schmidt et al. (2018) show that under a Gaussian data model, the sample complexity of robust generalization is $\sqrt{d}$ times larger than that of standard generalization. They further suggest that current datasets (e.g., CIFAR-10) may not be large enough to attain high adversarial accuracy.

A data augmentation procedure, however, is an indirect way to improve the robustness of a DNN. Our proposed alternative is to define a regularizer that penalizes DNN parameters prone to attacks. Minimizing the regularized loss function leads to estimators robust to adversarial examples. Adversarial training and our proposal can both be formulated in terms of the robust optimization framework for adversarial robustness (Ben-Tal et al., 2009; Madry et al., 2018; Wong & Kolter, 2018; Shaham et al., 2018; Sinha et al., 2018). In this formulation, one seeks to improve the worst-case performance of the model, where the performance is measured by a particular loss function $\ell$. Adversarial training can be understood as approximating such a worst-case loss by finding the corresponding worst-case data point $x + \delta$ with some specific attack technique. Our proposed method is more direct. It is based on approximating the loss function $\ell(x + \delta)$ using its second-order Taylor series expansion, i.e., $\ell(x + \delta) \approx \ell(x) + \nabla_x \ell(x)^\top \delta + \frac{1}{2}\delta^\top \nabla_x^2 \ell(x)\delta$, and then upper bounding the worst-case loss using the expansion terms. By considering both the gradient and the Hessian of the loss function with respect to (w.r.t.) the input, we obtain a more accurate approximation to the worst-case loss. In our derivations, we consider both $\ell_2$ and $\ell_\infty$ attacks. We call the method Second-Order Adversarial Regularizer (SOAR), not to be confused with the Soar cognitive architecture (Laird, 2012). In the course of developing SOAR, we make the following contributions:

• We show that an over-parameterized linear regression model can be severely affected by an adversary, even though its population loss is zero. We robustify it with a regularizer that exactly mimics adversarial training. This suggests that regularization can be used instead of adversarial training (Section 2).
• Inspired by this possibility, we develop a regularizer that upper bounds the worst-case effect of an adversary under an approximation of the loss. In particular, we derive SOAR, which approximates the inner maximization of the robust optimization formulation based on the second-order Taylor series expansion of the loss function (Section 4).
• We study SOAR in the logistic regression setting and reveal challenges with regularization using the Hessian w.r.t. the input. We develop a simple initialization method to circumvent the issue (Section 4.1).
• We empirically show that SOAR significantly improves the adversarial robustness of the network against $\ell_\infty$ and $\ell_2$ attacks on CIFAR-10 and SVHN. Specifically, we evaluate using white-box PGD1000 attacks (Madry et al., 2018), transferred PGD1000 attacks, AutoAttack (Croce & Hein, 2020), and SimBA (Guo et al., 2019).

2. LINEAR REGRESSION WITH AN OVER-PARAMETRIZED MODEL

This section shows that for over-parameterized linear models, gradient descent (GD) finds a solution that has zero population loss, but is prone to attacks. It also shows that one can avoid this problem by defining an appropriate regularizer; hence, we do not need adversarial training to robustify such a model. This simple illustration motivates the development of our method in the next sections. We only briefly report the main results here, and defer the derivations to Appendix A.

Consider a linear model $f_w(x) = \langle w, x \rangle$ with $x, w \in \mathbb{R}^d$. Suppose that $w^* = (1, 0, 0, \ldots, 0)^\top$ and the distribution of $x \sim p$ is such that it is confined to the 1-dimensional subspace $\{(x_1, 0, 0, \ldots, 0) : x_1 \in \mathbb{R}\}$. This setup can be thought of as using an over-parameterized model that has many irrelevant dimensions with data that only covers the relevant dimension of the input space. It is a simplified model of the situation in which the data manifold has a lower dimension than the input space. We consider the squared-error pointwise loss $l(x; w) = \frac{1}{2}|\langle x, w \rangle - \langle x, w^* \rangle|^2$. Denote the residual by $r(x; w) = \langle x, w - w^* \rangle$ and the population loss by $L(w) = \mathbb{E}[l(X; w)]$.

Suppose that we initialize the weights as $w(0) = W \sim N(0, \sigma^2 I_{d \times d})$ and use GD on the population loss, i.e., $w(t+1) \leftarrow w(t) - \beta \nabla_w L(w(t))$. It is easy to see that the partial derivatives w.r.t. $w_2, \ldots, w_d$ are all zero, i.e., no weight adaptation happens in those dimensions. With a proper choice of learning rate $\beta$, the asymptotic solution is $\bar{w} \triangleq \lim_{t \to \infty} w(t) = (w_1^*, w_2(0), w_3(0), \ldots, w_d(0))^\top$. That is, the initial random weights on dimensions $2, \ldots, d$ do not change.

We make two observations. The first is that $L(\bar{w}) = 0$, i.e., the population loss is zero, so from the perspective of training under the original loss, we are finding the optimal solution. The second observation is that this model is vulnerable to adversarial examples. An FGSM-like attack that perturbs $x$ by $\Delta x = (0, \Delta x_2, \ldots, \Delta x_d)^\top$ with $\Delta x_i = \varepsilon\,\mathrm{sign}(w_i(0))$ (for $i = 2, \ldots, d$) yields a population loss of $\mathbb{E}_{X,W}[l(X + \Delta x; \bar{w})] \approx O(\varepsilon^2 d^2 \sigma^2)$ at the asymptotic solution $\bar{w}$. When the dimension is large, this loss is quite significant. (A similar, but more complicated, result would hold if the adversary could also attack the first dimension.) The culprit is that GD does not force the initial weights to go to zero when there is no data from the irrelevant and unused dimensions. This simple problem illustrates how the optimizer and an over-parameterized model might interact to produce a solution that is prone to attacks.

An effective solution is to regularize the loss such that the weights of the irrelevant dimensions go to zero. Generic regularizers such as ridge regression and Lasso lead to a biased estimate of $w_1^*$, so one is motivated to define a regularizer that is specially designed for improving adversarial robustness. Bishop (1995) showed the close connection between training with random perturbations and Tikhonov regularization. Inspired by this idea, we develop a regularizer that mimics the adversary itself. For the FGSM-like adversary above, the population loss at the perturbed point is

$L_{\text{robustified}}(w) \triangleq \mathbb{E}[l(X + \Delta x; w)] = L(w) + \varepsilon\,\mathbb{E}[r(X; w)]\,\|w_{2:d}\|_1 + \frac{\varepsilon^2}{2}\|w_{2:d}\|_1^2. \quad (1)$

Minimizing $L_{\text{robustified}}(w)$ is equivalent to minimizing the loss of the model at the perturbed point $x' = x + \Delta x$. The regularizer $\varepsilon\,\mathbb{E}[r(X; w)]\,\|w_{2:d}\|_1 + \frac{\varepsilon^2}{2}\|w_{2:d}\|_1^2$ incorporates the effect of the adversary in exact form. Nonetheless, there are two limitations of this approach. The first is that it is designed for a particular choice of attack, an FGSM-like one; we would like a regularizer that is robust to a larger class of attacks. The second is that this regularizer is designed for a linear model and the squared-error loss. How can we design a regularizer for more complicated models, such as DNNs?
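To make the vulnerability concrete, here is a small NumPy simulation of this toy setting. It is our own sketch (the paper releases no code for this example); the constants `d`, `sigma`, `eps`, and the sampling of $x_1$ are illustrative choices.

```python
# Toy over-parameterized linear regression: GD reaches zero loss on the data
# subspace but leaves the random initial weights on unused dimensions intact,
# so an FGSM-like attack on those dimensions inflates the loss.
import numpy as np

rng = np.random.default_rng(0)
d, sigma, eps, lr = 100, 0.1, 0.1, 0.01

w_star = np.zeros(d); w_star[0] = 1.0
w = rng.normal(0.0, sigma, size=d)           # w(0) ~ N(0, sigma^2 I)

def sample_x(n):
    # Data confined to the 1-d subspace spanned by e_1.
    x = np.zeros((n, d)); x[:, 0] = rng.normal(1.0, 1.0, size=n)
    return x

for _ in range(2000):                        # GD on (an estimate of) L(w)
    x = sample_x(512)
    r = x @ (w - w_star)                     # residuals r(x; w)
    w -= lr * (x.T @ r) / len(x)             # gradient only touches w_1

x = sample_x(4096)
clean_loss = 0.5 * np.mean((x @ (w - w_star)) ** 2)

# FGSM-like attack on the unused dimensions 2..d only.
dx = eps * np.sign(w); dx[0] = 0.0
adv_loss = 0.5 * np.mean(((x + dx) @ (w - w_star)) ** 2)

predicted = 0.5 * eps**2 * np.sum(np.abs(w[1:])) ** 2   # (1/2) eps^2 ||w(0)_{2:d}||_1^2
print(f"clean={clean_loss:.2e}  adv={adv_loss:.3f}  predicted={predicted:.3f}")
```

The clean loss is essentially zero while the attacked loss matches the $\frac{1}{2}\varepsilon^2\|w(0)_{2:d}\|_1^2 \approx O(\varepsilon^2 d^2 \sigma^2)$ prediction derived in Appendix A.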
We address these questions by formulating the problem of adversarial robustness within the robust optimization framework (Section 3), and propose an approach to approximately solve it (Section 4).

3. ROBUST OPTIMIZATION FORMULATION

Designing an adversarially robust estimator can be formulated as a robust optimization problem (Huang et al., 2015; Madry et al., 2018; Wong & Kolter, 2018; Shaham et al., 2018). To describe it, let us first introduce our notation. Consider an input space $\mathcal{X} \subset \mathbb{R}^d$, an output space $\mathcal{Y}$, and a parameter (or hypothesis) space $\mathcal{W}$, parameterizing a model $f: \mathcal{X} \times \mathcal{W} \to \mathcal{Y}$. In the supervised learning scenario, we are given a data distribution $\mathcal{D}$ over pairs of examples $\{(X_i, Y_i)\}_{i=1}^n$. Given the prediction $f(x; w)$ and a target value $y$, the pointwise loss function of the model is denoted by $\ell(x, y; w) \triangleq \ell(f(x; w), y)$. Given the distribution of the data, one can define the population loss as $L(w) = \mathbb{E}[\ell(X, Y; w)]$.

The goal of the standard supervised learning problem is to find a $w \in \mathcal{W}$ that minimizes the population loss. A generic approach to do this is through empirical risk minimization (ERM). Explicit or implicit regularization is often used to control the complexity of the hypothesis to avoid over- or under-fitting (Hastie et al., 2009). As shown in the previous section, it is possible to find a parameter $w$ that minimizes the loss through ERM, yet leads to a model that is vulnerable to adversarial examples. To incorporate the notion of robustness in the model, defenders must reconsider the training objective. It is also important to formalize and constrain the power of the adversary, so that we understand the strength of the attack to which the model is resistant. This can be specified by requiring that the adversary only modify an input $x$ to $x + \delta$ with $\delta \in \Delta \subset \mathcal{X}$. Commonly used constraint sets are $\varepsilon$-balls w.r.t. the $\ell_p$-norms, though other constraint sets have been used too (Wong et al., 2019b). This goal can be formulated as a robust optimization problem in which the objective is to minimize the adversarial population loss given some perturbation constraint $\Delta$:

$\min_w \mathbb{E}_{(X,Y) \sim \mathcal{D}}\Big[\max_{\delta \in \Delta} \ell(X + \delta, Y; w)\Big]. \quad (2)$

We have an interplay between two goals: 1) the inner max term looks for the worst-case loss around the input, while 2) the outer min term optimizes the hypothesis by minimizing such a loss. Solving the inner maximization exactly is often computationally difficult, so one may approximate it with a surrogate loss obtained from a particular attack. As shown in Section 2, one can design a regularizer that provides the exact value of the loss function at the attacked point for a particular choice of model, loss function, and adversary, cf. (1). Under the robust optimization framework, such a regularizer and adversarial training are two realizations of the inner maximization in (2), but the regularizer relieves us from running a separate inner optimization procedure, as is done in adversarial training. Motivated by that example and the robust optimization framework discussed here, we develop a regularizer that can be understood as an upper bound on the worst-case value of the loss at an attacked point under a second-order approximation of the loss function.
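For reference, the inner maximization in (2) is typically approximated by an attack such as PGD (Madry et al., 2018). Below is a minimal, hedged PyTorch sketch of $\ell_\infty$ PGD; `model` and `loss_fn` are placeholders, and the default parameters merely mirror the attack settings used later in the paper.

```python
import torch

def pgd_linf(model, loss_fn, x, y, eps=8/255, step=2/255, iters=20):
    """Approximate argmax_{||delta||_inf <= eps} loss(x + delta, y)."""
    delta = torch.empty_like(x).uniform_(-eps, eps)   # random start in the ball
    for _ in range(iters):
        delta.requires_grad_(True)
        loss = loss_fn(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            # Signed ascent step, projected back onto the l_inf ball ...
            delta = torch.clamp(delta + step * grad.sign(), -eps, eps)
            # ... and onto the valid input range [0, 1].
            delta = torch.clamp(x + delta, 0.0, 1.0) - x
    return x + delta
```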

4. SECOND-ORDER ADVERSARIAL REGULARIZER (SOAR)

The main idea of SOAR is to approximate the loss function using a second-order Taylor series expansion around an input $x$, and then to solve the inner maximization of the robust optimization formulation (2) for the approximated form. We show this for both $\ell_2$ and $\ell_\infty$ attacks; the same idea can be applied to other $\ell_p$ norms. We describe the crucial steps of the derivation in this section, and defer details to Appendix B.

Assuming that the loss is twice-differentiable, we can approximate it around input $x$ by the second-order Taylor expansion

$\ell(x + \delta, y; w) \approx \tilde{\ell}_{\text{2nd}}(x + \delta, y; w) \triangleq \ell(x, y; w) + \nabla_x \ell(x, y; w)^\top \delta + \frac{1}{2}\delta^\top \nabla_x^2 \ell(x, y; w)\delta. \quad (3)$

For brevity, we drop $w$ and $y$, and use $\nabla$ to denote $\nabla_x$. Let us focus on $\ell_p$ attacks, where the constraint set in (2) is $\Delta = \{\delta : \|\delta\|_p \le \varepsilon\}$ for some $\varepsilon > 0$ and $p \ge 1$. We focus on the $\ell_\infty$ attack because of its popularity, but we also derive the formulation for $\ell_2$ attacks.

As a warm-up, let us solve the inner optimization problem under the first-order Taylor series expansion. We have

$\ell_{\text{FOAR}}(x) \triangleq \max_{\|\delta\|_\infty \le \varepsilon} \ell(x) + \nabla \ell(x)^\top \delta = \ell(x) + \varepsilon\,\|\nabla \ell(x)\|_1. \quad (4)$

The term $\varepsilon\|\nabla \ell(x)\|_1$ defines the First-Order Adversarial Regularizer (FOAR). This is similar to the regularizer introduced by Simon-Gabriel et al. (2019) with the choice of the $\ell_\infty$ perturbation set. For a general $\ell_p$ attack with $1 \le p \le \infty$, the dual norm $\|\nabla \ell(x)\|_q$ appears, with $q$ satisfying $p^{-1} + q^{-1} = 1$. We shall empirically evaluate the FOAR-based approach (for the $\ell_\infty$ attack), but our focus is on solving the inner maximization problem based on the second-order Taylor expansion:

$\max_{\|\delta\|_p \le \varepsilon} \ell(x) + \nabla \ell(x)^\top \delta + \frac{1}{2}\delta^\top \nabla^2 \ell(x)\delta, \quad \text{for } p = 2, \infty. \quad (5)$

The second-order expansion in (3) can be rewritten as

$\ell(x + \delta) \approx \ell(x) + \frac{1}{2}\begin{bmatrix} \delta \\ 1 \end{bmatrix}^\top \begin{bmatrix} \nabla^2 \ell(x) & \nabla \ell(x) \\ \nabla \ell(x)^\top & 1 \end{bmatrix} \begin{bmatrix} \delta \\ 1 \end{bmatrix} - \frac{1}{2} = \ell(x) + \frac{1}{2}\tilde{\delta}^\top H \tilde{\delta} - \frac{1}{2}, \quad (6)$

where $\tilde{\delta} = [\delta; 1]$. This allows us to derive an upper bound on the expansion terms using the characteristics of the single matrix $H$. Note that $\tilde{\delta}$ is a $(d+1)$-dimensional vector and $H$ is a $(d+1) \times (d+1)$ matrix.

We need to find an upper bound on $\tilde{\delta}^\top H \tilde{\delta}$ under the attack constraint. For the $\ell_\infty$ attack, solving this maximization problem is not as easy as in (4), since the Boolean quadratic programming problem in formulation (5) is NP-hard. But we can relax the constraint set and find an upper bound on the maximum. Note that with $\delta \in \mathbb{R}^d$, an $\ell_\infty$-ball of radius $\varepsilon$ is enclosed by the $\ell_2$-ball of radius $\sqrt{d}\varepsilon$ with the same centre. Therefore, we can upper bound the inner maximization by

$\max_{\|\delta\|_\infty \le \varepsilon} \ell(x + \delta) \le \max_{\|\delta\|_2 \le \sqrt{d}\varepsilon} \ell(x + \delta), \quad (7)$

which after substituting the second-order Taylor series expansion leads to the $\ell_2$-constrained quadratic optimization problem

$\ell(x) + \frac{1}{2}\max_{\|\delta\|_2 \le \sqrt{d}\varepsilon} \tilde{\delta}^\top H \tilde{\delta} - \frac{1}{2}, \quad (8)$

with $\tilde{\delta} = [\delta; 1]$ as before. The $\ell_2$ version of SOAR does not require this extra step, and we have $\varepsilon$ instead of $\sqrt{d}\varepsilon$ in (8). A more detailed discussion of this relaxation is included in Appendix B.2.

Proposition 1. Let $\ell: \mathbb{R}^d \to \mathbb{R}$ be a twice-differentiable function. For any $\varepsilon > 0$, we have

$\max_{\|\delta\|_\infty \le \varepsilon} \tilde{\ell}_{\text{2nd}}(x + \delta) \le \ell(x) + \frac{d\varepsilon^2 + 1}{2}\,\mathbb{E}\big[\|Hz\|_2\big] - \frac{1}{2}, \quad (9)$

where $H$ is defined in (6) and $z \sim N(0, I_{(d+1) \times (d+1)})$.

This result upper bounds the maximum of the second-order approximation $\tilde{\ell}_{\text{2nd}}$ over an $\ell_\infty$-ball of radius $\varepsilon$, and relates it to the expectation of a Hessian-vector product. Note that there is a simple correspondence between (1) and the regularized loss in (9): the latter can be understood as an upper bound on the worst-case damage of an adversary under a second-order approximation of the loss. For the $\ell_2$ attack, the same line of argument leads to $\varepsilon^2 + 1$ instead of $d\varepsilon^2 + 1$.
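To make the objects in Proposition 1 concrete, the following sketch (ours, for illustration only) assembles $H$ with exact autograd Hessians on a tiny smooth loss and estimates $\mathbb{E}[\|Hz\|_2]$ by sampling. Exact Hessians are feasible only at this toy scale, which is why SOAR replaces them with the finite-difference approximation derived next.

```python
import torch

d = 5
x = torch.randn(d, requires_grad=True)
w = torch.randn(d)
loss = torch.log1p(torch.exp(-(w @ x)))            # a smooth scalar loss

g = torch.autograd.grad(loss, x, create_graph=True)[0]          # nabla_x loss
hess = torch.stack([torch.autograd.grad(g[i], x, retain_graph=True)[0]
                    for i in range(d)])                          # nabla_x^2 loss

# H = [[nabla^2 l, nabla l], [nabla l^T, 1]] as in (6).
H = torch.zeros(d + 1, d + 1)
H[:d, :d] = hess
H[:d, d] = g.detach()
H[d, :d] = g.detach()
H[d, d] = 1.0

eps = 8 / 255
z = torch.randn(10_000, d + 1)                      # z ~ N(0, I_{(d+1)x(d+1)})
bound = loss.item() + (d * eps**2 + 1) / 2 * (z @ H.T).norm(dim=1).mean() - 0.5
print(bound)                                        # RHS of (9), Monte Carlo estimate
```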
Let us take a closer look at $Hz$. By decomposing $z = [z_d; z_1]$ with $z_d \in \mathbb{R}^d$ and $z_1 \in \mathbb{R}$, we get

$Hz = \begin{bmatrix} \nabla^2 \ell(x) z_d + z_1 \nabla \ell(x) \\ \nabla \ell(x)^\top z_d + z_1 \end{bmatrix}.$

The term $\nabla^2 \ell(x) z_d$ can be computed using a finite difference (FD) approximation. Note that $\mathbb{E}[\|z_d\|_2] \approx \sqrt{d}$ for our normally distributed $z$. To ensure that the approximation direction has unit magnitude, we use the normalized $\bar{z}_d = z_d / \|z_d\|_2$, and use the approximation

$\nabla^2 \ell(x) z_d \approx \|z_d\|_2\,\frac{\nabla \ell(x + h\bar{z}_d) - \nabla \ell(x)}{h}. \quad (10)$

To summarize, the SOAR regularizer evaluated at $x$, with a direction $z$ and FD step size $h > 0$, is

$R(x; z, h) = \frac{d\varepsilon^2 + 1}{2}\left\|\begin{bmatrix} \|z_d\|_2\,\frac{\nabla \ell(x + h\bar{z}_d) - \nabla \ell(x)}{h} + z_1 \nabla \ell(x) \\ \nabla \ell(x)^\top z_d + z_1 \end{bmatrix}\right\|_2. \quad (11)$

The expectation in (9) can then be approximated by taking multiple samples of $z \sim N(0, I_{(d+1) \times (d+1)})$. These samples concentrate around the expectation: one can show that $\mathbb{P}\{\|Hz\|_2 - \mathbb{E}[\|Hz\|_2] > t\} \le 2\exp(-ct^2 / \|H\|_2^2)$, where $c$ is a constant and $\|H\|_2$ is the $\ell_2$-induced norm (see Theorem 6.3.2 of Vershynin, 2018). In practice, we observed that taking more than one sample of $z$ does not provide a significant improvement in adversarial robustness; we include an empirical study on the effect of the sample size in Appendix E.4.

Before we discuss the remaining details, recall that we fully robustified the model with an appropriate regularizer in Section 2. The maximizer of the loss in formulation (2) for that example is exactly the FGSM direction, and (1) shows the population loss with our FGSM-like choice of $\Delta x$. To further motivate a second-order approach, note that a first-order regularizer such as FOAR recovers only the first two terms of (1), whereas the second-order formulation (5) recovers its exact form. Next, we study SOAR in the simple logistic regression setting, which exposes a potential failure of the regularizer and reveals why we might observe gradient masking. Based on that insight, we provide the remaining details of the method in Section 4.1.
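Before moving to Section 4.1, here is a minimal PyTorch sketch of (10) and (11). It is our reading of the formulas rather than the authors' released code: `model` and `loss_fn` are placeholders, gradients are taken w.r.t. the input, and `create_graph=True` keeps the regularizer differentiable w.r.t. the weights so it can be trained through.

```python
import torch

def soar_regularizer(model, loss_fn, x, y, eps, h=0.01):
    b, d = x.size(0), x[0].numel()
    z = torch.randn(b, d + 1, device=x.device)       # z ~ N(0, I_{(d+1)x(d+1)})
    z_d, z1 = z[:, :d], z[:, d]                      # z = [z_d; z_1]
    z_norm = z_d.norm(dim=1)                         # ||z_d||_2
    z_bar = (z_d / z_norm[:, None]).view_as(x)       # unit-norm FD direction

    def input_grad(inp):
        inp = inp.clone().requires_grad_(True)
        loss = loss_fn(model(inp), y)
        return torch.autograd.grad(loss, inp, create_graph=True)[0].flatten(1)

    g = input_grad(x)                                # nabla l(x)
    g_h = input_grad(x + h * z_bar)                  # nabla l(x + h * z_bar)

    hvp = z_norm[:, None] * (g_h - g) / h            # FD estimate of nabla^2 l(x) z_d, eq. (10)
    top = hvp + z1[:, None] * g                      # first d entries of H z
    bot = (g * z_d).sum(dim=1) + z1                  # last entry of H z
    hz_norm = torch.sqrt(top.pow(2).sum(dim=1) + bot.pow(2))
    return (d * eps**2 + 1) / 2 * hz_norm.mean()     # eq. (11), averaged over the batch
```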

4.1. AVOIDING GRADIENT MASKING

Consider a linear classifier $f: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ with $f(x; w) = \phi(\langle w, x \rangle)$, where $x, w \in \mathbb{R}^d$ are the input and the weights, and $\phi(\cdot)$ is the sigmoid function. The output of $f$ has the interpretation of a Bernoulli distribution. For the cross-entropy loss function $\ell(x, y; w) = -[y \log f(x; w) + (1 - y)\log(1 - f(x; w))]$, the gradient w.r.t. the input is $\nabla \ell(x) = (f(x; w) - y)w$ and the Hessian w.r.t. the input is $\nabla^2 \ell(x) = f(x; w)(1 - f(x; w))ww^\top$. The second-order Taylor series expansion (3), with the gradient and Hessian evaluated at $x$, is

$\ell(x + \delta) \approx \ell(x) + r(x, y; w)\,w^\top \delta + \frac{1}{2}u(x; w)\,\delta^\top ww^\top \delta, \quad (12)$

where $r = r(x, y; w) = f(x; w) - y$ is the residual describing the difference between the predicted probability and the correct label, and $u = u(x; w) = f(x; w)(1 - f(x; w))$. Note that $u$ can be interpreted as how confident the model is about its prediction (correct or incorrect), and is close to 0 whenever the classifier predicts a value close to 0 or 1. With this linear model, the maximization (8) becomes

$\ell(x) + \max_{\|\delta\|_2 \le \sqrt{d}\varepsilon} \Big[ r\,w^\top \delta + \frac{1}{2}u\,\delta^\top ww^\top \delta \Big] = \ell(x) + \varepsilon\sqrt{d}\,|r(x, y; w)|\,\|w\|_2 + \frac{d\varepsilon^2}{2}u(x; w)\,\|w\|_2^2.$

The regularization term encourages the norm of $w$ to be small, weighted according to the residual $r(x, y; w)$ and the uncertainty $u(x; w)$.

Consider a linear interpolation of the cross-entropy loss from $x$ to a perturbed input $x'$, i.e., $\ell(\alpha x' + (1 - \alpha)x)$ for $\alpha \in [0, 1]$. Previous work has empirically shown that the value of the loss behaves logistically as $\alpha$ increases from 0 to 1 (Madry et al., 2018). In such a case, since there is very little curvature at $x$, using the Hessian exactly at $x$ leads to an inaccurate approximation of the loss value at $x'$. Consequently, we have a poor approximation of the inner max, and the derived regularization will not be effective. For the approximation in (12), this issue corresponds to the scenario in which the classifier is very confident about the clean input $x$. Standard training techniques such as minimizing the cross-entropy loss optimize the model so that it returns the correct label with high confidence. Whenever the classifier is correct with high confidence, both $r$ and $u$ are close to zero. As a result, the effect of the regularizer diminishes, i.e., the weights are no longer regularized. In such a case, the Taylor series expansion, computed using the gradient and Hessian evaluated at $x$, becomes an inaccurate approximation to the loss, and hence its maximizer is not a good solution to the inner maximization problem. This does not mean that one cannot use a Taylor series expansion to approximate the loss. In fact, by the mean value theorem, there exists an $\bar{h} \in (0, 1)$ such that the second-order Taylor expansion is exact: $\ell(x + \delta) = \ell(x) + \nabla \ell(x)^\top \delta + \frac{1}{2}\delta^\top \nabla^2 \ell(x + \bar{h}\delta)\delta$. The issue is that if we compute the Hessian at $x$ (instead of at $x + \bar{h}\delta$), the approximation can be poor whenever the curvature profile of the loss at $x$ is drastically different from the one at $x + \bar{h}\delta$.

Algorithm 1: Computing the SOAR objective for a single training example
Input: A pair of training data $(x, y)$, $\ell_\infty$ constraint $\varepsilon$, finite difference step size $h$.
1: $x' \leftarrow x + \eta$, where $\eta = (\eta_1, \ldots, \eta_d)$ and $\eta_i \sim U(-\frac{\varepsilon}{2}, \frac{\varepsilon}{2})$.
2: $x' \leftarrow \Pi_{B(x, \varepsilon/2)}\big(x' + \frac{\varepsilon}{2}\,\mathrm{sign}(\nabla_x \ell(x'))\big)$, where $\Pi$ is the projection operator.
3: Sample $z \sim N(0, I_{(d+1) \times (d+1)})$.
4: Compute the SOAR regularizer $R(x'; z, h)$ as in (11).
5: Compute the pointwise objective: $\ell_{\text{SOAR}}(x, y) = \ell(x', y) + R(x'; z, h)$.
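As a quick sanity check of the closed forms above, the following snippet (ours) verifies $\nabla \ell(x) = (f - y)w$ and $\nabla^2 \ell(x) = f(1 - f)ww^\top$ with autograd on random data.

```python
import torch

d = 4
w = torch.randn(d)
x = torch.randn(d, requires_grad=True)
y = torch.tensor(1.0)

f = torch.sigmoid(w @ x)
loss = -(y * torch.log(f) + (1 - y) * torch.log(1 - f))   # cross-entropy

g = torch.autograd.grad(loss, x, create_graph=True)[0]
H = torch.stack([torch.autograd.grad(g[i], x, retain_graph=True)[0]
                 for i in range(d)])

assert torch.allclose(g, (f - y).detach() * w, atol=1e-6)
assert torch.allclose(H, (f * (1 - f)).detach() * torch.outer(w, w), atol=1e-6)
```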
More importantly, a method relying on gradient masking can be easily circumvented (Athalye et al., 2018). Our early experimental results had also indicated that gradient masking occurred with SOAR when the gradient and Hessian were evaluated at $x$. In particular, we observed that SOAR with zero initialization leads to models with nearly 100% confidence in their predictions, and hence an ineffective regularizer; the result is reported in Table 5 in Appendix D. This suggests a heuristic to improve the quality of SOAR: evaluate the gradient and Hessian, through the FD approximation (10), at a less confident point in the $\ell_\infty$-ball around $x$. We found that evaluating the gradient and Hessian at the 1-step PGD adversary successfully circumvents the issue (Lines 1-2 in Algorithm 1). We compare other initializations in Table 5 in Appendix D. To ensure the regularization stays within the original $\ell_\infty$-ball of radius $\varepsilon$, we use $\varepsilon/2$ for the PGD1 initialization and $\varepsilon/2$ in SOAR. Based on this heuristic, the regularized pointwise objective for a data point $(x, y)$ is

$\ell_{\text{SOAR}}(x, y) = \ell(x', y) + R(x'; z, h), \quad (13)$

where $z \sim N(0, I_{(d+1) \times (d+1)})$ and the point $x'$ is initialized at the PGD1 adversary. Algorithm 1 summarizes SOAR for a single training example (a code sketch follows below); we include the full training procedure in Appendix C. Moreover, we include additional discussions and experiments on gradient masking in Appendix E.11.
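Putting the pieces together, here is a hedged PyTorch sketch of Algorithm 1 that reuses `soar_regularizer` from the earlier snippet. The $\varepsilon/2$ split between the PGD1 initialization and the regularizer follows the heuristic above, while the helper name and the clamping to $[0, 1]$ are our assumptions.

```python
import torch

def soar_pointwise(model, loss_fn, x, y, eps=8/255, h=0.01):
    # Line 1: uniform random start inside the eps/2 ball.
    x_adv = x + torch.empty_like(x).uniform_(-eps / 2, eps / 2)
    # Line 2: one FGSM step of size eps/2, projected back onto B(x, eps/2).
    x_adv.requires_grad_(True)
    grad, = torch.autograd.grad(loss_fn(model(x_adv), y), x_adv)
    with torch.no_grad():
        x_adv = x_adv + (eps / 2) * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps / 2), x + eps / 2).clamp(0, 1)
    # Lines 3-5: sample z inside soar_regularizer and form the objective (13).
    reg = soar_regularizer(model, loss_fn, x_adv, y, eps / 2, h)
    return loss_fn(model(x_adv), y) + reg
```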

5. EXPERIMENTS

In this section, we verify the effectiveness of the proposed regularization method against $\ell_\infty$ PGD attacks on CIFAR-10. Our experiments show that training with SOAR leads to significant improvements in adversarial robustness, achieved without significantly sacrificing standard accuracy. We focus on $\ell_\infty$ in this section and defer evaluations on $\ell_2$ to Appendix E.5. Additionally, we provide a detailed discussion and evaluations on the SVHN dataset in Appendix E.6. We train ResNet-10 (He et al., 2016); since empirical studies in Madry et al. (2018) and Wang et al. (2020) reveal that their approaches benefit from increasing model capacity to achieve higher adversarial robustness, we also include WideResNet (Zagoruyko & Komodakis, 2016) for all baseline methods. We were not able to reproduce the results of two closely related works, CURE and LLR, which we discuss further in Appendix E.1. In Appendix E.13, we compare SOAR and FOAR with different initializations; FOAR achieves the best adversarial robustness with PGD1 initialization, so we present only this variant of FOAR in this section.

5.1. ROBUSTNESS AGAINST PGD WHITE-BOX ATTACKS

Before making the comparison between SOAR and the baselines in Table 1, note that FOAR achieves 32.28% accuracy against PGD20 attacks. Despite its uncompetitive performance, this shows that approximating the robust optimization formulation based on a Taylor series expansion is a reasonable approach; it also justifies our extension to a second-order approximation, as the first-order term alone is not sufficient. Lastly, we observe that training with SOAR significantly improves adversarial robustness against all PGD attacks, leading to higher robustness for every k-step PGD attack on the ResNet model. SOAR remains competitive with the baseline methods trained on the high-capacity WideResNet architecture. SOAR achieves the best robustness against all baseline methods trained on ResNet, as shown in Table 2. Compared with the baselines trained on WideResNet, SOAR remains the most robust model against transferred PGD20-W and PGD1000-W attacks, approaching its standard accuracy on unperturbed data. Note that all defence methods are substantially more vulnerable to the score-based SimBA attack; the SOAR-regularized model is the most robust against SimBA.

5.2. ROBUSTNESS AGAINST BLACK-BOX ATTACKS

Many defences only reach an illusion of robustness through methods collectively known as gradient masking (Athalye et al., 2018). These methods often fail against attacks generated from an undefended, independently trained model, known as transfer-based black-box attacks. Recent works (Tramèr et al., 2017; Ilyas et al., 2019) have proposed hypotheses for the success of transfer-based black-box attacks. In our evaluation, the transferred attacks are PGD20 and PGD1000 perturbations generated from two source models, ResNet and WideResNet, denoted by the suffixes -R and -W respectively. The source models are trained separately from the defence models on the unperturbed training set. Additionally, Tramer et al. (2020) recommend score-based black-box attacks such as SimBA (Guo et al., 2019; https://github.com/cg563/simple-blackbox-attack). These are more relevant in real-world applications where gradient information is not accessible, and are empirically shown to be more effective than transfer-based attacks. Because they rely solely on the confidence scores of the model, score-based attacks are resistant to gradient masking. All black-box attacks in this section are $\ell_\infty$-constrained at $\varepsilon = 8/255$.

5.3. ROBUSTNESS AGAINST AUTOATTACK

During the ICLR rebuttal phase, we evaluated SOAR against AutoAttack (Croce & Hein, 2020). In this section, we focus on the $\ell_\infty$-bounded AutoAttack; similar results with the $\ell_2$-bounded attack are included in Appendix E. To better understand the source of SOAR's vulnerability, we tested it against the four constituent attacks individually. First, we observed that the result against untargeted APGD-CE is similar to the one shown in Section 5.1. This is expected because both attacks are formulated as cross-entropy-based PGD. However, there is a considerable degradation in the accuracy of SOAR against targeted APGD-DLR and targeted FAB. At $\varepsilon = 8/255$, SOAR is most vulnerable to targeted APGD-DLR, with a robust accuracy of only 18.25%. To further investigate SOAR's robustness against AutoAttack, we tested different values of $\varepsilon$ to verify whether SOAR can at least improve robustness against $\ell_\infty$ attacks with smaller $\varepsilon$. We observed that at $\varepsilon = 4/255$ the robustness improvement of SOAR becomes more consistent. Interestingly, we also noticed that a model with better robustness at $\varepsilon = 8/255$ does not guarantee better robustness at $\varepsilon = 4/255$, as is the case for the Square Attack on ADV and SOAR. Combining the results across the four attacks and the different $\varepsilon$, we offer three hypotheses on the vulnerability of SOAR. First, SOAR might overfit to a particular type of attack: adversarial examples generated based on the cross-entropy loss. APGD-DLR is based on the logit difference and FAB is based on finding minimal perturbation distances, both of which are very different from the cross-entropy loss. Second, SOAR might rely on gradient masking to a certain extent, so that PGD with the cross-entropy loss has difficulty finding adversaries even though they exist. This also suggests that results with black-box attacks might be insufficient to conclusively rule out gradient masking. Third, since SOAR provides a more consistent robustness improvement at smaller $\varepsilon$, the techniques discussed in Section 4 may not have completely addressed the problems raised by the second-order approximation. This makes the upper bound on the inner max loose, and hence SOAR improves robustness against attacks with $\varepsilon$ smaller than the one it was formulated with. Finally, we emphasize that this should not rule SOAR out as a failed defence. Previous work shows that a mechanism based purely on gradient masking can be completely circumvented, resulting in 0% accuracy against non-gradient-based attacks (Athalye et al., 2018). Our results on SimBA and the Square Attack show that this is not the case with SOAR, even at $\varepsilon = 8/255$, and thus the robustness improvement cannot be due to gradient masking alone. Overall, we think SOAR's vulnerability to AutoAttack is an interesting observation that requires further investigation.

A DERIVATIONS OF SECTION 2: LINEAR REGRESSION WITH AN OVER-PARAMETRIZED MODEL

We derive the results reported in Section 2 in more detail here. Recall that we consider a linear model $f_w(x) = \langle w, x \rangle$ with $x, w \in \mathbb{R}^d$. We suppose that $w^* = (1, 0, 0, \ldots, 0)^\top$ and the distribution of $x \sim p$ is confined to the 1-dimensional subspace $\{(x_1, 0, 0, \ldots, 0) : x_1 \in \mathbb{R}\}$, so the density of $x$ is $p((x_1, \ldots, x_d)) = p_1(x_1)\delta(x_2)\delta(x_3)\cdots\delta(x_d)$, where $\delta(\cdot)$ is Dirac's delta function. We initialize the weights as $w(0) \sim N(0, \sigma^2 I_{d \times d})$, and use GD to find the minimizer of the population loss. The partial derivatives of the population loss are

$\frac{\partial L(w)}{\partial w_j} = \begin{cases} \mathbb{E}[r(X; w)X_1] = (w_1 - w_1^*)\,\mathbb{E}[X_1^2] = (w_1 - w_1^*)\,\mu_1, & j = 1, \\ \mathbb{E}[r(X; w)X_j] = 0, & j \ne 1, \end{cases}$

where $\mu_1 = \mathbb{E}[X_1^2]$ and the second case follows because $X_j = 0$ almost surely for $j \ne 1$. Notice that the gradient in dimension $j = 1$ is non-zero unless $(w_1 - w_1^*)\mu_1 = 0$; assuming $\mu_1 \ne 0$, the gradient is zero only when $w_1 = w_1^*$. On the other hand, the gradients in dimensions $j = 2, \ldots, d$ are all zero, so GD does not change the value of $w_j(t)$ for $j = 2, \ldots, d$. Therefore, under a proper choice of learning rate $\beta$, the asymptotic solution of GD is $\bar{w} \triangleq \lim_{t \to \infty} w(t) = (w_1^*, w_2(0), w_3(0), \ldots, w_d(0))^\top$.

It is clear that $L(\bar{w}) = 0$, i.e., the population loss is zero, as noted in our first observation in that section. Also note that we can easily attack this model by perturbing $x$ by $\Delta x = (0, \Delta x_2, \ldots, \Delta x_d)^\top$. The pointwise loss at $x + \Delta x$ is

$l(x + \Delta x; w) = \frac{1}{2}|(w_1 - w_1^*)x_1 + \langle w, \Delta x \rangle|^2 = \frac{1}{2}|r(x; w) + \langle w, \Delta x \rangle|^2.$

With the choice of $\Delta x_i = \varepsilon\,\mathrm{sign}(w_i(0))$ (for $i = 2, \ldots, d$) and $\Delta x_1 = 0$, an FGSM-like attack (Goodfellow et al., 2014) at the learned weights $\bar{w}$ leads to the pointwise loss

$l(x + \Delta x; \bar{w}) = \frac{1}{2}\varepsilon^2 \Big(\sum_{j=2}^d |w_j(0)|\Big)^2 \approx \frac{1}{2}\varepsilon^2 \|w(0)\|_1^2.$

We comment that our choice of $\Delta x$ is not from the same distribution as the training data $x$. This choice aligns with the hypotheses in Ding et al. (2019a) and Schmidt et al. (2018) that adversarial examples come from a shifted data distribution; techniques such as feature adversaries (Sabour et al., 2015), in contrast, design perturbations to be close to the input distribution. We stress that the goal here is to illustrate the loss under this particular attack. To get a better sense of this loss, we compute its expected value w.r.t. the randomness of the weight initialization. We have (including the extra $|w_1(0)|$ term too)

$\mathbb{E}_{W \sim N(0, \sigma^2 I_{d \times d})}\big[\|W\|_1^2\big] = \mathbb{E}\Big[\sum_{i,j=1}^d |W_i||W_j|\Big] = \sum_{i=1}^d \mathbb{E}\big[|W_i|^2\big] + \sum_{i,j=1, i \ne j}^d \mathbb{E}[|W_i|]\,\mathbb{E}[|W_j|],$

where we used the independence of the random variables $W_i$ and $W_j$ for $i \ne j$. The expectation $\mathbb{E}[|W_i|^2]$ is the variance $\sigma^2$ of $W_i$. The random variable $|W_j|$ has a folded normal distribution, and its expectation $\mathbb{E}[|W_j|]$ is $\sqrt{2/\pi}\,\sigma$. Thus, we get that

$\mathbb{E}_{W \sim N(0, \sigma^2 I_{d \times d})}\big[\|W\|_1^2\big] = d\sigma^2 + d(d - 1)\frac{2}{\pi}\sigma^2 \approx \frac{2}{\pi}d^2\sigma^2, \quad \text{for } d \gg 1.$

The expected population loss of the specified attack $\Delta x$ at the asymptotic solution $\bar{w}$ is therefore $\mathbb{E}_{X,W}[l(X + \Delta x; \bar{w})] \approx O(\varepsilon^2 d^2 \sigma^2)$. The dependence of this loss on the dimension $d$ is significant, showing that the learned model is quite vulnerable to attacks. We note that the conclusions would not change much with initial distributions other than the Normal distribution.

An effective solution is to regularize the loss to encourage the weights of the irrelevant dimensions to go to zero. A generic regularizer is the $\ell_2$-norm of the weights, i.e., formulating the problem as ridge regression.
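Before turning to the ridge regularizer, here is a quick Monte Carlo check (ours) of the folded-normal computation above: for $W \sim N(0, \sigma^2 I_d)$, the empirical $\mathbb{E}[\|W\|_1^2]$ should match $d\sigma^2 + d(d-1)\frac{2}{\pi}\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, sigma = 100, 0.05
W = rng.normal(0.0, sigma, size=(20_000, d))
empirical = np.mean(np.linalg.norm(W, ord=1, axis=1) ** 2)
predicted = d * sigma**2 + d * (d - 1) * (2 / np.pi) * sigma**2
print(empirical, predicted)   # both ~ (2/pi) d^2 sigma^2 for large d
```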
In that case, the regularized population loss is

$L_{\text{ridge}}(w) = \frac{1}{2}\mathbb{E}\big[|\langle X, w \rangle - \langle X, w^* \rangle|^2\big] + \frac{\lambda}{2}\|w\|_2^2.$

One can see that the solution of $\nabla_w L_{\text{ridge}}(w) = 0$ is

$\hat{w}_j(\lambda) = \begin{cases} \frac{\mu_1}{\mu_1 + \lambda}w_1^*, & j = 1, \\ 0, & j \ne 1. \end{cases}$

The use of this generic regularizer seems reasonable in this example, as it forces the weights for dimensions 2 to $d$ to zero. Its only drawback is that it leads to a biased estimate of $w_1^*$; the bias, however, can be made small with a small choice of $\lambda$. We can obtain a similar conclusion for the $\ell_1$ regularizer (Lasso). As such, one is motivated to define a regularizer that is specially designed for improving adversarial robustness. Bishop (1995) showed the close connection between training with random perturbations and Tikhonov regularization. Inspired by this idea, we develop a regularizer that mimics the adversary itself. Let us assume that a particular adversary attacks the model by adding $\Delta x = (0, \varepsilon\,\mathrm{sign}(w_2(0)), \ldots, \varepsilon\,\mathrm{sign}(w_d(0)))^\top$. The population loss at the perturbed point is

$L_{\text{robustified}}(w) \triangleq \mathbb{E}[l(X + \Delta x; w)] = \frac{1}{2}\mathbb{E}\Big[\Big(r(X; w) + \varepsilon\sum_{j=2}^d |w_j|\Big)^2\Big] = L(w) + \varepsilon\,\mathbb{E}[r(X; w)]\,\|w_{2:d}\|_1 + \frac{\varepsilon^2}{2}\|w_{2:d}\|_1^2,$

where $\|w_{2:d}\|_1 = \sum_{j=2}^d |w_j|$. This is the same objective as (1), reported in Section 2. Note that minimizing $L_{\text{robustified}}(w)$ is equivalent to minimizing the loss of the model at the point $x' = x + \Delta x$. The regularizer $\varepsilon\,\mathbb{E}[r(X; w)]\,\|w_{2:d}\|_1 + \frac{\varepsilon^2}{2}\|w_{2:d}\|_1^2$ incorporates the effect of the adversary in exact form. This motivates the possibility of designing a regularizer tailored to preventing attacks.

A.1 TAYLOR SERIES APPROXIMATION

First, we show that the FGSM direction is the maximizer of the loss when the perturbation is $\ell_\infty$-constrained. Based on the pointwise loss at $x + \Delta x$, we have

$\max_{\|\Delta x\|_\infty \le \varepsilon} l(x + \Delta x; w) = \frac{1}{2}\Big(r(x; w) + \max_{\|\Delta x\|_\infty \le \varepsilon} \langle w, \Delta x \rangle\Big)^2.$

We use Hölder's inequality to obtain

$\max_{\|\Delta x\|_\infty \le \varepsilon} \langle w, \Delta x \rangle \le \max_{\|\Delta x\|_\infty \le \varepsilon} |\langle w, \Delta x \rangle| \le \max_{\|\Delta x\|_\infty \le \varepsilon} \|w\|_1 \|\Delta x\|_\infty = \varepsilon\|w\|_1,$

which is attained at $\mathrm{argmax}_{\|\Delta x\|_\infty \le \varepsilon}\, l(x + \Delta x; w) = \varepsilon\,\mathrm{sign}(w)$. Next, we show that the first-order approximation of $\mathbb{E}[l(X + \Delta x; w)]$ recovers the first two terms in (1). The gradient of the loss w.r.t. the input is $\nabla_x l(x; w) = (\langle x, w \rangle - \langle x, w^* \rangle)(w - w^*) = r(x; w)(w - w^*)$, and the Hessian w.r.t. the input is $\nabla_x^2 l(x; w) = (w - w^*)(w - w^*)^\top$. The first-order Taylor series approximation is

$L_{\text{robustified}}(w) \approx \tilde{L}_{\text{1st}}(w) \triangleq \mathbb{E}\big[l(X; w) + \nabla_x l(X; w)^\top \Delta x\big] = L(w) + \mathbb{E}\big[r(X; w)\big]\,w^\top \Delta x = L(w) + \varepsilon\,\mathbb{E}[r(X; w)]\,\|w_{2:d}\|_1.$

Note that $w^{*\top}\Delta x = 0$ because of our particular choice of $\Delta x$ and $w^*$. Here we obtain the first two terms in (1). The second-order Taylor series approximation is

$L_{\text{robustified}}(w) \approx \tilde{L}_{\text{2nd}}(w) \triangleq \mathbb{E}\Big[l(X; w) + \nabla_x l(X; w)^\top \Delta x + \frac{1}{2}\Delta x^\top \nabla_x^2 l(X; w)\Delta x\Big] = L(w) + \varepsilon\,\mathbb{E}[r(X; w)]\,\|w_{2:d}\|_1 + \frac{1}{2}\Delta x^\top (w - w^*)(w - w^*)^\top \Delta x = L(w) + \varepsilon\,\mathbb{E}[r(X; w)]\,\|w_{2:d}\|_1 + \frac{\varepsilon^2}{2}\|w_{2:d}\|_1^2,$

which recovers the exact form of (1). This completes the motivation for using the second-order Taylor series approximation in our warm-up toy example.

B DERIVATIONS OF SECTION 4: SECOND-ORDER ADVERSARIAL REGULARIZER (SOAR)

B.1 RELAXATION

Note that the Boolean quadratic programming (BQP) problem in formulation (5) is NP-hard (Beasley, 1998; Lima & Grossmann, 2017). Even though semi-definite programming (SDP) relaxations exist, such approaches require the exact Hessian w.r.t. the input, which is computationally expensive to obtain for high-dimensional inputs. And even if we could compute the exact Hessian, SDP itself is a computationally expensive approach, not suitable for the inner loop of DNN training. As such, we relax the $\ell_\infty$ constraint to an $\ell_2$ constraint, which, as we show, leads to a computationally efficient solution.

B.2 THE ISSUE RELATED TO THE LOOSENESS OF THE BOUND IN EQ (7)

In the ICLR rebuttal phase, a reviewer pointed out that, from the perspective of the volume ratio between the two $\ell_p$ balls, replacing $\|\delta\|_\infty \le \varepsilon$ with $\|\delta\|_2 \le \sqrt{d}\varepsilon$ can be problematic: the volume of $\{\delta : \|\delta\|_\infty \le \varepsilon\}$ is $2^d \varepsilon^d$, whereas the volume of $\{\delta : \|\delta\|_2 \le \sqrt{d}\varepsilon\}$ is $\frac{\pi^{d/2}}{\Gamma(1 + d/2)}d^{d/2}\varepsilon^d$, and their ratio goes to 0 as the dimension increases. The implication is that the search space for the $\ell_\infty$ maximizer is infinitesimal compared to the one for the $\ell_2$ maximizer, potentially leading to a loose upper bound. As a preliminary study on the tightness of the bound, we evaluated the two sides of (7) by approximating each maximum using PGD attacks. In particular, we approximate $\max_{\|\delta\|_\infty \le \varepsilon} \ell(x + \delta)$ using $\ell(x + \delta_\infty)$, where $\delta_\infty$ is generated using 20-iteration $\ell_\infty$-PGD with $\varepsilon = 8/255$. Similarly, we approximate $\max_{\|\delta\|_2 \le \sqrt{d}\varepsilon} \ell(x + \delta)$ using $\ell(x + \delta_2)$, where $\delta_2$ is generated using 100-iteration $\ell_2$-PGD with $\varepsilon = 1.74$. This particular configuration of attack parameters matches the ones used in our other evaluations. From this preliminary study, we observe that there is indeed a gap between the approximated LHS and RHS of (7); we thus consider it a valuable direction for future research to explore other ways of using a second-order approximation to study the worst-case loss subject to an $\ell_\infty$-constrained perturbation.

B.3 UNIFIED OBJECTIVE

We could maximize each term in (8) separately and upper bound the max by

$\max_{\|\delta\|_2 \le \sqrt{d}\varepsilon} \nabla \ell(x)^\top \delta + \max_{\|\delta\|_2 \le \sqrt{d}\varepsilon} \frac{1}{2}\delta^\top \nabla^2 \ell(x)\delta = \sqrt{d}\varepsilon\,\|\nabla \ell(x)\|_2 + \frac{1}{2}d\varepsilon^2 \sigma_{\max}(\nabla^2 \ell(x)),$

where $\sigma_{\max}(\nabla^2 \ell(x))$ is the largest singular value of the Hessian matrix $\nabla^2 \ell(x)$. Even though the norm of the gradient and the singular value of the Hessian have intuitive appeal, separately optimizing these terms might lead to a looser upper bound than necessary. The reason is that the maximizers of the two terms are, respectively, $\sqrt{d}\varepsilon\,\nabla \ell(x)/\|\nabla \ell(x)\|_2$ and the direction corresponding to the largest singular value of $\nabla^2 \ell(x)$; in general, these two directions are not aligned.

B.4 PROOF OF PROPOSITION 1

Proof. By the inclusion of the $\ell_\infty$-ball of radius $\varepsilon$ within the $\ell_2$-ball of radius $\sqrt{d}\varepsilon$, and the definition of $H$ in (6), we have

$\max_{\|\delta\|_\infty \le \varepsilon} \tilde{\ell}_{\text{2nd}}(x + \delta) \le \max_{\|\delta\|_2 \le \sqrt{d}\varepsilon} \tilde{\ell}_{\text{2nd}}(x + \delta) = \max_{\|\delta\|_2 \le \sqrt{d}\varepsilon} \ell(x) + \frac{1}{2}\begin{bmatrix} \delta \\ 1 \end{bmatrix}^\top H \begin{bmatrix} \delta \\ 1 \end{bmatrix} - \frac{1}{2} \le \ell(x) + \frac{1}{2}\max_{\|\tilde{\delta}\|_2 \le \sqrt{d\varepsilon^2 + 1}} \tilde{\delta}^\top H \tilde{\delta} - \frac{1}{2}.$

It remains to upper bound $\max_{\|\tilde{\delta}\|_2 \le \varepsilon'} \tilde{\delta}^\top H \tilde{\delta}$ with $\varepsilon' = \sqrt{d\varepsilon^2 + 1}$. We use the Cauchy-Schwarz inequality to obtain

$\max_{\|\tilde{\delta}\|_2 \le \varepsilon'} \tilde{\delta}^\top H \tilde{\delta} \le \max_{\|\tilde{\delta}\|_2 \le \varepsilon'} \|\tilde{\delta}\|_2 \|H\tilde{\delta}\|_2 = \varepsilon' \max_{\|\tilde{\delta}\|_2 \le \varepsilon'} \|H\tilde{\delta}\|_2 = \varepsilon'^2 \|H\|_2,$

where the last equality uses the definition of the $\ell_2$-induced matrix norm (the spectral norm). Since computing $\|H\|_2$ would again require the exact input Hessian, which we would like to avoid, we further upper bound the spectral norm by the Frobenius norm: $\|H\|_2 = \sigma_{\max}(H) \le \|H\|_F$. The Frobenius norm itself satisfies

$\|H\|_F = \sqrt{\mathrm{Tr}(H^\top H)} = \sqrt{\mathbb{E}\big[\|Hz\|_2^2\big]},$

where $z \sim N(0, I_{(d+1) \times (d+1)})$. Therefore, we can estimate $\|H\|_F$ by sampling random vectors $z$ and computing the sample average of $\|Hz\|_2$.

C SOAR ALGORITHM: A COMPLETE ILLUSTRATION

In Algorithm 1, we present the per-example computation of the SOAR objective. Here we summarize the full training procedure in Algorithm 2. It is presented as if the optimizer were SGD, but other optimizers may be used as well.

Algorithm 2: Improving adversarial robustness via SOAR
Input: Training dataset; learning rate $\beta$; training batch size $b$; number of iterations $N$; $\ell_\infty$ constraint $\varepsilon$; finite difference step size $h$.
Initialize the network with pretrained weights $w$.
for $i \in \{0, 1, \ldots, N\}$ do
    Get mini-batch $B = \{(x_1, y_1), \ldots, (x_b, y_b)\}$ from the training set.
    for $j = 1, \ldots, b$ (in parallel) do
        $x'_j \leftarrow x_j + \eta$, where $\eta = (\eta_1, \ldots, \eta_d)$ and $\eta_i \sim U(-\frac{\varepsilon}{2}, \frac{\varepsilon}{2})$.
        $x'_j \leftarrow \Pi_{B(x_j, \varepsilon/2)}\big(x'_j + \frac{\varepsilon}{2}\,\mathrm{sign}(\nabla_{x'_j} \ell(x'_j))\big)$, where $\Pi$ is the projection operator.
        Sample $z \sim N(0, I_{(d+1) \times (d+1)})$.
        Compute the SOAR regularizer $R(x'_j; z, h)$ as in (11).
        Compute the pointwise objective: $\ell_{\text{SOAR}}(x_j, y_j) = \ell(x'_j, y_j) + R(x'_j; z, h)$.
    end
    $w_{i+1} \leftarrow w_i - \beta \cdot \frac{1}{b}\sum_{j=1}^b \nabla_{w_i} \ell_{\text{SOAR}}(x_j, y_j)$.
end
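A training-loop sketch of Algorithm 2 follows, assuming the `soar_pointwise` helper from the Section 4.1 snippet and a pretrained `model`; the optimizer settings echo Appendix E only loosely and should be treated as placeholders.

```python
import torch

def train_soar(model, loader, epochs=200, lr=0.004, eps=8/255, h=0.01):
    opt = torch.optim.SGD(model.parameters(), lr=lr,
                          momentum=0.9, weight_decay=2e-4)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            # Pointwise SOAR objectives, averaged over the mini-batch.
            objective = soar_pointwise(model, loss_fn, x, y, eps, h)
            opt.zero_grad()
            objective.backward()
            opt.step()
    return model
```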

D POTENTIAL CAUSES OF GRADIENT MASKING

We summarize the average value of the highest-probability output for test-set data initialized with zero, random, and PGD1 perturbations in Table 5. We notice that training with SOAR using zero or random initialization leads to models with nearly 100% confidence in their predictions. This is aligned with the analysis of SOAR for a linear classifier (Section 4.1), which shows that the regularizer becomes ineffective as the model outputs high-confidence predictions. Indeed, the results in Table 7 show that those models are vulnerable under black-box attacks.

Results in Table 5 suggest that highly confident predictions can be an indication of gradient masking. We demonstrate this using the gradient-based PGD attack. Recall that we generate PGD attacks by first initializing the clean data $x_n$ with a randomly chosen $\eta$ within the $\ell_\infty$-ball of radius $\varepsilon$, followed by gradient ascent from $x_n + \eta$. Suppose that the model makes predictions with 100% confidence on any given input. This leads to a piecewise loss surface that is either zero (correct predictions) or infinity (incorrect predictions). The gradient of this loss function is either zero or undefined, making gradient ascent ineffective. Therefore, white-box gradient-based attacks are unable to find adversarial examples.
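The Table 5 diagnostic can be computed as below; this is our sketch of the metric $\frac{1}{N}\sum_n \max_i P(x_n)_i$, where averages near 1.0 flag the overconfidence associated with gradient masking.

```python
import torch

@torch.no_grad()
def mean_max_confidence(model, loader):
    """Average highest softmax probability over a test loader."""
    total, n = 0.0, 0
    for x, _ in loader:
        probs = torch.softmax(model(x), dim=1)
        total += probs.max(dim=1).values.sum().item()
        n += x.size(0)
    return total / n
```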

E.5 ROBUSTNESS UNDER 2 ATTACKS ON CIFAR-10

We evaluate SOAR and two of the baseline methods, ADV and TRADES, against $\ell_2$ white-box and black-box attacks on CIFAR-10 in Table 9. No $\ell_2$ results were reported by MART, and we were not able to reproduce the $\ell_2$ results using the implementation by MMA, so those two methods are not included in this evaluation. In Section 4, we show that the $\ell_\infty$ formulation of SOAR with $\|\delta\|_\infty = \varepsilon$ corresponds to the $\ell_2$ formulation with $\|\delta\|_2 = \varepsilon\sqrt{d}$. In other words, models trained with SOAR to be robust against $\ell_\infty$ attacks with $\varepsilon = 8/255$ should also obtain improved robustness against $\ell_2$ attacks with $\varepsilon = \frac{8}{255}\sqrt{32 \times 32 \times 3} \approx 1.74$. In our evaluation, all $\ell_2$ adversaries used during ADV and TRADES training are generated with 10-step PGD ($\varepsilon = 1.74$) and a step size of 0.44. Note that the goal here is to show the improved robustness of SOAR against $\ell_2$ attacks rather than to achieve state-of-the-art results, so the optimization procedures are the same as the ones used in the $\ell_\infty$ evaluation.

We observe that training with SOAR improves the robustness of the model against $\ell_2$ attacks. Instead of a fixed $\ell_2$ norm, we demonstrate the improved robustness over an increasing range of $\varepsilon$. For all attacks, we use 100 iterations of PGD and a step size of $2.5\varepsilon/100$. In Table 9, we find that training with SOAR leads to a significant increase in robustness against white-box and black-box $\ell_2$ adversaries. As $\varepsilon$ increases, the SOAR model remains robust against white-box $\ell_2$ attacks ($\varepsilon = 1$), while the other methods fall off. The last column of Table 9 shows the robustness against transferred $\ell_2$ attacks ($\varepsilon = 1.74$); the source model is a ResNet-10 network trained separately from the defence models on the unperturbed training set. We observe that SOAR achieves the second-highest robustness among the methods against transferred $\ell_2$ attacks. This result empirically verifies our earlier claim that the $\ell_2$ and $\ell_\infty$ formulations of SOAR differ only by a factor of $\sqrt{d}$. Moreover, it aligns with the findings of Simon-Gabriel et al. (2019), who empirically showed that adversarial robustness through regularization yields robustness against more than one norm-ball attack at the same time.

E.6 EVALUATION ON SVHN

We use the same ResNet-10 architecture as in the CIFAR-10 evaluation. Training data is augmented with random crops and horizontal flips. For standard training, we use the same optimization procedure as for CIFAR-10. For SOAR and TRADES, we use the same regularizer hyperparameters; for SOAR, we use early stopping at epoch 130 to prevent catastrophic over-fitting. The optimization schedules for SOAR and TRADES are otherwise identical to the ones used for CIFAR-10. We emphasize again that the goal of evaluating on SVHN is to demonstrate the improved robustness with SOAR on a different dataset, so we did not perform an additional hyperparameter sweep. For MART, we were not able to translate their CIFAR-10 results to SVHN: we performed the same hyperparameter sweep as in Table 18, as well as different optimization settings, but none resulted in a meaningful model; a potential cause is the small capacity of ResNet-10. For MMA, the implementation in its public repository is very specific to the CIFAR-10 dataset, so we did not include it in this comparison. Overall, we observe a similar performance on SVHN as on CIFAR-10. Compared to the results in Table 1, we observe a slight increase in standard accuracy and robust accuracy for both SOAR and TRADES.
In particular, the standard accuracy increases by 8.87% and 3.28%, and the PGD20 accuracy increases by 3.52% and 2.93%, for TRADES and SOAR respectively. More notably, we observe on SVHN that the SOAR-regularized model gains robustness without significantly sacrificing its standard accuracy. Table 12 compares the performance of SOAR to TRADES on SimBA and on transferred $\ell_\infty$ attacks. The evaluation setting for transferred attacks is identical to the one used for CIFAR-10, with an undefended, independently trained ResNet-10 as the source model. Despite a smaller gap in accuracy against transferred attacks, the SOAR-regularized model yields a significantly higher accuracy against the stronger SimBA attacks. Note that we did not perform any extensive hyperparameter sweep on SVHN; we simply took what worked on CIFAR-10. We stress that the goal is to demonstrate the effectiveness of SOAR and its performance relative to other baseline methods. Next, we evaluate SOAR and TRADES under $\ell_2$-bounded white-box and black-box attacks. All $\ell_2$ PGD adversaries are generated using the same method as in the CIFAR-10 evaluation; we do not include ADV for the reasons discussed above. Our results show that training with SOAR significantly improves robustness against $\ell_2$ PGD white-box attacks compared to TRADES, while TRADES and SOAR perform similarly against transferred attacks.

E.7 CHALLENGES

Batch Normalization: We observe that networks with BatchNorm layers do not benefit from SOAR in adversarial robustness. Specifically, we performed an extensive hyperparameter search for SOAR on networks with BatchNorm layers, and we were not able to achieve a meaningful improvement. A related work by Galloway et al. (2019) focuses on the connection between BatchNorm and adversarial robustness; in particular, their results show that on a VGG-based architecture (Simonyan & Zisserman, 2014), there is a significant gap in adversarial robustness between networks with and without BatchNorm layers under standard training. Needless to say, the interaction between SOAR and BatchNorm requires further investigation, and we consider this an important future direction. As such, we use a small-capacity ResNet (ResNet-10) in our experiments and modify it by removing its BatchNorm layers; specifically, we removed BatchNorm layers from all models used in the baseline experiments with ResNet. Note that BatchNorm layers make the training process less sensitive to hyperparameters (Ioffe & Szegedy, 2015), and removing them makes it difficult to train a very deep network such as WideResNet. As such, we did not run SOAR on WideResNet.

Starting from a pretrained model: We notice that it is difficult to train with SOAR on a newly initialized model. Note that it is a common technique to fine-tune a pretrained model for a specific task. In CURE, regularization is performed after the model is first trained with a cross-entropy loss to reach a high accuracy on clean data, a process they call adversarial fine-tuning. Cai et al. (2018) and Sitawarin et al. (2020) study the connection between curriculum learning (Bengio et al., 2009) and training using adversarial examples of increasing difficulty. Our idea is similar: the model is first optimized for an easier task (standard training), and then regularized for a related but more difficult task (improving adversarial robustness). Since the model has been trained to minimize its standard loss, the loss gradient can be very small compared to the regularizer gradient, so we apply a clipping of 10 on the regularizer (a minimal sketch is given at the end of this section). We observe that when the model achieves a high adversarial accuracy and continues training for a long period of time, both the standard and adversarial accuracy drop significantly. A similar phenomenon was observed in Cai et al. (2018) and Wong et al. (2019a), which they refer to as catastrophic forgetting and catastrophic over-fitting respectively. Wong et al. (2019a) use early stopping as a simple solution. We observe that with a larger learning rate, the model reaches a high adversarial accuracy faster and catastrophic over-fitting happens sooner. As such, our solution is to fix the number of epochs to 200 and carefully sweep over learning rates to make sure that catastrophic over-fitting does not happen.

Discussion on computational complexity: We emphasize that our primary goal is to propose regularization as an alternative approach to improving adversarial robustness. We discussed techniques for an efficient implementation, but there is still potential for a faster one. In our current implementation, a single epoch with WideResNet takes 19 minutes with PGD10 adversarial training, 26.5 minutes with SOAR, 29 minutes with MART, and 39.6 minutes with TRADES. Thus, despite being faster than MART and TRADES, SOAR is still slow compared to PGD10 adversarial training. We characterize the computational complexity as a function of the number of forward and backward passes required for a single mini-batch.

E.8 TWICE-DIFFERENTIABILITY OF THE LOSS

The SOAR regularizer is derived from the second-order Taylor approximation of the loss, which requires the loss to be twice-differentiable. Although ReLU is not differentiable at 0, the probability of its input being exactly 0 is very small; this is also why we can train ReLU networks through backpropagation, and the same argument applies to the Hessian. In addition, from a computational viewpoint, we never need the exact Hessian, as we approximate the relevant Hessian-vector product through a first-order finite difference.
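Referring back to the clipping of 10 mentioned in the pretrained-model paragraph above, a minimal sketch (ours; the helper name is an assumption, the constant 10 is the paper's):

```python
import torch

def clipped_soar(loss, reg, max_reg=10.0):
    # Cap the regularizer so its gradient cannot swamp the (small)
    # standard-loss gradient of a pretrained model.
    return loss + torch.clamp(reg, max=max_reg)
```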

E.9 POTENTIAL ROBUSTNESS GAIN WITH INCREASING CAPACITIES

Empirical studies in Madry et al. (2018) and Wang et al. (2020) reveal that their approaches benefit from increasing model capacity to achieve higher adversarial robustness. We have a similar observation with SOAR. Table 14 compares the performance of SOAR against $\ell_\infty$-bounded white-box attacks on networks with different capacities. CNN6 (CNN8) refers to a simple 6-layer (8-layer) convolutional network, and ResNet-10 is the network we use in Section 5. Evidently, as network capacity increases, we observe improvements in both standard accuracy and adversarial accuracy. As such, we expect a similar gain in performance with larger-capacity networks such as WideResNet.

E.12 HYPERPARAMETER SWEEP FOR TRADES, MART AND MMA ON RESNET

The following tables show the hyperparameter sweeps for TRADES, MART, and MMA, respectively. We include the configuration with the highest PGD20 accuracy in Section 5.






Several regularization-based alternatives to adversarial training have been proposed. Simon-Gabriel et al. (2019) studied regularization under the first-order Taylor approximation; their proposed regularizer for the $\ell_\infty$ perturbation set is the same as FOAR. Qin et al. (2019) propose local linearity regularization (LLR), where the local linearity measure is defined by the maximum error of the first-order Taylor approximation of the loss. LLR minimizes the local linearity measure along with the magnitude of the projection of the gradient onto the corresponding direction of the measure. It is motivated by the observation of flat loss surfaces during adversarial training.



Performance on CIFAR-10 against $\ell_\infty$-bounded white-box PGD attacks ($\varepsilon = 8/255$).

Performance on CIFAR-10 against $\ell_\infty$-bounded black-box attacks ($\varepsilon = 8/255$).

Performance of the ResNet models on CIFAR-10 against the four $\ell_\infty$-bounded attacks used as an ensemble in AutoAttack ($\varepsilon = 8/255, 6/255, 4/255$). APGD-CE is based on the cross-entropy loss, similar to Madry et al. (2018), and APGD-DLR is based on the logit difference, similar to Carlini & Wagner (2017).

Comparing $\ell(x + \delta_\infty)$ and $\ell(x + \delta_2)$. We approximate $\delta_\infty$ and $\delta_2$ using PGD attacks with their corresponding $\ell_p$ norms.

Average value of the highest-probability output over all test-set data, i.e., $\frac{1}{N}\sum_{n=1}^N \max_{i \in \{1, 2, \ldots, c\}} P(x_n)_i$, where $P(x_n)_i$ denotes the probability of class $i$ given data $x_n$.

$\ell_2$ robustness of the adversarially trained models (under $\ell_2$ formulations) at different epsilon values. 100-step PGD is used for all attacks. Accuracy (%) against $\ell_2$-PGD attacks.

Performance of the ResNet models on CIFAR-10 against the four $\ell_2$-bounded attacks used as an ensemble in AutoAttack ($\varepsilon = 1.74, 1.0, 0.5$).

Performance on SVHN against $\ell_\infty$-bounded white-box attacks ($\varepsilon = 8/255$).

Performance on SVHN against $\ell_\infty$-bounded black-box attacks ($\varepsilon = 8/255$).

For PGD10 adversarial training on SVHN, we observe that ResNet-10 is not able to learn anything meaningful. Specifically, when trained with PGD10 examples, ResNet-10 does not perform better than a randomly initialized network in either standard or adversarial accuracy. Cai et al. (2018) made a similar observation on ResNet-50, where training accuracy did not improve over a long period of adversarial training with PGD10. They further investigated models of different capacities and found that even ResNet-50 might not be sufficiently deep for PGD10 adversarial training on SVHN. Wang & Zhang (2019) reported PGD10 adversarial training results on SVHN with WideResNet, which we include in Table 11.

Performance on SVHN against $\ell_2$-bounded white-box and black-box attacks. 100-step PGD is used for all attacks.

Performance on CIFAR-10 against $\ell_\infty$-bounded white-box attacks ($\varepsilon = 8/255$).

E.10 EXPERIMENT RESULTS ON RESNET-10 IN TABLE 1 AND TABLE 2 WITH STANDARD DEVIATIONS

All results on ResNet-10 are obtained by averaging over 3 independently initialized and trained models. Here, we report the standard deviations of the results in Table 1 and Table 2. Note that we omit results on PGD100 and PGD200 due to space constraints.

Performance on CIFAR-10 against $\ell_\infty$-bounded white-box attacks ($\varepsilon = 8/255$). (Table 1 with standard deviations.)

Performance on CIFAR-10 against $\ell_\infty$-bounded black-box attacks ($\varepsilon = 8/255$). (Table 2 with standard deviations; the columns follow the black-box attacks of Table 2, i.e., SimBA and transferred PGD20/PGD1000 from ResNet (-R) and WideResNet (-W) source models.)

Method | SimBA          | PGD20-R        | PGD20-W        | PGD1000-R      | PGD1000-W
MART   | 48.57% (0.99)  | 72.99% (0.90)  | 74.91% (1.15)  | 72.99% (0.87)  | 75.04% (1.12)
MMA    | 43.53% (1.25)  | 78.70% (0.09)  | 80.39% (1.20)  | 78.72% (0.09)  | 81.35% (0.17)
FOAR   | 35.97% (0.26)  | 63.56% (0.30)  | 65.20% (0.21)  | 63.60% (0.33)  | 65.27% (0.32)
SOAR   | 68.57% (0.95)  | 79.25% (0.45)  | 86.35% (0.04)  | 79.49% (0.29)  | 86.47% (0.10)

E.11 ADDITIONAL EXPERIMENTS ON GRADIENT MASKING

To verify that SOAR improves the robustness of the model without gradient masking, we include the following experiments. First, from the results in Appendix E.3, we conclude that SOAR with zero initialization results in gradient masking. This is shown by the high accuracy (89.24%, close to the standard accuracy) under white-box PGD attacks and the low accuracy (2.86%) under black-box transferred attacks. Next, prior work has verified that adversarial training with PGD20 adversaries (ADV) results in a model without gradient masking (Athalye et al., 2018). Therefore, let us use the models trained using ADV and SOAR (zero-init) as examples of models without and with gradient masking, respectively.

In the $\ell_\infty$ attack setting, PGD uses the sign of the gradient, $\mathrm{sign}(\nabla_x \ell(x))$, to generate perturbations. As such, one way to verify the strength of the gradient is to measure the average number of non-zero elements in it: a model with gradient masking is expected to have far fewer non-zero elements than one without. In our experiment, the average number of non-zero gradient elements is 3072 for the ADV-trained model (no gradient masking), 3069 for SOAR (PGD1-init), and 1043 for SOAR (zero-init, gradient masking). SOAR with PGD1-init has a similar number of non-zero gradient elements to ADV, meaning the PGD adversary can use the signs of those non-zero elements to generate meaningful perturbations.

In Section 5, the 20-iteration $\ell_\infty$ PGD adversaries are generated with a step size of $\frac{2}{255}$ and $\varepsilon = \frac{8}{255}$. Suppose we instead use $\varepsilon = 1$ with the other parameters unchanged, i.e., we allow the maximum $\ell_\infty$ perturbation to span the entire input range $[0, 1]$, and generate PGD20 attacks. We observe that such attacks result in near black-and-white images on SOAR with PGD1-init; it has 0% accuracy against such PGD20 attacks, similar to the 3.3% on the ADV-trained model. On the other hand, the robust accuracy of SOAR (zero-init) is 9.7%.
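The non-zero-gradient diagnostic above can be computed as follows; this is our sketch, with `model` and `loss_fn` as placeholders (for CIFAR-10, $d = 32 \times 32 \times 3 = 3072$).

```python
import torch

def nonzero_grad_elements(model, loss_fn, x, y):
    """Average count of non-zero input-gradient entries over a batch.

    PGD's sign step is uninformative wherever the gradient is exactly zero,
    so a low count is a symptom of gradient masking."""
    x = x.clone().requires_grad_(True)
    grad, = torch.autograd.grad(loss_fn(model(x), y), x)
    return (grad != 0).flatten(1).sum(dim=1).float().mean().item()
```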

Hyperparameter sweep of TRADES on ResNet: performance on CIFAR-10 against $\ell_\infty$-bounded adversarial perturbations ($\varepsilon = 8/255$).

Hyperparameter sweep of MART on ResNet: performance on CIFAR-10 against $\ell_\infty$-bounded adversarial perturbations ($\varepsilon = 8/255$).

Hyperparameter sweep of MMA on ResNet: performance on CIFAR-10 against $\ell_\infty$-bounded adversarial perturbations ($\varepsilon = 8/255$).

E.13 ADVERSARIAL ROBUSTNESS OF THE MODEL TRAINED USING FOAR WITH DIFFERENT INITIALIZATIONS

FOAR achieves the best adversarial robustness using PGD1 initialization, so we only present this variation of FOAR in Section 5.

Performance of FOAR with different initializations on CIFAR-10 against white-box and transfer-based black-box $\ell_\infty$-bounded adversarial perturbations ($\varepsilon = 8/255$).

6. CONCLUSION

This work proposed SOAR, a regularizer that improves the robustness of DNNs to adversarial examples. SOAR is obtained from the second-order Taylor series approximation of the loss function w.r.t. the input, approximately solving the inner maximization of the robust optimization formulation. We showed that training with SOAR leads to significant improvements in adversarial robustness under $\ell_\infty$ and $\ell_2$ attacks. This is only one step towards designing better regularizers for adversarial robustness. Several directions deserve further study, the most prominent being SOAR's vulnerability to AutoAttack. Another future direction is to better understand the loss surface of DNNs in order to select a good point around which an accurate Taylor approximation can be made; this is important for designing regularizers that are not affected by gradient masking.

E SUPPLEMENTARY EXPERIMENTS

E.1 DISCUSSION ON THE REPRODUCIBILITY OF CURE AND LLR

We were not able to reproduce the results of two closely related works, CURE (Moosavi-Dezfooli et al., 2019) and LLR (Qin et al., 2019). For CURE, we found the open-source implementation (https://github.com/F-Salehi/CURE_robustness), but were not able to reproduce their reported results using it; we were not able to reproduce the results of CURE with our own implementation either. For LLR, Yang et al. (2020) were not able to reproduce the results; they also provide an open-source implementation. Regardless, we compare SOAR to the results reported by CURE and LLR in Table 6.

E.2 EXPERIMENTAL SETUP

ResNet: Note that we remove the BatchNorm layers from the ResNet-10 architecture; we discuss this further in Appendix E.7.

WideResNet: We used the implementation of WideResNet-34-10 found in the public repository maintained by the authors of TRADES (Zhang et al., 2019).

Standard training on ResNet and WideResNet: Both are trained for a total of 200 epochs, with an initial learning rate of 0.1 that decays by an order of magnitude at epochs 100 and 150. We used a minibatch size of 128 for testing and training, and the SGD optimizer with momentum 0.9 and weight decay 2e-4.

Adversarial training with PGD10 examples on ResNet:

The optimization setting is the same as the one used for standard training. Additionally, to ensure that the final model has the highest adversarial robustness, we save the model at the end of every epoch, and the final evaluation is based on the checkpoint with the highest PGD20 accuracy.

SOAR on ResNet: SOAR refers to continuing the training of the Standard model on ResNet. It is trained for a total of 200 epochs with an initial learning rate of 0.004, decayed by an order of magnitude at epoch 100. We used the SGD optimizer with momentum 0.9 and weight decay 2e-4, and an FD step size $h = 0.01$ for the regularizer. Additionally, we apply a clipping of 10 on the regularizer; we discuss this clipping operation in Appendix E.7.

MART and TRADES on ResNet: We used the same optimization setup as in their respective public repositories. Briefly: the model is trained for a total of 120 epochs, with an initial learning rate of 0.1 that decays by an order of magnitude at epochs 75, 90, and 100. We used the SGD optimizer with momentum 0.9 and weight decay 2e-4. We performed a hyperparameter sweep on the strength of the regularization term $\beta$ and selected the value with the best performance against PGD20 attacks; complete results are reported in Appendix E.12.

MMA on ResNet: We used the same optimization setup as in its public repository (https://github.com/BorealisAI/mma_training). Briefly: the model is trained for a total of 50000 iterations, with an initial learning rate of 0.3 that changes to 0.09 at iteration 20000, 0.03 at iteration 30000, and 0.009 at iteration 40000. We used the SGD optimizer with momentum 0.9 and weight decay 2e-4. We performed a hyperparameter sweep on the margin term and selected the value with the best performance against PGD20 attacks; complete results are reported in Appendix E.12.

ADV, TRADES, MART and MMA on WideResNet: We use the pretrained checkpoints provided in their respective repositories. Note that we use the pretrained checkpoint for the PGD10 adversarially trained WideResNet from Madry's CIFAR-10 Challenge (https://github.com/MadryLab/cifar10_challenge).

E.3 SOAR WITH DIFFERENT INITIALIZATIONS

We report the adversarial robustness of the model trained using SOAR with different initialization techniques in Table 7. The second column shows the accuracy against white-box PGD20 adversaries. The third column shows the accuracy against black-box PGD20 adversaries transferred from an independently initialized and standard-trained ResNet-10 model. Note that, despite the high adversarial accuracy against white-box PGD attacks, models trained using SOAR with zero and random initialization perform poorly against transferred attacks. This suggests the presence of gradient masking when using SOAR with zero and random initializations. Evidently, SOAR with PGD1 initialization alleviates the gradient-masking problem.

E.4 EFFECT OF THE NUMBER OF SAMPLED DIRECTIONS

Suppose we slightly modify Eq. (13) to $\ell_{\text{SOAR}}(x, y, n) = \ell(x', y) + \frac{1}{n}\sum_{i=1}^n R(x'; z^{(i)}, h)$ to incorporate the effect of using multiple randomly sampled vectors $z^{(i)}$ in computing the SOAR-regularized loss; the current implementation is equivalent to using $n = 1$. We observed the model at two checkpoints, at the beginning and at the end of SOAR regularization, and the value of the regularized loss remained essentially unchanged as we increased $n$ from 1 to 100.

