COMPOSITE ADVERSARIAL TRAINING FOR MULTIPLE ADVERSARIAL PERTURBATIONS AND BEYOND Anonymous

Abstract

One intriguing property of deep neural networks (DNNs) is their vulnerability to adversarial perturbations. Despite the plethora of work on defending against individual perturbation models, improving DNN robustness against the combinations of multiple perturbations is still fairly under-studied. In this paper, we propose composite adversarial training (CAT), a novel training method that flexibly integrates and optimizes multiple adversarial losses, leading to significant robustness improvement with respect to individual perturbations as well as their "compositions". Through empirical evaluation on benchmark datasets and models, we show that CAT outperforms existing adversarial training methods by large margins in defending against the compositions of pixel perturbations and spatial transformations, two major classes of adversarial perturbation models, while incurring limited impact on clean inputs.

1. INTRODUCTION

Despite their state-of-the-art performance in tasks ranging from computer vision (Szegedy et al., 2016) to natural language processing (Seo et al., 2017), deep neural networks (DNNs) are inherently susceptible to adversarial examples (Szegedy et al., 2014), maliciously crafted samples that deceive target DNNs. A flurry of adversarial attacks have been proposed, which craft adversarial examples via either pixel perturbation (Goodfellow et al., 2015b; Moosavi-Dezfooli et al., 2016; Carlini & Wagner, 2017a) or spatial transformation (Engstrom et al., 2017; Xiao et al., 2018; Alaifari et al., 2019). To defend against such attacks, a line of work attempts to improve DNN robustness by developing new training and inference strategies (Kurakin et al., 2017; Guo et al., 2018; Liao et al., 2018; Tramèr et al., 2018). Yet, existing defenses are often circumvented or penetrated by adaptive attacks (Athalye et al., 2018), while adversarial training (Madry et al., 2018; Shafahi et al., 2019) remains one state-of-the-art defense that still stands against adaptive attacks. Most adversarial training methods are designed primarily for individual attacks, which are either fixed (Madry et al., 2018) or selected from a pre-defined pool (Tramèr & Boneh, 2019; Maini et al., 2020); in realistic settings, however, the adversary is not constrained to individual perturbation models but is free to "compose" multiple perturbation models to construct more powerful attacks. Despite their robustness against individual attacks, DNNs trained using existing methods often fail to defend against such composite attacks (details in § 2). Moreover, existing adversarial training methods focus on pixel perturbation-based attacks (e.g., bounded by ℓ_p-norm balls), while research on training DNNs robust against spatial transformation-based attacks remains limited.
To bridge this striking gap, in this paper we present CAT, a novel adversarial training method able to flexibly integrate and optimize multiple adversarial robustness losses, leading to DNNs robust with respect to multiple individual perturbation models as well as their "compositions". Specifically, CAT assumes an attack model that composes multiple perturbations and, while bounded by the overall perturbation budget, optimally allocates the budget to each iteration. To address the computational challenges of this formulation, we extend recent advances on fast projection onto the ℓ_{p,1} mixed-norm ball (Liu & Ye, 2010; Sra, 2012; Béjar et al., 2019) to our setting, significantly improving optimization efficiency. We validate the efficacy of CAT on benchmark datasets and models. For instance, on MNIST, CAT outperforms alternative adversarial training methods (Tramèr & Boneh, 2019) by over 44% in terms of adversarial accuracy against attacks that combine pixel perturbation and spatial transformation (details in § 4), with comparable clean accuracy and training efficiency. Our contributions can be summarized as follows. First, we demonstrate that a new class of adversarial attacks, which "compose" multiple perturbations, renders most existing adversarial training methods ineffective; second, we propose CAT, the first adversarial training method designed for multiple perturbation models as well as their compositions; third, we validate the efficacy of CAT by comparing it against alternative methods on benchmark datasets and DNNs; finally, we explore the optimization space of composite perturbations, identifying several promising research directions.

2.1. ADVERSARIAL TRAINING

Adversarial training is a class of techniques for training robust DNNs by minimizing the worst-case loss with respect to a given adversarial perturbation model. Formally, let f_θ be a DNN parameterized by θ, ℓ the loss function, and D_train = {x_i, y_i}_{i=1}^n the training set. Adversarial training with respect to an ℓ_p adversary with perturbation magnitude ε is then defined as:

θ* = argmin_θ Σ_i max_{δ ∈ B_p(ε)} ℓ(f_θ(x_i + δ), y_i)    (1)

where B_p(ε) = {δ : ‖δ‖_p ≤ ε} is the ℓ_p-norm ball of radius ε. The inner maximization problem essentially defines the target adversarial attack; for instance, instantiating it as the ℓ_∞ projected gradient descent (PGD) attack leads to the well-known PGD adversarial training. Despite their effectiveness against the considered perturbation models (e.g., ℓ_∞ perturbation), existing adversarial training methods often fail to defend against perturbations they are not designed for (Tramèr & Boneh, 2019). Motivated by this, some recent work explores adversarial training with respect to multiple perturbation models simultaneously.

AVG and MAX - Tramèr & Boneh (2019) propose two methods, AVG and MAX, to aggregate multiple perturbations. Specifically, AVG formulates the robustness optimization as:

θ* = argmin_θ Σ_i Σ_{p ∈ A} max_{δ_p ∈ B_p(ε)} ℓ(f_θ(x_i + δ_p), y_i)    (2)

where A = {1, 2, ∞}. Compared with Eq. 1, Eq. 2 aggregates multiple adversarial perturbations in the inner loop. Instead of averaging over the perturbation models, MAX selects the perturbation resulting in the largest loss:

θ* = argmin_θ Σ_i max_{δ ∈ B_p(ε), p ∈ A} ℓ(f_θ(x_i + δ), y_i)    (3)

If A contains only one perturbation model, Eq. 1, Eq. 2, and Eq. 3 are all equivalent.

MSD - While AVG and MAX achieve varying degrees of robustness to the considered perturbations, it is practically difficult to minimize the worst-case loss with respect to the union of perturbations. To this end, Maini et al.
(2020) propose multiple steepest descent (MSD), which improves MAX (and AVG) along two aspects: first, it selects the largest (or average) perturbation at each inner iteration; second, it applies steepest descent instead of projected gradient descent when generating adversarial inputs. Formally, MSD formulates the optimization at the t-th iteration as:

δ_p^{(t+1)} = Proj_{B_p(ε)}(δ^{(t)} + v_p(δ^{(t)})) for p ∈ A    (4)

δ^{(t+1)} = argmax_{δ_p^{(t+1)}} ℓ(f_θ(x + δ_p^{(t+1)}), y)

where Proj_C(·) is the projection operator onto the convex set C, and v_p(δ) is the steepest descent direction for the ℓ_p perturbation, v_p(δ) = argmax_{‖v‖_p ≤ λ} vᵀ∇ℓ(f_θ(x + δ), y), where λ is the step size.
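A single MSD iteration can be sketched as follows on a toy linear model with logistic loss (a minimal numpy sketch; the model, step size, and the rescaling-based ℓ_1/ℓ_2 projections are illustrative simplifications, not the authors' implementation):

```python
import numpy as np

def msd_step(x, y, w, delta, eps, lam=0.1):
    """One MSD iteration (Eq. 4) on a toy linear model with logistic loss:
    take the steepest-ascent step for each norm, project onto the
    corresponding ball, and keep the candidate with the largest loss.
    The l1/l2 'projections' here are crude rescalings, for brevity."""
    loss = lambda d: np.log1p(np.exp(-y * (w @ (x + d))))
    g = -y * w / (1.0 + np.exp(y * (w @ (x + delta))))  # gradient wrt delta
    # steepest-ascent directions under ||v||_p <= lam
    v_inf = lam * np.sign(g)
    v_2 = lam * g / (np.linalg.norm(g) + 1e-12)
    v_1 = np.zeros_like(g)
    j = np.argmax(np.abs(g))
    v_1[j] = lam * np.sign(g[j])
    candidates = []
    for v, p in [(v_inf, np.inf), (v_2, 2), (v_1, 1)]:
        d = delta + v
        if p == np.inf:
            d = np.clip(d, -eps, eps)              # exact l_inf projection
        else:
            n = np.linalg.norm(d, ord=p)
            d = d if n <= eps else d * (eps / n)   # rescale into B_p(eps)
        candidates.append(d)
    return max(candidates, key=loss)  # keep the worst-case candidate
```

The returned candidate is whichever of the three projected steps maximizes the loss, mirroring the arg-max selection in Eq. 4.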

2.2. COMPOSITE ADVERSARIAL ATTACK

While existing adversarial training methods seem effective against individual perturbation models that are either fixed or selected from a pre-defined pool (i.e., the union of perturbations), in a realistic setting the adversary is able to combine multiple perturbation models to construct more destructive attacks, which we exemplify with a new class of composite adversarial attacks. Intuitively, the attack constructs an adversarial example by applying a sequence of perturbation models {A_i}_{i=1}^m, each A_i bounded by an independent perturbation budget ε_i. Here, we treat A_i as an abstract operator A_i(·, ε_i) (e.g., pixel perturbation or spatial transformation), which applies the corresponding perturbation (bounded by ε_i) to the output of the previous perturbation:

x^{(i)} = x^{(i-1)} + δ_{i-1},  δ_i = A_i(x^{(i)}, ε_i)  for i = 1, ..., m

We can further generalize the attack to a more flexible setting in which the adversary, while bounded by the overall perturbation budget, optimally allocates the budget to each optimization iteration; the details of this generalization are discussed in Appendix B. Figure 1 illustrates samples generated by the composite adversarial attack combining one pixel perturbation (with budget ε_p) and one spatial transformation (with budget ε_f). Given attacks {A_i}_{i=1}^m with A_i bounded by ε_i, in the composite attack we re-scale each perturbation budget by a factor of 1/m (i.e., ε_i/m for A_i) to make the composition of {A_i}_{i=1}^m comparable with the union attack. Intuitively, with a proper setting of {ε_i}_{i=1}^m, the composition of {A_i}_{i=1}^m is strictly stronger than each individual attack as well as their union (detailed proofs in Appendix A). Figure 2 compares the robust accuracy of AVG, MAX, and MSD on the unions and compositions of multiple perturbations.
Observe that while effective against the union attacks, the existing methods fail to defend against the composite attacks, with robust accuracy drops as large as 40%.
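The sequential composition scheme above can be sketched generically (a minimal sketch; the callable interface delta = A(x, eps) and the toy component "attacks" in the usage note are hypothetical):

```python
import numpy as np

def composite_attack(x, attacks, budgets):
    """Apply component attacks A_1..A_m in sequence: each A_i perturbs the
    output of its predecessor under a 1/m re-scaled budget, keeping the
    composition comparable with the union attack."""
    m = len(attacks)
    x_cur = x
    for A, eps in zip(attacks, budgets):
        delta = A(x_cur, eps / m)   # re-scaled budget eps_i / m
        x_cur = x_cur + delta       # feed the perturbed input forward
    return x_cur
```

For instance, with two toy "attacks" that each add a constant eps to every coordinate, `composite_attack(np.zeros(3), [bump, bump], [0.4, 0.2])` returns an input shifted by 0.4/2 + 0.2/2 = 0.3 in each coordinate.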

3. CAT: COMPOSITE ADVERSARIAL TRAINING

We now present CAT, a new adversarial training method to defend against multiple perturbations as well as their compositions.

3.1. FORMULATION

At a high level, CAT adopts the composite adversarial attack as the inner loop of Eq. 1, generating adversarial examples through a sequence of perturbations:

{δ_i*}_{i=1}^m = argmax_{{δ_i}_{i=1}^m} ℓ(f_θ(x + Σ_{i=1}^m δ_i), y)  s.t. ‖δ_i‖ ≤ ε_i for i = 1, ..., m

As a concrete instance, when composing pixel perturbations (i.e., ℓ_1, ℓ_2, and ℓ_∞ perturbations) with budgets ε_1, ε_2, and ε_∞ respectively, the composite adversarial example is produced as x* = x + δ_1 + δ_2 + δ_∞, where we omit the clipping operator.

Next we consider composing pixel perturbation and spatial transformation (Xiao et al., 2018; Alaifari et al., 2019) with budgets ε_p and ε_f respectively. We first give a brief introduction to spatial transformation. Instead of directly perturbing pixel values, spatial transformation displaces pixel coordinates. Formally, the input x is represented as a set of tuples x = {(u_i, v_i, b_i)}_{i=1}^n, where (u_i, v_i) are the coordinates of x's i-th pixel and b_i is its value. We set f = {(Δu_i, Δv_i)}_{i=1}^n (with a slight abuse of notation), where (Δu_i, Δv_i) is the displacement of x's i-th pixel. When we apply f to x to generate the adversarial input x', (u'_i, v'_i) = (u_i + Δu_i, v_i + Δv_i). Typically, bilinear interpolation is applied to handle fractional pixel positions (Jaderberg et al., 2015):

b'_i = Σ_{q ∈ N(u'_i, v'_i)} b_q (1 - |u'_i - u_q|)(1 - |v'_i - v_q|)

where N(u'_i, v'_i) denotes the neighboring points of (u'_i, v'_i). Following Alaifari et al. (2019), we measure the perturbation magnitude as f's ℓ_∞-norm: ‖f‖_∞ = max{max_i |Δu_i|, max_i |Δv_i|}.

α-CAT - As composite adversarial attacks are by nature stronger than individual attacks, setting the perturbation budget overly large in CAT may degrade accuracy on clean inputs. We propose a variant, α-CAT, to mitigate this issue: with a hyper-parameter 0 < α ≤ 1, we re-scale the perturbation budget of each component attack A_i to αε_i during adversarial training.
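The flow-based warp and its bilinear interpolation can be sketched as follows (a minimal numpy sketch for a single-channel image; zero padding outside the border and the plain-Python loops are simplifications, and a real implementation would use a differentiable warp such as a spatial transformer):

```python
import numpy as np

def apply_flow(img, du, dv):
    """Warp an H x W image by per-pixel displacements (du, dv) using the
    bilinear interpolation above: output pixel (i, j) samples the input
    at the displaced coordinate (i + du[i,j], j + dv[i,j])."""
    H, W = img.shape
    out = np.zeros_like(img, dtype=float)
    for i in range(H):
        for j in range(W):
            u, v = i + du[i, j], j + dv[i, j]        # displaced coordinate
            u0, v0 = int(np.floor(u)), int(np.floor(v))
            for uu, vv in [(u0, v0), (u0, v0 + 1), (u0 + 1, v0), (u0 + 1, v0 + 1)]:
                if 0 <= uu < H and 0 <= vv < W:
                    # bilinear weight (1 - |u - u_q|)(1 - |v - v_q|)
                    out[i, j] += img[uu, vv] * (1 - abs(u - uu)) * (1 - abs(v - vv))
    return out
```

A uniform flow du ≡ 1, dv ≡ 0 shifts the image by one pixel and has budget ‖f‖_∞ = 1 under the norm defined above.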

CAT-r

One issue with the above α-CAT formulation is the trade-off between the strength of the individual component attacks and clean accuracy. With a smaller α, we expect high clean accuracy; however, the component attacks might not be strong enough to cover the original perturbation budgets. For instance, if we take α = 1/3, the trained model may have low robust accuracy against each individual attack, and hence against their union. On the contrary, a very large α (close to 1) causes a significant drop in clean accuracy, making the robust model less useful. Our workaround is to use a smaller α to preserve clean accuracy, while sampling component attacks with replacement during adversarial training to enhance robustness against individual attacks. Under this implementation, we may sample multiple copies of the same attack type, which corresponds to an attacker with a larger perturbation budget for that type.
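The re-scaling and sampling-with-replacement step can be sketched as follows (the (name, eps) attack-pool format and the helper name are hypothetical illustrations of the scheme, not the paper's code):

```python
import random

def sample_components(attacks, alpha, m=None, rng=None):
    """CAT-r sketch: scale every component budget by alpha (< 1 preserves
    clean accuracy), then draw the training-time attack sequence WITH
    replacement, so one attack type can appear several times, emulating
    a larger effective budget for that type."""
    rng = rng or random.Random(0)
    m = m if m is not None else len(attacks)
    pool = [(name, alpha * eps) for name, eps in attacks]
    return [rng.choice(pool) for _ in range(m)]
```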

4. EMPIRICAL EVALUATION

We empirically evaluate the efficacy of CAT in various settings. All the experiments are performed on the MNIST and CIFAR10 datasets. Specifically, our results convey two key messages. First, CAT trains DNNs robust against both composite and union perturbations in the ℓ_p space (ℓ_1, ℓ_2, and ℓ_∞), suggesting that composite adversarial robustness is a generalization of adversarial robustness with respect to the union of multiple perturbations. Second, CAT is able to train DNNs robust against both pixel perturbation (e.g., ℓ_∞ perturbation) and free-form spatial transformation.

Models and Hyper-parameters - On MNIST, we use the network architecture used by both Maini et al. (2020) and Madry et al. (2018): a DNN consisting of 4 convolutional layers followed by 2 fully connected layers. On CIFAR10, we use a pre-activation version of ResNet32 (He et al., 2016), built from 16 residual blocks followed by 1 global average pooling layer and 1 fully connected layer.

Setting of α - As discussed in § 3.2, to minimize the performance degradation on clean inputs, we re-scale the allowed perturbation magnitude uniformly by α. To set α properly, we conduct a grid search within {1.0, 0.9, 0.8, 0.7, 0.6, 0.5} and select the optimal α value by optimizing the trained model's robust accuracy (under the union threat model) and clean accuracy (with less than a 10.0% drop from all baselines under the union threat model). We implement all the algorithms in PyTorch and run all the experiments on a single Nvidia RTX 6000 GPU. The detailed setting of models and hyper-parameters is summarized in Appendix B.

4.1. ROBUSTNESS IN p SPACE

In the first part, we evaluate CAT and the baselines on regular pixel perturbation-based attacks.

Baselines - We compare CAT with MAX (worst-case perturbation) and AVG (average-case perturbation) (Tramèr & Boneh, 2019), as well as MSD (Maini et al., 2020). We also include robust DNNs trained with PGD attacks with respect to individual perturbations.

Component Attacks for CAT - We consider 3 commonly used ℓ_p perturbations in CAT, namely ℓ_∞, ℓ_2, and ℓ_1. For the ℓ_∞ attack, we use the ℓ_∞ PGD attack (Madry et al., 2018). For the ℓ_2 attack, we implement an ℓ_2 PGD adversary. For the ℓ_1 attack, we use the enhanced ℓ_1 attack proposed in Maini et al. (2020).

Attacks Used for Evaluation - To evaluate the robustness of DNNs trained by CAT and the baselines, we consider a collection of representative adversarial attacks. Individual perturbations - For ℓ_∞, we consider both the ℓ_∞ PGD attack and the Fast Gradient Sign attack (Goodfellow et al., 2015a); for ℓ_2, we consider ℓ_2 PGD, DeepFool (Moosavi-Dezfooli et al., 2016), the C&W attack (Carlini & Wagner, 2017b), and the Salt&Pepper attack (Rauber et al., 2017); for ℓ_1, we use the ℓ_1 PGD attack. Combined perturbations - We further consider both unions and compositions of multiple perturbations. Following Maini et al. (2020), in the union threat model the adversary applies all the attacks to a given input and is considered successful if any one of the attacks succeeds. In the composite threat model, we consider a set of composite attacks with A_1 = ℓ_∞-PGD, A_2 = ℓ_2-PGD, and A_3 = ℓ_1-PGD. For the composite attacks, we measure robust accuracy under different settings of the re-scaling factor α, defined analogously to α-CAT.

Results - Tables 1 and 2 summarize the results with respect to pixel perturbation-based attacks on CIFAR10 and MNIST. The perturbation budget for each type of ℓ_p attack is shown in the tables. To measure the models' performance on clean inputs, we use their accuracy on the test set.
To evaluate the robustness of each model, we measure the robust accuracy of all the models on a random sample of 1,000 test inputs. The robust accuracy is defined as the fraction of test inputs that are classified correctly and remain correctly predicted under every attack. In reporting robust accuracy, we aggregate the results of all the attacks for each target norm: an input counts as correctly predicted under the ℓ_p norm if and only if all the ℓ_p attacks listed above fail on it. Similarly, for the union setting, all the attacks except the combined perturbations are considered. On CIFAR10, with a slight degradation of clean accuracy, CAT achieves the same robust accuracy as MSD under the union of the three ℓ_p perturbations and outperforms all other baselines. Furthermore, CAT is much more robust against composite adversarial attacks when α = 0.5, outperforming MSD and the other baselines by over 10%. On MNIST, PGD-ℓ_∞ (P_∞) achieves the best performance across all the settings, which is however attributed to gradient masking effects (Tramèr & Boneh, 2019). In Appendix C, we show the robust accuracy of the models in discussion. Across all the other methods, CAT performs only slightly worse than MAX, AVG, and MSD under the union threat model. Under the composite threat model (α = 0.5), CAT outperforms the baselines in terms of robust accuracy by margins over 20%. These results indicate that CAT assumes a strong adversarial attack model during the training process and produces DNNs robust not only against individual perturbations (and their unions) but also against their compositions.
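The aggregation rule above can be stated compactly in code (a sketch; the boolean-array layout is an assumption for illustration):

```python
import numpy as np

def robust_accuracy(correct_clean, attack_success):
    """An input counts as robust only if it is classified correctly on
    the clean input AND every attack in the suite fails on it.
    attack_success: (n_attacks, n_inputs) boolean array."""
    survived = correct_clean & ~attack_success.any(axis=0)
    return survived.mean()
```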

4.2. ROBUSTNESS AGAINST PIXEL PERTURBATIONS AND SPATIAL TRANSFORMATIONS

Next we evaluate CAT and the baselines against compositions of pixel perturbations and spatial transformations.

Component Attacks - For pixel perturbation, we consider the ℓ_∞ PGD attack. For spatial transformation, we consider a projected gradient descent approach. The key differences between this formulation and ADef (Alaifari et al., 2019) are that 1) PGD is much faster than the DeepFool-style procedure used in ADef, and 2) it uses free-form flows instead of smoothed flows.

Attacks Used for Evaluation - We evaluate CAT and the baselines against the above individual perturbations as well as their unions and compositions.

Baselines - Since MSD is only applicable to pixel perturbation-based attacks, we consider AVG and MAX as the baselines. In addition, we use two DNNs adversarially trained with the above pixel perturbation and spatial transformation respectively, which we refer to as P_pixel and P_flow.

Results - Tables 3 and 4 summarize the results. We have the following observations. First, CAT achieves similar (on MNIST) or even better (on CIFAR10) robust accuracy than the baselines under the union threat model. Second, CAT outperforms all the baselines under the composite threat model by large margins. For instance, on MNIST, the robust accuracy of all the baselines drops close to 0 even with α = 0.5 (i.e., half of the specified perturbation budget); in contrast, CAT attains 46% robust accuracy.

4.3. ROBUSTNESS AGAINST COMPOSITE ATTACKS WITH VARYING α

Thus far, we have assumed that the defender and the attacker use the same setting of the re-scaling factor (α = 0.5) when attacking the DNNs. Next, we evaluate the impact of the attacker varying α on robust accuracy. Figure 4 summarizes the results on CIFAR10 under α = 0.5, 0.8, and 1.0 (corresponding to increasingly stronger attacks). As expected, both CAT and the baselines experience performance degradation under larger α; yet CAT still consistently outperforms all the baselines by large margins across all settings.

5. EXPLORING THE SPACE OF COMPOSITE PERTURBATIONS

In this section, we empirically study critical properties of composite adversarial attacks and CAT. One particular aspect is the impact of the ordering of component attacks on adversarial training. We also consider an even stronger composite attack which, bounded by the overall perturbation budget, optimally allocates the budget to each round.

5.1. ORDERING OF COMPONENT ATTACKS

We evaluate the impact of the ordering of component attacks both in the setting of ℓ_p perturbations only and under compositions of pixel perturbations and spatial transformations.

ℓ_p Perturbations - In this set of experiments, we consider compositions of the ℓ_∞-PGD and ℓ_2-PGD attacks. Tables 5 and 6 summarize the performance of CAT under compositions of ℓ_∞ and ℓ_2 perturbations. We have the following observations. First, flipping the two attacks in CAT has little impact on both the clean accuracy and the robust accuracy of CAT. Second, as shown in the last two columns of each table, the effectiveness of composite adversarial attacks also appears independent of the ordering of component attacks.

Table 7: Impact of the ordering of component attacks (pixel and spatial perturbations) in CAT on MNIST and CIFAR10 (A1-A2 denotes the ordering of component attacks used in CAT and the composite attacks; the perturbation budget is the same as in Tables 3 and 4).


Pixel and Spatial Perturbations - Similarly, we empirically evaluate the impact of the ordering of pixel and spatial perturbations. Here we follow the same setup as § 4 (the same attacks and α for each dataset). The results are summarized in Table 7. We observe that on CIFAR10 the ordering has fairly limited impact, as in the ℓ_p case; however, first applying the pixel perturbation and then the spatial transformation results in a more robust model on MNIST. We leave the study of the root cause of this interesting phenomenon as ongoing work.

5.2. ROBUSTNESS AGAINST FINE-GRAINED COMPOSITE ADVERSARIES

Finally, we consider an extension of the basic composite adversarial attack (in which each perturbation is applied only once) to a multiple-round setting. Under this setting, the adversary, while bounded by the overall perturbation budget, is able to optimally allocate the budget to each iteration, leading to even stronger attacks.

Formulation and Solution - We sketch the differences between CAT and K-round CAT here, with full details deferred to Appendix B. Perturbation Accounting - For each component perturbation A_i, instead of seeking a single δ_i, we divide it into K parts δ_{i,k} for k = 1, ..., K. We measure the overall perturbation cost as the sum of {‖δ_{i,k}‖_p}_{k=1}^K; in other words, the new constraint is Σ_{k=1}^K ‖δ_{i,k}‖_p ≤ ε_i. Optimization Solution - This new constraint essentially specifies an ℓ_{p,1} mixed-norm ball, so we can still solve the optimization problem with projected gradient descent. We extend recent advances (Liu & Ye, 2010; Béjar et al., 2019) to solve the projection for the cases of p = 2 and p = ∞ respectively (details in Appendix B).

Results - Figure 5 displays how the K-round composite attack impacts the robust accuracy of CAT under the settings of ℓ_p perturbations only and pixel plus spatial perturbations. As K increases, the robust accuracy decreases by 4% and 7% on MNIST and CIFAR10 respectively under the setting of pixel plus spatial perturbations, while the decrease is much less evident under ℓ_p perturbations only, indicating that CAT is fairly robust to K-round composite attacks. Moreover, by incorporating K-round composite attacks into the adversarial training of CAT, we expect further robustness improvement.
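The per-component budget accounting can be sketched as follows (a minimal feasibility check of the ℓ_{p,1} constraint, not the projection itself):

```python
import numpy as np

def within_k_round_budget(deltas, eps, p):
    """l_{p,1} budget check of the K-round attack: `deltas` stacks the K
    per-round perturbations of ONE component attack as rows; their l_p
    norms must sum to at most eps."""
    return float(np.sum(np.linalg.norm(deltas, ord=p, axis=1))) <= eps + 1e-9
```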

6. CONCLUSION

While effective against individual perturbation models, existing adversarial defenses often fail to defend against combinations of multiple perturbations. In this paper, we first present a new class of composite attacks that combine multiple perturbations and penetrate state-of-the-art defenses. We then propose composite adversarial training (CAT), a novel training method that improves DNN robustness not only against individual perturbations but also against their compositions. Empirical evaluation on benchmark datasets and models shows its promising performance.

C K-ROUND COMPOSITE ADVERSARIAL ATTACK

We describe the detailed formulation of the multiple-round composite adversarial attack and the technical tools used to solve it.

C.1 FORMULATION

We denote the m component attacks as A_1, ..., A_m, with perturbation budgets ε_1, ..., ε_m respectively. Unlike the definition in § 2, here we represent each attack A_i as A_i(x, δ_i), which denotes applying perturbation δ_i to x with the mechanism of A_i. The K-round attack generalizes the composite adversarial attack in the following sense: the attack runs in K rounds, and at the k-th round it performs a composite attack with perturbations δ_{1,k}, ..., δ_{m,k}. The constraint on the adversary is the overall magnitude of the perturbations spent by each attack. Formally,

(δ_1*, ..., δ_m*) = argmin_{δ_i} ℓ(x_K, y)    (13)

s.t.  x_0 = x;  x_k = A_m(... A_2(A_1(x_{k-1}, δ_{1,k}), δ_{2,k}) ..., δ_{m,k})  for k = 1, ..., K;  Σ_{k=1}^K ‖δ_{i,k}‖ ≤ ε_i  for i = 1, ..., m

where x_k is the perturbed sample after the k-th round and δ_i = (δ_{i,1}, ..., δ_{i,K}) is the concatenation of attack i's perturbations across rounds. Observe that by specifying K, Eqn 13 instantiates a spectrum of attacks. As K approaches infinity, Eqn 13 essentially considers all finite ways of allocating the total budgets {ε_i}_{i=1}^m across the m perturbation mechanisms.

C.2 DERIVATION

Now we present an iterative projected gradient descent algorithm to solve Eqn (13). We use the superscript (t) to denote the value of a variable at the t-th iteration, and randomly initialize δ_i^{(0)}. At the t-th iteration, the update rule is:

g_i^{(t)} ← ∂ℓ(x_K^{(t)}, y) / ∂δ_i^{(t)}    (14)

δ_i^{(t+1)} ← Π_{{δ_i : Σ_{k=1}^K ‖δ_{i,k}‖ ≤ ε_i}}(δ_i^{(t)} - αg_i^{(t)})    (15)

where α is the learning rate and Π_S(·) is the projection operator onto a convex set S. The computation of Eqn (14) is straightforward, so we focus on the projection operator in Eqn (15). Without loss of generality, we omit the subscript i for simplicity. Let V be the matrix whose k-th row is δ_k (for k = 1, ..., K); the constraint Σ_{k=1}^K ‖δ_k‖ ≤ ε can then be rewritten as Σ_{k=1}^K ‖v_k‖ ≤ ε. Additionally, we suppose the perturbation of the i-th component attack is measured with an ℓ_p norm for some p ∈ {1, 2, ∞}, which holds for all the experiments in this paper. We thus arrive at the ℓ_{p,1} mixed-norm of V, denoted ‖V‖_{p,1}, a special case of the ℓ_{p,q} mixed-norm: ‖V‖_{p,q} = (Σ_{k=1}^K ‖v_k‖_p^q)^{1/q}. Hence, Eqn (15) reduces to a projection onto an ℓ_{p,1} mixed-norm ball. For p = 1, the calculation is straightforward since it reduces to the ℓ_1 projection of the concatenated vector. We treat p = 2 and p = ∞ next.
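For reference, the sorting-based ℓ_1-ball projection that the later Algorithm 2 builds on can be sketched as follows (the standard O(n log n) routine; a reference implementation, not the authors' optimized one):

```python
import numpy as np

def project_l1_ball(v, r):
    """Euclidean projection of v onto the l1 ball of radius r via sorting."""
    if np.abs(v).sum() <= r:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]       # magnitudes in non-increasing order
    css = np.cumsum(u)
    # largest k (1-indexed) with u_k * k > css_k - r, i.e. h_k < r
    k = np.nonzero(u * np.arange(1, len(u) + 1) > css - r)[0][-1]
    tau = (css[k] - r) / (k + 1.0)     # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)
```

For example, projecting (3, 1) onto the ℓ_1 ball of radius 2 soft-thresholds at τ = 1 and yields (2, 0).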

C.3 IMPLEMENTATION FOR p = 2

We leverage Algorithm 1 in (Sra, 2012) to compute the ℓ_{2,∞} mixed-norm of an input x. Interested readers can find more details in (Sra, 2012).

C.4 IMPLEMENTATION FOR p = ∞

We leverage the method proposed by Béjar et al. (2019), which solves the proximal operator of the ℓ_{1,∞} mixed-norm using an active-set approach and attains better efficiency than previous methods (Quattoni et al., 2009; Gustavo et al., 2018; Chau et al., 2019). However, as this is a primitive procedure in CAT, executed for hundreds of iterations, the basic implementation of Béjar et al. (2019) is not scalable to our setting. We now improve its scalability for CAT, with the original algorithm sketched in Algorithm 1. The method is based on computing the proximal operator of the ℓ_{1,∞} mixed-norm, the dual form of the ℓ_{∞,1} mixed-norm ball. We define U = abs(V). Given an initial radius r of the ℓ_1 ball for each row u_i, the algorithm iteratively improves r by first projecting each row whose ℓ_1-norm exceeds r onto the ℓ_1 ball of radius r (lines 4 to 8) and then computing a larger r based on the nonzero elements of the projected rows (line 9). In particular, Algorithm 1 uses sorting to solve the ℓ_1-norm projection problem, as detailed in Algorithm 2. Our more efficient implementation is based on two key observations.

Observation 1 - The projection radius r (line 6) in Algorithm 1 increases at every iteration. Thus, each row v faces a sequence of queries with increasing radii r_1 ≤ ... ≤ r_T, where T is the number of queries. Following the notation of Algorithm 2, let u denote v sorted in non-increasing order of magnitude, and define h_k = Σ_{j=1}^k u_j - k u_k. Note that h is monotonically non-decreasing:

h_{k+1} - h_k = u_{k+1} - (k+1) u_{k+1} + k u_k = k(u_k - u_{k+1}) ≥ 0

Leveraging this observation, we optimize Algorithm 2 as follows. At line 2, K is the largest index among 1, ..., N such that r > h_K; since both h and the radii are increasing, a single pass over h_k suffices to find the optimal K_t for all radii r_t. At line 3, we pre-process the partial sums once per row before the main loop.
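Observation 1 can be turned into code: since both h_k and the queried radii are non-decreasing, the optimal support sizes K_t for all queries are recovered in one joint pass (a sketch under our notation; `optimal_ks` is an illustrative helper, not the paper's exact routine):

```python
import numpy as np

def optimal_ks(v, radii):
    """For non-decreasing radii r_1 <= ... <= r_T, return the optimal
    support size K_t (largest k with h_k < r_t) of the l1 projection of
    v, using a single scan over h_k = sum_{j<=k} u_j - k*u_k."""
    u = np.sort(np.abs(v))[::-1]                      # non-increasing
    h = np.cumsum(u) - np.arange(1, len(u) + 1) * u   # h_k, non-decreasing
    ks, k = [], 0
    for r in radii:               # radii assumed sorted non-decreasingly
        while k + 1 < len(u) and h[k + 1] < r:
            k += 1                # pointer only ever moves forward
        ks.append(k + 1)          # 1-indexed K_t
    return ks
```

The amortized cost over all T queries is O(N + T) per row, instead of O(N) per query.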
Observation 2 - To update the radius r of the ℓ_1-norm ball in Algorithm 1, we need the statistics of lines 7-9. A critical observation is that explicitly computing the projected vector at every iteration is not necessary to obtain these statistics. Specifically, line 7 computes the number of non-zero elements J_i of the projected i-th row, and line 9 updates r using J_i and the sums Σ_{j=1}^n x_{i,j} of the projected rows. Combining this with the previous observation, if K_{i,t} is the optimal K (line 2 of Algorithm 2) for the t-th radius r_t and the i-th row, it holds that |J_i| = K_{i,t} and

Σ_{j ∈ J_i} u_{i,j} = Σ_{{j : x_{i,j} > 0}} (x_{i,j} + τ_{i,t}) = r_t + K_{i,t} τ_{i,t}

where τ_{i,t} is calculated as in line 3 of Algorithm 2 for the i-th row with radius r_t. Thus, we can avoid computing the intermediate ℓ_1 projections altogether. (Algorithm 1, the proximal operator of the mixed ℓ_{1,∞} norm from Béjar et al. (2019), takes an m × n matrix V as input, outputs X, and starts from U ← abs(V); we omit the full listing here.)

D ADDITIONAL RESULTS

D.1 DECISION-BASED ATTACKS ON MNIST

Table 9 summarizes the results of all the models in Table 2 under two decision-based attacks: the ℓ_2-Pointwise attack and the ℓ_1-Pointwise attack (Schott et al., 2019). One may notice the inferior performance of the P_∞ model on these black-box attacks compared to its performance under the white-box attacks in Table 2. In summary, we find strong gradient masking effects within this model.



https://github.com/bethgelab/foolbox https://github.com/locuslab/robust_union



Figure 1: Samples produced by composite attacks on CIFAR10 (ε_p = 0.015, ε_f = 0.175, untargeted).

Figure 2: Adversarial accuracy of AVG, MAX, and MSD w.r.t. unions and compositions of A = {ℓ_1, ℓ_2, ℓ_∞}.

Figure 3: Comparison of different adversarial training frameworks.

3.2 DISCUSSION

Figure 3 compares different adversarial training methods (AVG, MAX, MSD, and CAT). The design of CAT enjoys two major benefits. First, by definition, the composite adversarial attack naturally covers the strongest individual attack as well as the union of these attacks, as demonstrated in the two instantiations above. Second, CAT generalizes adversarial robustness from individual attacks, which are either fixed or selected from a fixed pool, to their compositions.

(a) CIFAR10: p space (b) CIFAR10: pixel and spatial perturbations Figure 4: Robust accuracy of CAT and baselines on CIFAR10 dataset under composite adversarial attacks with re-scaling factor α=0.5, 0.8, and 1.0.

Figure 5: Robust accuracy versus the number of rounds K on MNIST and CIFAR10 under composite adversarial attacks.


Table 1: Performance of CAT and baselines on CIFAR10 with ℓ_p perturbations (p ∈ {1, 2, ∞}). Rows represent attacks, and columns denote robustly trained models. P_∞, P_2, and P_1 are models adversarially trained with PGD attacks of the corresponding norms.

Performance of CAT and baseline methods on MNIST with ℓp perturbations (p = 1, 2, ∞). Rows represent attacks; columns denote robustly trained models. P∞, P2, and P1 are models adversarially trained with PGD attacks under the corresponding norms.

Performance of CAT and baselines on MNIST with respect to pixel and spatial perturbations. Rows represent attacks; columns denote robustly trained models.

Performance of CAT and baselines on CIFAR10 with respect to pixel and spatial perturbations. Rows represent attacks; columns denote robustly trained models.

Impact of the ordering of component attacks in CAT on MNIST under compositions of ℓ∞ and ℓ2 perturbations (A1 - A2 is the ordering used in CAT and the composite attacks).

Impact of the ordering of component attacks in CAT on CIFAR10 under compositions of ℓ∞ and ℓ2 perturbations (A1 - A2 is the ordering used in CAT and the composite attacks).

Performance of CAT and baselines on CIFAR100 with ℓp perturbations (p = 1, 2, ∞). Rows represent attacks; columns denote robustly trained models. P∞, P2, and P1 are models adversarially trained with PGD attacks under the corresponding norms.

Performance of CAT and baselines on CIFAR100 with respect to pixel and spatial perturbations. Rows represent attacks; columns denote robustly trained models. Pflow and Ppixel are models trained with PGD attacks in the corresponding perturbation spaces.

Computational Cost of CAT and baseline methods.

A PROOFS

Proof. (Lemma 1) The adversarial loss of a given input (x, y) with respect to A_i is defined as

    ℓ_adv(x, y; A_i) = max_{x' ∈ B_i(ε_i)} ℓ(f(x'), y),

where B_i(ε_i) is the feasible set for A_i with bound ε_i. Meanwhile, the adversarial loss with respect to the union of {A_i}_{i=1}^m is defined as

    ℓ_adv(x, y; ∪_{i=1}^m A_i) = max_{1 ≤ i ≤ m} max_{x' ∈ B_i(ε_i)} ℓ(f(x'), y).

Given that A_i is arbitrarily chosen, the union attack is stronger than each component attack. Therefore, we only need to show that the composite attack is stronger than the union attack. We prove this by showing that the feasible set of the composite attack is larger than that of the union attack. To simplify the discussion, we consider two ℓp attacks A_1 and A_∞ with budgets ε_1 and ε_∞, respectively, and let d be the data dimensionality.

Proof. (Lemma 2) First, it can be verified that under the constraint ε_∞ < ε_1 < d·ε_∞, the d-dimensional norm balls of radii ε_1 and ε_∞ intersect, but neither completely contains the other. Figure 6 shows the case d = 2.

The feasible set B_u for the union attack is given by

    B_u = B_1(ε_1) ∪ B_∞(ε_∞),

where B_p(ε_p) denotes the feasible set for A_p (p = 1, ∞). The volume of B_u is given by

    Vol(B_u) = Vol(B_1(ε_1)) + Vol(B_∞(ε_∞)) − Vol(B_1(ε_1) ∩ B_∞(ε_∞)).

In comparison, the feasible set B_c for the composite attack (with budgets ε_1/2 and ε_∞/2 for A_1 and A_∞) consists of the perturbations reachable by applying the two component attacks in sequence, i.e., the Minkowski sum

    B_c = B_1(ε_1/2) ⊕ B_∞(ε_∞/2) = {δ_1 + δ_∞ : ‖δ_1‖_1 ≤ ε_1/2, ‖δ_∞‖_∞ ≤ ε_∞/2}.

Given the constraint, it can be verified that Vol(B_c) > Vol(B_u), indicating that the composite attack entails a larger perturbation space than the union attack. This result can be generalized to the cases of other ℓp attacks and more than two component attacks.
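As an illustrative numerical check of Lemma 2 (not part of the original proof), both volumes can be estimated by Monte Carlo sampling for d = 2. The membership test for the Minkowski sum uses the fact that z ∈ B_1(ε_1/2) ⊕ B_∞(ε_∞/2) iff soft-thresholding |z| by ε_∞/2 leaves ℓ1 mass at most ε_1/2; the specific budget values below are arbitrary choices satisfying the constraint:

```python
import numpy as np

# Budgets satisfying eps_inf < eps_1 < d * eps_inf for d = 2
eps_1, eps_inf, d = 1.5, 1.0, 2

def in_union(z):
    # z in B_1(eps_1) or z in B_inf(eps_inf)
    return (np.abs(z).sum(axis=-1) <= eps_1) | (np.abs(z).max(axis=-1) <= eps_inf)

def in_composite(z):
    # z in B_1(eps_1/2) (+) B_inf(eps_inf/2): soft-threshold |z| by eps_inf/2,
    # then check whether the remaining l1 mass fits within the l1 budget
    return np.maximum(np.abs(z) - eps_inf / 2, 0).sum(axis=-1) <= eps_1 / 2

rng = np.random.default_rng(0)
half = max(eps_1, eps_inf)              # bounding-box half-width covering both sets
z = rng.uniform(-half, half, size=(500_000, d))
box = (2 * half) ** d
vol_u = in_union(z).mean() * box        # analytically 5.0 for these budgets
vol_c = in_composite(z).mean() * box    # analytically 5.125 for these budgets
```

With these budgets the composite feasible set is indeed strictly larger (5.125 vs. 5.0 in area), matching the claim of the lemma.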

B EXPERIMENT SETTINGS

We present the detailed settings of CAT, the baseline methods, and the attacks used in § 4.

Table 8: The re-scaling factor α for the CAT models presented in the paper.

Case | MNIST ℓp | CIFAR10 ℓp | MNIST pixel and spatial | CIFAR10 pixel and spatial
α    | 0.8      | 0.8        | 1.0                     | 0.7

B.1 MODEL TRAINING

We reuse a few pre-trained models from the repositories provided by previous work. The remaining models are trained with the following setups for each dataset.

• MNIST. The models are optimized with the Adam optimizer (Kingma & Ba, 2015). The learning rate increases linearly from 0 to 0.001 over the first 6 epochs, then decreases linearly to 0 over the last 9 epochs.

• CIFAR10. The models are optimized with SGD with momentum 0.9. The learning rate increases linearly from 0 to 0.1 over the first 20 epochs, decreases linearly to 0.005 over the next 20 epochs, and then decays linearly to 0 over the last 10 epochs. In addition, we regularize the models with an ℓ2 weight decay of 5 × 10^-4.
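The CIFAR10 schedule above can be written as a small piecewise-linear function. This is a sketch under the stated breakpoints; the exact per-iteration interpolation used for the paper's models may differ:

```python
def cifar10_lr(epoch):
    """Piecewise-linear learning rate for a 50-epoch CIFAR10 run:
    0 -> 0.1 over epochs [0, 20], 0.1 -> 0.005 over (20, 40],
    and 0.005 -> 0 over (40, 50]."""
    if epoch <= 20:
        return 0.1 * epoch / 20
    if epoch <= 40:
        return 0.1 + (0.005 - 0.1) * (epoch - 20) / 20
    return 0.005 * (1 - (epoch - 40) / 10)
```

Such a function can be plugged into a scheduler such as PyTorch's `LambdaLR` (with a base learning rate of 1.0, since `LambdaLR` treats the function's output as a multiplier).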

B.2 CAT

• ℓp cases. We run the composite attack during CAT model training with a step size of 0.1 × ε_p for each p ∈ {1, 2, ∞}. The number of attack iterations is 50.

• Pixel and spatial perturbations. We run the composite attack during CAT model training with step sizes of 0.1 × ε_p and 0.1 × ε_f for the pixel and spatial perturbations, respectively. The number of attack iterations is 40.

We also report the re-scaling factor α for the models presented in the main text in Table 8.

B.3 ADVERSARIAL ATTACKS

In the evaluation, we run all attacks except C&W with 5 random restarts. Below we summarize all the attacks used in the evaluation and their hyper-parameters.

• Attacks from Foolbox. We use the following attacks from Foolbox 3.1.1 (1): ℓ∞ PGD, ℓ2 PGD, ℓ1 PGD, Fast Gradient Sign Method, DeepFool, C&W, and Salt&Pepper, with their default Foolbox settings.

• Our implementation of ℓp PGD attacks. We also implement PGD attacks for the three ℓp norms ourselves, which achieves a higher attack success rate than the Foolbox versions. For ℓ∞ PGD, we run 200 iterations with a step size of 0.1 × ε. For ℓ2 PGD, we run 500 iterations with a step size of 0.05 × ε. Our ℓ1 PGD attack is based on Section A.1 of Maini et al. (2020), where we set the range [k1, k2] of the number of pixels to modify in each iteration to k1 = 5 and k2 = 20; this attack runs 200 iterations with a step size of 0.05 × ε.

• The PGD spatial perturbation attack. We run this attack with 200 iterations and a step size of 0.1 × ε_f.

• Composite adversarial attacks. We run these attacks with 200 iterations and step sizes of 0.1 × ε for all component attacks.
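The composite attack loop can be illustrated with a minimal numpy sketch on a linear scorer (a toy stand-in for the network). The per-component step sizes, the use of the re-scaling factor α, and the loop structure are assumptions for illustration, not the authors' exact implementation:

```python
import numpy as np

def composite_attack_linear(x, y, W, eps_inf=0.1, eps_2=0.5, steps=200, alpha=0.5):
    """Toy composite attack on a linear scorer s = W @ x with untargeted
    loss -s[y]: each round applies an l_inf PGD step and an l_2 PGD step,
    each kept inside its (alpha-rescaled) budget."""
    d_inf = np.zeros_like(x)
    d_2 = np.zeros_like(x)
    g = -W[y]                              # input gradient (constant for a linear scorer)
    for _ in range(steps):
        # l_inf component: signed ascent step, clipped to the l_inf ball
        d_inf = np.clip(d_inf + 0.1 * eps_inf * np.sign(g),
                        -alpha * eps_inf, alpha * eps_inf)
        # l_2 component: normalized ascent step, projected onto the l_2 ball
        d_2 = d_2 + 0.05 * eps_2 * g / (np.linalg.norm(g) + 1e-12)
        norm = np.linalg.norm(d_2)
        if norm > alpha * eps_2:
            d_2 *= alpha * eps_2 / norm
    return x + d_inf + d_2
```

For a real network, `g` would be recomputed from the model's loss gradient at the current perturbed input in every iteration.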

B.4 BASELINES

• ℓp perturbations. We use the pre-trained robust models provided by MSD (Maini et al., 2020) (2) for all the baseline methods, including P∞, P2, P1, AVG, MAX, and MSD.

• Pixel and spatial perturbations. The model adversarially trained with ℓ∞ PGD is the same as in the ℓp perturbation case. For Pflow, the number of iterations for the spatial attack is 50, and the step size is set to 0.1 × ε_f. The same rule applies to AVG and MAX on both attacks.

Algorithm 2: Projection of v ∈ R^N onto the simplex {x : Σ_{n=1}^N x_n = r, x_n ≥ 0}.

D.3 PRELIMINARY RESULTS FOR CIFAR 100 DATASETS

Table 10 and Table 11 present the performance of the baseline models and CAT on the CIFAR100 dataset. Similar to the previous results, with a slight reduction in clean accuracy, CAT outperforms MAX and AVG as well as models adversarially trained with plain PGD. MSD is temporarily excluded here because it takes an extremely long time to train; we will add MSD's results when they are available.

D.4 COMPUTATIONAL EFFICIENCY OF CAT

We describe the computational cost of MAX, AVG, and CAT here. MSD is excluded because its official implementation requires far more time; we will consider adding its results with our own implementation later. All experiments use the CIFAR10 dataset under the ℓp setting. The batch size of all experiments is 50, and the total number of epochs is 50. The results are in Table 12.

