CAT: COLLABORATIVE ADVERSARIAL TRAINING

Abstract

Adversarial training can improve the robustness of neural networks. Previous adversarial training methods focus on a single training strategy and do not consider collaboration between different strategies. In this paper, we find that different adversarial training methods have distinct robustness on individual sample instances. For example, an instance can be correctly classified by a model trained using standard adversarial training (AT) but not by a model trained using TRADES, and vice versa. Based on this phenomenon, we propose a collaborative adversarial training framework to improve the robustness of neural networks. Specifically, we simultaneously train two robust models from scratch using different adversarial training methods. The adversarial examples generated by each network are input to the peer network, and the peer network's logits are used to guide the other network's training. Collaborative Adversarial Training (CAT) improves both robustness and accuracy. Extensive experiments on CIFAR-10 and CIFAR-100 validate the effectiveness of our method: CAT achieves new state-of-the-art robustness on CIFAR-10 under the Auto-Attack benchmark 1 without using any additional data.

1. INTRODUCTION

With the development of deep learning, Deep Neural Networks (DNNs) have been applied to various fields, such as image classification (He et al., 2016), object detection (Redmon et al., 2016), and semantic segmentation (Pal & Pal, 1993), and have obtained state-of-the-art performance. However, recent research has found that DNNs are vulnerable to adversarial perturbations (Goodfellow et al., 2014): a finely crafted adversarial perturbation by a malicious agent can easily fool a neural network. This raises security concerns about the deployment of neural networks in security-critical areas such as autonomous driving (Chen et al., 2019) and medical diagnostics (Kong et al., 2017). To cope with this vulnerability, different types of methods have been proposed to improve the robustness of neural networks, including adversarial training (Madry et al., 2017), defensive distillation (Papernot et al., 2016), feature denoising (Xie et al., 2019), and model pruning (Madaan et al., 2020). Among them, Adversarial Training (AT) is the most effective method for improving adversarial robustness. AT can be regarded as a data augmentation strategy that trains neural networks on adversarial examples crafted from natural examples. It is usually formulated as a min-max problem, where the inner maximization generates adversarial examples, while the outer minimization optimizes the parameters of the model on the adversarial examples generated by the inner maximization. Previous methods have focused on improving the model's adversarial accuracy, attending only to the numerical improvement rather than the characteristics of the different methods. We ask: do different adversarial training methods perform the same on different sample instances?
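To make the min-max formulation concrete, here is a minimal numpy sketch of the inner maximization: l_∞-bounded PGD ascent on the cross-entropy loss of a toy linear classifier with an analytic input gradient. The model, shapes, and parameter values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def ce_loss(x, W, y):
    # Cross-entropy of a linear classifier with logits W @ x.
    return -np.log(softmax(W @ x)[y])

def ce_grad_x(x, W, y):
    # Analytic input gradient of the loss above: W^T (softmax - onehot).
    p = softmax(W @ x)
    p[y] -= 1.0
    return W.T @ p

def pgd_attack(x, y, W, eps=8/255, alpha=2/255, steps=20):
    """Inner maximization: signed gradient ascent, projected into the eps-ball."""
    x_adv = x.copy()
    for _ in range(steps):
        g = ce_grad_x(x_adv, W, y)
        x_adv = x_adv + alpha * np.sign(g)        # ascent step on the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project into the l_inf ball around x
        x_adv = np.clip(x_adv, 0.0, 1.0)          # stay in the valid pixel range
    return x_adv
```

The outer minimization would then update the model parameters on `x_adv`; in a deep learning framework the analytic gradient is replaced by automatic differentiation.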
We analyzed different adversarial training methods (taking AT (Madry et al., 2017) and TRADES (Zhang et al., 2019) as examples) and found that different methods behave differently on sample instances, as illustrated in Figure 1. Specifically, for the same adversarial example, the network trained by AT classifies correctly while the network trained by TRADES misclassifies. Similarly, some examples can be correctly classified by the network trained by TRADES but not by the network trained by AT. That is, although AT and TRADES have the same numerical adversarial accuracy, they behave differently on sample instances. This raises the question: do two networks learn better if they collaborate? Based on this observation, we propose a Collaborative Adversarial Training (CAT) framework to improve the robustness of neural networks. Our framework is shown in Figure 2. Specifically, we simultaneously train two deep neural networks with different adversarial training methods. The adversarial examples generated by each network are input to the peer network to obtain the corresponding logits, which are then used, together with the network's own adversarial training objective, to guide its learning. We expect this collaborative learning to improve the robustness of both networks. Extensive experiments on different neural networks and datasets demonstrate the effectiveness of our approach: CAT achieves new state-of-the-art robustness on CIFAR-10 under the Auto-Attack benchmark without any additional synthetic or real data. In summary, our contribution is threefold. • We find that the models obtained using different adversarial training methods have different representations for individual sample instances.
Recently, Auto-Attack, an ensemble of diverse attack methods consisting of APGD-CE (Croce & Hein, 2020b), APGD-DLR (Croce & Hein, 2020b), FAB (Croce & Hein, 2020a), and Square Attack (Andriushchenko et al., 2020), has become a standard benchmark for testing model robustness. Black-box attacks: Due to the similarity of model structures, adversarial examples generated on a surrogate model can transfer to fool the target model, and many works explore this transferability for black-box attacks. Dong et al. (2018) combine momentum with an iterative approach to obtain better transferability. Scale-invariance (Lin et al., 2019) boosts the transferability of adversarial examples by transforming the inputs at multiple scales.

2.2. ADVERSARIAL ROBUSTNESS

Adversarial attacks present a significant threat to DNNs. For this reason, many methods have been proposed to defend against adversarial examples, including denoising (Xie et al., 2019), adversarial training (Madry et al., 2017), data augmentation (Rebuffi et al., 2021), and input purification (Naseer et al., 2020).

2.3. KNOWLEDGE DISTILLATION

Knowledge distillation (KD) is commonly used for model compression and was first used by Hinton et al. (2015) to distill knowledge from a well-trained teacher network to a student network. KD can significantly improve the accuracy of student models, and many later works have improved its effectiveness (Romero et al., 2014). In recent years, KD has been extended to other areas. Goldblum et al. (2020) analyze the application of knowledge distillation to adversarial robustness and propose ARD to transfer knowledge from a large, more robust teacher model to a small student model; ARD can produce a student network with better robustness than training from scratch. In this paper, we propose a more effective collaborative training framework to improve the robustness of the network.

3.1. MOTIVATION

We investigated the performance of the robust models obtained by different training methods on sample instances. We found that different models perform differently: for some samples, the model trained by AT (Madry et al., 2017) classifies correctly while the model trained by TRADES (Zhang et al., 2019) misclassifies, and vice versa. As shown in Figure 2, the two networks f and g generate adversarial examples u and v, respectively; we then use the logits obtained from the peer network to guide each network's learning, i.e., g_u → f_u, f_v → g_v.

3.2. COLLABORATIVE ADVERSARIAL TRAINING (CAT)

We take AT and TRADES as examples to introduce collaborative adversarial training. We first briefly review the training objectives of AT and TRADES and then describe CAT in detail. Adversarial training is defined as a min-max problem, and Madry et al. (2017) propose to use PGD for adversarial training: PGD generates adversarial examples for the inner maximization, while the outer minimization optimizes the model parameters on the PGD-generated adversarial examples and the ground-truth label y. AT is formulated as:

min_θ E_{(x,y)∼D_data} [ max_δ L(f_θ^AT(x_AT^adv), y) ],  x_AT^adv = x + δ,  (1)

where D_data is the training data distribution, x and y are a training sample and its label drawn from D_data, f_θ is a neural network parameterized by θ, L is the standard cross-entropy loss used in image classification, and δ is the adversarial perturbation generated by PGD. Following previous studies, δ is bounded in l_∞ norm. Neural networks trained by AT obtain a certain level of robustness, but compromise accuracy on natural examples. To address this, TRADES uses a different training objective for adversarial training, formulated as:

min_{θ'} E_{(x,y)∼D_data} [ L(g_{θ'}^TRADES(x), y) + λ D_KL(g_{θ'}^TRADES(x) ‖ g_{θ'}^TRADES(x_TRADES^adv)) ],  (3)

where x^adv is the adversarial example corresponding to the natural example x and y is the true label, L is the cross-entropy loss, D_KL is the KL divergence pushing the natural and adversarial logits together, and λ is a trade-off parameter. CAT aims to improve robustness by letting neural networks trained with different methods exchange knowledge, i.e., collaborative adversarial learning. As illustrated in Figure 2, we use the logits of a peer network to guide the learning of each network.
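For reference, the two objectives above can be sketched numerically over single logit vectors. This is an illustrative numpy sketch (softmax, cross-entropy, KL divergence), not the paper's implementation; `lam=6.0` is a commonly used TRADES trade-off value assumed here.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(logits, y):
    # Standard classification loss L on a single logit vector.
    return float(-np.log(softmax(logits)[y]))

def kl_div(p_logits, q_logits):
    # D_KL(p || q) between the softmax distributions of two logit vectors.
    p, q = softmax(p_logits), softmax(q_logits)
    return float(np.sum(p * (np.log(p) - np.log(q))))

def at_loss(logits_adv, y):
    # AT objective (Eq. 1): cross-entropy on the adversarial example alone.
    return cross_entropy(logits_adv, y)

def trades_loss(logits_nat, logits_adv, y, lam=6.0):
    # TRADES objective (Eq. 3): clean cross-entropy plus lambda * KL(natural || adversarial).
    return cross_entropy(logits_nat, y) + lam * kl_div(logits_nat, logits_adv)
```

Note that when the adversarial logits match the natural logits, the TRADES KL term vanishes and the objective reduces to clean cross-entropy.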
Specifically, we input the adversarial examples crafted against the AT network into the TRADES network to obtain the corresponding logits, which are then used to guide the training of the AT network:

L_1 = D_KL( f_AT(x_AT^adv) ‖ ĝ_TRADES(x_AT^adv) ),  (4)

where D_KL is the KL divergence used to compute the relative entropy, the same as in TRADES; f_AT is the network trained with AT and g_TRADES is the network trained with TRADES; ĝ_TRADES(x_AT^adv) denotes that the logits obtained from the TRADES network are treated as constants; and x_AT^adv is the adversarial example generated against f_AT with PGD from the natural example x. Similarly, to make the two networks learn collaboratively, we feed the adversarial examples generated against the TRADES network to the AT network and use the resulting logits to guide the training of the TRADES network:

L_2 = D_KL( g_TRADES(x_TRADES^adv) ‖ f̂_AT(x_TRADES^adv) ),  (5)

where x_TRADES^adv is the adversarial example crafted against the TRADES network using the KL-divergence objective, and f̂_AT(x_TRADES^adv) denotes that the logits obtained from the AT network are treated as constants. Letting the two networks learn only from each other is not enough; the true class labels are needed to guide them as well. We therefore combine each network's own training objective with the mutual-learning objectives. The training objective for collaborative adversarial training based on AT and TRADES is:

L_total = α L_TRADES + (1 - α) L_2 + β L_AT + (1 - β) L_1,  (6)

where α and β are trade-off parameters balancing the guidance of the peer logits against the original training objectives, and L_TRADES is the TRADES objective defined in Equation (3).
L_AT is the AT objective defined in Equation (1). The first two terms in Equation (6) train model g and the last two train model f, since the peer logits are treated as constants. The decision boundaries learned by different adversarial training methods differ; under the peer-logit constraints of Equations (4) and (5), the two networks trained by different methods continuously refine their classification decision boundaries during collaborative learning. In the end, both networks learn better decision boundaries, and thus better robustness, than either would learning alone. Collaborative adversarial learning is a general adversarial training framework that can combine any two adversarial training methods; more generally, CAT can use any number of different adversarial training methods for collaborative learning. Results of CAT with three adversarial training methods are deferred to the Appendix.
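Putting Equations (4)-(6) together, a minimal numpy sketch of the combined CAT objective might look as follows. The function and argument names are hypothetical; in a real framework the peer logits (the second argument of each KL term) would be detached from the computation graph (e.g. `.detach()` in PyTorch) so that they act as constants.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kl_div(p_logits, q_logits):
    # D_KL(p || q) between the softmax distributions of two logit vectors.
    p, q = softmax(p_logits), softmax(q_logits)
    return float(np.sum(p * (np.log(p) - np.log(q))))

def cat_total_loss(loss_at, loss_trades,
                   f_on_xf, g_on_xf,   # both networks' logits on f's adversarial example
                   g_on_xg, f_on_xg,   # both networks' logits on g's adversarial example
                   alpha=1/20, beta=1/20):
    """Equation (6): each network's own objective plus KL guidance from the
    (constant) peer logits on its own adversarial examples."""
    l1 = kl_div(f_on_xf, g_on_xf)  # Eq. (4): peer g guides f
    l2 = kl_div(g_on_xg, f_on_xg)  # Eq. (5): peer f guides g
    return alpha * loss_trades + (1 - alpha) * l2 + beta * loss_at + (1 - beta) * l1
```

When the two networks already agree on every adversarial example, both KL terms vanish and the total loss reduces to the weighted sum of the two original objectives.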

4. EXPERIMENT RESULTS

In this section, we conduct extensive experiments on popular benchmark datasets to demonstrate the effectiveness of CAT. First, we briefly introduce the experimental setup and implementation details. Evaluation setup: We report clean accuracy on natural examples and adversarial accuracy on adversarial examples, under both white-box and black-box attacks, following widely used protocols in the adversarial research field. For white-box attacks, we consider three basic attack methods, FGSM (Goodfellow et al., 2014), PGD (Madry et al., 2017), and CW ∞ (Carlini & Wagner, 2017) optimized by PGD 20, as well as a stronger ensemble attack named AutoAttack (AA) (Croce & Hein, 2020b). For black-box attacks, we consider both transfer-based and query-based attacks. Since CAT uses two methods for collaborative training, we report results for both networks. For example, collaborative adversarial training using TRADES and ALP is denoted CAT TRADES-ALP.

4.1.1. HYPERPARAMETER:

CAT improves adversarial robustness through collaborative learning, which requires both knowledge from the peer network and guidance from the ground-truth label. The balance between the original objectives and the collaborative terms is controlled by the hyperparameters α and β. For simplicity, we set α equal to β in our experiments. We perform collaborative training with TRADES and AT as the base methods and experiment with different trade-off parameters, testing α values from 1/50 to 1/5. The results are illustrated in Figure 3. From the figure we conclude that if α is too high, i.e., little knowledge is drawn from the peer network, the effect is about the same as training with AT or TRADES alone; if α is too low, i.e., training focuses overly on the knowledge from the peer network, the network is also not very robust. Since Auto-Attack is currently the most powerful ensemble attack, we choose the hyperparameters α and β based primarily on the robustness of the network against AA. In the following experiments, we set α = β = 1/20 by default.
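The selection rule described above, pick the α with the best AA robustness from the sweep, can be sketched in a couple of lines. The sweep values below are made-up placeholders for illustration, not the paper's measurements.

```python
# Hypothetical sweep: alpha -> robust accuracy (%) under Auto-Attack.
sweep = {1/50: 49.8, 1/30: 50.3, 1/20: 51.0, 1/10: 50.1, 1/5: 49.2}

# Choose the trade-off parameter that maximizes AA robustness.
best_alpha = max(sweep, key=sweep.get)
```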

4.1.2. DIFFERENT CAT METHODS:

As described in Section 3.2, any two adversarial training methods can be combined under the CAT framework. Table 3 shows the results, and our method CAT achieves the best performance.

4.3. COMPARISON TO SOTA

We use two WideResNet-34-10 (Zagoruyko & Komodakis, 2016) networks for collaborative adversarial training, one trained with TRADES (Zhang et al., 2019) and the other with ALP (Kannan et al., 2018). Table 4 shows the accuracy of the different methods on natural examples and their robustness against Auto-Attack. For some methods, we also report results with WideResNet-34-20. All results are taken from the original papers. From the table we conclude that both networks trained with CAT outperform previous methods in robustness, demonstrating the state-of-the-art performance obtained by our CAT.

4.4. COMPARISON TO KD-AT

In general, the robustness of large models is higher than that of small models under the same training settings. For example, WideResNet-34-10 (Zagoruyko & Komodakis, 2016) trained by TRADES achieves 53.08% robustness against AA, while ResNet-18 achieves only 49.21%. Motivated by this, some authors have used knowledge distillation to distill the robustness of large models into small models, with good results. Since CAT also involves the collaborative training of two models, we compare CAT with KD-AT methods. For a fair comparison, unlike the previous experiments that use two same-size networks, we run CAT with two different-size networks, WideResNet-34-10 and ResNet-18, and report the accuracy of both. Note that, unlike KD methods where the teacher is trained in advance, CAT trains the large and small models simultaneously, so there is no notion of teacher and student; in other words, we extend previous offline distillation to an online form and achieve better performance. Table 5 shows the results of KD-AT methods and CAT, where ARD (Goldblum et al., 2020), IAD (Zhu et al., 2021), and RSLAD (Zi et al., 2021) are trained by KD-AT using a TRADES-trained WideResNet-34-10 network as teacher (second row in Table 5). CAT was trained collaboratively with both networks from scratch.

Table 4 (excerpt). Clean accuracy (%) and robustness against AA (%):

Method | Architecture | Clean | AA
(Pang et al., 2020b) | WideResNet-34-20 | 85.14 | 53.74
Overfitting in AT* (Rice et al., 2020) | WideResNet-34-20 | 85.34 | 53.42
Overfitting in AT (Rice et al., 2020) | WideResNet-34-10 | 85.18 | 53.14
Self-Adaptive Training (Huang et al., 2020) | WideResNet-34-10 | 83.48 | 53.34
FAT (Zhang et al., 2020) | WideResNet-34-10 | 84.52 | 53.51
TRADES (Zhang et al., 2019) | WideResNet-34-10 | 84.92 | 53.08
LLR (Qin et al., 2019) | WideResNet-40-8 | 86.28 | 52.84
LBGAT+TRADES (α = 0)* (Cui et al., 2021) | WideResNet-34-20 | 88.70 | 53.57
LBGAT+TRADES (α = 0) (Cui et al., 2021) | WideResNet | |

A ALLEVIATE OVERFITTING

B CONFUSION MATRIX

To better present the motivation for CAT, we show the confusion matrices of different methods. Figure 5 shows three confusion matrices: ALP-prediction vs. AT-prediction, TRADES-prediction vs. AT-prediction, and ALP-prediction vs. TRADES-prediction. Confusion exists in all three matrices, especially for the blocks from class 3 to class 7. This conclusion is consistent with Figure 1. The prediction intersection is reported in Tab. 6.

C CORRELATION BETWEEN DISCREPANCY AND CAT PERFORMANCE

In this section, we analyze the correlation between the discrepancy of different adversarial training methods and their adversarial robustness after CAT. First, we compute the prediction intersection

intersection = (1/N) Σ_{x_i ∈ D} I(f_AT(x_i), g_TRADES(x_i)),

where D is the dataset and I is an indicator function that is 1 when f_AT(x_i) = g_TRADES(x_i) and 0 otherwise. The smaller this value, the greater the discrepancy. We then report the adversarial robustness of CAT trained under different settings. Results are reported in Tab. 6. We conclude that the greater the discrepancy, the higher the adversarial robustness after CAT.
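The intersection metric above is a simple agreement rate over predicted labels; a minimal sketch (argument names are illustrative):

```python
import numpy as np

def prediction_intersection(preds_f, preds_g):
    # Fraction of samples on which the two models' predicted labels agree.
    # Lower values indicate a greater discrepancy between the two models.
    preds_f = np.asarray(preds_f)
    preds_g = np.asarray(preds_g)
    return float(np.mean(preds_f == preds_g))
```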

D.1 VGG-16 RESULTS ON CIFAR-10

The white-box robustness of VGG-16 (Simonyan & Zisserman, 2014) models is reported in Tab. 7.

D.2 MOBILENET RESULTS ON CIFAR-10

The white-box robustness of MobileNet (Howard et al., 2017) on CIFAR-10 under various attacks is reported in Tab. 8. The experimental setup is the same as before. Our CAT brings a 1.0-point improvement for MobileNet under AutoAttack, the most powerful adversarial attack method.

D.3 RESNET-18 RESULTS ON TINY-IMAGENET

For the large-scale ImageNet dataset, like all the baseline methods, we do not report results because of the very high training cost of adversarial training. To investigate the performance of CAT on larger datasets, we conduct white-box robustness experiments with ResNet-18 on Tiny-ImageNet.

Under review as a conference paper at ICLR 2023



https://github.com/fra31/auto-attack



Figure 1: Classification results of different adversarial training methods on sample instances. The first row is the classification result of the model trained by AT and the second row is the classification result of the model trained by TRADES. 1 means correct classification and 0 means incorrect classification. 10000 is the size of the CIFAR-10 test set. The third row is the result of correct classification by both AT and TRADES, and the result is shown in red. It can be seen that the models trained by different methods perform differently on sample instances.

Figure 2: The framework of CAT, performing adversarial training collaboratively. Given a batch of natural samples, the two networks f and g are attacked separately to generate adversarial examples u and v. Then u and v are fed into both networks to obtain the corresponding logits. We then use the logits obtained from the peer network to guide each network's learning, i.e., g_u → f_u, f_v → g_v.

Figure 3: Adversarial robustness using different hyperparameters under the TRADES-AT collaborative adversarial training framework. From left to right: Clean, FGSM, PGD, and AA accuracy. Models f and g represent the results of using TRADES and AT in the CAT training framework, respectively.

Figure 4: Test robust accuracy of AT, ALP, TRADES, and CAT with ResNet-18 on CIFAR-10 datasets. CAT can alleviate the problem of overfitting.

• We propose a novel adversarial training framework, collaborative adversarial training (CAT), which simultaneously trains two neural networks from scratch using different adversarial training methods and lets them collaborate to improve the robustness of the model. • We conduct extensive experiments on a variety of datasets and networks, evaluated against state-of-the-art attacks, and demonstrate that CAT can substantially improve the robustness of neural networks and obtain new state-of-the-art performance without any additional data. Iterative FGSM performs FGSM iteratively with a small step size. Madry et al. (2017) proposed PGD to generate adversarial examples, which is the most effective way of using the first-order information of the network. Dong et al. (2018) combine momentum with the iterative process to help the attack escape from local optima, and the adversarial examples generated by this method are also more transferable. Boundary-based attacks such as DeepFool (Moosavi-Dezfooli et al., 2016) and CW (Carlini & Wagner, 2017) also make the model more challenging to defend.

ANP (Madaan et al., 2020) finds the vulnerability of latent features and uses pruning to improve robustness. Madry et al. (2017) use PGD to generate adversarial examples for adversarial training, which remains the most effective way to defend against adversarial examples. A large body of work uses new regularizers or objective functions to improve the effectiveness of standard adversarial training. Kannan et al. (2018) use adversarial logit pairing to improve robustness by encouraging the logits of normal and adversarial examples to be closer together. TRADES (Zhang et al., 2019) uses KL divergence to regularize the outputs on adversarial and clean examples.

Then, ablation studies are done to choose the best hyperparameters and CAT methods. Finally, with the best CAT configuration, we report white-box and black-box adversarial robustness on two popular benchmark datasets. All image values are scaled into [0, 1], and all our experiments are run on a single NVIDIA GeForce GTX 1080Ti. Datasets: We use two benchmark datasets, CIFAR-10 (Krizhevsky et al., 2009) and CIFAR-100 (Krizhevsky et al., 2012). CIFAR-10 has 10 classes, with 5000 training images and 1000 test images per class; CIFAR-100 has 100 classes, with 500 training images and 100 test images per class. Both datasets are widely used for training and testing adversarial robustness, and the image size in both is 32 × 32.

The white-box robustness results (accuracy (%)) of different CAT methods on CIFAR-10. We report the results of the best checkpoint and the last checkpoint. The best results are marked in boldface. Two ResNet-18 networks are used in our CAT framework.

Method (checkpoint) | Network 1: Clean / FGSM / PGD 20 / CW ∞ / AA | Network 2: Clean / FGSM / PGD 20 / CW ∞ / AA
CAT AT-TRADES (best) | 83.74 / 59.69 / 54.44 / 52.60 / 50.52 | 84.45 / 60.03 / 53.01 / 52.01 / 49.30
CAT AT-TRADES (last) | 83.55 / 59.78 / 54.52 / 52.58 / 50.86 | 84.12 / 59.69 / 52.82 / 51.88 / 49.39
CAT AT-ALP (best) | 84.66 / 59.94 / 53.11 / 51.90 / 49.74 | 84.71 / 59.84 / 50.77 / 50.53 / 47.80
CAT AT-ALP (last) | 85.21 / 60.21 / 53.02 / 52.13 / 49.96 | 85.27 / 59.75 / 51.10 / 50.69 / 47.91
CAT TRADES-ALP (best) | 83.91 / 59.76 / 54.44 / 52.56 / 51.02 | 84.67 / 59.85 / 52.51 / 51.43 / 49.31
CAT TRADES-ALP (last) | 84.75 / 59.76 / 54.17 / 52.72 / 50.85 | 85.27 / 59.82 / 52.56 / 51.83 / 49.64

Any two adversarial training methods can be incorporated into the CAT framework and learned collaboratively, as described in Section 3.2. Considering that different adversarial training methods have distinct properties, the performance of different CAT combinations may also vary. For this reason, we consider three collaborative adversarial training pairs: AT-TRADES, AT-ALP, and TRADES-ALP. Table 1 shows the performance of the different CAT methods. All CAT combinations obtain good robustness against the four attack methods. Again considering mainly the performance under Auto-Attack, we choose TRADES-ALP as the base method for our CAT. For PGD, CW ∞, and AA, the attack perturbation is 8.0/255, and the step size for PGD and CW ∞ is 2/255, with 20 iterations. Following the reporting convention of previous papers, we report both the best checkpoint and the last checkpoint of the training phase; the best checkpoint is selected based on the model's PGD robustness on the test set (attack step size 2.0/255, 10 iterations, perturbation size 8.0/255). For black-box attacks, we consider both transfer-based and query-based attacks. For transfer-based attacks, we use a standard adversarially trained ResNet-34 as the surrogate model, trained with the same parameters as described in Section 4. We first attack the surrogate model to generate adversarial examples and then transfer them to the target network to measure its robustness. Here, we consider four attacks: FGSM, PGD 20, PGD 40, and CW ∞, with the same attack parameters as Section 4.2.1. For query-based attacks, we consider the Square attack, an efficient black-box query-based attack method.

The white-box robustness results (accuracy (%)) of CAT on CIFAR-10 and CIFAR-100. We report the results of the best checkpoint and the last checkpoint. The best results are marked in boldface. Two ResNet-18 networks are used in our CAT framework. TRA-ALP is short for TRADES-ALP due to space limitations.

The black-box robustness results (accuracy (%)) of CAT on CIFAR-10 and CIFAR-100. We only report the results of the best checkpoint. The best results are marked in boldface. Two ResNet-18 networks are used in our CAT framework. TRA-ALP is short for TRADES-ALP due to space limitations. (Columns for each dataset: FGSM, PGD 20, PGD 40, CW ∞, Square.)

Quantitative comparison with the state-of-the-art adversarial training methods. Two WideResNet-34-10 networks are used in our CAT framework.

Quantitative comparison with the state-of-the-art Knowledge-Distillation AT methods. We use WideResNet-34-10 and ResNet-18 networks in the CAT framework for a fair comparison.

5. CONCLUSION

In this paper, we first analyze the properties of different adversarial training methods and find that networks trained by different methods perform differently on sample instances, i.e., one network can correctly classify samples that are misclassified by the other. Based on this observation, we propose a collaborative adversarial training framework that uses the knowledge learned by peer networks to guide each network's learning, thereby improving the robustness of both networks. Extensive experiments on different datasets and networks demonstrate the effectiveness of our approach, achieving state-of-the-art performance. Broadly, CAT can easily be extended to collaborative adversarial training with multiple networks, e.g., three peer networks. However, like previous methods, our method has the limitation that adversarial training consumes substantial training resources, and CAT has to train two networks simultaneously. Work on accelerating CAT is left for the future.

Overfitting in adversarial training was first studied by Rice et al. (2020), who show that test robustness decreases after peak robustness; overfitting is one of the most concerning problems in adversarial training. Here, we investigate the overfitting problem in CAT. Results are illustrated in Fig. 4. Our CAT can alleviate the overfitting problem that widely occurs in previous adversarial training methods. Moreover, the performance of CAT has not saturated, and higher performance is expected with longer training.

The correlation between white-box robustness results (accuracy (%)) and prediction discrepancy of different CAT methods on CIFAR-10. Two ResNet-18 networks are used in our CAT framework.

The white-box robustness results (accuracy (%)) of CAT on CIFAR-10. We report the results of the best checkpoint. The best results are marked in boldface. Two VGG-16 networks are used in our CAT framework. TRA-ALP is short for TRADES-ALP due to space limitations.

The VGG-16 models trained using AT, ALP, TRADES, and CAT are reported in Tab. 7. The setting for VGG-16 is the same as for the ResNet-18 models, i.e., α = 1.0/20 and β = 1.0/20. The improvement of CAT with VGG-16 models is consistent with the ResNet-18 models: CAT boosts the model's robustness under AutoAttack by 2.0 points.

The white-box robustness results (accuracy (%)) of CAT on CIFAR-10. We report the results of the best checkpoint. The best results are marked in boldface. Two MobileNet networks are used in our CAT framework. TRA-ALP is short for TRADES-ALP due to space limitations.

The white-box robustness results (accuracy (%)) of CAT on Tiny-ImageNet. We report the results of the best checkpoint. The best results are marked in boldface. Two ResNet-18 networks are used in our CAT framework. TRA-ALP is short for TRADES-ALP due to space limitations.

Tiny-ImageNet is also a widely used dataset in adversarial training. We evaluate the white-box robustness of ResNet-18 on Tiny-ImageNet; the results are shown in Tab. 9. Surprisingly, CAT shows impressive robustness on this larger dataset, with improvements as significant as those of ResNet-18 on smaller datasets like CIFAR-10 and CIFAR-100.

D.4 CAT OF ONE MODEL WITH VARIOUS ATTACKS

The white-box robustness results (accuracy (%)) of CAT on CIFAR-10. We report the results of the best checkpoint. The best results are marked in boldface. PGD-CW denotes one network trained by PGD and CW. TRA-ALP is short for TRADES-ALP, denoting two networks with TRADES and ALP. TRA-TRA is short for TRADES-TRADES, denoting two networks with TRADES and TRADES. AT-ALP-TRA is short for AT-ALP-TRADES.

For our CAT method, we use two networks, each with a different attack method, to perform adversarial training. An interesting baseline is one network with two different attack methods. Therefore, we use PGD and CW as the attack methods and a single ResNet-18 as the network.

