CAT: COLLABORATIVE ADVERSARIAL TRAINING

Abstract

Adversarial training can improve the robustness of neural networks. Previous adversarial training methods each focus on a single training strategy and do not consider collaboration between different strategies. In this paper, we find that different adversarial training methods exhibit distinct robustness on individual sample instances. For example, an instance may be correctly classified by a model trained with standard adversarial training (AT) but misclassified by a model trained with TRADES, and vice versa. Based on this phenomenon, we propose a collaborative adversarial training framework to improve the robustness of neural networks. Specifically, we simultaneously use different adversarial training methods to train two robust models from scratch. The adversarial examples generated by each network are fed to the peer network, and the peer network's logits guide the training of the other network. Collaborative Adversarial Training (CAT) improves both robustness and accuracy. Extensive experiments on CIFAR-10 and CIFAR-100 validate the effectiveness of our method: CAT achieves new state-of-the-art robustness on CIFAR-10 under the Auto-Attack benchmark¹ without using any additional data.

1. INTRODUCTION

With the development of deep learning, Deep Neural Networks (DNNs) have been applied to various fields, such as image classification (He et al., 2016), object detection (Redmon et al., 2016), and semantic segmentation (Pal & Pal, 1993), and have obtained state-of-the-art performance. However, recent research has found that DNNs are vulnerable to adversarial perturbations (Goodfellow et al., 2014): a finely crafted adversarial perturbation by a malicious agent can easily fool the neural network. This raises security concerns about deploying neural networks in security-critical areas such as autonomous driving (Chen et al., 2019) and medical diagnostics (Kong et al., 2017). To cope with the vulnerability of DNNs, different types of methods have been proposed to improve the robustness of neural networks, including adversarial training (Madry et al., 2017).

Previous methods have focused on how to improve the model's adversarial accuracy, attending only to the numerical improvement and not to the characteristics of the different methods. We ask: do different adversarial training methods perform the same on different sample instances? We analyzed different adversarial training methods (taking AT (Madry et al., 2017) and TRADES (Zhang et al., 2019) as examples) and found that different methods behave differently on individual sample instances, as illustrated in Figure 1. Specifically, for the same adversarial example, the network trained with AT classifies it correctly while the network trained with TRADES misclassifies it. Similarly, some examples are correctly classified by the network trained with TRADES, but not by the network trained with AT. That is, although AT and TRADES have the same numerical adversarial accuracy, they behave differently on individual instances. This raises the question: do two networks learn better if they collaborate?

¹ https://github.com/fra31/auto-attack
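The instance-level comparison behind this observation can be sketched as follows. The prediction arrays are hypothetical stand-ins for an AT-trained and a TRADES-trained model evaluated on the same adversarial examples; they are not taken from the paper's experiments.

```python
import numpy as np

def instance_level_comparison(preds_a, preds_b, labels):
    """Partition instances by which of the two models classifies them correctly."""
    correct_a = preds_a == labels
    correct_b = preds_b == labels
    return {
        "both": int(np.sum(correct_a & correct_b)),
        "only_a": int(np.sum(correct_a & ~correct_b)),
        "only_b": int(np.sum(~correct_a & correct_b)),
        "neither": int(np.sum(~correct_a & ~correct_b)),
    }

# Toy predictions: both models reach the same accuracy (3/5) yet succeed
# on different instances, mirroring the AT-vs-TRADES observation.
labels       = np.array([0, 1, 2, 0, 1])
preds_at     = np.array([0, 1, 2, 1, 0])   # stand-in for an AT-trained model
preds_trades = np.array([0, 2, 1, 0, 1])   # stand-in for a TRADES-trained model
stats = instance_level_comparison(preds_at, preds_trades, labels)
```

Here both models score 3/5, yet only one instance is solved by both: each model covers two instances the other misses, which is exactly the complementarity that motivates collaboration.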



Other proposed defenses include defensive distillation (Papernot et al., 2016), feature denoising (Xie et al., 2019), and model pruning (Madaan et al., 2020). Among them, Adversarial Training (AT) is the most effective method for improving adversarial robustness. AT can be regarded as a data augmentation strategy that trains neural networks on adversarial examples crafted from natural examples. AT is usually formulated as a min-max optimization problem, where the inner maximization generates adversarial examples, while the outer minimization optimizes the parameters of the model based on the adversarial examples generated by the inner maximization.
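This min-max loop can be sketched in NumPy, assuming a toy linear (logistic) classifier and a single FGSM-style step for the inner maximization; the data, hyperparameters, and function names here are illustrative and are not the paper's actual setup.

```python
import numpy as np

def loss_and_grads(w, x, y):
    """Logistic loss with gradients w.r.t. the weights (for the outer
    minimization) and w.r.t. the inputs (for the inner maximization)."""
    p = 1.0 / (1.0 + np.exp(-(x @ w)))
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    grad_w = x.T @ (p - y) / len(y)
    grad_x = np.outer(p - y, w) / len(y)
    return loss, grad_w, grad_x

def adversarial_training(x, y, eps=0.1, lr=0.5, steps=200):
    """Alternate the inner maximization (one FGSM-style step within the
    eps-ball) with the outer minimization (a gradient step on the weights)."""
    w = np.zeros(x.shape[1])
    for _ in range(steps):
        # Inner maximization: perturb inputs within the eps-ball to raise the loss.
        _, _, gx = loss_and_grads(w, x, y)
        x_adv = x + eps * np.sign(gx)
        # Outer minimization: update the weights on the adversarial examples.
        _, gw, _ = loss_and_grads(w, x_adv, y)
        w -= lr * gw
    return w

# Toy data: the label is carried by the first feature with a margin larger
# than eps, so a robust separator exists.
rng = np.random.default_rng(0)
y = (rng.random(60) < 0.5).astype(float)
x = rng.normal(scale=0.05, size=(60, 2))
x[:, 0] += np.where(y == 1, 1.0, -1.0)
w = adversarial_training(x, y, eps=0.1)
```

Because the outer step always sees worst-case (here, FGSM-perturbed) inputs rather than clean ones, the learned weights concentrate on the feature whose margin survives the eps-ball, which is the essence of the min-max formulation.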

