GEOMETRY-AWARE INSTANCE-REWEIGHTED ADVERSARIAL TRAINING

Abstract

In adversarial machine learning, there was a common belief that robustness and accuracy hurt each other. This belief was challenged by recent studies showing that robustness can be maintained while accuracy is improved. However, the other direction, i.e., whether we can keep the accuracy and improve the robustness, is conceptually and practically more interesting, since robust accuracy should be lower than standard accuracy for any model. In this paper, we show that this direction is also promising. First, we find that even over-parameterized deep networks may still have insufficient model capacity, because adversarial training has an overwhelming smoothing effect. Second, given limited model capacity, we argue that adversarial data should have unequal importance: geometrically speaking, a natural data point closer to/farther from the class boundary is less/more robust, and the corresponding adversarial data point should be assigned a larger/smaller weight. Finally, to implement this idea, we propose geometry-aware instance-reweighted adversarial training, where the weights are based on how difficult it is to attack a natural data point. Experiments show that our proposal boosts the robustness of standard adversarial training; by combining the two directions, we improve both the robustness and accuracy of standard adversarial training.
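The reweighting idea above can be sketched in a few lines. The sketch below is a hedged illustration, not the paper's exact formulation: it assumes a per-example attack difficulty `kappa` (e.g., the number of PGD steps needed to fool the model on that example, a proxy for distance to the class boundary) and maps it to a weight that is large when `kappa` is small (close to the boundary) and small when `kappa` is large. The tanh-based shape and the parameter names are illustrative choices.

```python
import numpy as np

def geometry_aware_weights(kappa, K, lam=0.0):
    """Map attack difficulty to instance weights (illustrative sketch).

    kappa : per-example count of attack steps needed to fool the model
            (small kappa -> close to the class boundary -> larger weight).
    K     : maximum number of attack steps used to measure kappa.
    lam   : hypothetical shift parameter controlling overall sharpness.
    """
    kappa = np.asarray(kappa, dtype=float)
    # Decreasing, bounded map from kappa/K to (0, 1).
    w = (1.0 + np.tanh(lam + 5.0 * (1.0 - 2.0 * kappa / K))) / 2.0
    # Normalize so the weights average to 1 over the batch.
    return w / w.mean()
```

During training, each adversarial example's loss would then be multiplied by its weight before averaging, so hard-to-attack (far-from-boundary) examples contribute less to the update.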

1. INTRODUCTION

Crafted adversarial data can easily fool standard-trained deep models by adding human-imperceptible noise to natural data, which raises security issues in applications such as medicine, finance, and autonomous driving (Szegedy et al., 2014; Nguyen et al., 2015). To mitigate this issue, many adversarial training methods employ the most adversarial data, i.e., the data maximizing the loss, for updating the current model, such as standard adversarial training (AT) (Madry et al., 2018), TRADES (Zhang et al., 2019), robust self-training (RST) (Carmon et al., 2019), and MART (Wang et al., 2020b). These adversarial training methods seek to train an adversarially robust deep model whose predictions are locally invariant within a small neighborhood of its inputs (Papernot et al., 2016). By leveraging adversarial data to smooth this small neighborhood, adversarial training methods acquire robustness against adversarial data but often lead to an undesirable degradation of standard accuracy on natural data (Madry et al., 2018; Zhang et al., 2019). Thus, there have been debates on whether there exists a trade-off between robustness and accuracy. For example, some argued for an inevitable trade-off: Tsipras et al. (2019) showed that fundamentally different representations are learned by a standard-trained model and an adversarially trained model; Zhang et al. (2019) and Wang et al. (2020a) proposed adversarial training methods that can trade off standard accuracy for adversarial robustness. On the other hand, some argued that there is no such trade-off: Raghunathan et al. (2020) showed that infinite data could eliminate the trade-off; Yang et al. (2020) showed that benchmark image datasets are class-separated.
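The inner maximization these methods share, finding the most adversarial data within a small neighborhood of each input, can be sketched with projected gradient descent (PGD) on a toy logistic-regression loss. The model, parameters, and step sizes below are hypothetical stand-ins; real adversarial training runs the same loop against a deep network's loss via automatic differentiation.

```python
import numpy as np

def pgd_attack(x, y, w, b, eps=0.1, alpha=0.02, steps=10):
    """Find an adversarial point in the L-infinity ball of radius eps
    around x that (approximately) maximizes the cross-entropy loss of a
    toy logistic-regression model p(y=1|x) = sigmoid(w.x + b)."""
    x_adv = x.copy()
    for _ in range(steps):
        z = x_adv @ w + b
        p = 1.0 / (1.0 + np.exp(-z))           # model's predicted probability
        grad = (p - y) * w                     # analytic d(loss)/d(x_adv)
        x_adv = x_adv + alpha * np.sign(grad)  # gradient *ascent* on the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project back into the ball
    return x_adv
```

Standard AT then trains on `x_adv` in place of `x`; the outer minimization over model parameters and the inner maximization above together form the min-max objective of Madry et al. (2018).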

