ARMOURED: ADVERSARIALLY ROBUST MODELS USING UNLABELED DATA BY REGULARIZING DIVERSITY

Abstract

Adversarial attacks pose a major challenge for modern deep neural networks. Recent advances show that adversarially robust generalization requires a large amount of labeled data for training. If annotation becomes a burden, can unlabeled data help bridge the gap? In this paper, we propose ARMOURED, an adversarially robust training method based on semi-supervised learning that consists of two components. The first component applies multi-view learning to simultaneously optimize multiple independent networks and utilizes unlabeled data to enforce labeling consistency. The second component reduces adversarial transferability among the networks via diversity regularizers inspired by determinantal point processes and entropy maximization. Experimental results show that, under small perturbation budgets, ARMOURED is robust against strong adaptive adversaries. Notably, ARMOURED does not rely on generating adversarial samples during training. When used in combination with adversarial training, ARMOURED yields competitive performance with state-of-the-art adversarially robust benchmarks on SVHN and outperforms them on CIFAR-10, while offering higher clean accuracy.

1. INTRODUCTION

Modern deep neural networks have met or even surpassed human-level performance on a variety of image classification tasks. However, they are vulnerable to adversarial attacks, where small, carefully crafted perturbations of the input can fool a network into unintended behavior, e.g., misclassification (Szegedy et al., 2014; Biggio et al., 2013). Such adversarial attacks have been found to transfer between different network architectures (Papernot et al., 2016) and are a serious concern, especially when neural networks are deployed in real-world applications. As a result, much work has been done to improve the robustness of neural networks against adversarial attacks (Miller et al., 2020). Among these techniques, adversarial training (AT) (Goodfellow et al., 2015; Madry et al., 2018) is widely used and has been found to produce the most robust models in recent evaluation studies (Dong et al., 2020; Croce & Hein, 2020). Nonetheless, even models trained with AT perform markedly worse on adversarial samples than on clean samples, and they also attain lower accuracy on clean samples than models trained with standard classification losses.

Schmidt et al. (2018) suggest that one reason for these reductions in accuracy is that training adversarially robust models requires substantially more labeled data. Due to the high cost of obtaining such labeled data in real-world applications, recent work has explored semi-supervised AT-based approaches that leverage unlabeled data instead (Uesato et al., 2019; Najafi et al., 2019; Zhai et al., 2019; Carmon et al., 2019).
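To make the notion of a "small, calculated perturbation" concrete, the following is a minimal sketch of the single-step Fast Gradient Sign Method (FGSM) of Goodfellow et al. (2015), the attack underlying basic adversarial training. The toy logistic-regression classifier and its parameter values are illustrative assumptions, not part of this paper; the principle is the same for deep networks, where the input gradient is obtained by backpropagation.

```python
import numpy as np

def fgsm_attack(x, w, b, y, eps):
    """FGSM on a toy logistic-regression classifier: take one step of
    size eps in the sign of the loss gradient w.r.t. the input x."""
    # forward pass: predicted probability of class 1
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    # gradient of the binary cross-entropy loss w.r.t. the input
    grad_x = (p - y) * w
    # signed step of size eps (eps is the L-infinity perturbation budget)
    return x + eps * np.sign(grad_x)

# illustrative linear classifier and a correctly classified point
w = np.array([2.0, -1.0])
b = 0.0
x = np.array([1.0, 0.5])   # clean logit w @ x + b = 1.5 -> class 1
y = 1.0
x_adv = fgsm_attack(x, w, b, y, eps=0.9)
print(w @ x + b, w @ x_adv + b)  # the adversarial logit drops below 0
```

Adversarial training then minimizes the classification loss on such perturbed samples (Madry et al. (2018) iterate this step, projecting back onto the eps-ball, which yields the stronger PGD attack).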

