TARGETED ADVERSARIAL SELF-SUPERVISED LEARNING

Abstract

Recently, unsupervised adversarial training (AT) has been extensively studied as a way to attain robustness with models trained on unlabeled data. To this end, previous studies have applied existing supervised adversarial training techniques to self-supervised learning (SSL) frameworks. However, all of them have resorted to untargeted adversarial learning, since it is unclear how to obtain targeted adversarial examples in the SSL setting, which lacks label information. In this paper, we propose a novel targeted adversarial training method for SSL frameworks, in particular those based on positive pairs. Specifically, we propose a target selection algorithm for adversarial SSL frameworks; it is designed to select the most confusing sample for each given instance based on similarity and entropy, and to perturb the given instance toward the selected target sample. Our method significantly enhances the robustness of a positive-only SSL model without requiring large batches of images or additional models, unlike existing works aimed at the same goal. Moreover, our method is readily applicable to general SSL frameworks that use only positive pairs. We validate our method on benchmark datasets, on which it obtains superior robust accuracy, outperforming existing unsupervised adversarial training methods.
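The target selection step described above can be sketched as follows. The abstract only states that the score is based on similarity and entropy, so the concrete combination below (cosine similarity plus normalized prediction entropy, summed) is an illustrative assumption, not the paper's actual formula; `select_target` and its arguments are hypothetical names.

```python
import numpy as np

def select_target(anchor, candidates, probs, eps=1e-12):
    """Pick the most 'confusing' candidate for an anchor embedding.

    Scores each candidate by (a) cosine similarity to the anchor and
    (b) the entropy of the candidate's predictive distribution, then
    returns the index of the highest-scoring candidate. The additive
    combination is an illustrative assumption, not the paper's formula.
    """
    a = anchor / np.linalg.norm(anchor)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    sim = c @ a                                          # cosine similarity to anchor
    ent = -np.sum(probs * np.log(probs + eps), axis=1)   # prediction entropy
    score = sim + ent / np.log(probs.shape[1])           # entropy normalized to [0, 1]
    return int(np.argmax(score))
```

Under this scoring, a candidate that is both close to the anchor in embedding space and predicted with high uncertainty is selected as the target toward which the instance is perturbed.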

1. INTRODUCTION

Enhancing the robustness of deep neural networks (DNNs) is a critical challenge for their real-world applications. DNNs are known to be vulnerable to adversarial attacks using imperceptible perturbations (Goodfellow et al., 2015), corrupted images (Hendrycks & Dietterich, 2019), and images with shifted distributions (Koh et al., 2021), all of which cause the attacked models to produce incorrect predictions. A large body of prior work leverages adversarial training (AT) (Madry et al., 2018), which explicitly uses generated adversarial examples with specific types of perturbations (e.g., ℓ∞-norm attacks) when training a DNN model. Most of these previous AT studies consider supervised learning settings (Madry et al., 2018; Zhang et al., 2019; Wu et al., 2020; Wang et al., 2019), in which class label information can be used to generate adversarial examples. On the other hand, achieving robustness in a self-supervised learning (SSL) setting has been relatively understudied, despite the recent success of SSL in a variety of tasks and domains. SSL frameworks (Dosovitskiy et al., 2015; Zhang et al., 2016; Tian et al., 2020b; Chen et al., 2020; He et al., 2020; Grill et al., 2020; Chen & He, 2021) learn transferable visual representations by solving pretext tasks constructed from the training data (Dosovitskiy et al., 2015; Zhang et al., 2016). A popular SSL approach is contrastive learning (e.g., SimCLR (Chen et al., 2020), MoCo (He et al., 2020)), which learns to maximize the similarity across positive pairs, each of which contains differently augmented samples of the same instance, while minimizing the similarity across different instances. Recently, to establish robustness in these SSL frameworks, RoCL (Kim et al., 2020) and ACL (Jiang et al., 2020) have proposed adversarial SSL methods based on contrastive learning frameworks, demonstrating improved robustness without leveraging any labeled data.
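The contrastive objective underlying these frameworks can be sketched with a SimCLR-style NT-Xent loss: each sample's positive is its differently augmented view, and all other samples in the batch act as negatives. This is a generic NumPy illustration of the loss family, not the exact implementation of any cited work.

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent loss over a batch of positive pairs (z1[i], z2[i]).

    Embeddings are l2-normalized; each sample's positive is its other
    augmented view, and every other sample in the joint batch serves
    as a negative. Generic sketch of the SimCLR-style objective.
    """
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    n = z1.shape[0]
    sim = z @ z.T / tau                        # temperature-scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)             # exclude self-similarity
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])  # index of each positive
    logits = sim - sim.max(axis=1, keepdims=True)              # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()
```

Untargeted adversarial SSL methods such as RoCL and ACL craft perturbations that maximize a loss of this form, without any notion of a class-level target.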
However, both of these adversarial SSL frameworks are inefficient, as they require a large batch size to attain good performance on either clean or adversarial samples. Recent SSL frameworks (Grill et al., 2020; Chen & He, 2021; Zbontar et al., 2021) mostly resort to maximizing the consistency between two differently augmented views of the same instance, using an additional momentum encoder (Grill et al., 2020), without any negative pairs or additional

