FEW-SHOT TRANSFERABLE ROBUST REPRESENTATION LEARNING VIA BILEVEL ATTACKS

Abstract

Existing adversarial learning methods for enhancing the robustness of deep neural networks assume the availability of a large amount of data from which adversarial examples can be generated. In an adversarial meta-learning setting, however, the model must train with only a few adversarial examples to learn a robust model for unseen tasks, which is a very difficult goal to achieve. Further, learning transferable robust representations for unseen domains is difficult even with a large amount of data. To tackle this challenge, we propose a novel adversarial self-supervised meta-learning framework with bilevel attacks, which aims to learn robust representations that generalize across tasks and domains. Specifically, in the inner loop, we update the parameters of the given encoder by taking inner gradient steps using two different sets of augmented samples, and generate adversarial examples for each view by maximizing the instance classification loss. Then, in the outer loop, we meta-learn the encoder parameters to maximize the agreement between the two adversarial examples, which enables it to learn robust representations. We experimentally validate the effectiveness of our approach on unseen-domain adaptation tasks, on which it achieves impressive performance. Specifically, our method significantly outperforms state-of-the-art meta-adversarial learning methods on few-shot learning tasks, as well as self-supervised learning baselines in standard learning settings with large-scale datasets.

1. INTRODUCTION

Deep neural networks (DNNs) are known to be vulnerable to imperceptibly small perturbations of their input instances (Szegedy et al., 2013). To overcome this adversarial vulnerability, adversarial training (AT) (Madry et al., 2018), which trains the model on adversarially perturbed training examples, has been extensively studied as a means of enhancing the robustness of trained deep network models. While the vast majority of previous studies (Zhang et al., 2019; Carlini & Wagner, 2017; Moosavi-Dezfooli et al., 2016; Wang et al., 2019; Rebuffi et al., 2021) defend against adversarial attacks that maximize the classification loss, they assume the availability of a large amount of labeled data. Even with recent progress in adversarial supervised learning, training on a large number of samples remains essential for achieving better robustness (Carmon et al., 2019; Rebuffi et al., 2021; Gowal et al., 2021).

On the other hand, meta-learning frameworks (Koch et al., 2015; Sung et al., 2018; Snell et al., 2017; Finn et al., 2017; Nichol et al., 2018), which learn to adapt to a new task quickly with only a small amount of data, are also known to be vulnerable to adversarial attacks (Goldblum et al., 2020). Since meta-learning employs scarce data and must adapt quickly to new tasks, it is difficult to obtain robustness with conventional adversarial training methods, which require a large amount of data (Goldblum et al., 2020). Adversarial Querying (AQ) (Goldblum et al., 2020) proposed an adversarially robust meta-learning scheme that meta-learns with adversarially perturbed query examples under the AT loss (Madry et al., 2018). Similarly, Wang et al. (2021) study how to enhance the robustness of a meta-learning framework with an adversarial regularizer in the inner adaptation or outer optimization. However, these previous works (Goldblum et al., 2020; Wang et al., 2021) show poor robustness on unseen domains (see Table 1).
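As a concrete point of reference, AT-style defenses typically generate their perturbed training examples with projected gradient descent (PGD). The sketch below is only illustrative: it applies the standard L-infinity PGD inner maximization to a toy linear logistic classifier with an analytic gradient (the function names and hyperparameter values are our own choices, not taken from the paper).

```python
import numpy as np

def logistic_loss(x, y, w, b):
    """log(1 + exp(-y * (w.x + b))) for a single example with label y in {-1, +1}."""
    return float(np.log1p(np.exp(-y * (w @ x + b))))

def pgd_attack(x, y, w, b, eps=0.3, alpha=0.05, steps=10, rng=None):
    """L-inf PGD attack on a linear logistic classifier (illustrative sketch).

    Ascends the loss with signed gradient steps of size alpha, projecting
    back into the eps-ball around the clean input x after every step.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    x_adv = x + rng.uniform(-eps, eps, size=x.shape)   # random start in the ball
    for _ in range(steps):
        margin = y * (w @ x_adv + b)
        grad = -y * w / (1.0 + np.exp(margin))          # d loss / d x_adv
        x_adv = x_adv + alpha * np.sign(grad)           # signed ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)        # project to the L-inf ball
    return x_adv
```

In a deep-learning setting the analytic gradient would be replaced by automatic differentiation through the network, but the ascend-then-project structure is the same.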



Recently, Carmon et al. (2019) employ a larger dataset (i.e., TinyImageNet (Le & Yang, 2015)) with pseudo-labels, Gowal et al. (2021) utilize a generative model to produce additional samples from the dataset, and Rebuffi et al. (2021) leverage augmentation functions to obtain more data samples.


(c) To generate adversarial examples for meta-learning, we propose a bilevel attack with an instance-wise attack that maximizes the difference between differently augmented query images for the task-shared encoder f. We then train the framework to make adversarially consistent predictions across multiple views with a self-supervised loss while learning the encoder to generalize across tasks, which enables it to learn robust representations that transfer to unseen tasks and domains.

Since existing adversarial meta-learning approaches (Yin et al., 2018; Goldblum et al., 2020; Wang et al., 2021) mostly focus on rapid adaptation to new tasks, while largely reusing the features with little modification at the task-adaptation step (Oh et al., 2020), the representations themselves may not be effectively meta-learned to be robust across tasks, and thus these approaches fail to achieve robustness when applied to unseen datasets (Section 4.1). The t-SNE visualization of the feature space of AQ in Figure 2 shows that the embeddings of adversarial examples overlap heavily across classes, which confirms this point.

To tackle these challenges, we propose a novel and effective adversarial meta-learning framework that can generalize to unseen domains: Transferable RObust meta-learning via Bilevel Attack (TROBA). TROBA utilizes a bilevel attack scheme to meta-learn robust representations that generalize across tasks and domains, motivated by self-supervised learning (Figure 1). Specifically, we redesign the instance-wise attack proposed in Kim et al. (2020); Jiang et al. (2020), which maximizes the instance classification loss, by first adapting the shared encoder to two sets of differently augmented samples of the same instance with inner gradient update steps and then attacking them (dynamic instance-wise attack).
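The adapt-then-attack structure of the dynamic instance-wise attack can be sketched as follows. This is a hypothetical, heavily simplified stand-in, not the paper's implementation: the encoder is a toy linear map, the inner adaptation and the attack both use a cosine-agreement objective in place of the instance classification loss, and gradients come from finite differences rather than autograd. All function names and hyperparameters are illustrative.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def num_grad(f, x, h=1e-5):
    """Finite-difference gradient, standing in for autograd in this sketch."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def dynamic_instance_attack(W, x1, x2, inner_lr=0.1, eps=0.3, alpha=0.05, steps=5):
    """Bilevel (dynamic instance-wise) attack sketch on a toy linear encoder.

    W      -- weights of the encoder f(x) = W @ x
    x1, x2 -- two augmented views of the same instance
    Inner step: adapt the encoder to the two views (here by increasing their
    agreement, a stand-in for the instance-classification adaptation), then
    perturb each view to minimize agreement between the adapted embeddings.
    """
    # Inner adaptation: one gradient step of the encoder on the clean views.
    def inner_loss(w_flat):
        Wa = w_flat.reshape(W.shape)
        return -cosine_sim(Wa @ x1, Wa @ x2)  # descending this raises agreement
    W_adapted = (W.flatten() - inner_lr * num_grad(inner_loss, W.flatten())).reshape(W.shape)

    # Attack: signed gradient ascent on disagreement, projected to eps-balls.
    a1, a2 = x1.copy(), x2.copy()
    for _ in range(steps):
        loss1 = lambda v: -cosine_sim(W_adapted @ v, W_adapted @ a2)
        a1 = np.clip(a1 + alpha * np.sign(num_grad(loss1, a1)), x1 - eps, x1 + eps)
        loss2 = lambda v: -cosine_sim(W_adapted @ a1, W_adapted @ v)
        a2 = np.clip(a2 + alpha * np.sign(num_grad(loss2, a2)), x2 - eps, x2 + eps)
    return a1, a2, W_adapted
```

The key design point mirrored here is that the attack is computed against the *adapted* encoder, so the adversarial views remain challenging for the parameters actually used in the outer meta-update.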
Then, our framework learns to maximize the similarity between the feature embeddings of the two attacked samples while meta-learning the shared encoder with BOIL (Oh et al., 2020), which allows it to learn robust representations for any given set of augmented samples. Since robustness is achieved at the representation level, without reference to labels, rather than at the task level, our framework can generalize to unseen tasks and domains. Experimental results on multiple benchmark datasets show that our meta-adversarial learning framework is robust not only on few-shot learning tasks from seen domains (Table 3) but also on tasks from unseen domains (Tables 1 and 2), thanks to its ability to learn generalizable robust representations. Moreover, our model even obtains robust transferability comparable to that of self-supervised pre-trained models while using fewer data instances (Table 7). Our contributions can be summarized as follows:

• We propose a novel adversarial meta-learning framework with bilevel attacks, which allows the model to learn generalizable robust representations across tasks and domains.
• Our framework obtains impressive robustness on few-shot tasks in both the seen domain and unseen domains. Notably, on unseen domains, our model outperforms baselines by more than 10% in robust accuracy without compromising clean accuracy.
• Our framework achieves impressive robust transferability on unseen domains, competitive with that of a model pre-trained by SSL on larger data, while using a significantly smaller amount of data for training.
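One standard way to realize the outer-loop objective of maximizing agreement between the two attacked views is an NT-Xent (normalized temperature-scaled cross-entropy) loss, as used in SimCLR-style self-supervised learning. The NumPy sketch below is our own stand-in for the paper's self-supervised loss, shown only to make the agreement objective concrete; the function name and temperature value are illustrative.

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent loss over a batch of paired embeddings (illustrative sketch).

    z1, z2 -- (n, d) embeddings of two views of the same n instances.
    Row i of z1 and row i of z2 form a positive pair; every other row in
    the concatenated 2n-sample batch serves as a negative.
    """
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit-normalize rows
    sim = z @ z.T / tau                                # temperature-scaled cosine
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity
    n = z1.shape[0]
    targets = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # Cross-entropy of each row against its positive pair.
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    loss = logsumexp - sim[np.arange(2 * n), targets]
    return float(loss.mean())
```

Minimizing this loss in the outer loop pulls the embeddings of the two adversarial views together while pushing apart embeddings of different instances, which is the agreement-maximization behavior described above.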

2. RELATED WORK

Meta-Learning. Meta-learning (Thrun & Pratt, 1998) aims to learn general knowledge across a distribution of tasks that can be utilized to rapidly adapt to new tasks with a small amount of data. Meta-learning approaches can be broadly categorized into metric-based (Koch et al., 2015; Sung et al., 2018; Snell et al., 2017) or gradient-based (Finn et al., 2017; Nichol et al., 2018) approaches,

