FEW-SHOT TRANSFERABLE ROBUST REPRESENTATION LEARNING VIA BILEVEL ATTACKS

Abstract

Existing adversarial learning methods for enhancing the robustness of deep neural networks assume the availability of a large amount of data from which adversarial examples can be generated. However, in an adversarial meta-learning setting, the model must train with only a few adversarial examples to learn a robust model for unseen tasks, which is a very difficult goal to achieve. Further, learning transferable robust representations for unseen domains is difficult even with a large amount of data. To tackle this challenge, we propose a novel adversarial self-supervised meta-learning framework with bilevel attacks that aims to learn robust representations that generalize across tasks and domains. Specifically, in the inner loop, we update the parameters of the given encoder by taking inner gradient steps using two different sets of augmented samples, and generate adversarial examples for each view by maximizing the instance classification loss. Then, in the outer loop, we meta-learn the encoder parameters to maximize the agreement between the two adversarial examples, which enables the encoder to learn robust representations. We experimentally validate the effectiveness of our approach on unseen domain adaptation tasks, on which it achieves impressive performance. Specifically, our method significantly outperforms the state-of-the-art meta-adversarial learning methods on few-shot learning tasks, as well as self-supervised learning baselines in standard learning settings with large-scale datasets.

1. INTRODUCTION

Deep neural networks (DNNs) are known to be vulnerable to imperceptibly small perturbations of the input data instances (Szegedy et al., 2013). To overcome this adversarial vulnerability of DNNs, adversarial training (AT) (Madry et al., 2018), which trains the model with adversarially perturbed training examples, has been extensively studied as a means of enhancing the robustness of deep network models. While the vast majority of previous methods (Zhang et al., 2019; Carlini & Wagner, 2017; Moosavi-Dezfooli et al., 2016; Wang et al., 2019; Rebuffi et al., 2021) defend against adversarial attacks that maximize the classification loss, they assume the availability of a large amount of labeled data. Even with recent progress in adversarial supervised learning, training on a large number of samples remains essential for achieving better robustness (Carmon et al., 2019; Rebuffi et al., 2021; Gowal et al., 2021). Recently, Carmon et al. (2019) employ a larger dataset (i.e., TinyImageNet (Le & Yang, 2015)) with pseudo-labels, Gowal et al. (2021) utilize a generative model to generate additional samples, and Rebuffi et al. (2021) leverage augmentation functions to obtain more data samples. On the other hand, meta-learning frameworks (Koch et al., 2015; Sung et al., 2018; Snell et al., 2017; Finn et al., 2017; Nichol et al., 2018), which learn to adapt to a new task quickly with only a small amount of data, are also known to be vulnerable to adversarial attacks (Goldblum et al., 2020). Since meta-learning employs scarce data and must adapt quickly to new tasks, it is difficult to obtain robustness with conventional adversarial training methods, which require a large amount of data (Goldblum et al., 2020). Adversarial Querying (AQ) (Goldblum et al., 2020) proposed an adversarially robust meta-learning scheme that meta-learns with adversarially perturbed query examples using the AT loss (Madry et al., 2018). Similarly, Wang et al.
(2021) study how to enhance the robustness of a meta-learning framework with an adversarial regularizer in the inner adaptation or outer optimization. However, previous works (Goldblum et al., 2020; Wang et al., 2021) show poor robustness on unseen domains (see Table 1).

[Figure 1(c): To generate adversarial examples for meta-learning, we propose a bilevel attack with an instance-wise attack that maximizes the difference between differently augmented query images for the task-shared encoder f. The framework is then trained to make adversarially consistent predictions across multiple views with a self-supervised loss, while meta-learning the encoder to generalize across tasks, enabling robust representations that transfer to unseen tasks and domains.]

Since existing adversarial meta-learning approaches (Yin et al., 2018; Goldblum et al., 2020; Wang et al., 2021) mostly focus on rapid adaptation to new tasks, while largely reusing the features with little modification at the task-adaptation step (Oh et al., 2020), the representations themselves may not be effectively meta-learned to be robust across tasks, and thus these methods fail to achieve robustness when applied to unseen datasets (Section 4.1). The t-SNE visualization of the feature space of AQ in Figure 2 shows that the embeddings of the adversarial examples have large overlaps across classes, which confirms this point. To tackle these challenges, we propose a novel and effective adversarial meta-learning framework that generalizes to unseen domains: Transferable RObust meta-learning via Bilevel Attack (TROBA). TROBA utilizes a bilevel attack scheme to meta-learn robust representations that generalize across tasks and domains, motivated by self-supervised learning (Figure 1). Specifically, we redesign the instance-wise attack proposed in Kim et al. (2020); Jiang et al.
(2020), which maximizes the instance classification loss, by first adapting the shared encoder to two sets of differently augmented samples of the same instance with inner gradient update steps and then attacking each adapted view (dynamic instance-wise attack). Our framework then learns to maximize the similarity between the feature embeddings of the two attacked samples, while meta-learning the shared encoder with BOIL (Oh et al., 2020), which allows it to learn robust representations for any given set of augmented samples. Since robustness is achieved at the representation level, without using labels, rather than at the task level, our framework can generalize to unseen tasks and domains. Experimental results on multiple benchmark datasets show that our meta-adversarial learning framework is robust not only on few-shot learning tasks from seen domains (Table 3) but also on tasks from unseen domains (Tables 1, 2), thanks to its ability to learn generalizable robust representations. Moreover, our model even obtains robust transferability comparable to self-supervised pre-trained models while using fewer data instances (Table 7). Our contributions can be summarized as follows:
• We propose a novel adversarial meta-learning framework with bilevel attacks, which allows the model to learn generalizable robust representations across tasks and domains.
• Our framework obtains impressive robustness on few-shot tasks in both seen and unseen domains. Notably, on unseen domains, our model outperforms baselines by more than 10% in robust accuracy without compromising clean accuracy.
• Our framework achieves impressive robust transferability on unseen domains that is competitive with a model pre-trained by SSL on larger data, while using a significantly smaller amount of data for training.

2. RELATED WORK

Meta-Learning. Meta-learning (Thrun & Pratt, 1998) aims to learn general knowledge across a distribution of tasks that can be utilized to rapidly adapt to new tasks with a small amount of data. Meta-learning approaches can be broadly categorized into metric-based (Koch et al., 2015; Sung et al., 2018; Snell et al., 2017) or gradient-based (Finn et al., 2017; Nichol et al., 2018) approaches; in this work, we focus on the gradient-based approaches to benefit from their versatility. Perhaps the most popular work in this direction is Model-Agnostic Meta-Learning (MAML) (Finn et al., 2017), which uses a bilevel optimization scheme. MAML consists of inner- and outer-optimization loops: in the inner loop, the model takes a few gradient steps to quickly adapt to a new task, and in the outer loop, the parameters are meta-updated to generalize across multiple tasks. To efficiently reuse the features of the encoder, ANIL (Raghu et al., 2019) updates only the classification head in the inner loop while keeping the encoder fixed. On the other hand, Oh et al. (2020) propose BOIL, which meta-learns the feature extractor while keeping the final classifier fixed, and show that it generalizes better on cross-domain adaptation tasks than MAML or ANIL. Various approaches have been proposed to update the meta-learner efficiently: Li et al. (2017) propose to meta-learn the learning rate for each parameter, Finn et al. (2017) reduce the computational cost of MAML with a first-order approximation in the inner loop, and Nichol et al. (2018) propose to repeatedly sample a task and move the initialization toward the task-adapted parameters. Adversarial Training. Many existing works aim to enhance the robustness of a model trained with supervised learning on labeled data by utilizing adversarial examples, generated by applying perturbations that maximize its loss (Goodfellow et al., 2015; Carlini & Wagner, 2017; Papernot et al., 2016).
The most popular approach to enhance the robustness of general DNNs is adversarial training (Madry et al., 2018), which utilizes projected gradient descent (PGD) to maximize the loss in the inner-maximization loop while minimizing the overall loss on the adversarial samples generated by the PGD attack. Zhang et al. (2019) theoretically show the trade-off between clean accuracy and robustness in adversarial training (TRADES), and introduce a regularized Kullback-Leibler divergence (KLD) loss that enhances robustness by enforcing consistency in the predictive distribution between clean and adversarial examples. Adversarial Meta-Learning. Although meta-learning contributes to learning useful generalizable knowledge from scarce data, existing state-of-the-art meta-learning approaches are prone to adversarial perturbations. To tackle this problem, Yin et al. (2018) attempt to combine adversarial training (AT) (Madry et al., 2018) with MAML (Finn et al., 2017), referring to the problem as adversarial meta-learning (ADML). ADML uses clean and adversarial examples simultaneously to update the parameters in both the inner and outer optimization, and thus incurs a high computational cost. However, Goldblum et al. (2020) later point out that ADML (Yin et al., 2018) may not obtain good robustness against strong attacks, since it uses relatively weak attacks during training. They instead propose an adversarially robust meta-learner, Adversarial Querying (AQ), which trains with adversarial examples only from the query set. Wang et al. (2021) study how to obtain robustness in a meta-learning framework and suggest a robustness-regularized meta-learner on top of MAML (RMAML), where adversarial attacks are conducted only in the meta-optimization phase, as in AQ (Goldblum et al., 2020). However, previous adversarial meta-learning methods remain vulnerable to adversarial attacks on unseen domains.
This is mainly because they do not learn robust representations: as observed in Oh et al. (2020), Goldblum et al. (2020), and Wang et al. (2021), the representations are reused with little update during the inner optimization steps. Thus the effect of meta-learning on the learned representations is minimal, which prevents them from achieving generalizable robustness. To tackle this limitation, we propose a robust self-supervised meta-learning framework via bilevel attacks, which meta-learns the representation layers to generalize across any adversarial learning tasks generated from randomly sampled instances.

3. TRANSFERABLE ROBUST META-LEARNING VIA BILEVEL ATTACKS

One of the ultimate goals of meta-learning is generalization to unseen domains and tasks. Learning transferable robust representations that generalize to unseen domains is a difficult problem even with a large amount of labeled data. Yet, we aim to tackle this challenging problem by learning with only a few samples per category, using an effective meta-learning framework with a bilevel attack scheme. Before describing our method, we first elaborate on the preliminaries of our framework.

3.1. PRELIMINARIES

Model-Agnostic Meta-Learning. Let us denote the encoder by $f_\theta$ and the classifier by $h_\mu$. Since meta-learning aims to learn how to learn new tasks, it trains on a large number of tasks $\tau$ sampled from a task distribution $p(\tau)$, where a given task consists of a support set $S_\tau$ and a query set $Q_\tau$ drawn from the dataset $D$. Each task is an $n$-way $k$-shot classification problem, i.e., classifying $n$ classes from $k$ images per class ($n \times k$ instances per set). The most popular framework for meta-learning is model-agnostic meta-learning (MAML) (Finn et al., 2017), which meta-learns the model with a bilevel optimization scheme consisting of inner optimization and outer (meta-level) optimization steps. During the inner optimization, we adapt the shared initial parameters to each new task $\tau$ to obtain task-adaptive parameters $\theta_\tau$ and $\mu_\tau$ by taking a few gradient steps:

$$(\theta_\tau, \mu_\tau) = (\theta, \mu) - \alpha \nabla_{\theta,\mu} \mathcal{L}_{S_\tau}(h_\mu(f_\theta)), \quad (1)$$

where $S_\tau$ is the support set of task $\tau$, $\alpha$ is the step size, and $\mathcal{L}$ is a task-specific loss for the inner gradient steps (e.g., cross-entropy loss). There also exist variants of the MAML framework that differ in which parameters are updated. ANIL (Raghu et al., 2019) only meta-learns the final linear layer while fixing the encoder (i.e., $\theta_\tau = \theta$) for rapid adaptation to a new task while reusing the features. On the other hand, BOIL (Oh et al., 2020) only meta-learns the encoder, and thus the representation layers, while keeping the final classifier fixed (i.e., $\mu_\tau = \mu$). We employ BOIL (Oh et al., 2020), which only updates the encoder, because our focus is on learning generalizable robust representations. In the meta-optimization phase, the model parameters are updated with the meta-objective via stochastic gradient descent (SGD):

$$(\theta, \mu) \leftarrow (\theta, \mu) - \beta \nabla_{\theta,\mu} \mathcal{L}_{Q_\tau}(h_{\mu_\tau}(f_{\theta_\tau})), \quad (2)$$

where $Q_\tau$ is the query set of task $\tau$, $\beta$ is the meta step size, and $\mathcal{L}$ is the meta-objective.
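The inner/outer updates above can be sketched on a toy scalar model. This is a first-order illustrative sketch only (the scalar parameter, quadratic losses, learning rates, and task tuples are assumptions for illustration, not the paper's setup):

```python
# Toy MAML-style bilevel update: adapt on each task's support loss in the
# inner loop, then meta-update the shared initialization on the query losses.
# Uses a first-order approximation (FOMAML-style) for the outer gradient.

def loss(theta, target):
    # Task-specific loss: squared distance to a per-task optimum.
    return (theta - target) ** 2

def grad(theta, target):
    # Analytic gradient of the quadratic loss above.
    return 2.0 * (theta - target)

def maml_step(theta, tasks, alpha=0.1, beta=0.05):
    """One meta-update over a batch of tasks.

    alpha: inner (adaptation) step size; beta: outer (meta) step size."""
    meta_grad = 0.0
    for support_target, query_target in tasks:
        # Inner loop: one gradient step on the support loss (task adaptation).
        theta_task = theta - alpha * grad(theta, support_target)
        # Outer loop: gradient of the query loss at the adapted parameters.
        meta_grad += grad(theta_task, query_target)
    return theta - beta * meta_grad / len(tasks)

theta = 0.0
# (support optimum, query optimum) per task -- made-up illustrative values.
tasks = [(1.0, 1.2), (-1.0, -0.8)]
for _ in range(100):
    theta = maml_step(theta, tasks)
```

With these toy tasks, the meta-update is a contraction with fixed point 0.25, so `theta` converges there: the shared initialization settles where one inner step minimizes the average query loss, which is the essence of the bilevel scheme.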
The meta-objective is a summation of losses over the query sets of all given tasks, where the losses depend on what is to be meta-learned. To reduce the computational overhead of MAML, we use Meta-SGD (Li et al., 2017), which learns a per-parameter learning rate that enables initializing and adapting any differentiable learner in a single step. Attacking a meta-learner. To obtain robustness on few-shot tasks, Adversarial Querying (AQ) (Goldblum et al., 2020) proposes to generate attacks using only the query examples. AQ employs the projected gradient descent attack (PGD) (Madry et al., 2018), a class-wise attack that maximizes the cross-entropy loss on a given query image:

$$\delta^{t+1} = \Pi_{B(x^q, \epsilon)}\left(\delta^t + \gamma\,\mathrm{sign}\left(\nabla_{\delta^t} \mathcal{L}_{CE}\left(h_{\mu_\tau}(f_{\theta_\tau}(x^q + \delta^t)), y^q\right)\right)\right), \quad (3)$$

where $x^q$ and $y^q$ are a query image of task $\tau$ and its label, respectively, $B(x^q, \epsilon)$ is the $\ell_\infty$ norm-ball around $x^q$ with radius $\epsilon$, $\gamma$ is the attack step size, $\delta$ is the perturbation, and the cross-entropy loss $\mathcal{L}_{CE}$ is calculated with the inner-updated parameters $(\theta_\tau, \mu_\tau)$. Robust training loss. Various adversarial training methods have been proposed to enhance a model's robustness to adversarial attacks (Appendix A.1). Among them, we adapt the TRADES (Zhang et al., 2019) loss to improve robustness. TRADES regularizes the model's outputs on the clean and adversarial examples with the Kullback-Leibler divergence (KLD):

$$\mathcal{L}_{\text{TRADES}} = \mathcal{L}_{CE}\left(h_{\mu_\tau}(f_{\theta_\tau}(x^q)), y^q\right) + \beta \max_{\delta \in B(x^q, \epsilon)} \mathcal{L}_{KL}\left(h_{\mu_\tau}(f_{\theta_\tau}(x^q)) \,\|\, h_{\mu_\tau}(f_{\theta_\tau}(x^q + \delta))\right), \quad (4)$$

where $\mathcal{L}_{CE}$ is the cross-entropy loss on clean examples, $\mathcal{L}_{KL}$ is the KLD loss between the clean and adversarial logits, and $\beta$ is a regularization coefficient that controls the trade-off between clean accuracy and robustness, typically set to 6.0. In our framework, we calculate the adversarial loss on the query sets $(x^q, y^q)$, which are different instances from those used in the inner adaptation, to meta-learn robust representations in the meta-optimization phase.
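The PGD update above can be illustrated on a one-dimensional toy model: ascend the loss with signed gradient steps and project the perturbation back onto the $\epsilon$-ball. The toy "model" (a single weight $w$) and all values are illustrative assumptions, not the paper's attack implementation:

```python
# Minimal l_inf PGD sketch: signed-gradient ascent on the loss, with the
# perturbation clipped to [-eps, eps] after every step.

def pgd_attack(x, y, loss_grad, eps=8 / 255, gamma=2 / 255, steps=7):
    """Return a scalar perturbation delta with |delta| <= eps that
    (approximately) maximizes the loss at x + delta."""
    delta = 0.0
    for _ in range(steps):
        g = loss_grad(x + delta, y)           # gradient of loss w.r.t. input
        step = gamma if g > 0 else -gamma     # sign(grad); ties go negative
        delta = max(-eps, min(eps, delta + step))  # project onto the eps-ball
    return delta

# Toy model: loss = (w*x - y)^2 with w = 3; input gradient is 2*w*(w*x - y).
w = 3.0
grad_fn = lambda x, y: 2 * w * (w * x - y)
x, y = 0.5, 1.5   # chosen so w*x == y, i.e. the clean loss is exactly zero
delta = pgd_attack(x, y, grad_fn)
```

Starting from zero clean loss, the attack walks `delta` to the boundary of the $\epsilon$-ball, where the loss is strictly positive, mirroring how Eq. (3) maximizes the cross-entropy within $B(x^q, \epsilon)$.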

3.2. ADVERSARIAL META-LEARNING WITH SELF-SUPERVISED LEARNING

Bilevel parameter augmentation in adversarial meta-learning. In recent self-supervised learning (Chen et al., 2020; He et al., 2020; Grill et al., 2020), image augmentation is applied to produce multiple views of the same instance, which are used to learn a representation space invariant to these non-linear transformations, leading to high-quality visual representations. Motivated by this self-supervised learning concept, to achieve transferable robustness in meta-learning, we propose bilevel parameter augmentation with self-supervised learning. Bilevel parameter augmentation enables the model to adapt a view-specific projected latent space to each set of augmented samples of a given instance.

Algorithm 1 Transferable robust meta-learning via bilevel attack (TROBA)
Require: Dataset $D$, transformation function $t \sim T$
Require: Encoder $f$ with parameters $\theta$, classifier $h$ with parameters $\mu$
Require: Adversary $A(\text{base}, \text{target}, \text{parameters})$
while not done do
    Sample tasks $\{\tau\}$ with support set $S = (x^s, y^s)$ and query set $Q = (x^q, y^q)$
    for $i = 1, \dots$ do
        Transform inputs: $t_1(x^s)$, $t_2(x^s)$
        Fine-tune the model on $(t_1(x^s), y^s)$ to obtain parameters $\theta_{\tau_1}$
        Fine-tune the model on $(t_2(x^s), y^s)$ to obtain parameters $\theta_{\tau_2}$
        Generate adversarial examples:
            $t_1(x^q)^{adv} = A(t_1(x^q), t_2(x^q), \theta_{\tau_1})$, $t_2(x^q)^{adv} = A(t_2(x^q), t_1(x^q), \theta_{\tau_2})$
        Calculate losses on the query set images:
            $\mathcal{L}_{\text{TRADES}_1} = \mathcal{L}_{CE}(h_\mu(f_{\theta_{\tau_1}}(t_1(x^q))), y^q) + \mathcal{L}_{KL}(f_{\theta_{\tau_1}}(t_1(x^q)), f_{\theta_{\tau_1}}(t_1(x^q)^{adv}))$
            $\mathcal{L}_{\text{TRADES}_2} = \mathcal{L}_{CE}(h_\mu(f_{\theta_{\tau_2}}(t_2(x^q))), y^q) + \mathcal{L}_{KL}(f_{\theta_{\tau_2}}(t_2(x^q)), f_{\theta_{\tau_2}}(t_2(x^q)^{adv}))$
            $\mathcal{L}_{\text{self-sup}} = \mathcal{L}_{\text{similarity}}(f_{\theta_{\tau_1}}(t_1(x^q)^{adv}), f_{\theta_{\tau_2}}(t_2(x^q)^{adv}))$
            $\mathcal{L}_{\text{Our}} = \mathcal{L}_{\text{TRADES}_1} + \mathcal{L}_{\text{TRADES}_2} + \mathcal{L}_{\text{self-sup}}$
        Compute gradient $g_\tau = \nabla_{\theta_{\tau_1}, \theta_{\tau_2}} \mathcal{L}_{\text{Our}}$
    end for
    Update model parameters: $\theta, \mu \leftarrow \theta, \mu - \alpha \sum_\tau g_\tau$
end while
Specifically, to generate augmented parameters of the encoder, we first generate multiple views of images with a stochastic data augmentation function $t$ that is randomly selected from the augmentation set $T$, which includes random crop, random flip, random color distortion, and random grayscale, following Zbontar et al. (2021). We then apply two random augmentations $t_1, t_2 \sim T$ to images from both the support set ($S = \{t_1(x^s), t_2(x^s), y^s\}$) and the query set ($Q = \{t_1(x^q), t_2(x^q), y^q\}$). We then generate two views of the shared parameters, $\theta_{\tau_1}$ and $\theta_{\tau_2}$, by adapting the encoder to the differently transformed support sets, as shown in Figure 1. Overall, we introduce parameter-level augmentation along with image-level augmentation to form different views of single instances in the meta-learning framework, which we refer to as bilevel parameter augmentation. Bilevel attack with dynamic instance-wise attack. On top of bilevel parameter augmentation, we propose a bilevel attack with a dynamic instance-wise attack to obtain generalized robustness on few-shot tasks. We redesign the instance-wise attack introduced in self-supervised adversarial learning (Kim et al., 2020; Jiang et al., 2020), which generates adversaries by maximizing the instance classification loss, analogously to Equation 3.
Specifically, we apply an instance-wise attack in our meta-learning framework by generating adversaries that maximize the difference between the representations of the augmented samples of the same instance, obtained by the encoder whose parameters are adapted to each view:

$$\delta_1^{t+1} = \Pi_{B(x^q, \epsilon)}\left(\delta_1^t + \alpha\,\mathrm{sign}\left(\nabla_{\delta_1^t} \mathcal{L}_{\text{similarity}}\left(f_{\theta_{\tau_1}}(t_1(x^q) + \delta_1^t), f_{\theta_{\tau_1}}(t_2(x^q))\right)\right)\right), \quad (5)$$
$$\delta_2^{t+1} = \Pi_{B(x^q, \epsilon)}\left(\delta_2^t + \alpha\,\mathrm{sign}\left(\nabla_{\delta_2^t} \mathcal{L}_{\text{similarity}}\left(f_{\theta_{\tau_2}}(t_2(x^q) + \delta_2^t), f_{\theta_{\tau_2}}(t_1(x^q))\right)\right)\right), \quad (6)$$

where $\delta_1, \delta_2$ are the perturbations generated to maximize the difference between features from the bilevel-augmented encoders $f_{\theta_{\tau_1}}$ and $f_{\theta_{\tau_2}}$, respectively. The maximized loss $\mathcal{L}_{\text{similarity}}$ is the instance-wise classification loss used in adversarial self-supervised learning (Kim et al., 2020). We use the differently transformed query counterpart as the target of the dynamic instance-wise attack, and calculate the perturbations with the parameters of the corresponding augmented encoder. Adversarial meta-learning with bilevel attack. We now present a framework to learn transferable robust representations for unseen domains via the bilevel attack. The gradient $g$ is calculated to minimize our proposed objective (Equation 7), where $\mathcal{L}_{\text{Our}}$ is the meta-objective loss for generalized robustness, $h_\mu$ is the meta-initialized classifier, and $f_{\theta_{\tau_1}}$ and $f_{\theta_{\tau_2}}$ are the bilevel-augmented encoders for each view.
Further, $\mathcal{L}_{\text{Our}}$ consists of an adversarial loss, i.e., the TRADES (Zhang et al., 2019) loss, and a self-supervised loss:

$$g = \nabla_{\theta_\tau} \mathcal{L}_{\text{Our}}, \quad \mathcal{L}_{\text{Our}} = \sum_{n=1}^{2}\left[\mathcal{L}_{CE}(l_n, y^q) + \mathcal{L}_{KL}(l_n^{adv}, l_n)\right] + \mathcal{L}_{\text{self-sup}}(z_1^{adv}, z_2^{adv}), \quad (7)$$

where $z_n = f_{\theta_{\tau_n}}(t_n(x^q))$ and $l_n = h_\mu(z_n)$ are the feature and the logit of each multi-view instance under the augmented encoder $f_{\theta_{\tau_n}}$ and the meta-initialized classifier $h_\mu$, respectively; $\mathcal{L}_{CE}$ is the cross-entropy loss; $l_n^{adv}$ is the logit of an attacked image generated by our bilevel (dynamic instance-wise) attack; $\mathcal{L}_{KL}$ is the KL-divergence loss; and $\mathcal{L}_{\text{self-sup}}$ is a cosine similarity loss between the two differently augmented features. The sum of the cross-entropy and KL-divergence losses is the TRADES loss (Equation 4), which provides robustness for each augmented encoder on each task. The crucial component here is the self-supervised loss, which regularizes our model to maintain robust consistency between the features from the two different views; this helps it learn robust representations across arbitrary instances and augmentations, allowing it to achieve transferable robustness. The overall algorithm of TROBA is described in Algorithm 1.
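The structure of the meta-objective $\mathcal{L}_{\text{Our}}$ above can be sketched in pure Python: a TRADES-style term (cross-entropy plus KL divergence) per augmented branch, plus a cosine-similarity term between the two adversarial features. The softmax/CE/KL helpers, the KL argument order, and all toy values are illustrative assumptions, not the paper's implementation:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(logits, label):
    return -math.log(softmax(logits)[label] + 1e-12)

def kl_div(p_logits, q_logits):
    # KL(p || q); here called as KL(clean || adversarial), one plausible order.
    p, q = softmax(p_logits), softmax(q_logits)
    return sum(pi * math.log(pi / (qi + 1e-12) + 1e-12) for pi, qi in zip(p, q))

def cosine_loss(z1, z2):
    # Negative cosine similarity: lower when the two features agree.
    dot = sum(a * b for a, b in zip(z1, z2))
    n1 = math.sqrt(sum(a * a for a in z1))
    n2 = math.sqrt(sum(b * b for b in z2))
    return -dot / (n1 * n2 + 1e-12)

def troba_objective(l, l_adv, z_adv, y):
    """l, l_adv: clean/adversarial logits per branch; z_adv: adversarial
    features per branch; y: query label."""
    total = 0.0
    for n in range(2):  # the two bilevel-augmented branches
        total += cross_entropy(l[n], y) + kl_div(l[n], l_adv[n])
    total += cosine_loss(z_adv[0], z_adv[1])  # self-supervised consistency
    return total

# Made-up toy logits/features for two branches of one query instance.
l_clean = [[2.0, 0.5], [1.8, 0.4]]
loss_value = troba_objective(l_clean, l_clean, [[1.0, 0.0], [1.0, 0.0]], y=0)
```

As a sanity check, the objective is lower when the adversarial logits match the clean ones and the two adversarial features agree, which is exactly the behavior the TRADES and self-supervised terms reward.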

4. EXPERIMENT

In this section, we first validate the robustness of our model on unseen-domain few-shot learning tasks (Section 4.1). Then, we analyze our model through ablation experiments on the type of loss, attack, and augmentation (Section 4.2). Finally, our model shows transfer robustness comparable to that of a self-supervised framework trained in a standard learning setting with large datasets, although we use a significantly smaller amount of data for pre-training (Section 4.3). Experimental Setup. For the meta-learning setting, we train our approach on ResNet12 with 5-way 5-shot tasks on both CIFAR-FS and Mini-ImageNet. We train our model with the BOIL (Oh et al., 2020) framework, and take a single step in both the inner and outer optimization, as done in Li et al. (2017), during meta-training and in the inner optimization of the meta-testing steps. We adversarially train our model with $\ell_\infty$ PGD attacks with $\epsilon = 8/255$ and step size $2/255$ for 7 steps. We evaluate robustness against $\ell_\infty$ PGD attacks with $\epsilon = 8/255$ and 20 iterations, following the standard procedure. The code will be available in Anonymous. More experimental details are in Appendix B. Robustness in unseen domains. Since our main goal is to achieve transferable robustness in unseen domains, we mainly validate our method on unseen-domain few-shot tasks. We meta-train our model on CIFAR-FS and meta-test on benchmark datasets from different domains: Mini-ImageNet, Tiered-ImageNet, CUB, Flowers, and Cars. As shown in Table 1, previous adversarial meta-learning methods have difficulty achieving robustness on unseen domains. In contrast, TROBA shows impressive transferable robustness on these cross-domain tasks. It also obtains significantly better clean accuracy than the adversarial meta-learning baselines, while remaining competitive with MAML in clean accuracy.
In particular, TROBA shows better robustness than the baselines even when the distribution of the unseen domain is highly different from that of the meta-training dataset (i.e., CUB, Flowers, Cars). Further visualization of the representations of instance-wise adversarial samples from unseen domains shows that our model obtains a well-separated feature space for attacked samples on a novel domain (CIFAR-10) even before adapting to it, while the previous adversarial meta-learning framework learns a feature space with large overlaps between adversarial instances belonging to different classes (Figure 2). This suggests that the superior performance of our model (Tables 1, 2) comes mainly from its ability to learn such transferable robust representations. In particular, TROBA has a smoother loss surface with respect to adversarial examples than the baseline, which explains its better robustness on unseen domains (Figure 3). Robustness in the seen domain. Even though TROBA is designed for transferable robustness in unseen domains, our method also shows better robustness on seen-domain few-shot tasks than the baselines, along with better clean accuracy (Table 3). In addition, TROBA shows a smoother loss surface with respect to adversarial examples, which is also directly associated with better robustness and generalization (Figure 4). Our method is agnostic to the meta-learning approach, as shown in Table 4, which suggests that the type of meta-learning strategy is not the main factor in achieving transferable robustness. We only update the encoder in the inner optimization for all meta-learning algorithms.

4.2. ABLATION STUDY

To examine each component of our proposed method, we conduct ablation studies on the augmentation, loss, and attack, verifying the effectiveness of each component by its robustness on unseen domains. Bilevel parameter augmentation contributes to learning generalized features. As shown in Table 5, image-only augmentation alone meaningfully contributes to learning generalized features for unseen domains. However, when we apply parameter augmentation on top of the image augmentation, the model achieves significantly better clean and robust accuracy than the model trained with image-only augmentation, especially in the seen domain. This suggests that bilevel parameter augmentation is effective for learning consistent representations across tasks and views. To support this claim, we calculate the Centered Kernel Alignment (CKA) (Kornblith et al., 2019) value, which measures the similarity between representations (when the representations are identical, the CKA is 1). As shown in Figure 5, when bilevel parameter augmentation is applied, the features from the augmented parameters are more dissimilar than those obtained with image augmentation only. These results show that our bilevel parameter augmentation generates more diverse views of the same instances, which helps the model learn view-invariant representations and thereby achieve generalizable robustness. Self-supervised loss regularizes the model to learn generalized features. TROBA leverages both an adversarial loss and a self-supervised loss in the meta-objective; specifically, it uses the TRADES loss (Equation 4) and a cosine similarity loss between the representations of the differently bilevel-augmented views, as shown in Equation 7. The adversarial loss is calculated independently in each bilevel-augmented network to enhance robustness on each training sample.
On the other hand, the self-supervised loss is computed between the representations of the two bilevel-augmented encoders to enforce consistency across the features of samples attacked with our bilevel attack, which helps the model obtain a representation space that is consistent across perturbations and instances, aiding generalization (Figure 6). Notably, the self-supervised loss contributes more when we conduct transfer learning to unseen domains with larger data (Appendix D.2). The bilevel attack makes the model robust to unseen-domain attacks. We further analyze the effect of our bilevel instance-wise attack compared to the class-wise attack in Table 6. We observe that adversarial examples generated by the instance-wise attack make the model more robust on unseen domains than those generated by the class-wise attack. Specifically, the instance-wise attack generates adversaries that differ more from the clean examples at the representation level, and thus can be considered a stronger attack. To demonstrate the effectiveness of the instance-wise attack, we calculate the CKA (Kornblith et al., 2019) between clean and adversarial features from each set of bilevel-augmented parameters. As shown in Figure 7, the instance-wise attack produces more difficult adversarial examples that are highly dissimilar from the clean instances. However, when the parameters are augmented with bilevel parameter augmentation, the class-wise attack can also show transferable robustness, since the self-supervised loss supports it in obtaining generalized robustness.
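The CKA metric used in these ablations can be sketched in its linear form (Kornblith et al., 2019): for column-centered feature matrices $X, Y$, linear CKA is $\|Y^\top X\|_F^2 / (\|X^\top X\|_F \|Y^\top Y\|_F)$. The tiny feature matrices below are made-up stand-ins for encoder activations, not measurements from the paper:

```python
# Pure-Python linear CKA between two n x d feature matrices (rows = samples).

def center(X):
    # Column-center the feature matrix.
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in X]

def frob_sq_cross(X, Y):
    # ||Y^T X||_F^2 for matrices with the same number of rows.
    dx, dy = len(X[0]), len(Y[0])
    total = 0.0
    for i in range(dy):
        for j in range(dx):
            s = sum(Y[k][i] * X[k][j] for k in range(len(X)))
            total += s * s
    return total

def linear_cka(X, Y):
    """Linear CKA: 1.0 for identical representations; invariant to
    isotropic scaling and orthogonal transformations."""
    X, Y = center(X), center(Y)
    return frob_sq_cross(X, Y) / (
        frob_sq_cross(X, X) ** 0.5 * frob_sq_cross(Y, Y) ** 0.5)

features_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
features_b = [[2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]  # scaled copy of features_a
```

Because `features_b` is just a rescaling of `features_a`, their CKA is 1; a genuinely different feature matrix yields a value strictly below 1, which is how the paper quantifies view diversity and attack strength.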

4.3. TRANSFERABLE ROBUSTNESS IN DIFFERENT DOMAINS

To demonstrate the strength of our adversarially meta-trained model, we further evaluate it in a standard transfer learning scenario that uses the full dataset to train the encoder with a linear layer on top of it. Specifically, we evaluate the generalizable robustness of the representations learned by our encoder against a self-supervised learning model trained with a large amount of data. We evaluate our model on the seen domain, CIFAR-100, as well as on two unseen domains, CIFAR-10 and STL-10. As shown in Table 7, our model shows comparable clean accuracy and robustness on the unseen domains despite the difference in the amount of training data. Our model is pre-trained with scarce data, and we even reduced the number of bilevel attack steps to 3 to lower the computational cost, yet it obtains performance competitive with the model trained on larger data. These results suggest that our method can serve as a means of pre-training representations to ensure robustness in a variety of applications when training data is scarce.

5. CONCLUSION

We proposed a novel adversarial self-supervised meta-learning framework that learns transferable robust representations from only a few data points via a bilevel attack, which introduces a novel bilevel parameter augmentation along with a dynamic instance-wise attack. Specifically, the bilevel attack leverages self-supervised learning to effectively generate robust representations of multiple views with differently augmented encoders, enabling view-adaptive task adaptation with strong robust generalization. While previous adversarial meta-learning methods are extremely vulnerable on unseen domains, our model learns generalized robust representations that demonstrate impressive transferable robustness on few-shot tasks in unseen domains. Moreover, we validate our model on larger data from unseen domains, where it yields robust representations comparable to a self-supervised learning (SSL) model trained with much more data. We hope that our work inspires adversarial meta-learning research toward obtaining good robust representations using only a few data points.

Supplementary Material

Few-Shot Transferable Robust Representation Learning via Bilevel Attacks

A RELATED WORK

A.1 ADVERSARIAL LEARNING

The vulnerability of deep neural networks (DNNs) to imperceptibly small perturbations of the input is a well-known problem, as observed in previous works (Biggio et al., 2013; Hendrycks & Dietterich, 2019; Szegedy et al., 2013). To overcome this adversarial vulnerability, many attack-based approaches for constructing perturbed examples (Goodfellow et al., 2015; Carlini & Wagner, 2017; Papernot et al., 2016) have appeared. On the other hand, Madry et al. (2018) propose a defense-based approach against adversarial examples, utilizing projected gradient descent (PGD) from the perspective of robust optimization: the loss is maximized in the inner-maximization loop while the overall loss on the task is minimized in the outer-minimization loop, the so-called min-max formulation. Zhang et al. (2019) theoretically show the trade-off between clean accuracy and robustness in adversarial training. To improve both clean and robust accuracy, TRADES (Zhang et al., 2019) introduces a regularized surrogate loss; in particular, the Kullback-Leibler divergence (KLD) in TRADES helps enhance robustness by enforcing consistency between the representations of clean and adversarial examples. Afterward, significant advances in adversarial robustness have emerged. Kim et al. (2020) and Jiang et al. (2020) propose self-supervised adversarial learning mechanisms combined with contrastive learning to obtain robust representations without explicit labels. Since a larger dataset is essential for better adversarial robustness, Shafahi et al. (2019) leverage transfer learning to transfer learned robust representations to new target domains with only a few data points. Goldblum et al. (2020) propose robust supervised meta-learners using adversarial query images in few-shot classification tasks.
However, previous works still have difficulty in obtaining generalized robustness on multiple datasets.
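To make the TRADES-style objective above concrete, the following minimal NumPy sketch (our own illustrative code, not the authors' or the TRADES implementation) combines a cross-entropy term on the clean prediction with a KL-divergence consistency term between the clean and adversarial predictions:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def trades_loss(logits_clean, logits_adv, label, beta=6.0):
    """CE on the clean logits plus beta * KL(clean || adv), a TRADES-style surrogate.

    `beta` plays the role of the regularization hyperparameter (6.0 in this paper);
    the exact KL direction in released TRADES code may differ from this sketch.
    """
    p_clean = softmax(logits_clean)
    p_adv = softmax(logits_adv)
    ce = -np.log(p_clean[label] + 1e-12)               # natural (clean) risk
    kl = np.sum(p_clean * (np.log(p_clean + 1e-12)
                           - np.log(p_adv + 1e-12)))   # consistency regularizer
    return float(ce + beta * kl)
```

When the adversarial prediction matches the clean one, the KL term vanishes and the loss reduces to plain cross-entropy; any disagreement between the two distributions strictly increases the loss.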

A.2 SELF-SUPERVISED LEARNING

Conventional adversarial learning mechanisms in a supervised manner require label information, which demands expensive human annotation. Self-supervised learning enables neural networks to learn representations comparable to supervised ones, even though it does not leverage labels (Grill et al., 2020). Many previous works focus on learning representations that are consistent across different distortions of the input (Koch et al., 2015; Chen et al., 2020; He et al., 2020; Tian et al., 2020). To learn distortion-invariant representations, they enforce consistency between the representations of two differently augmented inputs with the same instance-level identity. In particular, Chen et al. (2020) employ contrastive learning to maximize agreement only between positive pairs in a mini-batch, while pushing negative pairs apart. Subsequently, other works introduce asymmetry into the network architecture or the parameter update (Chen & He, 2021; Grill et al., 2020) to improve performance. However, the trivial solutions that can arise from such asymmetry leave room for improvement. Zbontar et al. (2021) achieve comparable performance by introducing a redundancy-reduction term in the training objective, even though they require neither additional asymmetric networks nor large batches.
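The agreement term that these consistency-based methods enforce can be sketched as a negative cosine similarity between the embeddings of two augmented views. This is a simplified illustration (our own names; real frameworks add predictors, stop-gradients, or negative pairs on top of it):

```python
import numpy as np

def consistency_loss(z1, z2):
    """Negative cosine similarity between two views' representations.

    Minimizing this pulls the two embeddings of the same instance together,
    which is the distortion-invariance objective described above.
    """
    z1 = z1 / (np.linalg.norm(z1) + 1e-12)
    z2 = z2 / (np.linalg.norm(z2) + 1e-12)
    return -float(np.dot(z1, z2))
```

Identical views attain the minimum value of -1; the further the two embeddings diverge, the larger the loss.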

A.3 SELF-SUPERVISED ADVERSARIAL LEARNING

Building on the advantages of self-supervised learning, adversarial training mechanisms in a self-supervised manner have emerged to learn robust representations without relying on label information. Recent works leverage contrastive learning to obtain robust representations in a self-supervised manner (Kim et al., 2020; Jiang et al., 2020). Kim et al. (2020) first devise an instance-wise adversarial perturbation, which does not require explicit labels during the attack, and use the perturbed examples to maximize the contrastive loss. Jiang et al. (2020) introduce a dual stream that optimizes two contrastive losses over four augmented views, computed between clean views and adversarial images, respectively. However, these approaches rely heavily on large batch sizes to effectively provide positive and negative samples for the contrastive learning framework. Gowal et al. (2020) inject adversarial examples on top of the BYOL framework (Grill et al., 2020) to achieve robustness while avoiding the restriction of large batch sizes. Although the restrictions on large batch sizes and image augmentation have been relaxed through extensive development of self-supervised adversarial training, obtaining robustness with scarce data remains difficult, even in a supervised manner.

B EXPERIMENTAL DETAILS

B.1 DATASET

For meta-training, we use CIFAR-FS (Bertinetto et al., 2019) and Mini-ImageNet (Russakovsky et al., 2015). CIFAR-FS and Mini-ImageNet each consist of 100 classes, split into 64, 16, and 20 classes for meta-training, meta-validation, and meta-testing, respectively. We validate our model on 6 benchmark few-shot datasets, CIFAR-FS (Bertinetto et al., 2019), Mini-ImageNet (Russakovsky et al., 2015), Tiered-ImageNet (Russakovsky et al., 2015), Cars, CUB, and VGG-Flower, for few-shot classification, and on 3 additional benchmark standard image classification datasets, CIFAR-10, CIFAR-100, and STL-10, for robust transferability. CIFAR-10 and CIFAR-100 consist of 50,000 training images and 10,000 test images with 10 and 100 classes, respectively. All images are used at 32×32×3 resolution (width, height, and channels) for meta-training. We use the TorchMeta library to load the few-shot datasets into our framework.

B.2 META-TRAIN

We meta-train ResNet12 and ResNet18 as the base encoder networks on CIFAR-FS and Mini-ImageNet. All models are meta-trained with tasks consisting of 5-way 5-shot support set images and 5-way 15-shot query set images, and meta-validated with clean tasks consisting of 5-way 1-shot support set images and 5-way 15-shot query set images. We train the model with 200 randomly selected tasks and validate it with 100 randomly selected tasks. For optimization, we meta-train our models for 300 epochs using the SGD optimizer with weight decay 1e-4. For data augmentation, we use random crop with size 0.08 to 1.0, color jitter with probability 0.8, horizontal flip with probability 0.5, grayscale with probability 0.2, Gaussian blur with probability 0.0, and solarization with probability 0.0 to 0.2. We exclude normalization for adversarial training. For adversarial learning, we use our proposed bilevel attack with 3 and 7 steps. To generate adversaries from query set images, we take gradient steps within an l∞ norm ball with ϵ = 8.0/255.0 and α = 2.0/255.0 to maximize the instance-wise loss against the target instance. To obtain robust representations, we utilize an adversarial loss and a self-supervised loss, namely TRADES (Zhang et al., 2019) with a regularization hyperparameter of 6.0 and a cosine similarity loss, respectively. The overall TROBA model is illustrated in Figure 8. Three different meta-learning frameworks are used to train our model: MAML (Finn et al., 2017), FOMAML (Finn et al., 2017), and Meta-SGD (Li et al., 2017). Specifically, we only update the encoder parameters in the inner optimization for all three meta-learning strategies. Detailed hyperparameters for meta-training and meta-testing are described in Section B.3.
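A single projected gradient step inside the stated l∞ ball (ϵ = 8/255, α = 2/255) can be sketched as below. This is a hedged illustration, not the paper's code; in the real attack, `grad` comes from backpropagating the attack loss through the encoder:

```python
import numpy as np

EPS = 8.0 / 255.0    # l_inf norm-ball radius used above
ALPHA = 2.0 / 255.0  # per-step size used above

def pgd_step(x_adv, x_orig, grad, alpha=ALPHA, eps=EPS):
    """One projected gradient ascent step inside the l_inf ball around x_orig.

    `grad` is the gradient of the attack loss w.r.t. x_adv (obtained by
    backprop through the encoder in an actual implementation).
    """
    x_adv = x_adv + alpha * np.sign(grad)
    x_adv = np.clip(x_adv, x_orig - eps, x_orig + eps)  # project onto the ball
    return np.clip(x_adv, 0.0, 1.0)                     # keep a valid image
```

Running this for 3 or 7 steps yields the 3-step and 7-step variants of the attack; the projection guarantees the final perturbation never exceeds ϵ regardless of the number of steps.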

B.3 HYPERPARAMETER DETAILS OF EACH META-LEARNING FRAMEWORK

MAML We take a single step for both inner and outer optimization to meta-train ResNet12 on CIFAR-FS and Mini-ImageNet. We use the same learning rates for both datasets: 0.3 for outer optimization and 0.08 for inner optimization. For both datasets, we use batch size 4.

FOMAML To reduce the computational cost, we adopt FOMAML (Finn et al., 2017), the first-order approximation of MAML (Finn et al., 2017). For ResNet18, we use a single step in both inner and outer optimization, with learning rates of 0.3 for outer optimization and 0.4 for inner optimization. For ResNet12, we use 3 steps for inner optimization and 1 step for outer optimization, with learning rates of 0.3 for outer optimization and 0.2 for inner optimization. For both datasets, we use batch size 4.

META-SGD To learn quickly, we use Meta-SGD (Li et al., 2017) with a single step. We use a single inner optimization step with an inner learning rate of 0.005. For the outer loop, we use an outer learning rate of 0.005 on CIFAR-FS. For Mini-ImageNet, we use the same number of steps as on CIFAR-FS but with an inner learning rate of 0.001 and an outer learning rate of 0.001. For both datasets, we use batch size 4.
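The inner/outer split these frameworks share can be illustrated with a first-order MAML update on a scalar toy problem. This is a toy sketch under our own assumptions (a quadratic loss, hand-written gradients; the learning-rate values here are arbitrary, not the ones listed above):

```python
import numpy as np

def loss_grad(theta, target):
    # gradient of the toy quadratic loss 0.5 * (theta - target)^2
    return theta - target

def fomaml_step(theta, support_target, query_target, inner_lr=0.2, outer_lr=0.3):
    """One first-order MAML update on a scalar toy problem.

    Inner: adapt on the support loss. Outer: apply the query-loss gradient,
    evaluated at the adapted parameters, directly to theta (the first-order
    approximation drops the second-order term of full MAML).
    """
    theta_task = theta - inner_lr * loss_grad(theta, support_target)  # inner step
    theta = theta - outer_lr * loss_grad(theta_task, query_target)    # outer step
    return theta
```

Iterating this update drives the meta-parameter toward a value whose one-step adaptation fits the query target, which is the behavior MAML-style training relies on.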

B.4 META-TEST

The trained models are evaluated on 400 randomly selected tasks from the test set, where each task consists of 5-way 5-shot support set images and 5-way 15-shot query set images. We use a single step in both inner and outer optimization, and use the same learning rate and meta step size as in meta-training.

B.5 ADVERSARIAL EVALUATION

FEW-SHOT ROBUSTNESS We validate the robustness of our trained models against two types of attack: PGD (Madry et al., 2018) and AutoAttack (Croce & Hein, 2020). For comparisons with self-supervised models, we pre-train ResNet18 based on FOMAML (Finn et al., 2017), the first-order approximation of MAML (Finn et al., 2017), and apply bilevel attacks with 3 steps to reduce the computational cost. The other self-supervised models are pre-trained with PGD-7 attacks. For optimization, we fine-tune the pre-trained models for 110 epochs with batch size 128 using the SGD optimizer with weight decay 5e-4, which Pang et al. (2022) demonstrated to be optimal for robust full-finetuning on CIFAR datasets.

Our meta-objective consists of an adversarial loss and a self-supervised loss: TRADES (Zhang et al., 2019) and the cosine similarity loss between differently augmented features, as described in Equation 7. In the main paper, we compare two different meta-objectives of TROBA: using cross-entropy loss on clean examples instead of the adversarial loss, and using only the TRADES loss without the self-supervised loss term. Further, we replace the adversarial loss term with AT (Madry et al., 2018), which is widely used to obtain robustness in adversarial learning, while keeping the same self-supervised loss. As shown in Table 8, using the TRADES (Zhang et al., 2019) loss as the adversarial loss is more effective for obtaining transferable robustness in adversarial meta-learning than the AT (Madry et al., 2018) loss.

C.5 COMPARISON WITH SELF-SUPERVISED PRE-TRAINED MODELS

We select ACL (Jiang et al., 2020), BYORL (Gowal et al., 2020), and RoCL (Kim et al., 2020) as self-supervised pre-trained baselines. We implement BYORL on top of the BYOL (Grill et al., 2020) framework, following the description in the paper.

To demonstrate that TROBA is an effective method for obtaining transferable robust representations, we experiment with three different meta-learning frameworks and different strengths of bilevel attacks. Specifically, we train TROBA on top of MAML (Finn et al., 2017), FOMAML (Finn et al., 2017), and Meta-SGD (Li et al., 2017), and apply bilevel attacks with 3 steps and 7 steps, respectively. Here, we only update the encoder parameters in the inner adaptation, since we propose task-adaptive attacks that maximize the difference between the features; updating only the encoder further helps learn generalized representations, as BOIL (Oh et al., 2020) demonstrated. As shown in Table 9, TROBA outperforms the previous adversarial meta-learning model (Goldblum et al., 2020) by more than 10% in robust accuracy regardless of the meta-learning strategy. Furthermore, we show outstanding robustness with only 3 steps of the bilevel attack (i.e., the dynamic instance-wise attack) compared to AQ (Goldblum et al., 2020), which is trained with PGD-7 attacks (i.e., a class-wise attack). To demonstrate that the bilevel attack is more effective than the class-wise attack at the representation level, we calculate CKA (Kornblith et al., 2019) between clean and adversarial features to measure their similarity. Notably, the CKA value of features attacked with the bilevel attack is smaller than that of features attacked with the class-wise attack (Figure 7), which means the bilevel attack constructs more confusing perturbed images that are more dissimilar from their clean counterparts. Through these remarkable results, we demonstrate that our proposed bilevel attack serves as a stronger attack that gives the model robust transferability to unseen domains, even with fewer attack gradient steps and little data.



min_{θ^τ, μ} L_Our(h_μ, f_{θ^τ_1}, f_{θ^τ_2}, t_1(x^q), t_1(x^q)^adv, t_2(x^q), t_2(x^q)^adv, y^q)    (6)

Footnote URLs:
- https://github.com/tristandeleu/pytorch-meta
- https://github.com/fra31/auto-attack
- https://github.com/VITA-Group/Adversarial-Contrastive-Learning
- https://github.com/Kim-Minseon/RoCL (for self-supervised learning)
- https://github.com/lucidrains/byol-pytorch



Figure 1: Overview of TROBA. (a) TROBA adapts the encoder to differently augmented sets of the support set (blue, purple lines). Then, it meta-learns (black line) with both an adversarial loss (red) and a self-supervised loss (yellow). (b) During the inner adaptation, TROBA adapts the encoders with the differently augmented support sets. (c) To generate adversarial examples for meta-learning, we propose a bilevel attack with an instance-wise attack that maximizes the difference between differently augmented query images for the task-shared encoder f. Then, we train the framework to make adversarially consistent predictions across multiple views with the self-supervised loss while learning the encoder to generalize across tasks, which enables it to learn robust representations that are transferable to unseen tasks and domains.

Figure 2: Visualization of feature on unseen domains, CIFAR-10, where models are trained on CIFAR-FS.

Figure 3: Loss surface of unseen domain (Mini-ImageNet)

Figure 4: Loss surface of seen domain (CIFAR-FS)

Figure 6: Ablation on meta-objectives in TROBA. Test accuracy(%) on the seen domain (CIFAR-FS) and unseen domains (Mini: Mini-ImageNet, Tiered: Tiered-ImageNet, Flower, CUB, Cars) of 5-way 5-shot task. Legends denote the meta-objectives loss that is used to train the model. All models are adversarially meta-trained on CIFAR-FS with attack step 3 due to computation overhead. (a) Clean accuracy stands for the accuracy of clean images. (b) Robustness is calculated with PGD-20 attack(ϵ = 8./255., step size=ϵ/10).

Figure 5: Effect of augmentation

Figure 7: Effect of type of attack

Figure 8: Overall model figure of TROBA.

D ADDITIONAL EXPERIMENTAL RESULTS OF ROBUSTNESS

D.1 ROBUSTNESS ON UNSEEN DOMAINS WITH DIFFERENT META-LEARNING FRAMEWORKS AND DIFFERENT ITERATIONS OF BILEVEL ATTACK

Results of transferable robustness in 5-way 5-shot unseen domain tasks that are trained on 5-way 5-shot CIFAR-FS. Rob. stands for accuracy(%) that is calculated with PGD-20 attack (ϵ = 8./255., step size=ϵ/10). Clean stands for test accuracy(%) of clean images. All models are trained with PGD-7 attacks on ResNet12.

Results of transferable robustness in 5-way 5-shot unseen domain tasks that are trained on 5-way 5-shot Mini-ImageNet. Rob. stands for accuracy(%) that is calculated with PGD-20 attack (ϵ = 8./255.). Clean stands for test accuracy(%) of clean images. All models are trained with PGD-7 attacks on ResNet12.

Comparison in the 5-shot seen-domain tasks. All models are trained on CIFAR-FS and Mini-ImageNet, respectively, with PGD-7 attacks in ResNet12. * stands for results reported in Wang et al. (2021).

Model | CIFAR-FS Clean | CIFAR-FS Rob. | Mini-ImageNet Clean | Mini-ImageNet Rob.
AQ (Goldblum et al., 2020) | 73.49 | 28.49 | 39.47 | 13.52
RMAML (Wang et al., 2021)* | 57.95 | 35.30 | 43.98 | 21.47
Ours (Li et al., 2017) | 64.90 | 43.34 | 47.56 | 18.18

Results of TROBA with different meta-learning frameworks in 5-shot tasks. All models are trained on CIFAR-FS and Mini-ImageNet, respectively, with PGD-7 attacks in ResNet12.

Model | CIFAR-FS Clean | CIFAR-FS Rob. | Mini-ImageNet Clean | Mini-ImageNet Rob.
+MAML (Finn et al., 2017) | 52.79 | 32.50 | 37.58 | 14.23
+FOMAML (Finn et al., 2017) | 53.42 | 35.95 | 33.87 | 15.60
+Meta-SGD (Li et al., 2017) | 64.90 | 43.34 | 47.56 | 18.18

Ablation study of our proposed bilevel augmentation. Test accuracy(%) on seen domain (CIFAR-FS) and unseen domains (Mini-ImageNet, Flower, Cars) of 5-way 5-shot task. Robustness is calculated with PGD-20 attack(ϵ = 8./255., step size=ϵ/10), clean stands for accuracy of clean images. All models are adversarially meta-trained on CIFAR-FS with attack step 3 due to computation costs.

Ablation study of our proposed bilevel attack. Test accuracy(%) on seen domain (CIFAR-FS) and unseen domains (Mini-ImageNet, Tiered-ImageNet, Flower, Cars) of 5-way 5-shot task. Clean stands for accuracy of clean images. Rob. stands for robust accuracy that is calculated with PGD-20 attack(ϵ = 8./255.). All models are adversarially meta-trained on CIFAR-FS with attack step 3 due to computation costs.

Experimental results for robust full-finetuning of TROBA and state-of-the-art adversarial self-supervised learning (SSL) models. While TROBA is trained on CIFAR-FS, the other models are trained on CIFAR-100. TROBA is pre-trained with bilevel attacks with 3 steps due to computational overhead; the others are pre-trained with PGD-7 attacks. All models are trained on ResNet18. We evaluate all models with PGD-20 and AutoAttack (AA) (Croce & Hein, 2020) with ϵ = 8/255.

All l∞ PGD attacks are conducted with norm-ball size ϵ = 8./255., step size α = 8./2550., and 20 steps of inner maximization. AutoAttack is a combination of 4 different attacks (i.e., APGD-CE, APGD-T, FAB-T, and Square). We use the standard version of AutoAttack at test time.

SELF-SUPERVISED ROBUST LINEAR EVALUATION To compare TROBA with self-supervised pre-trained models, we apply robust full-finetuning, in which the parameters of the entire network, including the feature extractor and the fc layer, are trained with adversarial examples. We generate perturbed examples with an l∞ PGD-10 attack with ϵ = 8./255. and step size α = 2./255. during training. All adversarially full-finetuned models are evaluated against the l∞ PGD-20 attack (ϵ = 8./255., α = 8./2550.) and AutoAttack.

Ablation study on the adversarial loss in the meta-objectives of TROBA. Test accuracy (%) on benchmark datasets for 5 shots. Robustness is calculated with the PGD-20 attack (ϵ = 8./255., step size=ϵ/10); Clean stands for accuracy on clean images. All models are adversarially meta-trained on CIFAR-FS, with ResNet18 as the base encoder.

REPRODUCIBILITY

• Datasets. We use CIFAR-FS, Mini-ImageNet, Tiered-ImageNet, CUB, Flower, and Cars for few-shot learning tasks. Further, we use CIFAR-10, CIFAR-100, and STL-10 for standard image classification tasks in transfer learning. More details are in Supplementary B.1.
• Meta-train. We train our models on CIFAR-FS and Mini-ImageNet with ResNet12 and ResNet18 as the base encoders. All models are trained on 5-way 5-shot support set images and 5-way 15-shot query images. More details are in Supplementary B.2.
• Meta-test. We evaluate our models on few-shot learning tasks as described in Supplementary B.4. The adversarial setting for few-shot robustness is described in Supplementary B.5.

Algorithm sketch:
while not done do
    Sample tasks {τ}, support set S(x_s, y_s), query set Q(x_q, y_q)
    Fine-tune the model with t_1(x_s), t_2(x_s), y_s and update the parameters θ_τ
    Generate adversarial examples t_1(x_q)^adv = A(t_1(x_q), t_2(x_q), θ_τ), t_2(x_q)^adv = A(t_2(x_q), t_1(x_q), θ_τ)
    Calculate losses on the query set images

To demonstrate that bilevel parameter augmentation is more effective than image augmentation in adversarial self-supervised meta-learning, we experiment in the same environment except for the parameter augmentation in the inner adaptation. Specifically, in the image-only variant we generate the augmented parameters of the encoder by adapting with the two differently transformed support set images simultaneously, while TROBA augments the parameters independently for each augmented view. A detailed algorithm for applying image-only augmentation in adversarial self-supervised meta-learning is described in Algorithm 2. Experimental results are reported in Section 4.2.
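One meta-iteration of the loop above can be sketched end-to-end on a toy problem. Everything here is a hypothetical stand-in: a scalar linear "encoder" replaces the real network, gradients are written by hand, and all hyperparameter values are arbitrary; the sketch only mirrors the structure (two adapted parameter copies, an instance-wise attack on each view, an agreement-based outer loss):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x, w):
    # toy linear "encoder": a scalar weight applied elementwise
    return w * x

def inner_adapt(w, xs, ys, lr=0.1):
    # one inner gradient step on a squared-error "support" loss
    grad = 2.0 * np.mean((f(xs, w) - ys) * xs)
    return w - lr * grad

def attack(xq, w_own, w_other, eps=0.1, alpha=0.05, steps=3):
    # instance-wise attack: maximize the feature gap between the two
    # adapted encoders, projected onto an l_inf ball around the view
    x_adv = xq.copy()
    for _ in range(steps):
        gap = f(x_adv, w_own) - f(x_adv, w_other)
        grad = 2.0 * gap * (w_own - w_other)   # d/dx of gap^2 for the linear toy encoder
        x_adv = np.clip(x_adv + alpha * np.sign(grad), xq - eps, xq + eps)
    return x_adv

# one meta-iteration on a toy task
w = 1.0
xs, ys = rng.normal(size=8), rng.normal(size=8)
xq = rng.normal(size=8)
t1, t2 = xs + 0.01, xs - 0.01                 # two "augmented" views of the support set
w1 = inner_adapt(w, t1, ys)                   # bilevel parameter augmentation:
w2 = inner_adapt(w, t2, ys)                   # one adapted copy per view
x1_adv = attack(xq + 0.01, w1, w2)
x2_adv = attack(xq - 0.01, w2, w1)
# outer (meta) loss: agreement between the two adversarial features
meta_loss = float(np.mean((f(x1_adv, w1) - f(x2_adv, w2)) ** 2))
```

The outer update (omitted here) would descend `meta_loss` with respect to the shared `w`, which is the agreement-maximization step of the meta-objective.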

C.2 ABLATION STUDY OF BILEVEL ATTACK

The bilevel attack is based on the instance-wise attack (Kim et al., 2020), which does not require label information to generate adversaries, while the class-wise attack utilizes labels to maximize the cross-entropy loss in the inner maximization of Equation 3. The bilevel class-wise attack is applied with the bilevel augmented parameters, as done in the bilevel attack. We use the l∞ PGD attack with strength 8./255., step size 2./255., and the same number of iterations as the bilevel attack in all comparisons in the main paper.
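The class-wise baseline's inner-maximization objective is ordinary cross-entropy against the true label, which is why it cannot be used without labels. A minimal hedged sketch of that objective (our own names, not the paper's code):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def classwise_attack_loss(logits, label):
    """Cross-entropy of the perturbed input's logits against the true label.

    The class-wise attack ascends this value, which requires knowing `label`;
    the instance-wise bilevel attack described above needs no label at all.
    """
    return -float(np.log(softmax(logits)[label] + 1e-12))
```

The loss is large when the perturbed input is classified away from its true label, so gradient ascent on it pushes the example across the decision boundary.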

C.3 CKA ANALYSIS

We calculate CKA to demonstrate that the bilevel attack constructs more confusing perturbed examples than the class-wise attack, which helps in obtaining robust transferability. Specifically, we randomly selected 600 samples for the same tasks, which corresponds to 10% of the 6,000 images per class in CIFAR-FS. With the randomly sampled tasks and the same pre-trained models, the parameters of the encoder are adapted with two differently transformed support set images, as done in meta-training (i.e., bilevel parameter augmentation). Here, we generate adversaries with the class-wise attack and the bilevel attack independently, using the adapted encoder for each view. Then we calculate CKA between the features of the clean and adversarial query set images for the same view, and average the CKA values over the selected tasks. Results are reported in Section 4.2.
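Linear CKA (Kornblith et al., 2019) between a clean and an adversarial feature matrix can be computed as below; this is a standard formulation sketched for illustration, not the paper's exact script:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two feature matrices of shape (n_samples, n_features).

    Values lie in [0, 1]; a lower value between clean and adversarial features
    indicates a more confusing (more representation-distorting) attack.
    """
    X = X - X.mean(axis=0)   # center each feature dimension
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    den = (np.linalg.norm(X.T @ X, ord="fro")
           * np.linalg.norm(Y.T @ Y, ord="fro"))
    return float(num / (den + 1e-12))
```

CKA is invariant to isotropic scaling of either representation, which makes it a convenient similarity measure across differently adapted encoders.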

Model | Accuracy
ACL (Jiang et al., 2020) | 68.6
BYORL (Gowal et al., 2020) | 69.01
AQ (Goldblum et al., 2020) | 66.16
TROBA | 67.9

In the main paper, we validate our models on unseen domains with larger benchmark datasets for standard image classification, namely CIFAR-10 and STL-10. Furthermore, we also demonstrate the robust transferability of our models on benchmark few-shot image classification tasks, namely Cars, CUB, and Aircraft, which have 196, 200, and 100 classes, respectively. We train our models on ResNet18 with bilevel attacks with 3 steps, while the other self-supervised models are trained with PGD-7 attacks, due to computation costs. We use the same hyperparameters for validation with robust full-finetuning on all datasets, as explained in Appendix B.5. Although our models use only scarce data for training, and apply bilevel attacks with fewer gradient steps, they learn even better robust representations than self-supervised pre-trained models while preserving clean accuracy (Table 10). In particular, our method shows a larger gap on fine-grained datasets, whose distributions differ greatly from the meta-training domain (i.e., CIFAR-FS). Further, since we hope our models are robust to real-world perturbations such as common corruptions (Hendrycks & Dietterich, 2019), we evaluate our fully fine-tuned models with adversarial examples on CIFAR-10 and with common-corruption datasets on CIFAR-10. TROBA shows accuracy comparable to self-supervised pre-trained models on common-corruption tasks, even though it is trained with little data and with bilevel attacks using fewer inner-maximization iterations (Table 11). From these results, we conclude that TROBA effectively learns well-generalized representations with little data.

E EFFECT OF SELF-SUPERVISED CONCEPT IN TROBA E.1 SELF-SUPERVISED LOSS

Existing adversarial meta-learning works are vulnerable to unseen domains, since they lack regularization for learning good representations in themselves and instead focus on optimization for rapid adaptation to new tasks. To learn good representations, we leverage self-supervised learning in adversarial meta-learning. Specifically, we enforce consistent features between the bilevel-attacked images by using a cosine similarity loss as regularization. In robust full-finetuning, where larger data is used to fine-tune the model parameters, we observe that the model trained with the self-supervised loss mostly shows better clean accuracy and robust accuracy on unseen domains (Table 12). In particular, the self-supervised loss helps more in obtaining generalized features as the distribution of the target domain diverges further from the distribution of the meta-training dataset.

E.2 TROBA ON TOP OF THE PRE-TRAINED SELF-SUPERVISED MODEL

We further verify whether our method can help self-supervised pre-trained models obtain transferable robustness on few-shot tasks in unseen domains. Specifically, we take RoCL (Kim et al., 2020) pre-trained on CIFAR-100 and further meta-train it with TROBA. As shown in Table 13, since self-supervised models learn color-invariant representations, with color jitter normally used in data augmentation, TROBA is less effective at obtaining robust representations on the fine-grained datasets (i.e., CUB, Flowers, Cars) that are easily affected by colors. On the other hand, the meta-trained self-supervised pre-trained model achieves even better clean accuracy and robustness than TROBA alone, especially on general datasets (i.e., Mini-ImageNet, Tiered-ImageNet), where TROBA helps generalization on few-shot tasks in general domains.

F OBFUSCATED GRADIENT

All of the robust accuracies in our paper are calculated with attack strength ϵ = 8./255, under which the measured accuracy should be the same as the robust accuracy from our original evaluation setting. Specifically, we evaluate TROBA trained on CIFAR-FS with ResNet12 as the base encoder, and further the variant built on top of FOMAML reported in Table 9. As shown in Table 14, we verify that our models do not suffer from any obfuscated gradient issues.

G VISUALIZATION OF LOSS SURFACE

We visualize the loss surfaces of our model and the baseline AQ (Goldblum et al., 2020).

