REPURPOSING PRETRAINED MODELS FOR ROBUST OUT-OF-DOMAIN FEW-SHOT LEARNING

Abstract

Model-agnostic meta-learning (MAML) is a popular method for few-shot learning but assumes that we have access to the meta-training set. In practice, training on the meta-training set may not always be an option due to data privacy concerns, intellectual property issues, or simply a lack of computing resources. In this paper, we consider the novel problem of repurposing pretrained MAML checkpoints to solve new few-shot classification tasks. Because of the potential distribution mismatch, the original MAML steps may no longer be optimal. Therefore we propose an alternative meta-testing procedure that combines MAML gradient steps with adversarial training and uncertainty-based stepsize adaptation. Our method outperforms "vanilla" MAML on same-domain and cross-domain benchmarks using both SGD and Adam optimizers and shows improved robustness to the choice of base stepsize.

1. INTRODUCTION

Deep learning approaches have shown impressive results given massive datasets and enormous computing resources. Despite their success, it is still challenging to apply state-of-the-art methods in the real world. For example, in semiconductor manufacturing (Nishi & Doering, 2000), collecting each new data point is costly and time consuming because it requires setting up a new manufacturing process accordingly. Moreover, in the case of a "destructive inspection", the cost is very high because the wafer must be destroyed for measurement. Learning from small amounts of data is therefore important for practical purposes.

Meta-learning (learning-to-learn) approaches have been proposed for learning under such limited-data constraints. A meta-learning model optimizes its parameters for the best performance over a distribution of tasks. In particular, few-shot learning (FSL) formulates "learning from limited data" as an n-way k-shot problem, where n is the number of classes and k is the number of labeled samples per class. For each task in FSL, a support set is provided for training, while a query set is provided for evaluation. Ideally, a meta-learning model trained over a set of tasks (meta-training) will generalize well to new tasks (meta-testing).

Model-agnostic meta-learning (MAML) (Finn et al., 2017) is a general end-to-end approach for solving few-shot learning tasks. MAML is trained on the meta-training tasks to learn a model initialization (also known as a checkpoint) such that a few gradient steps on the support set yield the best predictions on the query set. However, in practice it may not always be possible to retrain or finetune on the meta-training set. This situation may arise when the meta-training data is confidential, subject to restrictive licenses, contains private user information, or constitutes protected intellectual property such as semiconductor manufacturing know-how.
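To make the MAML meta-testing procedure concrete, the following is a minimal NumPy sketch of the vanilla adaptation step: starting from a pretrained checkpoint, take a few gradient steps on the support-set loss. A linear model with squared-error loss stands in for the few-shot classifier; the function names are illustrative, not from the paper's code.

```python
import numpy as np

def support_loss_grad(theta, X, y):
    """Support-set loss and gradient for a toy linear model
    (a stand-in for the few-shot classifier's loss)."""
    err = X @ theta - y
    loss = 0.5 * np.mean(err ** 2)
    grad = X.T @ err / len(y)
    return loss, grad

def maml_meta_test_adapt(theta_checkpoint, X_support, y_support,
                         alpha=0.1, num_steps=5):
    """Vanilla MAML meta-testing: a few SGD steps on the support set,
    starting from the pretrained initialization (checkpoint)."""
    theta = theta_checkpoint.copy()
    for _ in range(num_steps):
        _, grad = support_loss_grad(theta, X_support, y_support)
        theta = theta - alpha * grad
    return theta
```

The paper's point is that this fixed recipe (stepsize alpha, plain gradients) was tuned implicitly for in-distribution meta-testing tasks and need not be optimal out of domain.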
Another reason is that one may not have the computing resources necessary for running large-scale meta-training.

In this paper, we consider the novel problem of repurposing MAML checkpoints to solve new few-shot classification tasks, without the option of (re)training on the meta-training set. Since the meta-testing set (new tasks) may differ in distribution from the meta-training set, the assumption of identically distributed tasks implicit in end-to-end learning may not hold, so there is no reason why the meta-testing gradient steps should match those of meta-training. Therefore, we investigate various improvements over the default MAML gradient steps for test-time adaptation. Conceptually, our approach consists of collecting information while training the model on a new support set and then proposing ways to use this information to improve the adaptation. In this paper, we consider the variance of model parameters during ensemble training as the source of information. We propose algorithms that use this information both to adapt the stepsizes for MAML and to generate "task-specific" adversarial examples that aid robust adaptation to the new task.

Our main contributions are the following:
• We motivate the novel problem of repurposing MAML checkpoints to solve cross-domain few-shot classification tasks when the meta-training set is inaccessible, and propose a method based on uncertainty-based stepsize adaptation and adversarial data augmentation, with the particularity that the meta-testing steps differ from the meta-training steps.
• Compared to "vanilla" MAML, our method shows improved accuracy and robustness to the choice of base stepsizes on popular cross-domain and same-domain benchmarks, using both the SGD and Adam (Kingma & Ba, 2014) optimizers. Our results also indicate that adversarial training (AT) helps improve model performance during meta-testing.
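The "task-specific" adversarial augmentation described above can be sketched as an FGSM-style perturbation of the support set: each support input is pushed in the sign direction of the loss gradient with respect to that input. This is a toy NumPy illustration using the same linear squared-error model as a stand-in; the function name and the choice of FGSM are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def fgsm_support_examples(theta, X, y, eps=0.1):
    """FGSM-style adversarial copies of support examples for a linear
    model with squared-error loss 0.5 * (x . theta - y)^2.
    d loss_i / d x_i = (x_i . theta - y_i) * theta."""
    err = X @ theta - y                       # per-example residual, shape (n,)
    grad_x = err[:, None] * theta[None, :]    # gradient w.r.t. each input x_i
    return X + eps * np.sign(grad_x)          # one signed step per input
```

Adapting on the union of clean and perturbed support examples is one simple way such augmentation can be folded into the meta-testing gradient steps.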
To the best of our knowledge, ours is the first few-shot learning method to combine ensemble-based stepsize computation with adversarial examples generated from the meta-testing support set for improved robustness. Moreover, our empirical observation that the default meta-testing procedure of MAML can be improved upon motivates further research on alternative ways to leverage published model checkpoints.

2. RELATED WORK

2.1. META-LEARNING AND MODEL-AGNOSTIC META-LEARNING

FSL approaches deal with an extremely small amount of training data and can be classified into three categories. First, metric-based approaches solve few-shot tasks by training a feature extractor that maximizes intra-class similarity and inter-class dissimilarity (Vinyals et al., 2016; Snell et al., 2017; Sung et al., 2018). Second, memory-based approaches utilize previous tasks for new tasks via external memories (Santoro et al., 2016; Mishra et al., 2018). Third, optimization-based approaches search for good initial parameters during training and adapt the pretrained model to new tasks (Finn et al., 2017; Lee & Choi, 2018; Grant et al., 2018). We focus on optimization-based approaches and propose improved test-time training, especially for the general family of MAML methods.

2.2. UNCERTAINTY FOR MODEL TRAINING

Uncertainty is an important criterion for measuring the robustness of a neural network (NN). Bayesian neural networks (BNNs) (Blundell et al., 2015) obtain model uncertainty by placing prior distributions over the weights p(ω). This uncertainty has been used to adapt the stepsizes during continual learning in Uncertainty-guided Continual BNNs (UCB) (Ebrahimi et al., 2020). For each parameter, UCB scales its stepsize in inverse proportion to the uncertainty of the parameter in the BNN, reducing changes to important parameters while allowing less important parameters to be modified faster in favor of learning new tasks. Our approach also decreases the stepsizes for "uncertain" parameters, but uses a different notion of uncertainty and operates in the context of FSL with a pretrained MAML checkpoint. Unlike BNNs, deep ensembles (Lakshminarayanan et al., 2017) estimate the uncertainty of NNs using ensembles over randomly initialized parameters combined with adversarial training (AT). Deep ensembles have been successfully used to boost predictive performance and AT has been used to im-
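A minimal sketch of the UCB-style rule discussed above, transplanted to the ensemble setting: per-parameter uncertainty is estimated as the standard deviation across an ensemble of adapted parameter vectors, and each parameter's stepsize shrinks as its uncertainty grows. The normalization by the mean deviation is an assumption made here to keep the stepsizes bounded by the base rate; it is not claimed to match either UCB's or the paper's exact formula.

```python
import numpy as np

def uncertainty_stepsizes(ensemble_params, base_lr=0.1, eps=1e-8):
    """Per-parameter stepsizes scaled inversely with uncertainty.
    ensemble_params: list of flat parameter vectors, one per ensemble
    member. Parameters that vary more across the ensemble (higher
    std-dev) receive smaller stepsizes; a zero-variance parameter keeps
    the full base_lr."""
    stacked = np.stack(ensemble_params)            # (n_models, n_params)
    sigma = stacked.std(axis=0)                    # per-parameter uncertainty
    return base_lr / (1.0 + sigma / (sigma.mean() + eps))
```

The resulting vector can replace the scalar alpha in the MAML meta-testing update, giving each parameter its own adapted stepsize.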

