REPURPOSING PRETRAINED MODELS FOR ROBUST OUT-OF-DOMAIN FEW-SHOT LEARNING

Abstract

Model-agnostic meta-learning (MAML) is a popular method for few-shot learning but assumes that we have access to the meta-training set. In practice, training on the meta-training set may not always be an option due to data privacy concerns, intellectual property issues, or simply a lack of computing resources. In this paper, we consider the novel problem of repurposing pretrained MAML checkpoints to solve new few-shot classification tasks. Because of the potential distribution mismatch, the original MAML steps may no longer be optimal. Therefore, we propose an alternative meta-testing procedure that combines MAML gradient steps with adversarial training and uncertainty-based stepsize adaptation. Our method outperforms "vanilla" MAML on same-domain and cross-domain benchmarks using both SGD and Adam optimizers and shows improved robustness to the choice of base stepsize.

1. INTRODUCTION

Deep learning approaches have shown improvements based on massive datasets and enormous computing resources. Despite their success, it is still challenging to apply state-of-the-art methods in the real world. For example, in semiconductor manufacturing (Nishi & Doering, 2000), collecting each new data point is costly and time-consuming because it requires setting up a new manufacturing process accordingly. Moreover, in the case of a "destructive inspection", the cost is very high because the wafer must be destroyed for measurement. Therefore, learning from small amounts of data is important for practical purposes.

Meta-learning (learning-to-learn) approaches have been proposed for learning under limited data constraints. A meta-learning model optimizes its parameters for the best performance on a distribution of tasks. In particular, few-shot learning (FSL) formulates "learning from limited data" as an n-way k-shot problem, where n is the number of classes and k is the number of labeled samples per class. For each task in FSL, a support set is provided for training, while a query set is provided for evaluation. Ideally, a meta-learning model trained over a set of tasks (meta-training) will generalize well to new tasks (meta-testing).

Model-agnostic meta-learning (MAML) (Finn et al., 2017) is a general end-to-end approach for solving few-shot learning tasks. MAML is trained on the meta-training tasks to learn a model initialization (also known as a checkpoint) such that a few gradient steps on the support set will yield the best predictions on the query set. However, in practice it may not always be possible to retrain or finetune on the meta-training set. This situation may arise when the meta-training data is confidential, is subject to restrictive licences, or contains private user information or protected intellectual property such as semiconductor manu-

