BOIL: TOWARDS REPRESENTATION CHANGE FOR FEW-SHOT LEARNING

Abstract

Model-Agnostic Meta-Learning (MAML) is one of the most representative gradient-based meta-learning algorithms. MAML learns new tasks from a few data samples using inner updates from a meta-initialization point, and learns the meta-initialization parameters with outer updates. It has recently been hypothesized that representation reuse, which makes little change to already efficient representations, is the dominant factor in the performance of the model meta-initialized through MAML, in contrast to representation change, which causes a significant change in representations. In this study, we investigate the necessity of representation change for the ultimate goal of few-shot learning, which is solving domain-agnostic tasks. To this end, we propose a novel meta-learning algorithm, called BOIL (Body Only update in Inner Loop), which updates only the body (extractor) of the model and freezes the head (classifier) during inner loop updates. BOIL leverages representation change rather than representation reuse, because the feature vectors (representations) have to move quickly to their corresponding frozen head vectors. We visualize this property using cosine similarity, CKA, and empirical results without the head. BOIL empirically shows significant performance improvement over MAML, particularly on cross-domain tasks. The results imply that representation change is a critical component of gradient-based meta-learning approaches.

1. INTRODUCTION

Meta-learning, also known as "learning to learn," is a methodology that imitates the human ability to adapt quickly to even a small amount of previously unseen data by drawing on previous learning experiences. To this end, meta-learning with deep neural networks has mainly been studied using metric- and gradient-based approaches. Metric-based meta-learning (Koch, 2015; Vinyals et al., 2016; Snell et al., 2017; Sung et al., 2018) compares the distance between feature embeddings, using models as a mapping function from data into an embedding space, whereas gradient-based meta-learning (Ravi & Larochelle, 2016; Finn et al., 2017; Nichol et al., 2018) quickly learns the parameters to be optimized when the models encounter new tasks. Model-agnostic meta-learning (MAML) (Finn et al., 2017) is the most representative gradient-based meta-learning algorithm. The MAML algorithm consists of two optimization loops: an inner loop and an outer loop. The inner loop learns task-specific knowledge, and the outer loop finds a universally good meta-initialized parameter that allows the inner loop to quickly learn any task from the initial point with only a few examples. This algorithm has been highly influential in the field of meta-learning, and numerous follow-up studies have been conducted (Oreshkin et al., 2018; Rusu et al., 2018; Zintgraf et al., 2018; Yoon et al., 2018; Finn et al., 2018; Triantafillou et al., 2019; Sun et al., 2019; Na et al., 2019; Tseng et al., 2020). Very recent studies (Raghu et al., 2020; Arnold et al., 2019) have attributed the success of MAML to the high-quality features that the meta-initialized parameters provide before the inner updates. For instance, Raghu et al. (2020) claimed that MAML learns new tasks by updating the head (the last fully connected layer) with almost the same features (the output of the penultimate layer) from the meta-initialized network.
In this paper, we categorize the learning patterns as follows: a small change in the representations during task learning is named representation reuse, whereas a large change is named representation change.¹ Thus, the common belief has been that MAML works through representation reuse.
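To make the distinction between the head and the body concrete, the following is a minimal sketch of a single inner-loop step under two update rules: one that moves both the body and the head (MAML-style) and one that freezes the head (BOIL-style). Everything in it is an illustrative assumption rather than the setup used in this paper: the toy model (an element-wise scaling "body" and a dot-product "head"), the squared-error loss, the learning rate, the data, and all function names are hypothetical. The outer loop is omitted. The sketch also measures how far a feature vector moves using cosine similarity.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Example = Tuple[Vector, float]  # (input x, target y)


def features(body: Sequence[float], x: Sequence[float]) -> Vector:
    """Toy 'body' (extractor): element-wise scaling of the input."""
    return [w * xi for w, xi in zip(body, x)]


def predict(head: Sequence[float], feat: Sequence[float]) -> float:
    """Toy 'head' (classifier): dot product with the feature vector."""
    return sum(v * f for v, f in zip(head, feat))


def loss(body: Sequence[float], head: Sequence[float], data: Sequence[Example]) -> float:
    """Mean squared error over a set of examples."""
    return sum((predict(head, features(body, x)) - y) ** 2 for x, y in data) / len(data)


def inner_step(body: Sequence[float], head: Sequence[float], support: Sequence[Example],
               lr: float, update_head: bool) -> Tuple[Vector, Vector]:
    """One inner-loop gradient step on the support set.

    update_head=True  -> both body and head move (MAML-style step).
    update_head=False -> head is frozen, only the body moves (BOIL-style step).
    """
    n = len(support)
    g_body = [0.0] * len(body)
    g_head = [0.0] * len(head)
    for x, y in support:
        feat = features(body, x)
        err = predict(head, feat) - y
        for i in range(len(body)):
            g_body[i] += 2.0 * err * head[i] * x[i] / n  # d(loss)/d(body_i)
            g_head[i] += 2.0 * err * feat[i] / n         # d(loss)/d(head_i)
    new_body = [w - lr * g for w, g in zip(body, g_body)]
    new_head = [v - lr * g for v, g in zip(head, g_head)] if update_head else list(head)
    return new_body, new_head


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(p * q for p, q in zip(a, b))
    return dot / (math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b)))


# Hypothetical initialization and support set, chosen only for illustration.
body0, head0 = [0.5, 0.5], [1.0, -0.5]
support = [([1.0, 2.0], 2.0), ([2.0, 1.0], 1.0)]

maml_body, maml_head = inner_step(body0, head0, support, lr=0.05, update_head=True)
boil_body, boil_head = inner_step(body0, head0, support, lr=0.05, update_head=False)

# Representation change for one input: similarity of features before vs. after the step.
x_query = [1.0, 1.0]
sim_boil = cosine_similarity(features(body0, x_query), features(boil_body, x_query))
```

This sketch only illustrates the mechanics of the two update rules. From the same starting point, a single step moves the body identically under both rules, so it does not reproduce any empirical finding; how much the representations change in practice depends on the meta-learned initialization, which the omitted outer loop would produce.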



¹ In our paper, representation reuse and representation change correspond to feature reuse and rapid learning in (Raghu et al., 2020), respectively. To prevent confusion over terminology, we re-express the terms.

