BOIL: TOWARDS REPRESENTATION CHANGE FOR FEW-SHOT LEARNING

Abstract

Model-Agnostic Meta-Learning (MAML) is one of the most representative gradient-based meta-learning algorithms. MAML learns new tasks from a few data samples through inner-loop updates from a meta-initialization point, and learns the meta-initialization parameters through outer-loop updates. It has recently been hypothesized that representation reuse, in which the representations change little during adaptation, is the dominant factor in the performance of the meta-initialized model trained with MAML, in contrast to representation change, in which the representations change significantly. In this study, we investigate the necessity of representation change for the ultimate goal of few-shot learning, namely solving domain-agnostic tasks. To this end, we propose a novel meta-learning algorithm, called BOIL (Body Only update in Inner Loop), which updates only the body (extractor) of the model and freezes the head (classifier) during inner-loop updates. BOIL leverages representation change rather than representation reuse: the feature vectors (representations) must move quickly toward their corresponding frozen head vectors. We visualize this property using cosine similarity, CKA, and empirical results without the head. BOIL shows significant empirical improvement over MAML, particularly on cross-domain tasks. These results imply that representation change is a critical component of gradient-based meta-learning.

1. INTRODUCTION

Meta-learning, also known as "learning to learn," is a methodology that imitates human intelligence, which can adapt quickly to even a small amount of previously unseen data by exploiting previous learning experiences. To this end, meta-learning with deep neural networks has mainly been studied through metric- and gradient-based approaches. Metric-based meta-learning (Koch, 2015; Vinyals et al., 2016; Snell et al., 2017; Sung et al., 2018) compares the distances between feature embeddings, using the model as a mapping from data into an embedding space, whereas gradient-based meta-learning (Ravi & Larochelle, 2016; Finn et al., 2017; Nichol et al., 2018) quickly learns the parameters to be optimized when the model encounters new tasks. Model-Agnostic Meta-Learning (MAML) (Finn et al., 2017) is the most representative gradient-based meta-learning algorithm. The MAML algorithm consists of two optimization loops: an inner loop and an outer loop. The inner loop learns task-specific knowledge, and the outer loop finds universally good meta-initialized parameters that allow the inner loop to quickly learn any task from this initial point with only a few examples. This algorithm has been highly influential in the field of meta-learning, and numerous follow-up studies have been conducted (Oreshkin et al., 2018; Rusu et al., 2018; Zintgraf et al., 2018; Yoon et al., 2018; Finn et al., 2018; Triantafillou et al., 2019; Sun et al., 2019; Na et al., 2019; Tseng et al., 2020). Very recent studies (Raghu et al., 2020; Arnold et al., 2019) have attributed the success of MAML to the high quality of the features produced by the meta-initialized parameters before any inner updates. For instance, Raghu et al. (2020) claimed that MAML learns new tasks by updating the head (the last fully connected layer) while the features (the output of the penultimate layer) of the meta-initialized network remain almost unchanged.
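The two-loop structure described above can be illustrated with a minimal sketch on a toy regression task-learner. The linear model, squared-error loss, and learning rates below are illustrative assumptions, not the paper's setup, and the sketch uses a first-order approximation (it drops the second-order term that full MAML backpropagates through the inner update, as in first-order MAML):

```python
import numpy as np

def loss_and_grad(w, X, y):
    # squared-error loss and gradient for a toy linear task-learner: y_hat = X @ w
    err = X @ w - y
    return 0.5 * np.mean(err ** 2), X.T @ err / len(y)

def maml_step(w_meta, tasks, inner_lr=0.1, outer_lr=0.01):
    # one meta-update over a batch of tasks (first-order approximation)
    meta_grad = np.zeros_like(w_meta)
    for Xs, ys, Xq, yq in tasks:
        # inner loop: adapt from the meta-initialization using the support set
        _, g_support = loss_and_grad(w_meta, Xs, ys)
        w_task = w_meta - inner_lr * g_support
        # outer loop: evaluate the adapted parameters on the query set
        # and accumulate the query gradient into the meta-gradient
        _, g_query = loss_and_grad(w_task, Xq, yq)
        meta_grad += g_query
    return w_meta - outer_lr * meta_grad / len(tasks)
```

The meta-initialization is driven toward a point from which a single inner-loop gradient step on any sampled task's support set already yields low query loss.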
In this paper, we categorize the learning patterns as follows: a small change in the representations during task learning is called representation reuse, whereas a large change is called representation change. Representation reuse has thus been the prevailing explanation of MAML. Herein, we pose an intriguing question: is representation reuse sufficient for meta-learning? We believe that the key to successful meta-learning is closer to representation change than to representation reuse. More importantly, representation change is crucial for cross-domain adaptation, which is considered the ultimate goal of meta-learning. By contrast, a MAML model that relies on representation reuse may be poorly suited to cross-domain adaptation, since the success of representation reuse depends heavily on the similarity between the source and target domains. To answer this question, we propose a novel meta-learning algorithm that leverages representation change. Our contributions can be summarized as follows:

• We emphasize the necessity of representation change for meta-learning through cross-domain adaptation experiments.
• We propose a simple but effective meta-learning algorithm that learns the Body (extractor) of the model Only in the Inner Loop (BOIL). We empirically show that BOIL improves performance on most benchmark data sets, and that this improvement is particularly noticeable on fine-grained data sets and in cross-domain adaptation.
• We interpret the connection between BOIL and the algorithm using preconditioned gradients (Flennerhag et al., 2020) and show their compatibility, which further improves performance.
• We demonstrate, using cosine similarity and Centered Kernel Alignment (CKA), that BOIL enjoys representation layer reuse in the low-/mid-level body and representation layer change in the high-level body.
• We visualize the features before and after adaptation, and empirically analyze the effectiveness of the body learned by BOIL through an ablation study that eliminates the head.
• For ResNet architectures, we propose a disconnection trick that removes the backpropagation path of the last skip connection. The disconnection trick strengthens representation layer change in the high-level body.
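The core mechanism of BOIL, freezing the head while adapting only the body, can be sketched with a toy two-layer linear model. The model, squared-error loss, and learning rate here are illustrative assumptions (the paper's models are convolutional networks with a fully connected head), but the sketch shows the essential point: only the body receives inner-loop gradients, so the representations must move toward the frozen head:

```python
import numpy as np

def boil_inner_loop(body_w, head_w, Xs, ys, inner_lr=0.05, steps=5):
    # BOIL-style adaptation: only the body receives inner-loop gradient steps;
    # the head (classifier) stays frozen, so representations must move toward it.
    for _ in range(steps):
        feats = Xs @ body_w            # "body": maps inputs to representations
        err = feats @ head_w - ys      # "head": frozen linear output layer
        # gradient of the squared error w.r.t. the body parameters only
        grad_body = Xs.T @ np.outer(err, head_w) / len(ys)
        body_w = body_w - inner_lr * grad_body
    return body_w, head_w              # head_w is returned unchanged
```

An ANIL-style inner loop would do the opposite (update only `head_w`), and MAML would update both; BOIL's choice forces all task-specific adaptation into the representations.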

2. PROBLEM SETTING

2.1 META-LEARNING FRAMEWORK (MAML)

The MAML algorithm (Finn et al., 2017) attempts to meta-learn the best initialization of the parameters for a task-learner. It consists of two main optimization loops: an inner loop and an outer loop. First, we sample a batch of tasks from a data set distribution. Each task τi consists of a support set Sτi and a query set Qτi. When we sample a support set for a task, we first sample n labels from the label set and then sample k instances for each label; thus, each support set contains n × k instances. For the query set, we sample instances from the same labels as the support set. With these tasks, the MAML algorithm conducts both meta-training and meta-testing. During meta-training, we first sample a meta-batch consisting of B tasks from the meta-training data set. In the
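The n-way k-shot sampling procedure described above can be sketched as follows. The `data_by_label` mapping, default sizes, and query count are hypothetical placeholders for this illustration:

```python
import numpy as np

def sample_task(data_by_label, n_way=5, k_shot=1, n_query=15, rng=None):
    # Sample an n-way k-shot task: a support set of n_way * k_shot instances
    # and a query set drawn from the same n_way labels (disjoint instances).
    rng = rng or np.random.default_rng()
    labels = rng.choice(list(data_by_label), size=n_way, replace=False)
    support, query = [], []
    for task_label, label in enumerate(labels):   # relabel classes as 0..n-1
        pool = data_by_label[label]
        idx = rng.choice(len(pool), size=k_shot + n_query, replace=False)
        support += [(pool[i], task_label) for i in idx[:k_shot]]
        query += [(pool[i], task_label) for i in idx[k_shot:]]
    return support, query
```

A meta-batch is then formed by calling `sample_task` B times on the meta-training data set.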



In our paper, representation reuse and representation change correspond to feature reuse and rapid learning in (Raghu et al., 2020), respectively. We re-express the terms to avoid terminological confusion.



(a) MAML/ANIL. (b) BOIL.

Figure 1: Difference in task-specific (inner) updates between MAML/ANIL and BOIL. The lines represent the decision boundaries defined by the head (classifier) of the network; different shapes and colors denote different classes. (a) MAML mainly updates the head with a negligible change in the body (extractor); hence, the representations in the feature space are almost identical before and after adaptation. ANIL does not change the body at all during inner updates, so its representations are exactly identical. In contrast, (b) BOIL updates only the body without changing the head during inner updates; hence, the representations in the feature space change significantly with respect to the fixed decision boundaries. We visualize the representations from various data sets using UMAP (Uniform Manifold Approximation and Projection for dimension reduction) (McInnes et al., 2018) in Appendix B.

