TOWARDS EFFICIENT GRADIENT-BASED META-LEARNING IN HETEROGENEOUS ENVIRONMENTS

Anonymous authors
Paper under double-blind review

Abstract

Few-shot learning is a challenging problem for machine learning, as models trained with SGD traditionally require many training samples to converge. Since meta-learning models have strong fine-tuning capabilities for a distribution of tasks, many of them have been applied to few-shot learning; Model-Agnostic Meta-Learning (MAML) is one of the most popular. Recent studies showed that MAML-trained models tend to reuse learned features and do not perform strong adaptation, especially in the earlier layers. This paper presents a detailed analysis of this phenomenon by examining MAML's components across different variants. Our results reveal a relationship between the importance of fine-tuning earlier layers and the distribution shift between training and testing. As a consequence, we identify a fundamental weakness of existing MAML variants when the task distribution is heterogeneous, e.g., when the number of classes or the domain does not match between training and testing. We propose a novel nonparametric version of MAML that overcomes these issues while still being able to perform cross-domain adaptation.

1. INTRODUCTION

Learning tasks from only a few observations is known as few-shot learning and is of major interest in the machine learning community (Finn et al., 2017; Vinyals et al., 2016; Snell et al., 2017; Cai & Shen, 2020; Tseng et al., 2020). Usually, a problem is solved by minimizing the empirical risk over many training samples and many iterations. Humans, however, learn new tasks very quickly by drawing on knowledge acquired a priori over their lives (Salakhutdinov et al., 2012). Meta-learning is motivated by how humans learn: its goal is learning how to learn, and it is a common approach to few-shot learning due to its ability to efficiently leverage information from many tasks.

Model-Agnostic Meta-Learning (MAML) has been one of the most successful meta-learning algorithms for few-shot learning in recent years (Finn et al., 2017). In MAML, the network is meta-optimized for fast gradient-descent-based fine-tuning on an unseen task; the first sketch below illustrates this inner/outer loop. Its formulation of the meta-learning objective inspired a plethora of research (Yoon et al., 2018; Li et al., 2017; Vuorio et al., 2019; Finn et al., 2018), to the extent that MAML exists both as a concrete meta-learning algorithm and as a paradigm that influences meta-learning methods to this day.

Previous work has discussed whether MAML actually enables rapid fine-tuning or simply leverages its meta-representations effectively (called feature reuse). Raghu et al. (2020) found that freezing the earlier layers of a network during fine-tuning improves performance, meaning that fine-tuning the network body is not the major factor contributing to its few-shot capabilities, which indicates feature reuse. Oh et al. (2021) discovered that in the case of cross-domain adaptation, a change in the earlier layers is beneficial and proposed to fix the network head instead, enforcing weight change in the earlier layers, a method they call body-only inner loop (BOIL); the second sketch below contrasts the two variants. However, as we will argue in Section 3, its fixed final layer is impractical when the number of classes differs across tasks, which is a considerable limitation in real-world scenarios.

In this paper, we develop a novel technique called NP-MAML, which has a nonparametric head but is still trainable via gradients. Similar to BOIL, NP-MAML enforces changes in earlier layers to solve cross-domain tasks. In addition, it is flexible with respect to heterogeneous task distributions. We
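To make the inner/outer loop referenced above concrete, the following is a minimal second-order MAML sketch in PyTorch (our own illustration, not the authors' code); the model, the task batch format, and the hyperparameter values are assumptions:

```python
import torch
import torch.nn.functional as F

def maml_step(model, tasks, inner_lr=0.01, inner_steps=1):
    """One meta-update over a batch of tasks given as ((x_s, y_s), (x_q, y_q)) pairs."""
    meta_loss = 0.0
    for (x_s, y_s), (x_q, y_q) in tasks:
        # Inner loop: adapt a functional copy of the parameters on the support set.
        fast = dict(model.named_parameters())
        for _ in range(inner_steps):
            loss = F.cross_entropy(
                torch.func.functional_call(model, fast, (x_s,)), y_s)
            grads = torch.autograd.grad(loss, list(fast.values()),
                                        create_graph=True)  # keep graph for the meta-gradient
            fast = {n: p - inner_lr * g
                    for (n, p), g in zip(fast.items(), grads)}
        # Outer loop: evaluate the adapted parameters on the query set.
        meta_loss = meta_loss + F.cross_entropy(
            torch.func.functional_call(model, fast, (x_q,)), y_q)
    return meta_loss / len(tasks)
```

A meta-training step then calls `maml_step`, backpropagates the returned loss, and updates the initial parameters with a standard optimizer.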

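The variant of Raghu et al. (2020), often called ANIL, and BOIL differ only in which parameters the inner loop may adapt; both can be expressed as a mask over the update in the sketch above. The name filter below assumes, purely for illustration, that the classifier is registered as a submodule named `head`:

```python
def masked_update(fast, grads, inner_lr, adapt):
    # Apply the inner-loop update only to parameters selected by `adapt`.
    return {n: p - inner_lr * g if adapt(n) else p
            for (n, p), g in zip(fast.items(), grads)}

anil = lambda name: name.startswith("head")      # adapt head only: encourages feature reuse
boil = lambda name: not name.startswith("head")  # adapt body only: forces earlier-layer change
```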

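For intuition on what a nonparametric yet gradient-trainable head can look like, the sketch below uses class prototypes in the spirit of Snell et al. (2017); this is an illustrative assumption, not the NP-MAML head itself, which we define later in the paper. Because such a head has no weights of its own, it accommodates any number of classes:

```python
import torch

def prototype_logits(z_support, y_support, z_query, n_classes):
    # Prototype per class = mean support embedding; the head has no
    # parameters of its own, so it works for any number of classes while
    # gradients still flow into the embedding network (the body).
    protos = torch.stack([z_support[y_support == c].mean(dim=0)
                          for c in range(n_classes)])
    return -torch.cdist(z_query, protos)  # negative distances as logits
```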