THE CURSE OF LOW TASK DIVERSITY: ON THE FAILURE OF TRANSFER LEARNING TO OUTPERFORM MAML AND THEIR EMPIRICAL EQUIVALENCE

Anonymous

Abstract

Recently, it has been observed that a transfer learning solution might be all we need to solve many few-shot learning benchmarks, raising important questions about when and how meta-learning algorithms should be deployed. In this paper, we seek to clarify these questions by (1) proposing a novel metric, the diversity coefficient, to measure the diversity of tasks in a few-shot learning benchmark, and (2) comparing Model-Agnostic Meta-Learning (MAML) and transfer learning under fair conditions (same architecture, same optimizer, and all models trained to convergence). Using the diversity coefficient, we show that the popular Mini-ImageNet and CIFAR-FS few-shot learning benchmarks have low diversity. This novel insight contextualizes claims that transfer learning solutions are better than meta-learned solutions in the regime of low diversity under a fair comparison. Specifically, we empirically find that a low diversity coefficient correlates with high similarity between transfer learning and MAML-learned solutions, both in accuracy at meta-test time and in classification-layer similarity (using feature-based distance metrics such as SVCCA, PWCCA, CKA, and OPD). To further support our claim, we find that this agreement in meta-test accuracy holds even as the model size changes. Therefore, we conclude that in the low-diversity regime, MAML and transfer learning have equivalent meta-test performance when both are compared fairly. We also hope our work inspires more thoughtful constructions and quantitative evaluations of meta-learning benchmarks in the future.
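As a concrete illustration of the feature-based similarity metrics named above, the following is a minimal sketch of linear CKA (Centered Kernel Alignment) between two sets of layer activations for the same examples. This is a generic reference implementation, not the code used in this paper; the function name is ours, and SVCCA, PWCCA, and OPD require their own (different) computations.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between activation matrices X (n, d1) and Y (n, d2),
    where each row is the representation of the same example in two models.
    Returns a similarity in [0, 1]; 1 means the representations agree up to
    orthogonal transformation and isotropic scaling."""
    # Center each feature dimension over the n examples.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # HSIC-style formulation for the linear kernel.
    numerator = np.linalg.norm(Y.T @ X, "fro") ** 2
    denominator = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return numerator / denominator
```

By construction, linear CKA is invariant to orthogonal transformations and isotropic scaling of either representation, which is what makes it a reasonable tool for comparing the layers of independently trained networks (e.g., a MAML-trained model versus a transfer-learned one).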

1. INTRODUCTION

The success of deep learning in computer vision (Krizhevsky et al., 2012; He et al., 2015), natural language processing (Devlin et al., 2018; Brown et al., 2020), game playing (Silver et al., 2016; Mnih et al., 2013; Ye et al., 2021), and more keeps motivating a growing body of applications of deep learning to an increasingly wide variety of domains. In particular, deep learning is now routinely applied to few-shot learning, a research challenge that assesses a model's ability to learn to adapt to new tasks, new distributions, or new environments. This has been the main research area where meta-learning algorithms have been applied, since such a strategy seems promising in the small-data regime due to its potential to learn to learn or learn to adapt. However, it was recently shown (Tian et al., 2020) that a transfer learning model with a fixed embedding can match and outperform many modern sophisticated meta-learning algorithms on numerous few-shot learning benchmarks (Chen et al., 2019; 2020; Dhillon et al., 2019; Huang and Tao, 2019). This growing body of evidence, coupled with these surprising results in meta-learning, raises the question of whether researchers are applying meta-learning with the right inductive biases (Mitchell, 1980; Shai Shalev-Shwartz, 2014) and designing appropriate benchmarks for meta-learning. Our evidence suggests this is not the case.

Our work is motivated by the inductive bias that when the diversity of tasks in a benchmark is low, a meta-learning solution should provide no advantage over a minimally meta-learned algorithm, e.g., one that only fine-tunes the final layer. Therefore, in this work, we quantitatively show that when the task diversity (a novel measure of variability across tasks) is low, then MAML (Model-Agnostic Meta-Learning) (Finn et al., 2017) learned solutions have the same accuracy as transfer learning (i.e., a supervised learned model with a fine-tuned final linear layer). We want to emphasize the

