OPTIMAL ALLOCATION OF DATA ACROSS TRAINING TASKS IN META-LEARNING

Abstract

Meta-learning models transfer knowledge acquired from previous tasks to quickly learn new ones. They are tested on benchmarks with a fixed number of data points per training task, and this number is usually arbitrary, for example, 5 instances per class in few-shot classification. It is unknown how the performance of meta-learning is affected by the distribution of data across training tasks. Since labeling data is expensive, finding the optimal allocation of labels across training tasks may reduce costs. Given a fixed budget b of labels to distribute across tasks, should we use a small number of highly labeled tasks, or many tasks with few labels each? For MAML applied to mixed linear regression, we prove that the optimal number of tasks follows the scaling law √b. We develop an online algorithm for data allocation across tasks, and show that the same scaling law applies to nonlinear regression. We also present preliminary experiments on few-shot image classification. Our work provides a theoretical guide for allocating labels across tasks in meta-learning, which we believe will prove useful in a wide range of applications.

1. INTRODUCTION

Deep learning (DL) models require large amounts of data to perform well when trained from scratch, but labeling data is expensive and time consuming. An effective approach to avoiding the costs of collecting and labeling large amounts of data is transfer learning: train a model on one big dataset, or on a few related datasets that are already available, and then fine-tune it on the target dataset, which can be much smaller (Donahue et al., 2014). In this context, there has been a recent surge of interest in meta-learning, which is inspired by the human ability to learn how to learn (Hospedales et al., 2020). A model is meta-trained on a large number of tasks, each characterized by a small dataset, and meta-tested on the target dataset. The number of data points per task is usually set to an arbitrary number in standard meta-learning benchmarks. For example, in few-shot image classification benchmarks such as mini-ImageNet (Vinyals et al., 2017; Ravi & Larochelle, 2017) and CIFAR-FS (Bertinetto et al., 2019), this number is usually set to 1 or 5. So far, there has been no reason to optimize this number, since in most circumstances the performance of a model improves with the number of data points (see Nakkiran et al. (2019) for exceptions). However, if the total number of labels across training tasks is limited, is it better to have a large number of tasks with very little data in each, or a relatively smaller number of highly labeled tasks? Since data labeling is costly, the answer to this question may inform the design of new meta-learning datasets and benchmarks.

In this work, to our knowledge, we answer this question for the first time, for a specific meta-learning algorithm: MAML (Finn et al., 2017). We study the problem of optimizing the number of meta-training tasks, given a fixed budget b of total data points to distribute across tasks.
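To make the trade-off concrete, the following is a minimal sketch of a first-order MAML variant on mixed linear regression, with a label budget b split evenly across tasks. This is our own simplified illustration, not the paper's algorithm: all function names, hyperparameter values, the even support/query split, and the use of the first-order approximation are assumptions made for brevity.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
dim, alpha, beta = 5, 0.1, 0.05  # dimension, inner and outer step sizes (hypothetical values)

def sample_task(k):
    """One mixed-linear-regression task: y = x . w_true + noise, with k labelled points."""
    w_true = rng.normal(size=dim)
    X = rng.normal(size=(k, dim))
    y = X @ w_true + 0.1 * rng.normal(size=k)
    return X, y

def grad(w, X, y):
    """Gradient of the mean-squared error at w."""
    return 2.0 / len(y) * X.T @ (X @ w - y)

def fomaml(budget, num_tasks, steps=200):
    """First-order MAML with a fixed label budget split evenly across training tasks."""
    k = budget // num_tasks                              # labels allocated to each task
    tasks = [sample_task(k) for _ in range(num_tasks)]   # fixed meta-training set
    w_meta = np.zeros(dim)
    for _ in range(steps):
        meta_grad = np.zeros(dim)
        for X, y in tasks:
            # split each task's data into support (adaptation) and query halves
            Xs, ys, Xq, yq = X[: k // 2], y[: k // 2], X[k // 2:], y[k // 2:]
            w_adapted = w_meta - alpha * grad(w_meta, Xs, ys)   # one inner step
            meta_grad += grad(w_adapted, Xq, yq)                # first-order outer gradient
        w_meta -= beta * meta_grad / num_tasks
    return w_meta

# Compare allocations of the same budget b: many small tasks vs. few large ones.
b = 400
for n in (int(math.sqrt(b)), 4, 100):  # sqrt(b) is the scaling-law choice
    w_meta = fomaml(b, n)
    print(n, "tasks of", b // n, "points each")
```

Each choice of `num_tasks` spends the same budget; the question studied in the paper is which choice minimizes the meta-test loss after adaptation.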
We study the application of MAML to three datasets: mixed linear regression, sinusoid regression, and CIFAR. In the case of mixed linear regression, we derive an approximation for the meta-test loss, from which the optimal number of tasks follows the scaling rule √b. To optimize the number of tasks empirically, we design an algorithm for online allocation of data across training tasks, and we validate it with a grid search over a large set of possible allocations. In summary, our contributions are:

