VARIABLE-SHOT ADAPTATION FOR ONLINE META-LEARNING

Abstract

Few-shot meta-learning methods consider the problem of learning new tasks from a small, fixed number of examples, by meta-learning across static data from a set of previous tasks. However, in many real-world settings, it is more natural to view the problem as one of minimizing the total amount of supervision: both the number of examples needed to learn a new task and the amount of data needed for meta-learning. Such a formulation can be studied in a sequential learning setting, where tasks are presented in sequence. When studying meta-learning in this online setting, a critical question arises: can meta-learning improve over the sample complexity and regret of standard empirical risk minimization methods, when considering both meta-training and adaptation together? The answer is particularly non-obvious for meta-learning algorithms with complex bi-level optimizations that may demand large amounts of meta-training data. To answer this question, we extend previous meta-learning algorithms to handle the variable-shot settings that naturally arise in sequential learning: from many-shot learning at the start, to zero-shot learning towards the end. On sequential learning problems, we find that meta-learning solves the full task set with fewer overall labels and achieves greater cumulative performance, compared to standard supervised methods. These results suggest that meta-learning is an important ingredient for building learning systems that continuously learn and improve over a sequence of problems.

1. INTRODUCTION

Standard machine learning methods typically assume a static training set, with a discrete training phase and test phase. However, in the real world, this process is almost always cyclical: machine learning systems might be improved with the acquisition of new data, repurposed for new tasks via finetuning, or might simply need to be adjusted to suit the needs of a changing, non-stationary world. Indeed, the real world is arguably so complex that, for all practical purposes, learning is never truly finished, and any real system in open-world settings will need to improve and finetune perpetually (Chen & Asch, 2017; Zhao et al., 2019). In this continual learning process, meta-learning provides the appealing prospect of accelerating how quickly new tasks can be acquired using past experience, which in principle should make the learning system more and more efficient over the course of its lifetime. However, current meta-learning methods are typically concerned with asymptotic few-shot performance (Finn et al., 2017; Snell et al., 2017). For a continual learning system of this sort, we instead need a method that can minimize both the number of examples per task and the number of tasks needed to accelerate the learning process.

Few-shot meta-learning algorithms aim to learn the structure that underlies data coming from a set of related tasks, and to use this structure to learn new tasks with only a few datapoints. While these algorithms enable efficient learning for new tasks at test time, it is not clear whether these efficiency gains persist in online learning settings, where the efficiency of both meta-training and few-shot adaptation is critical. Indeed, simply training a model on all received data, i.e. standard supervised learning with empirical risk minimization, is a strong competitor, since supervised learning methods are known to generalize well to in-distribution tasks in a zero-shot manner.
Moreover, it is not clear that meta-learning algorithms can improve over such methods by leveraging shared task structure in online learning settings. Provided that a single model can fully master all of the tasks given enough data, both meta-learning and standard empirical risk minimization should eventually produce a model of equal competence. However, the key hypothesis of this work is that meta-learned models will become more accurate more quickly in the middle of the online learning process, while data is still being collected, resulting in lower overall regret in realistic problem settings. To test this hypothesis, we consider a practical online learning problem setting, which we refer to as online incremental learning, where the algorithm must learn a sequence of tasks, and datapoints from each task are received sequentially. Once a model reaches a certain level of proficiency, the algorithm may move on to training the model on the next task. This problem is crucial to solve in the real world, especially in settings where online data collection and supervision signals are costly to obtain. One example is a setting where a company receives requests for object classifiers sequentially at different points in time; data collection and labels are expensive, and the company wants to spend the least amount of money on acquiring a good classifier for each request. A major challenge for meta-learning in this problem setting is to design an algorithm that can generalize with variable shots: as data from new tasks is incrementally introduced, at any given point in time, the model may have access to zero, a few, or many datapoints for a given task. The goal of this work is to achieve variable-shot adaptation while minimizing the total amount of supervision, in terms of the number of shots required for each added task.
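The online incremental learning protocol described above can be sketched as a simple loop: request one labeled example at a time, train, and advance to the next task once test performance exceeds a proficiency threshold C. The function names and the `max_shots` cap below are illustrative assumptions, not part of the formal problem definition.

```python
def online_incremental_loop(tasks, fit, evaluate, C, max_shots=100):
    """Sketch of the online incremental learning protocol.

    tasks: iterable of (sample_train_point, sample_test_point) callables
    fit: trains a model on the current (growing) training set
    evaluate: returns test performance of the model; compared to threshold C
    Returns the number of labels requested per task.
    """
    labels_used = []
    for sample_train, sample_test in tasks:
        train_set = []
        for _ in range(max_shots):
            train_set.append(sample_train())   # request one more labeled example
            model = fit(train_set)             # e.g. one epoch of training
            if evaluate(model, sample_test()) >= C:
                break                          # proficient: move on to the next task
        labels_used.append(len(train_set))
    return labels_used
```

Under this protocol, a method that adapts from fewer shots terminates each inner loop earlier, and so accumulates fewer total labels over the task sequence.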
The desired online incremental meta-learning algorithm should generalize to new tasks with a decreasing number of shots over the course of training. We visualize our problem setting in Figure 1. The key contributions of this work are (a) a new meta-learning algorithm that can adapt to variable amounts of data, and (b) an online version of this algorithm that addresses the above problem setting of online incremental learning. We theoretically derive our variable-shot meta-learning algorithm and combine it with deep neural networks for effective online learning on challenging sequential problem settings. Perhaps surprisingly, we find that our approach can outperform empirical risk minimization and a previous online meta-learning method (Finn et al., 2019) on two online image classification problems consisting of sequences of classification tasks and on one online regression problem. Further, we find that, in the offline setting, our approach performs comparably to previous state-of-the-art algorithms in few-shot learning, and provides considerable gains in the variable-shot setting.

2. PRELIMINARIES

Meta-learning algorithms optimize for efficient adaptation to new tasks. To define the meta-learning problem, let $p(\mathcal{T})$ denote a task distribution, where each task $\mathcal{T}_i \sim p(\mathcal{T})$ consists of a dataset $\mathcal{D}_i := \{x, y\}$ of i.i.d. input and output pairs. Given a predictive model $h(x; \theta)$ with parameters $\theta$ and a loss function $\ell$, such as the cross-entropy between the predicted label distribution and the true label distribution in a classification problem, the risk of task $\mathcal{T}_i$ is
$$f_i(\theta) = \mathbb{E}_{(x,y) \sim \mathcal{D}_i}\left[\ell(h(x; \theta), y)\right].$$
At meta-training time, $N$ tasks $\{\mathcal{T}_i\}_{i=1}^N$ are sampled from $p(\mathcal{T})$, and meta-learning algorithms aim to learn how to quickly learn these tasks such that, at meta-test time, the population risk $f_j(\theta)$ of an unseen task $\mathcal{T}_j \sim p(\mathcal{T})$ is minimized quickly. In this work, we build on the model-agnostic meta-learning (MAML) algorithm (Finn et al., 2017). MAML achieves fast adaptation to new tasks by optimizing a set of initial parameters $\theta_{\text{MAML}}$ that can be quickly adapted to the meta-training tasks $\{\mathcal{T}_i\}_{i=1}^N$. Thus, at meta-test time, after a small number of gradient steps on $\theta_{\text{MAML}}$ with $K$ datapoints from $\mathcal{D}_j$, the model can minimize $f_j$ for the new task $\mathcal{T}_j$. Note that $K$ is small and fixed across all tasks. Formally, MAML obtains such an initialization by optimizing the objective
$$\min_{\theta_{\text{MAML}}} \frac{1}{N} \sum_{i=1}^N f_i\left(U_i(\theta_{\text{MAML}}, \alpha, K)\right),$$
where $U_i(\theta, \alpha, K)$ denotes the inner-loop adaptation procedure: one or more gradient steps on the empirical risk of $\mathcal{T}_i$ computed from $K$ datapoints of $\mathcal{D}_i$ with step size $\alpha$, e.g. $U_i(\theta, \alpha, K) = \theta - \alpha \nabla_\theta \hat{f}_i(\theta)$ for a single step, where $\hat{f}_i$ is the empirical risk on the $K$ sampled datapoints.
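The MAML objective above can be illustrated with a minimal first-order sketch (FOMAML, which drops second-order terms in the meta-gradient rather than differentiating through the inner update). The toy task family (scalar linear regression $y = a x$), the single inner gradient step, and all hyperparameters below are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    # toy task family: y = a * x, where a is the task parameter
    a = rng.uniform(-2.0, 2.0)
    def make_data(K):
        x = rng.uniform(-1.0, 1.0, size=K)
        return x, a * x
    return make_data

def loss_and_grad(theta, x, y):
    # empirical MSE risk f_i(theta) of the linear model h(x; theta) = theta * x
    err = theta * x - y
    return np.mean(err ** 2), 2.0 * np.mean(err * x)

def adapt(theta, x, y, alpha=0.5, steps=1):
    # inner-loop update U_i(theta, alpha, K): gradient steps on K datapoints
    for _ in range(steps):
        _, g = loss_and_grad(theta, x, y)
        theta = theta - alpha * g
    return theta

# outer loop: first-order approximation evaluates the task gradient
# at the adapted parameters and uses it as the meta-gradient
theta_maml, beta, K = 0.0, 0.1, 5
for _ in range(2000):
    make_data = sample_task()
    x_tr, y_tr = make_data(K)       # K-shot support set for adaptation
    x_val, y_val = make_data(K)     # held-out query set for the outer loss
    theta_prime = adapt(theta_maml, x_tr, y_tr)
    _, g_outer = loss_and_grad(theta_prime, x_val, y_val)
    theta_maml = theta_maml - beta * g_outer
```

Note that K is hard-coded here, mirroring the fixed-shot assumption of standard MAML that the variable-shot setting relaxes.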



Figure 1: Online incremental meta-learning. We visualize our online incremental meta-learning problem setting in the above figure using the Incremental Pose Prediction dataset discussed in Section 7. At round t, the current task is to predict the pose of a sofa with a small training set containing several datapoints of the sofa in different orientations. After one epoch of training, we evaluate few-shot generalization performance on a test datapoint, where the number of shots equals the size of the task's training set. If the test performance exceeds some proficiency threshold C, we advance to the next task, i.e. predicting the pose of the airplane. Otherwise, we add another training example of the sofa to the training set and repeat the above process.

