DEFINING BENCHMARKS FOR CONTINUAL FEW-SHOT LEARNING

Anonymous authors
Paper under double-blind review

Abstract

In recent years there has been substantial progress in few-shot learning, where a model is trained on a small labeled dataset related to a specific task, and in continual learning, where a model has to retain knowledge acquired on a sequence of datasets. Both of these fields are different abstractions of the same real-world scenario, in which a learner has to adapt to limited information from different changing sources and be able to generalize in and from each of them. Combining these two paradigms, where a model is trained on several sequential few-shot tasks and then tested on a validation set stemming from all those tasks, helps by explicitly defining the competing requirements for both efficient integration and continuity. In this paper we propose such a setting, naming it Continual Few-Shot Learning (CFSL). We first define a theoretical framework for CFSL, then we propose a range of flexible benchmarks to unify the evaluation criteria. As part of the benchmark, we introduce a compact variant of ImageNet, called SlimageNet64, which retains all original 1000 classes but contains only 200 instances of each one (a total of 200K data-points) downscaled to 64 × 64 pixels. We provide baselines for the proposed benchmarks using a number of popular few-shot and continual learning methods, exposing previously unknown strengths and weaknesses of those algorithms. The dataloader and dataset will be released with an open-source license.

1. INTRODUCTION

Two capabilities vital for an intelligent agent with finite memory are few-shot learning, the ability to learn from a handful of data-points, and continual learning, the ability to sequentially learn new tasks without forgetting previous ones. Taken individually, these two areas have recently seen dramatic improvements, mainly due to the introduction of proper benchmark tasks and datasets used to systematically compare different methods (Chen et al., 2019; Lesort et al., 2019a; Parisi et al., 2019). For the set-to-set few-shot setting (Vinyals et al., 2016) such benchmarks include Omniglot (Lake et al., 2015), CUB-200 (Welinder et al., 2010), Mini-ImageNet (Vinyals et al., 2016) and Tiered-ImageNet (Ren et al., 2018b). For the single-incremental-task continual setting (Maltoni & Lomonaco, 2019) and the multi-task continual setting (Zenke et al., 2017; Lopez-Paz & Ranzato, 2017) the benchmarks include permuted/rotated-MNIST (Zenke et al., 2017; Goodfellow et al., 2013), CIFAR10/100 (Krizhevsky et al., 2009), and CORe50 (Lomonaco & Maltoni, 2017). However, none of those benchmarks is particularly well suited to evaluating the hybrid setting of low-data sequential streams.

One of the main reasons behind the scarce consideration of the liaison between the two settings is that these problems have often been treated separately and handled by two distinct communities. Historically, research on continual learning has focused on the problem of avoiding the loss of previous knowledge when new tasks are presented to the learner, known as catastrophic forgetting (McCloskey & Cohen, 1989), without paying much attention to the low-data regime. On the other hand, research on few-shot learning has mainly focused on achieving good generalization over new tasks, without caring about possible future knowledge gain or loss. Scarce attention has been given to few-shot learning in the more practical continual learning scenario. In this paper we propose to bridge the gap between the two settings by injecting the sequential component of continual learning into the framework of few-shot learning, calling this new paradigm Continual Few-Shot Learning (CFSL). CFSL can be useful to the research community as a framework for studying continual learning under memory constraints, and for testing meta-learning systems that are capable of continual learning. While we formally define the problem in Section 3, a high-level diagram is shown in Figure 1.

Figure 1: High-level overview of the proposed benchmark. Left block: from the left, the learner acquires task-specific information from each set, one-by-one, without being allowed to view previous sets (memory constraint). The learner can store knowledge in a shared memory bank and use it in a classification model. On the rightmost side, future tasks are inaccessible to the learner. On the bottom, the same process is repeated on the second support set. Note that the first support set is now inaccessible. Right block: once the learner has viewed all support sets, it is given an evaluation set (target set) containing new examples of classes contained in the support sets, and tasked with producing predictions for those samples. The evaluation procedure has access to the target set labels, and can establish a generalization measure for the model.

Our main contributions can be summarized as follows:

1. We formalize a highly general and flexible continual few-shot learning setting, taking into account recent considerations expressed in the literature.
2. We propose a novel benchmark and a compact dataset (SlimageNet64), releasing them under an open-source license.
3. We compare recent state-of-the-art methods on our benchmark, showing how CFSL is effective in highlighting the strengths and weaknesses of those methods.
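The evaluation protocol of Figure 1 can be sketched in a few lines of code. The sketch below is purely illustrative and is not the released dataloader; the learner interface (`init_state`, `update`, `predict`) and all other names are hypothetical, chosen only to make the memory constraint explicit: the learner may carry state between support sets, but never revisits an earlier set.

```python
# Illustrative sketch of one continual few-shot episode (hypothetical API).
# The learner sees support sets strictly one at a time, may only carry an
# internal state (shared memory bank) between them, and is finally evaluated
# on a target set of new examples of the classes seen during the episode.

def run_episode(learner, support_sets, target_set):
    """support_sets: list of (inputs, labels) batches, shown sequentially.
    target_set: (inputs, labels) with new examples of the episode's classes.
    Returns the learner's accuracy on the target set."""
    state = learner.init_state()  # shared memory bank, initially empty
    for x_support, y_support in support_sets:
        # Only the current support set is visible here; previous support
        # sets are no longer accessible (memory constraint).
        state = learner.update(state, x_support, y_support)
    x_target, y_target = target_set
    predictions = learner.predict(state, x_target)
    correct = sum(p == t for p, t in zip(predictions, y_target))
    return correct / len(y_target)
```

Any few-shot or continual learner can be plugged into this loop by implementing the three methods, which is what makes the setting a common yardstick for both communities.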

1.1. MOTIVATION AND APPLICATIONS

Consider a user in a fast-changing environment who must learn from the many scenarios that are encountered. There is a significant challenge in integrating common information from very few data-points in each scenario, in an online fashion. The small number of data-points makes this distinct from the standard continual learning setting: the very high uncertainty in each scenario, due to the low data volume, makes adaptation more challenging and makes the commonality between scenarios even more critical. The online nature of learning makes this distinct from few-shot learning: the integration of information from different scenarios must be learnt without access to earlier data. These two requirements, online learning without forgetting and efficient integration under uncertainty, are competing, since both demand memory capacity from the learner. Because of these competing requirements, it is valuable to consider continual learning and few-shot learning together rather than in isolation.

For a concrete example, consider typical user interfaces such as those used in online stores. The amount of data collected from each user is rather small (few-shot) and is generally stored in a sequential buffer or priority queue (continual). Suppose an underlying learning model has been deployed to enhance the user experience by suggesting new products that are likely to be of interest. This model should be able to rapidly adapt to each user (task) by accessing the sequential buffer while learning on the fly. There are multiple variants to take into account. For instance, if the user is unknown, or previous data is not accessible (e.g. under privacy policies), the model has to rapidly infer the user's preferences from a single task. On the other hand, if the user profile is known, the model should retain knowledge about previous interactions without the need of being retrained.

