BAYESIAN ONLINE META-LEARNING

Abstract

Neural networks are known to suffer from catastrophic forgetting when trained on sequential datasets. While there have been numerous attempts to solve this problem for large-scale supervised classification, little has been done to overcome catastrophic forgetting in few-shot classification. Few-shot meta-learning algorithms often require all few-shot tasks to be readily available in a batch for training; the popular gradient-based model-agnostic meta-learning algorithm (MAML) is a typical algorithm that suffers from these limitations. This work introduces a Bayesian online meta-learning framework to tackle the catastrophic forgetting and sequential few-shot task problems. Our framework incorporates MAML into a Bayesian online learning algorithm with Laplace approximation or variational inference. It enables few-shot classification on a range of sequentially arriving datasets with a single meta-learned model, as well as training on sequentially arriving few-shot tasks. Experimental evaluations demonstrate that our framework effectively prevents catastrophic forgetting and is capable of online meta-learning in various few-shot classification settings.

1. INTRODUCTION

Image classification models and algorithms often require an enormous amount of labelled examples for training to achieve state-of-the-art performance. Labelled examples can be expensive and time-consuming to acquire. Human visual systems, on the other hand, are able to recognise new classes after being shown only a few labelled examples. Few-shot classification (Miller et al., 2000; Li et al., 2004; 2006; Lake et al., 2011) tackles this issue by learning to adapt to unseen classes (known as novel classes) with very few labelled examples from each class. Recent works show that meta-learning provides promising approaches to few-shot classification problems (Santoro et al., 2016; Finn et al., 2017; Li et al., 2017; Ravi & Larochelle, 2017). Meta-learning, or learning-to-learn (Schmidhuber, 1987; Thrun & Pratt, 1998), takes the learning process a level deeper: instead of learning from the labelled examples in the training classes (known as base classes), meta-learning learns the example-learning process itself. The training process in meta-learning that utilises the base classes is called the meta-training stage, and the evaluation process that reports the few-shot performance on the novel classes is known as the meta-evaluation stage. Despite being a promising approach to few-shot classification, meta-learning methods suffer from two limitations:

1. Unable to continually learn from sequential few-shot tasks: all base classes must be readily available for meta-training, since such meta-learning algorithms typically require sampling a number of few-shot tasks in every iteration for optimisation.

2. Unable to retain few-shot classification ability on sequential datasets that exhibit an evident distributional shift: a meta-learned model is restricted to performing few-shot classification on a specific dataset, in the sense that the base and novel classes must originate from the same dataset distribution. A meta-learned model therefore loses its few-shot classification ability on previous datasets as new ones arrive subsequently for meta-training.

We emphasise that a task in this paper refers to a few-shot task for meta-learning. This paper considers meta-learning a single model for few-shot classification in the sequential-datasets and sequential-few-shot-tasks settings respectively.
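To make the meta-training loop described above concrete, the sketch below implements the standard gradient-based MAML update (inner-loop adaptation per task, outer-loop update of the shared initialisation) on a toy family of scalar tasks. The quadratic task losses, the step sizes, and the `maml_step` helper are illustrative assumptions for exposition, not the paper's actual model; the gradients are derived by hand since the model is a single scalar.

```python
def maml_step(theta, task_targets, alpha=0.1, beta=0.05):
    """One MAML meta-update for toy tasks with loss L_i(t) = (t - c_i)^2.

    Inner loop: one gradient step per task starting from the shared theta.
    Outer loop: gradient of the post-adaptation loss w.r.t. theta, which
    for this quadratic loss is dL_i(theta_i')/dtheta = 2(theta_i' - c_i)(1 - 2*alpha).
    """
    meta_grad = 0.0
    for c in task_targets:
        # Inner adaptation: theta_i' = theta - alpha * dL_i/dtheta
        adapted = theta - alpha * 2.0 * (theta - c)
        # Accumulate the hand-derived outer (meta) gradient for this task
        meta_grad += 2.0 * (adapted - c) * (1.0 - 2.0 * alpha)
    # Outer update on the shared initialisation, averaged over tasks
    return theta - beta * meta_grad / len(task_targets)

theta = 0.0
tasks = [1.0, -1.0, 3.0]  # each c_i defines one toy "task"
for _ in range(200):
    theta = maml_step(theta, tasks)
# theta approaches 1.0, the minimiser of the average post-adaptation loss
print(theta)
```

Note that every meta-update samples (here, enumerates) a batch of tasks, which is exactly why standard MAML needs all base classes available up front; the Bayesian online framework introduced in this paper targets the setting where such task batches arrive sequentially instead.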

