INCREMENTAL FEW-SHOT LEARNING VIA VECTOR QUANTIZATION IN DEEP EMBEDDED SPACE

Abstract

The capability of incrementally learning new tasks without forgetting old ones is a challenging problem due to catastrophic forgetting. This challenge becomes greater when novel tasks contain very few labelled training samples. Currently, most methods are dedicated to class-incremental learning and rely on sufficient training data to learn additional weights for newly added classes. Such methods cannot be easily extended to incremental regression tasks and may suffer from severe overfitting when learning few-shot novel tasks. In this study, we propose a nonparametric method in deep embedded space to tackle incremental few-shot learning problems. The knowledge about the learned tasks is compressed into a small number of quantized reference vectors. The proposed method learns new tasks sequentially by adding more reference vectors to the model using the few-shot samples in each novel task. For classification problems, we employ a nearest-neighbor scheme to perform classification on sparsely available data and incorporate intra-class variation, less-forgetting regularization, and calibration of reference vectors to mitigate catastrophic forgetting. In addition, the proposed learning vector quantization (LVQ) in deep embedded space can be customized as a kernel smoother to handle incremental few-shot regression tasks. Experimental results demonstrate that the proposed method outperforms other state-of-the-art methods in incremental learning.

1. INTRODUCTION

Incremental learning is a learning paradigm that allows the model to continually learn new tasks on novel data, without forgetting how to perform previously learned tasks (Cauwenberghs & Poggio, 2001; Kuzborskij et al., 2013; Mensink et al., 2013). The capability of incremental learning becomes more important in real-world applications, in which deployed models are exposed to possible out-of-sample data. Typically, hundreds of thousands of labelled samples in new tasks are required to re-train or fine-tune the model (Rebuffi et al., 2017). Unfortunately, it is impractical to gather sufficient samples of new tasks in real applications. In contrast, humans can learn new concepts from just one or a few examples, without losing old knowledge. Therefore, it is desirable to develop algorithms that support incremental learning from very few samples. While a natural approach for incremental few-shot learning is to fine-tune part of the base model using novel training data (Donahue et al., 2014; Girshick et al., 2014), the model could suffer from severe overfitting on new tasks due to the limited number of training samples. Moreover, simple fine-tuning also leads to a significant performance drop on previously learned tasks, termed catastrophic forgetting (Goodfellow et al., 2014). Recent attempts to mitigate catastrophic forgetting generally fall into two streams: memory replay of old training samples (Rebuffi et al., 2017; Shin et al., 2017; Kemker & Kanan, 2018) and regularization on important model parameters (Kirkpatrick et al., 2017; Zenke et al., 2017). However, these incremental learning approaches are developed and tested in unrealistic scenarios where sufficient training samples are available in novel tasks. They may not work well when the training samples in novel tasks are few (Tao et al., 2020b).
To the best of our knowledge, the majority of incremental learning methodologies focus on classification problems and cannot be extended to regression problems easily. In class-incremental learning, the model has to expand its output dimensions to learn novel classes while keeping the knowledge of existing classes. Parametric models estimate additional classification weights for novel classes, while nonparametric methods compute class centroids for novel classes. In comparison, output dimensions in regression problems do not change during incremental learning, so neither additional weights nor class centroids are applicable. Besides, we find that catastrophic forgetting in incremental few-shot classification can be attributed to three causes. First, the model is biased towards new classes and forgets old classes because the model is fine-tuned on new data only (Hou et al., 2019; Zhao et al., 2020); meanwhile, prediction accuracy on novel classes remains poor due to overfitting on the few-shot training samples. Second, features of novel samples could overlap with those of old classes in the feature space, leading to ambiguity among classes. Finally, features of old classes and classification weights are no longer compatible after the model is fine-tuned with new data. In this paper, we investigate the problem of incremental few-shot learning, where only a few training samples are available in new tasks. A unified model is learned sequentially to jointly recognize all classes or regression targets that have been encountered in previous tasks (Rebuffi et al., 2017; Wu et al., 2019). To tackle the aforementioned problems, we propose a nonparametric method to handle incremental few-shot learning based on learning vector quantization (LVQ) (Sato & Yamada, 1996) in deep embedded space.
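The nonparametric idea can be sketched as follows: each learned class is represented by one or more reference vectors in the embedded space, a query is labelled by its nearest reference vector, and a novel class is registered simply by adding a reference vector computed from its few-shot samples. This is a minimal illustration under simplifying assumptions (Euclidean distance, mean-of-embeddings initialization, no embedding network shown); the function names are hypothetical and this is not the paper's exact implementation.

```python
import numpy as np

def classify_nearest_reference(z, references, labels):
    """Assign the label of the nearest reference vector in embedded space.

    z          : (d,) embedding of a query sample
    references : (m, d) array of learned reference vectors
    labels     : (m,) class label attached to each reference vector
    """
    dists = np.linalg.norm(references - z, axis=1)  # Euclidean distances
    return labels[np.argmin(dists)]

def add_novel_class(references, labels, novel_embeddings, novel_label):
    """Incrementally register a novel class by appending one reference vector
    (here initialized as the mean embedding of its few-shot samples)."""
    new_ref = novel_embeddings.mean(axis=0)
    return np.vstack([references, new_ref]), np.append(labels, novel_label)
```

Because learning a new class only appends a reference vector rather than re-estimating classification weights, no retraining of a parametric output layer is needed when tasks arrive sequentially.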
As such, the adverse effects of imbalanced weights in a parametric classifier can be completely avoided (Mensink et al., 2013; Snell et al., 2017; Yu et al., 2020). Our contributions are threefold. First, a unified framework is developed, termed incremental deep learning vector quantization (IDLVQ), to handle both incremental classification (IDLVQ-C) and regression (IDLVQ-R) problems. Second, we develop intra-class variance regularization, less-forgetting constraints and calibration factors to mitigate catastrophic forgetting in class-incremental learning. Finally, the proposed methods achieve state-of-the-art performance on incremental few-shot classification and regression datasets.
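For the regression variant, the same reference vectors can drive a kernel smoother: each reference vector carries a regression target, and a query's prediction is a kernel-weighted average of those targets. The sketch below assumes a Nadaraya-Watson style estimator with a Gaussian kernel; the function name and the bandwidth parameter are illustrative, not taken from the paper.

```python
import numpy as np

def kernel_smoother_predict(z, references, targets, bandwidth=1.0):
    """Predict a regression target as a kernel-weighted average over
    reference vectors in embedded space (Nadaraya-Watson style).

    z          : (d,) embedding of a query sample
    references : (m, d) reference vectors accumulated over tasks
    targets    : (m,) regression target attached to each reference vector
    bandwidth  : kernel width controlling how quickly influence decays
    """
    sq_dists = np.sum((references - z) ** 2, axis=1)
    weights = np.exp(-sq_dists / (2.0 * bandwidth ** 2))  # Gaussian kernel
    return float(np.sum(weights * targets) / np.sum(weights))
```

As with the classification case, incrementally learning a new regression task amounts to appending (reference vector, target) pairs, so the output dimension never changes.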

2. RELATED WORK

Incremental learning: Some incremental learning approaches rely on memory replay of old exemplars to prevent forgetting previously learned knowledge. Old exemplars can be saved in memory (Rebuffi et al., 2017; Castro et al., 2018; Prabhu et al., 2020) or sampled from generative models (Shin et al., 2017; Kemker & Kanan, 2018; van de Ven et al., 2020). However, explicit storage of training samples is not scalable if the number of classes is large. Furthermore, it is difficult to train a reliable generative model for all classes from very few training samples. In parallel, regularization approaches do not require old exemplars and impose regularization on network weights or outputs to minimize the change of parameters that are important to old tasks (Kirkpatrick et al., 2017; Zenke et al., 2017). To avoid quick performance deterioration after learning a sequence of novel tasks with regularization approaches, semantic drift compensation (SDC) learns an embedding network via triplet loss (Schroff et al., 2015) and compensates for the drift of class centroids using novel data only (Yu et al., 2020). In comparison, IDLVQ-C saves only one exemplar per class and uses the saved exemplars to regularize the change in the feature extractor and calibrate the change in reference vectors.

Few-shot learning: Few-shot learning attempts to obtain models for classification or regression tasks with only a few labelled samples. Few-shot models are trained on widely varying episodes of fake few-shot tasks with labelled samples drawn from a large-scale meta-training dataset (Vinyals et al., 2016; Finn et al., 2017; Ravi & Larochelle, 2017; Snell et al., 2017; Sung et al., 2018). Meanwhile, recent works attempt to handle novel few-shot tasks while retaining the knowledge of the base task. These methods are referred to as dynamic few-shot learning (Gidaris & Komodakis, 2018; Ren et al., 2019a; Gidaris & Komodakis, 2019).
However, dynamic few-shot learning differs from incremental few-shot learning, because these methods rely on the entire base training dataset and an extra meta-training dataset during meta-training. In addition, dynamic few-shot learning does not accumulate knowledge from multiple novel tasks sequentially.

Incremental few-shot learning: Prior works on incremental few-shot learning focus on classification problems by computing the weights for novel classes in parametric classifiers, without iterative

