TOWARDS LEARNING TO REMEMBER IN META LEARNING OF SEQUENTIAL DOMAINS

Anonymous

Abstract

Meta-learning has made rapid progress in recent years, with recent extensions designed to avoid catastrophic forgetting during learning, namely continual meta learning. It is desirable to generalize the meta learner's ability to learn continuously over sequential domains, a setting that remains largely unexplored to date. Through extensive empirical verification, we find that current continual learning techniques require significant improvement before they can be applied in the sequential domain meta learning setting. To tackle this problem, we adapt existing dynamic learning rate adaptation techniques to meta learn both model parameters and learning rates. Adaptation of the parameters ensures good generalization performance, while adaptation of the learning rates helps avoid catastrophic forgetting of past domains. Extensive experiments on a sequence of commonly used real-domain datasets demonstrate the effectiveness of our proposed method, which outperforms current strong baselines in continual learning. Our code is made publicly available online (anonymous): https://github.com/ICLR20210927/Sequential-domain-meta-learning.git.

1. INTRODUCTION

Humans have the ability to quickly learn new skills from a few examples, without erasing old skills. It is desirable for machine-learning models to acquire this capability when learning under changing contexts/domains, which are common in real-world problems. Such tasks are easy for humans, yet pose challenges for current deep-learning models, mainly for two reasons: 1) catastrophic forgetting is a well-known problem for neural networks, which are prone to drastically losing knowledge of old tasks when the domain shifts (McCloskey & Cohen, 1989); 2) it has been a long-standing challenge to make neural networks generalize quickly from a limited amount of training data (Wang et al., 2020a). For example, a dialogue system may be trained on a sequence of domains (hotel booking, insurance, restaurants, car services, etc.) because the datasets become available sequentially (Mi et al., 2020). Within each domain, each task is defined as learning one customer-specific model (Lin et al., 2019). After meta training finishes, the model can be deployed to previously trained domains: new (unseen) customers from those domains may arrive later with their own small training data (support set), which is used to adapt the sequentially meta-learned model. The newly adapted model is then deployed to make responses to these customers. We formulate the above problem as sequential domain few-shot learning, where a model is required to make proper decisions based on only a few training examples while the context/domain constantly changes. Adjusting to a new context/domain should not erase knowledge already learned from old ones.
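The support-set adaptation step described above can be sketched in a few lines. The snippet below is our own simplification, not the paper's implementation: it shows a Meta-SGD-style inner update on a toy linear-regression task, in which a per-parameter learning rate vector `alpha` is meta-learned alongside the initialization `theta` (the names `inner_adapt` and `loss_and_grad` are hypothetical). Intuitively, a small `alpha[i]` protects a parameter important to old domains, while a large one lets other parameters adapt quickly to the new task.

```python
import numpy as np

def loss_and_grad(theta, X, y):
    # Squared-error loss of a linear model and its gradient w.r.t. theta.
    err = X @ theta - y
    return 0.5 * np.mean(err ** 2), X.T @ err / len(y)

def inner_adapt(theta, alpha, X_support, y_support, steps=1):
    # Task adaptation: a few gradient steps on the support set, using a
    # meta-learned per-parameter learning rate vector alpha (elementwise).
    adapted = theta.copy()
    for _ in range(steps):
        _, g = loss_and_grad(adapted, X_support, y_support)
        adapted = adapted - alpha * g  # small alpha_i shields theta_i from change
    return adapted

# Toy usage: one new "customer" task with a tiny support set.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = X @ np.array([1.0, -2.0])
theta = np.zeros(2)            # meta-learned initialization (here: zeros)
alpha = np.array([0.5, 0.05])  # meta-learned per-parameter learning rates
theta_task = inner_adapt(theta, alpha, X, y, steps=5)
before, _ = loss_and_grad(theta, X, y)
after, _ = loss_and_grad(theta_task, X, y)
assert after < before          # adaptation reduces the support-set loss
```

In the full method, the outer (meta) loop would differentiate the query-set loss of `theta_task` with respect to both `theta` and `alpha`, updating them across tasks and domains; only the inner loop is shown here.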
The problem consists of two key components that have been considered separately in previous research: the ability to learn from a limited amount of data, referred to as few-shot learning, and the ability to learn new tasks without forgetting old knowledge, known as continual learning. Both aspects have proven particularly challenging for deep learning models and have been explored independently by extensive previous work (Finn et al., 2017; Snell et al., 2017; Kirkpatrick et al., 2017; Lopez-Paz & Ranzato, 2017). However, a more challenging yet useful perspective that jointly integrates the two aspects remains less explored. Generally speaking, meta-learning targets learning from a large number of similar tasks with a limited number of training examples per class. Most existing works focus on developing the general-

