EFFICIENT ARCHITECTURE SEARCH FOR CONTINUAL LEARNING

Abstract

Continual learning with neural networks is an important learning framework in AI that aims to learn a sequence of tasks well. However, it is often confronted with three challenges: (1) overcoming the catastrophic forgetting problem, (2) adapting the current network to new tasks, and meanwhile (3) controlling the model complexity. To reach these goals, we propose a novel approach named Continual Learning with Efficient Architecture Search, or CLEAS for short. CLEAS works closely with neural architecture search (NAS), which leverages reinforcement learning techniques to search for the best neural architecture that fits a new task. In particular, we design a neuron-level NAS controller that decides which old neurons from previous tasks should be reused (knowledge transfer) and which new neurons should be added (to learn new knowledge). Such a fine-grained controller allows finding a very concise architecture that fits each new task well. Meanwhile, since we do not alter the weights of the reused neurons, we fully preserve the knowledge learned from previous tasks. We evaluate CLEAS on numerous sequential classification tasks, and the results demonstrate that CLEAS outperforms state-of-the-art alternatives, achieving higher classification accuracy while using simpler neural architectures.
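For concreteness, the snippet below gives a minimal sketch, in PyTorch, of the neuron-level reuse-versus-expand mechanism summarized above; the layer class, names, and dimensions are hypothetical illustrations rather than the actual CLEAS implementation.

```python
# Minimal sketch of the neuron-level reuse/expand idea: weights of
# neurons reused from previous tasks are frozen, while neurons newly
# added for the current task remain trainable. All names here are
# hypothetical illustrations, not the authors' code.
import torch
import torch.nn as nn

class ExpandableLayer(nn.Module):
    def __init__(self, in_dim: int, old_out: int, new_out: int):
        super().__init__()
        self.old = nn.Linear(in_dim, old_out)  # neurons from previous tasks
        self.new = nn.Linear(in_dim, new_out)  # neurons added for the new task
        for p in self.old.parameters():
            p.requires_grad = False            # freezing reused weights preserves old knowledge

    def forward(self, x: torch.Tensor, reuse_mask: torch.Tensor) -> torch.Tensor:
        # reuse_mask is a 0/1 vector over the old neurons: the kind of
        # per-neuron decision a neuron-level NAS controller would output.
        h_old = self.old(x) * reuse_mask       # knowledge transfer via reused neurons
        h_new = self.new(x)                    # capacity for new knowledge
        return torch.relu(torch.cat([h_old, h_new], dim=-1))

layer = ExpandableLayer(in_dim=784, old_out=100, new_out=20)
mask = torch.ones(100)                  # e.g., the controller chooses to reuse all old neurons
out = layer(torch.randn(32, 784), mask) # output shape: (32, 120)
```

Freezing the reused weights is what prevents forgetting, while the binary mask represents the controller's per-neuron architecture decision for the current task.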

1. INTRODUCTION

Continual learning, or lifelong learning, refers to the ability to continually learn new tasks while still performing well on previously learned tasks. It has attracted enormous attention in AI as it mimics the human learning process: constantly acquiring and accumulating knowledge throughout one's lifetime (Parisi et al., 2019). Continual learning often works with deep neural networks (Javed & White, 2019; Nguyen et al., 2017; Xu & Zhu, 2018), as the flexibility of a network design can effectively allow knowledge transfer and knowledge acquisition. However, continual learning with neural networks usually faces three challenges. The first is to overcome the so-called catastrophic forgetting problem (Kirkpatrick et al., 2017), in which the network forgets what has been learned on previous tasks. The second is to effectively adapt the current network parameters or architecture to fit a new task, and the last is to control the network size so as not to generate an overly complex network.

In continual learning, there are two main categories of strategies that attempt to address these challenges. The first category trains all tasks within a network of fixed capacity. For example, (Rebuffi et al., 2017; Lopez-Paz & Ranzato, 2017; Aljundi et al., 2018) replay some old samples together with the new task samples and then learn a new network from the combined training set; the drawback is that such methods typically require a memory system that stores past data. (Kirkpatrick et al., 2017; Liu et al., 2018) instead employ regularization terms that prevent the re-optimized parameters from deviating too much from the previous ones. Approaches using a fixed network architecture, however, cannot avoid a fundamental dilemma: they must either retain good model performance on learned tasks, leaving little room for learning new tasks, or compromise the learned model performance to better accommodate new tasks. To overcome this dilemma, the second category expands the neural network dynamically (Rusu et al., 2016; Yoon et al., 2018; Xu & Zhu, 2018). These methods typically fix the parameters of the old neurons (partially or fully) to eliminate the forgetting problem, while permitting new neurons to be added to adapt to a new task. In general, expandable networks achieve better model performance on all tasks than non-expandable ones. However, a new issue appears: an expandable network can grow overly complex as tasks accumulate, since each new task may keep adding neurons.
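As a concrete illustration of the first (fixed-capacity) category, the sketch below shows an EWC-style quadratic penalty in the spirit of Kirkpatrick et al. (2017); the diagonal-Fisher importance weighting and all variable names are assumptions made for exposition, not the exact formulation of any cited method.

```python
# Illustrative sketch of the regularization strategy for fixed-capacity
# continual learning. A quadratic penalty discourages parameters from
# drifting away from the values learned on previous tasks, weighted by
# (an estimate of) each parameter's importance.
import torch
import torch.nn as nn

def ewc_penalty(model: nn.Module,
                old_params: dict,   # parameter values after previous tasks
                fisher: dict,       # per-parameter importance estimates
                lam: float = 100.0) -> torch.Tensor:
    penalty = torch.zeros(())
    for name, p in model.named_parameters():
        penalty = penalty + (fisher[name] * (p - old_params[name]).pow(2)).sum()
    return 0.5 * lam * penalty

# Training on a new task then minimizes:
#   total_loss = task_loss + ewc_penalty(model, old_params, fisher)
```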

