CONTINUAL LEARNING USING HASH-ROUTED CONVOLUTIONAL NEURAL NETWORKS

Abstract

Continual learning could shift the machine learning paradigm from data-centric to model-centric. A continual learning model needs to scale efficiently to handle semantically different datasets, while avoiding unnecessary growth. We introduce hash-routed convolutional neural networks: a group of convolutional units through which data flows dynamically. Feature maps are compared using feature hashing, and similar data is routed to the same units. A hash-routed network provides excellent plasticity thanks to its routed nature, while generating stable features through the use of orthogonal feature hashing. Each unit evolves separately, and new units can be added (to be used only when necessary). Hash-routed networks achieve excellent performance across a variety of typical continual learning benchmarks without storing raw data, and train using only gradient descent. Besides providing a continual learning framework for supervised tasks with encouraging results, our model can be used for unsupervised or reinforcement learning.

1. INTRODUCTION

When faced with a new modeling challenge, a data scientist will typically train a model from a class of models based on her/his expert knowledge and retain the best-performing one. The trained model is often useless when faced with different data, and retraining it on new data will result in poor performance when trying to reuse the model on the original data. This is what is known as catastrophic forgetting (McCloskey & Cohen, 1989). Although transfer learning avoids retraining networks from scratch, keeping the knowledge acquired by a trained model and using it to learn new tasks is not straightforward: the real knowledge remains with the human expert. Model training is usually a data-centric task. Continual learning (Thrun, 1995) makes model training a model-centric task by maintaining the knowledge acquired in previous learning tasks. Recent work in continual (or lifelong) learning has focused on supervised classification tasks, and most of the developed algorithms do not generate stable features that could be reused for unsupervised learning tasks, as a more generic algorithm such as the one we present does. Models should also be able to adapt and scale reasonably to accommodate different learning tasks without using an exponential amount of resources, and preferably with little data-scientist intervention.

To tackle this challenge, we introduce hash-routed networks (HRN). An HRN is composed of multiple independent processing units. Unlike in typical convolutional neural networks (CNN), the data flow between these units is determined dynamically by measuring similarity between hashed feature maps. The generated feature maps are stable. Scalability is ensured through unit evolution and by increasing the number of available units, while avoiding exponential memory use. This new type of network maintains stable performance across a variety of tasks, including semantically different ones. We describe expansion, update and regularization algorithms for continual learning.
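The routing idea can be sketched in a few lines: hash each incoming feature map into a fixed-size vector and forward it to the unit whose signature is most similar. The snippet below is a simplified illustration under assumed conventions (the signed-bucket hash and the `route` helper are stand-ins for the orthogonal feature hashing and routing detailed in section 3, not the exact algorithm):

```python
import numpy as np

def feature_hash(x, dim=16, seed=0):
    """Hash a flattened feature map into a fixed-size, L2-normalized vector.
    Each input coordinate is assigned a random bucket and a random sign;
    this is a simplified stand-in for the paper's orthogonal feature hashing."""
    rng = np.random.default_rng(seed)  # fixed seed: same hash for every call
    flat = np.asarray(x, dtype=float).ravel()
    buckets = rng.integers(0, dim, size=flat.size)
    signs = rng.choice([-1.0, 1.0], size=flat.size)
    h = np.zeros(dim)
    np.add.at(h, buckets, signs * flat)  # scatter-add signed values into buckets
    norm = np.linalg.norm(h)
    return h / norm if norm > 0 else h

def route(feature_map, unit_signatures):
    """Return the index of the unit whose signature is closest (cosine
    similarity) to the hashed feature map, plus all similarity scores."""
    h = feature_hash(feature_map)
    sims = [float(h @ s) for s in unit_signatures]
    return int(np.argmax(sims)), sims
```

In this toy version a unit's signature could simply be the hash of a representative feature map; routing the same map again then selects that unit, while dissimilar maps score lower and flow elsewhere.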
We validate our approach on multiple publicly available datasets by comparing supervised classification performance. Benchmarks include Pairwise-MNIST, MNIST/Fashion-MNIST (Xiao et al., 2017) and SVHN/incremental-Cifar100 (Netzer et al., 2011; Krizhevsky et al., 2009). Relevant background is introduced in section 2. Section 3 details the hash-routing algorithm and discusses its key attributes. Section 4 compares our work with other continual learning and dynamic network studies. A large set of experiments is carried out in section 5.

