SELF-ACTIVATING NEURAL ENSEMBLES FOR CONTINUAL REINFORCEMENT LEARNING

Abstract

The ability of an agent to continuously learn new skills without catastrophically forgetting existing knowledge is of critical importance for the development of generally intelligent agents. Most methods devised to address this problem depend heavily on well-defined task boundaries, which simplify the problem considerably. Our task-agnostic method, Self-Activating Neural Ensembles (SANE), uses a hierarchical modular architecture designed to avoid catastrophic forgetting without making any such assumptions. At each timestep a path through the SANE tree is activated; during training only activated nodes are updated, ensuring that unused nodes do not undergo catastrophic forgetting. Additionally, new nodes are created as needed, allowing the system to leverage and retain old skills while growing and learning new ones. We evaluate our approach on MNIST and a set of grid world environments, showing that SANE avoids catastrophic forgetting where existing methods do not.
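To make the activate-then-update mechanism concrete, the following is a minimal single-level sketch (not the full hierarchical tree from the paper); the `Node` class, cosine-similarity activation, and the spawn threshold are illustrative assumptions, not the paper's exact design:

```python
import numpy as np

class Node:
    """One module in the ensemble, keyed by an embedding of the states it handles."""
    def __init__(self, key, dim):
        self.key = np.asarray(key, dtype=float)
        self.weights = np.zeros(dim)  # stand-in for this node's policy parameters
        self.updates = 0

def activate(nodes, obs, threshold):
    """Return the best-matching node, or None if no node matches well enough."""
    if not nodes:
        return None
    scores = [float(n.key @ obs) /
              (np.linalg.norm(n.key) * np.linalg.norm(obs) + 1e-8)
              for n in nodes]
    best = int(np.argmax(scores))
    return nodes[best] if scores[best] >= threshold else None

def step(nodes, obs, grad, threshold=0.5, lr=0.1):
    """One training step: activate a node (spawning one if needed), update only it."""
    node = activate(nodes, obs, threshold)
    if node is None:                # no existing node claims this observation:
        node = Node(obs, len(obs))  # grow the ensemble with a new module
        nodes.append(node)
    node.weights += lr * grad      # inactive nodes are never touched, so their
    node.updates += 1              # knowledge cannot be catastrophically overwritten
    return node
```

Dissimilar observations spawn separate nodes, while observations similar to an existing key reactivate and update only that node; this is the property that protects unused modules from forgetting.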

1. INTRODUCTION

Lifelong learning is of critical importance for the field of robotics; an agent that interacts with the world should be able to continuously learn from it, acting intelligently in a wide variety of situations. In marked contrast to this ideal, most standard deep reinforcement learning methods are centered around a single task. First, a task is defined; then a policy is learned to maximize the rewards the agent receives in that setting. If the task is changed, a completely new model is learned, throwing away the previous model and previous interactions. Task specification therefore plays a central role in current end-to-end deep reinforcement learning frameworks. But is task-driven learning scalable? In contrast, humans do not require concrete task boundaries to effectively learn separate tasks; instead, we perform continual (lifelong) learning. The same model is used to learn new skills, leveraging the lessons of previous skills to learn more efficiently, without forgetting old behaviors. However, when placed into continual learning settings, current deep reinforcement learning approaches do neither: the transfer properties of these systems are negligible and they suffer from catastrophic forgetting (McCloskey & Cohen, 1989; French, 2006). The core issue of catastrophic forgetting is that a neural network trained on one task starts to forget what it knows when trained on a second task, and this issue is only exacerbated as more tasks are added. The problem ultimately stems from training one network end-to-end sequentially; the shared nature of the weights and the backpropagation used to update them mean that later tasks overwrite earlier ones (McCloskey & Cohen, 1989; Ratcliff, 1990).
To handle this, past approaches have attempted a wide variety of ideas: from task-based regularization (Kirkpatrick et al., 2017), to learning different sub-modules for different tasks (Rusu et al., 2016), to dual-system slow/fast learners inspired by the human hippocampus (Schwarz et al., 2018). The core problem of continual learning, which none of these methods address, is that the agent needs to autonomously determine how and when to adapt to changing environments, as it is infeasible for a human to indefinitely provide an agent with task-boundary supervision. Specifically, these approaches rely on the notion of tasks to identify when to spawn new sub-modules, when to freeze weights, when to save parameters, etc. Leaning on task boundaries is unscalable and side-steps the core problem. A few task-free methods do exist; some address the problem by utilizing fixed subdivisions of the input space (Aljundi et al., 2018; Veness et al., 2019), which we believe limits their flexibility.

