MULTITASK REINFORCEMENT LEARNING BY OPTIMIZING NEURAL PATHWAYS

Abstract

Reinforcement learning (RL) algorithms have achieved great success in learning specific tasks, as evidenced by examples such as AlphaGo or fusion control. However, it remains difficult for an RL agent to learn to solve multiple tasks. In this paper, we propose a novel multitask learning framework in which multiple specialized pathways through a single network are trained simultaneously, with each pathway focusing on a single task. We show that this approach achieves performance competitive with existing multitask RL methods, while using only 5% of the neurons per task. We empirically demonstrate the success of our approach on several continuous control tasks, in both online and offline training.

1. INTRODUCTION

Our brain processes different languages, helps us perceive and act in a 3D world, coordinates between organs, and performs many other tasks. The brain continuously learns new things without catastrophic forgetting due to its plasticity (Zilles, 1992; Drubach, 2000; Sakai, 2020; Rorke, 1985), i.e., its ability to continually strengthen frequently used synaptic connections and eliminate those that are rarely used, a phenomenon called synaptic pruning (Feinberg, 1982). In this way, the brain creates neural pathways to transmit information. Different neural pathways (Rudebeck et al., 2006; Paus et al., 1999; Jürgens, 2002; Goodale et al., 1994) are used to complete different tasks. For example, the visual mechanisms that create the perception of objects are functionally and neurally distinct from those controlling the pre-shaping of the hand during grasping movements directed at those objects (Goodale et al., 1994). In deep learning architectures, the hope is that gradient descent, possibly augmented by attention mechanisms, will create these types of pathways automatically. However, in multitask learning this approach can be insufficient to prevent interference between tasks, and it often leads to loss of plasticity. This problem is especially exacerbated in reinforcement learning, where different tasks require policies that generate different data distributions. In this paper, we take inspiration from the presence of distinct neural pathways for different tasks in natural brains, and propose a novel approach to implementing them in deep reinforcement learning. Similarly to synaptic pruning, we aim to identify the important connections among the neurons of a deep neural network that allow a specific task to be accomplished.
To do so, we leverage insights from the recent literature on the lottery ticket hypothesis (Frankle & Carbin, 2019; Lee et al., 2018; Tanaka et al., 2020; Wang et al., 2020a) to construct task-specific neural pathways that use only a very small fraction (5%) of the parameters of the entire neural network, yet achieve expert-level performance in multitask reinforcement learning in both online and offline settings. The neural pathways are allowed to overlap, which enables leveraging information from multiple tasks in the updates of the same parameters, thereby improving generalization.
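The idea of task-specific pathways can be sketched as binary masks over a shared weight matrix, where each mask keeps only 5% of the weights. The snippet below is a minimal illustration, not the paper's actual discovery procedure: the saliency scores are hypothetical placeholders (the paper derives them in a data-adaptive way), and `make_pathway_mask` is a name we introduce for the sketch.

```python
import numpy as np

def make_pathway_mask(scores, keep_frac=0.05):
    """Binary mask keeping the top `keep_frac` of weights by score.

    `scores` stands in for a task-specific saliency measure; here we
    simply threshold at the k-th largest value.
    """
    k = max(1, int(keep_frac * scores.size))
    thresh = np.partition(scores.ravel(), -k)[-k]
    return (scores >= thresh).astype(scores.dtype)

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))  # shared network weights

# Hypothetical per-task saliency scores; masks may overlap, so the
# same weight can belong to several pathways and receive gradients
# from several tasks.
masks = {t: make_pathway_mask(np.abs(W) * rng.random(W.shape))
         for t in range(3)}

def forward(x, task):
    # Only the task's pathway (~5% of the weights) participates.
    return np.tanh(x @ (W * masks[task]))

x = rng.standard_normal((1, 64))
y = forward(x, task=0)
```

Because each pathway is just an elementwise mask on the shared parameters, the underlying learning algorithm is untouched: gradients simply flow only through the masked subset during each task's update.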

Contributions:

We propose the Neural Pathway Framework (NPF), a novel multitask learning approach that generates neural pathways through a large network, each specific to a single task. Our approach can be easily integrated with any online or offline algorithm without changing the learning objective. Because the shift in data distribution during online reinforcement learning training makes it challenging to find neural pathways using lottery network search techniques (Frankle & Carbin, 2019; Alizadeh, 2019; Wang et al., 2020a), we propose a data-adaptive pathway discovery method for



Code: https://github.com/anomICLR2023/2904

