MULTITASK REINFORCEMENT LEARNING BY OPTIMIZING NEURAL PATHWAYS

Abstract

Reinforcement learning (RL) algorithms have achieved great success in learning specific tasks, as evidenced by examples such as AlphaGo or fusion control. However, it is still difficult for an RL agent to learn how to solve multiple tasks. In this paper, we propose a novel multitask learning framework, in which multiple specialized pathways through a single network are trained simultaneously, with each pathway focusing on a single task. We show that this approach achieves competitive performance with existing multitask RL methods, while using only 5% of the number of neurons per task. We demonstrate empirically the success of our approach on several continuous control tasks, in both online and offline training.

1. INTRODUCTION

Our brain processes different languages, helps us perceive and act in a 3D world, coordinates between organs, and performs many other tasks. The brain continuously learns new things without catastrophic forgetting thanks to its plasticity (Zilles, 1992; Drubach, 2000; Sakai, 2020; Rorke, 1985), i.e., its ability to continually strengthen frequently used synaptic connections and eliminate connections that are rarely used, a phenomenon called synaptic pruning (Feinberg, 1982). In this way, the brain creates neural pathways to transmit information, and different neural pathways (Rudebeck et al., 2006; Paus et al., 1999; Jürgens, 2002; Goodale et al., 1994) are used to complete different tasks. For example, the visual mechanisms that create the perception of objects are functionally and neurally distinct from those controlling the pre-shaping of the hand during grasping movements directed at those objects (Goodale et al., 1994). In deep learning architectures, the hope is that gradient descent, possibly augmented by attention mechanisms, will create these types of pathways automatically. However, in multitask learning this approach can be insufficient to prevent interference between tasks, and it often leads to loss of plasticity. The problem is especially acute in reinforcement learning, where different tasks require policies that generate different data distributions. In this paper, we take inspiration from the presence of distinct neural pathways for different tasks in natural brains, and propose a novel approach to implement them in deep reinforcement learning. Similarly to synaptic pruning, we aim to identify the important connections among the neurons in a deep neural network that allow accomplishing a specific task.
To this end, we leverage insights from the recent lottery ticket hypothesis literature (Frankle & Carbin, 2019; Lee et al., 2018; Tanaka et al., 2020; Wang et al., 2020a) to construct task-specific neural pathways which utilize only a very small fraction (5%) of the parameters of the entire neural network, yet demonstrate expert-level performance in multitask reinforcement learning in both online and offline settings. The neural pathways are allowed to overlap, which lets updates to shared parameters leverage information from multiple tasks and thereby improve generalization.

Contributions:

We propose the Neural Pathway Framework (NPF), a novel multitask learning approach that generates neural pathways through a large network, each specific to a single task. Our approach can be easily integrated with any online or offline algorithm without changing the learning objective. Because the shift in data distribution during online reinforcement learning makes it challenging to find neural pathways with lottery-network search techniques (Frankle & Carbin, 2019; Alizadeh, 2019; Wang et al., 2020a), we propose a data-adaptive pathway discovery method for the online setting. We demonstrate our approach in both offline and online multitask reinforcement learning, using MuJoCo-based and MetaWorld environments. Our method is frugal in terms of the number of neurons, and multiple tasks are trained in parallel, saving significant training time. The algorithm finds neural pathways that use only 5% of the neural network weights for each task. Since our method requires only a small fraction of the weights to complete a task, it can provide fast inference (Molchanov et al., 2017; Luo et al., 2017) and reduced energy consumption (Yang et al., 2016). Thus, our method is especially suitable for real-world applications in which large datasets must be processed in real time (e.g., self-driving cars, deploying bots in games, financial data analysis) or which are deployed on low-resource devices (e.g., embedded systems and edge devices).
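To make the pathway idea concrete, the following is a minimal NumPy sketch, not the paper's implementation, of per-task binary masks over a single shared weight matrix, keeping the top 5% of weights per task. The `make_pathway_mask` helper and the random saliency scores are illustrative assumptions; in practice the scores would come from a pathway discovery procedure.

```python
import numpy as np

def make_pathway_mask(scores, keep_frac=0.05):
    """Keep the top `keep_frac` fraction of weights by saliency score."""
    k = max(1, int(keep_frac * scores.size))
    threshold = np.sort(scores.ravel())[-k]
    return (scores >= threshold).astype(scores.dtype)

rng = np.random.default_rng(0)
shared_W = rng.normal(size=(64, 64))  # one shared weight matrix for all tasks

# Hypothetical per-task saliency scores; masks may overlap across tasks,
# so a shared weight can receive updates from several tasks.
task_scores = {t: np.abs(rng.normal(size=shared_W.shape)) for t in range(3)}
masks = {t: make_pathway_mask(s, keep_frac=0.05) for t, s in task_scores.items()}

def forward(x, task):
    # Only the task's pathway (about 5% of the weights) participates.
    return np.tanh(x @ (shared_W * masks[task]))

x = rng.normal(size=(1, 64))
y = forward(x, task=0)
```

The key design point this sketch illustrates is that all tasks share one set of parameters, while each task reads (and would write) only through its own sparse mask.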

2. RELATED WORK

Many recent works point out that training a deep neural network on more than one task becomes difficult due to gradient interference, i.e., gradients for different tasks pointing in very different directions. Recent work proposes several possible solutions, such as constraining conflicting gradient updates (Yu et al., 2020; Suteu & Guo, 2019; Du et al., 2020; Chen et al., 2017; Sener & Koltun, 2018), constraining data sharing among irrelevant tasks (Yu et al., 2021; Fifty et al., 2021), and learning the underlying context of relevant tasks (Sodhani et al., 2021). In all of these cases, success depends on the assumption that there is an underlying shared structure among the tasks, and on how well this structure can be captured during training. Many multitask learning methods propose complex ways to modularize the neural network, which allow sharing and reusing network components across tasks (Rusu et al., 2016; Fernando et al., 2017; Rosenbaum et al., 2017; Devin et al., 2016; Misra et al., 2016; Yang et al., 2020). In this paper, we propose a completely new way of tackling multitask problems, and we show that a single deep neural network is sufficient for learning multiple complex tasks, by routing multiple pathways through the network. Recent advances in finding sparse networks have shown that there exist sub-networks which contain a small fraction of the parameters of a dense deep neural network yet retain the same performance. A range of techniques find such sub-networks through iterative updates during training (Frankle & Carbin, 2019; Feinberg, 1982; Gale et al., 2019; Blalock et al., 2020; Ohib et al., 2019; Yang et al., 2019). Simpler alternatives find a sub-network in a single shot, at weight initialization (Lee et al., 2018; Wang et al., 2020b). Building upon a three-decade-old saliency criterion used for pruning trained models (Mozer & Smolensky, 1988), a recent technique to prune models at initialization was proposed by Lee et al. (2018). Using this criterion, they are able to predict, at initialization, the importance that each weight will have later in the training process. However, premature gradient-based pruning at initialization can lead to layer collapse: pruning an entire layer renders the network untrainable. Many techniques (Wang et al., 2020b; Feinberg, 1982) try to maximize useful gradient flow through the deep layers. To discover neural pathways, we can leverage these methods to find task-dependent sub-networks, as long as layer collapse is avoided.
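As an illustration of the single-shot criterion of Lee et al. (2018), the following is a hedged NumPy sketch of connection sensitivity, |w ⊙ ∂L/∂w|, computed at initialization on one batch. The toy linear least-squares model and the 20% keep ratio are assumptions made for the example, not details from this paper.

```python
import numpy as np

def snip_saliency(W, x, y):
    """Connection sensitivity |w * dL/dW| for a linear least-squares model,
    evaluated at initialization on one batch (in the spirit of Lee et al., 2018)."""
    err = x @ W - y
    grad = x.T @ err / len(x)  # dL/dW for L = (1/2n) * sum_i ||x_i W - y_i||^2
    return np.abs(W * grad)

rng = np.random.default_rng(1)
W0 = rng.normal(size=(8, 4))      # weights at initialization
x = rng.normal(size=(32, 8))      # one batch of inputs
y = rng.normal(size=(32, 4))      # one batch of targets

sal = snip_saliency(W0, x, y)
# Keep only the most sensitive 20% of connections.
k = int(0.2 * sal.size)
keep = sal >= np.sort(sal.ravel())[-k]
```

A global threshold like this is what makes layer collapse possible in deeper networks: nothing prevents every weight of one layer from falling below the cutoff.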



Code: https://github.com/anomICLR2023/2904



Figure 1: During evaluation of a trained agent (global model), for any given task our proposed method activates a task-specific part of the neural network.

