MULTI-TASK STRUCTURAL LEARNING USING LOCAL TASK SIMILARITY INDUCED NEURON CREATION AND REMOVAL

Abstract

Multi-task learning has the potential to improve generalization by maximizing positive transfer between tasks while reducing task interference. Fully achieving this potential is hindered by manually designed architectures that remain static throughout training. In contrast, learning in the brain occurs through structural changes that happen in tandem with changes in synaptic strength. We therefore propose Multi-Task Structural Learning (MTSL), which simultaneously learns the multi-task architecture and its parameters. MTSL begins with an identical single-task network for each task and alternates between a task learning phase and a structural learning phase. In the task learning phase, each network specializes in its corresponding task. In each structural learning phase, starting from the earliest layer, locally similar task layers first transfer their knowledge to a newly created group layer and are then removed. MTSL then uses the group layer in place of the removed task layers and moves on to the next layers. Our empirical results show that MTSL achieves competitive generalization with various baselines and improves robustness to out-of-distribution data.

1. INTRODUCTION

Artificial Neural Networks (ANNs) have exhibited strong performance in various tasks essential for scene understanding. Single-Task Learning (STL) (Yu et al., 2021; Wang et al., 2020b; Orsic et al., 2019), driven by custom task-specific improvements, has largely been at the center of this success. Despite these improvements, using single-task networks for the multiple tasks required for scene understanding comes with notable problems, such as a linear increase in computational cost and a lack of inter-task communication. Multi-Task Learning (MTL), on the other hand, with the aid of shared layers, provides favorable benefits over STL such as improved inference efficiency and positive information transfer between tasks. However, a notable drawback of sharing layers is task interference. Existing works have attempted to alleviate task interference by modifying the architecture (Kanakis et al., 2020; Liu et al., 2019), determining which tasks to group together using a similarity notion (Standley et al., 2020; Fifty et al., 2021; Vandenhende et al., 2020), balancing task loss functions (Kendall et al., 2018; Liu et al., 2019; Yu et al., 2020; Lin et al., 2019), or learning the architecture (Guo et al., 2020; Lu et al., 2017). Although these methods have shown promise, progress can be made by drawing inspiration from the brain, the only known intelligent system that excels in multi-task learning. The inner mechanisms of the brain, although not fully understood, can guide research in ANNs through simplified notions.

Neuron creation and neuron removal (Maile et al., 2022) are simplified notions that can aid in the automated design of Multi-Task Networks (MTNs). Neuron removal presents the opportunity to start from a dense set of neurons and move toward a sparse one. In the early stages of brain development, neural circuits consist of excess neurons and connections that provide a rich information pipeline (Maile et al., 2022).
This pipeline allows neural circuits to learn specialized functions while undergoing neuron removal and synaptic pruning (Riccomagno & Kolodkin, 2015). Analogously, moving from a dense architecture consisting of multiple single-task networks to a sparse multi-task architecture could be beneficial. Neuron creation, in contrast, is an open-ended operation due to the difficulty of deciding where, how, and when to create neurons (Evci et al., 2022). In the brain, local communication between neurons is an important part of learning: learning rules that modulate synaptic strength are local in nature (Kudithipudi et al., 2022), and local neural activity may drive both neuron creation (Luhmann et al., 2016) and neuron removal (Faust et al., 2021). We therefore explore local task similarity as the signal that drives neuron creation and removal.

Structural learning refers to learning the architecture and its parameters simultaneously (Maile et al., 2022). The neural circuitry in the brain changes even during adulthood, undergoing morphological changes induced by structural plasticity (Kudithipudi et al., 2022). Evidently, learning in the brain does not consist of static architecture creation followed by modulation of synaptic strengths; instead, architectural changes occur in tandem with changes in synaptic strength. Utilizing structural learning with strategic neural operations could thus mitigate the effects of task interference and promote generalization in MTL. Therefore, we propose Multi-Task Structural Learning (MTSL) to simultaneously learn the multi-task architecture and its parameters. MTSL treats entire layers as computational units (Maile et al., 2022) and performs neuron creation and removal on them. Inspired by the creation of a large number of neurons during the developmental stage of the brain, MTSL begins training by initializing each task with its own network.
Similar to the brain, the excess layers of each task network provide a rich information flow that informs grouping decisions. Local task similarity is used both to guide task learning through the alignment of task representations and to decide which tasks to group. A positive decision to group tasks induces the creation of a group layer; the associated task layers transfer their learned knowledge to the group layer before being removed. Finally, a few epochs of fine-tuning result in a learned MTN that retains the learned parameters for inference.
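The grouping decision above rests on a measure of local similarity between corresponding task layers. As an illustrative assumption (the specific metric and function name below are not prescribed by the text above), one widely used representational-similarity measure that could play this role is linear Centered Kernel Alignment (CKA) computed on the activations of two task layers:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between activation matrices X and Y of shape
    (n_samples, n_features); returns a similarity score in [0, 1]."""
    X = X - X.mean(axis=0, keepdims=True)  # center each feature
    Y = Y - Y.mean(axis=0, keepdims=True)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

rng = np.random.default_rng(0)
acts_a = rng.standard_normal((256, 64))  # hypothetical task-A layer activations
acts_b = rng.standard_normal((256, 64))  # hypothetical task-B layer activations

# A layer is maximally similar to itself, and CKA is invariant to
# isotropic rescaling of the activations.
assert np.isclose(linear_cka(acts_a, acts_a), 1.0)
assert np.isclose(linear_cka(acts_a, 3.0 * acts_a), 1.0)
```

A scalar score like this makes the grouping rule simple: group the task layers at a given depth whenever their pairwise similarity exceeds a threshold.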

Contributions. (i) We propose a structural learning algorithm for multi-task learning based on aligning local task representations, grouping similar task layers, transferring knowledge from grouped task layers to a new group layer, and removing the corresponding task layers. (ii) We compare against various state-of-the-art methods and show that MTSL achieves improved generalization without the need for retraining. (iii) We show that MTSL improves robustness to natural corruptions. (iv) We present ablations on the various components of MTSL and demonstrate their utility.
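The alternation described in contribution (i) can be summarized as Python-style pseudocode. Every helper name below (`train_on_task`, `local_similarity`, `create_group_layer`, `distill_into`, `finetune`) is an illustrative placeholder rather than the actual implementation, and for simplicity this sketch groups all tasks at once at each depth, whereas the method can group subsets of tasks:

```
# Sketch of the MTSL alternation between task learning and
# structural learning; helper names are illustrative placeholders.
def mtsl(task_nets, tasks, num_layers, sim_threshold):
    shared = []                            # group layers created so far
    for depth in range(num_layers):        # proceed from the earliest layer
        # Task learning phase: each network specializes on its task,
        # with local similarity also aligning task representations.
        for net, task in zip(task_nets, tasks):
            train_on_task(net, task, shared_prefix=shared)

        # Structural learning phase: group locally similar task layers.
        layers = [net.layer(depth) for net in task_nets]
        if local_similarity(layers) >= sim_threshold:
            group = create_group_layer(layers)   # neuron creation
            distill_into(group, layers)          # knowledge transfer
            for net in task_nets:
                net.remove_layer(depth)          # neuron removal
            shared.append(group)                 # reused by all tasks
        else:
            break  # tasks keep separate layers from this depth onward

    finetune(task_nets, shared, tasks)           # a few final epochs
    return shared, task_nets
```

The learned shared prefix plus the remaining task-specific layers together form the multi-task network used for inference, with no retraining from scratch.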

2. RELATED WORKS

Although different lines of work, such as architecture modifications (Liu et al., 2019; Kanakis et al., 2020; Misra et al., 2016), task grouping (Standley et al., 2020; Fifty et al., 2021; Vandenhende et al., 2020), or task loss balancing (Kendall et al., 2018; Liu et al., 2019; Yu et al., 2020; Lin et al., 2019), address task interference, they use hand-designed architectures that could be suboptimal. A variety of works in the MTL literature propose methods to learn the architecture, and we categorize these works into two groups. One group considers architectures that dynamically change their structure based on the input (Hazimeh et al., 2021; Ahn et al., 2019; Rosenbaum et al., 2018), while the other learns the branching structure (Guo et al., 2020; Bruggemann et al., 2020; Lu et al., 2017; Zhang et al., 2022; 2021; Raychaudhuri et al., 2022).

Input-dependent dynamic architectures draw inspiration from the brain and provide many benefits, including improved computational efficiency (Han et al., 2021). DSelect-k (Hazimeh et al., 2021) is a mixture-of-experts model that selects a sparse set of experts to infer an input sample. Ahn et al. (2019) train a selector network that picks a subnetwork from a large estimator network based on the input. In routing networks (Rosenbaum et al., 2018), task-dependent agents are trained with reinforcement learning to pick a path through a large network for each input. While these approaches optimize networks or subnetworks to specialize for a certain distribution of input samples, MTSL aims to optimize a shared network for all tasks.

The branching structure of multi-task networks has been learned using different approaches. Zhang et al. (2022) propose to estimate the accuracy of a branched multi-task network using two task networks with similar branching.
They also suggest data structures and methods that ease the search over branching decisions in an arbitrary network, similar to Zhang et al. (2021). Raychaudhuri et al. (2022) propose two controller networks that predict the branching structure and the weights of the cross-task edges based on user-preferred task importance and budget constraints. Guo et al. (2020) start from a



Code will be made available after acceptance.

