THE MULTIPLE SUBNETWORK HYPOTHESIS: ENABLING MULTIDOMAIN LEARNING BY ISOLATING TASK-SPECIFIC SUBNETWORKS IN FEEDFORWARD NEURAL NETWORKS

Abstract

Neural networks have seen an explosion of usage and research in the past decade, particularly within the domains of computer vision and natural language processing. However, only recently have advancements in neural networks yielded performance improvements beyond narrow applications and translated to expanded multitask models capable of generalizing across multiple data types and modalities. Simultaneously, it has been shown that neural networks are overparameterized to a high degree, and pruning techniques have proved capable of significantly reducing the number of active weights within the network while largely preserving performance. In this work, we identify a methodology and network representational structure which allows a pruned network to employ previously unused weights to learn subsequent tasks. We employ these methodologies on well-known benchmarking datasets for testing purposes and show that networks trained using our approaches are able to learn multiple tasks, which may be related or unrelated, in parallel or in sequence without sacrificing performance on any task or exhibiting catastrophic forgetting.

1. INTRODUCTION

It is well known and documented that artificial neural networks (ANNs) are often overparameterized, resulting in computational inefficiency (LeCun et al., 1990; Liu et al., 2019). Applying unstructured pruning to an ANN pares down the number of active weights by identifying which subset of weights is most important for the model's predictive performance and discarding those which are less important or even entirely unnecessary. This technique has been shown to reduce the computational cost of using and storing a model without necessarily affecting model accuracy (Frankle & Carbin, 2019; Suzuki et al., 2001; Han et al., 2016; Lis et al., 2019; Wang et al., 2021). This phenomenon naturally raises a corollary question: are pruned weights entirely useless, or could weights identified as unnecessary for one task be retained and used to learn other tasks? Further, can the ANN's performance on previously learned tasks be preserved while these weights learn to perform new tasks? The exploration of these questions leads us to propose the multiple subnetwork hypothesis: that a dense, randomly initialized feedforward ANN contains within its architecture multiple disjoint subnetworks which can be utilized together to learn and make accurate predictions on multiple tasks, regardless of the degree of similarity between tasks or input types. Rather than attempting to match or surpass state-of-the-art results on the datasets and tasks presented in this study, we focus on testing the multiple subnetwork hypothesis on a set of standardized network architectures and compare multitask model performance against traditionally trained, single-task models of identical architectures. An obstacle to developing multitask ANNs is the tendency of ANNs to exhibit catastrophic forgetting (CF), in which learning a new task destroys the internal state representations used for previously acquired tasks (French, 1999; Goodfellow et al., 2013; Pfülb & Gepperth, 2019).
CF is especially pronounced in continual learning paradigms with a low degree of intertask relatedness (Aljundi et al., 2017; Ma et al., 2018; Masana et al., 2021), posing a challenge to creating multitask models able to perform prediction or classification across very different tasks, input data types, or input shapes.
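The core intuition above can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's actual training procedure: it uses simple magnitude-based unstructured pruning on a single weight matrix to define one subnetwork, then assigns the pruned (unused) weights to a second, disjoint subnetwork. The 75% sparsity level and the layer size are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dense, randomly initialized weight matrix for one feedforward layer.
W = rng.normal(size=(8, 8))

# Unstructured magnitude pruning: keep the top-k weights by absolute value.
# The sparsity level here is an illustrative choice, not a value from the paper.
sparsity = 0.75
k = int(W.size * (1 - sparsity))
threshold = np.sort(np.abs(W), axis=None)[-k]
mask_task_a = np.abs(W) >= threshold   # subnetwork retained for task A

# The pruned weights form a disjoint pool that a second task can claim.
mask_task_b = ~mask_task_a             # disjoint subnetwork for task B

# Because the two subnetworks share no weights, gradient updates applied
# through one mask cannot overwrite representations held by the other,
# which is the mechanism that avoids catastrophic forgetting.
x = rng.normal(size=8)
y_a = (W * mask_task_a) @ x            # forward pass using task A's weights
y_b = (W * mask_task_b) @ x            # forward pass using task B's weights
```

In a full training loop, each task's loss would be backpropagated only through its own mask, so the masks partition the parameter budget rather than compete for it.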

