ARE NEURAL NETS MODULAR? INSPECTING FUNCTIONAL MODULARITY THROUGH DIFFERENTIABLE WEIGHT MASKS

Abstract

Neural networks (NNs) whose subnetworks implement reusable functions are expected to offer numerous advantages, including compositionality through the efficient recombination of functional building blocks, improved interpretability, and the prevention of catastrophic interference. Understanding if and how NNs are modular could provide insights into how to improve them. Current inspection methods, however, fail to link modules to their functionality. In this paper, we present a novel method based on learning binary weight masks to identify the individual weights and subnets responsible for specific functions. Using this powerful tool, we contribute an extensive study of emerging modularity in NNs that covers several standard architectures and datasets. We demonstrate how common NNs fail to reuse submodules and offer new insights into the related issue of systematic generalization on language tasks.

1. INTRODUCTION

Modularity is an important organizational principle in both artificial (Ballard, 1987; Baldwin & Clark, 2000) and biological (von Dassow & Munro, 1999; Lorenz et al., 2011; Clune et al., 2013) systems. It provides a natural way of achieving compositionality, which appears essential for systematic generalization, one of the areas where typical artificial neural networks (NNs) do not yet perform well (Fodor et al., 1988; Marcus, 1998; Lake & Baroni, 2018; Hupkes et al., 2020). Recently, NNs with explicitly designed modules have demonstrated superior generalization capabilities (Clune et al., 2013; Andreas et al., 2016; Kirsch et al., 2018; Chang et al., 2019; Bahdanau et al., 2019; Goyal et al., 2021b), which supports this intuition. An implicit assumption behind such models is that NNs without hand-designed modularity do not learn to become modular by themselves. In contrast, it was recently shown that certain types of modular structure do emerge in standard NNs (Watanabe, 2019; Filan et al., 2020). However, because these works define modules in terms of activation statistics or connectivity clustering, it remains unclear whether the discovered modules correspond to a functional decomposition.

This paper contributes new insights into the generalization capabilities of popular neural networks by investigating whether modules implementing specific functionality emerge and to what extent they enable compositionality. This calls for a functional definition of modules, which prior work has not considered. In particular, we consider functional modules given by subsets of weights (i.e., subnetworks) responsible for performing a specific 'target functionality', such as solving a subtask of the original task. Associating modules with a specific function makes them easier to interpret. Moreover, depending on the chosen target functionality, modules at multiple levels of granularity can be considered.
To unveil whether a NN has learned to acquire functional modules, we propose a novel analysis tool that works on pre-trained NNs. Given an auxiliary task corresponding to a particular target function of interest (e.g., training only on a specific subset of the samples from the original dataset), we train probabilistic, binary, but differentiable masks for all weights (while the NN's weights remain frozen). The result is a binary mask exhibiting the module necessary to perform the target function. Our approach is simple yet general, which readily enables us to analyze several popular NN architectures on a variety of tasks, including recurrent NNs (RNNs), Transformers (Vaswani et al., 2017), feedforward NNs (FNNs), and convolutional NNs (CNNs). To investigate whether the discovered functional modules are part of a compositional solution, we analyze whether the NN has the following two desirable properties: (P_specialize) it uses different modules for very different functions, and (P_reuse) it uses the same module for identical functions that may have to be performed multiple times¹. Here we treat P_specialize and P_reuse as continuous quantities, which lets us focus on the degree to which functional modularity emerges. Further, since for many tasks it is unclear what precise amount of sharing is desirable, we measure P_specialize and P_reuse by considering the change in performance that results from applying different masks corresponding to a target function. This yields an easy-to-interpret metric that does not assume precise knowledge about the desired level of weight sharing. We experimentally show that many typical NNs exhibit P_specialize but not P_reuse. By additionally analyzing the capacity for transfer learning, we provide further insight into this issue.
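The performance-based measurement described above can be sketched as follows. This is an illustrative harness, not the paper's code: `evaluate`, `apply_mask`, and the toy values are hypothetical stand-ins for the task-specific details.

```python
def masked_performance(evaluate, apply_mask, weights, mask, task):
    """Performance of the frozen network on `task` with `mask` applied."""
    return evaluate(apply_mask(weights, mask), task)

def interchange_drop(evaluate, apply_mask, weights, masks, tasks, a, b):
    """Performance change on task `a` when swapping in task `b`'s mask.

    A large drop for a != b indicates specialized modules (P_specialize);
    a small drop between two instances of the *same* function indicates
    reuse (P_reuse).
    """
    own = masked_performance(evaluate, apply_mask, weights, masks[a], tasks[a])
    other = masked_performance(evaluate, apply_mask, weights, masks[b], tasks[a])
    return own - other

# Toy stand-ins (hypothetical): two tasks whose masks keep disjoint weights.
weights = [1.0, 2.0]
masks = {"a": [1, 0], "b": [0, 1]}
tasks = {"a": None, "b": None}
toy_eval = lambda w, task: sum(w)                           # dummy "performance"
toy_apply = lambda w, m: [wi * mi for wi, mi in zip(w, m)]  # elementwise mask
```

In this toy setup, swapping in the other task's mask changes the measured performance, whereas reusing a task's own mask leaves it unchanged, mirroring how the metric separates specialization from reuse.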
We offer a possible explanation: while simple data routing between modules in standard NNs is often highly desirable, it is hard to learn, since the same weights must also implement the data transformation. Indeed, our findings suggest that standard NNs have no bias towards separating these conceptually different goals of data transformation and information routing. We also demonstrate how the functional modules discovered in typical NNs do not tend to encourage compositional solutions. For example, we analyze encoder-decoder LSTMs (Hochreiter & Schmidhuber, 1997) and Transformers (Vaswani et al., 2017) on the SCAN dataset (Lake & Baroni, 2018), designed to test systematic generalization based on textual commands. We show that combination-specific weights are learned to deal with certain command combinations, even when they are governed by the same rules as the other combinations. The existence of such weights indicates that the learned solution is non-compositional and fails at performing the more symbolic manipulation required for systematic generalization on SCAN. To demonstrate that this issue is present even in more realistic scenarios, we highlight identical behavior on the challenging Mathematics Dataset (Saxton et al., 2019). Finally, we study whether functional modules emerge in CNNs trained for image classification, which are thought to rely heavily on shared features. Surprisingly, we can identify subsets of weights solely responsible for single classes: removing these weights significantly degrades performance on the corresponding class. By analyzing the resulting confusion matrices, we identify classes that rely on similar features.

2. DISCOVERING MODULES VIA WEIGHT-LEVEL INTROSPECTION

To investigate whether functional modules emerge in neural networks, one must perform a weight-level analysis. This precludes the use of existing methods, which discover modular structure in NNs by clustering individual units according to their similarity (Watanabe, 2019; Filan et al., 2020); such unit-level analyses may not suffice to draw meaningful conclusions. Units can be shared even when their weights, which perform the actual computation, are not. Indeed, units can be viewed as mere "wires" for transmitting information. Consider, for example, a gated RNN such as an LSTM, where gates can be controlled either by the inputs or by the state, yet these use different weights to project to the same gating units.

To overcome this limitation, we propose a novel method to inspect pre-trained NNs at the level of individual weights. It works as follows. First, we formulate a target task corresponding to the specific function for which we want to investigate whether a module has been learned. For example, this can be a subset of the original problem (i.e., a subtask), or a particular dataset split, e.g., to test generalization. Next, we train a weight mask on this target task while keeping the weights themselves frozen. The resulting mask then reveals the module (subnetwork) responsible for the target task. To train the mask, we treat all N weights separately of each other. Let i ∈ {1, . . . , N} denote the weight index. The mask's probabilities are represented as learned logits l_i ∈ R, which are initialized such that each weight is kept with high probability (0.9). If one were to apply continuous masks to the weights, it would be possible to scale them arbitrarily, thereby potentially modifying the function the network performs. To prevent this, we binarize the masks, which only allows keeping or removing individual weights.
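The logit initialization described above can be sketched concretely. Since the keep probability is sigmoid(l_i), the logit that yields an initial keep probability of 0.9 is the inverse sigmoid, l = log(p / (1 - p)); the function name below is illustrative.

```python
import math

def keep_logit(p: float = 0.9) -> float:
    """Logit (inverse sigmoid) such that sigmoid(keep_logit(p)) == p."""
    return math.log(p / (1.0 - p))

l0 = keep_logit(0.9)                # ~2.197: initial per-weight logit
p0 = 1.0 / (1.0 + math.exp(-l0))    # round-trips back to the 0.9 keep probability
```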
The binarization is achieved using a Gumbel-Sigmoid with a straight-through estimator, which we derive from the Gumbel-Softmax (Jang et al., 2017; Maddison et al., 2017) in Appendix A.1.
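The forward pass of such a binarized mask can be sketched as follows. This is a minimal numpy reimplementation under stated assumptions, not the paper's released code: the two-class Gumbel-Softmax reduces to a sigmoid with additive Logistic(0, 1) noise, and the straight-through backward rule (gradients flow through the soft sigmoid as if the thresholding were the identity) is only noted in comments, since numpy has no autograd. The temperature value is illustrative.

```python
import numpy as np

def gumbel_sigmoid_sample(logits, tau=1.0, rng=None):
    """Draw a hard binary mask from per-weight logits.

    Forward: threshold a noisy, temperature-scaled sigmoid at 0.5.
    Backward (straight-through, not implemented here): gradients would
    be taken with respect to the soft relaxation `s` instead of the
    non-differentiable threshold.
    """
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(1e-6, 1.0 - 1e-6, size=logits.shape)
    noise = np.log(u) - np.log(1.0 - u)                  # Logistic(0, 1) sample
    s = 1.0 / (1.0 + np.exp(-(logits + noise) / tau))    # soft relaxation
    return (s > 0.5).astype(logits.dtype)                # hard 0/1 mask

# With logits set so that sigmoid(l) = 0.9, roughly 90% of weights are kept
# (the keep probability is exactly sigmoid(l), independent of tau).
rng = np.random.default_rng(0)
logits = np.full(100_000, np.log(0.9 / 0.1))
mask = gumbel_sigmoid_sample(logits, rng=rng)
```

Note that thresholding at 0.5 is equivalent to checking whether l + noise > 0, so the marginal keep probability is exactly sigmoid(l) regardless of the temperature; tau only affects the sharpness of the soft relaxation used for gradients.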



¹ We emphasize the distinction between the ability to reuse modules and the ability to compose them: a compositional solution may fail to reuse a module to implement the same behavior multiple times. Similarly, weights can be reused without being composed into a compositional solution. Further, we consider specialization of modules a special case of modularization, where modules are specialized to implement a particular functionality that is semantically meaningful.

