ARE NEURAL NETS MODULAR? INSPECTING FUNCTIONAL MODULARITY THROUGH DIFFERENTIABLE WEIGHT MASKS

Abstract

Neural networks (NNs) whose subnetworks implement reusable functions are expected to offer numerous advantages, including compositionality through the efficient recombination of functional building blocks, improved interpretability, and protection against catastrophic interference. Understanding if and how NNs are modular could provide insights into how to improve them. Current inspection methods, however, fail to link modules to their functionality. In this paper, we present a novel method based on learning binary weight masks to identify individual weights and subnets responsible for specific functions. Using this powerful tool, we contribute an extensive study of emerging modularity in NNs that covers several standard architectures and datasets. We demonstrate how common NNs fail to reuse submodules and offer new insights into the related issue of systematic generalization on language tasks.

1. INTRODUCTION

Modularity is an important organization principle in both artificial (Ballard, 1987; Baldwin & Clark, 2000) and biological (von Dassow & Munro, 1999; Lorenz et al., 2011; Clune et al., 2013) systems. It provides a natural way of achieving compositionality, which appears essential for systematic generalization, one of the areas where typical artificial neural networks (NNs) still do not perform well (Fodor et al., 1988; Marcus, 1998; Lake & Baroni, 2018; Hupkes et al., 2020). Recently, NNs with explicitly designed modules have demonstrated superior generalization capabilities (Clune et al., 2013; Andreas et al., 2016; Kirsch et al., 2018; Chang et al., 2019; Bahdanau et al., 2019; Goyal et al., 2021b), which supports this intuition. An implicit assumption behind such models is that NNs without hand-designed modularity do not learn to become modular by themselves. In contrast, it was recently shown that certain types of modular structures do emerge in standard NNs (Watanabe, 2019; Filan et al., 2020). However, because these works define modules in terms of activation statistics or connectivity clustering, it remains unclear whether the discovered modules correspond to a functional decomposition.

This paper contributes new insights into the generalization capabilities of popular NNs by investigating whether modules implementing specific functionality emerge and to what extent they enable compositionality. This calls for a functional definition of modules, which has not been considered in prior work. In particular, we consider functional modules given by subsets of weights (i.e., subnetworks) responsible for performing a specific target functionality, such as solving a subtask of the original task. Associating modules with a specific function makes them easier to interpret. Moreover, depending on the chosen target functionality, modules at multiple levels of granularity can be considered.
To unveil whether a NN has learned functional modules, we propose a novel analysis tool that operates on pre-trained NNs. Given an auxiliary task corresponding to a particular target function of interest (e.g., solving the original task on a specific subset of its samples), we train probabilistic, binary, yet differentiable masks for all weights, while the NN's weights themselves remain frozen. The result is a binary mask exposing the subnetwork, i.e. the module, necessary to perform the target function. Our approach is simple yet general, which readily enables us to analyze several popular NN architectures on a variety of tasks, including recurrent NNs (RNNs), Transformers (Vaswani et al., 2017), feedforward NNs (FNNs), and convolutional NNs (CNNs).
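The masking procedure can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: it assumes a Gumbel-sigmoid (logistic-noise) relaxation of the Bernoulli mask with hard thresholding; in a real autodiff framework, a straight-through estimator would forward the hard mask while backpropagating through the soft relaxation so that only the mask logits (and never the frozen weights) receive gradients. All names here are illustrative.

```python
import numpy as np

def sample_binary_mask(logits, tau=1.0, rng=None):
    """Sample a binary weight mask via a Gumbel-sigmoid relaxation.

    logits: per-weight mask logits (the only trainable parameters).
    tau:    temperature; lower values make `soft` closer to binary.
    Returns (hard, soft): the 0/1 mask and its soft relaxation.
    """
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(1e-6, 1.0 - 1e-6, size=logits.shape)
    noise = np.log(u) - np.log(1.0 - u)          # logistic noise
    soft = 1.0 / (1.0 + np.exp(-(logits + noise) / tau))
    hard = (soft > 0.5).astype(logits.dtype)
    # In an autodiff framework, use the straight-through estimator:
    # forward `hard`, backpropagate as if it were `soft`.
    return hard, soft

# Frozen weights of a toy layer and learnable per-weight mask logits.
W = np.random.default_rng(0).normal(size=(4, 4))  # stays frozen
logits = np.zeros_like(W)                         # ~50% keep probability
hard, soft = sample_binary_mask(logits, rng=np.random.default_rng(1))
masked_W = W * hard   # only the masked subnetwork remains active
```

Training then optimizes `logits` on the auxiliary task (optionally with a sparsity regularizer on the keep probabilities), so the surviving nonzero entries of `masked_W` identify the subnetwork responsible for the target function.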

