ON THE SPECIALIZATION OF NEURAL MODULES

Abstract

A number of machine learning models have been proposed with the goal of achieving systematic generalization: the ability to reason about new situations by combining aspects of previous experiences. These models leverage compositional architectures which aim to learn specialized modules dedicated to structures in a task that can be composed to solve novel problems with similar structures. While the compositionality of these architectures is guaranteed by design, the specialization of the modules is not. Here we theoretically study the ability of network modules to specialize to useful structures in a dataset and achieve systematic generalization. To this end we introduce a minimal space of datasets motivated by practical systematic generalization benchmarks. From this space of datasets we present a mathematical definition of systematicity and study the learning dynamics of linear neural modules when solving components of the task. Our results shed light on the difficulty of module specialization, what is required for modules to successfully specialize, and the necessity of modular architectures to achieve systematicity. Finally, we confirm that the theoretical results in our tractable setting generalize to more complex datasets and non-linear architectures.

1. INTRODUCTION

Humans frequently display the ability to systematically generalize, that is, to leverage specific learning experiences in diverse new settings (Lake et al., 2019). For instance, exploiting the approximate compositionality of natural language, humans can combine a finite set of words or phonemes into a near-infinite set of sentences, words, and meanings. Someone who understands "brown dog" and "black cat" also likely understands "brown cat," to take one example from Szabó (2012). The result is that a human's ability to reason about situations or phenomena extends far beyond their ability to directly experience and learn from all such situations or phenomena. Deep learning techniques have made great strides in tasks like machine translation and language prediction, providing proof of principle that they can succeed in quasi-compositional domains. However, these methods are typically data hungry and the same networks often fail to generalize in even simple settings when training data are scarce (Lake & Baroni, 2018b; Lake et al., 2019). Empirically, the degree of systematicity in deep networks is influenced by many factors. One possibility is that the learning dynamics in a deep network could impart an implicit inductive bias toward systematic structure (Hupkes et al., 2020); however, a number of studies have identified situations where depth alone is insufficient for structured generalization (Pollack, 1990; Niklasson & Sharkey, 1992; Phillips & Wiles, 1993; Lake & Baroni, 2018b; Mittal et al., 2022). Another significant factor is architectural modularity, which can enable a system to generalize when modules are appropriately configured (Vani et al., 2021; Phillips, 1995). However, identifying the right modularity through learning remains challenging (Mittal et al., 2022).
In spite of these (and many other) possibilities for improving systematicity (Hupkes et al., 2020), it remains unclear when standard deep neural networks will exhibit systematic generalization (Dankers et al., 2021), reflecting a long-standing theoretical debate stretching back to the first wave of connectionist deep networks (Rumelhart & McClelland, 1986; Pollack, 1990; Fodor & Pylyshyn, 1988; Smolensky, 1990; 1991; Hadley, 1993; 1994). In this work we theoretically study the ability of neural modules to specialize to structures in a dataset. Our goal is to provide a formalism for systematic generalization and to begin to concretize some of

