IMPORTANCE AND COHERENCE: METHODS FOR EVALUATING MODULARITY IN NEURAL NETWORKS

Anonymous

Abstract

As deep neural networks become more widely used, it is important to understand their inner workings. Toward this goal, modular interpretations are appealing because they offer flexible levels of abstraction aside from standard architectural building blocks (e.g., neurons, channels, layers). In this paper, we consider the problem of assessing how functionally interpretable a given partitioning of neurons is. We propose two proxies for this: importance, which reflects how crucial sets of neurons are to network performance, and coherence, which reflects how consistently their neurons associate with input/output features. To measure these proxies, we develop a set of statistical methods based on techniques that have conventionally been used for the interpretation of individual neurons. We apply these methods to partitionings generated by a spectral clustering algorithm which uses a graph representation of the network's neurons and weights. We show that despite our partitioning algorithm using neither activations nor gradients, it reveals clusters with a surprising amount of importance and coherence. Together, these results support the use of modular interpretations, and graph-based partitionings in particular, for interpretability.

1. INTRODUCTION

Deep neural networks have achieved state-of-the-art performance in a variety of applications, but this success contrasts with the challenge of making them more intelligible. As these systems become more advanced and widely used, there are a number of reasons we may need to understand them more effectively. One reason is to shed light on better ways to build and train them. A second is the importance of transparency, especially in settings that involve matters of safety, trust, or justice (Lipton, 2018). More precisely, we want methods for analyzing a trained network that can be used to construct semantic and faithful descriptions of its inner mechanisms. We refer to this as mechanistic transparency.

Toward this goal, we consider modularity as an organizing principle for achieving mechanistic transparency. In the natural sciences, we often try to understand things by taking them apart. Aside from subdivision into the standard architectural building blocks (e.g., neurons, channels, layers), are there other ways a trained neural network can be meaningfully "taken apart"? We aim to analyze a network via a partitioning of its neurons into disjoint sets, with the hope of finding that these sets are "modules" with distinct functions. Since there are many choices for how to partition a network, we would like metrics for anticipating how meaningful a given partition might be.

Inspired by the field of program analysis (Fairley, 1978), we apply the concepts of "dynamic" and "static" analysis to neural networks. Dynamic analysis involves performing forward passes and/or computing gradients, while static analysis only involves analyzing the architecture and parameters. In a concurrent submission (Anonymous et al., 2021), we use spectral clustering to study the extent to which networks form clusters of neurons that are highly connected internally but not externally, and find that in many cases, networks are structurally clusterable.
This approach is static because the partitioning is produced from the network's weights alone, using neither activations nor gradients. Here, we build on this concurrent submission by working to bridge graph-based clusterability and functional modularity. To see how well neurons within each cluster share meaningful similarities, we introduce two proxies: importance and coherence. Importance refers to how crucial clusters are to the network's overall performance and lends insight into how well a partition identifies clusters that are individually key to the network's function. Coherence refers to how consistently the neurons within a cluster correspond in their activations to particular features in data. We analyze coherence with respect to both input features and output labels. To measure these proxies, we apply dynamic interpretability methods, conventionally used for single-neuron analysis, to the study of these partitions. We conduct a set of experiments and hypothesis tests on networks scaling from the MNIST to the ImageNet level. In doing so, we show that spectral clustering is capable of identifying functionally important and coherent clusters of neurons. This new finding, and the methods we present for combining spectral clustering with dynamic methods, supports the use of modular decompositions of neurons toward mechanistic transparency.

Our key contributions are threefold:
1. Introducing two proxies, importance and coherence, to assess whether a given partitioning of a network exhibits modularity.
2. Quantifying these two proxies with interpretability methods equipped with statistical hypothesis testing procedures.
3. Applying our methods to the partitions produced by the spectral clustering technique of Anonymous et al. (2021) on a range of networks, and finding evidence of modularity among these clusters.
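To make the importance proxy concrete, the sketch below lesions (zeroes out) a set of hidden units in a toy ReLU MLP and compares the resulting accuracy drop against drops from randomly sampled unit sets of the same size in the same layer, yielding a one-sided empirical p-value. This is only an illustrative NumPy toy under our own assumptions: the function names, the permutation-style test, and the two-layer network are hypothetical stand-ins, not the exact procedure or models used in our experiments.

```python
import numpy as np

def accuracy(weights, X, y, ablate=None):
    """Forward pass of a ReLU MLP given a list of weight matrices.

    `ablate` is an optional (layer_index, unit_indices) pair; the named
    hidden units have their activations zeroed out (a "lesion").
    """
    h = X
    for l, W in enumerate(weights[:-1]):
        h = np.maximum(h @ W, 0.0)
        if ablate is not None and ablate[0] == l:
            h[:, ablate[1]] = 0.0
    logits = h @ weights[-1]
    return np.mean(np.argmax(logits, axis=1) == y)

def importance_p_value(weights, X, y, layer, units, n_samples=200, seed=0):
    """One-sided empirical p-value for a sub-cluster's importance.

    Returns the (smoothed) fraction of random same-size unit sets in the
    same layer whose lesion hurts accuracy at least as much as `units`.
    Small values indicate an unusually important sub-cluster.
    """
    rng = np.random.default_rng(seed)
    width = weights[layer].shape[1]
    base = accuracy(weights, X, y)
    drop = base - accuracy(weights, X, y, (layer, units))
    count = 0
    for _ in range(n_samples):
        rand = rng.choice(width, size=len(units), replace=False)
        count += (base - accuracy(weights, X, y, (layer, rand))) >= drop
    return (count + 1) / (n_samples + 1)
```

In this toy setting, a sub-cluster that actually computes the decision boundary receives a markedly lower p-value than a sub-cluster of units with all-zero weights, for which every random comparison set does at least as much damage.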

2. GENERATING PARTITIONINGS WITH SPECTRAL CLUSTERING

In our concurrent submission, we introduce and study in depth a procedure for partitioning a neural network into disjoint clusters of neurons based only on its weights (Anonymous et al., 2021). We found that trained networks are more clusterable than randomly initialized ones, and they are also often more clusterable than similar networks with identical weight distributions. The experimental procedure consists of two steps: (1) "graphification" -- transforming the network into an undirected edge-weighted graph; and (2) spectral clustering -- obtaining a partitioning via spectral clustering of the graph.

Graphification: To perform spectral clustering, a network must be represented as an undirected graph with non-negative edge weights. For MLPs (multilayer perceptrons), each graph vertex corresponds to a neuron in the network, including input and output neurons. If two neurons are connected by a weight in the network, their corresponding vertices are connected by an edge whose weight is the absolute value of that network weight. For CNNs (convolutional neural networks), a vertex corresponds to a single feature map (which we also refer to as a "neuron") in a convolutional layer. Here, we do not use the input, output, or fully-connected layers. If two feature maps are in adjacent convolutional layers, their corresponding vertices are connected by an edge whose weight is the L1 norm of the corresponding 2-dimensional kernel slice. If convolutional layers are separated by a batch normalization layer (Ioffe & Szegedy, 2015), we multiply weights by γ/(σ + ε), where γ is the scaling factor, σ is the moving standard deviation, and ε is a small constant.

Spectral Clustering: We run normalized spectral clustering (Shi & Malik, 2000) on the resulting graph to obtain a partition of the neurons into clusters. For all experiments, we set the number of clusters to 12 unless explicitly mentioned otherwise.
We choose 12 because (1) it is computationally tractable, (2) it is larger than the number of classes in MNIST and CIFAR-10, and (3) it is small compared to the number of neurons in the layers of all of our networks. However, in Appendix A.6, we show results for k = 8 and k = 18 on a subset of experiments and find no major differences. We use the scikit-learn implementation (Pedregosa et al., 2011) with the ARPACK eigenvalue solver (Borzì & Borzì, 2006). See Appendix A.1 for a complete description of the algorithm.
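The two steps above can be sketched in NumPy for the MLP case. This is an illustrative reimplementation under our own assumptions, not the scikit-learn pipeline used in the paper: `mlp_to_graph` and `spectral_clusters` are hypothetical names, `weights` is assumed to be a list of layer weight matrices of shape (fan_in, fan_out), and the embedding/k-means details are a simplified stand-in for the library's normalized spectral clustering.

```python
import numpy as np

def mlp_to_graph(weights):
    """Graphification of an MLP: vertices are all neurons (inputs, hiddens,
    outputs); each edge weight is the absolute value of the network weight
    connecting the two neurons. Returns a symmetric adjacency matrix."""
    sizes = [weights[0].shape[0]] + [W.shape[1] for W in weights]
    offsets = np.cumsum([0] + sizes)
    A = np.zeros((offsets[-1], offsets[-1]))
    for l, W in enumerate(weights):
        A[offsets[l]:offsets[l + 1], offsets[l + 1]:offsets[l + 2]] = np.abs(W)
    return A + A.T  # symmetrize -> undirected graph

def spectral_clusters(A, k):
    """Normalized spectral clustering sketch: embed vertices with the k
    smallest eigenvectors of the symmetric normalized Laplacian, row-normalize,
    then run a small k-means with deterministic farthest-point seeding."""
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)           # eigenvalues in ascending order
    X = vecs[:, :k]
    X /= np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    centers = [X[0]]                      # farthest-point initialization
    for _ in range(k - 1):
        dists = np.min([((X - c) ** 2).sum(-1) for c in centers], axis=0)
        centers.append(X[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(50):                   # naive Lloyd iterations
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels
```

For instance, on a toy MLP whose weight matrices are block-diagonal (two sub-networks that never interact), the graph has two connected components, and clustering with k = 2 recovers exactly those two groups of neurons.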

3. EVALUATION OF MODULARITY USING IMPORTANCE AND COHERENCE

Clusters of neurons produced by spectral clustering can span more than one layer. However, layers at different depths of a network tend to develop different representations. To control for these differences, we study the neurons in clusters separately per layer. We call these sets of neurons within the same cluster and layer "sub-clusters." In our experiments, we compare these sub-clusters to random sets of units of the same size from the same layer. When discussing these experiments, we refer

