NEURAL NETWORKS AS PATHS THROUGH THE SPACE OF REPRESENTATIONS

Abstract

Deep neural networks implement a sequence of layer-by-layer operations that are each relatively easy to understand, but the resulting overall computation is generally difficult to understand. We consider a simple hypothesis for interpreting the layer-by-layer construction of useful representations: perhaps the role of each layer is to reformat information to reduce the "distance" to the desired outputs. With this framework, the layer-wise computation implemented by a deep neural network can be viewed as a path through a high-dimensional representation space. We formalize this intuitive idea of a "path" by leveraging recent advances in metric representational similarity. We extend existing representational distance methods by computing geodesics, angles, and projections of representations, going beyond mere layer distances. We then demonstrate these tools by visualizing and comparing the paths taken by ResNet and VGG architectures on CIFAR-10. We conclude by sketching additional ways that this kind of representational geometry can be used to understand and interpret network training, and to describe novel kinds of similarities between different models.

1. INTRODUCTION

A core design principle of modern neural networks is that they process information serially, progressively transforming inputs until the information is in a format that is immediately usable for some task (Rumelhart et al., 1988; LeCun et al., 2015). This idea of composing simple units to construct more complicated functions is central both to artificial neural networks and to how neuroscientists conceptualize various functions in the brain (Kriegeskorte, 2015; Richards et al., 2019; Barrett et al., 2019). Our work is motivated by a spatial analogy for information processing: we imagine that outputs are "far" from inputs if the mapping between them is complex, or "close" if it is simple. In this spatial analogy, any one layer of a neural network contributes a single step, and the composition of many steps transports representations along a path towards the desired target representation. Formalizing this intuition requires a way to quantify whether any two representations are "close" (similar) or "far" (dissimilar) (Kriegeskorte, 2009; Kornblith et al., 2019).

We build on recent work introducing metrics for quantifying representational dissimilarity (Williams et al., 2021; Shahbazi et al., 2021). Representational dissimilarity is quantified using a function d : X × X → ℝ≥0 that takes in two matrices of neural data and outputs a nonnegative value for their dissimilarity. Here, X = ⋃_{n=1,2,3,...} ℝ^{m×n} is the space of all m × n matrices over all n. The matrices X and Y could be, for instance, the values of two hidden layers in a network with n_x and n_y units, respectively, in response to the same m inputs. What are desirable properties of such a representational dissimilarity function?
Previous work has argued that any sensible dissimilarity function should be nonnegative, so d(X, Y) ≥ 0, and should return zero exactly between equivalent representations, so d(X, Y) = 0 ⇔ X ∼ Y, where X ∼ Y means that X and Y belong to the same equivalence class. For example, we may wish to design the function d so that d(X, Y) = 0 if Y is a shifted copy of X, or a non-degenerate scaling of it.
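To make these properties concrete, the following is a minimal sketch of one such dissimilarity function, in the spirit of the shape metrics of Williams et al. (2021) but not necessarily the exact metric used here: an orthogonal Procrustes distance between mean-centered, norm-normalized response matrices, zero-padded so that layers with different numbers of units can be compared. The function name and preprocessing choices are illustrative assumptions, not taken from this paper.

```python
import numpy as np

def procrustes_distance(X, Y):
    """Rotation-invariant dissimilarity between two representations.

    X, Y: response matrices of shape (m, n_x) and (m, n_y) for the
    same m inputs. Illustrative sketch: mean-center each column,
    normalize to unit Frobenius norm, zero-pad the narrower matrix,
    then compute d(X, Y) = min over orthogonal Q of ||X - Y Q||_F,
    which reduces to sqrt(2 - 2 * sum of singular values of X^T Y).
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    X = X / np.linalg.norm(X)   # Frobenius norm for 2-D arrays
    Y = Y / np.linalg.norm(Y)
    n = max(X.shape[1], Y.shape[1])
    X = np.pad(X, ((0, 0), (0, n - X.shape[1])))
    Y = np.pad(Y, ((0, 0), (0, n - Y.shape[1])))
    sv = np.linalg.svd(X.T @ Y, compute_uv=False)
    return np.sqrt(max(2.0 - 2.0 * sv.sum(), 0.0))
```

Under this construction, d(X, XQ) = 0 for any orthogonal Q, so rotated copies of a representation fall in the same equivalence class, and the function is symmetric and nonnegative, as required of a metric on equivalence classes.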

