THE GEOMETRY OF INTEGRATION IN TEXT CLASSIFICATION RNNS

Abstract

Despite the widespread application of recurrent neural networks (RNNs), a unified understanding of how RNNs solve particular tasks remains elusive. In particular, it is unclear what dynamical patterns arise in trained RNNs, and how those patterns depend on the training dataset or task. This work addresses these questions in the context of text classification, building on earlier work studying the dynamics of binary sentiment-classification networks (Maheswaranathan et al., 2019). We study text-classification tasks beyond the binary case, exploring the dynamics of RNNs trained on both natural and synthetic datasets. These dynamics, which we find to be both interpretable and low-dimensional, share a common mechanism across architectures and datasets: specifically, these text-classification networks use low-dimensional attractor manifolds to accumulate evidence for each class as they process the text. The dimensionality and geometry of the attractor manifold are determined by the structure of the training dataset, with the dimensionality reflecting the number of scalar quantities the network remembers in order to classify. In categorical classification, for example, we show that this dimensionality is one less than the number of classes. Correlations in the dataset, such as those induced by ordering, can further reduce the dimensionality of the attractor manifold; we show how to predict this reduction using simple word-count statistics computed on the training dataset. To the degree that integration of evidence towards a decision is a common computational primitive, this work continues to lay the foundation for using dynamical systems techniques to study the inner workings of RNNs.

1. INTRODUCTION

Modern recurrent neural networks (RNNs) can achieve strong performance in natural language processing (NLP) tasks such as sentiment analysis, document classification, language modeling, and machine translation. However, the inner workings of these networks remain largely mysterious. As RNNs are parameterized dynamical systems tuned to perform specific tasks, a natural way to understand them is to leverage tools from dynamical systems analysis. A challenge inherent to this approach is that the state space of modern RNN architectures (i.e., the number of units comprising the hidden state) is often high-dimensional, with layers routinely comprising hundreds of neurons. This dimensionality makes standard representation techniques, such as phase portraits, difficult to apply. Another difficulty arises from the fact that RNNs are monolithic systems trained end-to-end. Instead of modular components with clearly delineated responsibilities that can be understood and tested independently, neural networks may learn an intertwined blend of different mechanisms needed to solve a task, making them that much harder to understand. Recent work has shown that modern RNN architectures trained on binary sentiment classification learn low-dimensional, interpretable dynamical systems (Maheswaranathan et al., 2019). These RNNs were found to implement an integration-like mechanism, moving their hidden states along a line of stable fixed points to keep track of accumulated positive and negative tokens. Later, Maheswaranathan & Sussillo (2020) showed that contextual processing mechanisms in these networks (e.g., for handling phrases like "not good") build on top of the line-integration mechanism, employing an additional subspace which the network enters upon encountering a modifier word.
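The line-integration mechanism described above can be caricatured in a few lines of code. The following is a minimal sketch, not the paper's trained networks: the hidden state is a single coordinate along the line of fixed points, each token nudges it by a valence, and the hypothetical mini-lexicon `VALENCE` stands in for weights a real network would learn from data.

```python
# Toy caricature of line-attractor integration in sentiment RNNs.
# The VALENCE lexicon is an illustrative assumption, not learned weights.
VALENCE = {"good": +1.0, "great": +1.0, "bad": -1.0, "awful": -1.0}

def integrate(tokens, leak=0.0):
    """Accumulate token valence along a 1-D line.

    With leak=0, every point on the line is a fixed point: neutral
    words leave the hidden state exactly where it is.
    """
    h = 0.0
    for tok in tokens:
        h = (1.0 - leak) * h + VALENCE.get(tok, 0.0)
    return h

print(integrate("the movie was good good not awful".split()))  # 1.0
```

With `leak > 0` the line of fixed points degrades into a single attractor at the origin and the network forgets evidence over time, which is why near-marginal stability along the integration direction matters.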
The understanding achieved in those works suggests the potential of the dynamical systems perspective, but it remained to be seen whether this perspective could shed light on RNNs in more complicated settings. In this work, we take steps towards understanding RNN dynamics in more complicated language tasks, illustrating recurrent network dynamics in text-classification tasks with more than two categories. The tasks we study (document classification, review score prediction from one to five stars, and emotion tagging) exemplify three distinct types of classification task. As in the binary sentiment case, we find that integration of evidence underlies the operations of these networks; however, in multi-class classification, the geometry and dimensionality of the integration manifold depend on the type of task and the structure of the training data. Understanding and precisely characterizing this dependence is the focus of the present work.
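One quick intuition for why an N-class task should need only N-1 integrated dimensions can be seen directly from the softmax readout. The sketch below is an illustrative NumPy computation (not code from the paper): a softmax over N accumulated class scores is invariant to a common shift of all scores, so only the N-1 independent score differences can influence the decision.

```python
import numpy as np

# Hypothetical setup: 1000 documents, N=3 classes, with a per-class
# accumulated evidence score (e.g., counts of class-associated words).
rng = np.random.default_rng(1)
N = 3
scores = rng.integers(0, 10, size=(1000, N)).astype(float)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # standard stabilization
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Adding the same offset to every class score leaves the readout unchanged,
# so the decision depends only on the (N-1)-dimensional score differences.
shifted = scores + rng.normal(size=(1000, 1))
assert np.allclose(softmax(scores), softmax(shifted))
```

This shift invariance is one way to motivate the (N-1)-dimensional attractor geometry reported below for categorical classification.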

Our contributions

• We study three distinct types of text-classification tasks (categorical, ordered, and multi-labeled) and find empirically that the resulting hidden-state trajectories lie largely in a low-dimensional subspace of the full state space.

• Within this low-dimensional subspace, we find a manifold of approximately stable fixed points[1] near the network trajectories, and by linearizing the network dynamics, we show that this manifold enables the networks to integrate evidence for each classification as they process the sequence.

• We find (N-1)-dimensional simplex attractors[2] for N-class categorical classification, planar attractors for ordered classification, and attractors resembling hypercubes for multi-label classification, and we explain these geometries in terms of the dataset statistics.

• We show that the dimensionality and geometry of the manifold reflect characteristics of the training dataset, and demonstrate that simple word-count statistics of the dataset can explain the observed geometries.

• We develop clean, simple synthetic datasets for each type of classification task. Networks trained on these synthetic datasets exhibit dynamics and manifold geometries similar to those of networks trained on corresponding natural datasets, furthering an understanding of the underlying mechanism.
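The fixed points referred to above are typically located numerically by minimizing the hidden state's "speed," q(h) = ½‖F(h) − h‖², where F is the recurrent update with the input held fixed. The sketch below applies this idea to a toy tanh RNN with random (deliberately contractive) weights, not to the paper's trained networks, and then linearizes F at the recovered fixed point to check its stability:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 8
# Toy recurrent weights, scaled to be contractive so the toy system has a
# unique fixed point; trained networks are instead near-marginally stable.
W = 0.3 * rng.standard_normal((n, n)) / np.sqrt(n)

def step(h):
    return np.tanh(W @ h)  # autonomous update (input clamped to zero)

def speed(h):
    d = step(h) - h
    return 0.5 * float(d @ d)  # zero exactly at fixed points

# Minimize the speed to locate an (approximate) fixed point.
res = minimize(speed, rng.standard_normal(n))
h_star = res.x

# Linearize: Jacobian of tanh(W h) at h_star, row-scaled by tanh'.
J = (1.0 - np.tanh(W @ h_star) ** 2)[:, None] * W
print(speed(h_star), np.abs(np.linalg.eigvals(J)).max())  # ~0, and < 1
```

Eigenvalues of J with magnitude below one indicate local stability; in trained integrator networks, one finds a set of eigenvalues very close to one, corresponding to the slow directions along the attractor manifold.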

Related work

Understanding and interpreting learned neural networks is a rapidly growing field. Specifically in the context of natural language processing, the body of work on interpretability of neural models is reviewed thoroughly in Belinkov & Glass (2018). Common methods of analysis include, for example, training auxiliary classifiers (e.g., part-of-speech) on RNN trajectories to probe the network's internal representations.



[1] As will be discussed in more detail below, by fixed points we mean hidden-state locations that are approximately fixed on time scales of the order of the average phrase length for the task at hand. Throughout this work, we use the term fixed-point manifold synonymously with manifold of slow points.

[2] A 1-simplex is a line segment, a 2-simplex a triangle, a 3-simplex a tetrahedron, and so on. A simplex is regular if it has the highest degree of symmetry (e.g., an equilateral triangle is a regular 2-simplex).



Our work builds directly on previous analyses of binary sentiment classification by Maheswaranathan et al. (2019) and Maheswaranathan & Sussillo (2020). Apart from these works, the dynamical properties of continuous-time RNNs have been studied extensively (Vyas et al., 2020), largely for their connections to neural computation in biological systems. Such analyses have recently begun to yield insights into discrete-time RNNs: for example, Schuessler et al. (2020) showed that training continuous-time RNNs on low-dimensional tasks led to low-dimensional updates to the networks' weight matrices, an observation that also held empirically in binary sentiment LSTMs. Similarly, by viewing the discrete-time GRU as a discretization of a continuous-time dynamical system, Jordan et al. (2019) demonstrated that the continuous-time analogue could express a wide variety of dynamical features, including essentially nonlinear features like limit cycles.
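The low-rank-update observation of Schuessler et al. (2020) is easy to illustrate numerically. The sketch below is a synthetic stand-in, not their experiment: we build a "trained minus initial" weight difference that is rank-2 by construction and recover its effective dimensionality from the singular value spectrum, which is the same diagnostic one would run on a real network's weight change.

```python
import numpy as np

# Synthetic stand-in for a trained-minus-initial weight difference:
# a sum of two rank-1 outer products is (generically) rank 2.
rng = np.random.default_rng(2)
n = 100
dW = sum(np.outer(rng.standard_normal(n), rng.standard_normal(n))
         for _ in range(2))

# Effective rank = number of singular values above a relative tolerance.
s = np.linalg.svd(dW, compute_uv=False)
effective_rank = int((s > 1e-8 * s[0]).sum())
print(effective_rank)  # 2
```

For a trained network, the analogous spectrum of W_trained − W_init drops sharply after a handful of singular values when the task is low-dimensional.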

