INTERNEURONS ACCELERATE LEARNING DYNAMICS IN RECURRENT NEURAL NETWORKS FOR STATISTICAL ADAPTATION

Abstract

Early sensory systems in the brain rapidly adapt to fluctuating input statistics, which requires recurrent communication between neurons. Mechanistically, such recurrent communication is often indirect and mediated by local interneurons. In this work, we explore the computational benefits of mediating recurrent communication via interneurons compared with direct recurrent connections. To this end, we consider two mathematically tractable recurrent linear neural networks that statistically whiten their inputs: one with direct recurrent connections and the other with interneurons that mediate recurrent communication. By analyzing the corresponding continuous synaptic dynamics and numerically simulating the networks, we show that the network with interneurons is more robust to initialization than the network with direct recurrent connections, in the sense that the convergence time for the synaptic dynamics of the network with interneurons (resp. direct recurrent connections) scales logarithmically (resp. linearly) with the spectrum of the initialization. Our results suggest that interneurons are computationally useful for rapid adaptation to changing input statistics. Interestingly, the network with interneurons is an overparameterized solution of the whitening objective for the network with direct recurrent connections, so our results can be viewed as a recurrent linear neural network analogue of the implicit acceleration phenomenon observed in overparameterized feedforward linear neural networks.

1. INTRODUCTION

Efficient coding and redundancy reduction theories of neural coding hypothesize that early sensory systems decorrelate and normalize neural responses to sensory inputs (Barlow, 1961; Laughlin, 1989; Barlow & Földiák, 1989; Simoncelli & Olshausen, 2001; Carandini & Heeger, 2012; Westrick et al., 2016; Chapochnikov et al., 2021), operations closely related to statistical whitening of inputs. Since the input statistics are often in flux due to dynamic environments, early sensory systems must be able to rapidly adapt (Wark et al., 2007; Whitmire & Stanley, 2016). Decorrelating neural activities requires recurrent communication between neurons, which is typically indirect and mediated by local interneurons (Christensen et al., 1993; Shepherd et al., 2004). Why do neuronal circuits for statistical adaptation mediate recurrent communication using interneurons, which take up valuable space and metabolic resources, rather than using direct recurrent connections? A common explanation is Dale's principle, which states that each neuron has exclusively inhibitory or excitatory effects on all of its targets (Strata & Harvey, 1999). While Dale's principle provides a physiological constraint that explains why recurrent interactions are mediated by interneurons, we seek a computational principle that can account for using interneurons rather than direct recurrent connections. This perspective is useful because Dale's principle may not be a hard constraint; see (Saunders et al., 2015; Granger et al., 2020) for results along these lines.
In this case, a computational benefit of interneurons would provide a normative explanation for the existence of interneurons that mediate recurrent communication.

In this work, to better understand the computational benefits of interneurons for statistical adaptation, we analyze the learning dynamics of two mathematically tractable recurrent neural networks that statistically whiten their inputs using Hebbian/anti-Hebbian learning rules: one with direct recurrent connections and the other with indirect recurrent interactions mediated by interneurons (Figure 1). We show that the learning dynamics of the network with interneurons are more robust than the learning dynamics of the network with direct recurrent connections. In particular, we prove that the convergence time of the continuum limit of the network with direct lateral connections scales linearly with the spectrum of the initialization, whereas the convergence time of the continuum limit of the network with interneurons scales logarithmically with the spectrum of the initialization. We also numerically test the networks and, consistent with our theoretical results, find that the network with interneurons is more robust to initialization. Our results suggest that interneurons are computationally important for rapid adaptation to fluctuating input statistics.

Our analysis is closely related to analyses of learning dynamics in feedforward linear networks trained using backpropagation (Saxe et al., 2014; Arora et al., 2018; Saxe et al., 2019; Gidel et al., 2019; Tarmoun et al., 2021). The optimization problems for deep linear networks are overparameterizations of linear problems, and this overparameterization can accelerate convergence of gradient descent or gradient flow optimization, a phenomenon referred to as implicit acceleration (Arora et al., 2018). Our results can be viewed as an analogous phenomenon for gradient flows corresponding to recurrent linear networks trained using Hebbian/anti-Hebbian learning rules.
In our setting, the network with interneurons is naturally viewed as an overparameterized solution of the whitening objective for the network with direct recurrent connections. In analogy with the feedforward setting, the interneurons can be viewed as a hidden layer that overparameterizes the optimization problem.

In summary, our main contribution is a theoretical and numerical analysis of the synaptic dynamics of two linear recurrent neural networks for statistical whitening: one with direct lateral connections and one with indirect lateral connections mediated by interneurons. Our analysis shows that the synaptic dynamics converge significantly faster in the network with interneurons than in the network with direct lateral connections (logarithmic versus linear convergence times). Our results have potential broader implications: (i) they suggest that biological interneurons may facilitate rapid statistical adaptation, see also (Duong et al., 2023); (ii) including interneurons in recurrent neural networks for solving other learning tasks may also accelerate learning; (iii) overparameterized whitening objectives may be useful for developing online self-supervised learning algorithms in machine learning.
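To make the comparison concrete, the following is a minimal numerical sketch of the two whitening circuits in averaged (full-covariance) form. Since Algorithms 1 and 2 are not reproduced in this excerpt, the specific update rules here are our illustrative assumptions: the direct network applies an anti-Hebbian update M ← M + η(yyᵀ − I) to the recurrent weights acting on the equilibrium output y = (I + M)⁻¹x, while the interneuron network overparameterizes M = WᵀW and updates W ← W + η(zyᵀ − W) with interneuron activity z = Wy. The input covariance is chosen with eigenvalues above one so that the positive semidefinite WᵀW can reach the whitening solution.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative averaged (full-batch) dynamics; the update rules are
# simplifying assumptions, not a verbatim copy of Algorithms 1 and 2.
n = 4
B = rng.standard_normal((n, n))
Cxx = 1.5 * np.eye(n) + B @ B.T / n  # eigenvalues > 1 so M = W.T @ W can whiten

def output_cov(M):
    """Covariance of the equilibrium output y = (I + M)^{-1} x."""
    J = np.linalg.inv(np.eye(n) + M)
    return J @ Cxx @ J.T

eta, steps = 0.05, 4000

# Direct recurrent connections: anti-Hebbian update M += eta * (Cyy - I).
M = np.zeros((n, n))
for _ in range(steps):
    M += eta * (output_cov(M) - np.eye(n))

# Interneurons: M = W.T @ W, with averaged update W += eta * (W @ Cyy - W),
# i.e. the mean of the Hebbian/anti-Hebbian update z @ y.T - W with z = W @ y.
W = np.eye(n)
for _ in range(steps):
    W += eta * (W @ output_cov(W.T @ W) - W)

print(np.allclose(output_cov(M), np.eye(n), atol=1e-3))        # direct network whitens
print(np.allclose(output_cov(W.T @ W), np.eye(n), atol=1e-3))  # interneuron network whitens
```

Both variants drive the output covariance to the identity; the paper's contribution is the analysis of how fast each does so as a function of the initialization.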

2. STATISTICAL WHITENING

Let $n \geq 2$ and let $x_1, \dots, x_T$ be a sequence of $n$-dimensional centered inputs with positive definite (empirical) covariance matrix $C_{xx} := \frac{1}{T} X X^\top$, where $X := [x_1, \dots, x_T]$ is the $n \times T$ data matrix of concatenated inputs. The goal of statistical whitening is to linearly transform the inputs so that the $n$-dimensional outputs $y_1, \dots, y_T$ have identity covariance; that is, $C_{yy} := \frac{1}{T} Y Y^\top = I_n$, where $Y := [y_1, \dots, y_T]$ is the $n \times T$ data matrix of concatenated outputs.
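The ZCA whitening referenced in Figure 1 is one such linear transform: applying the symmetric matrix inverse square root of the input covariance, Y = Cxx^{-1/2} X, yields outputs with identity covariance. A minimal offline sketch (the sample count and dimensions are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# n-dimensional centered inputs with a generic positive definite covariance.
n, T = 4, 10_000
A = rng.standard_normal((n, n))
X = A @ rng.standard_normal((n, T))
X -= X.mean(axis=1, keepdims=True)  # center the inputs

Cxx = X @ X.T / T  # empirical input covariance (positive definite)

# ZCA (symmetric) whitening: Y = Cxx^{-1/2} X, computed via eigendecomposition.
eigvals, eigvecs = np.linalg.eigh(Cxx)
W_zca = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
Y = W_zca @ X

Cyy = Y @ Y.T / T
print(np.allclose(Cyy, np.eye(n), atol=1e-8))  # output covariance is I_n
```

Among all whitening transforms (which are unique only up to a left orthogonal factor), the symmetric ZCA choice keeps the outputs as close as possible to the inputs in the least-squares sense, which is why it is the target of both recurrent circuits in Figure 1.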



Figure 1: Recurrent neural networks for ZCA whitening with direct recurrent connections (left, Algorithm 1) and with interneurons (right, Algorithm 2).

