ACTIVATION RELAXATION: A LOCAL DYNAMICAL APPROXIMATION TO BACKPROP IN THE BRAIN

Abstract

The backpropagation of error algorithm (backprop) has been instrumental in the recent success of deep learning. However, a key question remains as to whether backprop can be formulated in a manner suitable for implementation in neural circuitry. The primary challenge is to ensure that any candidate formulation uses only local information, rather than relying on global signals as in standard backprop. Recently several algorithms for approximating backprop using only local signals have been proposed. However, these algorithms typically impose other requirements which challenge biological plausibility: for example, requiring complex and precise connectivity schemes, or multiple sequential backwards phases with information being stored across phases. Here, we propose a novel algorithm, Activation Relaxation (AR), which is motivated by constructing the backpropagation gradient as the equilibrium point of a dynamical system. Our algorithm converges rapidly and robustly to the correct backpropagation gradients, requires only a single type of computational unit, utilises only a single parallel backwards relaxation phase, and can operate on arbitrary computation graphs. We illustrate these properties by training deep neural networks on visual classification tasks, and describe simplifications to the algorithm which remove further obstacles to neurobiological implementation (for example, the weight-transport problem, and the use of nonlinear derivatives), while preserving performance.
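The core claim of the abstract, that the backprop gradient can be obtained as the equilibrium point of a dynamical system relaxed during a backwards phase, can be illustrated with a minimal NumPy sketch. This is not the paper's exact algorithm: it assumes a single hidden layer, a leaky linear dynamics whose fixed point is the backprop gradient, and illustrative choices of network size, Euler step size, and iteration count.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy network: h1 = W1 x, a1 = tanh(h1), h2 = W2 a1, loss L = 0.5 * ||h2 - t||^2.
# Sizes, weights, and inputs here are illustrative, not from the paper.
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
x_in = rng.normal(size=3)
t = rng.normal(size=2)

# Forward pass.
h1 = W1 @ x_in
a1 = np.tanh(h1)
h2 = W2 @ a1

# Exact backprop gradient w.r.t. h1, for comparison.
e2 = h2 - t                                # dL/dh2
grad_h1 = (1 - a1**2) * (W2.T @ e2)        # dL/dh1

# Backwards relaxation phase: the output unit is clamped to dL/dh2, and the
# hidden unit's state x1 relaxes under dx1/dt = -x1 + (dh2/dh1)^T x2.
# The fixed point of this dynamics is exactly grad_h1.
x1 = np.zeros_like(h1)                     # relaxation state, initialised at zero
step = 0.2                                 # Euler integration step size (assumption)
for _ in range(200):
    x1 += step * (-x1 + (1 - a1**2) * (W2.T @ e2))

# The relaxed state matches the backprop gradient to numerical precision.
print(np.max(np.abs(x1 - grad_h1)))
```

Once each unit's state has converged to the gradient with respect to its activation, the weight updates are purely local (e.g. the update for W2 is the outer product of e2 and a1), which is what makes this style of scheme a candidate for neural circuitry.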



In the last decade, deep artificial neural networks trained through the backpropagation of error algorithm (backprop) (Werbos, 1982; Griewank et al., 1989; Linnainmaa, 1970) have achieved substantial success on a wide range of difficult tasks such as computer vision and object recognition (Krizhevsky et al., 2012; He et al., 2016), language modelling (Vaswani et al., 2017; Radford et al., 2019; Brown et al., 2020), unsupervised representation learning (Radford et al., 2015; Oord et al., 2018), image and audio generation (Goodfellow et al., 2014; Salimans et al., 2017; Jing et al., 2019; Oord et al., 2016; Dhariwal et al., 2020), and reinforcement learning (Silver et al., 2017; Mnih et al., 2015; Schulman et al., 2017; Schrittwieser et al., 2019). The impressive performance of backprop is due to the fact that it precisely computes the sensitivity of each parameter to the output (Lillicrap et al., 2020), thus solving the credit assignment problem: the task of determining the individual contribution of each parameter (potentially one of billions in a deep neural network) to the global outcome. Given the correct credit assignments, network parameters can be straightforwardly, and independently, updated in the direction which maximally reduces the global loss. The brain also faces a formidable credit assignment problem: it must adjust trillions of synaptic weights, which may be physically and temporally distant from their global output, in order to improve performance on downstream tasks [1]. Given that backprop provides a successful solution to this problem (Baldi & Sadowski, 2016), a large body of work has investigated whether synaptic plasticity in the brain could be interpreted as implementing or approximating backprop (Whittington & Bogacz, 2019; Lillicrap et al., 2020).

Recently, this idea has been buttressed by findings that the representations learnt by backprop align closely with representations extracted from cortical neuroimaging data (Cadieu et al., 2014; Kriegeskorte, 2015). Due to the nonlocality of its learning rules, however, a direct term-for-term implementation of backprop is likely biologically implausible (Crick, 1989). It is important to note that biological plausibility is an

[1] It is unlikely that the brain optimizes a single cost function, as assumed here. However, even if functionally segregated areas can be thought of as optimising some combination of cost functions, the core problem of credit assignment remains.

