SYNAPTIC DYNAMICS REALIZE FIRST-ORDER ADAPTIVE LEARNING AND WEIGHT SYMMETRY

Anonymous

Abstract

Gradient-based first-order adaptive optimization methods such as the Adam optimizer are prevalent in training artificial neural networks, achieving state-of-the-art results. This work asks whether it is viable for biological neural systems to adopt such optimization methods. To this end, we demonstrate a realization of the Adam optimizer using biologically plausible mechanisms in synapses. The proposed learning rule has a clear biological correspondence, runs continuously in time, and achieves performance comparable to Adam's. In addition, we present a new approach, inspired by the predisposition property of synapses observed in neuroscience, to circumvent the biological implausibility of the weight transport problem in backpropagation (BP). Using only local information and no separate training phases, this method establishes and maintains weight symmetry between the forward and backward signaling paths, and is applicable to the proposed biologically plausible Adam learning rule. These mechanisms may shed light on the way in which biological synaptic dynamics facilitate learning.

1. INTRODUCTION

Gradient-based adaptive optimization is widely used in many science and engineering applications, including the training of artificial neural networks (ANNs) (Rumelhart et al., 1986). In particular, first-order methods are preferred over higher-order methods because their memory overhead is significantly lower, considering that ANNs are often characterized by high-dimensional feature spaces and large numbers of parameters. Among the most well-known ANN training methods are stochastic gradient descent (SGD) with momentum (Rumelhart et al., 1986), root mean square propagation (RMSProp) (Tieleman & Hinton, 2012), and adaptive moment estimation (Adam) (Kingma & Ba, 2014). Unlike gradient descent, which optimizes a loss function over the complete dataset, SGD runs on a mini-batch. The momentum term accelerates the adjustment of SGD along the direction toward a minimum, and RMSProp impedes the search in directions of oscillation. The Adam optimizer can be considered a combination of these two ideas; it is computationally efficient, converges quickly, works well with noisy or sparse gradients, and achieves state-of-the-art results in many AI applications (Dosovitskiy et al., 2020; Wang et al., 2022). Given the success of gradient-based adaptive optimization techniques, particularly Adam, in training ANNs, it is natural to ask whether it is viable for biological neural systems to adopt such optimization strategies. We attempt to answer this question by demonstrating an implementation of the Adam optimizer based on biologically plausible synaptic dynamics, together with a new solution to the well-known weight transport problem. We call our implementation Bio-Adam. Nevertheless, it is not immediately clear how to realize the Adam optimizer in a biologically realistic way, given its intricacies.
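The two ideas that Adam combines can be sketched as standalone update rules. The following is a minimal illustration (not from the paper; the function names and hyperparameter defaults are ours, and some published variants scale the momentum accumulation differently):

```python
import numpy as np

def sgd_momentum_step(theta, grad, velocity, lr=0.01, beta=0.9):
    """SGD with momentum: past gradients accumulate in `velocity`,
    accelerating movement along consistently descending directions."""
    velocity = beta * velocity + grad
    theta = theta - lr * velocity
    return theta, velocity

def rmsprop_step(theta, grad, v, lr=0.01, beta=0.999, eps=1e-8):
    """RMSProp: dividing by a running root-mean-square estimate of the
    gradient shrinks the step along directions that oscillate."""
    v = beta * v + (1 - beta) * grad ** 2
    theta = theta - lr * grad / (np.sqrt(v) + eps)
    return theta, v
```

On a simple quadratic loss, both rules drive the parameter toward the minimum; momentum does so by smoothing the descent direction, RMSProp by normalizing the step size.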
Compared with the classical SGD method, Adam has two major new ingredients: the use of a momentum term m to smooth the gradient g across multiple batches, and division by a smoothed estimate √v of the root mean square of the gradient to constrain the step size; the latter is the RMSProp term 1/(√v_t + ϵ), in which ϵ is a small number that prevents division by zero. Although signal smoothing is commonly done in biological modeling, for example in the leaky integrate-and-fire (LIF) model of spiking neurons (Gerstner et al., 2014), we identify the root-mean-square calculation √v and the division operation in Adam as biologically problematic. To address these difficulties, we define a new variable ρ that mimics the dynamics of RMSProp.
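For reference, the standard Adam update (Kingma & Ba, 2014), which combines the momentum term m and the RMSProp denominator √v_t + ϵ discussed above, can be written as a single step. This is a sketch of the textbook rule, not of the biological realization proposed in this work; the function name and defaults are ours:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: a momentum-smoothed gradient divided by a running
    root-mean-square estimate of the gradient (t is the 1-based step count)."""
    m = beta1 * m + (1 - beta1) * grad          # first moment (momentum)
    v = beta2 * v + (1 - beta2) * grad ** 2     # second moment (RMSProp term)
    m_hat = m / (1 - beta1 ** t)                # bias correction for zero init
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

The √v_hat in the denominator and the division itself are exactly the operations the text flags as biologically problematic, which motivates replacing them with the variable ρ.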

