ADAPTIVE SMOOTHING GRADIENT LEARNING FOR SPIKING NEURAL NETWORKS

Abstract

Spiking neural networks (SNNs) with biologically inspired spatio-temporal dynamics offer higher energy efficiency on neuromorphic architectures. However, error backpropagation in SNNs is prohibited by the all-or-none nature of spikes. Existing solutions circumvent this problem by relaxing the gradient calculation with a continuous function of constant relaxation degree, so-called surrogate gradient learning. Nevertheless, such relaxation introduces an additional smoothness error into spike firing, so the gradients are estimated inaccurately. How to adaptively adjust the relaxation degree and progressively eliminate the smoothness error is therefore crucial. Here, we propose a methodology in which training a prototype neural network gradually evolves into training an SNN by fusing a learnable relaxation degree into the network together with random spike noise. In this way, the network adaptively learns the accurate gradients of the loss landscape of the SNN. Theoretical analysis further shows that optimizing such a noisy network progressively evolves into optimizing the embedded SNN with shared weights. Moreover, we conduct extensive experiments on static images, dynamic event streams, speech, and instrumental sounds. The results show that the proposed method achieves state-of-the-art performance across all datasets and is remarkably robust to different relaxation degrees.

1. INTRODUCTION

Spiking Neural Networks (SNNs), composed of biologically plausible spiking neurons, hold high potential for fast inference and low power consumption on neuromorphic architectures (Akopyan et al., 2015; Davies et al., 2018; Pei et al., 2019). Instead of the expensive multiply-accumulate (MAC) operations used in artificial neural networks (ANNs), SNNs operate asynchronously with binary spikes and rely on sparse accumulate (AC) operations with lower energy cost. Moreover, existing research has revealed that SNNs are promising for machine intelligence, especially on sparse spatio-temporal patterns (Roy et al., 2019). Nevertheless, such bio-mimicry, with its all-or-none spike firing, inevitably complicates supervised learning in SNNs. Error backpropagation is the most successful methodology for training deep neural networks; however, the nondifferentiable spike firing prohibits its direct application to SNNs. To address this challenge, two families of gradient-based training methods have been developed: (1) surrogate gradient learning (Shrestha & Orchard, 2018; Wu et al., 2018; Neftci et al., 2019) and (2) time-based learning (Mostafa, 2017; Zhang & Li, 2020). Surrogate gradient learning adopts a smooth curve to approximate the ill-defined derivative of the Heaviside function in SNNs. Backpropagation thereby becomes tractable in both the spatial and temporal domains in an iterative manner. Meanwhile, surrogate gradient learning benefits substantially from the mature ecosystem of deep learning and has been widely used to solve complex pattern recognition tasks (Zenke & Vogels, 2021; Neftci et al., 2019). However, the smooth curve distributes the gradient of a single spike over a group of analog terms among its temporal neighbors (Zhang & Li, 2020), which mismatches the inherent dynamics of spiking neurons. We identify this problem as gradient mismatching in this paper.
As a result, most parameters are updated in a biased manner under surrogate gradient learning, which limits the performance of SNNs. Moreover, the smoothness of the surrogate function can greatly affect network performance (Hagenaars et al., 2021; Li et al., 2021c).
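To make the surrogate gradient idea concrete, the following NumPy sketch contrasts the nondifferentiable forward spike with a relaxed backward derivative. This is an illustration only: the sigmoid-shaped surrogate and the parameter name `alpha` (the relaxation degree) are our assumptions for exposition, not the specific formulation of this paper.

```python
import numpy as np

def heaviside(v):
    # Forward pass: all-or-none spike firing. The true derivative is zero
    # almost everywhere and undefined at the threshold, so it cannot be
    # used directly in backpropagation.
    return (v >= 0.0).astype(np.float64)

def surrogate_grad(v, alpha=1.0):
    # Backward pass: replace the ill-defined Heaviside derivative with the
    # derivative of a sigmoid relaxation. `alpha` controls the relaxation
    # degree: a larger alpha gives a sharper, more spike-like curve, while
    # a smaller alpha spreads gradient mass over a wider neighborhood of
    # the threshold (the source of the smoothness error discussed above).
    sig = 1.0 / (1.0 + np.exp(-alpha * v))
    return alpha * sig * (1.0 - sig)

# Membrane potential minus threshold for four example neurons.
membrane = np.array([-0.6, -0.1, 0.0, 0.3])
spikes = heaviside(membrane)                  # binary spike outputs
g_soft = surrogate_grad(membrane, alpha=1.0)  # wide, smooth relaxation
g_hard = surrogate_grad(membrane, alpha=10.0) # narrow, near-step relaxation
```

With a fixed `alpha`, every neuron near the threshold receives nonzero gradient regardless of whether it actually fired; making the relaxation degree learnable, as proposed here, lets the network tighten this curve over training instead of committing to one smoothness in advance.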

