OPTIMAL CONVERSION OF CONVENTIONAL ARTIFICIAL NEURAL NETWORKS TO SPIKING NEURAL NETWORKS

Abstract

Spiking neural networks (SNNs) are biology-inspired artificial neural networks (ANNs) that comprise spiking neurons to process asynchronous discrete signals. While more efficient in power consumption and inference speed on neuromorphic hardware, SNNs are usually difficult to train directly from scratch with spikes due to the discrete nature of the spike signal. As an alternative, many efforts have been devoted to converting conventional ANNs into SNNs by copying the weights from the ANN and adjusting the spiking threshold potential of the neurons in the SNN. Researchers have designed new SNN architectures and conversion algorithms to diminish the conversion error. However, an effective conversion should address the difference between the SNN and ANN architectures with an efficient approximation of the loss function, which is missing in the field. In this work, we analyze the conversion error by recursively reducing it to a layer-wise summation and propose a novel strategic pipeline that transfers the weights to the target SNN by combining threshold balancing and soft-reset mechanisms. This pipeline enables almost no accuracy loss between the converted SNN and the conventional ANN with only ∼1/10 of the typical SNN simulation time. Our method is promising for deployment on embedded platforms, whose limited energy and memory budgets favor SNNs. Code is available at https://github.com/Jackn0/snn_optimal_conversion_pipeline.

1. INTRODUCTION

Spiking neural networks (SNNs) are proposed to imitate biological neural networks (Hodgkin & Huxley, 1952a; McCulloch & Pitts, 1943) with artificial neuron models that simulate biological neuron activity, such as the Hodgkin-Huxley (Hodgkin & Huxley, 1952b), Izhikevich (Izhikevich, 2003), and Resonate-and-Fire (Izhikevich, 2001) models. The most widely used neuron model for SNNs is the Integrate-and-Fire (IF) model (Barbi et al., 2003; Liu & Wang, 2001), where a neuron in the network emits a spike only when its accumulated input exceeds the threshold voltage. This setting makes SNNs more similar to biological neural networks. The past two decades have witnessed the success of conventional artificial neural networks (referred to as ANNs for ease of comparison with SNNs), especially with the development of convolutional neural networks including AlexNet (Krizhevsky et al., 2012), VGG (Simonyan & Zisserman, 2014), and ResNet (He et al., 2016). However, this success depends heavily on the high-precision digital transmission of information and requires a large amount of energy and memory, so traditional ANNs are infeasible to deploy on embedded platforms with limited energy and memory.

Distinct from conventional ANNs, SNNs are event-driven with spiking signals and thus more efficient in energy and memory consumption on embedded platforms (Roy et al., 2019). By far, SNNs have been implemented for image (Acciarito et al., 2017; Diehl & Cook, 2014; Yousefzadeh et al., 2017) and voice (Pei et al., 2019) recognition. Although potentially more efficient, current SNNs have their own intrinsic disadvantages in training due to the discontinuity of spikes. Two promising methods of supervised learning are backpropagation with a surrogate gradient and weight conversion from ANNs. The first route implants ANNs onto SNN platforms by realizing the surrogate gradient with a customized activation function (Wu et al., 2018).
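The IF dynamics described above can be sketched in a few lines. This is an illustrative simulation of a single neuron, not the paper's implementation; the threshold value and the hard reset-to-zero rule are assumptions chosen for simplicity.

```python
import numpy as np

def if_neuron(inputs, v_th=1.0):
    """Simulate a single Integrate-and-Fire neuron with a hard reset.

    inputs: 1-D array of input currents, one per simulation time step.
    Returns a binary spike train of the same length.
    """
    v = 0.0
    spikes = []
    for x in inputs:
        v += x              # integrate the input into the membrane potential
        if v >= v_th:       # emit a spike when the accumulated input
            spikes.append(1)  # exceeds the threshold voltage
            v = 0.0         # hard reset to the resting potential
        else:
            spikes.append(0)
    return np.array(spikes)

# A sub-threshold input must accumulate over several steps before a spike:
print(if_neuron(np.array([0.4, 0.4, 0.4, 0.4])))  # → [0 0 1 0]
```

Note how information is carried by spike timing and counts rather than by continuous activation values, which is what makes the dynamics event-driven but also non-differentiable.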
This method can train SNNs with performance close to or even better than conventional ANNs on some small and moderate datasets (Shrestha & Orchard, 2018; Wu et al., 2019; Zhang & Li, 2020; Thiele et al., 2019). However, the training procedure requires a great deal of time and memory and suffers from convergence difficulties for large networks such as VGG and ResNet. The second route converts ANNs to SNNs by co-training a source ANN for a target SNN that adopts the IF model and a soft-reset mechanism (Rueckauer et al., 2016; Han et al., 2020). When a neuron spikes, its membrane potential decreases by the amount of the threshold voltage instead of returning to a fixed resting potential. A limitation of this mechanism is that it discretizes the numerical input information equally for all neurons on the same layer, ignoring the variation in activation frequencies across neurons. As a consequence, some neurons struggle to transmit information within short simulation sequences. The converted SNNs thus usually require a very long simulation to achieve high accuracy (Deng et al., 2020) due to the trade-off between simulation length and accuracy (Rueckauer et al., 2017). This dilemma can be partially relieved by applying threshold balancing at the channel level (Kim et al., 2019) and adjusting threshold values according to the input and output frequencies (Han et al., 2020; Han & Roy, 2020). However, as far as we know, it remains unclear how the gap between the ANN and the SNN forms and how the simulation length and voltage threshold affect the conversion loss from individual layers to the whole network. In addition, the accuracy of converted SNNs is not satisfactory when the simulation length is as short as a few tens of time steps.
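The soft-reset rule can be contrasted with the hard reset in a short sketch. This is an illustrative example with assumed parameter values, not the paper's code; the key difference is that the surplus potential above threshold is carried over rather than discarded, so the firing rate tracks the input magnitude more faithfully.

```python
def if_neuron_soft_reset(inputs, v_th=1.0):
    """IF neuron with soft reset: on a spike, subtract v_th from the
    membrane potential instead of resetting it to a fixed value, so the
    residual charge above threshold is preserved for later time steps."""
    v = 0.0
    spikes = []
    for x in inputs:
        v += x
        if v >= v_th:
            spikes.append(1)
            v -= v_th       # soft reset: keep the surplus potential
        else:
            spikes.append(0)
    return spikes

# With a constant input of 0.7, the neuron's spike count over the window
# approximates the input magnitude times the number of steps:
print(if_neuron_soft_reset([0.7] * 5))  # → [0, 1, 1, 0, 1]
```

Because no charge is thrown away at reset, the average output rate converges to the input current as the simulation lengthens; with a hard reset, the discarded surplus would bias the rate downward.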
Compared to previous methods that focus on either optimizing the conversion process or modifying the SNN structure, we theoretically analyze the conversion error from the perspective of activation values and propose a conversion strategy that directly modifies the ReLU activation function in the source ANN to approximate the spiking frequency in the target SNN based on the constructed error form. Our main contributions are summarized as follows:

• We theoretically analyze the conversion procedure and derive the conversion loss, which can be optimized layer-wise.

• We propose a conversion algorithm that effectively controls the difference in activation values between the source ANN and the target SNN with a much shorter simulation length than existing works.

• We demonstrate the effectiveness of the proposed algorithm both theoretically and experimentally, and discuss its potential extension to other problems.

2. PRELIMINARIES

Our conversion pipeline exploits the threshold balancing mechanism (Diehl et al., 2015; Sengupta et al., 2018) between the ANN and the SNN with a modified ReLU function in the source ANN to reduce the consequent conversion error (Fig. 1A). The modification of the regular ReLU function consists of thresholding the maximum activation and shifting the turning point. The thresholding operation suppresses excessive activation values so that all neurons can be activated within a shorter simulation time. The shift operation compensates for the deficit in output frequency caused by the floor rounding when converting activation values to output frequencies.

We now introduce the common notations used in this paper. Since the infrastructures of the source ANN and the target SNN are the same, we use the same notation when it is unambiguous. For the l-th layer, we denote W^l as the weight matrix. The threshold on the activation value added to the ReLU function in the source ANN is y_th, and the threshold voltage of the spiking function in the target SNN is V_th. The SNN is simulated for T time steps, where v^l(t) = {v^l_i(t)} is the vector collecting the membrane potentials of the neurons at time t, and θ^l(t) = {θ^l_i(t)} records the output to the next layer, i.e., the postsynaptic potential (PSP) released by the l-th layer to the (l+1)-th layer. Suppose that within the whole simulation time T, the l-th layer receives an average input a^(l-1) and an average PSP ā^(l-1) from the (l-1)-th layer, corresponding to the source ANN and the target SNN respectively; the forward process
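The two modifications to the ReLU function, and the floor rounding they are designed to counteract, can be sketched as follows. This is an illustrative reading of the mechanism, not the paper's implementation: the function names are ours, and the shift amount is left as a free parameter rather than the specific value derived later in the paper.

```python
import numpy as np

def modified_relu(x, y_th, shift=0.0):
    """ReLU with a capped maximum activation (y_th) and a shifted turning
    point. Clipping at y_th suppresses excessive activations; `shift` moves
    the turning point to compensate the floor rounding below (its exact
    value is a tunable assumption in this sketch)."""
    return np.clip(x + shift, 0.0, y_th)

def activation_to_spike_count(a, y_th, T):
    """Map a (thresholded) activation to an integer spike count over T
    simulation steps; the floor introduces the rounding deficit that the
    shift operation is meant to offset."""
    return int(np.floor(a / y_th * T))

# An activation of 0.37 with y_th = 1.0 maps to 3 spikes in T = 10 steps,
# i.e. a represented value of 0.3 -- a deficit of 0.07 due to the floor:
a = modified_relu(np.array([0.37]), y_th=1.0)[0]
print(activation_to_spike_count(a, y_th=1.0, T=10))  # → 3
```

Note that without the cap, a neuron with activation above y_th would need more than T spikes to represent its value, which is impossible within the simulation window; this is why thresholding the maximum activation shortens the required simulation time.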

