BRIDGING THE GAP BETWEEN ANNS AND SNNS BY CALIBRATING OFFSET SPIKES

Abstract

Spiking Neural Networks (SNNs) have attracted great attention due to their distinctive characteristics of low power consumption and temporal information processing. ANN-SNN conversion, the most commonly used training method for obtaining SNNs, can ensure that converted SNNs achieve performance comparable to ANNs on large-scale datasets. However, the performance degrades severely when the number of time-steps is small, which hampers the practical application of SNNs to neuromorphic chips. In this paper, instead of evaluating different conversion errors and then eliminating them, we define an offset spike to measure the degree of deviation between the actual and desired SNN firing rates. We perform a detailed analysis of the offset spike and note that the firing of one additional (or one fewer) spike is the main cause of conversion errors. Based on this, we propose an optimization strategy that shifts the initial membrane potential, and we theoretically derive the corresponding optimal shifting distance for calibrating the spike. In addition, we note that our method has a unique iterative property that enables further reduction of conversion errors. The experimental results show that our proposed method achieves state-of-the-art performance on the CIFAR-10, CIFAR-100, and ImageNet datasets. For example, we reach a top-1 accuracy of 67.12% on ImageNet when using 6 time-steps. To the best of our knowledge, this is the first time an ANN-SNN conversion has been shown to simultaneously achieve high accuracy and ultra-low latency on complex datasets.

1. INTRODUCTION

Acclaimed as the third generation of Artificial Neural Networks (Maass, 1997), Spiking Neural Networks (SNNs) have brought brand-new inspiration to computational neuroscience. As the corresponding neuron fires spikes only when the current membrane potential exceeds the firing threshold, SNNs have the distinctive characteristics of binary output, high sparsity, and biological plausibility. Therefore, compared with traditional ANN models, SNNs can further improve computational efficiency and reduce power consumption, which facilitates their remarkable superiority in the application of neuromorphic chips (Merolla et al., 2014; Davies et al., 2018; DeBole et al., 2019). Considering that an effective learning algorithm has not yet been found for SNNs, ANN-SNN conversion and backpropagation through time (BPTT) are still the two most commonly applied training methods. Compared with BPTT, ANN-SNN conversion provides a way around the nondifferentiable problem in the direct training procedure for SNNs and thus reduces the overall training complexity. The aim in ANN-SNN conversion is to establish the mapping relationship between the activation output and the average firing rate. Traditional conversion methods exploit larger time-steps to overcome conversion errors and thus achieve high performance (Diehl et al., 2015). Many of the following works have attempted to optimize the performance from multiple perspectives, including using the soft-reset mechanism (Han et al., 2020), proposing more adaptive activation functions (Ho & Chang, 2021; Bu et al., 2022b), adopting a trainable threshold (Sengupta et al., 2019; Ding et al., 2021; Bu et al., 2022a), etc. However, these strategies cannot effectively eliminate the errors caused by the deviation between the actual and desired firing rates, especially when the number of time-steps is small.
Some recent works explore compensating for the errors by introducing burst spikes (Li & Zeng, 2022) and signed spiking neurons (Li et al., 2022). Unlike these works, our paper attempts to eliminate the errors with vanilla spiking neurons and answer the question of how to improve the performance of a converted SNN and possibly approach its upper-bound performance. In this paper, we observe and identify the source of conversion errors and propose an iterative optimization method based on shifting the initial membrane potential, which can fulfil accurate mapping between ANNs and SNNs under ideal conditions. Our main contributions are summarized as follows:
(1) We introduce the concept of offset spike to infer the deviation between the actual and desired SNN firing rates. We note that the firing of one additional (or one fewer) spike is the main cause of conversion errors.
(2) We propose a method to judge the sign of the offset spike based on the residual membrane potential, and an optimization method that eliminates conversion errors by shifting the initial membrane potential up or down. We derive the optimal shifting distance and prove that the spike count can be increased or decreased by one under this condition.
(3) We evaluate our method on the CIFAR-10/100 and ImageNet datasets. The proposed method outperforms the existing state-of-the-art ANN-SNN conversion methods using fewer time-steps. For example, we achieve 67.12% top-1 accuracy on ImageNet with only 6 time-steps (4 time-steps for calibration and 2 time-steps for inference). Moreover, it is worth noting that we reach the same level of performance as BPTT with significantly reduced memory and computing-resource requirements.
(4) We discover that our proposed method has an iterative property. Under ideal circumstances, a deviation within the range of k spikes is eliminated entirely after applying our approach k times.
After 4 iterations, the mean-square error between the actual and desired firing rates of the output layer can reach 0.001 for the VGG-16 model on CIFAR-100.

2. RELATED WORKS

The principle of ANN-SNN conversion is to map the parameters from pretrained ANN models to SNNs, which avoids training SNNs directly and reduces energy consumption significantly. The primary goal is to match the ANN activation value and the average SNN firing rate. Cao et al. (2015) were the pioneers in this field: they replaced the ReLU activation layers in ANNs with spiking neurons to fulfil the conversion procedure. Ho & Chang (2021); Bu et al. (2022b) proposed new activation functions, which better fit the finiteness and discreteness of the spike firing rate. Rueckauer et al. (2017); Han et al. (2020) adopted the "reset-by-subtraction" mechanism, which alleviates the problem of information loss and effectively improves the precision of the conversion process. For the setting of the firing threshold, various strategies have been proposed, including RobustNorm (Rueckauer et al., 2017), SpikeNorm (Sengupta et al., 2019), and adjustable thresholds (Han et al., 2020; Ding et al., 2021; Ho & Chang, 2021; Bu et al., 2022a;b). Recently, spiking neural networks with high accuracy and low latency have become the focus and target of academic research. To reduce the time latency of the network, one must carefully address the exact spiking time of neurons. Deng & Gu (2021); Li et al. (2021) fine-tuned the bias in each layer under the uniform-current assumption; nevertheless, the actual current is never distributed uniformly. In terms of expectation, Bu et al. (2022b) proved that one-half of the threshold is the optimal value for the initial membrane potential, and that charging at this value prompts neurons to spike more uniformly. However, as the authors pointed out, there is still a mismatch between ANN and SNN due to the so-called "unevenness error". In addition, other methods such as burst spikes (Li & Zeng, 2022) and signed spiking neurons (Wang et al., 2022a; Li et al., 2022) have also been introduced to further improve performance.
These efforts have aimed to alleviate the conversion loss; however, they undermine the biological plausibility and binary property of spiking neurons. In addition to ANN-SNN conversion, backpropagation with exact spike timing is another common way to train SNNs, with the surrogate gradient method (O'Connor et al., 2018; Zenke & Ganguli, 2018; Bellec et al., 2018; Wu et al., 2018; 2019; Kim & Panda, 2020; Zenke & Vogels, 2021) and hybrid training (Rathi et al., 2020) as representative approaches. These works alter weights during training and stress the importance of spike timing, which is usually ignored in conversion methods. Inspired by these approaches, we incorporate the concept of calibrating spike timing by manipulating membrane potentials into the conversion pipeline to bridge the gap between ANNs and SNNs.

3. PRELIMINARIES

3.1. NEURON MODELS

For ANNs, the input a^{l-1} to layer l is mapped to the output a^l by a linear transformation matrix W^l and a nonlinear activation function f(·), that is (l = 1, 2, 3, ..., L):

a^l = f(W^l a^{l-1}),  (1)

where f(·) is often set as the ReLU activation function. For SNNs, we adopt the Integrate-and-Fire (IF) neuron model (Gerstner & Kistler, 2002), similar to the approach reported in previous works (Cao et al., 2015; Diehl et al., 2015). To minimize information loss during inference, our neurons perform the "reset-by-subtraction" mechanism (Han et al., 2020), which means that the firing threshold θ^l is subtracted from the membrane potential after firing. The overall kinetic equations of the IF neuron can be expressed as follows:

v^l(t) = v^l(t-1) + I^l(t) - s^l(t)θ^l,  (2)
I^l(t) = W^l s^{l-1}(t)θ^{l-1}.  (3)

Here v^l(t) and I^l(t) denote the membrane potential and input current of layer l at the t-th time-step, respectively. W^l is the synaptic weight between layer l-1 and layer l, and θ^l is the spike firing threshold in the l-th layer. s^l(t) represents whether a spike fires at time-step t. For the i-th neuron, if the current potential exceeds the firing threshold θ^l, the neuron emits a spike. This firing rule can be described by the equation below:

s^l_i(t) = 1 if v^l_i(t-1) + I^l_i(t) ⩾ θ^l, and s^l_i(t) = 0 if v^l_i(t-1) + I^l_i(t) < θ^l.  (4)

If not otherwise specified, the subscript x_i denotes the i-th element of x.
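As a concrete illustration of equations 2-4, the following minimal sketch simulates a single scalar IF neuron with reset-by-subtraction; the function name and single-neuron setting are our own simplifications, not the authors' implementation:

```python
def simulate_if_neuron(currents, theta, v0=0.0):
    """Simulate one IF neuron with reset-by-subtraction (cf. equations 2-4).

    currents: input current I(t) at each time-step
    theta:    firing threshold theta^l
    v0:       initial membrane potential v(0)
    Returns (binary spike train, membrane potential v(t) after each step).
    """
    v, spikes, potentials = v0, [], []
    for current in currents:
        v += current                 # integrate the input current
        if v >= theta:               # firing rule (equation 4)
            spikes.append(1)
            v -= theta               # reset-by-subtraction keeps the surplus
        else:
            spikes.append(0)
        potentials.append(v)
    return spikes, potentials
```

For example, with θ = 1, v(0) = θ/2 = 0.5, and a constant current of 0.6 over T = 4 steps, the neuron fires at steps 1 and 3, so φ(T) = 2θ/T = 0.5.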

3.2. ANN-SNN CONVERSION

The main principle of ANN-SNN conversion is to map the firing rates (or postsynaptic potentials) of spiking neurons to the ReLU activation outputs of artificial neurons. Specifically, by summing equation 2 from t = 1 to t = T, substituting the variable I^l(t) with W^l s^{l-1}(t)θ^{l-1} using equation 3, and finally dividing both sides by T, we obtain the following equation:

Σ_{t=1}^T s^l(t)θ^l / T = W^l (Σ_{t=1}^T s^{l-1}(t)θ^{l-1} / T) - (v^l(T) - v^l(0))/T,  (5)

where T denotes the total simulation cycle. For simplicity, we use the average postsynaptic potential φ^l(T) as a substitute for the term Σ_{t=1}^T s^l(t)θ^l / T in equation 5, and obtain

φ^l(T) = W^l φ^{l-1}(T) - (v^l(T) - v^l(0))/T.  (6)

Equation 6 can be approximated by a linear transformation between φ^l(T) and φ^{l-1}(T) as T tends to infinity, which is exactly the same as the forward propagation (equation 1) in ANNs, due to φ^l(T) ⩾ 0. This result implies that we can achieve lossless ANN-SNN conversion when T tends to infinity. However, the performance of converted SNNs degrades seriously under the condition of short time-steps T (Rueckauer et al., 2017; Han et al., 2020). To achieve high-performance SNNs under low latency, Bu et al. (2022b) proposed replacing the commonly used ReLU activation function of the source ANN with the quantization clip-floor-shift (QCFS) function:

a^l = f(a^{l-1}) = (λ^l / L) clip(⌊W^l a^{l-1} L / λ^l + 1/2⌋, 0, L),  (7)

where L denotes the ANN quantization step and λ^l is the trainable threshold of the outputs in ANN layer l, which is mapped to the threshold θ^l in SNN layer l. This paper follows the conversion framework of Bu et al. (2022b) with the QCFS function.

[Figure 1: distribution of offset spike ψ^l = ±1, ±2, ±3 across layers: panels (a)-(b) without the layer-wise constraint, panels (c)-(d) with the constraint.]
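The QCFS function of equation 7 can be sketched for a scalar pre-activation as follows; this is an illustrative helper of our own (the actual source ANN applies it elementwise to tensors with a trainable λ^l):

```python
import math

def qcfs(x, L, lam):
    """Quantization clip-floor-shift activation (cf. equation 7), scalar form.

    x:   pre-activation W^l a^{l-1}
    L:   ANN quantization step
    lam: threshold lambda^l (mapped to theta^l in the SNN)
    """
    k = math.floor(x * L / lam + 0.5)   # floor with a half-step shift
    k = min(max(k, 0), L)               # clip to [0, L]
    return lam * k / L                  # one of the L+1 levels {k*lam/L}
```

With L = 4 and λ = 1, the output grid is {0, 0.25, 0.5, 0.75, 1.0}, i.e., exactly the firing rates an IF neuron can realize in T = 4 time-steps.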

4. METHODS

In this section, we first compare the outputs of ANNs and converted SNNs in each layer. We introduce the offset spike to measure the degree of deviation between the actual and desired firing rates in SNNs. Then, we demonstrate that an offset spike of ±1 accounts for the majority of cases in each layer and is the main cause of conversion errors. Based on this, we propose sufficient conditions to determine whether an offset spike exists and what its sign is, and we present a spike calibration strategy that eliminates conversion errors by shifting the initial membrane potential.

4.1. OFFSET SPIKE AND ITS DISTRIBUTION

ANN-SNN conversion errors can be divided into clipping error, quantization error (flooring error), and unevenness error (deviation error) (Bu et al., 2022b). In previous works (Han et al., 2020; Li et al., 2021; Meng et al., 2022b), these errors are eliminated (or reduced) separately, and thus far no method to eliminate the unevenness error (deviation error) has been identified. Since we find that the essential cause of most conversion errors is the remaining term -(v^l(T) - v^l(0))/T in equation 5, we consider reducing conversion errors directly based on prior knowledge of this term. To measure the degree of deviation between the actual and desired firing rates, we first introduce the definition of offset spike.

Definition 1. We define the OFFSET SPIKE ψ^l of layer l as the difference between the desired total spike count C^l_desired and the actual spike count C^l_actual during the interval [0, T], that is,

ψ^l = C^l_desired - C^l_actual = a^l T / θ^l - Σ_{t=1}^T s^l(t),  (8)

where we set the maximum value λ^l of the output a^l in ANNs equal to the threshold θ^l in SNNs, that is, λ^l = θ^l. Thus a^l/θ^l denotes the normalized ANN output, which is mapped to the firing rate of SNNs, and C^l_desired = a^l T/θ^l denotes the desired total spike count. Note that ψ^l_i = ±k indicates that the gap between the actual and desired firing rate of the i-th neuron in layer l of the SNN is k spikes.

[Figure 2: Shifting up (down) the initial membrane potential can increase (decrease) one output spike; panels (a)-(d) show membrane potential traces over time-steps 0-6 for the shift distances θ^l, min{v^l(t)|s^l(t)=1}+ε, θ^l, and θ^l+ε-max{v^l(t)|s^l(t)=0}.]

We further investigate the detailed ANN and SNN outputs in each layer.
We train the source ANN with the QCFS activation function (equation 7) and then convert it to an SNN (more details are in the Appendix). Figs. 1(a)-1(b) illustrate the distribution of offset spike for the converted SNNs with the VGG-16 structure on CIFAR-10 and CIFAR-100, respectively. We have the following observation.

Observation 1. ψ^l = ±1 accounts for the main part in each layer and ψ^l = ±3 rarely occurs.

Considering the cumulative effect of conversion errors in deep layers, the offset spike ψ^l in layer l can be considered the joint effect of the offset spike ψ^{l-1} in layer l-1 and the conversion errors introduced in layer l, and it tends to grow with the number of layers. For a deeper analysis of the offset spike in each layer, we rectify the ANN output in layer l-1 so that a^{l-1} = φ^{l-1}(T) and ψ^{l-1} = 0 (see Sec. A.1 for more details of the constraint), and then compute the offset spike ψ^l in layer l. After this rectification for each layer, the distributions of offset spike for the converted SNNs with the VGG-16 structure on CIFAR-10 and CIFAR-100 are shown in Figs. 1(c)-1(d), respectively. We have the following observation.

Observation 2. With the constraint, ψ^l = ±1 accounts for the main part in each layer and |ψ^l| > 1 rarely occurs.

Observations 1-2 show that the firing of one additional (or one fewer) spike is the main cause of conversion errors, which implies that we can eliminate most errors by adjusting Σ_{t=1}^T s^l(t) by ±1.
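Given the definition in equation 8, the offset spike of a single neuron can be computed directly from its spike train. The sketch below uses an illustrative name of our own (`offset_spike`), and the rounding step assumes a^l T/θ^l is integral, which holds for QCFS outputs when L = T:

```python
def offset_spike(a, spikes, theta):
    """Offset spike psi^l_i (cf. equation 8) for one neuron.

    a:      desired ANN activation a^l_i (with lambda^l = theta^l)
    spikes: binary SNN spike train over T time-steps
    theta:  firing threshold theta^l
    psi = +k means the neuron fired k spikes fewer than desired,
    psi = -k means it fired k spikes more.
    """
    T = len(spikes)
    desired = round(a * T / theta)   # C^l_desired; integral under QCFS with L = T
    return desired - sum(spikes)
```

For instance, with a = 0.5, θ = 1, and T = 4, the desired count is 2, so a train firing only once gives ψ = +1 and a train firing three times gives ψ = -1.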

4.2. JUDGE CONVERSION ERRORS THROUGH RESIDUAL MEMBRANE POTENTIAL

Before we propose the optimal strategy for adjusting the output spikes to eliminate the offset spike ψ^l, we need to determine whether the offset spike exists and what its sign is. If the sign of the offset spike is positive, which corresponds to the situation in which the ANN output is larger than the SNN output, the spiking neurons should fire more spikes to eliminate the offset spike; otherwise, they should fire fewer spikes. In the practical application of SNNs, we cannot directly obtain the specific value of the offset spike ψ^l. Fortunately, we find that we can determine the sign of ψ^l according to the value of the residual membrane potential v^l(T). We have the following theorem:

Theorem 1. Suppose that an ANN with the QCFS activation function (equation 7) is converted to an SNN with L = T, λ^l = θ^l, v^l(0) = θ^l/2, and the inputs to the l-th layer of the ANN and the SNN are the same, that is, a^{l-1} = φ^{l-1}(T). Then for any i-th element of the l-th layer, we can draw the following conclusions:
(i) If φ^l_i(T) > 0 and v^l_i(T) < 0, then φ^l_i(T) > a^l_i and ψ^l_i < 0.
(ii) If φ^l_i(T) < θ^l and v^l_i(T) ⩾ θ^l, then φ^l_i(T) < a^l_i and ψ^l_i > 0.

The proof is provided in the Appendix. Conclusion (i) implies that if the postsynaptic potential is larger than 0 and the residual membrane potential is smaller than 0, the neuron fires more spikes than expected and the sign of the offset spike is negative. Conclusion (ii) implies that if the postsynaptic potential is smaller than θ^l and the residual potential is at least θ^l, the spiking neuron fires fewer spikes than expected and the sign of the offset spike is positive.
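Theorem 1 amounts to a simple sign test on the residual membrane potential. A sketch of that decision rule (an illustrative helper of our own, returning 0 when neither sufficient condition applies) might look like:

```python
def offset_sign(phi, v_T, theta):
    """Infer the sign of the offset spike psi^l_i from the residual
    potential v^l_i(T), following Theorem 1 (QCFS source ANN, L = T,
    lambda^l = theta^l, v(0) = theta/2, matched layer inputs).

    Returns -1 (fired too many spikes), +1 (fired too few),
    or 0 when neither sufficient condition holds.
    """
    if phi > 0 and v_T < 0:
        return -1    # case (i): phi > a, psi < 0, remove one spike
    if phi < theta and v_T >= theta:
        return +1    # case (ii): phi < a, psi > 0, add one spike
    return 0
```

Note that the two conditions are sufficient but not exhaustive: a residual potential in [0, θ^l) yields 0, meaning no calibration is triggered for that neuron.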

4.3. ELIMINATE CONVERSION ERROR THROUGH SHIFTING INITIAL MEMBRANE POTENTIAL

Since the firing of one additional (or one fewer) spike (ψ^l = ±1) is the main cause of conversion errors, we propose an optimization strategy that rectifies the output spike count Σ_{t=1}^T s^l(t) by adding or subtracting one spike, thereby eliminating the errors. Specifically, we consider adjusting the value of Σ_{t=1}^T s^l(t) by shifting the corresponding initial membrane potential v^l(0) up or down. One intuitive explanation is that a higher initial membrane potential makes the spiking neurons fire earlier and increases the firing rate during the period [0, T], while a lower initial membrane potential makes them fire later and decreases the firing rate. The following theorem gives the optimal shifting distance when we attempt to move Σ_{t=1}^T s^l(t) (and ψ^l) by ±1.

Theorem 2. Let s^l_i(t) and ŝ^l_i(t) denote the binary spike of the i-th neuron in layer l at time-step t before and after optimization, and let v^l_i(0) and v̂^l_i(0) represent the initial membrane potential before and after optimization. Then ∀ε ∈ (0, θ^l), we have the following conclusions:
(i) If we set v̂^l_i(0) = v^l_i(0) - max(θ^l, min{v^l_i(t) | s^l_i(t) = 1} + ε), then Σ_{t=1}^T ŝ^l_i(t) = Σ_{t=1}^T s^l_i(t) - 1.
(ii) If we set v̂^l_i(0) = v^l_i(0) + max(θ^l, θ^l + ε - max{v^l_i(t) | s^l_i(t) = 0}), then Σ_{t=1}^T ŝ^l_i(t) = Σ_{t=1}^T s^l_i(t) + 1.

The proof is provided in the Appendix. Note that the variable ε illustrates that as long as the initial membrane potential lies within a certain range, the number of output spikes is guaranteed to increase or decrease by 1.

Example 1. Fig. 2 shows four different scenarios before and after shifting v^l(0) that verify the effectiveness of our theorem; Figs. 2(a)-2(d) correspond to the two shifting distances of each case in Theorem 2.

By combining Theorems 1 and 2, we propose the complete spike calibration algorithm. Our method can be divided into two stages. First, for the l-th layer, we spend ρ time-steps to determine the specific spike firing situation.
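Under the stated assumptions, Theorem 2 can be checked numerically for a single neuron: simulate, compute the shift from the recorded potentials v^l_i(t), re-simulate, and the spike count moves by exactly one. The function names and the constant-current input are toy choices of our own:

```python
def run(currents, theta, v0):
    """Simulate one IF neuron; return (spike train, v(t) after each step)."""
    v, s, vs = v0, [], []
    for current in currents:
        v += current
        fired = v >= theta
        if fired:
            v -= theta
        s.append(int(fired))
        vs.append(v)
    return s, vs

def shift_down(v0, s, vs, theta, eps=1e-3):
    # Theorem 2(i): optimal downward shift removes exactly one spike
    return v0 - max(theta, min(v for v, f in zip(vs, s) if f == 1) + eps)

def shift_up(v0, s, vs, theta, eps=1e-3):
    # Theorem 2(ii): optimal upward shift adds exactly one spike
    return v0 + max(theta, theta + eps - max(v for v, f in zip(vs, s) if f == 0))
```

For θ = 1, v(0) = 0.5, and currents [0.6, 0.6, 0.6, 0.6], the unshifted neuron fires twice; after `shift_up` it fires three times and after `shift_down` it fires once, matching the theorem.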
According to Theorem 1, if v^l_i(ρ) < 0 (or v^l_i(ρ) ⩾ θ^l), then by combining φ^l_i(ρ) we can infer that φ^l_i(ρ) is actually larger (or smaller) than the expected average postsynaptic potential a^l_i, that is, ψ^l_i < 0 (or ψ^l_i > 0). In addition, we preserve the membrane potential after each time-step, which is used to calculate the subsequent optimal shifting distance. In the second stage, based on Theorem 2, we calculate the optimal shifting distance of the initial membrane potential for the specific neurons with conversion errors. Generally speaking, if ψ^l_i < 0, we shift the initial membrane potential down according to conclusion (i) of Theorem 2; if ψ^l_i > 0, we shift it up according to conclusion (ii). After optimizing the initial membrane potential, we spend T time-steps running the test on the corresponding datasets and deliver the output to the (l+1)-th layer.

4.4. ITERATIVE PROPERTY OF OUR OPTIMIZATION METHOD

In the previous section, we showed that shifting the initial membrane potential up (or down) can change a case of ψ^l = ±1 to ψ^l = 0. In fact, our method also converts the case of ψ^l = ±k to ψ^l = ±(k-1). As long as the offset spike ψ^l is not zero, the performance of the converted SNN degrades, so one important problem is whether we can further eliminate the offset spike in situations where |ψ^l| ⩾ 2. Fortunately, our optimization method has an iterative property: one can reuse Theorems 1 and 2 repeatedly, and each round of calibration reduces the remaining deviation by one spike.

5. EXPERIMENTS

We evaluate our method on the CIFAR-10, CIFAR-100, and ImageNet (Deng et al., 2009) datasets. The network architectures selected for evaluation include VGG-16 (Simonyan & Zisserman, 2014), ResNet-18, ResNet-20 and ResNet-34 (He et al., 2016). For the hyperparameter ρ, we set ρ = 4 for CIFAR-10/100 and ρ = 8 for ImageNet unless otherwise specified. More details of the experimental settings are provided in the Appendix.

[Figure 3: distribution of offset spike ψ^l = ±1, ±2, ±3 in each layer, with and without the shift operation: (a) CIFAR-10, VGG-16; (b) CIFAR-100, VGG-16.]

5.1. EFFECTIVENESS OF THE PROPOSED METHOD

To illustrate the effectiveness of our proposed initial membrane potential shifting operation, we compare bar charts of the offset spike ψ^l in each layer of the SNN before and after the shift. Fig. 3 illustrates the results of VGG-16 networks on the CIFAR-10 and CIFAR-100 datasets. It can be observed that the shifting operation significantly reduces the offset spike, that is, the deviation between φ^l(T) and a^l, in each layer. For the vanilla setting without the shift operation (denoted "w/o shift" in Fig. 3), one can observe a magnification effect of the spike count error from the 1st to the 11th layer. In contrast, this magnification is alleviated by the proposed method. From Fig. 3(b), we notice that the ±2 and ±3 offset spikes in the VGG-16 model for CIFAR-100 increase compared to those for CIFAR-10 (Fig. 3(a)). Our method clearly decreases these deviations and achieves a nearly error-free conversion.

5.2. COMPARISON WITH STATE-OF-THE-ART METHODS

We compare our methods with previous state-of-the-art ANN-SNN conversion works, including RMP (Han et al., 2020) , SNM (Wang et al., 2022a) , SNNC-AP (Li et al., 2021) , OPI (Bu et al., 2022a) , QCFS (Bu et al., 2022b) , on CIFAR-10, CIFAR-100 and ImageNet datasets. Since we spend ρ time-steps in the first stage to acquire relevant temporal information about membrane potential, we will compare the performance of other works at time-step T + ρ with our performance at time-step T to ensure the fairness of comparison. Tab. 1 reports the results on the CIFAR-100 dataset. For VGG-16, our method at time-step 1 (ρ = 4) outperforms SNM and SNNC-AP at time-step 32. Moreover, we achieve 76.26% top-1 accuracy with 4 time-steps (ρ = 4), which is 2.30% higher than QCFS (73.96%, T=8) and 15.77% higher than OPI (60.49%, T=8). For ResNet-20, the performance of our method at time-step 1 (ρ = 4) surpasses the performance of RMP at time-step 32 (59.22% vs. 27.64%). The accuracy of our method is 65.18% at time-step 4 (ρ = 4), whereas accuracies of QCFS and OPI are 55.37% and 23.09% at time-step 8, respectively. More results on CIFAR-10 are listed in the Appendix. We further test the generalization of our method on the ImageNet (Tab. 1). For VGG-16, we achieve 73.82% top-1 accuracy at time-step 8 (ρ = 8), which outperforms QCFS (50.97%, T=16) by 22.85% and OPI (36.02%, T=16) by 37.80%. For ResNet-34, our method at time-step 1 (ρ = 8) outperforms SNM and SNNC-AP at time-step 32. Moreover, we achieve 74.17% with 8 time-steps (ρ = 8), which is 14.82% higher than QCFS (59.35%, T=16). These results show that our method can achieve better classification accuracy with fewer time-steps. 
In addition, we compare our method with other types of SNN training methods (hybrid training & BPTT), including Dual-Phase (Wang et al., 2022b), Diet-SNN (Rathi & Roy, 2021), RecDis-SNN (Guo et al., 2022), HC-STDB (Rathi et al., 2020), STBP-tdBN (Zheng et al., 2021), PLIF (Fang et al., 2021), TET (Deng et al., 2022) and DSR (Meng et al., 2022a). Here we set ρ = 4 for the CIFAR-100 and ImageNet datasets. As reported in Tab. 2, our method achieves better accuracy on the CIFAR-100 dataset and comparable accuracy on the ImageNet dataset with the same quantity of time-steps. Note that compared to ANN-SNN conversion, the back-propagation approaches need to propagate the gradient through both the spatial and temporal domains during the training process, which consumes large amounts of memory and computing resources. All these results demonstrate the superiority of our method.

[Figure 4: accuracy versus time-step for different values of ρ; panels (a)-(d) sweep ρ over {1, 2, 4, 8}, {2, 4, 8, 16}, {4, 8, 16, 32}, and {2, 4, 8, 16}.]
[Figure 5: mean ‖ψ^l‖²₂ per layer for the baseline and after 1, 2 (or 2, 4) shift iterations; panels (a)-(d).]
[Figure 6: the distribution of offset spike ψ^l = ±1, ±2, ±3 after iterative optimization: (a) ResNet-20 on CIFAR-10 (0, 1, 2 iterations); (b) ResNet-20 on CIFAR-100 (0, 2, 4 iterations).]

5.3. EFFECT OF THE INFERENCE TIME-STEP ρ

We further explore the influence of the hyperparameter ρ in the first stage of our method. Fig. 4 shows the accuracy of the network with different values of ρ. For Figs. 4(a)-4(d), the value of the quantization level L in the QCFS function is set to 4, 8, 16 and 8, respectively. We find that the SNN accuracy tends to converge as ρ gradually approaches L. This phenomenon can be understood as follows. We use the QCFS activation function in the source ANN, so a^l ∈ {kθ^l/L | k = 0, 1, ..., L} and φ^l(T) ∈ {kθ^l/ρ | k = 0, 1, ..., ρ}. Thus, the mapping relationship between a^l and φ^l(T) becomes more accurate as ρ approaches L, which makes the temporal information obtained from the first stage more precise and improves the performance of the network.
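The grid-matching argument above can be made concrete: the ANN emits only values in {kθ^l/L} and the probe stage observes only rates in {kθ^l/ρ}, and the two sets coincide exactly when ρ = L. A toy sketch with an illustrative helper of our own:

```python
def representable(theta, steps):
    """The set {k*theta/steps | k = 0..steps} of values a quantized
    activation (or a firing rate over `steps` time-steps) can take."""
    return {round(k * theta / steps, 9) for k in range(steps + 1)}

# When rho = L, the ANN grid and the SNN rate grid are identical, so the
# first-stage probe can represent every value the source ANN can emit;
# when rho < L, some ANN levels fall between representable rates.
```

For example, with θ = 1 the L = 4 grid is {0, 0.25, 0.5, 0.75, 1.0}, which is a strict subset of the ρ = 8 grid but matches the ρ = 4 grid exactly.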

5.4. EFFECT OF THE ITERATIVE OPTIMIZATION

In Section 4.4, we explained that our method has an iterative property that can reduce the offset spike through multiple iterations. To demonstrate this, we define the ratio as the percentage of neurons in the output layer whose a^l_i deviates from φ^l_i(T). In addition, we also consider the mean-square error (MSE), defined as ‖ψ^l‖²₂. Tab. 3 reports the ratio and MSE of the output layer, where the baseline denotes the performance without our method and ×2 represents two iterations. We set ρ = L = T. From top to bottom in Tab. 3, the values of L are set to 4, 4, 4 and 8. From Tab. 3 and Fig. 5, we can conclude that, in general, the ratio and MSE in each layer continue to decrease as the number of iterations increases, which is consistent with the results shown in Fig. 6.
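The iterative property discussed here and in Sec. 4.4 can be sketched for a single neuron: applying the Theorem-2 shift |ψ| times moves the spike count ψ steps toward the target. The functions below are toy versions of our own, and the loop assumes each probe run still contains at least one firing and one silent time-step:

```python
def run(currents, theta, v0):
    """Simulate one IF neuron; return (spike train, v(t) after each step)."""
    v, s, vs = v0, [], []
    for current in currents:
        v += current
        fired = v >= theta
        if fired:
            v -= theta
        s.append(int(fired))
        vs.append(v)
    return s, vs

def calibrate(currents, theta, v0, psi, eps=1e-3):
    """Apply the Theorem-2 shift |psi| times; each pass changes the
    spike count by one in the direction that shrinks the offset."""
    for _ in range(abs(psi)):
        s, vs = run(currents, theta, v0)
        if psi > 0:   # too few spikes: shift v(0) up (Theorem 2(ii))
            v0 += max(theta, theta + eps - max(v for v, f in zip(vs, s) if f == 0))
        else:         # too many spikes: shift v(0) down (Theorem 2(i))
            v0 -= max(theta, min(v for v, f in zip(vs, s) if f == 1) + eps)
    return v0
```

Starting from θ = 1, v(0) = 0.5 and currents [0.6, 0.6, 0.6, 0.6] (two spikes), two upward passes yield four spikes and two downward passes yield zero, i.e., a deviation of k spikes is closed after k applications.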

6. CONCLUSIONS

In this paper, we first define offset spike to measure the degree of deviation between the actual and desired SNN firing rates. Then we analyse the distribution of offset spike and demonstrate that we can infer the specific value of the deviation according to the corresponding residual membrane potential. Furthermore, we propose an optimization method to eliminate offset spike by shifting the initial membrane potential up and down. Finally, we demonstrate the superiority of our method on CIFAR-10/100 and ImageNet datasets. Our results will further facilitate the relevant research and application of SNNs to neuromorphic chips.

A APPENDIX A.1 THE NETWORK CONFIGURATION IN THE TRAINING PROCEDURE

We choose the Stochastic Gradient Descent optimizer (Bottou, 2012) and the Cosine Annealing scheduler (Loshchilov & Hutter, 2017) to train the ANN models for 300 epochs. For CIFAR-10/100, the weight decay is set to 5 × 10⁻⁴, and the initial learning rates are 0.1 and 0.02, respectively. For ImageNet, we set the initial learning rate to 0.1 and the weight decay to 1 × 10⁻⁴. In addition, we adopt data-augmentation techniques (DeVries & Taylor, 2017; Cubuk et al., 2019; Li et al., 2021) to further improve the performance of the models. In Fig. 1, we train the source ANN with the QCFS activation function (equation 7) and then convert it to an SNN, with T = L = 4. For Figs. 1(c)-1(d), we add the constraint that the output a^{l-1} in layer l-1 of the ANN equals the output φ^{l-1}(T) of the SNN, that is, a^{l-1} = φ^{l-1}(T), and compute the offset spike with equation 8 and a^l = f(W^l φ^{l-1}(T)).

A.2 PROOF OF THEOREM

Theorem 1. Suppose that an ANN with the QCFS activation function (equation 7) is converted to an SNN with L = T, λ^l = θ^l, v^l(0) = θ^l/2, and the inputs to the l-th layer of the ANN and the SNN are the same, that is, a^{l-1} = φ^{l-1}(T). Then for any i-th element of the l-th layer, we have the following conclusions:
(i) If φ^l_i(T) > 0 and v^l_i(T) < 0, then φ^l_i(T) > a^l_i and ψ^l_i < 0.
(ii) If φ^l_i(T) < θ^l and v^l_i(T) ⩾ θ^l, then φ^l_i(T) < a^l_i and ψ^l_i > 0.

Proof. According to the preconditions and equation 5, we have:

φ^l_i(T) = Σ_{t=1}^T I^l_i(t)/T - (v^l_i(T) - θ^l/2)/T.  (S1)

If Σ_{t=1}^T I^l_i(t) ∈ [-θ^l/2, θ^l T + θ^l/2), then based on the preconditions and equation 7 we get:

a^l_i = (θ^l/T) ⌊Σ_{t=1}^T I^l_i(t)/θ^l + 1/2⌋.  (S2)

When Σ_{t=1}^T I^l_i(t) ∈ [kθ^l - θ^l/2, kθ^l + θ^l/2), k = 0, 1, ..., T, equation S2 yields a^l_i = kθ^l/T.

For (i), combining v^l_i(T) < 0 with equation S1, we have:

φ^l_i(T) = Σ_{t=1}^T I^l_i(t)/T - (v^l_i(T) - θ^l/2)/T > Σ_{t=1}^T I^l_i(t)/T + θ^l/(2T) ⩾ kθ^l/T = a^l_i.  (S3)

When Σ_{t=1}^T I^l_i(t) < -θ^l/2, we have a^l_i = 0, and by the precondition φ^l_i(T) > 0 we get φ^l_i(T) > a^l_i. In addition, if Σ_{t=1}^T I^l_i(t) ⩾ θ^l T + θ^l/2, then v^l_i(T) < 0 and equation S1 would give φ^l_i(T) > θ^l, which is impossible. Therefore φ^l_i(T) > a^l_i and ψ^l_i = (a^l_i - φ^l_i(T)) T/θ^l < 0, which proves (i).

Published as a conference paper at ICLR 2023

For (ii), we also first consider Σ_{t=1}^T I^l_i(t) ∈ [-θ^l/2, θ^l T + θ^l/2). When Σ_{t=1}^T I^l_i(t) ∈ [kθ^l - θ^l/2, kθ^l + θ^l/2), k = 0, 1, ..., T, equation S2 yields a^l_i = kθ^l/T; combining v^l_i(T) ⩾ θ^l with equation S1, we have:

φ^l_i(T) = Σ_{t=1}^T I^l_i(t)/T - (v^l_i(T) - θ^l/2)/T ⩽ Σ_{t=1}^T I^l_i(t)/T - θ^l/(2T) < kθ^l/T = a^l_i.  (S4)

When Σ_{t=1}^T I^l_i(t) ⩾ θ^l T + θ^l/2, we have a^l_i = θ^l, and by the precondition φ^l_i(T) < θ^l we get φ^l_i(T) < a^l_i. In addition, if Σ_{t=1}^T I^l_i(t) < -θ^l/2, then v^l_i(T) ⩾ θ^l and equation S1 would give φ^l_i(T) < 0, which is impossible. Therefore φ^l_i(T) < a^l_i and ψ^l_i = (a^l_i - φ^l_i(T)) T/θ^l > 0, which proves (ii).

Theorem 2. Let s^l_i(t) and ŝ^l_i(t) denote the i-th element of the binary output of the l-th layer at time-step t before and after optimization, and let v^l_i(0) and v̂^l_i(0) represent the initial membrane potential before and after optimization. Then ∀ε ∈ (0, θ^l), we have the following conclusions:
(i) If we set v̂^l_i(0) = v^l_i(0) - max(θ^l, min{v^l_i(t) | s^l_i(t) = 1} + ε), then Σ_{t=1}^T ŝ^l_i(t) = Σ_{t=1}^T s^l_i(t) - 1.
(ii) If we set v̂^l_i(0) = v^l_i(0) + max(θ^l, θ^l + ε - max{v^l_i(t) | s^l_i(t) = 0}), then Σ_{t=1}^T ŝ^l_i(t) = Σ_{t=1}^T s^l_i(t) + 1.

We use m^l_i(t) and m̂^l_i(t) to represent the accumulated potential (before the firing check) at time-step t before and after optimization. Before the proof of the theorem, we first introduce Lemma 1.

Lemma 1. For situation (i) in Theorem 2, ∃t ∈ [1, T] such that Σ_{k=1}^t s^l_i(k) = Σ_{k=1}^t ŝ^l_i(k) + 1. For situation (ii) in Theorem 2, ∃t ∈ [1, T] such that Σ_{k=1}^t s^l_i(k) = Σ_{k=1}^t ŝ^l_i(k) - 1.

Proof. For situation (i) in Theorem 2, let t_o denote the specific time at which v^l_i(t_o) = min{v^l_i(t) | s^l_i(t) = 1} ∧ s^l_i(t_o) = 1. As we optimize SNNs layer by layer, for all t we have:

m^l_i(t) = v^l_i(0) + Σ_{k=1}^t I^l_i(k) - Σ_{k=1}^{t-1} s^l_i(k)θ^l,  (S5)
m̂^l_i(t) = v̂^l_i(0) + Σ_{k=1}^t I^l_i(k) - Σ_{k=1}^{t-1} ŝ^l_i(k)θ^l.  (S6)

As v^l_i(0) > v̂^l_i(0) and the same input Σ_{k=1}^t I^l_i(k) is used before and after optimization, whenever Σ_{k=1}^{t-1} s^l_i(k)θ^l = Σ_{k=1}^{t-1} ŝ^l_i(k)θ^l we have m^l_i(t) > m̂^l_i(t) and can further derive s^l_i(t) ⩾ ŝ^l_i(t), which means that ∀t, Σ_{k=1}^t s^l_i(k)θ^l ⩾ Σ_{k=1}^t ŝ^l_i(k)θ^l.
If ∃t ′ ∈ [1, t o ), t ′ k=1 s l i (k) = t ′ k=1 s l i (k) + 1, then we have already found a qualified time t ′ . If to-1 k=1 s l i (k) = to-1 k=1 s l i (k), we will have: m l i (t o ) -m l i (t o ) = v l i (0) -v l i (0) = max θ l , min {v l i (t)|s l i (t) = 1} + ϵ = max θ l , v l i (t o ) + ϵ . (S7) As m l i (t o ) = v l i (t o ) + θ l , we will further have: m l i (t o ) = m l i (t o ) -max θ l , v l i (t o ) + ϵ ⩽ m l i (t o ) -v l i (t o ) -ϵ < θ l . (S8) From the above equation, we can derive m l i (t o ) < θ l and s l i (t o ) = 1, s l i (t o ) = 0, then we will have to k=1 s l i (k) = to k=1 s l i (k) + 1, which means that t o is a qualified time. For situation (ii) in Theorem 2, we use t o to denote the specific time when v l i (t o ) = max {v l i (t)|s l i (t) = 0} ∧ s l i (t o ) = 0. Similarly, we will derive ∀t, t k=1 s l i (k)θ l ⩽ t k=1 s l i (k)θ l according to equation S5-equation S6. If ∃t ′ ∈ [1, t o ), t ′ k=1 s l i (k) = t ′ k=1 s l i (k) -1, then we have already found a qualified time t ′ . If to-1 k=1 s l i (k) = to-1 k=1 s l i (k), we will have: m l i (t o ) -m l i (t o ) = v l i (0) -v l i (0) = max θ l , θ l + ϵ -max {v l i (t)|s l i (t) = 0} = max θ l , θ l + ϵ -v l i (t o ) . As m l i (t o ) = v l i (t o ), we will further have: m l i (t o ) = m l i (t o ) + max θ l , θ l + ϵ -v l i (t o ) ⩾ m l i (t o ) + θ l + ϵ -v l i (t o ) > θ l . (S10) From the above equation, we can derive m l i (t o ) > θ l and s l i (t o ) = 0, s l i (t o ) = 1, then we will have to k=1 s l i (k) = to k=1 s l i (k) -1, which means that t o is a qualified time. Now we will further prove Theorem 2.  Proof. Proof of (i). If θ l > min {v l i (t)|s l i (t) = 1} + ϵ, v l i (0) = v l i (0) -θ l . According to Lemma 1, ∃t s , ts k=1 s l i (k) = s l i (k) = T k=1 s l i (k) + 1. If θ l < min {v l i (t)|s l i (t) = 1} + ϵ, v l i (0) = v l i (0) -min {v l i (t)|s l i (t) = 1} -ϵ. 
According to Lemma 1, ∃t s , ts k=1 s l i (k) = ts k=1 s l i (k) + 1, then we will have m l i (t s + 1) = m l i (t s + 1) + min {v l i (t)|s l i (t) = 1} + ϵ -θ l , which means that m l i (t s + 1) > m l i (t s + 1). For m l i , if we set t ′ as the first spike firing time after t s , which means that m l i (t ′ ) = v l i (t ′ ) + θ l and t ′ -1 k=1 s l i (k) = t ′ -1 k=1 s l i (k) + 1, then we will have m l i (t ′ ) = m l i (t ′ ) -min {v l i (t)|s l i (t) = 1} - ϵ + θ l = v l i (t ′ ) -min {v l i (t)|s l i (t) = 1} -ϵ + 2θ l > θ l . which means that s l i (t ′ ) = s l i (t ′ ) = 1, t ′ k=1 s l i (k) = t ′ k=1 s l i (k) + 1. If we continue to use the above derivation process, we can finally have  T k=1 s l i (k) = T k=1 s l i (k) + 1. Proof of (ii). If θ l > θ l + ϵ -max {v l i (t)|s l i (t) = 0}, v l i (0) = v l i (0) + θ l . s l i (k) = T k=1 s l i (k) -1. If θ l < θ l + ϵ -max {v l i (t)|s l i (t) = 0}, v l i (0) = v l i (0) + θ l + ϵ -max {v l i (t)|s l i (t) = 0}. According to Lemma 1, ∃t s , ts k=1 s l i (k) = ts k=1 s l i (k) -1, then we will have m l i (t s + 1) = m l i (t s + 1) -ϵ + max {v l i (t)|s l i (t) = 0} , which means that m l i (t s + 1) < m l i (t s + 1). For m l i , if we set t ′ as the first spike firing time after t s , which means that m l i (t ′ ) = v l i (t ′ ) + θ l and t ′ -1 k=1 s l i (k) = t ′ -1 k=1 s l i (k) -1. Similar to situation (i), we need to prove that s l i (t ′ ) = s l i (t ′ ) = 1. However, it is not easy to make a direct proof. Therefore, we attempt to prove its inverse and negative thesis : when t ′ > t s ∧ t ′ -1 k=1 s l i (k) = t ′ -1 k=1 s l i (k) -1, if s l i (t ′ ) = 0, then we can have s l i (t ′ ) = 0. Under this condition, we can derive m l i (t ′ ) = v l i (t ′ ) ∧ m l i (t ′ ) = m l i (t ′ ) -ϵ + max {v l i (t)|s l i (t) = 0}, then we will have m l i (t ′ ) = v l i (t ′ ) + ϵ -max {v l i (t)|s l i (t) = 0}. 
As max {v l i (t)|s l i (t) = 0} ⩾ v l i (t ′ ), m l i (t ′ ) ⩽ ϵ < θ l , which means that s l i (t ′ ) = 0. Therefore, if we set t ′ as the first spike firing time for m l i after t s , we can prove that s l i (t ′ ) = s l i (t ′ ) = 1, which means that t ′ k=1 s l i (k) = t ′ k=1 s l i (k) -1. If we continue to use the above derivation process, we can finally have T k=1 s l i (k) = T k=1 s l i (k) -1. A.3 EXPERIMENTAL RESULTS ON CIFAR-10 DATASET Tab. S1 reports the results on CIFAR-10 dataset. For VGG-16, the accuracy of our proposed method is 95.46% with 4 time-step (ρ = 4), whereas the accuracies of OPI and QCFS are 90.96% and 94.95% with 8 time-step, respectively. For ResNet-18, we achieve 95.46% with 4 time-steps (ρ = 4), whereas the corresponding performance of OPI and QCFS are 75.44% and 95.04%. For ResNet-20, our method reaches 91.68% with 4 time-steps (ρ = 4), which is 2.13% higher than QCFS (89.55%, T=8) and 25.44% higher than OPI (66.24%, T=8). In Sections 4.4 and 5.4, we have pointed out the iterative property of our proposed method. Here we will make a discussion in detail. Firstly, we can infer the specific value of ψ l based on the residual membrane potential when the corresponding input current belongs to a specific interval, which is illustrated in the following theorem. Theorem 3. Supposing that an ANN with QCFS activation function (equation 7) is converted to an SNN with L = T, λ l = θ l , v l (0) = θ l /2, and the inputs to the l-th layer of ANN and SNN are the same, that is, a l-1 = ϕ l-1 (T ). Then for any i-th element of the l-th layer, we will have the following conclusions: If T t=1 I l i (t) ∈ [-θ l /2, θ l T + θ l /2), when v l i (T )/θ l ∈ [k, k + 1), we will have ψ l i = a l i T /θ l - T t=1 s l i (t) = k, where k ∈ Z. Proof. 
As the preconditions of Theorem 3 are same as the preconditions of equation S1 and equation S2, by combining equation S1 and equation S2, we will have: a l i T /θ l - T t=1 s l i (t) =       T t=1 I l i (t) θ l + 1 2       - T θ l ( T t=1 I l i (t) T - v l i (T ) -θ l /2 T ) = v l i (T )/θ l + T t=1 I l i (t)/θ l + 1/2 - T t=1 I l i (t)/θ l + 1/2 . (S11) As -1 < T t=1 I l i (t)/θ l + 1/2 - T t=1 I l i (t)/θ l + 1/2 ⩽ 0, when v l i (T )/θ l ∈ [k, k + 1), k -1 < a l i T /θ l - T t=1 s l i (t) < k + 1. Considering that a l i T /θ l - T t=1 s l i (t) ∈ Z, we have ψ l i = a l i T /θ l - T t=1 s l i (t) = k. In fact, even if the input current does not belong to the specific interval, from equation 7, we can derive that when can also directly determine the ψ l according to the value of ϕ l (T ). After we have already acquired the value of ψ l , we will adopt our optimization method for |ψ l i | times to eliminate the offset spike on i-th element neuron of the l-th layer. In Tab. 3, the Ratio after multiple iterations does not achieve 100%. We find that the non-zero MSE and Ratio in Tab. 3 are caused by the rounding of the floating-point numbers. Specifically, we carefully checked the Ratio, defined as the percentage of SNN input (output) equals ANN input (output) in each layer, to prove this, and we list the results in Tab. S2. We find that the Ratio of the output in layer 1 is 100%, but the Ratio of the input in layer 2 is close to 100%. Thus, the error must be caused by the floating point number precision problem in multiplication and division operations involved in the forward propagation between layer 1 and layer 2. Considering that SNNs will calculate T t=1 W l s l-1 (t)/T but ANNs will calculate W l ( T t=1 s l-1 (t)/T ) as the average input current for the l-th layer, these two corresponding inputs are not necessarily equal due to the rounding of the floating point number. 
We then conduct another experiment to show that conversion errors can be reduced to zero once floating-point rounding is eliminated. We force the input of the spiking neurons to be the same as that of the QCFS neurons in each layer and calculate the Ratio of the output. As shown in Tab. S2 (line 4), the Ratio of the output in each SNN layer is 100%, which indicates that iterating the proposed method can finally reduce the conversion error to zero.
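The floating-point effect described above is easy to reproduce in isolation. The sketch below is illustrative only (the matrix `W`, the spike trains `s`, and all sizes are arbitrary choices of ours): it contrasts the SNN-style accumulation $\sum_t W s(t)/T$ with the ANN-style computation $W(\sum_t s(t)/T)$, which are mathematically identical but rounded differently.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 8
W = rng.standard_normal((128, 256)).astype(np.float32)    # toy weight matrix
s = rng.integers(0, 2, size=(T, 256)).astype(np.float32)  # toy binary spike trains

# SNN-style accumulation: average the per-step input currents W @ s(t)
snn_input = sum(W @ s[t] for t in range(T)) / T
# ANN-style computation: one matmul on the averaged spike count
ann_input = W @ (s.sum(axis=0) / T)

# The two results agree up to rounding, but are generally not bitwise equal
print(np.max(np.abs(snn_input - ann_input)))  # small but typically nonzero
```

Because the two sides perform the additions in different orders, float32 rounding makes a small fraction of entries differ, which is exactly the source of the non-zero MSE and Ratio discussed above.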

A.5 COMPARISON WITH DIFFERENT INITIALIZATION STRATEGIES

We compare different initialization strategies on CIFAR-100 with the VGG-16 structure, including random initialization, setting v^l(0) = θ^l/2 (Bu et al., 2022b), using the residual membrane potential v^l(ρ) of the first stage as the initial membrane potential, and our proposed method. As shown in Tab. S3, our proposed method outperforms the other initialization strategies under low time-steps, which demonstrates its superiority. From Tab. S3 (line 4), we notice that using the residual membrane potential v^l(ρ) as the initial membrane potential also achieves considerable performance. Therefore, besides our proposed method, we can provide a lightweight optimization scheme: for each layer, directly select the residual membrane potential v^l(ρ) after ρ steps as the initial membrane potential of the second stage. The idea is to make v^l(T) - v^l(0) in equation 5 approach 0 and thus eliminate conversion errors (offset spikes). Tab. S4 reports further results on the ImageNet dataset. Although the performance of our lightweight optimization scheme is weaker than that of our best solution, it is still much better than the current SOTA methods, and it avoids the extra calculation of the optimal shifting distance. We compare the running time of QCFS, our lightweight scheme, and our shifting method on CIFAR-100 with the VGG-16 structure and 16 time-steps: the corresponding running times are 101 s, 101 s, and 134 s, respectively.
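As a toy illustration of the mechanics of this lightweight scheme (a single soft-reset IF neuron with made-up constants and currents of our own choosing, not the full layer-wise procedure), one can probe the neuron for ρ steps from the default initialization θ/2 and then reuse the residual potential as the initial potential of the second stage:

```python
import numpy as np

def if_neuron(currents, theta, v0):
    """Soft-reset IF neuron: returns the spike train and the final residual potential."""
    v, spikes = v0, []
    for I in currents:
        v += I                        # charge
        s = 1.0 if v >= theta else 0.0
        v -= s * theta                # soft reset keeps the surplus potential
        spikes.append(s)
    return np.array(spikes), v

theta, rho, T = 1.0, 4, 8
rng = np.random.default_rng(1)
I = rng.uniform(0.0, 0.5, size=T)     # toy input currents

# Stage 1: probe the neuron for rho steps from the default initialization theta/2
_, v_rho = if_neuron(I[:rho], theta, v0=theta / 2)

# Stage 2 (lightweight scheme): reuse the residual potential v(rho) as the new v(0),
# aiming to make v(T) - v(0) in equation 5 small
spikes, v_T = if_neuron(I, theta, v0=v_rho)
rate = spikes.mean()                  # average firing rate phi(T)
```

Note that the run obeys the conservation relation behind equation 5, i.e. the total fired charge equals v(0) plus the total input minus v(T), which is what the scheme exploits.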



to increase (or decrease) one output spike each time. Of course, this comes at a significant computational cost. In the Experiments section, we will show that the performance of the converted SNN increases with the number of iterations. Typically, we can achieve high-performance and low-latency SNNs with only one iteration.

5 EXPERIMENTS

In this section, we choose image classification datasets to validate the effectiveness and performance of our proposed methods, including CIFAR-10 (LeCun et al., 1998), CIFAR-100 (Krizhevsky et al.,



Figure 1: The distribution of offset spike in each layer. (a) and (c): VGG-16 on CIFAR-10; (b) and (d): VGG-16 on CIFAR-100. "w/ constraint" denotes the constraint ψ^{l-1} = 0, imposed through the rectification of the spikes.

Figure 2: Fig. 2(a)-2(b) / Fig. 2(c)-2(d) illustrate the two cases of shifting down/up, which correspond to (i)/(ii) in Theorem 2.

Figure 3: The distribution of offset spike before and after optimization.

Figure 4: Influence of different ρ. (a) VGG-16 on CIFAR-100, (b) ResNet-20 on CIFAR-100, (c) VGG-16 on ImageNet, (d) ResNet-34 on ImageNet.

Figure 5: The MSE of conversion error after using iterative optimization. (a): VGG-16 on CIFAR-10, (b): ResNet-20 on CIFAR-10, (c): VGG-16 on CIFAR-100, (d): ResNet-20 on CIFAR-100.



Algorithm 1: Algorithm for ANN-SNN conversion.
Require: The quantity of time-steps used to calculate the residual membrane potential ρ; the quantity of time-steps used to test the dataset T; the iteration number of the optimization strategy ItNum; the corresponding input data^l for SNN layer l; the shifting variable ϵ mentioned in Theorem 2; a pretrained QCFS ANN model f_ANN(W, λ); a dataset D.
Ensure: SNN model f_SNN(W, θ, v, s).
1:  # Convert ANN to SNN
2:  for l = 1 to f_ANN.layers do
3:    f_SNN.θ^l = f_ANN.λ^l
4:    f_SNN.v^l(0) = f_SNN.θ^l / 2
5:    f_SNN.W^l = f_ANN.W^l
6:  end for
7:  # Eliminate offset spike
8:  for (Image, label) in D do
9:    for l = 1 to f_SNN.layers do
10:     for epoch = 1 to ItNum do
11:       # Acquire the residual membrane potential
12:       for t = 1 to ρ do
13:         f_SNN.s^l((epoch - 1) × ρ + t) = f^l_SNN(data^l(t))
14:       end for
15:       # Optimize the initial membrane potential with the optimal shifting distance
16:       if the initial membrane potential needs to be shifted up according to Theorem 3 then
17:         f_SNN.v^l(epoch × ρ) = f_SNN.v^l((epoch - 1) × ρ) + max(θ^l, θ^l + ϵ - max{f_SNN.v^l(t) | f_SNN.s^l(t) = 0, t ∈ [(epoch - 1) × ρ + 1, epoch × ρ]})
18:       else if the initial membrane potential needs to be shifted down according to Theorem 3 then
19:         f_SNN.v^l(epoch × ρ) = f_SNN.v^l((epoch - 1) × ρ) - max(θ^l, ϵ + min{f_SNN.v^l(t) | f_SNN.s^l(t) = 1, t ∈ [(epoch - 1) × ρ + 1, epoch × ρ]})
20:       end if
21:     end for
22:     # Inference with the calibrated initial membrane potential
23:     for t = 1 to T do
24:       f_SNN.s^l(ItNum × ρ + t) = f^l_SNN(data^l(t))
25:       data^{l+1}(t) = f_SNN.W^l (f_SNN.s^l(ItNum × ρ + t) · f_SNN.θ^l)
26:     end for
27:   end for
28: end for
29: return f_SNN(W, θ, v, s)

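The calibration step above can be sketched for a single neuron as follows. This is a toy reconstruction under our own simplified assumptions (one soft-reset IF neuron, hand-picked constants `theta`, `eps`, and currents `I`), not the released implementation; it only demonstrates that the Theorem 2 shifts change the total spike count by exactly one.

```python
import numpy as np

def run_if_neuron(currents, theta, v0):
    """Soft-reset IF neuron: returns binary spikes s(t) and residual potentials v(t)."""
    v, spikes, residuals = v0, [], []
    for I in currents:
        v += I                       # charge
        s = 1 if v >= theta else 0   # fire at most one spike per step
        v -= s * theta               # soft reset keeps the surplus potential
        spikes.append(s)
        residuals.append(v)
    return np.array(spikes), np.array(residuals)

theta, eps = 1.0, 0.4                # eps must lie in (0, theta)
I = [0.7, 0.2, 0.8, 0.1, 0.9, 0.2]   # toy input currents
s, v = run_if_neuron(I, theta, v0=theta / 2)

# Theorem 2 (i): shift v(0) down to remove exactly one spike
down = max(theta, v[s == 1].min() + eps)
s_down, _ = run_if_neuron(I, theta, v0=theta / 2 - down)

# Theorem 2 (ii): shift v(0) up to add exactly one spike
up = max(theta, theta + eps - v[s == 0].max())
s_up, _ = run_if_neuron(I, theta, v0=theta / 2 + up)

print(s.sum(), s_down.sum(), s_up.sum())  # -> 3 2 4
```

In the full algorithm, the sign of ψ^l (inferred from the residual potential via Theorem 3) decides which of the two shifts to apply, and the shift is repeated |ψ^l_i| times per neuron.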

enables the network to adjust weights by focusing on every exact time-step. On this basis, Rathi & Roy (2021); Guo et al. (2022) further attempted the optimization of hyper-parameters and gradients. Bohte et al. (2002); Kheradpisheh & Masquelier (2020); Zhang & Li (2020) proposed timing-based learning methods, which view the specific spike firing time as significant temporal information to transmit between layers. Nevertheless, this type of method currently only applies to shallow networks. In addition, hybrid training methods have recently attracted extensive attention. Wang et al. (2022b); Rathi & Roy (2021) combined ANN-SNN conversion with BPTT to obtain higher performance under low latency. Kim et al. (2020) adopted rate-coding and time-coding simultaneously to train SNNs with fewer spikes. Mostafa (2017); Zhou et al. (2021); Zhang & Li (2020) established a linear transformation of the spike firing times of adjacent layers, which enables SNNs to be used under the training mode of ANNs. In addition, BPTT can enable calibration of the spike time in the training phase.

Table 1: Comparison with existing state-of-the-art ANN-SNN conversion methods

Table 2: Comparison with other types of SNN training methods

Table 3: The Ratio and MSE after multiple iterations


Table S1: Comparison with other ANN-SNN conversion methods on CIFAR-10 dataset

Table S2: Input/Output Ratio for each layer of an SNN with VGG-16 on CIFAR-10 dataset

Table S3: Comparison with different initialization strategies

Table S4: Comparison with the state-of-the-art ANN-SNN conversion methods


ACKNOWLEDGMENTS

This work was supported by the National Natural Science Foundation of China under Grant No. 62176003 and No. 62088102. 

CODE AVAILABILITY

Code is available at https://github.com/hzc1208/ANN2SNN_COS.

