A UNIFIED OPTIMIZATION FRAMEWORK OF ANN-SNN CONVERSION: TOWARDS OPTIMAL MAPPING FROM ACTIVATION VALUES TO FIRING RATES

Anonymous authors
Paper under double-blind review

Abstract

Spiking Neural Networks (SNNs) have attracted great attention as a primary candidate for running large-scale deep artificial neural networks (ANNs) in real-time due to their distinctive properties of energy-efficient, event-driven fast computation. Training an SNN directly from scratch is usually difficult because of the discreteness of spikes. Converting an ANN to an SNN, i.e., ANN-SNN conversion, is an alternative method to obtain deep SNNs. The performance of the converted SNN is determined by both the ANN performance and the conversion error. Existing ANN-SNN conversion methods usually redesign the ANN with a new activation function in place of the regular ReLU, train the tailored ANN, and convert it to an SNN. The performance loss between the regular ANN with ReLU and the tailored ANN has never been considered, and this loss is inherited by the converted SNN. In this work, we formulate ANN-SNN conversion as a unified optimization problem that considers the performance loss between the regular ANN and the tailored ANN, as well as the conversion error, simultaneously. Following the unified optimization framework, we propose the SlipReLU activation function to replace the regular ReLU activation function in the tailored ANN. The SlipReLU is a weighted sum of the threshold-ReLU and the step function, and it improves on either as an activation function alone. The SlipReLU method covers a family of activation functions mapping from activation values in source ANNs to firing rates in target SNNs; most of the state-of-the-art optimal ANN-SNN conversion methods are special cases of our proposed SlipReLU method. We demonstrate through two theorems that the expected conversion error between SNNs and ANNs can theoretically be zero over a range of shift values δ ∈ [-1/2, 1/2] rather than at a fixed shift term 1/2, enabling us to achieve converted SNNs with high accuracy and ultra-low latency.
We evaluate our proposed SlipReLU method on the CIFAR-10/100 and Tiny-ImageNet datasets, and the results show that SlipReLU outperforms both state-of-the-art ANN-SNN conversion methods and directly trained SNNs in accuracy and latency. To our knowledge, this is the first work to explore a high-performance ANN-SNN conversion method that considers the ANN performance and the conversion error simultaneously, with ultra-low latency, especially for 1 time-step (T = 1).

1. INTRODUCTION

Spiking neural networks (SNNs) are biologically inspired neural networks based on biologically plausible spiking neuron models that process real-time signals (Hodgkin & Huxley, 1952; Izhikevich, 2003). With the significant advantages of low power consumption and fast inference on neuromorphic hardware (Roy et al., 2019), SNNs are therefore becoming a primary candidate for running large-scale deep artificial neural networks (ANNs) in real-time. The most commonly used neuron model in SNNs is the Integrate-and-Fire (IF) neuron model (Liu & Wang, 2001). Each neuron in an SNN emits a spike only when its accumulated membrane potential exceeds the threshold voltage; otherwise, it stays inactive in the current time-step. This setting makes SNNs more similar to biological neural networks. Compared to ANNs, event-driven SNNs have binarized/spiking activation values, resulting in low energy consumption when implemented on specialized neuromorphic hardware. Another significant property of SNNs is the pseudo-simultaneity of their inputs and outputs for making inferences in a spatial-temporal paradigm. Compared to conventional ANNs, which present a whole input vector at once and process it layer-by-layer to produce one output value, the forward pass of an SNN can efficiently process streaming time-varying inputs. Generally, there are two distinct routes to obtain an SNN: (1) training an SNN from scratch (Wu et al., 2018; Neftci et al., 2019; Zenke & Vogels, 2021), and (2) ANN-SNN conversion (Cao et al., 2015; Diehl et al., 2015; Deng & Gu, 2021), i.e., converting ANNs to SNNs. Training from scratch uses a gradient-based supervised optimization method in back-propagation, pretending that SNNs are specialized ANNs. Due to the non-differentiability of the binary activation function in SNNs, surrogate gradients are usually used (Neftci et al., 2019), which essentially optimizes different networks in the forward and backward passes.
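The IF neuron dynamics described above can be sketched in a few lines. This is a minimal single-neuron simulation assuming the reset-by-subtraction ("soft reset") variant commonly used in ANN-SNN conversion; the threshold value and input scheme are illustrative.

```python
def if_neuron(inputs, v_th=1.0):
    """Simulate a single Integrate-and-Fire (IF) neuron over T time-steps.

    `inputs` is a sequence of input currents, one per time-step. The
    neuron accumulates membrane potential, emits a spike (1) when the
    potential reaches the threshold `v_th`, and subtracts the threshold
    on firing (soft reset); otherwise it stays silent (0) at that step.
    """
    v = 0.0
    spikes = []
    for current in inputs:
        v += current            # integrate the input current
        if v >= v_th:           # fire once the threshold is reached
            spikes.append(1)
            v -= v_th           # reset by subtracting the threshold
        else:
            spikes.append(0)
    return spikes

# A constant input of 0.25 with threshold 1.0 fires every 4th step.
print(if_neuron([0.25] * 8))  # [0, 0, 0, 1, 0, 0, 0, 1]
```

Note that under a constant input, the spike count over T steps encodes the input magnitude as a firing rate, which is exactly the quantity ANN-SNN conversion maps activation values onto.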
This method can only train SNNs on small- and moderate-size datasets (Li et al., 2021). ANN-SNN conversion is an effective method to obtain deep SNNs with performance comparable to ANNs on large-scale datasets. There are two main types of ANN-SNN conversion mechanisms: (1) one-step conversion, which converts the pre-trained ANN to an SNN without changing the architecture of the pre-trained ANN, for example Diehl et al. (2015); Li et al. (2021), and (2) two-step conversion, which involves redesigning the ANN, training it and converting it to an SNN, for example Cao et al. (2015); Deng & Gu (2021); Bu et al. (2021). In this work, we investigate two-step ANN-SNN conversion methods, where one usually redesigns the ANN by replacing the regular ReLU activation function with a new activation function, trains the tailored ANN, and converts it to an SNN. A tailored ANN that deviates too much from the regular ANN will suffer degraded performance, and this performance loss is inherited by the converted SNN. However, the performance degradation between the regular ANN and the tailored ANN has never been considered in existing ANN-SNN conversion studies. To achieve high-accuracy and low-latency SNNs (e.g., 1 or 2 time-steps), we are the first to consider the performance loss between the regular ANN with ReLU and the tailored ANN, as well as the conversion error, simultaneously. Our main contributions are summarized as follows: (1) We formulate ANN-SNN conversion as a unified optimization problem that considers the ANN performance and the conversion error simultaneously. (2) We propose the SlipReLU activation function for the tailored ANN, in order to minimize the layer-wise conversion error while keeping the tailored ANN's performance as good as the regular ANN's. (3) The SlipReLU method covers a family of activation functions mapping from activation values in source ANNs to firing rates in target SNNs; most of the state-of-the-art optimal ANN-SNN conversion methods are special cases of our proposed SlipReLU method.
(4) We demonstrate through two theorems that the expected conversion error between SNNs and ANNs can theoretically be zero over a range of shift values δ ∈ [-1/2, 1/2] rather than at a fixed shift 1/2. Experimental results also demonstrate the effectiveness of the proposed SlipReLU method.
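To make the idea of "a weighted sum of the threshold-ReLU and the step function" concrete, here is a hypothetical sketch of one plausible SlipReLU-style activation. The parameter names (mixing weight `c`, threshold `lam`, shift `delta`) and the exact placement of the step are illustrative assumptions for this sketch, not the paper's definitive parameterization.

```python
def slip_relu(z, c=0.5, lam=1.0, delta=0.0):
    """Illustrative SlipReLU-style activation (not the paper's exact form).

    Weighted sum of a threshold-ReLU (a ramp clipped to [0, lam]) and a
    step function; `c` in [0, 1] is the mixing weight, `lam` the firing
    threshold, and `delta` a shift in [-1/2, 1/2] that moves the step.
    """
    # Threshold-ReLU: linear between 0 and lam, clipped outside that range.
    ramp = min(max(z, 0.0), lam)
    # Step function: jumps from 0 to lam at a (shifted) midpoint of the ramp.
    step = lam if z >= (0.5 - delta) * lam else 0.0
    return c * step + (1.0 - c) * ramp
```

With c = 0 this sketch reduces to a pure threshold-ReLU, and with c = 1 to a pure step function, which matches the claim that existing conversion methods built on either extreme are special cases of the family.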

2. PRELIMINARIES

Given a classification problem on an image dataset (x, y) ∈ D, where y ∈ {1, · · · , C} is the true class label for image x ∈ R^m, we train a neural network f : x → f(x) in the form of an ANN/SNN by optimizing the standard cross-entropy (CE) loss, L_CE(y, p) = -∑_{c=1}^{C} y_c log(p_c), where y_c and p_c are the c-th elements of the label y and the network prediction p = f(x). Since the infrastructures of the source ANN and target SNN are the same, we use the same notation f when it is unambiguous, and f_ANN or f_SNN otherwise. For the notations, refer to Table S1.

ANN Neuron Model. In a conventional ANN, a whole input vector is presented to the network at one time and processed layer-by-layer through continuous activations to produce one output value. The forward computation of analog neurons is formulated as a^(ℓ) = F_ANN(z^(ℓ)) = F_ANN(W^(ℓ) a^(ℓ-1)), where z^(ℓ) and a^(ℓ) are the pre-activation and post-activation vectors of the ℓ-th layer, W^(ℓ) denotes the weight matrix, and F_ANN(·) is the activation function of the ANN.

SNN Neuron Model. Compared with an ANN, an SNN employs binary activations (i.e., spikes) in each layer. To compensate for the weak representation capacity of binary activations, the time dimension (or latency) is introduced to the SNN: the inputs of the forward pass are presented as streams of events, and the forward pass is repeated for T time-steps to obtain the final result.
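The two neuron models above can be contrasted in a short sketch. This assumes IF neurons with reset-by-subtraction and a constant input repeated at every time-step; it illustrates how the SNN's firing rate over T steps approximates the ANN's (threshold-clipped) post-activation, which is the mapping that ANN-SNN conversion relies on.

```python
import numpy as np

def ann_layer(x, W, v_th=1.0):
    """ANN forward pass for one layer: threshold-ReLU post-activation,
    i.e. W @ x clipped to [0, v_th]."""
    return np.clip(W @ x, 0.0, v_th)

def snn_layer_rate(x, W, T=100, v_th=1.0):
    """SNN forward pass for one layer of IF neurons (soft reset).

    The same input is presented at every time-step; the firing rate
    (spike count / T, scaled by v_th) approximates the ANN activation
    with an error that shrinks as T grows.
    """
    v = np.zeros(W.shape[0])
    spike_count = np.zeros(W.shape[0])
    for _ in range(T):
        v += W @ x                  # integrate the weighted input
        fired = v >= v_th           # neurons at threshold emit a spike
        spike_count += fired
        v -= fired * v_th           # soft reset: subtract the threshold
    return v_th * spike_count / T   # firing rate as an activation value

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3)) * 0.3
x = rng.random(3)
print(ann_layer(x, W))
print(snn_layer_rate(x, W, T=200))  # close to the ANN output for large T
```

Increasing T tightens the approximation, which is precisely the accuracy-latency trade-off that motivates conversion methods targeting ultra-low T.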




