DIET-SNN: A LOW-LATENCY SPIKING NEURAL NETWORK WITH DIRECT INPUT ENCODING & LEAKAGE AND THRESHOLD OPTIMIZATION

Abstract

Bio-inspired spiking neural networks (SNNs), operating with asynchronous binary signals (or spikes) distributed over time, can potentially lead to greater computational efficiency on event-driven hardware. State-of-the-art SNNs suffer from high inference latency, resulting from inefficient input encoding and sub-optimal settings of the neuron parameters (firing threshold and membrane leak). We propose DIET-SNN, a low-latency deep spiking network trained with gradient descent to optimize the membrane leak and the firing threshold along with the other network parameters (weights). The membrane leak and threshold of each layer of the SNN are optimized with end-to-end backpropagation to achieve competitive accuracy at reduced latency. The analog pixel values of an image are applied directly to the input layer of DIET-SNN without conversion to spike-trains. The first convolutional layer is trained to convert the inputs into spikes: leaky-integrate-and-fire (LIF) neurons integrate the weighted inputs and generate an output spike when the membrane potential crosses the trained firing threshold. The trained membrane leak controls the flow of input information and attenuates irrelevant inputs, increasing the activation sparsity in the convolutional and dense layers of the network. The reduced latency combined with high activation sparsity provides large improvements in computational efficiency. We evaluate DIET-SNN on image classification tasks from the CIFAR and ImageNet datasets on VGG and ResNet architectures. We achieve 69% top-1 accuracy with 5 timesteps (inference latency) on the ImageNet dataset with 12× less compute energy than an equivalent standard ANN. Additionally, DIET-SNN performs 20-500× faster inference compared to other state-of-the-art SNN models.

1. INTRODUCTION

In recent years, a class of neural networks inspired by the event-driven form of computation in the brain has gained popularity for its promise of low-power computing (Painkras et al., 2013; Davies et al., 2018). Spiking neural networks (SNNs) first emerged in computational neuroscience as an attempt to model the behavior of biological neurons (Mainen & Sejnowski, 1995). They were pursued for low-complexity tasks implemented on bio-plausible neuromorphic platforms. At the same time in standard deep learning, analog-valued artificial neural networks (ANNs) became the de-facto model for various computer vision and natural language processing tasks (Krizhevsky et al., 2012; Hinton et al., 2012). The skyrocketing performance and success of multi-layer ANNs came at a significant power and energy cost (Li et al., 2016). Recently, major chip maker Nvidia estimated that 80-90% of the energy cost of neural networks at data centers lies in inference processing (Freund, 2019). The tremendous energy costs and the demand for edge intelligence on battery-powered devices have shifted the focus toward lightweight, energy-efficient inference models for machine intelligence. To that effect, various techniques such as weight pruning (Han et al., 2015), model compression (He et al., 2018), and quantization (Chakraborty et al., 2020) have been proposed to reduce the size of and computation in ANNs. Nonetheless, the inherent one-shot analog computation in ANNs requires the expensive operation of multiplying two real numbers (except when both weights and activations are 1-bit (Rastegari et al., 2016)). In contrast, SNNs inherently compute and transmit information with binary signals distributed over time, providing a promising alternative for power-efficient machine intelligence. For a long time, progress in SNNs was held back by the unavailability of good learning algorithms.
But in recent years, the advent of supervised learning algorithms for SNNs has overcome many of the roadblocks surrounding the discontinuous derivative of the spike activation function. Since SNNs receive and transmit information through spikes, analog values need to be encoded into spikes. There is a plethora of input encoding methods, such as rate coding (Diehl et al., 2015; Sengupta et al., 2019), temporal coding (Comsa et al., 2020), rank-order coding (Kheradpisheh & Masquelier, 2020), and other special coding schemes (Almomani et al., 2019). Among these, rate coding has shown competitive performance on complex tasks (Diehl et al., 2015; Sengupta et al., 2019; Lee et al., 2019), while the others are limited to simple tasks like learning the XOR function and classifying digits from the MNIST dataset. Also, dynamic vision sensors (DVS) record the change in image pixel intensities and directly convert them to spikes, which can be used to estimate optical flow (Lee et al., 2020) and classify hand gestures (Shrestha & Orchard, 2018). In rate coding, the analog value is represented by the firing rate of the neuron. In each timestep, the neuron either fires (output '1') or stays inactive (output '0'). The number of timesteps 1 determines the discretization error in the representation of the analog value by the spike-train. This leads to adopting a large number of timesteps for high accuracy, at the expense of high inference latency (Sengupta et al., 2019). Two other parameters that are crucial for SNNs are the firing threshold of the neuron and the membrane potential leak. The neuron fires when the membrane potential exceeds the firing threshold, and the potential is reset after each firing. Such neurons are usually referred to as integrate-and-fire (IF) neurons.
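As a concrete illustration, rate coding with a Poisson generator can be sketched in a few lines: each pixel intensity (normalized to [0, 1]) sets the firing probability of its neuron at every timestep, and the empirical firing rate approximates the analog value more closely as the number of timesteps grows. The function name below is illustrative, not from the paper.

```python
import numpy as np

def poisson_encode(image, timesteps, rng=None):
    """Rate-code analog pixel values in [0, 1] into a binary spike-train:
    at each timestep, a fresh random number is compared against each
    pixel value, so a pixel fires with probability equal to its intensity."""
    rng = np.random.default_rng(0) if rng is None else rng
    return (rng.random((timesteps,) + image.shape) < image).astype(np.uint8)

# Longer spike-trains reduce the discretization error of the rate estimate.
img = np.array([[0.1, 0.9]])
spikes = poisson_encode(img, timesteps=1000)  # shape: (1000, 1, 2)
rates = spikes.mean(axis=0)                   # approximates the pixel values
```

Note that the randomness is regenerated every timestep, which is the per-step overhead that DIET-SNN's direct input encoding avoids.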
The threshold value is very significant for the correct operation of SNNs: a high threshold will prevent the neuron from firing (the 'dead-neuron' problem), while a low threshold will lead to unnecessary firing, impairing the neuron's ability to differentiate between two input patterns. Another neuron model, the leaky-integrate-and-fire (LIF) neuron, introduces a leak factor that allows the membrane potential to decay over time (Gerstner & Kistler, 2002). Most of the recent work on supervised learning in SNNs has employed either the IF or the LIF neuron model (Diehl et al., 2015; Sengupta et al., 2019; Lee et al., 2019; Rathi et al., 2020; Han et al., 2020). Some proposals adopt kernel-based spike response models (Huh & Sejnowski, 2018; Bohte et al., 2000), but for the most part these approaches show limited performance on simple datasets and do not scale to deep networks. The leak provides an additional knob that can potentially be used to tune SNNs for better energy-efficiency. However, there has not been any exploration of the full design space of optimizing the leak and the threshold to achieve a better latency (or energy) and accuracy tradeoff. Research, so far, has mainly focused on using a fixed leak for the entire network, which can limit the capabilities of SNNs (Lee et al., 2019; Rathi et al., 2020). The firing thresholds are also fixed (Lee et al., 2019) or selected based on heuristics (Sengupta et al., 2019; Rueckauer et al., 2017). Some recent works employ leak/threshold optimization, but their application is limited to simple datasets (Fang, 2020; Yin et al., 2020). The current challenges in SNN models are high inference latency and energy, long training time, and high training costs in terms of memory and computation. Most of these challenges arise from inefficient input encoding and improper methods of selecting the membrane leak and the threshold.
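The IF and LIF dynamics described above can be captured in a minimal discrete-time sketch. The leak value, soft reset, and function name below are illustrative assumptions, not the paper's exact formulation; the point is how the leak factor attenuates weak inputs and raises activation sparsity.

```python
import numpy as np

def lif_forward(weighted_inputs, leak=0.9, threshold=1.0):
    """Simulate one leaky-integrate-and-fire (LIF) neuron over T timesteps.

    weighted_inputs: array of shape (T,) holding the weighted input at each
    step. leak in (0, 1] scales the membrane potential every step (leak = 1
    recovers the IF neuron); the potential is reduced by the threshold
    (a 'soft reset') whenever a spike is emitted."""
    u, spikes = 0.0, []
    for x in weighted_inputs:
        u = leak * u + x           # leaky integration of weighted input
        s = float(u >= threshold)  # fire when potential crosses threshold
        u -= s * threshold         # soft reset after each firing
        spikes.append(s)
    return np.array(spikes)

# A smaller leak attenuates a weak, constant input: with leak = 0.5 the
# potential converges to 0.6 and never reaches threshold, so the neuron
# stays silent, i.e. higher activation sparsity than the IF case.
x = np.full(20, 0.3)
dense = lif_forward(x, leak=1.0)   # IF neuron: fires periodically
sparse = lif_forward(x, leak=0.5)  # LIF neuron: no spikes at all
```

With leak = 0.5 the fixed point of u = 0.5u + 0.3 is u = 0.6 < threshold, which is exactly the 'attenuation of irrelevant inputs' the abstract refers to.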
To address these challenges, this paper makes the following contributions:
• We propose a gradient-descent-based training method that learns the correct membrane leak and firing threshold for each layer of a deep spiking network via error backpropagation. The goal is to jointly optimize the neuron parameters (membrane leak and threshold) and the network parameters (weights) to achieve high accuracy at low inference latency. Tailoring the membrane leak and threshold to each layer leads to large improvements in activation sparsity and energy-efficiency.
• We train the first convolutional layer to act as the spike-generator, whose spike-rate is a function of the weights, membrane leak, and threshold. This also eliminates the need for a generator function (and associated overheads) used in other coding schemes 2 .
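The first contribution can be sketched as a discrete-time LIF layer whose leak and threshold are trainable parameters, made differentiable by a surrogate gradient. This is a minimal illustration under assumptions: the class names, the triangular surrogate shape, and the soft reset are ours, not the paper's definitive implementation.

```python
import torch
import torch.nn as nn

class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike in the forward pass, with a piecewise-linear
    surrogate gradient so the discontinuous firing function can be
    back-propagated to the membrane potential AND the threshold."""
    @staticmethod
    def forward(ctx, u, threshold):
        ctx.save_for_backward(u, threshold)
        return (u >= threshold).float()

    @staticmethod
    def backward(ctx, grad_out):
        u, threshold = ctx.saved_tensors
        # Triangular surrogate centred on the threshold (an assumption).
        surrogate = torch.clamp(1.0 - torch.abs(u - threshold), min=0.0)
        grad_u = grad_out * surrogate
        grad_threshold = -(grad_out * surrogate).sum()  # d(spike)/d(threshold) < 0
        return grad_u, grad_threshold

class TrainableLIF(nn.Module):
    """LIF layer whose membrane leak and firing threshold are learned
    jointly with the weights, as in the DIET-SNN idea (initial values
    below are illustrative)."""
    def __init__(self):
        super().__init__()
        self.leak = nn.Parameter(torch.tensor(0.9))
        self.threshold = nn.Parameter(torch.tensor(1.0))

    def forward(self, x):  # x: (timesteps, batch, features)
        u, out = torch.zeros_like(x[0]), []
        for xt in x:
            u = self.leak * u + xt                       # leaky integration
            s = SurrogateSpike.apply(u, self.threshold)  # spike generation
            u = u - s * self.threshold                   # soft reset
            out.append(s)
        return torch.stack(out)

lif = TrainableLIF()
spikes = lif(torch.rand(5, 2, 4))  # 5 timesteps, batch 2, 4 features
spikes.sum().backward()            # gradients reach leak and threshold
```

Because both `leak` and `threshold` sit inside the unrolled computation graph, a single optimizer step updates them alongside the weights, which is what allows each layer to settle on its own leak/threshold pair.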



1 Wall-clock time for 1 'timestep' depends on the number of computations performed and the underlying hardware (Frady et al., 2020). In simulation, 1 timestep is the time taken to perform 1 forward pass.
2 For rate coding, a Poisson generator is used to convert the analog values to spike-trains (Diehl et al., 2015). The encoder generates a random number every timestep and compares it with the analog value to produce a spike.

