DCT-SNN: USING DCT TO DISTRIBUTE SPATIAL INFORMATION OVER TIME FOR LEARNING LOW-LATENCY SPIKING NEURAL NETWORKS

Anonymous

Abstract

Spiking Neural Networks (SNNs) offer a promising alternative to traditional deep learning frameworks, since their event-driven information processing provides higher computational efficiency. SNNs distribute the analog values of pixel intensities into binary spikes over time. However, the most widely used input coding schemes, such as Poisson-based rate coding, do not effectively leverage the additional temporal learning capability of SNNs. Moreover, these SNNs suffer from high inference latency, which is a major bottleneck to their deployment. To overcome this, we propose a scalable time-based encoding scheme that utilizes the Discrete Cosine Transform (DCT) to reduce the number of timesteps required for inference. DCT decomposes an image into a weighted sum of sinusoidal basis images. At each timestep, a single frequency basis image, taken in order and modulated by its corresponding DCT coefficient, is input to an accumulator that generates spikes upon crossing a threshold. We use the proposed scheme to learn DCT-SNN, a low-latency deep SNN with leaky-integrate-and-fire neurons, trained using surrogate-gradient-based backpropagation. We achieve top-1 accuracies of 89.94%, 68.3% and 52.43% on CIFAR-10, CIFAR-100 and TinyImageNet, respectively, using VGG architectures. Notably, DCT-SNN performs inference with 2-14X reduced latency compared to other state-of-the-art SNNs, while achieving accuracy comparable to its standard deep learning counterparts. The dimension of the transform controls the number of timesteps required for inference. Additionally, accuracy can be traded off against latency in a principled manner by dropping the highest-frequency components during inference.
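As a minimal sketch of the encoding described above, the following snippet builds the 2-D DCT basis images of a small input, presents one basis image per timestep scaled by its DCT coefficient, and emits a spike wherever a per-pixel accumulator crosses a threshold. The 8x8 block size, threshold value, row-major (rather than zig-zag) frequency ordering, and soft reset by subtraction are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def dct2_basis(n):
    """Orthonormal 1-D DCT-II basis matrix of shape (n, n); row u is frequency u."""
    k = np.arange(n)
    basis = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    basis[0, :] *= 1 / np.sqrt(2)
    return basis * np.sqrt(2 / n)

def dct_encode(img, threshold=1.0):
    """Yield one binary spike map per timestep for a square image.

    At timestep t, the t-th 2-D frequency basis image, scaled by its DCT
    coefficient, is added to a per-pixel accumulator; pixels that cross
    the threshold spike and are reset by subtraction (an assumption).
    """
    n = img.shape[0]
    B = dct2_basis(n)
    coeffs = B @ img @ B.T                     # 2-D DCT coefficients
    acc = np.zeros_like(img, dtype=float)      # membrane-like accumulator
    for u in range(n):
        for v in range(n):
            basis_img = np.outer(B[u], B[v])   # (u, v) sinusoidal basis image
            acc += coeffs[u, v] * basis_img
            spikes = acc >= threshold
            acc[spikes] -= threshold           # soft reset on spiking pixels
            yield spikes

img = np.random.rand(8, 8)
spike_train = list(dct_encode(img))            # n*n = 64 timesteps for 8x8 input
print(len(spike_train), spike_train[0].shape)  # prints: 64 (8, 8)
```

Because the basis is orthonormal, summing all timestep contributions reconstructs the image exactly, so the spike train preserves the full spatial information while distributing it over n*n timesteps; truncating the loop early drops only the highest-frequency components.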

1. INTRODUCTION

Deep learning networks have tremendously improved state-of-the-art performance on many tasks such as object detection, classification and natural language processing (Krizhevsky et al., 2012; Hinton et al., 2012; Deng & Liu, 2018). However, such architectures are extremely energy-intensive (Li et al., 2016) and hence require custom architectures and training methodologies for edge deployment (Howard et al., 2017). To address this, Spiking Neural Networks (SNNs) have emerged as a promising alternative to traditional deep learning architectures (Maass, 1997; Roy et al., 2019). SNNs are bio-plausible networks inspired by the learning mechanisms observed in mammalian brains. They are analogous in structure to standard networks, but perform computation in the form of spikes instead of fully analog values. For the rest of this paper, we refer to standard networks as Analog Neural Networks (ANNs) to distinguish them from their spiking counterparts with digital (spiking) inputs. The inputs and the correspondingly generated activations in SNNs are all binary spikes, and inference is performed by accumulating the spikes over time. This can be visualized as distributing the one-step inference of an ANN into a multi-step, very sparse inference scheme in the SNN. The primary source of the energy efficiency of SNNs is that very few neurons spike at any given timestep. This event-driven computation, together with the replacement of every multiply-accumulate (MAC) operation in the ANN by an addition in the SNN, allows SNNs to infer with less energy. This energy benefit can be further enhanced using custom SNN implementations with architectural modifications (Ju et al., 2020). Li et al. (2017) have released a spiking version of the CIFAR-10 dataset based on inputs from neuromorphic sensors. IBM has designed a non-commercial processor, 'TrueNorth' (Akopyan et al., 2015), and Intel has designed its equivalent, 'Loihi' (Davies et al., 2018), that can train and infer on SNNs, and Blouw et al. (2019) have shown

