HOYER REGULARIZER IS ALL YOU NEED FOR ULTRA LOW-LATENCY SPIKING NEURAL NETWORKS

Abstract

Spiking neural networks (SNNs) have emerged as an attractive spatio-temporal computing paradigm for a wide range of low-power vision tasks. However, state-of-the-art (SOTA) SNN models either require multiple time steps, which hinders their deployment in real-time use cases, or increase the training complexity significantly. To mitigate this concern, we present a training framework (from scratch) for one-time-step SNNs that uses a novel variant of the recently proposed Hoyer regularizer. We estimate the threshold of each SNN layer as the Hoyer extremum of a clipped version of its activation map, where the clipping threshold is trained using gradient descent with our Hoyer regularizer. This approach not only downscales the value of the trainable threshold, thereby emitting a large number of spikes for weight updates within the limited number of iterations afforded by a single time step, but also shifts the membrane potential values away from the threshold, thereby mitigating the effect of noise that can degrade the SNN accuracy. Our approach outperforms existing spiking, binary, and adder neural networks in terms of the accuracy-FLOPs trade-off for complex image recognition tasks. Downstream experiments on object detection also demonstrate the efficacy of our approach. Our code will be made publicly available.

1. INTRODUCTION & RELATED WORKS

Due to their high activation sparsity and use of cheap accumulates (ACs) instead of energy-expensive multiply-and-accumulates (MACs), SNNs have emerged as a promising low-power alternative to compute- and memory-expensive deep neural networks (DNNs) (Indiveri et al., 2011; Pfeiffer et al., 2018; Cao et al., 2015). Because SNNs receive and transmit information via spikes, analog inputs have to be encoded as a sequence of spikes using techniques such as rate coding (Diehl et al., 2016), temporal coding (Comsa et al., 2020), direct encoding (Rathi et al., 2020a), and rank-order coding (Kheradpisheh et al., 2018). In addition to accommodating various forms of spike encoding, supervised training algorithms for SNNs have overcome various roadblocks associated with the discontinuous spike activation function (Lee et al., 2016; Kim et al., 2020). Moreover, previous SNN efforts propose batch normalization (BN) techniques (Kim et al., 2020; Zheng et al., 2021) that leverage the temporal dynamics with rate/direct encoding. However, most of these efforts require multiple time steps, which increases training and inference costs compared to non-spiking counterparts for static vision tasks. The training effort is high because backpropagation must integrate the gradients over an SNN that is unrolled once for each time step (Panda et al., 2020). Moreover, the multiple forward passes result in an increased number of spikes, which degrades the SNN's energy efficiency both during training and inference, and possibly offsets the compute advantage of the ACs. The multiple time steps also increase the inference complexity because of the need for input encoding logic and the increased latency associated with requiring one forward pass per time step. To mitigate these concerns, we propose one-time-step SNNs that do not require any non-spiking DNN pre-training and are more compute-efficient than existing multi-time-step SNNs.
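To make the threshold estimation described in the abstract concrete, the sketch below computes the Hoyer extremum of a clipped activation map using the common closed form ||z||_2^2 / ||z||_1; the function name, the numpy implementation, and the epsilon guard are illustrative assumptions rather than the paper's actual implementation.

```python
import numpy as np

def hoyer_extremum_threshold(act, clip_value):
    """Estimate a layer threshold as the Hoyer extremum of the clipped
    activation map z = clip(act, 0, clip_value), i.e. ||z||_2^2 / ||z||_1
    (an assumed closed form, not taken from the paper)."""
    z = np.clip(act, 0.0, clip_value)   # clip at the trainable clipping threshold
    l1 = np.abs(z).sum()                # ||z||_1
    l2_sq = (z ** 2).sum()              # ||z||_2^2
    return l2_sq / (l1 + 1e-12)         # epsilon guards an all-zero activation map
```

Because every clipped activation lies in [0, clip_value], this ratio is bounded above by the clipping value, which is consistent with the abstract's claim that the approach downscales the trainable threshold and thereby lets more activations cross it and spike.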
Without any temporal overhead, these SNNs are similar to vanilla feed-forward DNNs with Heaviside activation functions (McCulloch & Pitts, 1943). These SNNs are also similar to sparsity-induced, or uni-polar, binary neural networks (BNNs) (Wang et al., 2020b) that have 0 and 1 as their two states. However, these BNNs do not yield SOTA accuracy like the bi-polar BNNs (Diffenderfer & Kailkhura, 2021) that have +1 and -1 as their two states. A recent SNN work (Chowdhury et al., 2021) also proposed the use of one time step; however, it required CNN pre-training followed by iterative SNN training from 5 down to 1 time step, significantly increasing the training complexity, particularly for ImageNet-level tasks.

