LT-SNN: SELF-ADAPTIVE SPIKING NEURAL NETWORK FOR EVENT-BASED CLASSIFICATION AND OBJECT DETECTION

Anonymous

Abstract

Spiking neural networks (SNNs) have received increasing attention due to their high biological plausibility and energy efficiency. Binary spike-based information propagation enables efficient sparse computation, which is well suited to event-based computer vision applications. Prior works investigated direct SNN training algorithms to overcome the non-differentiability of spike generation. However, most existing works employ a fixed membrane potential threshold throughout the entire training process, which limits the dynamics of SNNs and hinders further performance optimization. Adaptiveness in the membrane potential threshold, and the resulting mismatch between SNNs and the biological nervous system, remain under-explored in prior works. In this work, we propose LT-SNN, a novel SNN training algorithm with a self-adaptive learnable potential threshold that improves SNN performance. LT-SNN optimizes the layer-wise threshold values throughout training, imitating the self-adaptiveness of the biological nervous system. To further stabilize training, we propose the separate gradient path (SGP), a simple-yet-effective method that smooths the SNN learning process. We validate the proposed LT-SNN algorithm on multiple event-based datasets covering both image classification and object detection tasks. Equipped with high adaptiveness that fully captures the dynamics of SNNs, LT-SNN achieves state-of-the-art performance with compact models: the proposed LT-SNN classification network surpasses SoTA methods with 2.71% higher accuracy and a 10.48× smaller model size, and our LT-SNN-YOLOv2 object detection model improves mAP by 0.11 over the SoTA SNN-based object detector.

1. INTRODUCTION

In the biological nervous system, cortical neurons process information by encoding spatial-temporal inputs into action potentials for spike generation. Inspired by this, spiking neural networks (SNNs) accumulate membrane potential by extracting information from the input features at each time step, and the resultant binary spikes (0 and 1) provide a sparse and succinct information representation. Such spatial-temporal computation makes SNNs an attractive AI solution with both biological plausibility and energy efficiency in comparison to conventional artificial neural networks (ANNs) (He et al., 2016). Furthermore, layer-wise processing with binary spikes elevates computational efficiency, benefiting energy-constrained applications such as edge computing.

In the context of energy-efficient AI applications, event-based cameras, or dynamic vision sensors (DVS), have emerged as an attractive and feasible solution for computer vision. Unlike conventional frame-based cameras, event cameras independently capture per-pixel illumination changes, producing an asynchronous binary stream of events (Gallego et al., 2020). The captured events are characterized by binary pixels and high temporal resolution, leading to highly sparse and energy-efficient visual representations. Such binarized spatial-temporal information naturally fits the computation mechanism of SNNs, bridging the gap between computer vision and neuromorphic computing.

As the major inspiration of deep learning, the intricate nervous system achieves remarkable performance with a high degree of dynamics. Previous neuroscience works observed a location-dependent potential threshold in nervous systems (Kole & Stuart, 2008), implying an adaptive firing procedure within the mechanism of spike generation. Inspired by this, some recent works on SNN training introduced learning dynamics into the training process, albeit to a limited degree.
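The accumulate-and-fire mechanism described above can be sketched with a leaky integrate-and-fire (LIF) neuron. The decay factor, the soft-reset rule, and all parameter values below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def lif_forward(inputs, v_th=1.0, tau=0.5):
    """Simulate a layer of leaky integrate-and-fire neurons over T time steps.

    inputs: array of shape (T, N), the input current at each time step.
    v_th:   firing threshold (fixed here; LT-SNN instead learns it per layer).
    tau:    membrane decay factor (illustrative choice).
    Returns binary spikes of shape (T, N).
    """
    T, N = inputs.shape
    v = np.zeros(N)                            # membrane potential
    spikes = np.zeros((T, N))
    for t in range(T):
        v = tau * v + inputs[t]                # leaky accumulation of input
        spikes[t] = (v >= v_th).astype(float)  # binary spike generation
        v = v - spikes[t] * v_th               # soft reset: subtract threshold
    return spikes
```

With a constant sub-threshold input, the potential builds up over several time steps before a spike is emitted and the potential is reset, which is the sparse temporal coding the text refers to.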
Fang et al. (2021b) optimized the membrane time constant throughout training, but at the cost of requiring large-sized models. DSR (Meng et al., 2022) proposed threshold-associated spikes with a learnable potential threshold; however, the deterministic ratio between the firing range and the potential threshold in DSR limits the adaptiveness of SNN learning. Sun et al. (2022) removed this constraint by directly passing the gradient of the potential threshold through the surrogate gradient (SG) function. Nevertheless, the instability of the straight-through surrogate gradient still results in sub-optimal performance compared to state-of-the-art (SoTA) SNN training with a fixed threshold (Deng et al., 2021). Although Deng et al. (2021) achieved the best performance among prior works, the fixed threshold often causes the membrane potential to overshoot, limiting the dynamics of SNNs.

The limitations of these prior works motivate us to investigate the following question: How can we optimize the potential threshold of SNNs with high stability and superior accuracy? To answer this question, we propose LT-SNN, a novel self-adaptive SNN training algorithm with a Learnable Threshold. Training from scratch, LT-SNN fully optimizes the potential threshold without introducing any additional scaling or firing constraints. To achieve highly stable training, we propose a simple-yet-effective technique, namely the Separate Gradient Path (SGP). Compared to prior works, the proposed LT-SNN algorithm fully unleashes the advantage of a layer-wise adaptive potential threshold, leading to superior performance over all prior SNN algorithms. We validate LT-SNN on multiple event-based computer vision datasets with various model architectures. LT-SNN achieves new state-of-the-art performance with light-weight or quantized models, as shown in Figure 1.
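To make the surrogate-gradient discussion above concrete, the sketch below shows how a learnable threshold can receive its own gradient through a surrogate function. The rectangular (box) surrogate and the way the two gradients are separated are illustrative assumptions; they stand in for, but are not, the paper's exact SGP formulation:

```python
import numpy as np

def spike_with_surrogate_grads(v, v_th, width=1.0):
    """Heaviside spike function with a rectangular surrogate gradient.

    Forward:  s = 1 if v >= v_th else 0 (non-differentiable at v = v_th).
    Backward (surrogate): ds/dv is approximated by a box of the given width
    centered on v_th. Since the spike depends on (v - v_th), the threshold
    gradient is the negative of the input gradient, giving v_th its own
    gradient path separate from the one flowing back through v.
    """
    s = (v >= v_th).astype(float)                       # binary spikes
    surrogate = (np.abs(v - v_th) < width / 2) / width  # box surrogate
    grad_v = surrogate              # gradient routed to the membrane potential
    grad_v_th = -surrogate.sum()    # gradient accumulated on the threshold
    return s, grad_v, grad_v_th
```

Only potentials close to the threshold contribute gradient, so a poorly placed fixed threshold starves the network of learning signal, which is one way to read the instability and overshoot issues discussed above.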



Figure 1: DVS-CIFAR10 classification accuracy of different SNN training methods. The proposed LT-SNN training algorithm achieves state-of-the-art accuracy with compact VGG models.

