LT-SNN: SELF-ADAPTIVE SPIKING NEURAL NETWORK FOR EVENT-BASED CLASSIFICATION AND OBJECT DETECTION

Anonymous

Abstract

Spiking neural networks (SNNs) have received increasing attention due to their high biological plausibility and energy efficiency. Binary spike-based information propagation enables efficient sparse computation for event-based computer vision applications. Prior works investigated direct SNN training algorithms to overcome the non-differentiability of spike generation. However, most existing works employ a fixed membrane potential threshold throughout the entire training process, which limits the dynamics of SNNs and leaves room for further performance optimization. The adaptiveness of the membrane potential threshold, and the resulting mismatch between SNNs and the biological nervous system, remain under-explored in prior works. In this work, we propose LT-SNN, a novel SNN training algorithm with a self-adaptive learnable potential threshold to improve SNN performance. LT-SNN optimizes the layer-wise threshold value throughout SNN training, imitating the self-adaptiveness of the biological nervous system. To further stabilize SNN training, we propose the separate surrogate gradient path (SGP), a simple-yet-effective method that smooths the learning process. We validate the proposed LT-SNN algorithm on multiple event-based datasets, covering both image classification and object detection tasks. Equipped with high adaptiveness that fully captures the dynamics of SNNs, LT-SNN achieves state-of-the-art performance with compact models. The proposed LT-SNN classification network surpasses SoTA methods, achieving 2.71% higher accuracy with a 10.48× smaller model size. Additionally, our LT-SNN-YOLOv2 object detection model demonstrates a 0.11 mAP improvement over the SoTA SNN-based object detector.

1. INTRODUCTION

In the biological nervous system, cortical neurons process information by encoding spatial-temporal inputs into action potentials for spike generation. Inspired by this mechanism, spiking neural networks (SNNs) accumulate membrane potential by extracting information from the input features at each time step, and the resultant binary spikes (0 and 1) provide a sparse and succinct information representation. Such spatial-temporal computation promotes SNNs as an attractive AI solution with both biological plausibility and energy efficiency compared to conventional artificial neural networks (ANNs) (He et al., 2016). Furthermore, layer-wise processing with binary spikes elevates computation efficiency, benefiting energy-constrained applications such as edge computing. In the context of energy-efficient AI applications, event-based cameras, or dynamic vision sensors (DVS), have emerged as an attractive and feasible solution for computer vision. Compared to conventional frame-based cameras, event cameras independently capture per-pixel illumination changes, resulting in an asynchronous binary stream of events (Gallego et al., 2020). Each captured event is characterized by binary pixel values and fine temporal resolution, leading to highly sparse and energy-efficient visual representations. Such binarized spatial-temporal information naturally fits the computation mechanism of SNNs, bridging the gap between computer vision and neuromorphic computing.
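The accumulate-and-fire behavior described above can be sketched with a minimal leaky integrate-and-fire (LIF) neuron. This is an illustrative toy model: the `leak` factor, the soft-reset rule, and the fixed `threshold` value are assumptions for exposition, not the specific (learnable-threshold) formulation proposed in this paper.

```python
import numpy as np

def lif_forward(inputs, threshold=1.0, leak=0.9):
    """Simulate one leaky integrate-and-fire neuron over T time steps.

    inputs: array of shape (T,), the weighted input current at each step.
    Returns the binary spike train (0/1 values) of length T.
    """
    v = 0.0
    spikes = []
    for x in inputs:
        v = leak * v + x                     # leaky accumulation of membrane potential
        s = 1.0 if v >= threshold else 0.0   # fire a binary spike when threshold is crossed
        spikes.append(s)
        v = v - s * threshold                # soft reset: subtract threshold after firing
    return np.array(spikes)

# A constant sub-threshold input fires only after enough charge accumulates,
# producing a sparse, periodic spike train.
train = lif_forward(np.full(8, 0.6), threshold=1.0)
```

With these settings the neuron alternates between charging and firing, yielding the spike train `[0, 1, 0, 1, 0, 1, 0, 1]`; raising `threshold` makes the output sparser, which is exactly the dynamic a fixed threshold freezes and a learnable threshold can tune per layer.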

