CBP-QSNN: SPIKING NEURAL NETWORKS QUANTIZED USING CONSTRAINED BACKPROPAGATION

Abstract

Spiking Neural Networks (SNNs) support sparse, event-based data processing at high power efficiency when implemented on event-based neuromorphic processors. However, the limited on-chip memory capacity of neuromorphic processors strictly limits the depth and width of implementable SNNs. A direct solution is to use quantized SNNs (QSNNs) in place of SNNs with FP32 weights. To this end, we propose a method to quantize the weights using constrained backpropagation (CBP), whose objective is a Lagrangian function (the conventional loss function plus well-defined weight-constraint functions). This work uses CBP as a post-training algorithm for deep SNNs pre-trained with various state-of-the-art methods, including direct training (TSSL-BP, STBP, and surrogate gradient) and DNN-to-SNN conversion (SNN-Calibration), validating CBP as a general framework for QSNNs. CBP-QSNNs achieve high accuracy: the accuracy degradation on CIFAR-10, DVS128 Gesture, and CIFAR10-DVS is less than 1% in the worst case. Notably, CBP-QSNNs for SNN-Calibration-pretrained SNNs on CIFAR-100 show an unexpectedly large accuracy increase of 3.72% while using only a small weight memory (3.5% of the FP32 case).
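As a rough illustration of the idea, the sketch below performs gradient descent on a Lagrangian of the form task loss plus multiplier-weighted constraint, where the constraint vanishes only at the binary weight levels {-1, +1}. The specific constraint form, the toy quadratic loss, the target values, and the step sizes are all illustrative assumptions; they are not the paper's exact formulation.

```python
import numpy as np

def constraint(w):
    # Hypothetical per-weight constraint: zero only at the binary
    # levels {-1, +1}. A stand-in for the paper's weight-constraint
    # functions, not the authors' exact choice.
    return (w ** 2 - 1.0) ** 2

def constraint_grad(w):
    return 4.0 * w * (w ** 2 - 1.0)

# Toy task loss pulling weights toward arbitrary FP32 "pre-trained" values.
targets = np.array([0.3, -0.8, 1.4])

def loss_grad(w):
    return w - targets

w = targets.copy()        # start from the pre-trained FP32 weights
lam = np.zeros_like(w)    # one Lagrange multiplier per weight
lr, lam_step = 0.05, 0.05

for _ in range(2000):
    # Descent on the Lagrangian: loss(w) + sum(lam * constraint(w))
    w -= lr * (loss_grad(w) + lam * constraint_grad(w))
    # Ascent on the multipliers: unmet constraints are penalized harder
    lam += lam_step * constraint(w)

# Each weight has now settled near its nearest binary level,
# while the multipliers have grown just enough to enforce the constraint.
```

The dual update on the multipliers is what distinguishes this from a fixed-penalty method: the penalty strength adapts per weight until its constraint is (approximately) satisfied.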

1. INTRODUCTION

Spiking Neural Networks (SNNs) are time-dependent models built from spiking neurons whose dynamics, in conjunction with synaptic current dynamics, constitute the rich dynamics of SNNs (Jeong, 2018). Deep SNNs are clearly distinguished from deep neural networks (DNNs) in that (i) presynaptic spiking neurons send 1-bit data (spikes, a.k.a. events) to their postsynaptic neurons, unlike DNN nodes, which send real-valued activations to the nodes in the next layer, and (ii) SNN operations are based on asynchronous sparse spikes, unlike DNNs, which are based on layerwise synchronous activation calculations (Jeong, 2018; Pfeiffer & Pfeil, 2018). These distinct features endow SNNs with high power efficiency given minimal data movement and high sparsity in operations. Yet, SNNs leverage this efficiency only when implemented on neuromorphic processors that support event-based operations. Neuromorphic processor design technologies are diverse, e.g., mixed analog/digital circuits (Merolla et al., 2014a; Moradi et al., 2018; Neckar et al., 2019) and fully digital circuits (Merolla et al., 2014b; Davies et al., 2018; Frenkel et al., 2018; Kornijcuk et al., 2019). Albeit diverse, all designs commonly suffer from limited on-chip memory (SRAM) capacity. The on-chip memory is mainly assigned to neurons (state variables and hyperparameters), synapses (weights, state variables, and hyperparameters), and the event router (lookup tables). The largest portion of on-chip memory is dedicated to synaptic weights, given the significant number of synapses in a deep SNN. Additionally, most neuromorphic processors hardly allow weight reuse for convolutional SNNs because they are designed for dense SNNs. Although some compilers support weight reuse, e.g., NXTF for Loihi (Rueckauer et al., 2021), the weight-reuse rate is still far below the ideal rate. Consequently, the limited on-chip memory capacity strictly limits the size (depth and width) of SNNs implementable on neuromorphic processors.
Considering the limited on-chip memory capacity, attempts have been made to reduce synaptic weight-memory usage, including unstructured SNN pruning (Neftci et al., 2016; Rathi et al., 2019; Martinelli et al., 2020; Chen et al., 2021; Deng et al., 2021; Kim et al., 2022; Chen et al., 2022) and weight quantization (Rueckauer et al., 2017; Yousefzadeh et al., 2018; Srini-

