RETHINKING DEEP SPIKING NEURAL NETWORKS: A MULTI-LAYER PERCEPTRON APPROACH

Abstract

By adopting deep convolutional architectures, spiking neural networks (SNNs) have recently achieved performance competitive with their artificial counterparts in image classification, at a much lower computation cost owing to event-driven and sparse activation. However, the multiplication-free inference (MFI) principle makes SNNs incompatible with attention and transformer mechanisms, which have shown significant performance gains on high-resolution vision tasks. Inspired by recent work on multi-layer perceptrons (MLPs), we explore an efficient spiking MLP design that uses batch normalization instead of layer normalization in both the token and the channel blocks to remain compatible with MFI. We further strengthen the network's local feature learning ability with a spiking patch encoding layer, which significantly improves network performance. Building on these blocks, we explore an optimal skip connection configuration and develop an efficient multi-stage spiking MLP network that combines a global receptive field with local feature extraction, achieving full spike-based computation. Without pre-training or other advanced SNN training techniques, the spiking MLP network achieves 66.39% top-1 accuracy on the ImageNet-1K dataset, surpassing the state-of-the-art directly trained spiking ResNet-34 by 2.67% under similar model capacity, while requiring fewer simulation steps and much less computation. A larger variant of the network achieves 68.84% top-1 accuracy, rivaling the spiking VGG-16 network with a 4 times smaller model capacity. Our work demonstrates the effectiveness of an alternative deep SNN architecture that combines global and local learning abilities. Interestingly, we also show a close resemblance between the trained receptive fields of our network and those of cells in the cortex. Code will be publicly available.

1. INTRODUCTION

Spiking neural networks (SNNs) (Maass, 1997) have been proposed as models for cortical simulation (Izhikevich, 2004; Brette & Gerstner, 2005; Deco et al., 2008; Gerstner et al., 2014; Korcsak-Gorzo et al., 2022) and as candidates for solving machine learning problems (Tavanaei et al., 2019; Roy et al., 2019). Nevertheless, SNNs that follow exact biological topology and constraints, such as Dale's law, have not been demonstrated to be as effective as artificial neural networks (ANNs) in practice, especially when scaled up. By adopting structures and adapting learning algorithms from their artificial counterparts, SNNs have improved in performance and recently achieved higher accuracy on benchmark image classification problems (Shrestha & Orchard, 2018; Wu et al., 2019; Sengupta et al., 2019; Li et al., 2021; Fang et al., 2021; Deng et al., 2022). Deep convolutional neural networks (CNNs) are the current de facto architectures adopted by SNNs for various vision tasks (Kim et al., 2020; 2021; Rançon et al., 2021; Zhu et al., 2022). Recently, ANNs with visual attention and transformer mechanisms (Dosovitskiy et al., 2020; Liu et al., 2021) have surpassed pure CNNs by learning global dependencies in the image. However, these mechanisms usually involve matrix multiplication and the softmax function, which contradict the multiplication-free inference (MFI) principle of SNNs (Roy et al., 2019; Rathi & Roy, 2021). In this work, we explore a spike-based implementation of an alternative structure more compatible with this principle, i.e., multi-layer perceptrons (MLPs), which have recently been demonstrated to be as efficient as transformers (Tolstikhin et al., 2021). The original MLP-Mixer architecture for ANNs still involves real-valued matrix multiplication, which violates MFI. To this end, we design a
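The MFI principle referenced above can be illustrated with a minimal NumPy sketch (an illustration only, not the paper's implementation): when the input to a fully connected layer is a binary spike vector, the usual matrix-vector product reduces to accumulating the weight columns of the neurons that fired, so inference needs only additions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Real-valued weights of a fully connected layer (stored in memory).
W = rng.standard_normal((4, 8))

# Binary spike vector from the previous layer: each entry is 0 or 1.
s = (rng.random(8) < 0.5).astype(np.float64)

# Dense formulation: a matrix-vector product (uses multiplications).
dense_out = W @ s

# MFI formulation: because s is binary, the same result is obtained by
# accumulating the weight columns of neurons that fired -- additions only.
mfi_out = W[:, s.astype(bool)].sum(axis=1)

assert np.allclose(dense_out, mfi_out)
```

By contrast, attention computes softmax over products of two real-valued activation matrices, so neither factor is binary and this reduction to pure accumulation is no longer possible.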

