RETHINKING DEEP SPIKING NEURAL NETWORKS: A MULTI-LAYER PERCEPTRON APPROACH

Abstract

By adopting deep convolutional architectures, spiking neural networks (SNNs) have recently achieved performance competitive with their artificial counterparts in image classification, while incurring much lower computation cost thanks to event-driven, sparse activation. However, the multiplication-free inference (MFI) principle makes SNNs incompatible with attention and transformer mechanisms, which have shown significant performance gains on high-resolution vision tasks. Inspired by recent work on multi-layer perceptrons (MLPs), we explore an efficient spiking MLP design that uses batch normalization instead of layer normalization in both the token and the channel block, making it compatible with MFI. We further strengthen the network's local feature learning ability with a spiking patch encoding layer, which significantly improves network performance. Based on these building blocks, we explore an optimal skip connection configuration and develop an efficient multi-stage spiking MLP network that combines a global receptive field with local feature extraction, achieving full spike-based computation. Without pre-training or other advanced SNN training techniques, the spiking MLP network achieves 66.39% top-1 accuracy on the ImageNet-1K dataset, surpassing the state-of-the-art directly trained spiking ResNet-34 by 2.67% under similar model capacity, while using shorter simulation steps and much less computation. A larger variant of the network achieves 68.84% top-1 accuracy, rivaling the spiking VGG-16 network with 4 times smaller model capacity. Our work demonstrates the effectiveness of an alternative deep SNN architecture that combines global and local learning abilities. Finally, we show a close resemblance between the trained receptive fields of our network and those of cells in the cortex. Code will be publicly available.

1. INTRODUCTION

Spiking neural networks (SNNs) (Maass, 1997) have been proposed as models for cortical simulation (Izhikevich, 2004; Brette & Gerstner, 2005; Deco et al., 2008; Gerstner et al., 2014; Korcsak-Gorzo et al., 2022) and as candidates for solving problems in machine learning (Tavanaei et al., 2019; Roy et al., 2019). Nevertheless, SNNs that follow exact biological topology and constraints, such as Dale's law, have not been demonstrated to be as effective as artificial neural networks (ANNs) in practice, especially when scaled up. By adopting structures and adapting learning algorithms from their artificial counterparts, SNNs have improved their performance and recently achieved higher accuracy on benchmark image classification problems (Shrestha & Orchard, 2018; Wu et al., 2019; Sengupta et al., 2019; Li et al., 2021; Fang et al., 2021; Deng et al., 2022). Deep convolutional neural networks (CNNs) are the current de-facto architectures adopted by SNNs for various vision tasks (Kim et al., 2020; 2021; Rançon et al., 2021; Zhu et al., 2022). Recently, ANNs with visual attention and transformer mechanisms (Dosovitskiy et al., 2020; Liu et al., 2021) have surpassed pure CNNs by learning global dependencies in the image. However, these mechanisms usually involve matrix multiplication and the softmax function, which conflict with the multiplication-free inference (MFI) principle of SNNs (Roy et al., 2019; Rathi & Roy, 2021). In this work, we explore a spike-based implementation of an alternative structure more compatible with this principle, i.e., multi-layer perceptrons (MLPs), which have recently been demonstrated to be as efficient as transformers (Tolstikhin et al., 2021). The original MLP-Mixer architecture for ANNs still involves real-valued matrix multiplication, which violates MFI. To this end, we design a spiking MLP-Mixer architecture using MFI-friendly batch normalization (BN) with lightweight axial sampling in the token block.
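The MFI principle can be illustrated with a minimal NumPy sketch (toy shapes, not the paper's actual layers): because spikes are binary, a linear layer degenerates into accumulating the weight rows of the active inputs, so no real-valued multiplication is needed at inference time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Binary spike vector (event-driven input) and a dense weight matrix.
spikes = (rng.random(8) < 0.3).astype(np.float32)   # entries are 0 or 1
W = rng.standard_normal((8, 4)).astype(np.float32)

# Standard dense computation: a real-valued matrix multiplication.
out_matmul = spikes @ W

# MFI view: since spikes are 0/1, the same result is obtained by simply
# adding up the weight rows selected by active spikes -- no multiplications.
out_accumulate = W[spikes == 1].sum(axis=0)

assert np.allclose(out_matmul, out_accumulate)
```

This is also why BN is MFI-friendly: at inference its affine parameters can be folded into the weights and firing thresholds, whereas layer normalization requires input-dependent division.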
With the spiking MLP-Mixer as a basic building block, we propose a multi-stage spiking-MLP network achieving full spike-based computation. To enhance the local feature extraction of the MLP network, we propose a spiking patch encoding module based on a directed acyclic graph structure that replaces the original patch partition for downsampling. In addition, we identify the crucial role of the skip connection configuration for an optimal spiking MLP-Mixer design. To the best of our knowledge, this is the first work to explore full spike-based token-sampling MLP architectures in the field of SNNs. Our contributions can be summarized as follows:

• We develop an efficient spiking MLP-Mixer with MFI-friendly BN and lightweight axial sampling in the token block. In addition, we identify the crucial role of the skip connection configuration for an optimal spiking MLP-Mixer design.

• We propose a spiking patch encoding module to enhance local feature extraction and perform downsampling, based on which we construct a multi-stage spiking-MLP network achieving full spike-based computation.

• Our network achieves 66.39% top-1 accuracy on ImageNet-1K classification, a 2.67% improvement over the current state-of-the-art deep spiking ResNet-34 network, with similar model capacity, 2/3 of its simulation steps, and much lower computation cost. With an equal number of simulation steps, the same network improves to 69.09% accuracy, a 5.37% improvement, slightly surpassing the spiking VGG-16 network with 5.5 times smaller model capacity and demonstrating the effectiveness of an alternative architecture design for deep SNNs.

• Finally, our networks pre-trained on ImageNet set new SNN records when fine-tuned on the CIFAR10 and CIFAR100 datasets, achieving 96.08% and 80.57% accuracy respectively, demonstrating the general utility of our architecture as a pre-trained model.

2.1. SPIKING NEURAL NETWORKS IN DEEP LEARNING

Originating from computational neuroscience, SNNs have been widely used for modeling brain function and dynamics (Izhikevich, 2004; Brette & Gerstner, 2005; Deco et al., 2008; Gerstner et al., 2014; Korcsak-Gorzo et al., 2022). The successes achieved and challenges faced by deep ANNs in solving machine learning problems have led to a trend of using SNNs as an alternative and exploring the functional benefits of their bio-inspired properties on similar problems. It has been demonstrated that, by adapting learning algorithms (Bohte et al., 2000; Wu et al., 2018; Neftci et al., 2019; Bellec et al., 2020) and adopting efficient architectures from ANNs, such as Boltzmann machines (Ackley et al., 1985) in the early phase of deep learning and the currently dominant CNNs (LeCun et al., 1989), SNNs can achieve competitive performance rivaling their artificial counterparts (Petrovici et al., 2016; Neftci et al., 2014; Leng et al., 2018; Shrestha & Orchard, 2018; Wu et al., 2019; Zhang & Li, 2020; Li et al., 2021; Fang et al., 2021; Deng et al., 2022). Several recent works have applied neural architecture search (NAS) to obtain task-specific cells or network structures for SNNs (Na et al., 2022; Kim et al., 2022). Although these networks achieve further spike reduction and new state-of-the-art results in image classification, their architectures are still deep CNNs. Tolstikhin et al. (2021) proposed an alternative architecture based exclusively on MLPs, without convolution or self-attention layers. The approach contains two MLP blocks, with a token mixing block applied on sliced image patches similar to the vision transformer (ViT) (Dosovitskiy et al., 2020).
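The token/channel separation in MLP-Mixer can be sketched as follows (a toy NumPy version with hypothetical sizes and single-layer mixing; the actual model interleaves normalization, GELU nonlinearities, and two-layer MLPs):

```python
import numpy as np

rng = np.random.default_rng(1)
num_tokens, channels = 16, 32          # toy sizes: 16 patches, 32 features each

x = rng.standard_normal((num_tokens, channels))  # patch-embedding table

# Token-mixing block: acts across patches, independently for each channel,
# giving every token a global receptive field over the image.
W_tok = rng.standard_normal((num_tokens, num_tokens))
x = x + W_tok @ x                      # skip connection around the token block

# Channel-mixing block: acts across features, independently for each token.
W_ch = rng.standard_normal((channels, channels))
x = x + x @ W_ch                       # skip connection around the channel block

assert x.shape == (num_tokens, channels)
```

In the spiking variant proposed here, the activations between these blocks are binary spike trains and the normalization is BN rather than LN, so every weighted sum reduces to accumulation.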

2.2. BIOLOGICALLY PLAUSIBLE ATTENTION

The recent success of attention and transformer mechanisms in speech (Vaswani et al., 2017) and vision tasks (Dosovitskiy et al., 2020) has motivated explorations of similar mechanisms in more biologically plausible forms. Widrich et al. (2020) and Ramsauer et al. (2020) showed that the attention mechanism of the transformer is equivalent to the update rule of modern Hopfield networks with continuous states, and demonstrated its high storage capacity in large-scale multiple instance learning. However, like the original transformer, it involves matrix multiplication and the softmax function, which are not compatible with spike-based computation. The pioneering work of MLP-Mixer
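The Hopfield–attention equivalence noted above can be seen directly in a small NumPy sketch (toy dimensions; `beta` plays the role of the attention temperature, and keys and values coincide with the stored patterns):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(2)
d, n = 4, 6
X = rng.standard_normal((d, n))   # n stored patterns, one per column
xi = rng.standard_normal(d)       # current state / query vector
beta = 1.0

# Modern Hopfield update with continuous states (Ramsauer et al., 2020):
xi_new = X @ softmax(beta * (X.T @ xi))

# The same computation, read as single-query attention with keys = values = X:
attn_out = softmax(beta * (X.T @ xi)) @ X.T

assert np.allclose(xi_new, attn_out)
```

The softmax and the real-valued matrix products in this update are exactly the operations that violate the MFI principle for SNNs.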

