SPIKING CONVOLUTIONAL NEURAL NETWORKS FOR TEXT CLASSIFICATION

Abstract

Spiking neural networks (SNNs) offer a promising pathway to implement deep neural networks (DNNs) in a more energy-efficient manner, since their neurons are sparsely activated and inference is event-driven. However, few works have demonstrated the efficacy of SNNs in language tasks, partly because it is non-trivial to represent words as spikes and to handle variable-length texts with SNNs. This work presents a "conversion + fine-tuning" two-step method for training SNNs for text classification and proposes a simple but effective way to encode pre-trained word embeddings as spike trains. We show empirically that, after fine-tuning with surrogate gradients, the converted SNNs achieve results comparable to their DNN counterparts with much less energy consumption across multiple datasets in both English and Chinese. We also show that such SNNs are more robust to adversarial attacks than DNNs.

1. INTRODUCTION

Inspired by the biological neuro-synaptic framework, modern deep neural networks have been successfully applied to a wide range of tasks (Krizhevsky et al., 2012; Graves & Jaitly, 2014; Mikolov et al., 2013b). However, the amount of computational power and energy required to run state-of-the-art deep neural models is considerable and has continued to grow over the past decade. For example, training a neural language model such as GPT-3 (Brown et al., 2020) consumes roughly 190,000 kWh (Dhar, 2020; Anthony et al., 2020), while the human brain performs perception, recognition, reasoning, control, and movement simultaneously on a power budget of just 20 W (Cox & Dean, 2014). Like biological neurons, spiking neural networks (SNNs) use discrete spikes to compute and transmit information, which makes them more biologically plausible and more energy-efficient than conventional deep learning models. Spike-based computing coupled with neuromorphic hardware therefore provides a promising way to realize artificial intelligence while greatly reducing energy consumption. Although many studies have shown that SNNs can produce competitive results in vision (mostly classification) tasks (Cao et al., 2015; Diehl et al., 2015; Rueckauer et al., 2017; Shrestha & Orchard, 2018; Sengupta et al., 2019), very few works have demonstrated their effectiveness in natural language processing (NLP) tasks (Diehl et al., 2016; Rao et al., 2022).

SNNs nevertheless offer a promising opportunity for processing sequential data. Rao et al. (2022) showed that long short-term memory (LSTM) units can be implemented on spike-based neuromorphic hardware with a spike frequency adaptation mechanism. They tested the performance of such spike-based networks (called RelNet) on a question-answering dataset (Weston et al., 2015) and observed that RelNet could solve 16 of the 17 toy tasks, where a task is considered solved if the network's error rate on unseen instances is at most 5%. In their design, each word is encoded as a one-hot vector, and a sentence is fed into the network as one-hot coded spikes. Such a one-hot encoding scheme limits the vocabulary size that can be used (otherwise, very high-dimensional vectors would be required to represent the words of a language). It also makes it impossible for spike-based networks to leverage word embeddings learned from large amounts of text data. Diehl et al. (2016) used pre-trained word embeddings in their TrueNorth implementation of a recurrent neural network and achieved 74% accuracy on a question classification task. However, an external projection layer is required to map the word embeddings to vectors with positive values that can then be converted into spike trains. Such a projection layer cannot be easily implemented by spike-based networks; in fact, they used a hybrid architecture that combines artificial and spiking neural networks.

In this study, we propose a two-step recipe of "conversion + fine-tuning" to train spiking neural networks for NLP. A normally-trained neural network is first converted to a spiking neural network by duplicating its architecture and weights, and the converted SNN is then fine-tuned. Before the conversion, a properly tailored network needs to be built and trained. Taking a convolutional neural network for sentence classification, TextCNN (Kim, 2014), as an example (see Figure 1), the original TextCNN is first modified into a tailored CNN by replacing the max-pooling operation with average-pooling, the Sigmoid activation function with ReLU, and the word embeddings with positive-valued (shifted) vectors. After the tailored network is trained on a dataset with gradient descent, it is converted to a spiking neural network, which is further fine-tuned with the surrogate gradient method (Zenke & Vogels, 2021) on the same dataset.
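For concreteness, the following is a minimal PyTorch sketch of what such a tailored TextCNN might look like. The filter sizes, channel count, and shifting scheme are illustrative assumptions, not the exact configuration used in this work.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TailoredTextCNN(nn.Module):
    """TextCNN tailored for SNN conversion: average-pooling instead of
    max-pooling, ReLU instead of Sigmoid, and word embeddings shifted to
    be non-negative. Filter sizes and channel count are assumptions."""

    def __init__(self, pretrained_emb, num_classes,
                 kernel_sizes=(3, 4, 5), channels=100):
        super().__init__()
        # Shift the pre-trained embeddings so every component is >= 0,
        # since firing rates cannot directly encode negative values.
        shifted = pretrained_emb - pretrained_emb.min()
        self.embedding = nn.Embedding.from_pretrained(shifted, freeze=False)
        emb_dim = shifted.size(1)
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, channels, k) for k in kernel_sizes
        )
        self.fc = nn.Linear(channels * len(kernel_sizes), num_classes)

    def forward(self, token_ids):
        # (batch, seq_len, emb_dim) -> (batch, emb_dim, seq_len)
        x = self.embedding(token_ids).transpose(1, 2)
        # ReLU + average-pooling over time, both of which are compatible
        # with rate-coded spikes (unlike Sigmoid and max-pooling).
        feats = [F.relu(conv(x)).mean(dim=2) for conv in self.convs]
        return self.fc(torch.cat(feats, dim=1))
```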
The SNNs trained with the proposed two-step strategy yield results comparable to their DNN counterparts with much less energy consumption. The contributions of this study can be summarized as follows:

• We present a two-step method for training SNNs for language tasks, which combines the conversion-based approach (also known as shallow training) with backpropagation using surrogate gradients at the fine-tuning phase.

• We propose a method to convert word embeddings into spike trains, which makes it possible for SNNs to leverage word embeddings pre-trained on large amounts of text data (see the encoding sketch after this list). An ablation study shows that using pre-trained word embeddings significantly improves the performance of SNNs.

• This study is among the first to demonstrate that well-trained spiking neural networks can achieve results comparable to their DNN counterparts on six text classification datasets, for both English and Chinese. We also show that SNNs are more robust to adversarial attacks than traditional DNNs.
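As a sketch of the second contribution, one simple way to realize such an encoding is Bernoulli rate coding of the shifted, normalized embeddings, where each embedding dimension fires with a probability proportional to its value. The function below is written under that assumption; the exact normalization and number of time steps are illustrative choices.

```python
import torch

def embed_to_spikes(embeddings, num_steps=50):
    """Encode non-negative word embeddings as Bernoulli spike trains.

    embeddings: (seq_len, emb_dim) tensor, already shifted so all values
    are >= 0. Returns a (num_steps, seq_len, emb_dim) binary tensor whose
    firing rate along the time axis approximates each embedding value.
    """
    # Normalize to [0, 1] so the values can act as firing probabilities.
    rates = embeddings / embeddings.max().clamp(min=1e-8)
    # Sample a spike independently at every time step (rate coding).
    return (torch.rand(num_steps, *rates.shape) < rates).float()

# Example: a 4-token sentence with 300-d embeddings -> (50, 4, 300) spikes.
spikes = embed_to_spikes(torch.rand(4, 300))
```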

2. RELATED WORK

SNNs offer a promising computing paradigm due to their ability to capture the temporal dynamics of biological neurons. Several methods have been proposed for training SNNs, and they can be roughly divided into two categories: conversion-based and spike-based approaches. Conversion-based approaches first train a non-spiking network and then convert it into an SNN that produces the same input-output mapping for a given task as the original network. In spike-based approaches, SNNs are trained directly using spike-timing information in an unsupervised or supervised manner.
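To make the gradient-based training of spiking neurons concrete: the spike threshold function is non-differentiable, so surrogate-gradient methods (e.g., Zenke & Vogels, 2021) keep the hard Heaviside spike in the forward pass but substitute a smooth surrogate derivative in the backward pass. Below is a minimal PyTorch sketch; the fast-sigmoid surrogate and its slope are common choices, assumed here for illustration.

```python
import torch

class SpikeSurrogate(torch.autograd.Function):
    """Heaviside spike in the forward pass; fast-sigmoid surrogate
    derivative in the backward pass. The slope is an assumed
    hyperparameter, not a value taken from the paper."""

    slope = 25.0

    @staticmethod
    def forward(ctx, membrane_potential):
        ctx.save_for_backward(membrane_potential)
        # Emit a spike wherever the membrane potential exceeds threshold 0.
        return (membrane_potential > 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        (v,) = ctx.saved_tensors
        # Replace the zero-almost-everywhere derivative of the Heaviside
        # step with the fast-sigmoid derivative 1 / (1 + slope * |v|)^2,
        # which is smooth and non-zero near the threshold.
        surrogate = 1.0 / (1.0 + SpikeSurrogate.slope * v.abs()) ** 2
        return grad_output * surrogate

spike_fn = SpikeSurrogate.apply  # differentiable drop-in spike activation
```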



Figure 1: An illustration of the two-step method (conversion + fine-tuning) for training spiking neural networks for text classification: an SNN is initialized with the weights of a tailored network trained with gradient descent, and backpropagation with surrogate gradients is then performed on the converted SNN. The tailored network is obtained by replacing the max-pooling operation with average-pooling, the Sigmoid activation function with ReLU, and the word embeddings with their positive equivalents.
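As a rough sketch of the conversion step described in the caption above, the trained weights can be copied into an architecturally identical SNN whose ReLU activations are replaced by spiking neurons. The integrate-and-fire dynamics, unit threshold, and reset-by-subtraction below are common conversion choices, assumed here rather than taken from the paper.

```python
import torch
import torch.nn as nn

class IFNeuron(nn.Module):
    """Integrate-and-fire unit standing in for ReLU after conversion."""

    def __init__(self, threshold=1.0):
        super().__init__()
        self.threshold = threshold
        self.v = None  # membrane potential, carried across time steps

    def forward(self, current):
        if self.v is None:
            self.v = torch.zeros_like(current)
        self.v = self.v + current                    # integrate input current
        spikes = (self.v >= self.threshold).float()  # fire where threshold met
        self.v = self.v - spikes * self.threshold    # reset by subtraction
        return spikes

def convert_to_snn(tailored_cnn, snn):
    """Initialize the SNN with the trained tailored CNN's weights. The two
    models are assumed architecturally identical except that ReLU units
    are replaced by IFNeuron modules (hence strict=False)."""
    snn.load_state_dict(tailored_cnn.state_dict(), strict=False)
    return snn
```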

