BSTT: A BAYESIAN SPATIAL-TEMPORAL TRANSFORMER FOR SLEEP STAGING

Abstract

Sleep staging is helpful in assessing sleep quality and diagnosing sleep disorders. However, adequately capturing the temporal and spatial relations of the brain during sleep remains a challenge. In particular, existing methods cannot adaptively infer the spatial-temporal relations of the brain under different sleep stages. In this paper, we propose a novel Bayesian spatial-temporal relation inference neural network, named the Bayesian spatial-temporal transformer (BSTT), for sleep staging. Our model is able to adaptively infer brain spatial-temporal relations during sleep for spatial-temporal feature modeling through a well-designed Bayesian relation inference component. Meanwhile, our model also includes a spatial transformer for extracting brain spatial features and a temporal transformer for capturing temporal features. Experiments show that BSTT outperforms state-of-the-art baselines on the ISRUC and MASS datasets. In addition, visual analysis shows that the spatial-temporal relations inferred by BSTT offer a degree of interpretability for sleep staging.

1. INTRODUCTION

Sleep staging is essential for assessing sleep quality and diagnosing sleep disorders. Sleep specialists typically classify sleep stages based on the AASM sleep standard and polysomnography (PSG) recordings to aid in diagnosis. The AASM standard not only provides criteria for determining each sleep period, but also documents conversion rules between different sleep stages, known as sleep transition rules, to help sleep specialists identify sleep stages when sleep transitions occur. However, manual sleep staging takes a long time, and the classification results are strongly affected by the specialist's expertise and subjectivity (Supratak et al., 2017). Therefore, automatic classification methods have been applied to sleep staging to improve efficiency.

Traditional machine learning methods use hand-crafted features for sleep staging, which improves the efficiency of staging to a certain extent (Fraiwan et al., 2012). However, the accuracy of traditional machine learning methods relies heavily on feature engineering and feature selection, which still require substantial expert knowledge. To address these problems, deep learning methods have been applied to sleep staging and have achieved satisfactory classification performance (Phan et al., 2019; Jia et al., 2022a;b). Most early deep learning methods focus on the temporal information of sleep data, utilizing convolutional neural networks (CNNs) and recurrent neural networks (RNNs) to capture temporal features for sleep staging (Jain & Ganesan, 2021; Perslev et al., 2019). In addition, some studies have shown that the spatial topology of the brain behaves differently in different sleep stages (Khanal, 2019), which means that both the temporal and spatial relations of the brain are important during sleep. Therefore, some researchers try to use the spatial and temporal characteristics of the brain for sleep staging (Jia et al., 2020b; Phan et al., 2022; Jia et al., 2020a).
Although the above methods achieve good classification performance, modeling spatial and temporal relations remains challenging. Specifically, for the modeling of temporal relations, some approaches attempt to capture sleep transition rules to aid the identification of specific sleep stages. However, it is difficult for these methods to explicitly demonstrate the relation of different sleep time slices in accordance with the AASM sleep standard. Besides, for the modeling of spatial relations, most methods employ spatial convolution operations to extract the spatial features of the brain, which is insufficient because it may ignore the spatial topology of the brain (Zhou et al., 2021a; Perslev et al., 2019). A few studies utilize the spatial topology and temporal relation information of the brain for sleep staging via graph convolutional networks, but the constructed brain networks still lack interpretability to a certain extent (Jia et al., 2020b).

To address the above challenges, we propose a novel model called Bayesian spatial-temporal transformer (BSTT) for sleep staging. The proposed model integrates the transformer and Bayesian relation inference in a unified framework. Specifically, we design a spatial-temporal transformer architecture that can capture the temporal and spatial features of the brain. Besides, we propose a Bayesian relation inference component that comes in two forms, Bayesian temporal relation inference and Bayesian spatial relation inference. It can therefore infer the spatial-temporal relations of objects and generate relation intensity graphs. The main contributions of our BSTT are summarized as follows:

• We design a Bayesian relation inference component that can adaptively infer spatial-temporal relations of the brain during sleep in the service of capturing spatial-temporal relations.

• We apply the spatial-temporal transformer architecture to simultaneously model spatial-temporal relations. It can effectively capture the spatial-temporal features of the brain and enhance the model's ability to model spatial-temporal relations.

• Experimental results show that the proposed BSTT achieves state-of-the-art performance on multiple sleep staging datasets. The visual analysis shows that our model has a certain degree of interpretability for sleep staging.
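The core idea above can be illustrated with a minimal sketch (this is not the authors' implementation; the function names, the dot-product parameterization of the posterior mean, and the fixed log-variance are simplifying assumptions). A Bayesian relation inference step samples a relation intensity graph over nodes (EEG channels for the spatial form, time slices for the temporal form) via the reparameterization trick and aggregates features along it:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bayesian_relation_inference(x, rng):
    """Sample a stochastic relation graph over nodes and aggregate features.

    x: (n_nodes, d) feature matrix (nodes are channels or time slices).
    Returns (updated features, relation intensity graph).
    """
    n, d = x.shape
    # Pairwise scores parameterize a Gaussian posterior over relation
    # strengths; here the mean comes from scaled dot products and the
    # log-variance is a constant for brevity (a trained model would
    # produce both from learned projections).
    mu = x @ x.T / np.sqrt(d)
    log_var = np.full((n, n), -2.0)
    # Reparameterization trick: sample relation strengths, then
    # row-normalize into a relation intensity graph.
    eps = rng.standard_normal((n, n))
    graph = softmax(mu + np.exp(0.5 * log_var) * eps)
    return graph @ x, graph

rng = np.random.default_rng(0)
# Toy input: 5 sleep epochs x 6 EEG channels x 8 features per channel.
feats = rng.standard_normal((5, 6, 8))

# Spatial form: relations among channels within each epoch.
spatial = np.stack([bayesian_relation_inference(e, rng)[0] for e in feats])
# Temporal form: relations among epochs for each channel.
temporal = np.stack(
    [bayesian_relation_inference(spatial[:, c], rng)[0] for c in range(6)],
    axis=1)
print(temporal.shape)  # → (5, 6, 8)
```

Sampling the graph rather than computing it deterministically is what distinguishes the Bayesian component from plain attention: each row of `graph` is a normalized draw of relation intensities, which is also what makes the inferred graphs inspectable for the visual analysis mentioned above.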

2. RELATED WORK

Identifying sleep stages plays an important role in diagnosing and treating sleep disorders. Earlier, support vector machines (SVM) and random forests (RF) were used for sleep staging (Fraiwan et al., 2012). However, these methods need hand-crafted features, which require a lot of prior knowledge. Currently, deep learning methods have become the primary approach for sleep staging. Early deep learning methods extract temporal features of sleep signals for classification. The earliest methods are based on CNN models (Tsinalis et al., 2016; Chambon et al., 2018). For example, Chambon et al. propose a convolutional neural network that can extract temporal-invariant features from sleep signals (Chambon et al., 2018). Furthermore, Eldele et al. develop a multi-resolution CNN with adaptive feature recalibration to extract representative features (Eldele et al., 2021). In addition, RNN models have been gradually used for sleep staging (Phan et al., 2019; Perslev et al., 2019; Phan et al., 2018). For example, Phan et al. propose a deep bidirectional RNN model with an attention mechanism for single-channel EEG (Phan et al., 2018). They then design an end-to-end hierarchical RNN architecture for capturing different levels of EEG signal features (Phan et al., 2019). Some studies combine CNN with RNN (Supratak & Guo, 2020; Guillot & Thorey, 2021; Dong et al., 2017). For example, Supratak et al. propose a hybrid model combining CNN and RNN to extract rich temporal features (Supratak et al., 2017). In addition, Phan et al. introduce the transformer into the sleep staging task to capture the temporal context features of sleep signals (Phan et al., 2022). Jia et al. design a fully convolutional model to capture the typical waveforms of sleep signals (Jia et al., 2021b). Further, several studies have shown the importance of brain spatial relations for sleep staging (Khanal, 2019; Sakkalis, 2011). Some researchers try to model the spatial-temporal characteristics of sleep data. For example, Jia et al. propose an adaptive deep learning model for sleep staging; the proposed spatial-temporal graph convolutional network is used to extract spatial features and capture transition rules (Jia et al., 2020b). They also propose a multi-view spatial-temporal graph convolutional network based on domain generalization, which models the multi-view-based spatial characteristics of the brain (Jia et al., 2021a). Although the above models achieve good classification performance, these models do not adequately model spatial-temporal properties or effectively reason about and capture spatial-temporal relations.

