BSTT: A BAYESIAN SPATIAL-TEMPORAL TRANSFORMER FOR SLEEP STAGING

Abstract

Sleep staging is helpful in assessing sleep quality and diagnosing sleep disorders. However, adequately capturing the temporal and spatial relations of the brain during sleep remains a challenge. In particular, existing methods cannot adaptively infer the spatial-temporal relations of the brain under different sleep stages. In this paper, we propose a novel Bayesian spatial-temporal relation inference neural network, named the Bayesian spatial-temporal transformer (BSTT), for sleep staging. Through a well-designed Bayesian relation inference component, our model adaptively infers the brain's spatial-temporal relations during sleep for spatial-temporal feature modeling. Our model also includes a spatial transformer for extracting brain spatial features and a temporal transformer for capturing temporal features. Experiments show that BSTT outperforms state-of-the-art baselines on the ISRUC and MASS datasets. In addition, visual analysis shows that the spatial-temporal relations inferred by BSTT offer a degree of interpretability for sleep staging.

1. INTRODUCTION

Sleep staging is essential for assessing sleep quality and diagnosing sleep disorders. Sleep specialists typically classify sleep stages based on the AASM sleep standard and polysomnography (PSG) recordings to aid in diagnosis. The AASM standard not only provides criteria for determining each sleep stage, but also documents conversion rules between different sleep stages, known as sleep transition rules, which help sleep specialists identify sleep stages when transitions occur. However, manual sleep staging is time-consuming, and the classification results are strongly affected by the specialist's expertise and subjectivity (Supratak et al., 2017). Therefore, automatic classification methods have been applied to sleep staging to improve efficiency.

Traditional machine learning methods use hand-crafted features for sleep staging, which improves staging efficiency to a certain extent (Fraiwan et al., 2012). However, the accuracy of these methods relies heavily on feature engineering and feature selection, which still require considerable expert knowledge. To address these problems, deep learning methods have been applied to sleep staging and have achieved satisfactory classification performance (Phan et al., 2019; Jia et al., 2022a;b). Most early deep learning methods focus on the temporal information in sleep data, utilizing convolutional neural networks (CNNs) and recurrent neural networks (RNNs) to capture temporal features for sleep staging (Jain & Ganesan, 2021; Perslev et al., 2019). In addition, some studies have shown that the spatial topology of the brain behaves differently across sleep stages (Khanal, 2019), which means that both the temporal and spatial relations of the brain are important during sleep. Therefore, some researchers have tried to use both the spatial and temporal characteristics of the brain for sleep staging (Jia et al., 2020b; Phan et al., 2022; Jia et al., 2020a).
Although the above methods achieve good classification performance, modeling spatial and temporal relations remains challenging. Specifically, for the modeling of temporal relations, some approaches attempt to capture sleep transition rules in order to aid the identification of specific sleep stages. However, it is difficult for these methods to explicitly demonstrate the relation of

