SPATIOTEMPORAL MODELING OF MULTIVARIATE SIGNALS WITH GRAPH NEURAL NETWORKS AND STRUCTURED STATE SPACE MODELS

Abstract

Multivariate signals are prevalent in various domains, such as healthcare, transportation systems, and space sciences. Modeling spatiotemporal dependencies in multivariate signals is challenging due to (1) long-range temporal dependencies and (2) complex spatial correlations between sensors. To address these challenges, we propose representing multivariate signals as graphs and introduce GRAPHS4MER, a general graph neural network (GNN) architecture that captures both spatial and temporal dependencies in multivariate signals. Specifically, (1) we leverage the Structured State Space model (S4), a state-of-the-art sequence model, to capture long-term temporal dependencies, and (2) we propose a graph structure learning layer in GRAPHS4MER to learn dynamically evolving graph structures in the data. We evaluate our proposed model on three distinct tasks and show that GRAPHS4MER consistently improves over existing models, including (1) seizure detection from electroencephalography signals, outperforming a previous GNN with self-supervised pretraining by 3.1 points in AUROC; (2) sleep staging from polysomnography signals, a 4.1-point improvement in macro-F1 score compared to existing sleep staging models; and (3) traffic forecasting, reducing MAE by 8.8% compared to existing GNNs and by 1.4% compared to Transformer-based models.

1. INTRODUCTION

Multivariate signals are time series data measured by multiple sensors and are prevalent in many real-world applications, including healthcare (Mincholé et al., 2019), transportation systems (Ermagun & Levinson, 2018), power systems (Negnevitsky et al., 2009), and space sciences (Camporeale et al., 2018). An example multivariate signal is scalp electroencephalograms (EEGs), which measure brain electrical activities using sensors placed on an individual's scalp. Several challenges exist in modeling spatiotemporal dependencies in multivariate signals. First, many types of signals are sampled at a high sampling rate, which results in long sequences that can span tens of thousands of time steps. Moreover, multivariate signals often involve long-range temporal correlations (Berthouze et al., 2010). Prior studies on modeling long signals often preprocess the raw signals using frequency transformations (Tang et al., 2022b; Asif et al., 2020; Shoeibi et al., 2021; Covert et al., 2019; Guillot et al., 2020; Guillot & Thorey, 2021) or divide the signals into short windows and aggregate model predictions post hoc (Phan & Mikkelsen, 2022; Pradhan et al., 2022). However, such preprocessing steps may discard important information encoded in the raw signals and neglect long-range temporal dependencies. Therefore, a model that is capable of modeling long-range temporal correlations in raw signals is needed. Deep sequence models, including recurrent neural networks (RNNs), convolutional neural networks (CNNs), and Transformers, have specialized variants for handling long sequences (Arjovsky et al., 2016; Erichson et al., 2021; Katharopoulos et al., 2020; Choromanski et al., 2021). However, they struggle to scale to long sequences of tens of thousands of time steps (Tay et al., 2020).
Recently, the Structured State Space sequence model (S4) (Gu et al., 2022), a deep sequence model based on the classic state space model, has achieved state-of-the-art performance on challenging long-range sequence modeling benchmarks. Second, sensors have complex, non-Euclidean spatial correlations. For example, EEG sensors measure highly correlated yet distinct electrical activities from different brain regions (Michel & Murray, 2012); traffic speeds are correlated not only based on physical distances between traffic sensors, but also on the traffic flows (Li et al., 2018b). Graphs are data structures that can model complex, non-Euclidean correlations in the data (Chami et al., 2022; Bronstein et al., 2017). Previous works have adopted temporal graph neural networks (GNNs) for modeling multivariate time series, such as EEG-based seizure detection (Covert et al., 2019) and classification (Tang et al., 2022b), traffic forecasting (Li et al., 2018b; Wu et al., 2019; Zheng et al., 2020b; Jiang & Luo, 2022; Tian & Chan, 2021), and pandemic forecasting (Panagopoulos et al., 2021; Kapoor et al., 2020). Nevertheless, most of these studies use sequences of up to hundreds of time steps and require a predefined, static graph structure. However, the graph structure of multivariate signals may not be easily defined due to unknown sensor locations. For instance, while EEG sensors are typically placed according to the 10-20 standard placement (Jasper, 1958), the exact locations of sensors vary across individuals' recordings due to variability in head size. Moreover, the underlying graph connectivity can evolve over time due to temporal dynamics in the data. Hence, when graph structures cannot be easily predefined, the ability to dynamically learn the underlying graph structures is highly desirable. Graph structure learning (GSL) aims to jointly learn an optimized graph structure and its node and graph representations (Zhu et al., 2021).
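To make the state space foundation of S4 concrete, the sketch below simulates the discretized linear state space recurrence x_k = A x_{k-1} + B u_k, y_k = C x_k in pure Python. This is illustrative only: actual S4 layers use a structured, HiPPO-initialized A matrix and an equivalent convolutional formulation for efficiency, neither of which is shown here.

```python
# Minimal sketch of the discretized linear state space recurrence underlying S4.
# A is an n x n state matrix, B an n-vector input map, C an n-vector output map.
def ssm_step(A, B, C, x, u):
    """One recurrence step: x_k = A x_{k-1} + B u_k, y_k = C x_k."""
    n = len(x)
    x_new = [sum(A[i][j] * x[j] for j in range(n)) + B[i] * u for i in range(n)]
    y = sum(C[i] * x_new[i] for i in range(n))
    return x_new, y

def run_ssm(A, B, C, us):
    """Run the recurrence over an input sequence us, starting from a zero state."""
    x = [0.0] * len(B)
    ys = []
    for u in us:
        x, y = ssm_step(A, B, C, x, u)
        ys.append(y)
    return ys
```

For example, with a decaying diagonal A, an impulse input produces a geometrically decaying output, mirroring how the hidden state summarizes past inputs over long horizons.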
GSL techniques have been used in non-temporal graph applications, such as natural language processing (Xu et al., 2022), molecular optimization (Fu et al., 2021), learning on point clouds (Wang et al., 2019), and improving GNN robustness against adversarial attacks (Zhang & Zitnik, 2020; Jin et al., 2020). GSL has also been employed for spatiotemporal modeling of traffic flows (Zhang et al., 2020; Tang et al., 2022a; Shang et al., 2021; Wu et al., 2019; Bai et al., 2020), irregularly sampled multivariate time series (Zhang et al., 2022), functional magnetic resonance imaging (fMRI) (El-Gazzar et al., 2021; Gazzar et al., 2022b), and sleep staging (Jia et al., 2020), but these works are limited to sequences of fewer than 1k time steps and do not capture dynamic graph structures evolving over time.

In this study, we address the foregoing challenges by (1) leveraging S4 to enable long-range temporal modeling and (2) proposing a graph structure learning layer to learn dynamically evolving graph structures in multivariate signals. Our main contributions are: • We propose GRAPHS4MER (Figure 1), a general end-to-end GNN architecture for spatiotemporal modeling of multivariate signals.



Figure 1: Architecture of GRAPHS4MER. The model has three main components: (1) stacked S4 layers to learn temporal dependencies in each sensor independently; (2) a graph structure learning (GSL) layer to learn dynamically evolving graph structures; (3) GNN layers to learn spatial dependencies based on S4 embeddings and learned graph structures. For GSL, we adopt (a) self-attention and (b) a learnable embedding for inductive and transductive settings, respectively.
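To illustrate the self-attention variant (a) of the GSL layer described in the caption, the sketch below derives a row-normalized adjacency matrix from node (sensor) embeddings via scaled dot-product self-attention. The single-head form and function names are our simplifying assumptions for illustration, not the paper's exact parameterization, which may include learned projections, pruning, and regularization.

```python
import math

def attention_adjacency(H):
    """Hedged sketch: compute a dense adjacency matrix from node embeddings H
    (list of equal-length vectors) via scaled dot-product self-attention.
    Each row is softmax-normalized, so entry [i][j] is the learned edge
    weight from node i to node j."""
    d = len(H[0])
    # Pairwise similarity scores, scaled by sqrt(d) as in standard attention.
    scores = [[sum(hi[k] * hj[k] for k in range(d)) / math.sqrt(d) for hj in H]
              for hi in H]
    adj = []
    for row in scores:
        m = max(row)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        adj.append([e / z for e in exps])
    return adj
```

In this toy form, nodes with similar embeddings receive larger mutual edge weights; recomputing the adjacency on each time window is one simple way a learned graph can evolve over time.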

Our model has two major advantages: (1) it leverages S4 to capture long-range temporal dependencies in signals and (2) it is able to dynamically learn the underlying graph structures in the data without a predefined graph. • We evaluate GRAPHS4MER on three datasets with distinct data types and tasks. Our model consistently outperforms existing methods on (1) seizure detection from EEG signals, outperforming a previous GNN with self-supervised pretraining by 3.1 points in AUROC; (2) sleep staging from polysomnography signals, outperforming existing sleep staging models

