VARIATIONAL ADAPTIVE GRAPH TRANSFORMER FOR MULTIVARIATE TIME SERIES MODELING

Anonymous authors
Paper under double-blind review

Abstract

Multivariate time series (MTS) are widely collected by large-scale complex systems, such as internet services, IT infrastructures, and wearable devices. The modeling of MTS has long been an important but challenging task. To capture complex long-range dynamics, Transformers have been utilized in MTS modeling and have achieved attractive performance. However, Transformers in general do not capture well the diverse relationships between different channels within MTS, and they have difficulty modeling MTS with complex distributions due to a lack of stochasticity. In this paper, we first incorporate relational modeling into the Transformer to develop an adaptive Graph Transformer (G-Trans) module for MTS. We then further account for stochasticity by introducing a powerful embedding-guided probabilistic generative module for G-Trans, constructing the Variational adaptive Graph Transformer (VG-Trans), a well-defined variational generative dynamic model. VG-Trans is utilized to learn expressive representations of MTS and serves as a plug-and-play framework applicable to MTS forecasting and anomaly detection tasks. For efficient inference, we develop an autoencoding variational inference scheme with a combined prediction and reconstruction loss. Extensive experiments on diverse datasets show the effectiveness of VG-Trans for MTS modeling and its improvements over existing methods on a variety of MTS modeling tasks.

1. INTRODUCTION

Multivariate time series (MTS) are an important type of data arising from a wide variety of domains, including internet services (Dai et al., 2021; 2022), industrial devices (Finn et al., 2016; Oh et al., 2015), health care (Choi et al., 2016b;a), and finance (Maeda et al., 2019; Gu et al., 2020), to name a few. However, the modeling of MTS has always been a challenging problem, as there exist not only complex temporal dependencies, as shown in the red box in Fig. 1, but also diverse cross-channel dependencies, as shown in the blue box in Fig. 1. Moreover, there exist inherently stochastic components, as shown in the green box in Fig. 1, even if one can fully capture both the temporal and cross-channel dependencies. To address these challenges, many deep learning based methods have been proposed for various MTS tasks, such as forecasting, anomaly detection, and classification. To model the temporal dependencies of MTS, many dynamic methods based on recurrent neural networks (RNNs) have been developed (Malhotra et al., 2016; Zhang et al., 2019; Bai et al., 2019b; Tang et al., 2020; Yao et al., 2018). Meanwhile, to take stochasticity into consideration, probabilistic dynamic methods have also been developed (Dai et al., 2021; 2022; Chen et al., 2020; 2022; Salinas et al., 2020). With the development of the Transformer (Vaswani et al., 2017), and due to its ability to capture long-range dependencies and interactions (Wen et al., 2022; Dosovitskiy et al., 2021; Dong et al., 2018; Chen et al., 2021), which is especially attractive for time series modeling, there is a recent trend toward Transformer-based MTS modeling methods, which have achieved promising results in learning expressive representations for downstream tasks. For example, for forecasting, LogTrans (Li et al., 2019) incorporates causal convolutions into the self-attention layer to capture local temporal dependencies of MTS. Informer (Zhou et al., 2021) develops a ProbSparse self-attention mechanism for long-sequence forecasting. AST (Wu et al., 2020) further constructs a generative adversarial encoder-decoder framework to better predict the output distribution. In addition, there are other efficient Transformer-based forecasting methods, such as Autoformer (Xu et al., 2021), FEDformer (Zhou et al., 2022), and TFT (Lim et al., 2021).
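To make the cross-channel relational modeling discussed above concrete, the following is a minimal sketch of the adaptive-adjacency idea common in graph-based MTS models: a directed channel graph is learned from two node-embedding tables, and each channel aggregates information from its neighbors. All function and variable names here are hypothetical illustrations, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_adjacency(emb_src, emb_dst):
    """Learn a directed channel graph from two (C, d) embedding tables.
    A[i, j] is the normalized relevance of channel j to channel i."""
    scores = np.maximum(emb_src @ emb_dst.T, 0.0)  # ReLU keeps strong links
    return softmax(scores, axis=-1)               # rows sum to 1

def graph_channel_mix(x, adj):
    """Mix channels of a (T, C) series through the learned adjacency:
    each output channel is a weighted sum of its neighbor channels."""
    return x @ adj.T

rng = np.random.default_rng(0)
T, C, d = 8, 4, 3                       # time steps, channels, embedding dim
x = rng.standard_normal((T, C))         # toy multivariate series
E1 = rng.standard_normal((C, d))        # learnable source embeddings
E2 = rng.standard_normal((C, d))        # learnable target embeddings
A = adaptive_adjacency(E1, E2)
out = graph_channel_mix(x, A)
```

In a full model these embedding tables would be trained end to end alongside the temporal self-attention layers, so the channel graph adapts to the data rather than being fixed a priori.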

