VARIATIONAL ADAPTIVE GRAPH TRANSFORMER FOR MULTIVARIATE TIME SERIES MODELING

Anonymous authors
Paper under double-blind review

Abstract

Multivariate time series (MTS) are widely collected by large-scale complex systems, such as internet services, IT infrastructures, and wearable devices. Modeling MTS has long been an important but challenging task. To capture complex long-range dynamics, Transformers have been applied to MTS modeling and have achieved attractive performance. However, Transformers generally do not capture the diverse relationships between different channels within MTS well, and they have difficulty modeling MTS with complex distributions due to their lack of stochasticity. In this paper, we first incorporate relational modeling into the Transformer to develop an adaptive Graph Transformer (G-Trans) module for MTS. We then account for stochasticity by introducing a powerful embedding-guided probabilistic generative module for G-Trans, yielding the Variational adaptive Graph Transformer (VG-Trans), a well-defined variational generative dynamic model. VG-Trans learns expressive representations of MTS and serves as a plug-and-play framework that can be applied to MTS forecasting and anomaly detection tasks. For efficient inference, we develop an autoencoding variational inference scheme with a combined prediction and reconstruction loss. Extensive experiments on diverse datasets show the effectiveness of VG-Trans for MTS modeling and its ability to improve existing methods on a variety of MTS modeling tasks.

1. INTRODUCTION

Multivariate time series (MTS) are an important type of data arising from a wide variety of domains, including internet services (Dai et al., 2021; 2022), industrial devices (Finn et al., 2016; Oh et al., 2015), health care (Choi et al., 2016b;a), and finance (Maeda et al., 2019; Gu et al., 2020), to name a few. However, modeling MTS has always been a challenging problem: there exist not only complex temporal dependencies, as shown in the red box in Fig. 1, but also diverse cross-channel dependencies, as shown in the blue box in Fig. 1. Moreover, there exist inherently stochastic components, as shown in the green box in Fig. 1, even if one can fully capture both temporal and cross-channel dependencies. To address these challenges, many deep learning based methods have been proposed for various MTS tasks, such as forecasting, anomaly detection, and classification. To model the temporal dependencies of MTS, many dynamic methods based on recurrent neural networks (RNNs) have been developed (Malhotra et al., 2016; Zhang et al., 2019; Bai et al., 2019b; Tang et al., 2020; Yao et al., 2018). Meanwhile, to take stochasticity into consideration, some probabilistic dynamic methods have also been developed (Dai et al., 2021; 2022; Chen et al., 2020; 2022; Salinas et al., 2020). With the development of the Transformer (Vaswani et al., 2017), and due to its ability to capture long-range dependencies and interactions (Wen et al., 2022; Dosovitskiy et al., 2021; Dong et al., 2018; Chen et al., 2021), which is especially attractive for time series modeling, there is a recent trend toward Transformer-based MTS modeling methods, which have achieved promising results in learning expressive representations for downstream tasks.
Moving beyond the constraints of previous work, we first propose an adaptive Graph Transformer (G-Trans) module that incorporates a graph into the Transformer structure and can model both temporal and cross-channel dependencies within MTS. Then, to account for the stochasticity within MTS and enhance the representative power of G-Trans, we further develop a Variational adaptive Graph Transformer (VG-Trans), a well-defined probabilistic dynamic model obtained by combining G-Trans with a proposed Embedding-guided Probabilistic generative Module (EPM), as illustrated in Fig. 2(b). We note that VG-Trans is able to learn robust representations of MTS, which enables it to be combined with existing methods and applied to both anomaly detection and forecasting tasks. In addition, we introduce an autoencoding variational inference scheme for efficient inference and a joint optimization objective that combines forecasting and reconstruction losses to ensure expressive time-series representation learning. The main contributions of our work are summarized as follows:
• For MTS modeling, we propose a G-Trans module, which incorporates channel-relationship learning into the Transformer structure.
• We develop VG-Trans, a VAE-structured probabilistic dynamic model with G-Trans as the encoder and EPM as the decoder, which accounts for the stochasticity within both temporal and cross-channel dependencies of MTS. VG-Trans can be combined with different methods and applied to different MTS tasks.
• To achieve scalable training, we introduce an autoencoding inference scheme with a combined prediction and reconstruction loss to enhance the representation power for MTS.
• Experiments on both anomaly detection and forecasting tasks demonstrate the effectiveness of our model for MTS modeling.
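The combined prediction-and-reconstruction objective mentioned above can be illustrated with a minimal sketch. This is a hypothetical simplification: using MSE for both terms and a scalar weight `lam` are our own illustrative choices, not the paper's exact (variational) formulation, which would additionally include a KL regularization term.

```python
import numpy as np

def combined_loss(x_pred, future, x_recon, past, lam=1.0):
    """Joint objective: a forecasting error on future steps plus a
    reconstruction error on the observed (past) steps. MSE terms and
    the scalar weight lam are illustrative assumptions."""
    pred_loss = np.mean((x_pred - future) ** 2)
    recon_loss = np.mean((x_recon - past) ** 2)
    return pred_loss + lam * recon_loss

# toy example: perfect reconstruction, imperfect forecast
loss = combined_loss(np.ones(4), np.zeros(4), np.zeros(4), np.zeros(4), lam=0.5)
```

Optimizing both terms jointly encourages the learned representation to be useful for prediction while still retaining enough information to reconstruct the observed window.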

2. METHOD

We first present the problem definition, then introduce the probabilistic channel embedding for measuring the relationships between different channels, and present G-Trans, which incorporates cross-channel dependence into the Transformer. Finally, we develop VG-Trans, a novel variational dynamic model. The notations used in this paper are summarized in Table 4 in the Appendix.
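The notion of a probabilistic channel embedding can be sketched as follows: each channel is assigned a Gaussian embedding sampled via the reparameterization trick, and pairwise channel relationships are scored from embedding similarity. The dimensions, the inner-product similarity, and the function names here are our own illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_channel_embedding(mu, log_var, rng):
    """Reparameterized Gaussian embedding: z = mu + sigma * eps, which
    keeps the sample differentiable w.r.t. (mu, log_var) in a
    gradient-based framework."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

# 5 channels with 16-dim embeddings; mu/log_var would be learned in practice
mu = rng.normal(size=(5, 16))
log_var = rng.normal(scale=0.1, size=(5, 16))
z = sample_channel_embedding(mu, log_var, rng)

# pairwise channel relationships scored by embedding similarity
rel = z @ z.T
```

Because the embeddings are stochastic, the induced relationship scores vary across samples, which is one way to express uncertainty about the cross-channel structure.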



For forecasting, LogTrans (Li et al., 2019) incorporates causal convolutions into the self-attention layer to capture local temporal dependencies of MTS. Informer (Zhou et al., 2021) develops a ProbSparse self-attention mechanism for long-sequence forecasting. AST (Wu et al., 2020) further constructs a generative adversarial encoder-decoder framework to better predict the output distribution. In addition, there are also other efficient Transformer-based forecasting methods, such as Autoformer (Xu et al., 2021), FEDformer (Zhou et al., 2022), and TFT (Lim et al., 2021).
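The causal-convolution idea behind LogTrans can be sketched in isolation: instead of pointwise projections, queries and keys are produced by a convolution in which position t only sees inputs at times ≤ t, enforced here by left-padding. The function name and kernel are illustrative, not taken from any cited implementation.

```python
import numpy as np

def causal_conv1d(x, kernel):
    """Causal 1-D convolution along the time axis: the output at time t
    depends only on x[0..t], because the input is left-padded with
    k-1 zeros before sliding the (flipped) kernel."""
    k = len(kernel)
    padded = np.concatenate([np.zeros(k - 1), x])
    return np.array([padded[t:t + k] @ kernel[::-1] for t in range(len(x))])

x = np.array([1.0, 2.0, 3.0, 4.0])
kern = np.array([0.5, 0.5])       # simple 2-tap smoothing kernel
out = causal_conv1d(x, kern)      # -> [0.5, 1.5, 2.5, 3.5]
```

Each output mixes a position with its immediate past only, so attention scores computed from such queries/keys become aware of local context without leaking future information.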

Figure 1: The temporal dependency, channel relationship, and stochasticity within MTS.

For anomaly detection, Meng et al. (2019) illustrate the superiority of using Transformers over traditional RNN-based methods. Following this, some modified Transformer-based methods have also been proposed for anomaly detection, such as TransAnomaly (Zhang et al., 2021), ADTrans (Tuli et al., 2022), and Anomaly Transformer (Xu et al., 2022). To address non-deterministic temporal dependence within MTS, Tang & Matteson (2021) further incorporate the Transformer structure into state-space models to develop ProTran. Despite the attractive performance of existing Transformer-based models, their potential has been limited by ignoring the cross-channel dependence of MTS. To consider the relationships between different channels within MTS, MSCRED (Zhang et al., 2019) introduces a multi-scale convolutional recurrent encoder-decoder to learn spatial correlations and temporal characteristics in MTS and detects anomalies via residual signature matrices. InterFusion (Li et al., 2021) incorporates recurrent and convolutional structures into a unified framework to capture both temporal and inter-metric information. Recently, graph neural networks (GNNs) have attracted growing attention for exploring such relationships, and several GNN-based methods for MTS have been developed (Deng & Hooi, 2021; Zhao et al., 2020) to discover expressive representations of MTS. The deep variational graph convolutional recurrent network (DVGCRN) incorporates relationship modeling into a hierarchical generative process. Moreover, some graph-based methods (Li et al., 2018; Bai et al., 2019a; Yu et al., 2018; Wu et al., 2019; Guo et al., 2019; Pan et al., 2019) have also been developed for MTS forecasting.
The adaptive graph convolutional recurrent network (AGCRN) (Bai et al., 2020) further learns node-specific patterns for MTS forecasting without requiring a pre-defined graph. However, these methods are all non-dynamic or RNN-based models, limiting their power in capturing complex relationships across long-distance time steps.
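The adaptive-graph idea referenced above can be sketched as follows: learnable node (channel) embeddings induce a row-normalized adjacency matrix, so no pre-defined graph is required. This is an illustrative reconstruction of the AGCRN-style mechanism, not the exact formulation of any cited paper.

```python
import numpy as np

def adaptive_adjacency(node_emb):
    """Infer a normalized adjacency from learnable node embeddings:
    A = softmax(ReLU(E @ E.T)), computed row-wise with a numerically
    stable softmax. Every row of A sums to 1."""
    scores = np.maximum(node_emb @ node_emb.T, 0.0)   # ReLU similarity
    scores -= scores.max(axis=1, keepdims=True)       # stabilize exp
    exp = np.exp(scores)
    return exp / exp.sum(axis=1, keepdims=True)

# 4 channels, 8-dimensional embeddings (randomly initialized here;
# in practice E would be learned end-to-end with the rest of the model)
rng = np.random.default_rng(0)
E = rng.normal(size=(4, 8))
A = adaptive_adjacency(E)
```

Because the adjacency is a differentiable function of the embeddings, the channel graph is refined jointly with the downstream forecasting or detection objective.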

