REPRESENTING MULTI-VIEW TIME-SERIES GRAPH STRUCTURES FOR MULTIVARIATE LONG-TERM TIME-SERIES FORECASTING

Abstract

Multivariate long-term time-series forecasting is a challenging task in real-world applications such as electricity consumption and influenza-like illness forecasting. Researchers have focused on designing robust and effective models and have achieved good results. However, existing models have several issues that must be overcome before they can perform optimally. First, they lack a relationship structure between the variables of a multivariate series. Second, most models have only a weak ability to capture local dynamic changes across the entire long-term time-series. Third, current models suffer from high computational complexity and unsatisfactory accuracy. To address these issues, we propose a novel method called Multi-view Time-series Graph Structure Representation (MTGSR) for multivariate long-term time-series forecasting tasks. MTGSR uses graph convolutional networks (GCNs) to construct topological relationships in the multivariate long-term time-series from three different perspectives: time, dimension, and crossing segments. Variation trends in the different dimensions of the multivariate long-term time-series are extracted through a difference operation so as to construct a topological map that reflects the correlations between the different dimensions. Then, to capture the dynamically changing characteristics of the fluctuation correlations between adjacent local sequences, MTGSR constructs a cross graph by calculating the correlation coefficients between adjacent local sequences. Extensive experiments on five different datasets show that MTGSR reduces errors by 20.41% over the state-of-the-art while maintaining linear complexity. Additionally, memory use is decreased by 66.52% and running time is reduced by 78.09%.

1. INTRODUCTION

In reality, a large amount of time-series data is produced in various fields, such as weather forecasting (Hewage et al., 2021; Rasp et al., 2020), electric power planning (Qader et al., 2022; Oreshkin et al., 2021), disease propagation prediction (Li et al., 2021; Zimmer & Yaesoubi, 2020), and more. Although challenging to model, the long-term relationships and multivariate correlations within these real-world time-series are important elements of most practical forecasting tasks involving these data. Thus, in this paper, we focus on the multivariate long-term time-series forecasting task, which places higher requirements on models than ordinary time-series forecasting tasks. In recent years, deep learning models have been thoroughly investigated for their power at multivariate long-series forecasting tasks, with many achieving good results (Liu et al., 2021; Torres et al., 2021; Lim & Zohren, 2021). For example, Transformer-based models, the mainstream framework for multivariate long-term time-series forecasting tasks, rely on multi-head self-attention as a core mechanism for extracting powerful characteristics from historical data (Nikita et al., 2020; Zhou et al., 2021; Xu et al., 2021; Zhou et al., 2022). These characteristics are then analyzed to predict long sequences containing data from farther in the future. However, there are still several extremely challenging issues in multivariate long-term time-series forecasting tasks that need to be addressed. First, existing models do not construct relationships between multivariate variables. Rather, they pay more attention to capturing the temporal features of the series, which means they simply use dimensional mappings to extract blurry relationships between multivariate variables. Topologies between different variables cannot be constructed effectively using this approach.
Second, in addition to the relationship between tokens, the characteristics of dynamically changing fluctuations between local sequences in long-term time-series are also important. Yet most existing models process sequences from a global view, so the features of the local fluctuations are entangled with the overall features. Third, current models still have room to improve in accuracy. Further, most of the models that perform well have a high computational complexity caused by their complex structures.

To construct a relationship graph of multivariate variables, we turned to graph convolutional networks (GCNs) (Welling & Kipf, 2016). GCNs are typical graph neural networks used to extract the features of vertices connected by edges in a graph. One advantage of GCNs is that they can generate a more representative topology and richer node properties by passing information between neighboring nodes. Hence, we attempted to build a model by treating the variables in a multivariate long-term time-series as the nodes of a graph and using the correlations between different variables fluctuating over time as the weights of the graph's edges. Through experiments, we found that these operations could generate dimensional graphs with rich spatial features. Moreover, the GCNs could be used to extract more appropriate topological features between multivariate variables from the obtained dimensional graphs. We subsequently found that the process of generating graphs from the time-series and processing them with GCNs was also good for extracting several other graph characteristics, including the temporal characteristics of the long-term time-series and the characteristics of the local sub-sequences with dynamically changing fluctuations. Inspired by these preliminary studies, we developed a novel and effective model named Multi-view Time-series Graph Structure Representation (MTGSR) for multivariate long-term time-series forecasting tasks.
MTGSR extracts disentangled information from the input time-series to dynamically generate graph structures from three perspectives: the time view, the dimension view, and the cross-segment view. For the time view, MTGSR builds a time graph with the Time Graph Generator by calculating the correlations for all the dimensional information between different timestamps. Unlike the process of generating a time graph, MTGSR adds a differencing operation to process the inputs when generating dimension graphs, so as to extract valid information from the relative fluctuations of the time-series across different dimensions. The cross-segment view takes into account the correlations of all the fluctuations that dynamically change over time between two adjacent local sequences in the long-term time-series. Because extracting features directly from the entire length of the time-series would result in too much redundant information and would probably cause the model to overfit, MTGSR's three graph generators operate on local sequences split from the full multivariate long-term time-series. Benefiting from this design, the scale of the parameters is greatly reduced, to the point that the model has a linear complexity. Further, inspired by Transformer's multi-head attention mechanism, a multi-head mechanism is used at MTGSR's input stage to improve its ability to capture different features from the input sequence. This strategy improves the prediction accuracy of the model. In fact, MTGSR outperforms the state-of-the-art model on five benchmark datasets in terms of accuracy, memory use, and running time. The contributions of this paper are summarized as follows:

• We propose a novel model named Multi-view Time-series Graph Structure Representation (MTGSR) for multivariate long-term time-series forecasting tasks. MTGSR uses GCNs to learn the complex disentangled characteristics in multivariate long-term time-series from three perspectives: the time view, the dimension view, and the cross-segment view.
• To construct topologies between multivariate variables, MTGSR uses a GCN-based Dimension Graph Generator to dynamically learn the structural relationships in the multivariate long-term time-series after differencing operations.
• To capture the dynamically changing characteristics of the fluctuation correlations between adjacent local sequences in the whole long-term time-series, MTGSR constructs a cross-segment graph by calculating the correlation coefficients between adjacent local sequences through the Cross-segments Graph Generator.
• Extensive experiments with five datasets show that MTGSR reduces errors by 20.41% compared to the state-of-the-art framework FEDformer while maintaining a linear complexity. Additionally, MTGSR reduces memory use by 66.52% and running time by 78.09%.

2.1. DEEP LEARNING MODELS FOR MULTIVARIATE LONG-TERM TIME-SERIES FORECASTING

Multivariate long-term time-series forecasting is an important research direction in the field of time-series prediction (Liu et al., 2021; Torres et al., 2021). In conventional time-series forecasting, recurrent neural networks (RNNs) (Stankeviciute et al., 2021; Qin et al., 2017; Madan & Mangipudi, 2018) are among the more widely used deep learning models. As the requirements for the forecasting task increase, the length of the time-series that models need to predict is growing longer. For this reason, an enhanced version of the RNN, the LSTM, has been used to model long-term time-series prediction tasks. However, LSTM-based models (Smyl, 2020; Sagheer & Kotb, 2019; Shen et al., 2020), which generate prediction sequences iteratively, accumulate errors that reduce their prediction accuracy. As a way to address this problem, temporal convolutional networks (TCNs) have emerged (Wan et al., 2019; Shen et al., 2020). These frameworks directly generate all prediction sequences by imitating the principles of a convolutional neural network (CNN). In recent years, Transformer (Vaswani et al., 2017), a model with stronger theoretical advantages, has shown great power in tasks such as audio processing (Gong et al., 2022; Sajid et al., 2021), natural language processing (Wolf et al., 2020; Guo et al., 2019; Zhang & Zhang, 2020), and time-series forecasting (Wu et al., 2020; Li et al., 2019; 2021). However, limited by the high computational complexity $O(L^2)$ of the self-attention mechanism that sits at their core, Transformer-based models cannot be used directly to handle long-term time-series. Some Transformer variants focus on reducing the computational complexity of the self-attention mechanism. For instance, LogTrans (Li et al., 2019) incorporates a sparse attention mechanism named LogSparse attention, which reduces the model's computational complexity to $O(L \log L)$.
Reformer (Nikita et al., 2020) presents a novel locality-sensitive hashing (LSH) attention based on a hash algorithm, which also has $O(L \log L)$ complexity. In addition, Informer (Zhou et al., 2021) introduces a query sparsity measurement with a distilling mechanism that yields low computational complexity. However, these variants do not perform well enough in terms of prediction accuracy. Compared with the above models, Autoformer (Xu et al., 2021) is much more accurate. Autoformer includes a time-series decomposition module and an AutoCorrelation mechanism in place of self-attention. With the advent of FEDformer (Zhou et al., 2022), the prediction accuracy of the Transformer variants has risen again. FEDformer combines a time-series decomposition module and frequency enhanced blocks to greatly improve prediction accuracy. Moreover, it reduces computational complexity to $O(L)$. However, when extracting temporal features from long-term time-series, none of these models consider the relationships between multiple variables from a dimensional perspective. Yet there are meaningful relationships between different variables in most real-world datasets, and extracting these relationships is an important part of studying multivariate long-term time-series prediction.

2.2. GRAPH CONVOLUTIONAL NETWORKS

Graph convolutional networks (GCNs) (Welling & Kipf, 2016) are representative graph neural networks that have been widely used in link prediction (Yun et al., 2021; Yan et al., 2021), social networks (Tong, 2020; Tian et al., 2021b) and graph anomaly detection (Tian et al., 2021a; Markovitz et al., 2020). GCNs can aggregate the information from adjacent nodes and filter out interference in a graph. Thus, GCNs have an advantage when processing graph-structured data. Today, researchers can dynamically generate graphs from time-series data by building graph structure learning models (Zhao et al., 2021b; Fatemi et al., 2021), where GCNs are used to extract the characteristics of the processed data and the generated graph structures. This has expanded the applicability of GCNs to a wider range of tasks, such as traffic forecasting (Guo et al., 2021; Sofianos et al., 2021; Lan et al., 2022) and health data processing (Zhao et al., 2021a; Ntemi et al., 2022; Zhang et al., 2022). Particularly in short-term forecasting tasks, such as traffic forecasting, GCNs have performed very well as mainstream model frameworks. But, to the best of our knowledge, no GCN-based model has yet been designed for multivariate long-term time-series forecasting tasks on multi-domain datasets. In these tasks, existing models lack the ability to extract dependencies between the variables of a multivariate long-term time-series. Most of these models focus their attention on extracting features in the time and frequency domains of the time-series. By contrast, our framework is designed to use GCNs from a dimensional perspective to construct the complex relationships between the variables in multivariate long-term time-series.

3. METHODOLOGY

The task in multivariate long-term time-series forecasting is to predict long-term future sequences $P \in \mathbb{R}^{L_p \times d_{in}}$ with minimal error based on historical time-series data $X \in \mathbb{R}^{L_x \times d_{in}}$. As mentioned, several problems currently exist in multivariate long-term time-series forecasting: 1) the lack of a relationship structure between multivariate variables; 2) a weak ability to capture local dynamic changes across the full long-term time-series; and 3) high computational complexity and unsatisfactory accuracy. To address these problems, our framework incorporates GCNs that process dynamically constructed graphs with topological information from three different perspectives: the time view, the dimension view, and the cross-segments view. Inspired by multi-head attention, we designed a multi-head mechanism to improve the model's ability to capture different features. Within this mechanism, RevIN (Kim et al., 2021) improves the model's prediction accuracy on datasets with different distributions. Together, these components form Multi-view Time-series Graph Structure Representation (MTGSR), a model with high precision and low complexity. The overall architecture of our framework is shown in Figure 1.
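The input/target setup described above can be sketched as a sliding-window slicing of one long series. This is only an illustration of the shapes involved: the helper name `make_windows`, the toy data, and the stride of one step between windows are assumptions, not details given in the paper.

```python
import numpy as np

def make_windows(series, input_len, pred_len):
    """Slice a (T, d_in) series into inputs X of length L_x and targets P of length L_p."""
    n = series.shape[0] - input_len - pred_len + 1
    if n <= 0:
        raise ValueError("series is too short for the requested window lengths")
    # Each target window starts right where its input window ends.
    X = np.stack([series[i:i + input_len] for i in range(n)])
    P = np.stack([series[i + input_len:i + input_len + pred_len] for i in range(n)])
    return X, P

# Toy data (hypothetical): T = 20 timestamps, d_in = 3 variables.
series = np.arange(20 * 3, dtype=float).reshape(20, 3)
X, P = make_windows(series, input_len=8, pred_len=4)
print(X.shape, P.shape)
```

Each pair `(X[k], P[k])` then plays the role of the historical sequence and the future sequence to be predicted.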

3.1. MULTI-HEAD MECHANISM

The multi-head mechanism consists of two modules: RevIN and a parallel linear layer group. This mechanism is designed to provide stable and informative features for the subsequent graph generators, with the detailed structure shown in Figure 1. In a real-world dataset, the distribution of the data changes over time. Hence, the RevIN module (Kim et al., 2021) dynamically normalizes the input sequence. As a result, all input sequences normalized by RevIN conform to the same distribution, which improves the effectiveness of the features extracted by the model. The formulation is:

$$X_n = \mathrm{RevIN}(X) \tag{1}$$

where $X = \{x_1, \ldots, x_{L_x} \mid x_i \in \mathbb{R}^{d_{in}}\}$ denotes the input sequence, $\mathrm{RevIN}(\cdot)$ denotes the function of the RevIN module, and $X_n$ denotes the hidden layer sequence normalized by the RevIN module. To extract informative features, a parallel linear layer group replaces a single linear layer, following the principle of the multi-head attention mechanism in Transformer (Vaswani et al., 2017). This mechanism enhances the model's ability to discover the information in $X_n$ from different representation subspaces at different positions. Formally, it is expressed as:

$$H_i = X_n W_i + b_i \tag{2}$$

where $H_i \in \mathbb{R}^{L_x \times d}$ denotes the hidden features, $W_i \in \mathbb{R}^{d_{in} \times d}$ represents the learnable parameter matrix of the $i$-th head in the hidden layer, and $b_i$ is the corresponding learnable bias.
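A minimal NumPy sketch of the two steps above may help fix the shapes. It assumes that the normalization standardizes each variable of one input instance over time and omits RevIN's learnable affine parameters; the function names, random weights, and sizes are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def revin_normalize(x, eps=1e-5):
    """Standardize each variable of one (L_x, d_in) instance over the time axis."""
    mean = x.mean(axis=0, keepdims=True)
    std = np.sqrt(x.var(axis=0, keepdims=True) + eps)
    # The statistics are returned so that a later step could undo the normalization.
    return (x - mean) / std, (mean, std)

def multi_head_projection(x_n, weights, biases):
    """Apply one linear layer per head: H_i = X_n W_i + b_i."""
    return [x_n @ W + b for W, b in zip(weights, biases)]

# Illustrative sizes; 8 heads with hidden dimension 64 follow the implementation details.
L_x, d_in, d, n_heads = 96, 7, 64, 8
X = rng.normal(loc=5.0, scale=3.0, size=(L_x, d_in))
X_n, stats = revin_normalize(X)
weights = [rng.normal(scale=0.1, size=(d_in, d)) for _ in range(n_heads)]
biases = [np.zeros(d) for _ in range(n_heads)]
H = multi_head_projection(X_n, weights, biases)
print(len(H), H[0].shape)
```

Each element of `H` corresponds to one head and is processed independently by the graph generators that follow.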

3.2. MULTI-VIEW GRAPH GENERATOR

The Multi-view Graph Generator module extracts the characteristics of the multivariate long-term time-series from multiple perspectives. It contains three graph generators: the Cross-segments Graph Generator, the Dimension Graph Generator and the Time Graph Generator. The module relies on GCNs to process the graphs generated by the three graph generators so as to obtain features that contain different subspace information from the input sequence. From experiments, we found that learning the relationship between each pair of tokens across the entire multivariate long-term time-series resulted in an oversized model that overfit. To overcome this problem, we split the input sequence into several segments of the same size $l$, where adjacent segments partially overlap. The process is formulated as:

$$S_1, S_2, \ldots, S_n = \mathrm{Split}(H_i)$$

where $S_j \in \mathbb{R}^{l \times d}$ represents the $j$-th segment and $\mathrm{Split}(\cdot)$ represents the splitting function. The overall architecture of the Multi-view Graph Generator module is shown in Figure 1. Details of the three generators are provided in the following sections, and their architectures are shown in Appendix A.
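The splitting step can be sketched as follows. The paper states only that segments share one size and that adjacent segments partially overlap, so the stride of half a segment used here is an assumption, as is the helper name `split_segments`.

```python
import numpy as np

def split_segments(h, seg_len, stride):
    """Split an (L_x, d) feature sequence into overlapping (seg_len, d) segments."""
    if stride >= seg_len:
        raise ValueError("stride must be smaller than seg_len for segments to overlap")
    starts = range(0, h.shape[0] - seg_len + 1, stride)
    return [h[s:s + seg_len] for s in starts]

# Hypothetical hidden features of one head: L_x = 96, d = 4.
H_i = np.arange(96 * 4, dtype=float).reshape(96, 4)
segments = split_segments(H_i, seg_len=12, stride=6)
print(len(segments), segments[0].shape)
```

With these settings, the second half of each segment coincides with the first half of the next one, which is the kind of partial overlap the text describes.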

3.2.1. CROSS-SEGMENTS GRAPH GENERATOR

To learn the variations in the fluctuations of the local sub-sequences, we designed the Cross-segments Graph Generator to dynamically generate a relationship graph between two sub-sequences. This generator first projects two adjacent segments onto the feature space through a linear mapping function. Then, the features are normalized. The formulation is:

$$F_j = \mathrm{Norm}(W_{c1} S_j + b_{c1}), \qquad F_{j+1} = \mathrm{Norm}(W_{c2} S_{j+1} + b_{c2})$$

where $W_{c1}, W_{c2} \in \mathbb{R}^{l \times l}$ are two learnable weight matrices, and $b_{c1}$ and $b_{c2}$ are two learnable biases. $\mathrm{Norm}(\cdot)$ represents the normalization function. All the learnable parameters are shared in each Cross-segments Graph Generator. The generator then calculates the correlation value of the two feature sequences $F_j$ and $F_{j+1}$ to construct a relationship graph, which is formulated as:

$$G_{cross} = \mathrm{Softmax2d}(F_j F_{j+1}^{T})$$

where $G_{cross} \in \mathbb{R}^{l \times l}$ is the relationship graph of the feature sequences $F_j$ and $F_{j+1}$. To further extract the information in the graph $G_{cross}$ and update the information of the corresponding elements in the feature sequence $F_j$, a GCN is used to process the graph $G_{cross}$ and the feature sequence $F_j$. The formulation is:

$$F'_j = D^{-\frac{1}{2}} G_{cross} D^{-\frac{1}{2}} F_j W_{cross} \tag{6}$$

where $F'_j$ is the updated feature sequence, $D^{-\frac{1}{2}} G_{cross} D^{-\frac{1}{2}}$ is the normalized adjacency matrix of $G_{cross}$, and $W_{cross}$ is a learnable matrix. In our experiments, we simplify Equation 6 to Equation 7, which is more efficient with almost no loss of accuracy:

$$F'_j = G_{cross} F_j W_{cross} \tag{7}$$

$F'_j$ is then feature-mapped by a nonlinear layer using a residual connection (He et al., 2016), followed by another nonlinear layer. This is formulated as:

$$F_{cross} = F'_j + \sigma(F'_j W_{nlc} + b_{nlc})$$

where $F_{cross} \in \mathbb{R}^{l \times d}$ is the updated feature sequence, $\sigma$ is an activation function, and $W_{nlc} \in \mathbb{R}^{d \times d}$ and $b_{nlc}$ are the learnable parameters of the nonlinear layer.
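The Cross-segments Graph Generator can be sketched in NumPy using the simplified GCN step of Equation 7. Several choices below are assumptions, since the text does not pin them down: the normalization standardizes each timestamp over its features, the 2D softmax is taken over all entries of the matrix at once (a row-wise softmax would be another reading), the activation is `tanh`, and the biases broadcast across features. All parameter values are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def norm(x, eps=1e-5):
    # Assumption: standardize each timestamp (row) over the feature axis.
    mu = x.mean(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)

def softmax2d(a):
    # Assumption: a single softmax taken over every entry of the matrix.
    e = np.exp(a - a.max())
    return e / e.sum()

def cross_segment_update(S_j, S_next, p):
    F_j = norm(p["W_c1"] @ S_j + p["b_c1"])          # (l, d)
    F_next = norm(p["W_c2"] @ S_next + p["b_c2"])    # (l, d)
    G_cross = softmax2d(F_j @ F_next.T)              # (l, l) relationship graph
    F_prime = G_cross @ F_j @ p["W_cross"]           # simplified GCN step (Equation 7)
    # Residual nonlinear layer; tanh stands in for the unspecified activation.
    F_cross = F_prime + np.tanh(F_prime @ p["W_nlc"] + p["b_nlc"])
    return G_cross, F_cross

l, d = 12, 16
params = {
    "W_c1": rng.normal(scale=0.3, size=(l, l)), "b_c1": np.zeros((l, 1)),
    "W_c2": rng.normal(scale=0.3, size=(l, l)), "b_c2": np.zeros((l, 1)),
    "W_cross": rng.normal(scale=0.3, size=(d, d)),
    "W_nlc": rng.normal(scale=0.3, size=(d, d)), "b_nlc": np.zeros(d),
}
S_j, S_next = rng.normal(size=(l, d)), rng.normal(size=(l, d))
G_cross, F_cross = cross_segment_update(S_j, S_next, params)
print(G_cross.shape, F_cross.shape)
```

Because the graph is built from two different segments, it is generally not symmetric, which matches the description of the cross graph as asymmetrical.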

3.2.2. DIMENSION GRAPH GENERATOR

The Dimension Graph Generator dynamically generates a dimension graph that represents the correlation coefficients of each pair of multivariate variables. This generator takes the resulting feature sequence $F_{cross}$ (Section 3.2.1) as its input. The next step is to perform a differencing operation on $F_{cross}$ to establish the trend of fluctuations in the series. The formulation is:

$$F_d = \mathrm{Diff}(F_{cross})$$

where $F_d \in \mathbb{R}^{(l-1) \times d}$ is the output of the differencing operation $\mathrm{Diff}(\cdot)$. Then, through a linear mapping and normalization operation, the feature sequence $F_d$ is updated to $F_n$, formulated as:

$$F_n = \mathrm{Norm}(W_d F_d + b_d)$$

where $W_d \in \mathbb{R}^{(l-1) \times (l-1)}$ and $b_d$ are two learnable parameters. Next, the correlation coefficients of each dimension are calculated in the feature sequence $F_n$, and a dimension graph is constructed:

$$G_{dim} = \mathrm{Softmax2d}(F_n^{T} F_n)$$

where $G_{dim} \in \mathbb{R}^{d \times d}$ is the dimension graph and $\mathrm{Softmax2d}$ is a normalization function. Finally, a GCN is used to extract information from the dimension graph $G_{dim}$ to update the feature sequence $F_{cross}$. The formulation is:

$$F'_d = G_{dim} F_{cross} W_{dim}$$

$$F_{dim} = F'_d + \sigma(F'_d W_{nld} + b_{nld})$$

where $F_{dim} \in \mathbb{R}^{l \times d}$ is the updated feature sequence, and $W_{dim}, W_{nld} \in \mathbb{R}^{d \times d}$ and $b_{nld}$ are the learnable parameters.
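The dimension-view steps can be sketched in the same style. As before, the normalization axis, the joint softmax, and the `tanh` activation are assumptions. One further assumption matters here: with `G_dim` of shape (d, d) and `F_cross` of shape (l, d), the product written as G_dim F_cross W_dim does not have matching shapes, so this sketch applies the graph on the feature axis as `F_cross @ G_dim @ W_dim`. That ordering is a guess at the intent, not something the text confirms.

```python
import numpy as np

rng = np.random.default_rng(0)

def norm_over_time(x, eps=1e-5):
    # Assumption: standardize each feature (column) over the time axis.
    mu = x.mean(axis=0, keepdims=True)
    return (x - mu) / np.sqrt(x.var(axis=0, keepdims=True) + eps)

def softmax2d(a):
    # Assumption: a single softmax taken over every entry of the matrix.
    e = np.exp(a - a.max())
    return e / e.sum()

def dimension_update(F_cross, p):
    F_d = np.diff(F_cross, axis=0)                     # (l-1, d) first differences over time
    F_n = norm_over_time(p["W_d"] @ F_d + p["b_d"])    # (l-1, d)
    G_dim = softmax2d(F_n.T @ F_n)                     # (d, d) dimension graph
    # Assumption: the graph mixes features, so it multiplies from the right.
    F_prime = F_cross @ G_dim @ p["W_dim"]
    F_dim = F_prime + np.tanh(F_prime @ p["W_nld"] + p["b_nld"])
    return F_d, G_dim, F_dim

l, d = 12, 16
params = {
    "W_d": rng.normal(scale=0.3, size=(l - 1, l - 1)), "b_d": np.zeros((l - 1, 1)),
    "W_dim": rng.normal(scale=0.3, size=(d, d)),
    "W_nld": rng.normal(scale=0.3, size=(d, d)), "b_nld": np.zeros(d),
}
F_cross = rng.normal(size=(l, d))
F_d, G_dim, F_dim = dimension_update(F_cross, params)
print(F_d.shape, G_dim.shape, F_dim.shape)
```

Differencing shortens the sequence by one step, which is why the linear map in this view has size (l-1) by (l-1) while the output keeps the original length l.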

3.2.3. TIME GRAPH GENERATOR

The Time Graph Generator extracts the relationship between each pair of timestamps from the perspective of time. We contend that the values at each timestamp have an important relationship in this perspective, so, unlike the Dimension Graph Generator, the Time Graph Generator does not calculate the difference. The formulation is:

$$F_t = \mathrm{Norm}(W_t F_{dim} + b_t)$$

$$G_{time} = \mathrm{Softmax2d}(F_t F_t^{T})$$

where $G_{time} \in \mathbb{R}^{l \times l}$ is the obtained time graph, and $W_t \in \mathbb{R}^{l \times l}$ and $b_t$ are the learnable parameters. A GCN is then used to update the feature sequence, which is formulated as:

$$F'_t = G_{time} F_{dim} W_{time}$$

$$F_{time} = W_o \left( F'_t + \sigma(F'_t W_{nlt} + b_{nlt}) \right) + b_o$$

where $F_{time} \in \mathbb{R}^{l \times d}$ is the updated feature sequence, and $W_{time}, W_{nlt} \in \mathbb{R}^{d \times d}$, $W_o \in \mathbb{R}^{l \times l}$, $b_o$ and $b_{nlt}$ are the learnable parameters. Finally, the feature sequences produced in parallel by the multi-head mechanism are combined to give the final prediction sequence. The formulation is:

$$S'_j = W_j F_{time,j}$$

$$P = \mathrm{RevIN}(\mathrm{Concat}(S'_1, S'_2, \ldots, S'_{n'}))$$

where $P \in \mathbb{R}^{L_p \times d_{in}}$ is the prediction sequence, $F_{time,j}$ represents the updated feature sequence of the $j$-th head, $W_j$ is the learnable matrix of the $j$-th head, and $\mathrm{Concat}(\cdot)$ is the concatenation function.
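A short sketch of the time-view update, up to the per-segment output `F_time`, is given below. The normalization axis, the joint softmax, the `tanh` activation, and the bias shapes are assumptions, and all parameter values are random placeholders. The final per-head projection and concatenation are left out because the shape of the per-head matrix is not stated in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def norm(x, eps=1e-5):
    # Assumption: standardize each timestamp (row) over the feature axis.
    mu = x.mean(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)

def softmax2d(a):
    # Assumption: a single softmax taken over every entry of the matrix.
    e = np.exp(a - a.max())
    return e / e.sum()

def time_update(F_dim, p):
    F_t = norm(p["W_t"] @ F_dim + p["b_t"])          # (l, d), no differencing in this view
    G_time = softmax2d(F_t @ F_t.T)                  # (l, l) time graph
    F_prime = G_time @ F_dim @ p["W_time"]           # GCN-style update
    hidden = F_prime + np.tanh(F_prime @ p["W_nlt"] + p["b_nlt"])
    F_time = p["W_o"] @ hidden + p["b_o"]            # output mapping along the time axis
    return G_time, F_time

l, d = 12, 16
params = {
    "W_t": rng.normal(scale=0.3, size=(l, l)), "b_t": np.zeros((l, 1)),
    "W_time": rng.normal(scale=0.3, size=(d, d)),
    "W_nlt": rng.normal(scale=0.3, size=(d, d)), "b_nlt": np.zeros(d),
    "W_o": rng.normal(scale=0.3, size=(l, l)), "b_o": np.zeros((l, 1)),
}
F_dim = rng.normal(size=(l, d))
G_time, F_time = time_update(F_dim, params)
print(G_time.shape, F_time.shape)
```

Unlike the cross graph, the time graph is built from a single feature sequence multiplied by its own transpose, so it is symmetric under these assumptions.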

4. EXPERIMENTS

To evaluate MTGSR's performance, we performed extensive experiments on five publicly available datasets covering different fields. Additionally, we selected five baselines for comparison. 

4.1. DATASETS

This section describes the five datasets: 1) ETT, which contains seven attributes, such as load and oil temperature, collected every 15 minutes from electricity transformers between July 2016 and July 2018. 2) Exchange, with eight attributes, which records the daily exchange rates of eight countries between 1990 and 2016. 3) Weather, containing 21 weather-related attributes collected every 10 minutes for the whole of 2020. 4) Electricity, which records the hourly electricity consumption of 321 customers between 2012 and 2014. 5) ILI, containing seven patient attributes, collected by the Centers for Disease Control and Prevention of the United States between 2002 and 2021. These datasets were split into training, validation, and testing sets according to a 7:1:2 ratio.
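The 7:1:2 split can be sketched as below. The sketch assumes the split is made in chronological order, which is common for forecasting benchmarks but is not stated explicitly in the text; the helper name and the toy series are hypothetical.

```python
import numpy as np

def chronological_split(series, ratios=(7, 1, 2)):
    """Split a series into train/validation/test parts without shuffling."""
    n = len(series)
    total = sum(ratios)
    # Integer arithmetic avoids floating-point rounding at the boundaries.
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    train = series[:n_train]
    val = series[n_train:n_train + n_val]
    test = series[n_train + n_val:]
    return train, val, test

# Toy series (hypothetical): 1000 timestamps, 7 variables.
series = np.zeros((1000, 7))
train, val, test = chronological_split(series)
print(len(train), len(val), len(test))
```

Keeping the order intact ensures that the validation and test periods come strictly after the training period.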

4.2. IMPLEMENTATION DETAILS

We used the L2 loss to train the model and selected mean square error (MSE) and mean absolute error (MAE) as the evaluation metrics. We used the Adam optimizer with an initial learning rate of 0.001. The batch size was set to 64. The number of heads in the multi-head mechanism was set to 8, and the hidden dimension in each head was set to 64. All experiments were repeated three times, and their average values were recorded as the result. The segment size in MTGSR was set to 12 for the ETT, Electricity, and ILI datasets, and to 24 for the Exchange and Weather datasets. A hyperparameter sensitivity analysis is provided in Section 4.5.1. All experiments were run on a single Nvidia RTX3090 24GB GPU.
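For reference, the two evaluation metrics can be computed as in the short sketch below. The arrays are toy values chosen only to make the arithmetic easy to follow, not experimental results.

```python
import numpy as np

def mse(pred, target):
    """Mean square error over all timestamps and variables."""
    return float(np.mean((pred - target) ** 2))

def mae(pred, target):
    """Mean absolute error over all timestamps and variables."""
    return float(np.mean(np.abs(pred - target)))

# Toy prediction and ground truth (hypothetical values).
pred = np.array([[1.0, 2.0], [3.0, 4.0]])
target = np.array([[1.0, 1.0], [5.0, 4.0]])
print(mse(pred, target), mae(pred, target))
```

Here the element-wise errors are 0, 1, -2 and 0, so the MSE is 5/4 and the MAE is 3/4.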

4.3. BASELINE

We selected five of the latest state-of-the-art methods as baselines to compare with MTGSR: FEDformer (Zhou et al., 2022), Autoformer (Xu et al., 2021), Informer (Zhou et al., 2021), LogTrans (Li et al., 2019) and Reformer (Nikita et al., 2020).

4.4. MAIN RESULTS

We set the input sequence to a fixed length and evaluated the performance of the proposed MTGSR and the baselines over four prediction lengths: 96, 192, 336, and 720 with the ETTm1, Exchange, Weather and Electricity datasets, and 24, 36, 48, and 60 with the ILI dataset. Table 1 shows the results of the experiments. MTGSR yielded better results than the state-of-the-art models in all benchmarks and all prediction length settings. MTGSR's improvements were particularly pronounced on the datasets with strong correlations between multivariate attributes, namely the Exchange, Weather and ILI datasets, at 24.54%, 26.41%, and 31.78%, respectively. On the other two datasets, ETTm1 and Electricity, MTGSR also performed well, with respective reductions in MSE of 9.58% and 11.94%. Overall, MTGSR reduced errors by an average of 20.41% across all experiments. These results demonstrate that MTGSR's performance is relatively stable whether the prediction is short-term (a prediction length of 96) or long-term (a prediction length of 720). At the same time, MTGSR maintains high precision while remaining low in computational complexity and model scale, making it easier to migrate MTGSR to edge devices for deployment. Section 4.5 contains details of the hyperparameter sensitivity, the complexity analysis, and the model scale.
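The percentages above are relative error reductions against a baseline. A minimal sketch of that calculation follows; the MSE values in it are made-up placeholders rather than numbers from Table 1, and averaging the per-setting reductions with a plain mean is an assumption about how the overall figure is aggregated.

```python
import numpy as np

def relative_reduction(baseline_err, model_err):
    """Percentage by which the model's error is lower than the baseline's."""
    baseline_err = np.asarray(baseline_err, dtype=float)
    model_err = np.asarray(model_err, dtype=float)
    return 100.0 * (baseline_err - model_err) / baseline_err

# Hypothetical MSE values for two prediction lengths (not taken from the paper).
baseline_mse = [0.50, 0.40]
model_mse = [0.40, 0.30]
per_setting = relative_reduction(baseline_mse, model_mse)
average = float(per_setting.mean())
print(per_setting, average)
```

With these placeholder values, the two settings give reductions of 20% and 25%, and their mean is 22.5%.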

4.5. MODEL ANALYSIS

In this section, we present the hyperparameter sensitivity analysis, an ablation study of each graph generator, and the efficiency analysis. The prediction sequences and heatmaps of each view of MTGSR are visualized in Appendix B. Results of some extended experiments on the ETT series datasets are shown in Appendix C.

We performed extended experiments to study MTGSR's hyperparameters, including the size of the segments for the Multi-view Graph Generator and the number of heads in the multi-head mechanism. For the segment size, we selected three values (12, 24 and 48) and performed experiments with the Exchange and Electricity datasets. The results are shown in Table 2. With the Exchange dataset, the best-performing segment size was 24. With the Electricity dataset, the optimal segment size was 12. Hence, from these experiments, we determined that the segment size needs to be selected depending on the dataset. With the remaining datasets, the optimal segment size was 12 for the ETT and ILI datasets, and 24 for the Weather dataset. For the number of heads in the multi-head mechanism, we performed experiments with the Exchange dataset, setting the number of heads to 1, 4, 8, and 16. Table 3 shows the results, indicating that MTGSR performed best with 8 heads.

4.5.2. THE EFFECT OF DIMENSION GRAPH AND CROSS-SEGMENTS GRAPH

To verify the effectiveness of the dimension graph and the cross-segments graph, we conducted ablation experiments. We removed the Dimension Graph Generator and the Cross-segments Graph Generator, respectively, to get two variants: MTGSR† and MTGSR‡. As shown in Table 4, prediction accuracy decreased by 8.86% for MTGSR† and by 8.56% for MTGSR‡ relative to MTGSR. After removing the dimension graph, there was a large decline with the Exchange and Weather datasets, with decreases in accuracy of 14.47% and 9.05%, respectively. This is because there is a strong relationship between the multivariate variables in both the Exchange and Weather datasets. With the ILI dataset, there is a strong relationship between the fluctuations of local adjacent segments; hence, MTGSR‡'s prediction accuracy without the cross-segments graph was greatly reduced, decreasing by 11.68%. These results show that the dimension graph and the cross-segments graph improve accuracy to different degrees depending on the characteristics of the dataset.

4.5.3. EFFICIENCY ANALYSIS

To assess the efficiency of MTGSR, we performed extensive experiments comparing the model size and runtime of MTGSR to the baselines Informer, Autoformer, and FEDformer. Figure 2a shows that MTGSR's memory use grows linearly and that its model size is greatly reduced compared to FEDformer and the other models. At a prediction length of 1800, MTGSR reduced memory use by 66.52% compared to FEDformer. Figure 2b compares the runtimes of the four models for one iteration. Compared to FEDformer, MTGSR reduced running time by 78.09% at a prediction length of 1800.

5. CONCLUSION

In this paper, we proposed a novel and efficient model named Multi-view Time-series Graph Structure Representation (MTGSR) with a linear computational complexity for multivariate long-term time-series forecasting problems. To more comprehensively extract disentangled characteristics from multivariate long-term time-series, MTGSR uses GCNs to learn the characteristics from three perspectives. MTGSR generates dimension graphs and cross-segment graphs to learn the structural relationships between multivariate variables as well as the dynamically changing characteristics of the fluctuation correlations between adjacent local sequences. In a comprehensive series of experiments, MTGSR outperformed the current state-of-the-art models while using less memory and running faster. In the future, we will continue to study applications for graph neural networks that pertain to multi-view graph construction problems and multivariate long-term time-series.

A THE ARCHITECTURES OF THREE PERSPECTIVE GRAPH GENERATORS

In Figures 3a-3c, we show the detailed architectures of the three perspective graph generators: the Cross-segments Graph Generator, the Dimension Graph Generator and the Time Graph Generator. All graph generators build topology graphs based on correlation coefficients and obtain the final generated graph through a 2D-Softmax(•) normalization module.

C EXTENDED EXPERIMENTS ON ETT SERIES DATASETS

We performed extensive experiments on the ETT series datasets, including ETTh1, ETTh2, ETTm1 and ETTm2. The results are shown in Table 5 . On the ETTh2, ETTm1 and ETTm2 datasets, MTGSR respectively reduced the error rate by 8.47%, 3.33%, and 9.58% compared to the state-of-the-art FEDformer. 



Figure 1: Multi-view Time-series Graph Structure Representation architecture. The multi-head mechanism provides stable and informative hidden features to the Multi-view Graph Generator module. The Multi-view Graph Generator module extracts the disentangled characteristics of the time stamps, the multivariate variables, and the crossing segments.

Figure 2: Efficiency Analysis. In both analyses, we performed the experiments at seven prediction lengths. The results show that MTGSR uses significantly less memory and has a faster running time than the current state-of-the-art models.

(a) Cross-segments Graph Generator. By calculating the cross-correlation value of two adjacent crossed sub-sequences, the Cross-segments Graph Generator generates asymmetrical graphs to represent the dynamically changing characteristics of the sub-sequences' fluctuations.

(b) Dimension Graph Generator. The Dimension Graph Generator dynamically generates a dimension graph that represents the correlation coefficients of each pair of multivariate variables. A differential operator is used to establish the trend of fluctuations in the series.

(c) Time Graph Generator. The Time Graph Generator extracts the relationship between each timestamp from the perspective of time.

Figure 3: The architectures of the three perspective graph generators: Cross-segments Graph Generator, Dimension Graph Generator and Time Graph Generator.

Figure 6: Heatmap Visualization.

Table 1: Multivariate long-term time-series forecasting results on five datasets with an input length of I = 96 and a prediction length of O ∈ {96, 192, 336, 720}. Note that with the ILI dataset, we used a prediction length of O ∈ {24, 36, 48, 60}. A lower MSE indicates better performance; the best results are highlighted in bold.

Table 2: The experimental results of MTGSR with segment sizes of 12, 24, and 48 on the Exchange and Electricity datasets.

Table 3: The experimental results of MTGSR with 1, 4, 8, and 16 heads on the Exchange dataset.

Table 4: Ablation experiments with MTGSR, MTGSR† and MTGSR‡ on four datasets: Exchange, Weather, Electricity and ILI. The best results appear in bold; suboptimal results are underlined. MTGSR†: MTGSR without the Dimension Graph Generator. MTGSR‡: MTGSR without the Cross-segments Graph Generator.

