MICN: MULTI-SCALE LOCAL AND GLOBAL CONTEXT MODELING FOR LONG-TERM SERIES FORECASTING

Abstract

Recently, Transformer-based methods have achieved impressive performance in long-term series forecasting, but the attention mechanism for computing global correlations entails high complexity, and these methods cannot model local features in a targeted way as CNN structures do. To solve the above problems, we propose to combine local features and global correlations to capture the overall view of time series (e.g., fluctuations, trends). To fully exploit the underlying information in the time series, a multi-scale branch structure is adopted to model different potential patterns separately. Each pattern is extracted with down-sampled convolution and isometric convolution for local features and global correlations, respectively. In addition to being more effective, our proposed method, termed Multi-scale Isometric Convolution Network (MICN), is also more efficient, with linear complexity with respect to the sequence length when suitable convolution kernels are used. Our experiments on six benchmark datasets show that, compared with state-of-the-art methods, MICN yields 17.2% and 21.6% relative improvements for multivariate and univariate time series, respectively. Code is available at https://github.com/wanghq21/MICN.

1. INTRODUCTION

Research on time series forecasting is widely applied in the real world, for example in sensor network monitoring (Papadimitriou & Yu, 2006), weather forecasting, economics and finance (Zhu & Shasha, 2002), disease propagation analysis (Matsubara et al., 2014), and electricity forecasting. Long-term time series forecasting in particular is increasingly in demand, so this paper focuses on the long-term forecasting task: predicting the values $X_{t+1}, X_{t+2}, \ldots, X_{t+T-1}, X_{t+T}$ of a future period from the observations $X_1, X_2, \ldots, X_{t-1}, X_t$ of a historical period, where $T \gg t$.

As a classic CNN-based model, TCN (Bai et al., 2018) uses causal convolution to model temporal causality and dilated convolution to expand the receptive field. It integrates the local information of a sequence well and achieves competitive results in short- and medium-term forecasting (Sen et al., 2019; Borovykh et al., 2017). However, limited by the size of its receptive field, TCN often needs many layers to model the global relationships of a time series, which greatly increases the complexity of the network and the difficulty of training the model (see the sketch at the end of this section).

Transformers (Vaswani et al., 2017), based on the attention mechanism, show great power on sequential data in domains such as natural language processing (Devlin et al., 2019; Brown et al., 2020), audio processing (Huang et al., 2019), and even computer vision (Dosovitskiy et al., 2021; Liu et al., 2021b). They have also recently been applied to long-term series forecasting tasks (Li et al., 2019b; Wen et al., 2022), where they model the long-term dependencies of sequences effectively, enabling leaps in both the accuracy and the horizon of time series forecasts (Zhu & Soricut, 2021; Wu et al., 2021b; Zhou et al., 2022). The learned attention matrix represents the correlations between different time points of the sequence and explains relatively well how the model makes future predictions from past information. However, it has quadratic complexity in the sequence length, and many of the computations between pairs of time points are redundant.
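To make the quadratic cost concrete, below is a minimal sketch of plain scaled dot-product attention (illustrative code, not the implementation of any of the cited models; the sequence length $L$ and dimension $d$ are arbitrary choices for the example):

```python
import torch

def full_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Plain scaled dot-product attention over a length-L sequence.

    The score matrix is (L, L): one entry per pair of time points,
    hence O(L^2) time and memory in the sequence length.
    """
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d ** 0.5   # (L, L) pairwise scores
    return torch.softmax(scores, dim=-1) @ v      # weighted sum of values

L, d = 720, 64                   # illustrative: a 720-step window, 64-dim tokens
q = k = v = torch.randn(L, d)
out = full_attention(q, k, v)    # out: (720, 64)
# The intermediate score matrix alone held 720 * 720 = 518,400 entries.
```

The score matrix has one entry for every pair of time points; this $O(L^2)$ cost is exactly what efficient Transformer variants, and the linear-complexity convolutional design pursued in this paper, try to avoid.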
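Likewise, the receptive-field limitation of TCN noted above can be illustrated with a minimal PyTorch sketch of a dilated causal convolution layer (a sketch in the spirit of TCN (Bai et al., 2018), not the official code; channel counts, kernel size, and depth are assumptions for illustration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalDilatedConv1d(nn.Module):
    """One dilated causal convolution layer in the spirit of TCN.

    Left-padding by (kernel_size - 1) * dilation keeps the layer causal:
    output[t] depends only on input[0..t]. Dilation widens the receptive field.
    """
    def __init__(self, channels: int, kernel_size: int = 3, dilation: int = 1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, length); pad only on the left (the past)
        return self.conv(F.pad(x, (self.pad, 0)))

# Stacking layers with dilations 1, 2, 4, 8 gives a receptive field of
# 1 + (k - 1) * (2^n - 1) = 31 steps for k = 3, n = 4: covering a long
# sequence still needs many layers, hence the depth/training cost noted above.
x = torch.randn(1, 8, 96)  # (batch, channels, length); sizes are illustrative
for i in range(4):
    x = CausalDilatedConv1d(channels=8, kernel_size=3, dilation=2 ** i)(x)
print(x.shape)  # torch.Size([1, 8, 96])
```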

