DATEFORMER: TRANSFORMER EXTENDS LOOK-BACK HORIZON TO PREDICT LONGER-TERM TIME SERIES

Abstract

Transformers have demonstrated impressive strength in long-term series forecasting. Existing research has mostly focused on mapping a short past sub-series (the lookback window) to the future series (the forecast window). Once training is completed, the longer time series in the training set is discarded; at inference, models can rely only on the lookback window's information, which prevents them from analyzing time series from a global perspective. Moreover, the windows used by Transformers are quite narrow because every time-step within them must be modeled. Under this point-wise processing style, widening the windows rapidly exhausts model capacity. For fine-grained time series, this creates a bottleneck in information input and prediction output that is fatal to long-term series forecasting. To overcome this barrier, we propose a new methodology for applying Transformers to time series forecasting. Specifically, we split time series into patches by day and replace point-wise with patch-wise processing, which considerably enhances the information input and output of Transformers. To further help models leverage the whole training set's global information during inference, we distill this information, store it in time representations, and make time representations, rather than the series themselves, the main modeling entities. Our time-modeling Transformer, Dateformer, achieves state-of-the-art accuracy on 7 real-world datasets with a 33.6% relative improvement and extends the maximum forecast range to half a year.¹

1. INTRODUCTION

Time series forecasting is a critical demand across many application domains, such as energy consumption, economic planning, traffic, and weather prediction. The task can be roughly summed up as predicting a series' future values from observations of its past. In this paper, we study long-term forecasting, which involves a longer forecast horizon than regular time series forecasting. Logically, all historical observations are always available, yet most models (including various Transformers) infer the future by analyzing only the past sub-series closest to the present; the longer historical series is used merely to train the model. For short-term forecasting, which is more concerned with a series' local (short-term) patterns, the information carried by the closest sub-series is enough. Not so for long-term forecasting, which requires models to grasp the series' global patterns: the overall trend, long-term seasonality, etc. Methods that observe only the recent sub-series cannot accurately distinguish the two kinds of patterns and hence produce sub-optimal predictions (see Figure 1a: models observe an obvious upward trend in the zoomed-in window, but zooming out reveals that it is a yearly seasonality, with only a slight overall upward trend across the two years of power-load series). However, it is impracticable to naively feed the entire training set as the lookback window: not only can no existing model handle such a lengthy series, but learning dependencies within it is also difficult. Thus, we ask: how can models inexpensively use the global information of the training set during inference? In addition, the throughput of Transformers (Zhou et al., 2022; Liu et al., 2021; Wu et al., 2021; Zhou et al., 2021; Kitaev et al., 2020; Vaswani et al., 2017), which show the best performance in long-term forecasting, is relatively limited, especially for fine-grained time series (e.g., recorded every 15 minutes, half-hour, or hour).
Given a common time series recorded every 15 minutes (96 time-steps per day), with 24GB of memory they mostly fail to predict the next month from the past 3 months of series.

¹ Code will be released soon.
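The gain from patch-wise processing can be seen with a minimal sketch (not the authors' implementation; the function name and shapes are illustrative): reshaping a 15-minute-granularity series into day patches shrinks the sequence a Transformer must attend over by a factor of the patch size.

```python
import numpy as np

def split_into_day_patches(series: np.ndarray, steps_per_day: int = 96) -> np.ndarray:
    """Reshape a 1-D series whose length covers whole days into a
    (num_days, steps_per_day) array, so a Transformer attends over
    day-patches instead of individual time-steps."""
    assert series.shape[0] % steps_per_day == 0, "series must cover whole days"
    return series.reshape(-1, steps_per_day)

# 3 months of 15-minute data: 90 days * 96 steps = 8640 points.
series = np.arange(90 * 96, dtype=np.float32)
patches = split_into_day_patches(series)
# Point-wise attention would see a length-8640 sequence (attention cost ~8640^2);
# patch-wise attention sees only 90 day-tokens (cost ~90^2).
```

Since self-attention memory grows quadratically with sequence length, reducing 8640 time-steps to 90 day-tokens is what lets the lookback and forecast windows widen without exhausting model capacity.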

