NOVEL FEATURE REPRESENTATION STRATEGIES FOR TIME SERIES FORECASTING WITH PREDICTED FUTURE COVARIATES

Abstract

Accurate time series forecasting is a fundamental challenge in data science. Like traditional statistical methods, conventional machine learning models such as RNNs and CNNs consume historical data consisting of previously measured variables, including the forecast variable and all its covariates. In many applications, however, some of the covariates can be predicted with reasonable accuracy for the immediate future; we refer to these as predictable future covariates. Note that the input may also contain covariates that cannot be accurately predicted. We consider the problem of predicting water levels at a given location in a river or canal system using historical data and future covariates, some of which (precipitation, tide) may be predictable. Traditional methods for incorporating predictable future covariates have major limitations: simply concatenating the predicted future covariates to the input vector is highly likely to miss the connection between past and future, while iteratively predicting one step at a time accumulates prediction errors. We propose two novel feature representation strategies, shifting and padding, that address these limitations by contextually linking the past with the predicted future while avoiding any accumulation of prediction errors. Extensive experiments on three well-known datasets show that our strategies, applied to RNN and CNN backbones, outperform existing methods. Our experiments also suggest a relationship between the amount of shifting and padding and the periodicity of the time series.

1. INTRODUCTION

Conventional time series forecasting predicts a set of target variables at a future time point based on past data collected over a window of predetermined length. Next-step forecasting (Montgomery et al., 2015; Shi et al., 2022) refers to predicting the target variables one step into the future, where the unit of time is the granularity of the measurements. Multi-horizon forecasting (Capistrán et al., 2010; Quaedvlieg, 2021) predicts the target variables multiple steps into the future. Accurate forecasting enables better resource management and optimization decisions for critical processes (Cinar et al., 2017; Salinas et al., 2020; Rangapuram et al., 2018). Applications include probabilistic demand forecasting in retail (Böse et al., 2017), dynamic assignment of beds to patients (Zhang & Nawata, 2018), monthly inflation forecasting, and much more. Good multi-horizon forecasting requires historical data of the target variables from which to learn long-term patterns. It also requires measurements of useful covariates from heterogeneous data sources, often from the recent past. In many applications, however, some of the covariates can also be predicted with reasonable accuracy for the immediate future; we refer to such covariates as future covariates. For example, a covariate of interest could be precipitation, for which both historical data and reasonably accurate predictions for the near future are available, the latter from the weather service. Despite its importance, only limited approaches exist that use future covariates to improve time series predictions. Related methods fall mainly into direct strategies using sequence-to-sequence models (Mariet & Kuznetsov, 2019) and iterated strategies using autoregressive models (Sahoo et al., 2020). Both traditional approaches to incorporating future covariates have major limitations.
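To make the setup concrete, the following NumPy sketch lays out the inputs available in this setting: a past window containing the target and all measured covariates, plus a horizon-length block of predicted values for the covariates that are predictable. All shapes and the naive flattened concatenation are illustrative assumptions, not the paper's method.

```python
import numpy as np

# Hypothetical setup: 96 past steps, a 24-step forecast horizon, the target
# series plus 3 covariates, of which 2 (e.g. precipitation, tide) also have
# predictions available over the horizon.
T_past, H = 96, 24
past_target = np.random.rand(T_past, 1)  # measured target (e.g. water level)
past_covs   = np.random.rand(T_past, 3)  # all measured covariates
future_covs = np.random.rand(H, 2)       # predicted future covariates only

# Conventional models see only the past block:
past_block = np.concatenate([past_target, past_covs], axis=1)  # shape (96, 4)

# Methods using future covariates must additionally consume future_covs,
# e.g. by the naive strategy of concatenating the flattened vectors:
flat_input = np.concatenate([past_block.ravel(), future_covs.ravel()])
assert flat_input.shape == (T_past * 4 + H * 2,)
```

The naive concatenation at the end is exactly the representation the paper argues is likely to miss past-future interactions, since the model receives no structural hint that a future covariate at horizon step h relates to particular past time steps.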
We propose two novel feature representation strategies, shifting and padding, that address these limitations by contextually linking the past with the predicted future while avoiding any accumulation of prediction errors. Extensive experiments on three well-known datasets show that our strategies, applied to RNN and CNN backbones, outperform existing methods.

Iterative methods. The iterative strategy recursively applies a next-step model, using the values predicted for the previous time step as input to forecast the next one, as in Salinas et al. (2020). For the prediction at time step t, the target values z_{t-1} from the previous time step, the (predicted) covariates x_t for the current time step, and the context vector h_{t-1} summarizing the representations of all past time steps are fed to an RNN to predict the target values z_t. Rangapuram et al. (2018) adopted a similar approach, parameterizing a per-time-series linear state space model with recurrent neural networks. Related work with the iterative approach is Li et al. (2019), whose basic architecture is a transformer model with convolutional layers.

Direct methods. The direct method typically uses an encoder model to learn a feature representation of the past data, which is saved as context vectors in a hidden state. A decoder model takes the future covariates together with the encoder's context vectors and predicts the outputs for multi-horizon forecasting, as shown in Figure 1. The Multi-horizon Quantile Recurrent Forecaster of Wen et al. (2017) used an LSTM encoder to generate context vectors, which are combined with predicted future covariates and fed into a multi-layer perceptron (MLP) to predict the future horizon. Some works (Fan et al., 2019; Du et al., 2020) have applied a temporal attention mechanism between the encoder and the decoder.
This architecture learns the relevance of different parts of the feature representations of the historical data by computing attention weights. The weighted feature representations are then passed to the decoder to make predictions for future time steps. In Fan et al. (2019), bi-directional LSTMs are used as the decoder backbone, allowing past and predicted features to be considered at every future time step. The Temporal Fusion Transformer (Lim et al., 2021) combined gated residual networks (GRNs) and an attention mechanism (Vaswani et al., 2017) as an additional decoder on top of the traditional encoder-decoder model: the GRNs filter unnecessary information, while the additional attention-based decoder captures long-term dependencies between time steps. Both the iterative and the direct methods aim to incorporate future covariates as inputs, but each suffers from shortcomings. Iterative methods accumulate prediction errors because the input to each time step is the output of the previous step, so model performance degrades quickly for longer forecasting horizons. Direct methods, on the other hand, are prone to missing interactions between past and future time points: the encoder processes only past data, while the decoder merely concatenates past representations with future covariates, which can miss specific relationships between past and future time points. In this paper, we resolve the shortcomings of both approaches with a novel architecture.
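The error-accumulation mechanics of the iterative strategy can be sketched as follows. The recurrent cell here is a toy stand-in with random weights, not any trained model from the literature; it only shows the structural point that each horizon step consumes the previous step's prediction z_{t-1}, the current covariates x_t, and the carried context h_{t-1}.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained next-step RNN cell:
#   h_t = tanh(W [z_{t-1}; x_t; h_{t-1}]),   z_t = v . h_t
# Weights are random, purely for illustration.
d_h = 8                                     # hidden state size (assumption)
W = rng.normal(size=(d_h, 1 + 2 + d_h)) * 0.1
v = rng.normal(size=d_h)

def step(z_prev, x_t, h_prev):
    """One next-step prediction from previous target, covariates, context."""
    h_t = np.tanh(W @ np.concatenate([[z_prev], x_t, h_prev]))
    return float(v @ h_t), h_t

def iterative_forecast(z_last, future_x, h_last):
    """Roll the next-step model forward: each prediction is fed back as
    input, so any error in z_prev propagates into every later step."""
    preds, z_prev, h_prev = [], z_last, h_last
    for x_t in future_x:                    # one cell call per horizon step
        z_prev, h_prev = step(z_prev, x_t, h_prev)
        preds.append(z_prev)
    return preds

preds = iterative_forecast(0.5, rng.normal(size=(24, 2)), np.zeros(d_h))
assert len(preds) == 24
```

Because `z_prev` at step t is itself a prediction, a small error early in the rollout perturbs every subsequent input, which is the degradation over long horizons described above.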



Figure 1: Direct method using sequence-to-sequence models.
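The direct strategy of Figure 1 can likewise be sketched in a few lines. The encoder and decoder below are crude random-weight stand-ins (a recurrent summary and an MLP, loosely in the spirit of Wen et al. (2017)), chosen only to expose the structural weakness: the past reaches the decoder solely through a fixed-size context vector, which is then concatenated with the flattened future covariates.

```python
import numpy as np

rng = np.random.default_rng(1)
T_past, H, d_h = 96, 24, 16            # illustrative sizes (assumptions)

# Encoder: a crude recurrent summary of the past window into one context vector.
W_enc = rng.normal(size=(d_h, 4 + d_h)) * 0.05
def encode(past_block):                 # past_block: (T_past, 4)
    h = np.zeros(d_h)
    for row in past_block:
        h = np.tanh(W_enc @ np.concatenate([row, h]))
    return h                            # all past information squeezed into h

# Decoder: an MLP that sees the context and ALL future covariates at once
# (the "direct" strategy); predictions are never fed back, so errors do not
# accumulate, but past and future interact only through the context vector.
W1 = rng.normal(size=(32, d_h + H * 2)) * 0.05
W2 = rng.normal(size=(H, 32)) * 0.05
def decode(context, future_covs):       # future_covs: (H, 2)
    hidden = np.tanh(W1 @ np.concatenate([context, future_covs.ravel()]))
    return W2 @ hidden                  # all H horizon steps in one shot

ctx = encode(rng.normal(size=(T_past, 4)))
preds = decode(ctx, rng.normal(size=(H, 2)))
assert preds.shape == (H,)
```

The concatenation inside `decode` is the point at issue: no component ties a future covariate at horizon step h to specific past time steps, which is the missed past-future interaction that the proposed shifting and padding representations target.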

