USING SYNTHETIC DATA TO IMPROVE THE LONG-RANGE FORECASTING OF TIME SERIES DATA
Anonymous authors
Paper under double-blind review

Abstract

Effective long-range forecasting of time series data remains an open problem. One promising approach is to use generative models to improve long-range forecasting, but the challenge then is how to generate high-quality synthetic data. In this paper, we propose a conditional Wasserstein GAN with Gradient and Error Penalty (cWGAN-GEP), aiming to generate accurate synthetic data that preserves the temporal dynamics between the conditioning input and the generated data. Using such synthetic data, we develop a long-range forecasting method called Generative Forecasting (GenF). GenF consists of three key components: (i) a cWGAN-GEP based generator, which generates synthetic data for the next few time steps; (ii) a predictor, which makes long-range predictions based on the generated and observed data; and (iii) an information theoretic clustering (ITC) algorithm to better train the cWGAN-GEP based generator and the predictor. Our experimental results on three public datasets demonstrate that GenF significantly outperforms a diverse range of state-of-the-art benchmarks and classical approaches. In most cases, we find an improvement of at least 10% over all studied methods. Lastly, we conduct an ablation study to demonstrate the effectiveness of the cWGAN-GEP and the ITC algorithm.

1. INTRODUCTION

Short-range forecasting of time series data can provide useful information, but its scope of application is limited (Chatfield, 2000; Granger & Newbold, 2014). In most applications, long-range forecasting is preferred, as it allows more time for early intervention and planning (Alvarez et al., 2010; Azad et al., 2014). For example, long-range forecasting of a patient's vital signs gives clinicians more time to take action and may reduce the occurrence of potential adverse events (Luo, 2015; Choi & et al, 2016). However, a critical issue affecting long-range forecasting is that the predictive performance (e.g., N-step ahead) degrades as N grows. One practical approach to address this problem is to use synthetic data to shorten the prediction horizon N. For example, in iterative forecasting (Marcellino et al., 2006; Hamzaçebi et al., 2009), previous predictions are used together with the original data as new inputs to make the next prediction. However, synthetic data (i.e., previous predictions) generated in this recursive and supervised manner is susceptible to error propagation (Sorjamaa & et al, 2007; Weigend & Gershenfeld, 2018). Overall, the quality of the synthetic data is the key to improving long-range forecasting. In recent years, the success of Generative Adversarial Networks (GANs) (Goodfellow & et al, 2014) in replicating real-world content has inspired numerous GAN based architectures (Frid-Adar et al., 2018a;b; Shin et al., 2018; Bowles et al., 2018) that generate synthetic data for different purposes (e.g., improving classification accuracy). However, the use of synthetic data to improve long-range forecasting remains unexplored. In this paper, we contribute to the area of long-range forecasting as follows.

1. We augment the existing conditional Wasserstein GAN with Gradient Penalty with a mean squared error term.
This new architecture, called cWGAN-GEP, aims to generate accurate synthetic data which preserves the temporal dynamics between the conditioning input and the generated data.

2. We develop a long-range forecasting method called Generative Forecasting (GenF), which consists of three key components: (i) a cWGAN-GEP based generator, which generates synthetic data for the next few time steps; (ii) a predictor, which makes long-range predictions based on the generated and observed data; and (iii) an information theoretic clustering (ITC) algorithm to better train the cWGAN-GEP based generator and the predictor.

Figure 1: Long-range forecasting via observation/synthetic data window and prediction horizon.

3. We conduct experiments on three public time series datasets and our results demonstrate that GenF significantly outperforms a diverse range of state-of-the-art benchmarks and classical approaches. In most cases, we find improvements of at least 10% over the benchmark methods. To the best of our knowledge, our work is the first to use synthetic data generated by a GAN based architecture to improve long-range forecasting, and the proposed cWGAN-GEP has not been explored in previous work.
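To make the augmented objective concrete, the sketch below illustrates how the two losses of a cWGAN-GEP could be composed: the standard WGAN-GP critic loss, and a generator loss that adds a mean squared error penalty to the adversarial term. This is an illustrative toy (a linear critic, so the gradient penalty has a closed form), not the paper's implementation; the names `lambda_gp` and `lambda_e` are our own.

```python
import numpy as np

rng = np.random.default_rng(0)

def critic(x, w):
    # Toy linear critic D(x) = x @ w scoring a batch of K-dim samples.
    return x @ w

def critic_loss(real, fake, w, lambda_gp=10.0):
    # WGAN-GP critic loss: E[D(fake)] - E[D(real)] + gradient penalty,
    # where the penalty is evaluated at random interpolates.
    eps = rng.uniform(size=(real.shape[0], 1))
    interp = eps * real + (1.0 - eps) * fake
    # For a linear critic, the input gradient is w everywhere, so the
    # gradient penalty has a closed form (no autograd needed in this toy).
    grad = np.broadcast_to(w, interp.shape)
    gp = np.mean((np.linalg.norm(grad, axis=1) - 1.0) ** 2)
    return np.mean(critic(fake, w)) - np.mean(critic(real, w)) + lambda_gp * gp

def generator_loss(real_next, fake_next, w, lambda_e=1.0):
    # cWGAN-GEP generator loss: adversarial term plus an MSE "error
    # penalty" tying generated samples to the ground-truth next steps.
    adversarial = -np.mean(critic(fake_next, w))
    error_penalty = np.mean((fake_next - real_next) ** 2)
    return adversarial + lambda_e * error_penalty

real = rng.normal(size=(32, 5))  # ground-truth next-step observations
fake = rng.normal(size=(32, 5))  # generator outputs
w = rng.normal(size=5)
d_loss = critic_loss(real, fake, w)
g_loss = generator_loss(real, fake, w)
```

When the generated samples exactly match the ground truth, the error penalty vanishes and the generator loss reduces to the purely adversarial term.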

2. BACKGROUND ON TIME SERIES FORECASTING

In time series forecasting, we use an observation window of historical data to predict future values (see Fig. 1). Let M denote the observation window length and N denote the prediction horizon. For methods that contain generative models, let L denote the synthetic data window length. Let X_i ∈ R^K be the observation at the i-th time step, where K is the number of features. In Fig. 1, the prediction horizon is N, indicating that we plan to make predictions N time steps ahead, i.e., at time step t + N (e.g., the circled entry in Fig. 1). Next, we discuss related work in Section 2.1 and shortlist several benchmark methods for comparison in Section 2.2.
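As a concrete illustration of this notation, the snippet below (a minimal sketch with our own function name) slices a univariate series into an observation window of length M and extracts the target N steps ahead:

```python
import numpy as np

def make_forecast_pair(series, t, M, N):
    # Observation window: the M most recent values X_{t-M+1}, ..., X_t.
    window = series[t - M + 1 : t + 1]
    # Target: the value N steps ahead, X_{t+N}. GenF fills the L steps
    # after t with synthetic data to shorten the effective horizon.
    target = series[t + N]
    return window, target

series = np.arange(100, dtype=float)  # toy series with X_i = i
window, target = make_forecast_pair(series, t=9, M=5, N=8)
# window covers time steps 5..9; target is the value at step 17
```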

2.1. RELATED WORK

Early methods which use neural networks to perform long-range forecasting include (Nguyen & Chan, 2004; Goswami & Srividya, 1996; Al-Saba & El-Amin, 1999). For example, Nguyen & Chan (2004) propose a group of neural networks to make predictions at different time steps. Several recent works (Yu et al., 2017; Lai et al., 2018; Qiu et al., 2020; Bouktif et al., 2018; Barsoum et al., 2018) attempt to improve long-range forecasting by proposing new architectures. For example, Yu et al. (2017) propose a Long Short-Term Memory (LSTM) (Hochreiter & Schmidhuber, 1997) based Tensor-Train Recurrent Neural Network as a module for the sequence-to-sequence (Sutskever et al., 2014) framework, called TLSTM. Moreover, Lai et al. (2018) propose a Long- and Short-term Time-series network (LSTNet), which combines LSTM with autoregressive models to minimize a customized loss function inspired by support vector regression (SVR) (Cortes & Vapnik, 1995). Another recent work (Cheng et al., 2020) proposes the Multi-Level Construal Neural Network (MLCNN), which attempts to improve predictive performance by fusing forecasting information from different future time steps. One more interesting work is (Ahmed et al., 2007), which combines Gaussian processes (Rasmussen, 2003) and neural networks to improve forecasting performance. Overall, these methods can be classified into two main classes: direct forecasting and iterative forecasting (Hamzaçebi et al., 2009; Bontempi et al., 2012). In direct forecasting, a group of models is trained to directly make predictions at different values of N. The advantage is that all models are independent of each other and hence do not interfere with one another. However, performance tends to become worse with N due to the lack of long-range dependencies. In iterative forecasting, a model is trained to make predictions for the next time step (t + 1) only. The same model is then applied repeatedly, with previous predictions used together with past observations to predict the next time step. This process is repeated recursively N times to make predictions N time steps ahead. The previous predictions can be considered as synthetic data (with synthetic data window length L = N − 1) that shortens the effective prediction horizon. However, the synthetic data generated in this supervised and recursive manner is susceptible to error propagation, meaning that a small error in the current prediction becomes a larger error in subsequent predictions (Taieb et al., 2012; Sorjamaa & et al, 2007).
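The iterative recursion described above can be sketched as follows. The one-step model here is a toy AR(1) with a deliberately misestimated coefficient (0.8 instead of the true 0.9), chosen to show how a small one-step error compounds over the horizon; all names are ours, not the paper's.

```python
def iterative_forecast(history, one_step_model, N):
    # Recursively apply a one-step model N times; the N-1 intermediate
    # predictions act as synthetic data (window length L = N - 1).
    window = list(history)
    preds = []
    for _ in range(N):
        next_val = one_step_model(window)
        preds.append(next_val)
        window.append(next_val)  # feed the prediction back in as input
    return preds

# Toy AR(1) world: x_{t+1} = 0.9 * x_t. The fitted model uses 0.8 instead,
# so its one-step error is small but compounds with the horizon.
true_coef, fitted_coef = 0.9, 0.8
preds = iterative_forecast([1.0], lambda w: fitted_coef * w[-1], N=5)
truth = [true_coef ** n for n in range(1, 6)]
errors = [abs(p - t) for p, t in zip(preds, truth)]
# the absolute error grows monotonically with the prediction horizon
```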

2.2. BENCHMARK METHODS

The taxonomy in Section 2.1 suggests that most existing methods can be classified as either direct or iterative forecasting. This motivates us to use both direct and iterative forecasting as benchmark methods. As mentioned in Section 2.1, classical models such as LSTM, Autoregressive Integrated Moving Average (ARIMA) (Box & Pierce, 1970), SVR, and Gaussian Process Regression (GPR) (Rasmussen, 2003) form the core of many state-of-the-art methods. Hence, we also include these classical models as benchmark methods.
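As one illustrative example of such a classical direct-forecasting baseline, a linear autoregressive model for a fixed horizon N can be fit with ordinary least squares. This is a minimal numpy sketch under our own naming; the actual benchmarks would use proper ARIMA/SVR/GPR implementations.

```python
import numpy as np

def fit_direct_ar(series, M, N):
    # Direct forecasting baseline: fit one linear model that maps the
    # M-step observation window ending at time t to series[t + N].
    X, y = [], []
    for t in range(M - 1, len(series) - N):
        X.append(series[t - M + 1 : t + 1])
        y.append(series[t + N])
    coef, *_ = np.linalg.lstsq(np.asarray(X), np.asarray(y), rcond=None)
    return coef

rng = np.random.default_rng(1)
series = np.sin(0.1 * np.arange(300)) + 0.01 * rng.normal(size=300)
coef = fit_direct_ar(series, M=10, N=5)
pred = series[-10:] @ coef  # direct 5-step-ahead prediction from the last window
```

In direct forecasting, one such model would be fit per value of N, whereas iterative forecasting reuses a single one-step model recursively.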

