DEEPTIME: DEEP TIME-INDEX META-LEARNING FOR NON-STATIONARY TIME-SERIES FORECASTING

Abstract

Advances in IT infrastructure have led to the collection of longer sequences of time-series. Such sequences are typically non-stationary, exhibiting distribution shifts over time. This is a challenging scenario for the forecasting task, due to the problems of covariate shift and conditional distribution shift. In this paper, we show that deep time-index models possess strong synergies with a meta-learning formulation of forecasting, displaying significant advantages over existing neural forecasting methods in tackling the problems arising from non-stationarity. These advantages include having a stronger smoothness prior, avoiding the problem of covariate shift, and having better sample efficiency. To this end, we propose DeepTime, a deep time-index model trained via meta-learning. Extensive experiments on real-world datasets in the long-sequence time-series forecasting setting demonstrate that our approach achieves results competitive with state-of-the-art methods, and is highly efficient. Code is attached as supplementary material, and will be publicly released.

1. INTRODUCTION

Time-series forecasting has important applications across business and scientific domains, such as demand forecasting (Carbonneau et al., 2008), capacity planning and management (Kim, 2003), and anomaly detection (Laptev et al., 2017). With the advances of IT infrastructure, time-series are collected over longer durations and at higher sampling frequencies. This has led to time-series spanning tens of thousands to millions of time steps, on which we would like to perform forecasting. Such datasets face the unique challenge of non-stationarity, where long sequences undergo distribution shifts over time due to factors like concept drift. This has practical implications for forecasting models, which face a degradation in performance at test time (Kim et al., 2021) due to covariate shift and conditional distribution shift (see Appendix B for formal definitions).

Table 1: Time-index models are defined to be models whose predictions, ŷ_t, are purely functions of the current time-index features, τ_t, e.g. the relative time-index (1, 2, 3, ...) or datetime features (minute-of-hour, day-of-week, etc.). Historical-value models are models whose predictions of future time step(s), ŷ_{t+1}, are explicit functions of past observations, (y_t, y_{t-1}, ...), and optionally covariates, (z_{t+1}, z_t, z_{t-1}, ...), which can include exogenous time-series or even datetime features.
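To make the distinction in Table 1 concrete, the following sketch constructs the two kinds of inputs for a toy series. The feature names and window length are illustrative choices of ours, not prescribed by the paper.

```python
# Illustrative sketch: the two input types of Table 1 for a toy series.
import numpy as np
import pandas as pd

timestamps = pd.date_range("2021-01-01", periods=8, freq="h")
y = np.arange(8, dtype=float)  # toy observations

# Time-index features tau_t: relative time-index plus datetime features.
tau = pd.DataFrame({
    "relative_index": np.arange(len(timestamps)),  # 1, 2, 3, ... style index
    "minute_of_hour": timestamps.minute,
    "day_of_week": timestamps.dayofweek,
})

# A historical-value model instead consumes lookback windows of past values.
L = 3  # lookback length (arbitrary here)
lookback_windows = np.lib.stride_tricks.sliding_window_view(y, L)

print(tau.shape, lookback_windows.shape)  # (8, 3) (6, 3)
```

A time-index model maps each row of `tau` to a prediction; a historical-value model maps each row of `lookback_windows` to the next value(s).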

Time-index Models: ŷ_t = f(τ_t)

Historical-value Models: ŷ_{t+1} = f(y_t, y_{t-1}, ..., z_{t+1}, z_t, ...)

Figure 2: (a) A deep time-index model trained via straightforward supervised learning overfits to historical data and is unable to extrapolate. This model corresponds to (+Local) in Table 3 of our ablations. (b) DeepTime, our proposed approach, trained via a meta-learning formulation, successfully learns the appropriate function representation and is able to extrapolate. Visualized here is the last variable of the ETTm2 dataset.

Deep Time-index Models On the one hand, classical time-index methods (Taylor & Letham, 2018; Corani et al., 2021; Ord et al., 2017) rely on predefined parametric representation functions, y_t = f(τ_t) + ε_t, where ε_t represents idiosyncratic changes not accounted for by the model, and f could be a polynomial function to represent trend, a Fourier series to represent seasonality, or a composition of seasonal-trend components. While these functions are simple and easy to learn, the choice of representation function requires strong domain expertise or computationally heavy cross-validation. Furthermore, predefining the representation function is a strong assumption and may fail under distribution shifts. On the other hand, while deep time-index models (letting f be a deep neural network) present a deceptively clear path to approximating the representation function in a data-driven manner, they are too expressive. Trained via straightforward supervised learning on historical values without any inductive bias, they are unable to extrapolate to future time steps (visualized in Figure 2), and a meta-learning formulation is required to do so. This formulation has the added benefit of handling non-stationary forecasting.

Advantages of Deep Time-index Meta-learning Firstly, distribution shift in input statistics sharply degrades the prediction accuracy of deep learning models (Nado et al., 2020).
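The classical setup y_t = f(τ_t) + ε_t can be sketched in a few lines: a linear trend plus a single Fourier harmonic, fitted by ordinary least squares. The basis (one known period, degree-1 trend) is an illustrative assumption; it is exactly the kind of upfront choice that requires the domain expertise discussed above.

```python
# Minimal sketch of a classical time-index model y_t = f(tau_t) + eps_t,
# with f = linear trend + one Fourier harmonic of a known period, fitted
# by ordinary least squares. Toy data; basis choices are assumptions.
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(200, dtype=float)
period = 50.0
y = 0.05 * t + np.sin(2 * np.pi * t / period) + 0.1 * rng.normal(size=t.size)

def basis(t):
    # Design matrix: [1, t, sin(2*pi*t/P), cos(2*pi*t/P)]
    return np.column_stack([
        np.ones_like(t),
        t,
        np.sin(2 * np.pi * t / period),
        np.cos(2 * np.pi * t / period),
    ])

coef, *_ = np.linalg.lstsq(basis(t), y, rcond=None)

# Because f depends only on the time-index, forecasting is simply evaluating
# f at future time steps -- but the functional form had to be fixed upfront.
t_future = np.arange(200, 220, dtype=float)
forecast = basis(t_future) @ coef
```

Replacing `basis` with a deep network removes the need to hand-pick the representation, which is precisely where the extrapolation problem (and the meta-learning remedy) discussed above comes in.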
Historical-value models, which take past observations as input, suffer from this as an effect of covariate shift. Time-index models easily sidestep this problem, since they take only time-index features as input. Next, meta-learning is an effective solution to the problem of conditional distribution shift: nearby time steps are assumed to follow a locally stationary distribution (Dahlhaus, 2012; Vogt, 2012) (see Figure 1), which is considered to be a task. The base learner adapts to this locally stationary distribution, while the meta learner generalizes across the various task distributions. In principle, historical-value models are also able to take advantage of the meta-learning formulation; however, they still suffer from the problem of covariate shift, as well as from sample efficiency issues. Time-index models are also able to achieve greater sample efficiency in the meta-learning formulation. Like many existing state-of-the-art forecasting approaches, time-index models are direct multi-step (DMS) approaches.¹ For a lookback window of length L and a forecast horizon of length H, a historical-value DMS model requires N + L + H - 1 time steps to construct a support set of size N, whereas a time-index model requires only N time steps. Not only does this marked increase in sample efficiency mean that
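The sample-efficiency claim is simple arithmetic, made explicit below with illustrative values for L, H, and N.

```python
# Worked example of the sample-efficiency claim: raw time steps needed to
# build a support set of N examples. L, H, N values are arbitrary.
L, H, N = 96, 48, 10  # lookback length, horizon length, support-set size

# Historical-value DMS model: each example needs an L-step lookback plus an
# H-step target; N maximally overlapping examples span N + L + H - 1 steps.
steps_historical = N + L + H - 1

# Time-index model: each support example is a single (tau_t, y_t) pair,
# so N examples need only N consecutive time steps.
steps_time_index = N

print(steps_historical, steps_time_index)  # 153 10
```

With these values, a historical-value model needs over 15x as many raw time steps as a time-index model to assemble the same support set, which matters when each task is restricted to a short, locally stationary segment.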



¹ DMS methods directly predict the entire forecast horizon, and are contrasted with iterative multi-step (IMS) methods, which predict one step at a time. Further discussion on DMS/IMS, and a taxonomy of forecasting methods, can be found in Appendix C.
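The DMS/IMS distinction can be sketched with a toy one-step model; the models below are deliberately trivial stand-ins, not the paper's architectures.

```python
# Toy contrast between IMS and DMS forecasting. Both "models" here are
# hard-coded illustrations, not learned predictors.
import numpy as np

history = np.array([1.0, 2.0, 3.0, 4.0])
H = 3  # forecast horizon

def one_step(last):
    # Toy one-step-ahead model: y_{t+1} = y_t + 1
    return last + 1.0

# IMS: iterate the one-step model, feeding each prediction back as input,
# so errors can compound across the horizon.
ims = []
last = history[-1]
for _ in range(H):
    last = one_step(last)
    ims.append(last)

def dms_model(window):
    # Toy direct model: one forward pass emits all H future steps at once.
    return window[-1] + np.arange(1, H + 1)

dms = dms_model(history)
print(ims, dms)  # [5.0, 6.0, 7.0] [5. 6. 7.]
```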



Figure 1: Non-stationary time-series degrade model performance due to covariate shift and conditional distribution shift. Such behavior can be modelled as a locally stationary process, in which contiguous segments are assumed to be stationary. Meta-learning takes advantage of this assumption to adapt to these locally stationary distributions. Yet, existing methods which model the conditional distribution, p(y_{t+1} | y_t, ...), are still susceptible to covariate shift, since the meta model takes time-series values as input.
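The locally-stationary-task view of Figure 1 amounts to partitioning a long series into contiguous segments and treating each as one task, with a support/query split for the base and meta learners. The segment layout below is a simplified, non-overlapping sketch; the segment length is an assumed hyperparameter.

```python
# Sketch of the task construction implied by Figure 1: each contiguous
# segment of a long series is treated as one locally stationary task.
# Non-overlapping segments and the L, H values are simplifying assumptions.
import numpy as np

series = np.arange(1000, dtype=float)  # stand-in for a long time-series
L, H = 96, 48                          # lookback and horizon lengths
segment = L + H

tasks = []
for start in range(0, len(series) - segment + 1, segment):
    window = series[start : start + segment]
    support = window[:L]  # base learner adapts on this locally stationary part
    query = window[L:]    # meta learner is evaluated on the forecast horizon
    tasks.append((support, query))

print(len(tasks), tasks[0][0].shape, tasks[0][1].shape)  # 6 (96,) (48,)
```

Because each task is short, the sample-efficiency gap discussed above directly limits how large a support set a historical-value model can extract from one segment.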


