LEARNING FAST AND SLOW FOR ONLINE TIME SERIES FORECASTING

Abstract

Despite the recent success of deep learning for time series forecasting, these methods are not scalable for many real-world applications where data arrives sequentially. Training deep neural forecasters on the fly is notoriously challenging because of their limited ability to adapt to non-stationary environments and remember old knowledge. We argue that the fast adaptation capability of deep neural networks is critical and successful solutions require handling changes to both new and recurring patterns effectively. In this work, inspired by the Complementary Learning Systems (CLS) theory, we propose Fast and Slow learning Network (FSNet) as a novel framework to address the challenges of online forecasting. Particularly, FSNet improves the slowly-learned backbone by dynamically balancing fast adaptation to recent changes and retrieving similar old knowledge. FSNet achieves this mechanism via an interaction between two novel complementary components: (i) a per-layer adapter to support fast learning from individual layers, and (ii) an associative memory to support remembering, updating, and recalling repeating events. Extensive experiments on real and synthetic datasets validate FSNet's efficacy and robustness to both new and recurring patterns.

1. INTRODUCTION

Time series forecasting plays an important role in both research and industry. Correctly forecasting time series can greatly benefit various business sectors such as traffic management and electricity consumption (Hyndman & Athanasopoulos, 2018). As a result, tremendous efforts have been devoted to developing better forecasting models (Petropoulos et al., 2020; Bhatnagar et al., 2021; Triebe et al., 2021), with recent successes of deep neural networks (Li et al., 2019; Xu et al., 2021; Yue et al., 2021; Zhou et al., 2021) thanks to their impressive capability to discover hierarchical latent representations and complex dependencies. However, such studies focus on the batch learning setting, which requires the whole training dataset to be available a priori and implies that the relationship between inputs and outputs remains static throughout. This assumption is restrictive in real-world applications, where data arrives in a stream and the input-output relationship can change over time (Gama et al., 2014). In such cases, re-training the model from scratch could be time-consuming. Therefore, it is desirable to train the deep forecaster online (Anava et al., 2013; Liu et al., 2016) using only new samples to capture the changing dynamics of the environment. Despite the ubiquity of online learning in many real-world applications, training deep forecasters online remains challenging for two reasons. First, naively training deep neural networks on data streams requires many samples to converge (Sahoo et al., 2018; Aljundi et al., 2019a) because offline training benefits such as mini-batches or training for multiple epochs are not available. Therefore, when a distribution shift happens (Gama et al., 2014), such cumbersome models would require many samples to learn the new concept with satisfactory results. Overall, deep neural networks, although they possess strong representation learning capabilities, lack a mechanism to facilitate successful learning on data streams.
Second, time series data often exhibit recurrent patterns, where a pattern can become inactive and re-emerge in the future. Since deep networks suffer from catastrophic forgetting (McCloskey & Cohen, 1989), they cannot retain prior knowledge, which results in inefficient learning of recurring patterns and further hinders the overall performance. Consequently, online time series forecasting with deep models presents a promising yet challenging problem. To address the above limitations, we radically formulate online time series forecasting as an online, task-free continual learning problem (Aljundi et al., 2019a; b). Particularly, continual learning requires balancing two objectives: (i) utilizing past knowledge to facilitate fast learning of current patterns; and (ii) maintaining and updating the already acquired knowledge. These two objectives closely match the aforementioned challenges and are usually referred to as the stability-plasticity dilemma (Grossberg, 1982). With this connection, we develop an effective online time series forecasting framework motivated by the Complementary Learning Systems (CLS) theory (McClelland et al., 1995; Kumaran et al., 2016), a neuroscience framework for human continual learning. Specifically, the CLS theory suggests that humans can continually learn thanks to the interactions between the hippocampus and the neocortex, which support the consolidation, recall, and updating of experiences to form more general representations that generalize to new experiences. This work develops FSNet (Fast-and-Slow learning Network) to enhance the sample efficiency of deep networks when dealing with distribution shifts or recurring concepts in online time series forecasting. FSNet's key idea for fast learning is to always improve the learning at the current step instead of explicitly detecting changes in the environment.
To do so, FSNet employs a per-layer adapter to model the temporal consistency in time series and adjust each intermediate layer to learn better, which in turn improves the learning of the whole deep network. In addition, FSNet equips each adapter with an associative memory (Kaiser et al., 2017) to store the important, recurring patterns observed. When encountering such events, the adapter interacts with its memory to retrieve and update the previous actions to further facilitate fast learning. Consequently, the adapter can model the temporal smoothness in time series to facilitate learning, while its interactions with the associative memory support remembering and improving the learning of recurring patterns. In summary, our work makes the following contributions. First, we radically formulate learning fast in online time series forecasting with deep models as a continual learning problem. Second, motivated by the CLS theory, we propose the fast-and-slow learning paradigm of FSNet to handle both fast-changing and long-term knowledge in time series. Lastly, we conduct extensive experiments on both real and synthetic datasets to demonstrate FSNet's efficacy and robustness.
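The adapter-memory interaction described above can be illustrated with a deliberately simplified numpy sketch. This is not the paper's exact update rule: the class name, the EMA smoothing, the cosine-similarity recall threshold, and the FIFO write policy are all illustrative assumptions chosen only to convey the fast-adapt / recall-and-update loop.

```python
import numpy as np

class AdapterMemorySketch:
    """Toy per-layer adapter with an associative memory of past adaptations.

    Illustrative only: FSNet's real adapter uses more elaborate read/write
    rules. Here we keep a single calibration vector, fast-adapted via an
    exponential moving average, plus a small key-value memory queried by
    cosine similarity to recall recurring patterns.
    """
    def __init__(self, dim, slots=4, tau=0.7, gamma=0.9):
        self.coeff = np.zeros(dim)            # fast per-layer calibration
        self.memory = np.zeros((slots, dim))  # stored past calibrations
        self.tau = tau                        # similarity threshold for recall
        self.gamma = gamma                    # EMA rate for fast adaptation
        self.next_slot = 0

    def step(self, layer_signal):
        # Fast adaptation: smooth the incoming per-layer signal (EMA).
        self.coeff = self.gamma * self.coeff + (1 - self.gamma) * layer_signal
        # Recall: cosine similarity between current coefficients and memory.
        norms = np.linalg.norm(self.memory, axis=1) * (np.linalg.norm(self.coeff) + 1e-8)
        sims = self.memory @ self.coeff / np.where(norms > 0, norms, 1.0)
        best = int(np.argmax(sims))
        if sims[best] > self.tau:
            # Recurring pattern: blend the stored adaptation back in, then refresh it.
            self.coeff = 0.5 * self.coeff + 0.5 * self.memory[best]
            self.memory[best] = self.coeff
        else:
            # Novel pattern: write it to the next slot (simple FIFO policy).
            self.memory[self.next_slot] = self.coeff
            self.next_slot = (self.next_slot + 1) % len(self.memory)
        return self.coeff
```

The sketch captures the two roles discussed in the text: the EMA gives plasticity to recent changes, while the memory read/write gives stability for patterns that re-emerge later.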

2. PRELIMINARY AND RELATED WORK

This section provides the necessary background of time series forecasting and continual learning.

2.1. TIME SERIES FORECASTING SETTINGS

Let X = (x_1, . . . , x_T) ∈ R^{T×n} be a time series of T observations, each with n dimensions. The goal of time series forecasting is, given a look-back window of length e ending at time i, X_{i,e} = (x_{i-e+1}, . . . , x_i), to predict the next H steps of the time series as f_ω(X_{i,e}) = (x_{i+1}, . . . , x_{i+H}), where ω denotes the parameters of the forecasting model. We refer to a pair of look-back and forecast windows as a sample. For multiple-step forecasting (H > 1), we follow the standard approach of employing a linear regressor to forecast all H steps in the horizon simultaneously (Zhou et al., 2021). Online Time Series Forecasting is ubiquitous in many real-world scenarios (Anava et al., 2013; Liu et al., 2016; Gultekin & Paisley, 2018; Aydore et al., 2019) due to the sequential nature of data. In this setting, there is no separation between training and evaluation. Instead, learning occurs over a sequence of rounds. At each round, the model receives a look-back window and predicts the forecast window. Then, the true answer is revealed to improve the model's predictions in the incoming rounds (Hazan, 2019). The model is commonly evaluated by its accumulated errors throughout learning (Sahoo et al., 2018). Due to its challenging nature, online time series forecasting exhibits several challenging sub-problems, ranging from learning under concept drifts (Gama et al., 2014) to dealing with missing values caused by irregularly-sampled data (Li & Marlin, 2020; Gupta et al., 2021). In this work, we focus on the problem of fast learning (in terms of sample efficiency) under concept drifts by improving the deep network's architecture and recalling relevant past knowledge. There is also a rich literature of Bayesian continual learning to address regression problems (Smola et al., 2003; Kurle et al., 2019; Gupta et al., 2021). However, such formulations follow the Bayesian
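The sample construction and the online protocol above can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's model: the toy forecaster is a single linear regressor mapping the flattened look-back window to all H steps at once, and `make_sample`/`online_rounds` are hypothetical helper names.

```python
import numpy as np

def make_sample(X, i, e, H):
    """Slice one (look-back, forecast) pair ending at time index i (0-based, inclusive)."""
    lookback = X[i - e + 1 : i + 1]    # X_{i,e} = (x_{i-e+1}, ..., x_i)
    horizon  = X[i + 1 : i + 1 + H]    # ground truth (x_{i+1}, ..., x_{i+H})
    return lookback, horizon

def online_rounds(X, e, H, lr=1e-2):
    """Online protocol: at each round, predict, observe the truth, then update.

    Returns the average per-round MSE, i.e. the accumulated error used to
    evaluate online forecasters.
    """
    T, n = X.shape
    W = np.zeros((e * n, H * n))       # linear regressor forecasting all H steps at once
    cumulative_mse, rounds = 0.0, 0
    for i in range(e - 1, T - H):
        lookback, truth = make_sample(X, i, e, H)
        x = lookback.reshape(-1)
        pred = x @ W                    # predict before the truth is revealed
        err = pred - truth.reshape(-1)
        cumulative_mse += np.mean(err ** 2)
        rounds += 1
        W -= lr * np.outer(x, err)      # one gradient step on the revealed sample
    return cumulative_mse / rounds
```

Note that, unlike batch training, each sample is used exactly once and the model is scored on its prediction before updating, which is what makes sample-efficient adaptation under concept drift the central difficulty.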


Our code is publicly available at: https://github.com/salesforce/fsnet/.

