EFFECTIVELY MODELING TIME SERIES WITH SIMPLE DISCRETE STATE SPACES

Abstract

Time series modeling is a well-established problem, which often requires that methods (1) expressively represent complicated dependencies, (2) forecast long horizons, and (3) efficiently train over long sequences. State-space models (SSMs) are classical models for time series, and prior works combine SSMs with deep learning layers for efficient sequence modeling. However, we find fundamental limitations with these prior approaches, proving their SSM representations cannot express autoregressive time series processes. We thus introduce SPACETIME, a new state-space time series architecture that improves on all three criteria. For expressivity, we propose a new SSM parameterization based on the companion matrix, a canonical representation for discrete-time processes, which enables SPACETIME's SSM layers to learn desirable autoregressive processes. For long-horizon forecasting, we introduce a "closed-loop" variation of the companion SSM, which enables SPACETIME to predict many future time-steps by generating its own layer-wise inputs. For efficient training and inference, we introduce an algorithm that reduces the memory and compute of a forward pass with the companion matrix from Õ(dℓ) naïvely to Õ(d + ℓ), for sequence length ℓ and state-space size d. In experiments, our contributions lead to state-of-the-art results on extensive and diverse benchmarks, with best or second-best AUROC on 6 of 7 ECG and speech time series classification tasks, and best MSE on 14 of 16 Informer forecasting tasks. Furthermore, we find SPACETIME (1) fits AR(p) processes that prior deep SSMs fail on, (2) forecasts notably more accurately on longer horizons than the prior state-of-the-art, and (3) speeds up training on real-world ETTh1 data by 73% and 80% relative wall-clock time over Transformers and LSTMs, respectively.

1. INTRODUCTION

Time series modeling is a well-established problem, with tasks such as forecasting and classification motivated by many domains such as healthcare, finance, and engineering (Shumway et al., 2000). However, effective time series modeling presents several challenges:

• First, methods should be expressive enough to capture complex, long-range, and autoregressive dependencies. Time series data often reflects higher-order dependencies, seasonality, and trends, which govern how past samples determine future samples (Chatfield, 2000). This motivates many classical approaches that model these properties (Box et al., 1970; Winters, 1960), alongside expressive deep learning mechanisms such as attention (Vaswani et al., 2017) and fully connected layers that model interactions between every sample in an input sequence (Zeng et al., 2022).

• Second, methods should be able to forecast a wide range of long horizons over various data domains. Reflecting real-world demands, popular forecasting benchmarks evaluate methods on 34 different tasks (Godahewa et al., 2021) and 24-960 time-step horizons (Zhou et al., 2021). Furthermore, as a testament to accurately learning time series processes, forecasting methods should ideally also be able to predict future time-steps on horizons they were not explicitly trained on.

• Finally, methods should be efficient with training and inference. Many time series applications require processing very long sequences, e.g., classifying audio data with sampling rates up to 16,000 Hz (Warden, 2018). To handle such settings, where we still need large enough models that can expressively model this data, training and inference should ideally scale subquadratically with sequence length and model size in time and space complexity.

Unfortunately, existing time series methods struggle to achieve all three criteria. Classical methods (c.f., ARIMA (Box et al., 1970), exponential smoothing (ETS) (Winters, 1960)) often require manual data preprocessing and model selection to identify sufficiently expressive models. Deep learning methods commonly train to predict specific horizon lengths, i.e., as direct multi-step forecasting (Chevillon, 2007), and we find this hurts their ability to forecast longer horizons (Sec. 4.2.2). They also face limitations in achieving high expressivity and efficiency. Fully connected networks (FCNs) in Zeng et al. (2022) scale quadratically in O(ℓh) space complexity (with input length ℓ and forecast length h). Recent Transformer-based models reduce this complexity to O(ℓ + h), but do not always outperform the above FCNs on forecasting benchmarks (Liu et al., 2022; Zhou et al., 2021).

We thus propose SPACETIME, a deep state-space architecture for effective time series modeling. To achieve this, we focus on improving each criterion via three core contributions:

1. For expressivity, our key idea and building block is a linear layer that models time series processes as state-space models (SSMs) via the companion matrix (Fig. 1). We start with SSMs due to their connections to both classical time series analysis (Kalman, 1960; Hamilton, 1994) and recent deep learning advances (Gu et al., 2021a). Classically, many time series models such as ARIMA and exponential smoothing (ETS) can be expressed as SSMs (Box et al., 1970; Winters, 1960). Meanwhile, recent state-of-the-art deep sequence models (Gu et al., 2021a) have used SSMs to outperform Transformers and LSTMs on challenging long-range benchmarks (Tay et al., 2020). Their primary innovations show how to formulate SSMs as neural network parameters that are practical to train. However, we find limitations with these deep SSMs for time series data. While we build on their advances, we prove that these prior SSM representations (Gu et al., 2021a;b; Gupta, 2022) cannot capture autoregressive processes fundamental for time series. We thus specifically propose the companion matrix representation for its expressive and memory-efficient properties. We prove that the companion matrix SSM recovers fundamental autoregressive (AR) and smoothing processes modeled in classical techniques such as ARIMA and ETS, while only requiring O(d) memory to represent an O(d²) matrix. Thus, SPACETIME inherits the benefits of prior SSM-based sequence models, but introduces improved expressivity to recover fundamental time series processes simply through its layer weights.

2. For forecasting long horizons, we introduce a new "closed-loop" view of SSMs. Prior deep SSM architectures either apply the SSM as an "open loop" (Gu et al., 2021a), where fixed-length inputs necessarily generate same-length outputs, or use closed-loop autoregression where final-layer outputs are fed through the entire network as next-time-step inputs (Goel et al., 2022). We

Figure 1: We learn time series processes as state-space models (SSMs) (top left). We represent SSMs with the companion matrix, which is highly expressive for discrete time series (top middle), and compute such SSMs efficiently as convolutions or recurrences via a shift + low-rank decomposition (top right). We use these SSMs to build SPACETIME, a new time series architecture broadly effective across tasks and domains (bottom).
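To make the companion-matrix claim concrete, here is a minimal NumPy sketch (our own illustration, not the paper's implementation) of how a state-space recurrence whose state matrix is a companion matrix reproduces an AR(p) process exactly. The AR(3) coefficients, matrix layout, and variable names are all hypothetical; the paper may use a different companion-matrix convention.

```python
import numpy as np

rng = np.random.default_rng(0)
a = np.array([0.6, -0.2, 0.1])   # hypothetical AR(3) coefficients
p, T = len(a), 50
u = rng.standard_normal(T)       # white-noise inputs

# Companion state matrix: AR coefficients in the first row, a shift below.
A = np.zeros((p, p))
A[0, :] = a
A[1:, :-1] = np.eye(p - 1)
B = np.zeros(p); B[0] = 1.0      # input enters the newest state entry
C = np.zeros(p); C[0] = 1.0      # output reads the newest state entry

# AR(3) process simulated directly: y_t = a_1 y_{t-1} + ... + a_3 y_{t-3} + u_t
y_ar = np.zeros(T)
for t in range(T):
    y_ar[t] = sum(a[i] * y_ar[t - 1 - i] for i in range(p) if t - 1 - i >= 0) + u[t]

# Same process via the state-space recurrence x_t = A x_{t-1} + B u_t, y_t = C x_t
x = np.zeros(p)
y_ssm = np.zeros(T)
for t in range(T):
    x = A @ x + B * u[t]
    y_ssm[t] = C @ x

assert np.allclose(y_ar, y_ssm)  # the companion SSM reproduces the AR process
```

The state simply stores the last p outputs, which is why a d-dimensional companion SSM needs only the length-d coefficient vector, O(d) memory, to represent the full d × d state matrix.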

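The "convolutions or recurrences" duality mentioned in the figure can also be sketched directly: a linear SSM's output equals the convolution of its input with the kernel K_j = C A^j B, so it can be computed with an FFT instead of a sequential scan. The sketch below (our own illustration, with made-up coefficients) builds the kernel naively via matrix powers; the paper's shift-plus-low-rank algorithm, which this sketch does not reproduce, is what brings the cost down toward Õ(d + ℓ).

```python
import numpy as np

# Hypothetical companion SSM (same layout as a companion matrix: coefficients
# in the first row, shift below; values chosen only for illustration).
p = 3
a = np.array([0.5, -0.3, 0.1])
A = np.zeros((p, p)); A[0, :] = a; A[1:, :-1] = np.eye(p - 1)
B = np.zeros(p); B[0] = 1.0
C = np.zeros(p); C[0] = 1.0

T = 64
u = np.random.default_rng(1).standard_normal(T)

# Recurrent view: one sequential step per time-step.
x = np.zeros(p)
y_rec = np.zeros(T)
for t in range(T):
    x = A @ x + B * u[t]
    y_rec[t] = C @ x

# Convolutional view: y = K * u with kernel K_j = C A^j B. K is built
# naively here with matrix powers, which is the expensive part the
# paper's efficient algorithm avoids.
K = np.array([C @ np.linalg.matrix_power(A, j) @ B for j in range(T)])
n = 2 * T  # zero-pad so circular FFT convolution equals linear convolution
y_conv = np.fft.irfft(np.fft.rfft(K, n) * np.fft.rfft(u, n), n)[:T]

assert np.allclose(y_rec, y_conv)  # both views produce the same output
```

The convolutional view is what makes training over long sequences parallelizable, while the recurrent view supports cheap step-by-step inference.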

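The open-loop vs. closed-loop distinction in contribution 2 can be sketched as follows: over the observed context the SSM consumes given inputs (open loop), and over the forecast horizon it feeds its own one-step predictions back in as the next inputs (closed loop), so the horizon length is not fixed at training time. This is a simplified stand-in for SPACETIME's closed-loop companion SSM, which learns an additional matrix to generate its layer-wise next inputs; the function names and coefficients below are hypothetical.

```python
import numpy as np

def ssm_step(A, B, C, x, u_t):
    """One step of x_t = A x_{t-1} + B u_t, y_t = C x_t."""
    x = A @ x + B * u_t
    return x, C @ x

def forecast(A, B, C, context, horizon):
    """Open loop over the observed context, then closed loop over the
    horizon, where the model's own prediction becomes the next input."""
    x = np.zeros(A.shape[0])
    y = 0.0
    for u_t in context:              # open loop: consume observed inputs
        x, y = ssm_step(A, B, C, x, u_t)
    preds = []
    for _ in range(horizon):         # closed loop: feed predictions back
        x, y = ssm_step(A, B, C, x, y)
        preds.append(y)
    return np.array(preds)

# Hypothetical companion SSM and a toy context, for illustration only.
p = 3
a = np.array([0.5, -0.3, 0.1])
A = np.zeros((p, p)); A[0, :] = a; A[1:, :-1] = np.eye(p - 1)
B = np.zeros(p); B[0] = 1.0
C = np.zeros(p); C[0] = 1.0
preds = forecast(A, B, C, context=np.ones(32), horizon=8)
```

Because the rollout generates its own inputs, the same trained model can in principle be asked for horizons it was never explicitly trained on, which is the behavior the paper evaluates in its long-horizon experiments.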