EFFECTIVELY MODELING TIME SERIES WITH SIMPLE DISCRETE STATE SPACES

Abstract

Time series modeling is a well-established problem, which often requires that methods (1) expressively represent complicated dependencies, (2) forecast long horizons, and (3) efficiently train over long sequences. State-space models (SSMs) are classical models for time series, and prior works combine SSMs with deep learning layers for efficient sequence modeling. However, we find fundamental limitations with these prior approaches, proving their SSM representations cannot express autoregressive time series processes. We thus introduce SPACETIME, a new state-space time series architecture that improves all three criteria. For expressivity, we propose a new SSM parameterization based on the companion matrix, a canonical representation for discrete-time processes, which enables SPACETIME's SSM layers to learn desirable autoregressive processes. For long-horizon forecasting, we introduce a "closed-loop" variation of the companion SSM, which enables SPACETIME to predict many future time-steps by generating its own layer-wise inputs. For efficient training and inference, we introduce an algorithm that reduces the memory and compute of a forward pass with the companion matrix: with sequence length ℓ and state-space size d, we go from Õ(dℓ) naïvely to Õ(d + ℓ). In experiments, our contributions lead to state-of-the-art results on extensive and diverse benchmarks, with best or second-best AUROC on 6 of 7 ECG and speech time series classification tasks, and best MSE on 14 of 16 Informer forecasting tasks. Furthermore, we find SPACETIME (1) fits AR(p) processes that prior deep SSMs fail on, (2) forecasts notably more accurately on longer horizons than the prior state-of-the-art, and (3) speeds up training on real-world ETTh1 data by 73% and 80% in relative wall-clock time over Transformers and LSTMs, respectively.
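As a concrete illustration of the companion parameterization mentioned above (a minimal numpy sketch, not the paper's implementation): an AR(p) process y_t = a_1 y_{t-1} + … + a_p y_{t-p} can be written as a linear state-space recurrence whose state matrix is a companion matrix, so unrolling the SSM reproduces the autoregression exactly. The function and variable names here are illustrative.

```python
import numpy as np

def companion_matrix(a):
    """Companion matrix whose first row holds AR coefficients a_1..a_p.

    With state x_t = [y_{t-1}, ..., y_{t-p}], the update x_{t+1} = A x_t
    computes the new sample in the first coordinate and shifts the rest.
    """
    p = len(a)
    A = np.zeros((p, p))
    A[0, :] = a                    # first row: AR coefficients
    A[1:, :-1] = np.eye(p - 1)     # subdiagonal: shift past samples down
    return A

# AR(2) example: y_t = 0.5 * y_{t-1} + 0.3 * y_{t-2}
a = np.array([0.5, 0.3])
A = companion_matrix(a)

# Unroll the state-space recurrence from initial state [y_0, y_{-1}] = [1, 0]
x = np.array([1.0, 0.0])
ys = []
for _ in range(5):
    x = A @ x
    ys.append(x[0])    # first state coordinate is the newest sample
```

The first two generated samples are y_1 = 0.5 and y_2 = 0.5·0.5 + 0.3·1 = 0.55, matching the direct AR(2) recursion, which is what makes the companion matrix a natural state-matrix class for autoregressive processes.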

1. INTRODUCTION

Time series modeling is a well-established problem, with tasks such as forecasting and classification motivated by many domains such as healthcare, finance, and engineering (Shumway et al., 2000). However, effective time series modeling presents several challenges:

• First, methods should be expressive enough to capture complex, long-range, and autoregressive dependencies. Time series data often reflect higher-order dependencies, seasonality, and trends, which govern how past samples determine future samples (Chatfield, 2000). This motivates many classical approaches that model these properties (Box et al., 1970; Winters, 1960), alongside expressive deep learning mechanisms such as attention (Vaswani et al., 2017) and fully connected layers that model interactions between every sample in an input sequence (Zeng et al., 2022).

• Second, methods should be able to forecast a wide range of long horizons over various data domains. Reflecting real-world demands, popular forecasting benchmarks evaluate methods on 34 different tasks (Godahewa et al., 2021) and 24-960 time-step horizons (Zhou et al., 2021). Furthermore, as a testament to accurately learning time series processes, forecasting methods should ideally also be able to predict future time-steps on horizons they were not explicitly trained on.

• Finally, methods should be efficient with training and inference. Many time series applications require processing very long sequences, e.g., classifying audio data with sampling rates up to 16,000 Hz (Warden, 2018). To handle such settings, where we still need large enough models that

