NON-PARAMETRIC STATE-SPACE MODELS: IDENTIFIABILITY, ESTIMATION AND FORECASTING

Abstract

State-space models (SSMs) provide a standard methodology for time series analysis and prediction. While recent works utilize nonlinear functions to parameterize the transition and emission processes to enhance their expressivity, the additive-noise form still limits their applicability in real-world scenarios. In this work, we propose a general formulation of SSMs with a completely non-parametric transition model and a flexible emission model which can account for sensor distortion. Moreover, to handle more general scenarios (e.g., non-stationary time series), we add a higher-level model to capture the time-varying characteristics of the process. Interestingly, we find that even though the proposed model is remarkably flexible, the latent processes are generally identifiable. Given this, we further propose the corresponding estimation procedure and apply it to the forecasting task. Our model can recover the latent processes and their relations from observed sequential data. Accordingly, the proposed procedure can also be viewed as a method for causal representation learning. We argue that forecasting can benefit from causal representation learning, since the estimated latent variables are generally identifiable. Empirical comparisons on various datasets validate that our model can not only reliably identify the latent processes from the observed data, but also consistently outperform baselines in the forecasting task.

1. INTRODUCTION

Time series forecasting plays a crucial role in the automation and optimization of various business processes (Petropoulos et al., 2022; Benidis et al., 2020; Lim & Zohren, 2021). State-space models (SSMs) (Durbin & Koopman, 2012) are among the most commonly used generative forecasting models, providing a unified methodology to model the dynamic behavior of time series. Formally, given observations x_t, they describe a dynamical system with latent processes z_t as:

    z_t = f(z_{t-1}) + ϵ_t,   (Transition)
    x_t = g(z_t) + η_t,       (Emission)

where η_t and ϵ_t denote the i.i.d. Gaussian measurement and process noise terms, and f(·) and g(·) are the nonlinear transition model and the nonlinear emission model, respectively. The transition model captures the latent dynamics underlying the observed data, while the emission model learns the mapping from the latent processes to the observations. Recently, more expressive and scalable deep learning architectures have been leveraged to model nonlinear transition and emission models effectively (Fraccaro et al., 2017; Castrejon et al., 2019; Saxena et al., 2021; Tang & Matteson, 2021). However, these SSMs are not guaranteed to recover the underlying latent processes and their relations from observations.

Two issues stand out. First, the stringent assumption of additive noise in both the transition and emission models may not hold in practice. In particular, additive noise terms cannot capture nonlinear distortions of the observed or latent values of the variables, which frequently arise in real-world applications (Zhang & Hyvarinen, 2012; Yao et al., 2021), such as sensor distortion and motion capture. If we directly apply SSMs with this constrained additive-noise form, the model misspecification can lead to biased estimates. Second, the identification of SSMs is a very challenging task when both the states and the transition model are unknown; most work so far has focused on developing efficient estimation methods.
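To make the additive-noise formulation concrete, the following is a minimal NumPy sketch (not any specific published implementation) of the generative process z_t = f(z_{t-1}) + ϵ_t, x_t = g(z_t) + η_t; the particular choices of f and g (tanh and a cubic) are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(z):
    # Illustrative nonlinear transition model
    return np.tanh(z)

def g(z):
    # Illustrative nonlinear emission model
    return z + 0.5 * z ** 3

def simulate_additive_ssm(T, dim, sigma_eps=0.1, sigma_eta=0.05):
    """Simulate z_t = f(z_{t-1}) + eps_t and x_t = g(z_t) + eta_t."""
    z = np.zeros((T, dim))
    x = np.zeros((T, dim))
    z[0] = rng.normal(size=dim)
    x[0] = g(z[0]) + sigma_eta * rng.normal(size=dim)
    for t in range(1, T):
        z[t] = f(z[t - 1]) + sigma_eps * rng.normal(size=dim)  # process noise
        x[t] = g(z[t]) + sigma_eta * rng.normal(size=dim)      # measurement noise
    return z, x

z, x = simulate_additive_ssm(T=100, dim=3)
```

Note that in this formulation the noise terms enter strictly additively, which is exactly the restriction the paper argues against: the noise cannot interact nonlinearly with the state or distort the observations.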
We argue that this issue should not be ignored, and it becomes more severe when nonlinear transition and emission models are implemented with deep learning techniques: as the parameter space grows significantly, SSMs become prone to capturing spurious causal relations and strengths, and thus the identifiability of SSMs is vital. Furthermore, the transition model is usually assumed to be constant across the measured time period. This stationarity assumption hardly holds in many real-life problems due to changes in the dynamics. For example, the unemployment rate tends to rise much faster at the start of a recession than it drops at the beginning of a recovery (Lubik & Matthes, 2015). In such settings, SSMs should adapt to the time-varying characteristics of the latent processes to remain applicable in general non-stationary scenarios.

In this work, in contrast to state-of-the-art approaches that follow the additive form of transition/emission models, we propose a general formulation of SSMs, called the Non-Parametric State-Space Model (NPSSM)¹. In particular, we leverage the non-parametric functional causal model (Pearl, 2009) for the transition process and the post-nonlinear model (Zhang & Hyvarinen, 2012) to capture nonlinear distortion effects in the emission model. Moreover, we add a higher-level model to NPSSM, called N-NPSSM, to capture the potential time-varying change property of the latent processes in more general scenarios (e.g., non-stationary time series). Interestingly, although the proposed NPSSM is remarkably flexible, the latent processes are generally identifiable. Building on this result, we further develop a novel estimation framework built upon the structural variational autoencoder (VAE) for the proposed NPSSMs. It allows us to recover the latent processes and their time-delayed causal relations from observed sequential data and, simultaneously, to use them to build the latent prediction model (illustrated in Figure 1 (left)).
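For contrast with the additive-noise case, the following hypothetical NumPy sketch illustrates the two modeling choices described above: a non-parametric transition z_t = f(z_{t-1}, ϵ_t), where the noise enters as a general argument of f, and a post-nonlinear emission x_t = g2(g1(z_t) + η_t), where g2 models an invertible sensor distortion. The specific functions used here are illustrative assumptions, not the paper's learned models.

```python
import numpy as np

rng = np.random.default_rng(1)

def f_np(z_prev, eps):
    # Non-parametric transition: the noise interacts with the state
    # inside a general function (no additive-noise restriction).
    return np.tanh(z_prev + eps * z_prev ** 2)

def g1(z):
    # Inner nonlinear effect of the causes
    return z + 0.5 * z ** 3

def g2(v):
    # Invertible post-nonlinear distortion, e.g. sensor saturation
    return np.tanh(v)

def simulate_npssm(T, dim, sigma_eps=0.1, sigma_eta=0.05):
    """Simulate z_t = f(z_{t-1}, eps_t) and x_t = g2(g1(z_t) + eta_t)."""
    z = np.zeros((T, dim))
    x = np.zeros((T, dim))
    z[0] = rng.normal(size=dim)
    x[0] = g2(g1(z[0]) + sigma_eta * rng.normal(size=dim))
    for t in range(1, T):
        eps = sigma_eps * rng.normal(size=dim)
        z[t] = f_np(z[t - 1], eps)
        eta = sigma_eta * rng.normal(size=dim)
        x[t] = g2(g1(z[t]) + eta)
    return z, x

z, x = simulate_npssm(T=50, dim=2)
```

In this sketch, the multiplicative term eps * z_prev ** 2 is one example of a noise-state interaction that an additive-noise SSM cannot represent, and the tanh in g2 mimics the kind of saturation distortion a post-nonlinear emission model is designed to absorb.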
Accordingly, the proposed procedure can be viewed as a method for causal representation learning, or latent causal model learning, from time series data. We argue that forecasting tasks can benefit from causal representation learning, as the latent processes are generally identifiable in NPSSM. As shown in Figure 1 (right): first, it provides a compact structure for forecasting, whereas vanilla predictors (bottom), which directly learn a mapping function in the observation space, suffer from complicated and spurious dependencies. Second, predictions that follow the correct causal factorization are expected to be more robust to distribution shifts affecting some of the modules in the system: a local intervention on one mechanism does not affect the other modules, which still contribute reliably to the final prediction. Although formulating this problem and providing quantitative theoretical results is challenging, our empirical studies illustrate it well. Third, it gives a compact way to model distribution changes. In realistic situations, the data distribution may change over time. Fortunately, even for high-dimensional inputs, the changes often occur in a relatively small subspace of a causally-factorized system, which is known as the minimal change principle (Ghassami et al., 2018; Huang et al., 2020).
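One way to picture the minimal change principle in the N-NPSSM setting is to let a low-dimensional change factor modulate an otherwise fixed transition mechanism, so that a distribution shift corresponds to changing only that factor. The sketch below is a hypothetical illustration of this idea, not the paper's higher-level model; the scalar factor c and the regime structure are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(2)

def f_np(z_prev, eps, c):
    # The low-dimensional change factor c modulates the transition;
    # the functional form itself stays fixed across regimes.
    return np.tanh(c * z_prev + eps * z_prev ** 2)

def simulate_nonstationary(dim, regimes, sigma_eps=0.1):
    """regimes: list of (length, c) pairs; c is the change factor per regime.

    A distribution shift between regimes changes only the scalar c,
    not the transition function itself (minimal change principle).
    """
    z = rng.normal(size=dim)
    traj = []
    for length, c in regimes:
        for _ in range(length):
            eps = sigma_eps * rng.normal(size=dim)
            z = f_np(z, eps, c)
            traj.append(z.copy())
    return np.array(traj)

# Two regimes: the dynamics shift at t = 100 via a single scalar.
traj = simulate_nonstationary(dim=3, regimes=[(100, 0.8), (100, 1.2)])
```

Here the entire non-stationarity of a 3-dimensional latent process is captured by one scalar per regime, which is the sense in which a causally-factorized system makes distribution changes compact to model.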



¹ Here, "non-parametric" does not refer to the general form of the mapping function; rather, it indicates a functional causal model that takes the cause variables and noise terms as inputs to a general function. Unlike the additive-noise form, there is no constraint on how the noise interacts with the cause variables. A formal definition can be found in line 4 below Eq. (1.40) in (Pearl, 2009).



Figure 1: Left: The proposed estimation framework mainly includes learning the latent causal model and the prediction model. Right: Motivational examples demonstrating the benefit of latent causal model learning for forecasting. (1) It provides compact representations for forecasting, whereas vanilla predictors involve complicated dependencies. (2) The prediction model is more robust to distribution shift (red circles indicate distribution changes). (3) It gives a compact way to model the change factors, addressing non-stationary forecasting issues.

