GATED INFERENCE NETWORK: INFERENCING AND LEARNING STATE-SPACE MODELS

Abstract

State-space models (SSMs) make predictions by learning the underlying dynamics of the observed sequence. We propose a new SSM approach for both high- and low-dimensional observation spaces, which utilizes Bayesian filtering-smoothing to model a system's dynamics more accurately than RNN-based SSMs and can be trained end to end. The designed architecture, which we call the Gated Inference Network (GIN), integrates uncertainty estimates and learns the complicated dynamics of the system, enabling estimation and imputation both when data are present and when they are missing. The proposed model incorporates GRU cells into its structure to complete the data flow, while avoiding expensive computations and potentially unstable matrix inversions. The GIN can handle arbitrary time-series data and is strongly robust to observational noise. In numerical experiments, we show that the GIN reduces the uncertainty of its estimates and outperforms its counterparts: LSTMs, GRUs and variational approaches.

1. INTRODUCTION

State estimation and inference over the states of dynamical systems is a central problem with many applications in signal processing and time series analysis (Rauch et al., 1965). In some cases, learning the state space is complicated by the relatively high dimension of the observations and measurements, which provide only partial information about the states. Noise is another significant issue, since observations are typically noisy. Time-series prediction and estimating the next scene, e.g., state prediction or next-observation prediction, is another substantial application that again requires inference over the states from the observations. Classical memory networks such as LSTMs (Hochreiter & Schmidhuber, 1997), GRUs (Cho et al., 2014) and simple RNNs (Wilson & Finkel, 2009; Yadaiah & Sowmya, 2006) fail to give any intuition about the uncertainties and dynamics. A group of approaches performs Kalman filtering (KF) in the latent state, which usually requires a deep encoder for feature extraction; Krishnan et al. (2017), Ghalamzan et al. (2021) and Hashempour et al. (2020) belong to this group. However, these solutions have restrictions: they cannot deal with high-dimensional non-linear systems, and the classic KF approach is computationally expensive, e.g., due to matrix inversion. Likewise, indirect optimization of an objective function via variational inference, as in Kingma & Welling (2013), increases the complexity of the model. Moreover, variational-inference approaches, usually implemented as variational autoencoders for dimension reduction, do not have access to the loss directly and must minimize a lower bound on it instead, which reduces their ability to learn the dynamics and affects model performance. KalmanNet (Revach et al., 2021) and Ruhe & Forré (2021) use a GRU in their structure for the state update. However, they can only deal with low-dimensional state spaces and cannot handle complex high-dimensional inputs, because they directly use the classic Bayesian equations and hence face the matrix-inversion issue. Moreover, their structure requires full, or at least partial, knowledge of the dynamics. These restrictions of the KF and its variants and of variational models, together with the need for a metric to measure uncertainty, motivate us to introduce the GIN, an end-to-end structure with dynamics-learning ability that uses Bayesian properties for filtering-smoothing. The contributions of GIN are: (i) modeling high-low dimensional sequences: we show the eligibility of the GIN to infer both cases by a simple adjustment in the observation transferring functions in the
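The matrix-inversion bottleneck mentioned above can be made concrete with a classic KF step. The sketch below is illustrative only, with hypothetical matrices, and is not the GIN update itself: computing the Kalman gain requires inverting the innovation covariance, which is exactly the expensive and potentially unstable operation in high-dimensional observation spaces.

```python
import numpy as np

def kf_step(mu, P, y, A, C, Q, R):
    """One classic Kalman filter step for x_t = A x_{t-1} + q, y_t = C x_t + r.
    All matrices here are hypothetical placeholders for illustration."""
    # Predict: propagate mean and covariance through the linear dynamics.
    mu_pred = A @ mu
    P_pred = A @ P @ A.T + Q
    # Update: the innovation covariance S must be inverted to form the
    # Kalman gain K; this inversion scales cubically in the observation
    # dimension and is the step the GIN avoids.
    S = C @ P_pred @ C.T + R
    K = P_pred @ C.T @ np.linalg.inv(S)
    mu_new = mu_pred + K @ (y - C @ mu_pred)
    P_new = (np.eye(len(mu)) - K @ C) @ P_pred
    return mu_new, P_new
```

For a d-dimensional observation, `np.linalg.inv(S)` costs O(d^3) and can be numerically fragile when S is ill-conditioned, which is why replacing this update with a learned gated recurrence is attractive.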

2. RELATED WORKS

To deal with complex sensory inputs, some approaches integrate deep autoencoders into their architecture. Among these works, Embed to Control (E2C) (Watter et al., 2015) uses a deep encoder to obtain the observation and performs variational inference over the states. However, such methods cannot deal with the missing-data problem and imputation tasks, since they do not rely on memory cells and are not recurrent. Another group of works, like BackpropKF (Haarnoja et al., 2016) and RKN (Becker et al., 2019), applies CNNs for dimension reduction and outputs both an uncertainty vector and an observation; they move away from variational inference and borrow Bayesian properties for the inference. However, these methods cannot exploit cases where knowledge of the dynamics is available and impose restrictive assumptions on the covariance matrices, while the GIN provides a principled way to use available partial dynamics information and relaxes the assumptions on the covariance.
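The encoder pattern used by BackpropKF and RKN can be sketched as follows. This is a minimal, hypothetical stand-in (a linear map instead of a CNN, invented weight names): the encoder emits a latent observation together with a strictly positive per-dimension uncertainty vector, i.e., a diagonal covariance, which is the kind of restrictive covariance assumption the text refers to.

```python
import numpy as np

def softplus(x):
    # Smooth elementwise positivity transform, log(1 + exp(x)).
    return np.log1p(np.exp(x))

def encode(obs, W_mean, W_var):
    """Hypothetical linear stand-in for a CNN encoder: maps a flattened
    high-dimensional observation to a latent observation w and a positive
    uncertainty vector var (a diagonal covariance assumption)."""
    w = W_mean @ obs
    var = softplus(W_var @ obs)  # strictly positive per-dimension variances
    return w, var
```

Emitting only a diagonal of variances keeps the downstream filtering update cheap, but it cannot represent correlations between latent dimensions; a model that relaxes this assumption must carry (and update) richer covariance structure.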



Figure 1: Inferred 5k length trajectories for Lorenz attractor.

Toward learning the state space (system identification), a group of works like Wang et al. (2007), Ko & Fox (2011) and Frigola et al. (2013) propose algorithms to learn GPSSMs based on maximum-likelihood estimation with the iterative EM algorithm. Frigola et al. (2013) obtain sample trajectories from the smoothing distribution, then, conditioned on these trajectories, conduct the M step for the model's parameters. Switching linear dynamical systems (SLDS) (Ghahramani & Hinton, 2000) use additional latent variables to switch among different linear dynamics, where approximate inference algorithms can be utilized to model switching linearity and reduce approximation errors; however, this approach is not as flexible as general non-linear dynamical systems because the switch transition model is assumed independent of states and observations. To address this problem, Linderman et al. (2017) perform the SLDS method through augmentation with a Polya-gamma-distributed variable and a stick-breaking process; however, this approach employs Gibbs sampling for inferring the parameters and is therefore not scalable to large datasets. Auto-regressive hidden Markov models (ARHMMs) explain time-series structure by defining a mapping from past observations to the current observation. Salinas et al. (2020) is an ARHMM approach in which target values are used directly as inputs. However, this dependency of the model on the targets makes it more vulnerable to noise. This issue is addressed in DSSM (Rangapuram et al., 2018), another ARHMM approach, where the target values are incorporated only through the likelihood term. Another group of works considers EM-based variational inference, like Structured Inference Networks (SIN) (Krishnan et al., 2017), which utilizes an RNN to update the state. The Kalman Variational Autoencoder (KVAE) (Fraccaro et al., 2017) and Extended KVAE (EKVAE) (Klushyn et al., 2021) use the original KF equations and apply both filtering and smoothing.
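The classical filtering-plus-smoothing pipeline used by KVAE-style models pairs a forward Kalman filter with a Rauch-Tung-Striebel (RTS) backward pass. The sketch below shows only the backward recursion, under assumed linear dynamics x_t = A x_{t-1} with process noise Q; it takes the filtered means and covariances as given and is an illustration of the textbook recursion, not any specific paper's implementation.

```python
import numpy as np

def rts_smooth(mus, Ps, A, Q):
    """RTS backward smoothing over filtered means `mus` and covariances `Ps`.
    Assumes linear dynamics x_t = A x_{t-1} + q with q ~ N(0, Q)."""
    T = len(mus)
    mus_s, Ps_s = [None] * T, [None] * T
    # The last smoothed estimate equals the last filtered estimate.
    mus_s[-1], Ps_s[-1] = mus[-1], Ps[-1]
    for t in range(T - 2, -1, -1):
        P_pred = A @ Ps[t] @ A.T + Q             # one-step predicted covariance
        G = Ps[t] @ A.T @ np.linalg.inv(P_pred)  # smoother gain (another inversion)
        mus_s[t] = mus[t] + G @ (mus_s[t + 1] - A @ mus[t])
        Ps_s[t] = Ps[t] + G @ (Ps_s[t + 1] - P_pred) @ G.T
    return mus_s, Ps_s
```

Note that the backward pass needs its own matrix inversion per step, so a model that runs both filtering and smoothing with the classical equations pays the inversion cost twice per time step.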

