GATED INFERENCE NETWORK: INFERENCING AND LEARNING STATE-SPACE MODELS

Abstract

State-space models (SSMs) perform predictions by learning the underlying dynamics of an observed sequence. We propose a new SSM approach for both high- and low-dimensional observation spaces, which utilizes Bayesian filtering-smoothing to model the system's dynamics more accurately than RNN-based SSMs and can be learned end to end. The designed architecture, which we call the Gated Inference Network (GIN), integrates uncertainty estimates and learns the complicated dynamics of the system, enabling estimation and imputation both in the presence and in the absence of data. The proposed model incorporates GRU cells into its structure to complete the data flow, while avoiding expensive computations and potentially unstable matrix inversions. The GIN can handle arbitrary time-series data and is strongly robust to observational noise. In numerical experiments, we show that the GIN reduces the uncertainty of its estimates and outperforms its counterparts: LSTMs, GRUs, and variational approaches.

1. INTRODUCTION

State estimation and inference in dynamical systems are among the most studied problems in signal processing and time series analysis (Rauch et al., 1965). In some cases, learning a state space is a complicated task due to the relatively high dimension of the observations and measurements, which provide only partial information about the states. Noise is another significant issue in this scenario, since observations are typically corrupted. Time-series prediction, e.g., predicting the next state or the next observation, is another substantial application that again requires inferring the states from the observations. Classical memory networks such as LSTMs (Hochreiter & Schmidhuber, 1997), GRUs (Cho et al., 2014), and simple RNNs (Wilson & Finkel, 2009; Yadaiah & Sowmya, 2006) give no intuition about the uncertainties or the dynamics. Another group of approaches performs Kalman filtering (KF) in the latent state, which usually requires a deep encoder for feature extraction; Krishnan et al. (2017), Ghalamzan et al. (2021), and Hashempour et al. (2020) belong to this group. However, these solutions have restrictions: they are not able to deal with high-dimensional non-linear systems, and the classic KF approach is computationally expensive, e.g., due to the matrix-inversion issue. Likewise, indirect optimization of an objective function by variational inference, as in the work of Kingma & Welling (2013), increases the complexity of the model. Moreover, variational inference approaches, usually implemented in the context of variational auto-encoders for dimension reduction, do not have direct access to the loss and have to minimize its lower bound instead, which reduces their ability to learn the dynamics and affects the performance of the model. KalmanNet (Revach et al., 2021) and Ruhe & Forré (2021) use a GRU in their structure for the state update. However, they are only able to deal with low-dimensional state spaces and cannot handle complex high-dimensional inputs, because they directly use the classic Bayesian equations and inherit the matrix-inversion issue. Moreover, their structures require full, or at least partial, knowledge of the dynamics.

The mentioned restrictions of the KF and its variants and of the variational models, together with the necessity of having a metric to measure the uncertainty, motivate us to introduce the GIN, an end-to-end structure with the ability to learn dynamics using Bayesian filtering-smoothing properties. The contributions of the GIN are: (i) modeling high- and low-dimensional sequences: we show the eligibility of the GIN to infer both cases by a simple adjustment in the observation transferring functions in the
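To make the matrix-inversion issue discussed above concrete, the following is a minimal NumPy sketch of one classic Kalman filter predict/update step, assuming a linear-Gaussian model with dynamics matrix F, observation matrix H, and noise covariances Q and R (all names here are illustrative, not part of the GIN; the GIN replaces this gain computation with a learned GRU-based update).

```python
import numpy as np

def kf_step(x, P, y, F, Q, H, R):
    """One predict/update step of the classic (linear-Gaussian) Kalman filter.

    Illustrative sketch only: computing the gain K requires inverting the
    innovation covariance S, which is the expensive and potentially
    unstable operation referred to in the text.
    """
    # Predict: propagate the mean and covariance through the linear dynamics.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q

    # Update: the Kalman gain needs the inverse of S = H P H^T + R.
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)      # the matrix-inversion bottleneck
    x_post = x_pred + K @ (y - H @ x_pred)   # correct the prediction with the innovation
    P_post = (np.eye(len(x)) - K @ H) @ P_pred
    return x_post, P_post
```

For an observation of dimension m, the inversion costs O(m^3) per step and can become ill-conditioned, which is why gated recurrent updates that sidestep it are attractive for high-dimensional observations.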

