LEARNING IMPLICIT HIDDEN MARKOV MODELS USING NEURAL LIKELIHOOD-FREE INFERENCE

Abstract

Likelihood-free inference methods based on neural conditional density estimation have been shown to drastically reduce the simulation burden in comparison to classical methods such as approximate Bayesian computation (ABC). However, when applied to a latent variable model, such as a Hidden Markov model (HMM), these methods are designed to estimate only the parameters rather than the joint posterior distribution of both the parameters and the hidden states. Naive application of these methods to a HMM, ignoring the inference of this joint posterior distribution, will result in overestimation of the uncertainty of the posterior predictive. We propose a postprocessing step that rectifies this problem for HMMs with a continuous state space. Our approach learns the intractable posterior distribution of the hidden states directly, using an autoregressive flow, by exploiting the Markov property. Upon evaluating our approach on several implicit HMMs, we found that the quality of the estimates retrieved using our postprocessing is comparable to what can be achieved using computationally expensive particle filtering, which additionally requires a tractable data distribution.

1. INTRODUCTION

We consider the task of Bayesian inference for a Hidden Markov model^1 whose likelihood is analytically intractable and which is available only as a simulator. Due to the unavailability of the likelihood, standard Bayesian inference methods cannot be applied to such a model. Inference is generally carried out using approximate Bayesian computation (ABC) (Sisson et al., 2018), which requires only forward simulations from the model; see, for example, Martin et al. (2019); Picchini (2014); Toni et al. (2009). Recently, a new class of likelihood-free inference methods was developed (see Cranmer et al. (2020) for a review) that uses a neural-network-based emulator of the posterior density, the likelihood density or the likelihood ratio. Such methods were empirically shown to be much more sample-efficient than ABC (Lueckmann et al., 2021). Additionally, these methods do not require the user to specify difficult-to-choose algorithmic parameters, and they perform equally well across different models without (much) problem-specific choice of the neural network's architecture. Naturally, these methods appear preferable to ABC for carrying out inference in an implicit HMM. We would like to point out, however, that these neural likelihood-free inference (NLFI) approaches are usually applied to estimate the posterior of the parameters only, since a naive implementation of a neural-network-based emulator may perform unreliably in estimating the joint posterior of the parameters and the high-dimensional hidden states, potentially for lack of suitable inductive biases. Estimation of the hidden states may or may not be of interest within a particular application domain. However, without estimating the joint posterior of the parameters and the hidden states, the goodness-of-fit cannot be correctly assessed. This is indeed a severe limitation.
Note that although ABC theoretically targets the joint distribution, it fails to estimate the hidden states adequately within a reasonable simulation budget (we will demonstrate this later in the experiments). In this paper we present a novel technique for estimating the hidden states by learning an approximation of the incremental posterior distribution of the states using a neural density estimator. After learning the incremental posterior density, the density estimator can be used to draw the full path of the hidden states recursively. Our salient contributions are as follows:

• We highlight the problem neural likelihood-free methods face in estimating the joint density of the unknown variables within a HMM, and we propose a postprocessing technique to mitigate this limitation.

• Our approach can be used as a postprocessing technique for any likelihood-free method, in order to avoid overestimation of uncertainty.

• We develop a method to obtain an amortised approximation of the posterior of a latent Markov process, with intractable transition and observation densities, without using ABC.
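The recursive use of the incremental posterior described above can be sketched as follows. Here `q_incr` is a hypothetical stand-in for the learned density estimator (a simple Gaussian rule chosen only so the sketch runs); the actual autoregressive flow that plays this role is introduced later:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the learned incremental posterior
# q(X_t | X_{t-1}, y_t, theta): a Gaussian whose mean mixes the previous
# state and the current observation, purely to make the recursion runnable.
def q_incr(x_prev, y_t, theta):
    return 0.5 * x_prev + 0.5 * y_t + 0.1 * rng.normal()

def sample_hidden_path(q_incr, x0, y, theta):
    """Draw one full posterior path (X_1, ..., X_{M-1}) by sampling the
    incremental posterior recursively, conditioning each draw on the
    previous state and the corresponding observation."""
    path = []
    x_prev = x0
    for y_t in y:
        x_prev = q_incr(x_prev, y_t, theta)
        path.append(x_prev)
    return np.array(path)

# One posterior sample of the hidden path given 10 observations.
path = sample_hidden_path(q_incr, x0=0.0, y=np.linspace(-1, 1, 10), theta=None)
```

Repeating this recursion for many draws of theta from the (already estimated) parameter posterior yields joint samples of parameters and states.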

2. BACKGROUND

We begin by first introducing the implicit HMM, and then discuss the challenges of carrying out Bayesian inference for it. We can describe a HMM, for a continuous latent Markov process $X_t \in \mathbb{R}^K$ with a $K$-dimensional state space, as follows:
$$X_t \sim f(X_t \mid X_{t-1}, \theta), \qquad y_t \sim g(y_t \mid X_t, \theta),$$
where $\theta$ parameterises the transition $f(X_t \mid X_{t-1}, \theta)$ and the observation $g(y_t \mid X_t, \theta)$ densities respectively. We consider $\theta$ to include the initial state $X_0$. Given a set of noisy observations $y \in \mathbb{R}^{M \times L}$ of $L$ out of the $K$ states of the latent process, taken at $M$ experimental time points, our goal is to infer the joint posterior distribution $p(\theta, x \mid y)$, where $x = (X_1, \ldots, X_{M-1})$ is the unobserved sample path of the process, that is, the hidden states. The expression for the unnormalised posterior is given by
$$p(\theta, x \mid y) \propto p(\theta) \prod_{t=0}^{M-1} g(y_t \mid X_t, \theta) \prod_{t=1}^{M-1} f(X_t \mid X_{t-1}, \theta),$$
where $p(\theta)$ is the prior distribution over the parameters and the initial values. The trickier part is the inference of the hidden states. We are interested in the case where one can draw samples from $f(\cdot)$ and $g(\cdot)$ but cannot evaluate either or both of these densities. Note that when these densities are known, and are Gaussian, we can apply classical filtering/smoothing techniques (Särkkä, 2013). For non-Gaussian densities, or when only $g(\cdot)$ is known, particle filtering can be used. However, when $g(\cdot)$ is unknown, likelihood-free methods are generally used.
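As a concrete illustration, the following toy simulator (our own example, not a model from this paper) draws from a nonlinear transition density $f$ and a heavy-tailed observation density $g$. In a genuinely implicit HMM we could only call such a simulator; the two densities would not be available in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy HMM simulator: nonlinear transition f, heavy-tailed observation g.
# theta = (a, sigma_x, sigma_y); the initial state X_0 is fixed here but
# would in general be treated as part of theta.
def simulate_hmm(theta, M):
    a, sigma_x, sigma_y = theta
    x = np.empty(M)
    y = np.empty(M)
    x_prev = 0.0                                            # initial state X_0
    for t in range(M):
        x[t] = a * np.sin(x_prev) + sigma_x * rng.normal()  # X_t ~ f(. | X_{t-1}, theta)
        y[t] = x[t] + sigma_y * rng.standard_t(df=3)        # y_t ~ g(. | X_t, theta)
        x_prev = x[t]
    return x, y

x_path, y_obs = simulate_hmm(theta=(0.9, 0.3, 0.2), M=100)
```

Only the returned samples `(x_path, y_obs)` are available to the inference procedure; likelihood-free methods use repeated calls to such a simulator in place of density evaluations.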

3. NEURAL LIKELIHOOD-FREE INFERENCE

If, instead of the joint $p(\theta, x \mid y)$, we only wish to estimate the marginal $p(\theta \mid y)$, then a number of strategies based on conditional density estimation can be employed. For example, we can simulate pairs $(\theta, y)$ from their joint distribution to create a training dataset of $N$ samples $\{\theta_n, y_n\}_{n=1}^N$, which can be utilised to train a conditional density estimator that approximates either the posterior, $p(\theta \mid y) \approx q_\psi(\theta \mid y)$ (Papamakarios & Murray, 2016), or the likelihood, $p(y \mid \theta) \approx q_\phi(y \mid \theta)$ (Papamakarios et al., 2019). In the former case, once we have trained an approximation to the posterior we can directly draw samples $\theta \sim q_\psi(\theta \mid y_o)$ by conditioning on a particular dataset $y_o$. In the latter case, we can use the trained density estimator to approximate the posterior $p(\theta \mid y_o) \propto q_\phi(y_o \mid \theta) p(\theta)$ and then draw samples from it using Markov chain Monte Carlo (MCMC). In principle we can choose any density estimator for $q_\psi(\cdot \mid \cdot)$. However, in the literature surrounding NLFI, a neural-network-based density estimator is generally used. The neural network serves in this context either as a nonlinear transformation of the conditioning variables within a mixture-of-Gaussians density, as proposed in Bishop (1994), or as a normalizing flow (Rezende & Mohamed, 2015; Papamakarios et al., 2021) that builds a transport map (Parno, 2015) between a simple distribution (such as a standard Gaussian) and a complex one, such as the likelihood/posterior density. Following the seminal work of Tabak & Turner (2013), a large body of research has been undertaken on building such transport maps using samples from the respective measures.
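A minimal sketch of the posterior-emulation route: we fit a conditional Gaussian $q_\psi(\theta \mid y)$, whose mean and log-standard-deviation are produced by a small neural network, by maximum likelihood on simulated $(\theta, y)$ pairs. The conjugate toy simulator and the single-Gaussian family are our own simplifications (practical NLFI uses mixture densities or normalizing flows), chosen so the example is self-contained and its answer checkable:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy simulator with a known answer: theta ~ N(0, 1), y = theta + N(0, 0.5^2),
# so the true posterior is N(0.8 * y_o, 1/5). Chosen purely to verify the fit.
def simulate(n):
    theta = rng.normal(0.0, 1.0, size=n)
    y = theta + rng.normal(0.0, 0.5, size=n)
    return theta, y

theta, y = simulate(5000)

# Conditional Gaussian q_psi(theta | y): a one-hidden-layer network maps y to
# a mean and a log-std; psi is trained by minimising the negative log-likelihood
# of the simulated pairs, the standard neural posterior estimation objective.
H = 16
W1 = rng.normal(0, 0.1, (1, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.1, (H, 2)); b2 = np.zeros(2)

def forward(y_in):
    h = np.tanh(y_in[:, None] @ W1 + b1)
    out = h @ W2 + b2
    return out[:, 0], out[:, 1], h          # mean, log-std, hidden activations

lr = 0.05
for step in range(5000):
    m, logs, h = forward(y)
    s = np.exp(logs)
    z = (theta - m) / s
    dm, dlogs = -z / s, 1.0 - z**2          # gradients of the Gaussian NLL
    dout = np.stack([dm, dlogs], axis=1) / len(y)
    gW2, gb2 = h.T @ dout, dout.sum(0)
    dh = (dout @ W2.T) * (1.0 - h**2)
    gW1, gb1 = y[:, None].T @ dh, dh.sum(0)
    for p, g in ((W1, gW1), (b1, gb1), (W2, gW2), (b2, gb2)):
        p -= lr * g

# Amortised inference: condition on an observed dataset y_o and draw samples.
y_o = np.array([1.0])
m, logs, _ = forward(y_o)
samples = rng.normal(m[0], np.exp(logs[0]), size=2000)
```

For this conjugate model the learned conditional should approach the analytic posterior mean $0.8\,y_o$ and standard deviation $1/\sqrt{5} \approx 0.447$, which is how one can sanity-check the estimator.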
An alternative formulation of NLFI utilises the duality (Cranmer et al., 2015) between the optimal decision function of a probabilistic classifier and the likelihood ratio $r(\theta_a, \theta_b) = p(y \mid \theta_a) / p(y \mid \theta_b)$, evaluated at two parameter values $\theta_a$ and $\theta_b$, to approximate the latter by training a binary classifier on samples from $p(y, \theta)$. This likelihood ratio can then be used as a proxy within an MCMC scheme, replacing the intractable likelihoods in the Metropolis-Hastings acceptance probability:



$$\min\left\{1, \frac{p(y_o \mid \theta^*)\, k_\theta(\theta \mid \theta^*)\, p(\theta^*)}{p(y_o \mid \theta)\, k_\theta(\theta^* \mid \theta)\, p(\theta)}\right\} \approx \min\left\{1, r(\theta^*, \theta)\, \frac{k_\theta(\theta \mid \theta^*)\, p(\theta^*)}{k_\theta(\theta^* \mid \theta)\, p(\theta)}\right\},$$
where $k_\theta(\cdot \mid \cdot)$ denotes the proposal density.

^1 In some literature the term state-space model is used interchangeably to refer to a Hidden Markov model.


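To make the acceptance rule above concrete, the sketch below runs random-walk Metropolis-Hastings driven only by a log likelihood-ratio function. As an assumption for illustration, the ratio is computed analytically for a Gaussian toy model so the example runs end to end; in the ratio-based NLFI scheme this function would instead be the trained classifier's output:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for the classifier's log likelihood-ratio estimate,
# log r(theta_a, theta_b) = log p(y_o | theta_a) - log p(y_o | theta_b).
# Computed analytically here for y | theta ~ N(theta, 1) (toy assumption);
# in practice it would come from the trained binary classifier.
y_o = 1.2
def log_ratio(theta_a, theta_b):
    return -0.5 * (y_o - theta_a) ** 2 + 0.5 * (y_o - theta_b) ** 2

def log_prior(theta):                               # theta ~ N(0, 1)
    return -0.5 * theta ** 2

# Random-walk Metropolis-Hastings: the proposal k_theta is symmetric, so its
# terms cancel and the acceptance probability reduces to
# min{1, r(theta*, theta) p(theta*) / p(theta)}.
theta, chain = 0.0, []
for _ in range(20000):
    theta_star = theta + rng.normal(0.0, 0.5)
    log_alpha = (log_ratio(theta_star, theta)
                 + log_prior(theta_star) - log_prior(theta))
    if np.log(rng.uniform()) < log_alpha:
        theta = theta_star
    chain.append(theta)

posterior = np.array(chain[5000:])                  # discard burn-in
# For this toy model the exact posterior is N(0.6, 0.5), for comparison.
```

Note that the likelihood itself is never evaluated inside the loop; only the ratio enters, which is what makes the classifier-based approximation usable.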