NEURAL APPROXIMATE SUFFICIENT STATISTICS FOR IMPLICIT MODELS

Abstract

We consider the fundamental problem of how to automatically construct summary statistics for implicit generative models where the evaluation of the likelihood function is intractable but sampling data from the model is possible. The idea is to frame the task of constructing sufficient statistics as learning mutual information maximizing representations of the data with the help of deep neural networks. The infomax learning procedure does not need to estimate any density or density ratio. We apply our approach to both traditional approximate Bayesian computation and recent neural likelihood methods, boosting their performance on a range of tasks.

1. INTRODUCTION

Many data-generating processes can be well described by a parametric statistical model that is easy to simulate forward but does not possess an analytical likelihood function. These models are called implicit generative models (Diggle & Gratton, 1984) or simulator-based models (Lintusaari et al., 2017) and are widely used in science and engineering, including physics (Sjöstrand et al., 2008), genetics (Järvenpää et al., 2018), computer graphics (Mansinghka et al., 2013), robotics (Lopez-Guevara et al., 2017), finance (Bansal & Yaron, 2004), cosmology (Weyant et al., 2013), ecology (Wood, 2010) and epidemiology (Chinazzi et al., 2020). For example, the number of infected/healthy people in an outbreak can be well modelled by a stochastic differential equation (SDE) simulated with Euler-Maruyama discretization, but the likelihood function of an SDE is generally non-analytical. Directly inferring the parameters of these implicit models is often very challenging. Techniques known as likelihood-free inference open the door to Bayesian inference in such circumstances. Likelihood-free inference evaluates neither the likelihood function nor its derivatives; rather, it only requires the ability to sample (i.e. simulate) data from the model. Early approaches in approximate Bayesian computation (ABC) perform likelihood-free inference by repeatedly simulating data from the model and picking a small subset of the simulated data close to the observed data to build the posterior (Pritchard et al., 1999; Marjoram et al., 2003; Beaumont et al., 2009; Sisson et al., 2007). Recent advances make use of flexible neural density estimators to approximate either the intractable likelihood (Papamakarios et al., 2019) or the posterior directly (Papamakarios & Murray, 2016; Lueckmann et al., 2017; Greenberg et al., 2019). Despite the algorithmic differences, a shared ingredient of likelihood-free inference methods is the choice of summary statistics.
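To make the SDE example above concrete, the following sketch simulates a stochastic SIR-type outbreak model with Euler-Maruyama discretization. The model form, parameter values, and noise scale are our own illustrative assumptions, not taken from any specific epidemiology reference; the point is that sampling a trajectory is cheap while its likelihood is intractable.

```python
import numpy as np

def simulate_sir_sde(theta, x0=(0.99, 0.01), T=50.0, dt=0.1, seed=None):
    """Euler-Maruyama simulation of a (hypothetical) stochastic SIR-type model.

    theta = (beta, gamma): infection and recovery rates. The likelihood of the
    resulting trajectory is intractable, but sampling is cheap: an implicit model.
    Returns the trajectory of the infected fraction.
    """
    rng = np.random.default_rng(seed)
    beta, gamma = theta
    s, i = x0
    traj = []
    n_steps = round(T / dt)
    for _ in range(n_steps):
        # deterministic drift terms of the SIR dynamics
        ds = -beta * s * i
        di = beta * s * i - gamma * i
        # a small diffusion term makes the process stochastic
        noise = 0.01 * np.sqrt(dt) * rng.standard_normal(2)
        s = np.clip(s + ds * dt + noise[0], 0.0, 1.0)
        i = np.clip(i + di * dt + noise[1], 0.0, 1.0)
        traj.append(i)
    return np.asarray(traj)

x = simulate_sir_sde((0.5, 0.1), seed=0)
```

Repeated calls with different `theta` yield the simulated datasets that likelihood-free inference operates on.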
Well-chosen summary statistics have proven crucial to the performance of likelihood-free inference methods (Blum et al., 2013; Fearnhead & Prangle, 2012; Sisson et al., 2018). Unfortunately, in practice it is often difficult to determine low-dimensional yet informative summary statistics without domain knowledge from experts. In this work, we propose a novel deep neural network-based approach for the automatic construction of summary statistics. Neural networks have previously been applied to learning summary statistics for likelihood-free inference (Jiang et al., 2017; Dinev & Gutmann, 2018; Alsing et al., 2018; Brehmer et al., 2020). Our approach is unique in that the learned statistics directly target global sufficiency. The main idea is to exploit the link between statistical sufficiency and information theory, and to formulate the task of learning a sufficient statistic as that of learning information-maximizing representations of the data. We achieve this with distribution-free mutual information estimators or their proxies (Székely et al., 2014; Hjelm et al., 2018). Importantly, our statistics can be learned jointly with the posterior, resulting in fast learning where the two refine each other iteratively. To sum up, our main contributions are:
• We propose a new neural approach to automatically extract compact, near-sufficient statistics from raw data. The approach removes the need for careful handcrafted design of summary statistics.
• With the proposed statistics, we develop two new likelihood-free inference methods, namely SMC-ABC+ and SNL+. Experiments on tasks with various types of data demonstrate their effectiveness.
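One example of the distribution-free dependence proxies mentioned above is the empirical distance correlation of Székely et al., which vanishes (in the population limit) exactly under independence and requires no density or density-ratio estimation. A minimal NumPy sketch, with a toy Gaussian example of our own construction:

```python
import numpy as np

def distance_correlation(theta, s):
    """Empirical distance correlation (V-statistic form), a distribution-free
    dependence measure between parameters theta (n, d_theta) and statistics
    s (n, d_s). Returns a value in [0, 1]; larger means stronger dependence.
    """
    def centered_dist(z):
        # pairwise Euclidean distance matrix, then double-centering
        d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
        return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

    A, B = centered_dist(theta), centered_dist(s)
    dcov2 = (A * B).mean()                      # squared distance covariance
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return float(np.sqrt(dcov2 / denom)) if denom > 0 else 0.0

rng = np.random.default_rng(0)
theta = rng.normal(size=(200, 1))
s_dep = theta + 0.1 * rng.normal(size=(200, 1))   # statistic informative about theta
s_ind = rng.normal(size=(200, 1))                 # uninformative statistic
r_dep = distance_correlation(theta, s_dep)
r_ind = distance_correlation(theta, s_ind)
```

An informative statistic scores markedly higher than an independent one, which is what makes such a proxy usable as a training signal for a statistic network.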

2. BACKGROUND

Likelihood-free inference. LFI considers the task of Bayesian inference when the likelihood function of the model is intractable but simulating (sampling) data from the model is possible: π(θ|x_o) ∝ π(θ)p(x_o|θ), where x_o is the observed data, π(θ) is the prior over the model parameters θ, p(x_o|θ) is the (possibly) non-analytical likelihood function and π(θ|x_o) is the posterior over θ. We assume that, while we do not have access to the exact likelihood, we can still sample (simulate) data from the model with a simulator: x ∼ p(x|θ). The task is then to infer π(θ|x_o) given x_o and the sampled data D = {(θ_i, x_i)}_{i=1}^{n}, where θ_i ∼ p(θ) and x_i ∼ p(x|θ_i). Note that p(θ) is not necessarily the prior π(θ).

Curse of dimensionality. Different likelihood-free inference algorithms might learn π(θ|x_o) in different ways; nevertheless, most existing methods suffer from the curse of dimensionality. For example, traditional ABC methods use a small subset of D closest to x_o under some metric to build the posterior (Pritchard et al., 1999; Marjoram et al., 2003; Beaumont et al., 2009; Sisson et al., 2007); however, in high-dimensional spaces measuring the distance sensibly is notoriously hard (Sorzano et al., 2014; Xie et al., 2017). On the other hand, recent advances (Papamakarios et al., 2019; Lueckmann et al., 2017; Papamakarios & Murray, 2016; Greenberg et al., 2019) utilize neural density estimators (NDE) to model the intractable likelihood or the posterior. Unfortunately, accurately modeling high-dimensional distributions with NDEs is also known to be very difficult (Rippel & Adams, 2013; Van Oord et al., 2016), especially when the available training data is scarce.
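The training set D above can be assembled with a simple loop; `prior_sample` and `simulator` below are placeholders for the user's proposal p(θ) and implicit model p(x|θ), and the Gaussian toy pair is purely illustrative:

```python
import numpy as np

def sample_training_set(prior_sample, simulator, n):
    """Draw the training set D = {(theta_i, x_i)}_{i=1}^n for likelihood-free
    inference: parameters from the proposal, data from the simulator."""
    thetas, xs = [], []
    for _ in range(n):
        theta = prior_sample()      # theta_i ~ p(theta)
        thetas.append(theta)
        xs.append(simulator(theta)) # x_i ~ p(x | theta_i)
    return np.asarray(thetas), np.asarray(xs)

# toy example: standard normal proposal, Gaussian simulator with unknown mean
rng = np.random.default_rng(0)
thetas, xs = sample_training_set(
    prior_sample=lambda: rng.normal(0.0, 1.0),
    simulator=lambda t: rng.normal(t, 0.5, size=10),
    n=100,
)
```

Every inference method discussed below consumes such a set of (parameter, data) pairs, differing only in how it turns them into a posterior.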
Our interest here is not to design a new inference algorithm, but to find a low-dimensional statistic s = s(x) that is (Bayesian) sufficient: π(θ|x_o) ≈ π(θ|s_o) ∝ π(θ)p(s_o|θ), where s : X → S is a deterministic function also learned from D. We conjecture that learning s(·) might be an easier task than direct density estimation. The resulting statistic s could then be applied to a wide range of likelihood-free inference algorithms, as we will elaborate in Section 3.2.
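A minimal sketch of such a deterministic statistic s : X → S, here a two-layer MLP in NumPy. The architecture and dimensions are illustrative only; in the paper s is a deep network whose weights are trained by MI maximization, a step omitted here:

```python
import numpy as np

class StatisticNet:
    """Deterministic map s : X -> S from raw data (dim d_x) to a compact
    summary statistic (dim d_s), parameterized as a small MLP."""

    def __init__(self, d_x, d_h, d_s, seed=0):
        rng = np.random.default_rng(seed)
        # standard fan-in scaled random initialization
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(d_x), (d_x, d_h))
        self.b1 = np.zeros(d_h)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(d_h), (d_h, d_s))
        self.b2 = np.zeros(d_s)

    def __call__(self, x):
        h = np.tanh(x @ self.W1 + self.b1)
        return h @ self.W2 + self.b2   # low-dimensional summary s(x)

s_net = StatisticNet(d_x=100, d_h=32, d_s=2)
s = s_net(np.random.default_rng(1).normal(size=(5, 100)))  # 5 datasets -> 5 statistics
```

Because s is deterministic, it can be composed with any downstream LFI algorithm that accepts summary statistics in place of raw data.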

3.1. NEURAL SUFFICIENT STATISTICS

Our new deep neural network-based approach for the automatic construction of near-sufficient statistics is based on the infomax principle, as illustrated by the following proposition (also see Figure 1):

Proposition 1. Let θ ∼ p(θ), x ∼ p(x|θ), and s : X → S be a deterministic function. Then s = s(x) is a sufficient statistic for p(x|θ) if and only if s = arg max_{S : X → S} I(θ; S(x)), where S is a deterministic mapping and I(·; ·) is the mutual information between random variables.

Proof. We defer the complete proof to the appendix.

This proposition is a variant of Theorem 8 in Shamir et al. (2010), with an adaptation to the likelihood-free inference scenario. This important result suggests that we can find the sufficient statistic s = s(x) for a likelihood function p(x|θ) by maximizing the mutual information (MI) I(θ; s) = KL[p(θ, s) ‖ p(θ)p(s)] between θ and s. Moreover, as our interest is in maximizing MI rather than knowing its precise value,

