Maximum Likelihood Learning of Energy-Based Models for Simulation-Based Inference

Abstract

We introduce two synthetic likelihood methods for Simulation-Based Inference (SBI), to conduct either amortized or targeted inference from experimental observations when a high-fidelity simulator is available. Both methods learn a conditional energy-based model (EBM) of the likelihood using synthetic data generated by the simulator, conditioned on parameters drawn from a proposal distribution. The learned likelihood can then be combined with any prior to obtain a posterior estimate, from which samples can be drawn using MCMC. Our methods uniquely combine a flexible Energy-Based Model and the minimization of a KL loss: this is in contrast to other synthetic likelihood methods, which either rely on normalizing flows, or minimize score-based objectives; choices that come with known pitfalls. Our first method, Amortized Unnormalized Neural Likelihood Estimation (AUNLE), introduces a tilting trick during training that significantly lowers the computational cost of inference by enabling the use of efficient MCMC techniques. Our second method, Sequential UNLE (SUNLE), employs a new conditional EBM learning technique to reuse simulation data and improve posterior accuracy on a specific dataset. We demonstrate the properties of both methods on a range of synthetic datasets, and apply them to a neuroscience model of the pyloric network in the crab, matching the performance of other synthetic likelihood methods at a fraction of the simulation budget.

1. Introduction

Simulation-based modeling expresses a system as a probabilistic program (Ghahramani, 2015), which describes, in a mechanistic manner, how samples from the system are drawn given the parameters of said system. This probabilistic program can be concretely implemented in a computer, as a simulator, from which synthetic parameter-sample pairs can be drawn. This setting is common in many scientific and engineering disciplines, such as stellar events in cosmology (Alsing et al., 2018; Schafer & Freeman, 2012), particle collisions in a particle accelerator for high-energy physics (Eberl, 2003; Sjöstrand et al., 2008), and biological neural networks in neuroscience (Markram et al., 2015; Pospischil et al., 2008). Describing such systems using a probabilistic program often proves easier than specifying a tractable probability distribution for the underlying model.

We consider the task of inference for such systems, which consists of computing the posterior distribution of the parameters given observed (non-synthetic) data. When a likelihood function of the simulator is available alongside a prior belief on the parameters, inferring the posterior distribution of the parameters given data is possible using Bayes' rule. Traditional inference methods such as variational techniques (Wainwright & Jordan, 2008) or Markov Chain Monte Carlo (Andrieu et al., 2003) can then be used to produce approximate posterior samples of the parameters that are likely to have generated the observed data. Unfortunately, the likelihood function of a simulator is computationally intractable in general, making traditional inference techniques directly inapplicable to simulation-based modeling. Simulation-Based Inference (SBI) methods (Cranmer et al., 2020) are specifically designed to perform inference in the presence of a simulator with an intractable likelihood.
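To make the tractable-likelihood baseline concrete, the following is a minimal sketch, not taken from the paper, of posterior sampling with random-walk Metropolis when the likelihood can be evaluated. The Gaussian prior and Gaussian likelihood are purely illustrative assumptions chosen so that the true posterior is known in closed form:

```python
import numpy as np

rng = np.random.default_rng(1)

def log_prior(theta):
    # Illustrative assumption: standard normal prior on the parameter.
    return -0.5 * theta**2

def log_likelihood(x_obs, theta):
    # Illustrative assumption: Gaussian observation model, so this term
    # is tractable; in simulation-based modeling it typically is not.
    return -0.5 * (x_obs - theta)**2

def log_posterior(theta, x_obs):
    # Bayes' rule up to a normalizing constant.
    return log_prior(theta) + log_likelihood(x_obs, theta)

def metropolis(x_obs, n_steps=5000, step=0.5):
    """Random-walk Metropolis sampler targeting the unnormalized posterior."""
    theta = 0.0
    log_p = log_posterior(theta, x_obs)
    samples = []
    for _ in range(n_steps):
        proposal = theta + step * rng.normal()
        log_p_prop = log_posterior(proposal, x_obs)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < log_p_prop - log_p:
            theta, log_p = proposal, log_p_prop
        samples.append(theta)
    return np.array(samples)

samples = metropolis(x_obs=1.0)
# With both terms Gaussian, the exact posterior is N(0.5, 0.5), so the
# post-burn-in sample mean should be close to 0.5.
print(samples[1000:].mean())
```

SBI methods aim to recover this workflow when `log_likelihood` cannot be evaluated, by replacing it with an estimate learned from simulations.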
These methods repeatedly generate synthetic data using the simulator to build an estimate of the intractable likelihood, or of the posterior itself.
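The raw material of these methods is a set of synthetic parameter-sample pairs drawn from the prior and the simulator. A minimal sketch, with a hypothetical toy simulator (the latent noise it marginalizes over is what makes realistic simulators' likelihoods intractable):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulator(theta, rng):
    # Hypothetical toy simulator: draws unobserved latent noise around
    # theta and returns summary statistics of the result. Marginalizing
    # over the latents is what renders p(x | theta) intractable in
    # realistic mechanistic models.
    latent = rng.normal(loc=theta, scale=1.0, size=10)
    return np.array([latent.mean(), latent.std()])

def sample_prior(rng):
    # Illustrative assumption: uniform prior belief over the parameter.
    return rng.uniform(-3.0, 3.0)

# Draw synthetic parameter-sample pairs; SBI methods fit their estimate
# (of the likelihood or the posterior) to this dataset.
pairs = []
for _ in range(1000):
    theta = sample_prior(rng)
    x = simulator(theta, rng)
    pairs.append((theta, x))

print(len(pairs))  # 1000 (theta, x) pairs
```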

