PARAMETRIC DENSITY ESTIMATION WITH UNCERTAINTY USING DEEP ENSEMBLES

Abstract

In parametric density estimation, the parameters of a known probability density are typically recovered from measurements by maximizing the log-likelihood. Prior knowledge of measurement uncertainties is not included in this method, potentially producing degraded or even biased parameter estimates. We propose an efficient two-step, general-purpose approach for parametric density estimation using deep ensembles. Feature predictions and their uncertainties are returned by a deep ensemble and then combined in an importance-weighted maximum likelihood estimation to recover parameters representing a known density, along with their respective errors. To compare the bias-variance tradeoff of different approaches, we define an appropriate figure of merit. We illustrate a number of use cases for our method in the physical sciences and demonstrate state-of-the-art results for X-ray polarimetry that outperform current classical and deep learning methods.

Introduction

Most state-of-the-art NN performances are on single (high-dimensional) input, multiple-output tasks, for instance classifying images (Krizhevsky et al., 2012), scene understanding (Redmon et al., 2015) and voice recognition (Graves et al., 2006). These tasks typically involve one input vector or image and a single output vector of predictions.

In parametric density estimation, there is a known probability density that the data (or latent features of the data) are expected to follow. The goal is to find representative distribution parameters for a given dataset. In simple cases where the likelihood is calculable, maximum likelihood estimation can be used effectively. In cases where latent features of the data follow a known distribution (e.g., heights of people in a dataset of photographs), NNs can potentially be used to estimate the distribution parameters directly. For clarity, we define this direct/end-to-end approach as parametric feature density estimation (PFDE). Such an approach requires employing entire datasets (with potentially thousands to millions of high-dimensional examples) as inputs in order to output a vector of density parameters. Furthermore, to be useful, these NNs would need to generalize to arbitrarily sized dataset-inputs.

One example of NNs making sense of large dataset-inputs is found in natural language processing. Here large text corpora, converted to word vectors (Pennington et al., 2014; Devlin et al., 2019), can be input and summarized by single output vectors using recurrent neural networks (RNNs), for instance in sentiment analysis (Can et al., 2018). However, these problems and RNNs themselves contain inductive bias: there is inherent structure in text, so not all information need be given at once, and a concept of memory or attention is sufficient (Vaswani et al., 2017). The same can be said about time-domain problems, such as audio processing or voice recognition. Memory is inherently imperfect; for PFDE, one ideally wants to know all elements of the ensemble at once to make the best prediction, so sequential inductive bias is undesirable. Ultimately, memory and architectural constraints make training NNs for direct PFDE computationally intractable.

On the other hand, density estimation on the data directly (not on its latent features) is computationally tractable. Density estimation lets us find a complete statistical model of the data-generating process.
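To make the classical baseline concrete, here is a minimal sketch (not from the paper) of maximum likelihood estimation for a known density: recovering the mean and scale of a Gaussian from synthetic measurements by minimizing the negative log-likelihood. The Gaussian model, the synthetic data, and the optimizer choice are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic measurements drawn from the "known" density N(mu=2.0, sigma=0.5).
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=0.5, size=1000)

def neg_log_likelihood(params, x):
    """Negative Gaussian log-likelihood, up to an additive constant."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)  # optimize log(sigma) so sigma stays positive
    return 0.5 * np.sum(((x - mu) / sigma) ** 2) + x.size * log_sigma

res = minimize(neg_log_likelihood, x0=[0.0, 0.0], args=(data,))
mu_hat, sigma_hat = res.x[0], float(np.exp(res.x[1]))
```

Approximate standard errors on the recovered parameters can be read off the inverse Hessian of the negative log-likelihood at the optimum (available as `res.hess_inv` for the default BFGS optimizer); note that this baseline has no way to use per-measurement uncertainties.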

Applying deep learning to density estimation has advanced the field significantly (Papamakarios, 2019). Most of the work so far focuses on density estimation where the density is unknown a priori. This can be achieved with non-parametric methods such as neural density estimation (Papamakarios
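The two-step recipe summarized in the abstract (per-event feature predictions plus uncertainties from a deep ensemble, then an importance-weighted maximum likelihood fit) can be sketched as follows. This is an illustrative toy, not the paper's implementation: the inverse-variance weights, the unit-variance Gaussian density, and the simulated heteroscedastic noise are all assumptions made for the example.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
n = 2000
features = rng.normal(1.0, 1.0, size=n)     # latent features follow the known density N(mu, 1)
sigmas = rng.uniform(0.1, 2.0, size=n)      # per-event predictive uncertainty (stand-in for a deep ensemble)
preds = features + rng.normal(0.0, sigmas)  # per-event point predictions, corrupted accordingly

# Importance weights: trust precise predictions more (inverse-variance, an assumption here).
weights = 1.0 / sigmas**2
weights /= weights.sum()

def weighted_neg_log_likelihood(mu):
    # Weighted NLL of the known unit-variance Gaussian density, up to a constant.
    return np.sum(weights * 0.5 * (preds - mu) ** 2)

mu_hat = minimize_scalar(weighted_neg_log_likelihood, bounds=(-5.0, 5.0), method="bounded").x
mu_plain = preds.mean()  # unweighted MLE ignores the heteroscedastic errors
```

In this symmetric toy both estimators are unbiased, but the weighted fit has markedly lower variance because it down-weights noisy events; with asymmetric densities or boundary effects, ignoring the uncertainties can also bias the fit, which is the degradation the abstract describes.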

