AUTO-ENCODING GOODNESS OF FIT

Abstract

We develop a new type of generative autoencoder called the Goodness-of-Fit Autoencoder (GoFAE), which incorporates goodness-of-fit (GoF) tests at two levels. At the minibatch level, it uses GoF test statistics as regularization objectives. At a more global level, it selects a regularization coefficient based on higher criticism, i.e., a test of the uniformity of the local GoF p-values. We justify the use of GoF tests by providing a relaxed L2-Wasserstein bound on the distance between the latent distribution and a distribution class. We prove that optimization based on these tests can be done with stochastic gradient descent on a compact Riemannian manifold. Empirically, we show that our higher criticism parameter selection procedure balances reconstruction and generation using mutual information and uniformity of p-values, respectively. Finally, we show that GoFAE achieves FID scores and mean squared errors comparable to those of competing deep generative models while retaining statistical indistinguishability from Gaussian in the latent space, as assessed by a variety of hypothesis tests.
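To make the two-level procedure concrete, the following is a minimal sketch, not the paper's implementation: it assumes Shapiro-Wilk as the local GoF test, a fixed random 1-D projection of the latent codes (the paper instead optimizes projections, on a compact Riemannian manifold), and a Kolmogorov-Smirnov test for the uniformity check; all function names are illustrative.

```python
# Illustrative sketch of the two-level GoF idea (assumed details:
# Shapiro-Wilk as the local test, a fixed random 1-D projection of the
# codes, and a Kolmogorov-Smirnov test for p-value uniformity).
import numpy as np
from scipy import stats

def minibatch_gof_pvalue(codes, direction):
    """Local level: GoF p-value for one minibatch of latent codes.

    Projects the (n, d) codes onto a unit vector so that a univariate
    normality test applies; returns the Shapiro-Wilk p-value.
    """
    projected = codes @ direction
    return stats.shapiro(projected).pvalue

def higher_criticism(pvalues, alpha=0.05):
    """Global level: test whether the collected local p-values look
    Uniform(0, 1), as they should when the latent distribution is
    statistically indistinguishable from the Gaussian class."""
    ks = stats.kstest(pvalues, "uniform")
    return ks.pvalue >= alpha  # True: uniformity is not rejected

# Toy usage: minibatches actually drawn from the prior should pass.
rng = np.random.default_rng(0)
d = 8
direction = rng.standard_normal(d)
direction /= np.linalg.norm(direction)
pvals = [minibatch_gof_pvalue(rng.standard_normal((64, d)), direction)
         for _ in range(200)]
print("uniformity retained:", higher_criticism(np.asarray(pvals)))
```

In the full method, the p-values would be collected per minibatch during training at a given regularization coefficient, and the coefficient retained is one for which the uniformity check passes.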

1. INTRODUCTION

Generative autoencoders (GAEs) aim to achieve unsupervised, implicit generative modeling by learning a latent representation of the data (Bousquet et al., 2017). A generative model, known as the decoder, maps elements of a latent space, called codes, to the data space. These codes are sampled from a pre-specified distribution, or prior. GAEs also learn an encoder that maps the data space into the latent space while controlling the probability distribution of the encoded data, or posterior. As an important type of GAE, variational autoencoders (VAEs) maximize a lower bound on the data log-likelihood, which consists of a reconstruction term and a Kullback-Leibler (KL) divergence between the approximate posterior and the prior (Kingma & Welling, 2013; Rezende et al., 2014). Another class of GAEs seeks to minimize the optimal transport cost (Villani, 2008) between the true data distribution and the generative model. This objective can be simplified to minimizing a reconstruction error subject to matching the aggregated posterior to the prior distribution (Bousquet et al., 2017). This constraint is relaxed in the Wasserstein autoencoder (WAE) (Tolstikhin et al., 2017) via a penalty on the divergence between the aggregated posterior and the prior (written out below), allowing for a variety of discrepancies (Patrini et al., 2018; Kolouri et al., 2018).

Regardless of the criterion, training a GAE requires balancing low reconstruction error against a regularization loss that encourages the latent representation to be meaningful for data generation (Hinton & Salakhutdinov, 2006; Ruthotto & Haber, 2021). Overemphasizing minimization of the divergence between the data-derived posterior and the prior is problematic. In VAEs, this can manifest as posterior collapse (Higgins et al., 2017; Alemi et al., 2018; Takida et al., 2021), leaving the latent space with little information about the data. Meanwhile, WAEs can suffer from over-regularization when the prior distribution is too simple, e.g., an isotropic Gaussian (Rubenstein et al., 2018; Dai & Wipf, 2019). In general, it is difficult to decide when the posterior is close enough to the prior, but not so close as to be problematic. The difficulty is rooted in several issues: (a) the absence of tight constraints on the statistical distances; (b) the variability of distributions across the minibatches used in training; and (c) the difference in scale between the reconstruction and regularization objectives.

Unlike statistical distances, goodness-of-fit (GoF) tests are statistical hypothesis tests that assess the indistinguishability between a given (empirical) distribution and a distribution class (Stephens, 2017). In recent years, GAEs based on GoF tests have been proposed to address some of the aforementioned issues.
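For reference, the penalized form discussed above, a reconstruction cost plus a divergence penalty tying the aggregated posterior to the prior, is the WAE objective of Tolstikhin et al. (2017); the notation here follows that paper, with cost c, decoder G, aggregated posterior Q_Z, prior P_Z, and penalty coefficient lambda.

```latex
% Penalized WAE objective (Tolstikhin et al., 2017): c is the
% reconstruction cost, G the decoder, Q_Z the aggregated posterior,
% P_Z the prior, and D_Z any divergence with penalty coefficient \lambda.
D_{\mathrm{WAE}}(P_X, P_G) \;=\;
  \inf_{Q(Z \mid X)} \;
  \mathbb{E}_{P_X}\,\mathbb{E}_{Q(Z \mid X)}\!\bigl[\, c\bigl(X, G(Z)\bigr) \bigr]
  \;+\; \lambda \, \mathcal{D}_Z\bigl(Q_Z, P_Z\bigr)
```

The choice of the divergence D_Z and of the coefficient lambda is exactly where the balancing difficulty arises, and it is the role that the GoF test statistics and the higher criticism procedure play in the GoFAE.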

