THE TILTED VARIATIONAL AUTOENCODER: IMPROVING OUT-OF-DISTRIBUTION DETECTION

Abstract

A problem with using the Gaussian distribution as a prior for a variational autoencoder (VAE) is that the region on which Gaussians have high probability density becomes vanishingly small as the latent dimension increases. This is an issue because VAEs aim to achieve both high likelihood with respect to a prior distribution and, at the same time, separation between latent points for better reconstruction. A small volume in the high-density region of the prior is therefore problematic because it restricts the separation of latent points. To address this, we propose a simple generalization of the Gaussian distribution, the tilted Gaussian, whose maximum probability density occurs on a sphere instead of at a single point. The tilted Gaussian has exponentially more volume in high-density regions than the standard Gaussian as a function of the distribution dimension. We empirically demonstrate that this simple change in the prior distribution improves VAE performance on the task of detecting unsupervised out-of-distribution (OOD) samples. We also introduce a new OOD testing procedure, called the Will-It-Move test, on which the tilted Gaussian achieves remarkable OOD performance.

1. INTRODUCTION

Due to its simplicity, the Gaussian distribution is a common prior for the variational autoencoder (VAE) (Kingma & Welling, 2014; Rezende et al., 2014). One drawback is that its region of high probability density becomes relatively smaller as the latent dimension increases. To see why this is an issue, consider the objective of the VAE: it tries to encode points such that they are close to the prior while reconstructing them into their original form. Given the limited capacity of an encoder/decoder model, points in the latent space must be separated for their reconstructions to differ significantly. For a sufficiently complex data set, the high-density region of the prior distribution must therefore have a large volume to accommodate all of the latent points while allowing sufficient separation. We argue that the volume of the Gaussian distribution's high-density region is not large enough to accommodate real data sets. To this end, we show that many of the points encoded by Gaussian-prior VAEs lie in low-density regions, and that the high-density region remains relatively empty. In support, Nalisnick et al. (2019a) report that the image decoded from the highest-density latent point of a Gaussian VAE trained on MNIST was an all-black image.

To deal with these issues, we propose a simple generalization of the Gaussian distribution called the tilted Gaussian distribution. We create this distribution by "exponentially tilting" the ordinary multivariate Gaussian distribution by its norm. Exponential tilting is a common procedure in fields as diverse as statistical mechanics, large deviations, and importance sampling, but we believe that using it for VAEs as we do here is a novel contribution. The tilted Gaussian has a maximum probability density lying on the surface of a sphere rather than at a single point.
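The two phenomena above can be checked numerically: standard Gaussian samples in high dimension concentrate on a shell far from the density's mode at the origin, while tilting the Gaussian by its norm moves the mode onto a sphere. The sketch below illustrates both with NumPy; the dimension d and tilting parameter tau are illustrative choices, not values from the paper, and the tilted density is written up to its normalizing constant.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 100  # illustrative latent dimension

# Standard Gaussian: the density peaks at the origin, yet samples
# concentrate on a thin shell of radius ~ sqrt(d), leaving the
# high-density region near the origin essentially empty.
z = rng.standard_normal((10_000, d))
norms = np.linalg.norm(z, axis=1)
print(f"mean sample norm: {norms.mean():.2f}  (sqrt(d) = {np.sqrt(d):.2f})")

# Tilted Gaussian, up to normalization:
#   log p(z) = tau * ||z|| - ||z||^2 / 2
# Maximizing over r = ||z|| gives r = tau, so the density is maximal
# on the sphere of radius tau rather than at a single point.
tau = 5.0  # illustrative tilting parameter (the sphere's radius)
r = np.linspace(1e-3, 15.0, 10_000)
radial_log_density = tau * r - r**2 / 2
r_star = r[np.argmax(radial_log_density)]
print(f"radius of maximum density: {r_star:.2f}  (tau = {tau})")
```

For d = 100 the mean sample norm is close to 10, an order of magnitude away from the Gaussian's mode at the origin, while the tilted density's maximum sits at radius tau as claimed.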
A single parameter corresponds to the sphere's radius, allowing for control of the volume under the high-density region of the distribution. We show that the tilted Gaussian has exponentially more volume than the standard Gaussian as a function of the latent dimension, allowing for a far greater proportion of points from a dataset to exist in regions of high density.

