QUANTITATIVE UNDERSTANDING OF VAE AS A NON-LINEARLY SCALED ISOMETRIC EMBEDDING

Abstract

Variational autoencoder (VAE) estimates the posterior parameters (mean and variance) of the latent variables corresponding to each input data point. Although VAE is used for many tasks, the transparency of the model is still an open issue. This paper provides a quantitative understanding of VAE properties by interpreting VAE as a non-linearly scaled isometric embedding. According to rate-distortion theory, optimal transform coding is achieved with a PCA-like orthonormal transform whose transform space is isometric to the input. From this analogy, we show theoretically and experimentally that VAE can be mapped to an implicit isometric embedding with a scale factor derived from the posterior parameters. As a result, the data probabilities in the input space can be estimated from the prior, the loss metric, and the corresponding posterior parameters. In addition, the quantitative importance of each latent variable can be evaluated like an eigenvalue of PCA.

1. INTRODUCTION

Variational autoencoder (VAE) (Kingma & Welling, 2014) is one of the most successful generative models; it estimates posterior parameters of latent variables for each input data point. In VAE, the latent representation is obtained by maximizing an evidence lower bound (ELBO). A number of studies (Higgins et al., 2017; Kim & Mnih, 2018; Lopez et al., 2018; Chen et al., 2018; Locatello et al., 2019; Alemi et al., 2018; Rolínek et al., 2019) have tried to reveal the properties of the latent variables. However, the quantitative behavior of VAE is still not well clarified. For example, there has been no theoretical formulation of the reconstruction loss and KL divergence in the ELBO after optimization. More specifically, although the conditional distribution p_θ(x|z) in the reconstruction loss of the ELBO is predetermined, e.g., as a Gaussian or Bernoulli distribution, it has not been discussed well whether the true conditional distribution after optimization matches this predetermined distribution. Rate-distortion (RD) theory (Berger, 1971), an important part of Shannon information theory that has been successfully applied to image compression, quantitatively formulates the optimum of the rate-distortion trade-off in lossy compression. To enable quantitative data analysis, an RD-theory-based autoencoder, RaDOGAGA (Kato et al., 2020), has been proposed; it employs an isometric embedding (Han & Hong, 2006), in which the distance between any two points of the input space, measured in a given metric, always equals the L2 distance in the embedding space. In this paper, by mapping the VAE latent space to an implicit isometric space, as in RaDOGAGA, on a variable-by-variable basis and analyzing VAE quantitatively as a well-examined lossy compression scheme, we thoroughly clarify the quantitative properties of VAE, both theoretically and experimentally, as follows. 1) An implicit isometric embedding is derived in the space defined by the loss metric, such that the entropy of the data representation becomes minimal.
A scaling factor between the VAE latent space and the implicit isometric space is formulated from the posterior parameters for each input. In the case of β-VAE, the posterior variance of each dimensional component in the implicit isometric embedding space is a constant β/2, which is analogous to the rate-distortion optimum of transform coding in RD theory. As a result, the reconstruction loss and KL divergence in the ELBO can be quantitatively formulated. 2) From these properties, VAE can provide a practical quantitative analysis of the input data. First, the data probabilities in the input space can be estimated from the prior, the loss metric, and the posterior parameters. In addition, the quantitative importance of each latent variable, analogous to an eigenvalue of PCA, can be evaluated from the posterior variance of VAE. This work will lead information-theoretic generative models in the right direction.
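The β-weighted ELBO objective discussed above can be made concrete with a minimal NumPy sketch. This is an illustration under simplifying assumptions, not the paper's implementation: p_θ(x|z) is taken as a unit-variance Gaussian (so the reconstruction term reduces to a squared error), and the posterior is a diagonal Gaussian matched against a standard-normal prior, giving the closed-form KL divergence. The function name and the toy check are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def beta_vae_loss(x, x_hat, mu, log_var, beta=1.0):
    """Per-sample beta-VAE objective (negative ELBO, up to additive constants).

    Assumes p_theta(x|z) is Gaussian with fixed unit variance, so the
    reconstruction term is a squared error, and a diagonal Gaussian
    posterior q(z|x) = N(mu, diag(exp(log_var))) against a standard
    normal prior, so the KL term has the closed form
    0.5 * sum(mu^2 + sigma^2 - log sigma^2 - 1).
    """
    recon = np.sum((x - x_hat) ** 2, axis=-1)
    kl = 0.5 * np.sum(mu ** 2 + np.exp(log_var) - log_var - 1.0, axis=-1)
    return recon + beta * kl

# Toy check: with a perfect reconstruction and q(z|x) equal to the prior
# (mu = 0, sigma = 1), both terms vanish and the loss is exactly zero.
x = rng.normal(size=(4, 8))
loss = beta_vae_loss(x, x, np.zeros((4, 2)), np.zeros((4, 2)), beta=4.0)
```

Because both the squared error and the KL divergence are non-negative, the loss is bounded below by zero, which is why the toy check above is a useful sanity test for any implementation of this objective.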
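The two practical analyses claimed in contribution 2) can also be sketched numerically. In the sketch below, the encoder outputs (posterior means and standard deviations) are synthetic placeholders, not a trained model; the importance score is the average per-dimension KL divergence, a common PCA-eigenvalue-like proxy, and the log-density line follows the change-of-variables intuition (prior density at the posterior mean plus the sum of log posterior standard deviations), with the metric-dependent constant and the β/2 scale omitted.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical encoder outputs for 1000 inputs and 3 latent dimensions.
# Dimension 0 carries most of the signal; dimension 2 is uninformative,
# so its posterior standard deviation stays near the prior's (sigma ~ 1).
mu = rng.normal(scale=[2.0, 0.5, 0.01], size=(1000, 3))
sigma = np.array([0.1, 0.4, 1.0]) * np.ones((1000, 3))

# Importance of each latent variable, analogous to a PCA eigenvalue:
# the dataset average of the per-dimension KL(q(z_j|x) || N(0, 1)).
kl_per_dim = 0.5 * (mu ** 2 + sigma ** 2 - 2.0 * np.log(sigma) - 1.0)
importance = kl_per_dim.mean(axis=0)

# Rough per-input log-density estimate from the prior and the posterior
# parameters: log p(x) ~ log prior(mu) + sum_j log sigma_j + const.
log_prior = -0.5 * np.sum(mu ** 2, axis=1) - 0.5 * mu.shape[1] * np.log(2 * np.pi)
log_density = log_prior + np.sum(np.log(sigma), axis=1)
```

With these placeholder statistics, the informative dimension receives the largest importance score and the uninformative one a score near zero, mirroring how PCA eigenvalues rank components by explained variance.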

