ENCODED PRIOR SLICED WASSERSTEIN AUTOENCODER FOR LEARNING LATENT MANIFOLD REPRESENTATIONS

Abstract

While variational autoencoders have been successful in a variety of tasks, conventional Gaussian or Gaussian mixture priors are limited in their ability to encode the underlying structure of data in the latent representation. In this work, we introduce an Encoded Prior Sliced Wasserstein AutoEncoder (EPSWAE), in which an additional prior-encoder network learns an embedding of the data manifold that preserves topological and geometric properties of the data, thus improving the structure of the latent space. The autoencoder and prior-encoder networks are iteratively trained using the Sliced Wasserstein (SW) distance, which efficiently measures the distance between two arbitrary sampleable distributions without being constrained to a specific form, as in the KL divergence, and without requiring expensive adversarial training. To improve the representation, we use (1) a structural consistency term in the loss that encourages isometry between feature space and latent space and (2) a nonlinear variant of the SW distance that averages over random nonlinear shearing. The effectiveness of the learned manifold encoding is best explored by traversing the latent space through interpolations along geodesics, which generate samples that lie on the manifold and hence are advantageous compared to standard Euclidean interpolation. To this end, we introduce a graph-based algorithm for interpolating along network-geodesics in latent space by maximizing the density of samples along the path while minimizing the total energy. We use 3D spiral data to show that the prior does indeed encode the geometry underlying the data and to demonstrate the advantages of the network-geodesic algorithm for interpolation. Additionally, we apply our framework to the MNIST and CelebA datasets, and show that outlier generations, latent representations, and geodesic interpolations are comparable to the state of the art.
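The SW distance referenced above reduces a high-dimensional optimal-transport problem to many one-dimensional ones: project both sample sets onto random directions, where the 1D Wasserstein distance has a closed form via sorting, and average over projections. The sketch below is a minimal Monte Carlo estimator of the sliced Wasserstein-2 distance between two point clouds, for illustration only; the function name, the number of projections, and the equal-sample-size assumption are ours, not taken from the paper.

```python
import numpy as np

def sliced_wasserstein(x, y, n_projections=50, rng=None):
    """Monte Carlo estimate of the sliced Wasserstein-2 distance between
    two equally sized point clouds x, y of shape (n, d).

    Illustrative sketch: projects both clouds onto random unit directions,
    where the 1D W2 distance is computed in closed form by sorting.
    """
    rng = np.random.default_rng(rng)
    d = x.shape[1]
    total = 0.0
    for _ in range(n_projections):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)   # random unit direction on the sphere
        px = np.sort(x @ theta)          # sorted 1D projections of x
        py = np.sort(y @ theta)          # sorted 1D projections of y
        total += np.mean((px - py) ** 2) # 1D W2^2 via monotone matching
    return np.sqrt(total / n_projections)
```

Because each 1D distance needs only a sort, the estimator is differentiable almost everywhere and cheap compared to adversarial density-ratio estimation, which is the efficiency the abstract alludes to.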

1. INTRODUCTION

Generative models have the potential to capture rich representations of data and use them to generate realistic outputs. In particular, Variational AutoEncoders (VAEs) (Kingma & Welling, 2014) can capture important properties of high-dimensional data in their latent embeddings, and sample from a prior distribution to generate realistic images. While VAEs have been very successful in a variety of tasks, the use of a simplistic standard normal prior is known to cause problems such as under-fitting and over-regularization, and fails to use the network's entire modeling capacity (Burda et al., 2016). Gaussian or Gaussian mixture model (GMM) priors are also limited in their ability to represent geometric and topological properties of the underlying data manifold. High-dimensional data can typically be modeled as lying on or near an embedded low-dimensional, nonlinear manifold (Fefferman et al., 2016). Learning improved latent representations of this nonlinear manifold is an important problem, for which a more flexible prior may be desirable. Conventional variational inference uses the Kullback-Leibler (KL) divergence as a measure of distance between the posterior and the prior, restricting the prior distribution to cases that have tractable approximations of the KL divergence. Many works, such as Guo et al. (2020), Tomczak & Welling (2018), and Rezende & Mohamed (2015), have investigated the use of more complicated priors (notably GMMs), which lead to improved latent representation and generation compared to a single Gaussian prior. Alternate approaches such as adversarial training learn arbitrary priors by using a

