GM-VAE: REPRESENTATION LEARNING WITH VAE ON GAUSSIAN MANIFOLD

Abstract

We propose the Gaussian manifold variational auto-encoder (GM-VAE), whose latent space consists of a set of diagonal Gaussian distributions. It is known that the set of diagonal Gaussian distributions equipped with the Fisher information metric forms a product hyperbolic space, which we call the Gaussian manifold. To learn a VAE endowed with the Gaussian manifold, we first propose a pseudo Gaussian manifold normal distribution, based on the Kullback-Leibler divergence as a local approximation of the squared Fisher-Rao distance, to define a density over the latent space. With the newly proposed distribution, we introduce geometric transformations at the last layer of the encoder and the first layer of the decoder to ease the transition between the Euclidean space and the Gaussian manifold. Through empirical experiments, we show that GM-VAE achieves competitive generalization performance against other variants of hyperbolic and Euclidean VAEs. Our model also achieves strong numerical stability, addressing a common limitation reported for previous hyperbolic VAEs.

1. INTRODUCTION

The geometry of the latent space in generative models, such as variational auto-encoders (VAEs) (Kingma & Welling, 2013) and generative adversarial networks (GANs) (Goodfellow et al., 2020), reflects the structure of the learned data representation. Mathieu et al. (2019), Nagano et al. (2019), and Cho et al. (2022) show that employing a hyperbolic space as the latent space helps preserve the hierarchical structure of the data. The choice of geometry is not limited to hyperbolic space: the latent space can be another type of Riemannian manifold, such as a spherical manifold (Xu & Durrett, 2018; Davidson et al., 2018) or a product of Riemannian manifolds with mixed curvatures (Skopek et al., 2019).

Meanwhile, it is known that the univariate Gaussian distributions equipped with the Fisher information metric (FIM) form a Riemannian manifold whose metric tensor is, up to scaling, that of the Poincaré half-plane, one of the four isometric hyperbolic models (Costa et al., 2015); this statistical manifold can therefore be viewed as a hyperbolic space. Furthermore, the diagonal Gaussian distributions form a product of such manifolds, yielding an extended statistical manifold. Based on this connection between hyperbolic spaces and statistical manifolds, in this work we add an alternative perspective on hyperbolic VAEs from the viewpoint of information geometry.

Previously proposed hyperbolic VAEs rely on distributions defined over the hyperbolic space: the Riemannian normal and the wrapped normal are commonly used as prior and variational distributions. Unlike the Gaussian distribution in Euclidean space, these distributions suffer from numerical instability (Mathieu et al., 2019; Skopek et al., 2019). In addition, the Riemannian normal requires rejection sampling, which can discard many samples.
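As a sanity check on this connection (a sketch of ours, not code from the paper), the Fisher information of N(μ, σ²) in the (μ, σ) parameterization is diag(1/σ², 2/σ²), which becomes the half-plane metric 2(dx² + dσ²)/σ² under the change of variables x = μ/√2. The diagonal entries can be recovered numerically from second-order finite differences of the closed-form Gaussian KL divergence, since KL(θ ‖ θ + h eᵢ) ≈ ½ gᵢᵢ h²:

```python
import math

def kl_gauss(mu1, s1, mu2, s2):
    """Closed-form KL(N(mu1, s1^2) || N(mu2, s2^2))."""
    return math.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5

mu, sigma, h = 0.3, 2.0, 1e-4

# Diagonal Fisher information entries from 2*KL/h^2 (second-order expansion of KL).
g_mu = 2 * kl_gauss(mu, sigma, mu + h, sigma) / h**2
g_sigma = 2 * kl_gauss(mu, sigma, mu, sigma + h) / h**2

print(g_mu, 1 / sigma**2)      # first entry matches 1/sigma^2
print(g_sigma, 2 / sigma**2)   # second entry matches 2/sigma^2
```

The factor-2 asymmetry between the μ and σ directions is exactly what the rescaling μ ↦ μ/√2 absorbs when identifying the Gaussian manifold with the half-plane.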
From the information-geometric perspective of the hyperbolic space, we introduce a new distribution, named the pseudo Gaussian manifold normal distribution (PGM normal). The Gaussian manifold here refers to the statistical manifold of univariate Gaussian distributions. The newly proposed distribution uses the KL divergence as a statistical distance between two distributions on the Gaussian manifold. Since the KL divergence locally approximates the squared Riemannian distance derived from the FIM, the proposed distribution follows the geometric properties of the Gaussian distributions. We show that the PGM normal is easy to sample from and that the KL divergence between two PGM normals can be computed analytically.

With the PGM normal as the prior and variational distributions, we define the Gaussian manifold VAE (GM-VAE), whose latent space is the Gaussian manifold. Nevertheless, the data points are still assumed to lie in Euclidean space. To correct the mismatch between the data space and the latent space, we introduce a transformation between Euclidean and hyperbolic space at the last layer of the encoder and the first layer of the decoder, respectively.

Empirical experiments on multiple datasets show that GM-VAE achieves competitive generalization performance against existing hyperbolic VAEs. During the experiments, we observe that the PGM normal is robust in terms of sampling and computation of the KL divergence compared to the commonly used hyperbolic distributions; we briefly explain why the others are numerically unstable. Analysis of the latent space shows that the geometric structure and probabilistic semantics of the dataset are captured in the representations learned with GM-VAE.

We summarize our contributions as follows:
• We propose a variant of VAE whose latent space is defined on the statistical manifold formed by diagonal Gaussian distributions.
• We propose a new distribution, the pseudo Gaussian manifold normal distribution, which is easy to sample from and has a closed-form KL divergence, to train the VAE on this manifold.
• We propose new encoder and decoder structures that support a proper transition between the Euclidean (data) space and the statistical manifold.
• We empirically verify that the proposed model performs on par with existing hyperbolic VAEs while achieving stable training without numerical issues.
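The local relationship between the KL divergence and the squared Fisher-Rao distance can be illustrated concretely (a sketch under our reading of the geometry, not the paper's code). For univariate Gaussians, the Fisher-Rao distance has a closed form obtained from the Poincaré half-plane distance after the reparameterization (μ, σ) ↦ (μ/√2, σ), scaled by √2; for nearby distributions, KL(p‖q) ≈ ½ d_FR(p, q)²:

```python
import math

def kl_gauss(mu1, s1, mu2, s2):
    """Closed-form KL(N(mu1, s1^2) || N(mu2, s2^2))."""
    return math.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5

def fisher_rao(mu1, s1, mu2, s2):
    """Fisher-Rao distance between univariate Gaussians, via the Poincare
    half-plane distance applied to (mu/sqrt(2), sigma), scaled by sqrt(2)."""
    x1, x2 = mu1 / math.sqrt(2), mu2 / math.sqrt(2)
    arg = 1 + ((x1 - x2)**2 + (s1 - s2)**2) / (2 * s1 * s2)
    return math.sqrt(2) * math.acosh(arg)

mu1, s1 = 0.0, 1.0
mu2, s2 = 0.02, 1.01  # a nearby distribution

kl = kl_gauss(mu1, s1, mu2, s2)
half_sq = 0.5 * fisher_rao(mu1, s1, mu2, s2)**2
print(kl, half_sq)  # nearly equal for nearby distributions
```

The approximation degrades as the two distributions move apart, which is why the paper calls the resulting density a *pseudo* Gaussian manifold normal.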

2. PRELIMINARIES

In this section, we first review the fundamental concepts of Riemannian manifolds. We then explain the commonly used distributions over Riemannian manifolds and revisit concepts of Riemannian geometry for statistical objects.

2.1. REVIEW OF RIEMANNIAN MANIFOLD

An n-dimensional Riemannian manifold consists of a manifold M and a metric tensor g : M → R^{n×n}, a smooth map from each point x ∈ M to a symmetric positive-definite matrix. The metric tensor g(x) defines the inner product of two tangent vectors at each point of the manifold, ⟨·, ·⟩_x : T_x M × T_x M → R, where T_x M is the tangent space at x. A Riemannian manifold can be characterized by its curvature, which in general varies from point to point, although some manifolds have constant curvature. For example, the unit sphere S has constant positive curvature +1, and the Poincaré half-plane U has constant negative curvature -1.

The hyperbolic models. Among the models of hyperbolic space, the Klein model, the Poincaré disk model, the Lorentz (hyperboloid) model, and the Poincaré half-plane model are known to be isometric and to have the same curvature -1 (Nickel & Kiela, 2018; Gulcehre et al., 2018; Tifrea et al., 2018).

The metric tensor induces the basic operations of the Riemannian manifold, such as geodesics, the exponential map, the log map, and parallel transport. Given two points x, y ∈ M, a geodesic γ : [0, 1] → M is a constant-speed curve on M that is the shortest path between γ(0) = x and γ(1) = y; it generalizes the straight line of Euclidean space. The exponential map exp_x : T_x M → M maps a tangent vector v ∈ T_x M to γ(1), where γ is the geodesic starting from x with γ′(0) = v. The log map log_x : M → T_x M is the inverse of the exponential map, i.e., log_x(exp_x(v)) = v. The parallel transport PT_{x→y} : T_x M → T_y M moves a tangent vector v along the geodesic between x and y. The distance function d_M(x, y) is induced from the metric tensor as

d_M(x, y) = ∫₀¹ √(⟨γ̇(t), γ̇(t)⟩_{γ(t)}) dt,   (1)

where γ is the geodesic from x to y.
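Equation (1) can be checked numerically on the Poincaré half-plane, whose metric is ds² = (dx² + dy²)/y². The following sketch (ours, for illustration) discretizes the length integral along a vertical geodesic γ(t) = (0, y₁^{1−t} y₂^t) and compares it with the closed-form half-plane distance:

```python
import math

def halfplane_dist(p, q):
    """Closed-form geodesic distance on the Poincare half-plane (y > 0)."""
    (x1, y1), (x2, y2) = p, q
    return math.acosh(1 + ((x2 - x1)**2 + (y2 - y1)**2) / (2 * y1 * y2))

# Vertical geodesic between (0, y1) and (0, y2): gamma(t) = (0, y1^(1-t) * y2^t).
y1, y2 = 1.0, math.e**2
n = 10000
length = 0.0
for i in range(n):
    t0, t1 = i / n, (i + 1) / n
    g0 = y1**(1 - t0) * y2**t0
    g1 = y1**(1 - t1) * y2**t1
    # Eq. (1): accumulate sqrt(<gdot, gdot>_gamma) dt; the metric reduces to dy^2 / y^2 here.
    length += abs(g1 - g0) / ((g0 + g1) / 2)

print(length, halfplane_dist((0, y1), (0, y2)))  # both close to 2
```

For this vertical geodesic the integral reduces to |log(y₂/y₁)|, so both quantities approach log(e²) = 2 as n grows.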

