GEOMETRICALLY REGULARIZED AUTOENCODERS FOR NON-EUCLIDEAN DATA

Abstract

Regularization is almost de rigueur when designing autoencoders that are sparse and robust to noise. Given the recent surge of interest in machine learning problems involving non-Euclidean data, in this paper we address the regularization of autoencoders on curved spaces. We show that ignoring the underlying geometry of the data and applying standard vector space regularization techniques can severely degrade autoencoder performance, or worse, cause training to fail to converge. Assuming that both the data space and the latent space can be modeled as Riemannian manifolds, we show how to construct regularization terms in a coordinate-invariant way, and develop geometric generalizations of the denoising autoencoder and the reconstruction contractive autoencoder that preserve the essential properties enabling estimation of the derivative of the log-probability density. Drawing upon various non-Euclidean data sets, we show that our geometric autoencoder regularization techniques can have important performance advantages over vector space methods while avoiding other breakdowns that can result from failing to account for the underlying geometry.

1. INTRODUCTION

Regularization is almost de rigueur when designing autoencoders that are sparse and robust to noise. With appropriate regularization, autoencoders can learn representations useful for downstream applications (Bengio et al., 2013), generate plausible data samples (Kingma & Welling, 2013; Rezende et al., 2014), or even provide information about the data-generating probability density (Vincent et al., 2010; Rifai et al., 2011b). Existing work on autoencoder regularization has mostly been confined to vector spaces, i.e., the data are assumed to be drawn from a vector space. On the other hand, a significant and growing number of problems in machine learning involve non-Euclidean data (in some past cases the non-Euclidean nature of the data was either not recognized or simply ignored). Bronstein et al. (2017) review several deep neural network architectures and modeling principles that explicitly deal with data defined on non-Euclidean domains, e.g., data collected from sensor networks, social networks in the computational social sciences, or two-dimensional meshes embedded in three-dimensional space. Other works have also addressed manifold-valued data, including human mass and shape data (Kendall, 1984; Freifeld & Black, 2012), directional data (Mardia, 2014), point cloud data (Lee et al., 2022), and MRI imaging data (Fletcher & Joshi, 2007; Banerjee et al., 2015), with several deep neural networks proposed to handle such data in a coordinate-invariant way (Huang & Van Gool, 2017; Chakraborty et al., 2020). The fundamental idea behind these works is that the geometric structure of the curved space from which the non-Euclidean data are drawn must be accounted for properly, so that the output of any deep learning network applied to such data does not depend on the particular choice of coordinates used to parametrize the data.
Ignoring the underlying geometry of the data and simply applying standard vector space techniques can severely degrade performance, or worse, cause training to fail. Autoencoder training and its regularization are no exception. For example, consider autoencoder training on a set of data points on a sphere as shown in Figure 1. When spherical coordinate representations are used as inputs to train an autoencoder with contractive regularization, the trained reconstruction function can depend heavily on the choice of coordinates. Moreover, it often fails to learn the correct contractive directions toward data-dense regions, especially near the singularity (the spherical coordinate origins). In contrast, an autoencoder that properly reflects the spherical constraints recovers those directions successfully and produces results that are nearly invariant to the choice of coordinates. In this paper we address the regularization of autoencoders on curved spaces. Any loss function used to train or regularize the autoencoder should be formulated in a coordinate-invariant way, i.e., invariant to the choice of local coordinates used to parametrize the data, and should instead depend only on intrinsic properties of the curved space such as the curvature or the choice of metric. Assuming that both the data space and the latent space can be modeled as Riemannian manifolds, we show how to construct regularization terms and objective functions in a coordinate-invariant way. We also develop geometric generalizations of the denoising autoencoder (DAE) and the reconstruction contractive autoencoder (RCAE) such that the essential properties that enable estimation of the score, i.e., the derivative of the log of the data-generating density, are preserved.
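The coordinate dependence of the contractive penalty can be seen in a small numerical sketch (ours, not from the paper; the reconstruction map r and the two charts below are hypothetical choices): the Frobenius norm of the Jacobian of a fixed smooth map on the sphere, which is what a contractive regularizer penalizes, takes different values when the same map is expressed in two different spherical coordinate charts.

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

# A fixed smooth "reconstruction" map on the sphere (hypothetical example):
# pull each point toward a fixed vector v and re-project onto S^2.
v = np.array([0.2, -0.1, 0.05])
def r(x):
    return normalize(x + v)

# Chart A: spherical coordinates with the pole on the z-axis.
def chart_A(x):
    return np.array([np.arccos(x[2]), np.arctan2(x[1], x[0])])
def chart_A_inv(c):
    th, ph = c
    return np.array([np.sin(th) * np.cos(ph), np.sin(th) * np.sin(ph), np.cos(th)])

# Chart B: spherical coordinates with the pole on the x-axis.
def chart_B(x):
    return np.array([np.arccos(x[0]), np.arctan2(x[2], x[1])])
def chart_B_inv(c):
    th, ph = c
    return np.array([np.cos(th), np.sin(th) * np.cos(ph), np.sin(th) * np.sin(ph)])

def frob_norm_jacobian(chart, chart_inv, p, eps=1e-6):
    # Finite-difference Jacobian of the coordinate representation chart ∘ r ∘ chart⁻¹,
    # i.e., the quantity a contractive regularizer would penalize in these coordinates.
    c0 = chart(p)
    f = lambda c: chart(r(chart_inv(c)))
    J = np.column_stack([(f(c0 + eps * e) - f(c0 - eps * e)) / (2 * eps)
                         for e in np.eye(2)])
    return np.linalg.norm(J)  # Frobenius norm

p = normalize(np.array([0.3, 0.4, 0.8]))
nA = frob_norm_jacobian(chart_A, chart_A_inv, p)
nB = frob_norm_jacobian(chart_B, chart_B_inv, p)
print(nA, nB)  # the "same" penalty evaluated in two charts disagrees
```

Because the Frobenius norm is not invariant under the (generally non-orthogonal) change-of-coordinates similarity transform, the penalty, and hence the trained autoencoder, inherits a dependence on the chart.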
We provide applications that exploit this property, such as sampling, clustering, and filtering for non-Euclidean data, and also show that the proposed autoencoders can learn useful representations for non-Euclidean data, especially when the data are noisy. Drawing upon various non-Euclidean data sets, we show that our geometric autoencoder regularization techniques can have important performance advantages over vector space methods, in some cases by significant margins, while avoiding other breakdowns that can result from failing to account for the underlying geometry. The paper is organized as follows. We describe regularized autoencoders for Euclidean data in Section 2 and propose their coordinate-invariant generalizations to non-Euclidean data in Section 3. We then provide autoencoder training case studies using non-Euclidean data sets in Section 4.
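The score-estimation property just mentioned can be checked in a minimal one-dimensional sketch (ours, not part of the paper): fit a linear denoising reconstruction r(x̃) = a·x̃ + b by least squares on noise-corrupted Gaussian samples; by the result of Alain & Bengio (2014), (r(x) − x)/σ² should then approximate the true score d/dx log ρ(x) = −x.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1-D Gaussian data, x ~ N(0, 1); the true score is d/dx log ρ(x) = -x.
N, sigma = 200_000, 0.1
x = rng.standard_normal(N)
x_noisy = x + sigma * rng.standard_normal(N)  # corrupted DAE input

# With a linear reconstruction r(x̃) = a·x̃ + b, the DAE objective
# reduces to ordinary least squares of x on x_noisy.
A = np.column_stack([x_noisy, np.ones(N)])
(a, b), *_ = np.linalg.lstsq(A, x, rcond=None)

# Alain & Bengio (2014): (r(x) - x)/σ² → ∇ log ρ(x) as σ → 0.
score_est = lambda q: (a * q + b - q) / sigma**2
print(score_est(1.0))  # close to the true score -1.0 at x = 1
```

The fitted slope comes out near Var(x)/(Var(x) + σ²) = 1/1.01, so the estimated score at x = 1 is approximately −0.99, matching the true score up to a bias of order σ².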

2. REGULARIZED AUTOENCODERS FOR EUCLIDEAN DATA

Mathematically, an autoencoder can be represented as the composition of two mappings f : R^D → R^d (the encoder) and g : R^d → R^D (the decoder), i.e., r = g ∘ f : R^D → R^D, with R^d the space of hidden variables. Assume there exists a data-generating probability density ρ : R^D → R from which data points on R^D are drawn. Autoencoder training in a vector space can then be formulated as minimizing the reconstruction error

∫_{R^D} ∥x − g(f(x; θ_1); θ_2)∥^2 ρ(x) dx

over θ = (θ_1, θ_2), where x ∈ R^D denotes an input variable, θ_1 and θ_2 are respectively the parameter sets of the maps f and g, and ∥·∥^2 is the squared Euclidean norm. In the typical case where d < D, the autoencoder performs a type of dimensionality reduction. Dropping the assumption that d < D, autoencoders have been modeled as deep artificial neural networks accompanied by regularization terms that encourage useful representations of the data (Bengio et al., 2007; Vincent et al., 2008; 2010; Ranzato et al., 2007; 2008; Kingma & Welling, 2013; Rezende et al., 2014; Rifai et al., 2011b;a). For a more comprehensive review of autoencoders, we refer the reader to Goodfellow et al. (2016). Meanwhile, the effects of regularization have been investigated in some detail in Alain & Bengio (2014) for the denoising autoencoder (DAE) (Vincent et al., 2010) and the reconstruction contractive autoencoder (RCAE). They point out that these regularization methods reduce the autoencoder's sensitivity to the input, while the reconstruction error increases the autoencoder's sensitivity to variations along the region of highest density in the data space. Reconstruction and regularization together successfully capture variations in such regions while ignoring variations that



Figure 1: Autoencoder training on spherical data sampled from the von Mises-Fisher (vMF) distribution. We train the reconstruction contractive autoencoder (RCAE) and the geometric RCAE (GRCAE). For (b)-(c), we plot the reconstruction directions of the autoencoders trained using representations obtained from different coordinate choices. The results from each coordinate choice are color-coded along with corresponding spherical coordinate origins. (See Appendix E for more details.)

