UNSUPERVISED LEARNING OF GLOBAL FACTORS IN DEEP GENERATIVE MODELS

Abstract

We present a novel deep generative model based on non i.i.d. variational autoencoders that captures global dependencies among observations in a fully unsupervised fashion. In contrast to recent semi-supervised alternatives for global modeling in deep generative models, our approach combines a mixture model in the local, data-dependent space with a global Gaussian latent variable, which leads us to three particular insights. First, the induced latent global space captures interpretable disentangled representations with no user-defined regularization in the evidence lower bound (as in beta-VAE and its generalizations). Second, we show that the model performs domain alignment to find correlations and interpolate between different databases. Finally, we study the ability of the global space to discriminate between groups of observations with non-trivial underlying structures, such as face images with shared attributes or sequences of digit images.

1. INTRODUCTION

Since their first proposal by Kingma & Welling (2013), Variational Autoencoders (VAEs) have evolved into a vast number of variants. To name some representative examples, there are VAEs with latent mixture-model priors (Dilokthanakul et al. (2016)), adapted to model time series (Chung et al. (2015)), trained via deep hierarchical variational families (Ranganath et al. (2016), Tomczak & Welling (2018)), or that naturally handle heterogeneous data types and missing data (Nazabal et al. (2020)). The large majority of VAE-like models are designed under the assumption that data is i.i.d., which remains a valid strategy to simplify learning and inference in generative models with latent variables. A different modelling approach drops the i.i.d. assumption with the goal of capturing a higher level of dependence between samples. Inferring such higher-level dependencies can directly improve current approaches to find interpretable disentangled generative models (Bouchacourt et al. (2018)), to perform domain alignment (Heinze-Deml & Meinshausen (2017)), or to ensure fairness and unbiased data (Barocas et al. (2017)). The main contribution of this paper is to show that a deep probabilistic non i.i.d. VAE model with both local and global latent variables can capture meaningful and interpretable correlations among data points in a completely unsupervised fashion; namely, weak supervision to group the data samples is not required. In the following we refer to our model as Unsupervised Global VAE (UG-VAE). We combine a clustering-inducing mixture-model prior in the local space, which helps to separate the fundamental data features that an i.i.d. VAE would separate, with a global latent variable that modulates the properties of such latent clusters depending on the observed samples, capturing fundamental and interpretable data features. We demonstrate these results using CelebA, MNIST, and the 3D FACES dataset of Paysan et al. (2009).
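To make this generative structure concrete, the following toy sketch mimics the combination of a batch-wide global latent variable with a per-sample mixture in the local space. All dimensions and the random affine maps standing in for the learned networks are illustrative placeholders, not the actual UG-VAE architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: K mixture components, local code z,
# global code beta, observation x.
K, z_dim, beta_dim, x_dim = 3, 2, 2, 4

# Toy "networks": random affine maps standing in for the learned
# functions that condition the local mixture on the global variable.
W_pi = rng.normal(size=(K, beta_dim))          # mixture weights from beta
W_mu = rng.normal(size=(K, z_dim, beta_dim))   # component means from beta
W_x = rng.normal(size=(x_dim, z_dim + beta_dim))  # toy decoder

def softmax(a):
    a = a - a.max()
    e = np.exp(a)
    return e / e.sum()

def sample_batch(n):
    # One global variable beta is shared by the whole batch ...
    beta = rng.normal(size=beta_dim)
    xs = []
    for _ in range(n):
        # ... while each sample draws its own mixture component d
        # and local code z, both modulated by beta.
        pi = softmax(W_pi @ beta)
        d = rng.choice(K, p=pi)
        z = W_mu[d] @ beta + rng.normal(size=z_dim)
        x = W_x @ np.concatenate([z, beta])
        xs.append(x)
    return beta, np.stack(xs)

beta, X = sample_batch(5)
print(X.shape)  # (5, 4)
```

Because every sample in the batch is decoded under the same draw of beta, correlations across the batch are captured globally, while the component index d and the local code z account for per-sample variation.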
Furthermore, we show that the global latent space can explain common features in samples coming from two different databases without requiring a domain label for each sample, establishing a probabilistic unsupervised framework for domain alignment. To the best of our knowledge, UG-VAE is the first VAE model in the literature that performs unsupervised domain alignment using global latent variables. Finally, we demonstrate that, even when the model parameters have been trained in an unsupervised way, the global latent space in UG-VAE can discriminate groups of samples with non-trivial structures, separating groups of people with black and blond hair in CelebA or series of numbers in MNIST. In other words, if weak supervision is applied at test time, the posterior distribution of the global latent variable provides an informative representation of the user-defined groups of correlated data.

2. RELATED WORK

Non i.i.d. deep generative models are receiving increasing attention, but the literature is still scarce. First, we find VAE models that implement non-parametric priors: in Gyawali et al. (2019) the authors make use of a global latent variable that induces a non-parametric Beta process prior, and more efficient variational mechanisms for this kind of IBP prior are introduced in Xu et al. (2019). Second, both Tang et al. (2019) and Korshunova et al. (2018) propose non i.i.d. exchangeable models by including correlation information between datapoints via an undirected graph. Finally, some other works rely on simpler generative models (compared to these previous approaches), including global variables with fixed-complexity priors, typically a multivariate Gaussian distribution, that aim at modelling the correlation between user-specified groups of correlated samples (e.g. images of the same class in MNIST, or faces of the same person). In Bouchacourt et al. (2018) or Hosoya (2019), the authors apply weak supervision by grouping image samples by identity, and include in the probabilistic model a global latent variable for each of these groups, along with a local latent variable that models the distribution of each individual sample. Below we detail the two lines of research most relevant to our work.

VAEs with mixture priors. Several previous works have demonstrated that incorporating a mixture in the latent space leads to learning significantly better models. In Johnson et al. (2016) the authors introduce a latent GMM prior with nonlinear observations, where the means are learned and remain invariant with the data. The GMVAE proposal by Dilokthanakul et al. (2016) aims at incorporating unsupervised clustering in deep generative models to increase interpretability. In the VAMP VAE model (Tomczak & Welling (2018)), the authors define the prior as a mixture with components given by approximated variational posteriors, conditioned on learnable pseudo-inputs. This approach leads to improved performance, avoiding the typical local-optima difficulties that might be related to irrelevant latent dimensions.

VAEs for grouped data. The ML-VAE by Bouchacourt et al. (2018) partitions the data into G groups. ML-VAE includes a local Gaussian variable S_i that encodes style-related information for each sample, and a global Gaussian variable C_G that models content shared within a group of samples. For instance, the authors feed their algorithm with batches of face images from the same person, modelling the content shared within the group that characterizes that person. This approach leads to learning disentangled representations at the group and observation levels, in a content-style fashion. Nevertheless, the groups are user-specified, hence resulting in a semi-supervised modelling approach. In Vowels et al. (2020) the authors use weak supervision to pair samples. They implement two outer VAEs with shared weights for the reconstruction, and a Nested VAE that reconstructs the latent representation of one sample in each pair from that of the other, thus capturing the correlation between paired samples.
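Concretely, the mixture priors discussed above replace the standard Gaussian prior p(z) = N(0, I) with a weighted sum of Gaussians whose log-density is evaluated via log-sum-exp. A minimal sketch (the component weights, means, and shared isotropic variance are illustrative placeholders, not learned values):

```python
import numpy as np

def log_gauss(z, mu, var):
    # Log-density of an isotropic Gaussian N(mu, var*I) evaluated at z.
    d = z.shape[-1]
    return -0.5 * (d * np.log(2 * np.pi * var) + ((z - mu) ** 2).sum(-1) / var)

def log_mixture_prior(z, weights, means, var=1.0):
    # log p(z) = logsumexp_k [ log w_k + log N(z; mu_k, var*I) ],
    # computed with the max-subtraction trick for numerical stability.
    comp = np.stack([np.log(w) + log_gauss(z, m, var)
                     for w, m in zip(weights, means)])
    m = comp.max(0)
    return m + np.log(np.exp(comp - m).sum(0))

# Illustrative two-component prior in a 2-D latent space.
weights = np.array([0.5, 0.5])
means = [np.array([-2.0, 0.0]), np.array([2.0, 0.0])]
print(log_mixture_prior(np.zeros(2), weights, means))
```

In a GMVAE-style model this term enters the ELBO in place of the usual log N(z; 0, I); the gradient through the log-sum-exp softly assigns each latent code to its nearest components, which is what induces the unsupervised clustering behaviour.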

Figure 1: Comparison of four deep generative models. Dashed lines represent the graphical model of the associated variational family. The vanilla VAE (a), the GMVAE (b), and two semi-supervised variants for grouped data: ML-VAE (c) and NestedVAE (d).

