UNSUPERVISED LEARNING OF GLOBAL FACTORS IN DEEP GENERATIVE MODELS

Abstract

We present a novel deep generative model based on non-i.i.d. variational autoencoders that captures global dependencies among observations in a fully unsupervised fashion. In contrast to recent semi-supervised alternatives for global modeling in deep generative models, our approach combines a mixture model in the local, or data-dependent, space with a global Gaussian latent variable, which leads us to three particular insights. First, the induced global latent space captures interpretable disentangled representations with no user-defined regularization in the evidence lower bound (as in beta-VAE and its generalizations). Second, we show that the model performs domain alignment, finding correlations and interpolating between different databases. Finally, we study the ability of the global space to discriminate between groups of observations with non-trivial underlying structures, such as face images with shared attributes or defined sequences of digit images.

1. INTRODUCTION

Since its first proposal by Kingma & Welling (2013), Variational Autoencoders (VAEs) have evolved into a vast number of variants. To name some representative examples, these include VAEs with latent mixture-model priors (Dilokthanakul et al. (2016)), VAEs adapted to model time series (Chung et al. (2015)), VAEs trained via deep hierarchical variational families (Ranganath et al. (2016), Tomczak & Welling (2018)), and VAEs that naturally handle heterogeneous data types and missing data (Nazabal et al. (2020)). The large majority of VAE-like models are designed under the assumption that data are i.i.d., which remains a valid strategy for simplifying the learning and inference processes in latent-variable generative models. A different modeling approach drops the i.i.d. assumption with the goal of capturing a higher level of dependence between samples. Inferring such higher-level dependencies can directly improve current approaches to finding interpretable disentangled generative models (Bouchacourt et al. (2018)), to performing domain alignment (Heinze-Deml & Meinshausen (2017)), and to ensuring fairness and unbiased data (Barocas et al. (2017)).

The main contribution of this paper is to show that a deep probabilistic non-i.i.d. VAE model with both local and global latent variables can capture meaningful and interpretable correlations among data points in a completely unsupervised fashion; in particular, no weak supervision to group the data samples is required. In the following we refer to our model as the Unsupervised Global VAE (UG-VAE). We combine a clustering-inducing mixture-model prior in the local space, which captures the fundamental data features that an i.i.d. VAE would separate, with a global latent variable that modulates the properties of those latent clusters depending on the observed samples, capturing fundamental and interpretable data features. We demonstrate this result using CelebA, MNIST, and the 3D FACES dataset of Paysan et al. (2009). Furthermore, we show that the global latent space can explain common features in samples coming from two different databases without requiring a domain label for each sample, establishing a probabilistic unsupervised framework for domain alignment. To the best of our knowledge, UG-VAE is the first VAE model in the literature that performs unsupervised domain alignment using global latent variables.

Finally, we demonstrate that, even though the model parameters are trained in an unsupervised manner, the global latent space of UG-VAE can discriminate groups of samples with non-trivial structures, separating groups of people with black and blond hair in CelebA or defined sequences of digit images.
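The generative structure described above can be sketched as follows. This is a minimal, self-contained illustration under my own assumptions (the dimensions, the linear modulation map `W`, and the function names are hypothetical, not the authors' exact parameterization): a single global Gaussian latent variable, shared by all observations in a batch, modulates the component means of a mixture prior in the local latent space, which is what breaks the i.i.d. assumption across samples.

```python
# Illustrative sketch (assumed parameterization, not the paper's exact model):
# one global latent draw per batch modulates a local mixture prior.
import numpy as np

rng = np.random.default_rng(0)

K = 3          # number of local mixture components (hypothetical)
D_LOCAL = 2    # local (data-dependent) latent dimension (hypothetical)
D_GLOBAL = 4   # global latent dimension (hypothetical)

# Hypothetical parameter: a linear map from the global variable to each
# mixture component's mean in the local space.
W = rng.standard_normal((K, D_LOCAL, D_GLOBAL)) * 0.1

def sample_batch(batch_size):
    """All samples in one batch share a single draw of the global variable."""
    beta = rng.standard_normal(D_GLOBAL)        # global latent ~ N(0, I)
    mus = W @ beta                              # component means, shape (K, D_LOCAL)
    d = rng.integers(K, size=batch_size)        # per-sample component assignment
    z = mus[d] + rng.standard_normal((batch_size, D_LOCAL))  # local latents
    return beta, d, z

beta, d, z = sample_batch(8)
print(beta.shape, d.shape, z.shape)   # (4,) (8,) (8, 2)
```

Because `beta` is drawn once per batch rather than once per sample, the local latents within a batch are correlated through the shared global variable; this within-batch dependence is the non-i.i.d. structure the model exploits.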

