DISENTANGLING LEARNING REPRESENTATIONS WITH DENSITY ESTIMATION

Abstract

Disentangled learning representations have promising utility in many applications, but they currently suffer from serious reliability issues. We present Gaussian Channel Autoencoder (GCAE), a method which achieves reliable disentanglement via flexible density estimation of the latent space. GCAE avoids the curse of dimensionality of density estimation by disentangling subsets of its latent space with the Dual Total Correlation (DTC) metric, thereby representing its high-dimensional latent joint distribution as a collection of many low-dimensional conditional distributions. In our experiments, GCAE achieves highly competitive and reliable disentanglement scores compared with state-of-the-art baselines.

1. INTRODUCTION

The notion of disentangled learning representations was introduced by Bengio et al. (2013): it is meant to be a robust approach to feature learning when trying to learn more about a distribution of data X or when downstream tasks for learned features are unknown. Since then, disentangled learning representations have proven extremely useful in applications such as natural language processing (Jain et al., 2018), content and style separation (John et al., 2018), drug discovery (Polykovskiy et al., 2018; Du et al., 2020), fairness (Sarhan et al., 2020), and more.

Density estimation of learned representations is an important ingredient in competitive disentanglement methods. Bengio et al. (2013) state that representations z ∼ Z which are disentangled should maintain as much information about the input as possible while having components which are mutually invariant to one another. Mutual invariance motivates seeking representations of Z with independent components extracted from the data, necessitating some notion of p_Z(z). Leading unsupervised disentanglement methods, namely β-VAE (Higgins et al., 2016), FactorVAE (Kim & Mnih, 2018), and β-TCVAE (Chen et al., 2018), all learn p_Z(z) via the same variational Bayesian framework (Kingma & Welling, 2013), but they approach making p_Z(z) independent from different angles. β-VAE indirectly promotes independence in p_Z(z) by enforcing a low D_KL between the representation and a factorized Gaussian prior; β-TCVAE encourages representations to have low Total Correlation (TC) via an ELBO decomposition and an importance-weighted sampling technique; and FactorVAE reduces TC with the help of a monolithic neural network estimator. Other well-known unsupervised methods are Annealed β-VAE (Burgess et al., 2018), which imposes a careful relaxation of the information bottleneck through the VAE D_KL term during training, and DIP-VAE I & II (Kumar et al., 2017), which directly regularize the covariance of the learned representation. For a more in-depth description of related work, please see Appendix D.
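For concreteness, the standard definitions behind these penalties can be stated as follows; this is textbook notation rather than anything specific to this paper, with z ∈ R^d and H denoting (differential) entropy. β-VAE maximizes the reweighted ELBO

\mathcal{L}_{\beta\text{-VAE}} = \mathbb{E}_{q_\phi(z|x)}\!\left[\log p_\theta(x|z)\right] - \beta\, D_{KL}\!\left(q_\phi(z|x)\,\|\,p(z)\right), \qquad p(z) = \mathcal{N}(0, I),

while the Total Correlation penalized by β-TCVAE and FactorVAE is

\mathrm{TC}(z) = D_{KL}\!\left(p_Z(z)\,\Big\|\,\prod_{i=1}^{d} p_{Z_i}(z_i)\right) = \sum_{i=1}^{d} H(z_i) - H(z),

which equals zero exactly when the components of z are mutually independent.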
While these VAE-based disentanglement methods have been the most successful in the field, Locatello et al. (2019) point out serious reliability issues shared by all of them. In particular, increasing disentanglement pressure during training does not tend to yield more independent representations, there are currently no good unsupervised indicators of disentanglement, and no method consistently dominates the others across all datasets. Locatello et al. (2019) stress the need to find the right inductive biases in order for unsupervised disentanglement to truly deliver. We seek to make disentanglement more reliable and high-performing by incorporating new inductive biases into our proposed method, Gaussian Channel Autoencoder (GCAE). We explain these biases in the sections that follow.
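As a point of reference for the abstract's claim, the Dual Total Correlation used by GCAE has the standard information-theoretic definition (stated here in generic notation; z_{\setminus i} denotes all components of z except z_i):

\mathrm{DTC}(z) = H(z) - \sum_{i=1}^{d} H(z_i \mid z_{\setminus i}),

which, like TC, vanishes exactly when the components of z are mutually independent. Each conditional term involves only the one-dimensional distribution of z_i given the remaining code, which is plausibly the sense in which the abstract's "collection of many low-dimensional conditional distributions" sidesteps direct estimation of the full high-dimensional joint.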

