IMPROVING THE UNSUPERVISED DISENTANGLED REPRESENTATION LEARNING WITH VAE ENSEMBLE

Abstract

Variational Autoencoder (VAE) based frameworks have achieved state-of-the-art performance on unsupervised disentangled representation learning. A recent theoretical analysis shows that this success is mainly due to VAE implementation choices that encourage a PCA-like behavior locally on data samples. Despite this implied model identifiability, VAE-based disentanglement frameworks still face the trade-off between local orthogonality and data reconstruction. As a result, models with the same architecture and hyperparameter setting can sometimes learn entangled representations. To address this challenge, we propose a simple yet effective VAE ensemble framework consisting of multiple VAEs. It is based on the assumption that entangled representations are each unique in their own way, whereas disentangled representations are "alike" (similar up to a signed permutation transformation). In the proposed VAE ensemble, each model not only maintains its original objective but also encodes to and decodes from the other models through pair-wise linear transformations between the latent representations. We show both theoretically and experimentally that the VAE ensemble objective encourages the linear transformations connecting the VAEs to be trivial transformations, aligning the latent representations of different models to be "alike". We compare our approach with state-of-the-art unsupervised disentangled representation learning approaches and show improved performance.

Introduction

Disentangled representation learning aims to capture the semantically meaningful compositional representation of data, and has been shown to improve the efficiency and generalization of supervised learning (Locatello et al., 2019), reinforcement learning (Watters et al., 2019), and reasoning tasks (van Steenkiste et al., 2019). The current state-of-the-art unsupervised disentangled representation learning methods deploy the Variational Autoencoder (VAE) (Kingma & Welling, 2013; Rezende et al., 2014). The main challenge is to reduce the trade-off between learning a disentangled representation and reconstructing the input data. Most recent works extend the original VAE objective with carefully designed augmented objectives to address this trade-off (Higgins et al., 2017; Burgess et al., 2017; Kim & Mnih, 2018; Chen et al., 2018; Kumar et al., 2017). A recent study (Locatello et al., 2018) compared these methods and showed that their performance is sensitive to the initialization and hyperparameter setting of the augmented objective function.

Recently, Duan et al. (2019) developed an unsupervised model selection method named Unsupervised Disentanglement Ranking (UDR) to address the challenge of hyperparameter search and model selection. UDR leverages the finding in (Rolinek et al., 2019) that the implementation choices of VAE encourage a PCA-like behavior locally on data samples. As a result, disentangled representations learned by VAEs are "alike", as they are similar up to signed permutation transformations. In contrast, entangled representations learned by VAEs are "unique", as aligning them requires at least non-degenerate rotation matrices. UDR trains multiple models with different initializations and hyperparameter settings, and builds a similarity matrix measuring the pair-wise similarity between the latent variables of different models. A higher score is given to a model whose representations match those of many other models. The results show a close match between UDR and commonly used supervised metrics, as well as with the performance of downstream tasks using the latent representations.
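The UDR-style pairwise similarity idea can be made concrete with a minimal numpy sketch (an illustration under our own simplifying assumptions, not UDR's actual implementation): latent codes that agree up to a signed permutation yield an absolute-correlation matrix close to a permutation matrix, while codes mixed by a generic rotation do not.

```python
import numpy as np

def pairwise_similarity(z_a, z_b):
    """Absolute correlation between every latent dimension of two models."""
    za = (z_a - z_a.mean(0)) / z_a.std(0)
    zb = (z_b - z_b.mean(0)) / z_b.std(0)
    # rows index dims of model a, columns index dims of model b
    return np.abs(za.T @ zb) / len(z_a)

def similarity_score(sim):
    """High when sim is close to a permutation matrix, i.e. each latent
    dimension of one model matches exactly one dimension of the other."""
    return 0.5 * (sim.max(axis=0).mean() + sim.max(axis=1).mean())

rng = np.random.default_rng(0)
z = rng.normal(size=(1000, 4))            # stand-in latent codes of one model
aligned = -z[:, [2, 0, 3, 1]]             # signed permutation of z: "alike"
rotated = z @ rng.normal(size=(4, 4))     # generic linear mixing: "unique"

s_aligned = similarity_score(pairwise_similarity(z, aligned))
s_rotated = similarity_score(pairwise_similarity(z, rotated))
assert s_aligned > s_rotated
```

The scoring rule here is a simplification of UDR's similarity matrix; the point is only that signed-permutation alignment produces a sharply peaked similarity pattern that a rotation destroys.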


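To illustrate why the pair-wise linear transformations in the ensemble can be expected to become trivial, here is a small numpy sketch (our own illustration, not the paper's implementation): when two models' latents agree up to a signed permutation, the least-squares linear map between their latent spaces recovers exactly that signed permutation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 2000, 4
z_i = rng.normal(size=(n, d))                 # latent codes of model i

# Suppose model j learned an "alike" representation: a signed
# permutation of model i's latent dimensions.
perm, signs = [1, 3, 0, 2], np.array([1.0, -1.0, 1.0, -1.0])
z_j = z_i[:, perm] * signs

# Least-squares fit of the pair-wise linear transformation z_j ~ z_i @ W
W, *_ = np.linalg.lstsq(z_i, z_j, rcond=None)

# W recovers the trivial transformation: one +/-1 entry per row and column
W_round = np.round(W)
assert np.allclose(np.abs(W_round).sum(axis=0), 1)
assert np.allclose(np.abs(W_round).sum(axis=1), 1)
assert np.allclose(z_i @ W_round, z_j)
```

Conversely, if one model's representation were entangled (a generic rotation of the other's), the fitted map would be a dense matrix; the ensemble objective penalizes this mismatch, pushing all models toward mutually "alike" representations.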