LEARNING DISENTANGLEMENT IN AUTOENCODERS THROUGH EULER ENCODING

Abstract

Noting the importance of factorizing (or disentangling) the latent space, we propose a novel, non-probabilistic disentangling framework for autoencoders, based on the principle of symmetry transformations that are independent of one another. To the best of our knowledge, this is the first deterministic autoencoder-based model that aims to achieve disentanglement without pairs of images or labels, by explicitly introducing inductive biases into the model architecture through Euler encoding. The proposed model is compared with a number of state-of-the-art models relevant to disentanglement, including symmetry-based and autoencoder-based generative models. Our evaluation using six disentanglement metrics, including the unsupervised disentanglement metric proposed in this paper, shows that the proposed model offers better disentanglement, especially when the variances of the features differ, a regime in which other methods may struggle. We believe that this model opens several opportunities for linear disentangled representation learning based on deterministic autoencoders.

1. INTRODUCTION

Learning generalizable representations of data is one of the fundamental aspects of modern machine learning (Rudin et al., 2022). In fact, better representations are more than a luxury now; they are key to achieving generalization, interpretability, and robustness in machine learning models (Bengio et al., 2013; Brakel & Bengio, 2017; Spurek et al., 2020). One of the primary desired characteristics of a learned representation is factorizability, or disentanglement, so that the latent representation is composed of multiple independent generative factors of variation. Disentanglement renders the latent-space features independent of one another, providing a basis for a set of novel applications, including scene rendering, interpretability, and unsupervised deep learning (Eslami et al., 2018; Iten et al., 2020; Higgins et al., 2021). Deep generative models, particularly those built on variational autoencoders (VAEs) (Kingma & Welling, 2013; Kumar et al., 2017; Higgins et al., 2017; Tolstikhin et al., 2018; Burgess et al., 2018; Chen et al., 2018; Kim & Mnih, 2018; Zhao et al., 2019), have been shown to be effective in learning factored representations. Although these approaches have advanced disentangled representation learning by regularizing the latent spaces, a number of issues limit their full potential: (a) VAE-based models consist of two loss components, and balancing them is a well-known issue (Asperti & Trentin, 2020); (b) it is almost impossible to honor the idealized notion of a known prior distribution for VAEs in practical settings (Takahashi et al., 2019; Asperti & Trentin, 2020; Zhang et al., 2020; Aneja et al., 2021); and (c) factorizing the aggregated posterior in the latent space does not guarantee correspondingly uncorrelated representations (Locatello et al., 2019).
An alternative approach to achieving disentangled representations is to seek irreducible representations of symmetry groups (Cohen & Welling, 2014; Higgins et al., 2018; Painter et al., 2020; Tonnaer et al., 2022), where the aim is to find latent-space transformations that are independent of one another, underpinned by a well-defined mathematical framework based on group theory. Because this family of methods exploits the notion of transitions between samples, it requires pairs of images representing the transitions (Cohen & Welling, 2014; Painter et al., 2020) or equivalent labels (Tonnaer et al., 2022). Regardless of the approach, as shown in Locatello et al. (2019), it is fundamentally impossible to learn disentangled representations without inductive biases on either the model or the dataset, and both VAE- and symmetry-based approaches exemplify implicit embedding of such biases.
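For concreteness, the symmetry-based notion of disentanglement can be stated compactly. The following is a minimal sketch along the lines of the definition in Higgins et al. (2018); the symbols $G_i$, $Z_i$, $\rho_i$, and $\theta$ are illustrative notation introduced here, not taken from any specific model.

```latex
% Suppose the generative factors correspond to a symmetry group that
% decomposes as a direct product of independent subgroups:
G = G_1 \times G_2 \times \dots \times G_n .
% A representation on a latent space Z is disentangled with respect to
% this decomposition if Z splits accordingly,
Z = Z_1 \oplus Z_2 \oplus \dots \oplus Z_n ,
% such that each subgroup G_i acts non-trivially only on its own
% subspace Z_i and leaves every Z_j with j \neq i invariant.
% The representation is *linearly* disentangled when each action is
% linear; e.g., for G_i = SO(2) acting on a two-dimensional Z_i by
% rotation through an angle \theta:
\rho_i(\theta) =
\begin{pmatrix}
\cos\theta & -\sin\theta \\
\sin\theta & \cos\theta
\end{pmatrix} .
```

Under this view, a model disentangles if independent transformations of the data (e.g., translating an object vs. changing its color) move the latent code only within the corresponding subspaces $Z_i$, without affecting the others.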

