ADDRESSING THE TOPOLOGICAL DEFECTS OF DISENTANGLEMENT

Anonymous authors
Paper under double-blind review

ABSTRACT

A core challenge in Machine Learning is to disentangle natural factors of variation in data (e.g. object shape vs. pose). A popular approach to disentanglement consists in learning to map each of these factors to distinct subspaces of a model's latent representation. However, this approach has shown limited empirical success to date. Here, we show that, for a broad family of transformations acting on images, encompassing simple affine transformations such as rotations and translations, this approach to disentanglement introduces topological defects (i.e. discontinuities in the encoder). Moreover, motivated by classical results from group representation theory, we propose an alternative, more flexible approach to disentanglement which relies on distributed equivariant operators, potentially acting on the entire latent space. We theoretically and empirically demonstrate the effectiveness of our approach to disentangle affine transformations. Our work lays a theoretical foundation for the recent success of a new generation of models using distributed operators for disentanglement.

1. INTRODUCTION

Learning disentangled representations is arguably key to building robust, fair, and interpretable ML systems (Bengio et al., 2013; Lake et al., 2017; Locatello et al., 2019a). However, it remains unclear how to achieve disentanglement in practice. Current approaches aim to map different factors of variation in the data to distinct subspaces of a latent representation, but have achieved only limited empirical success (Higgins et al., 2016; Burgess et al., 2018). More work on the theoretical foundations of disentanglement could provide the key to the development of more successful approaches.

In its original formulation, disentanglement consists in isolating statistically independent factors of variation in data into independent latent dimensions. This perspective has led to a range of theoretical studies investigating the conditions under which these factors are identifiable (Locatello et al., 2019b; Shu et al., 2020; Locatello et al., 2020; Hauberg, 2019; Khemakhem et al., 2020). More recently, Higgins et al. (2018) proposed an alternative perspective connecting disentanglement to group theory (see Appendix A for a primer on group theory). In this framework, the factors of variation are distinct subgroups acting on the dataset, and the goal is to learn representations in which separate subspaces are equivariant to distinct subgroups, a promising formalism since many transformations found in the physical world are captured by group structures (Noether, 1915). However, the fundamental principles for how to design models capable of learning such equivariances remain to be discovered (but see Caselles-Dupré et al. (2019)).

Here we attack the problem of disentanglement through the lens of topology (Munkres, 2014). We show that, for a very broad class of transformations acting on images, encompassing all affine transformations (e.g. translations and rotations), an encoder that maps these transformations into dedicated latent subspaces is necessarily discontinuous. In light of this result, we reframe disentanglement by distinguishing its objective from its traditional implementation, thereby resolving the discontinuities of the encoder. Guided by classical results from group representation theory, we instead rely on distributed equivariant operators that can act on the entire latent space.
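To make the topological defect concrete, consider a toy encoder that dedicates a single latent scalar to image rotation. The following is a minimal sketch, not code from the paper; the name `angle_encoder` and the modular parameterization are our illustrative assumptions. Any such one-dimensional parameterization of the circle of rotations must wrap around somewhere, so two nearly identical rotations can receive latent codes that are far apart:

```python
import numpy as np

def angle_encoder(theta):
    # Hypothetical "dedicated subspace" encoder: represent an image's
    # rotation by a single scalar latent in [0, 2*pi).
    return theta % (2 * np.pi)

eps = 1e-4
# Two rotations that are visually indistinguishable (just above 0 and
# just below a full turn) land at opposite ends of the latent interval.
codes = angle_encoder(np.array([eps, 2 * np.pi - eps]))
print(abs(codes[1] - codes[0]))  # ~6.283: a jump of ~2*pi, not ~0
```

No choice of offset removes the jump; it only moves the discontinuity elsewhere on the circle, which is the obstruction formalized in the paper.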
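By contrast, a distributed code avoids the defect. The sketch below is again illustrative (the names `circle_encoder` and `rho` are assumptions): it embeds the rotation angle on the unit circle in R^2 and lets rotations act through a linear operator on the entire two-dimensional code. The encoder is then continuous and equivariant, i.e. encoding a rotated input matches applying the operator to the code:

```python
import numpy as np

def circle_encoder(theta):
    # Distributed two-dimensional code for a rotation by theta.
    return np.array([np.cos(theta), np.sin(theta)])

def rho(phi):
    # Linear operator representing a rotation by phi, acting on the
    # whole latent code rather than on a dedicated coordinate.
    return np.array([[np.cos(phi), -np.sin(phi)],
                     [np.sin(phi),  np.cos(phi)]])

theta, phi = 0.3, 1.1
lhs = circle_encoder(theta + phi)        # encode the transformed input
rhs = rho(phi) @ circle_encoder(theta)   # transform the encoded input
print(np.allclose(lhs, rhs))  # True: f(g . x) = rho(g) f(x)
```

This is the sense in which distributed equivariant operators, potentially acting on the whole latent space, sidestep the discontinuity while still isolating the transformation.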

AVAILABILITY

All code is available at https://anonymous.4open.science/r/5b7e2cbb-54dc-4fde-bc2c-8f75d29fc15a/.

