ON DISENTANGLED REPRESENTATIONS LEARNED FROM CORRELATED DATA

Anonymous authors
Paper under double-blind review

Abstract

Despite impressive progress in the last decade, it remains an open challenge to build models that generalize well across multiple tasks and datasets. One path towards this goal is to learn meaningful and compact representations in which different semantic aspects of the data are structurally disentangled. Disentanglement approaches have focused on separating independent factors of variation, despite the fact that real-world observations are often not structured into meaningful independent causal variables. In this work, we bridge the gap to real-world scenarios by analyzing the behavior of the most prominent disentanglement methods and scores on correlated data in a large-scale empirical study (including 4260 models). We show that correlations systematically induced in the dataset are learned and reflected in the latent representations, while widely used disentanglement scores fall short of capturing these latent correlations. Finally, we demonstrate how to disentangle these latent correlations using weak supervision, even when this supervision is constrained to be causally plausible. Our results thus support the argument to learn independent mechanisms rather than independent factors of variation.

1. INTRODUCTION

Figure 1: While in principle we consider the presence of the objects (coffee cup, table, chair) to be independent mechanisms, they tend to appear together in observed data.

Due to the induced structure, disentangled representations promise generalization to unseen scenarios (Higgins et al., 2017b), increased interpretability (Adel et al., 2018; Higgins et al., 2018) and faster learning on downstream tasks (van Steenkiste et al., 2019; Locatello et al., 2019a). While the advantages of disentangled representations have been well established, they generally assume the existence of natural factors that vary independently within the given dataset, which is rarely the case in real-world settings. As an example, consider a scene with a table and some chairs (see Fig. 1). The higher-level factors of this representation are in fact correlated, and what we actually want to infer are independent (causal) mechanisms (Peters et al., 2017; Parascandolo et al., 2018; Suter et al., 2019; Goyal et al., 2019). A complex generative model can be thought of as the composition of independent mechanisms or "causal" modules, which generate high-dimensional observations (such as images or videos). In the causality community, this is often considered a prerequisite for achieving representations that are robust to interventions upon variables determined by such models (Peters et al., 2017). One particular instantiation of this idea in the machine learning community is the notion of disentangled representations (Bengio et al., 2013). The goal of disentanglement learning is to find a representation of the data which captures all the ground-truth factors of variation (FoV) independently. Despite the recent growth of the field, the performance of state-of-the-art disentanglement learners remains unknown in more realistic settings where the FoV are correlated during training.
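To make the notion of correlated FoV during training concrete, the following is a minimal sketch of one way a pairwise correlation between two ground-truth factors could be induced when sampling a dataset. The function name, the discrete factor grid, and the Gaussian-noise coupling are illustrative assumptions for this sketch, not the exact sampling procedure of the study:

```python
import numpy as np

def sample_correlated_factors(n, num_values=10, sigma=1.0, rng=None):
    """Sample n pairs of discrete factors of variation (FoV).

    The second factor is drawn near the first (Gaussian perturbation,
    then rounded and clipped), so the pairs (c1, c2) concentrate around
    the diagonal of the factor grid. Smaller sigma means a stronger
    correlation; an independent (uncorrelated) dataset corresponds to
    sampling c2 uniformly instead.
    """
    rng = np.random.default_rng(rng)
    c1 = rng.integers(0, num_values, size=n)
    noise = rng.normal(0.0, sigma, size=n)
    c2 = np.clip(np.round(c1 + noise), 0, num_values - 1).astype(int)
    return c1, c2

# Illustration: with sigma=1.0 on a 10-value grid, the empirical
# correlation between the two factors is high.
c1, c2 = sample_correlated_factors(10_000, sigma=1.0, rng=0)
print(np.corrcoef(c1, c2)[0, 1])
```

Images would then be rendered from such factor pairs (plus any remaining independently sampled factors), yielding a training set whose observations never cover the full factor grid uniformly.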
Given the potential societal impact in the medical domain (Chartsias et al., 2018) or in fair decision making (Locatello et al., 2019a; Madras et al., 2018; Creager et al., 2019), evaluating the usefulness of disentangled representations trained on correlated data is of high importance.

