IDENTIFYING INFORMATIVE LATENT VARIABLES LEARNED BY GIN VIA MUTUAL INFORMATION

Anonymous authors
Paper under double-blind review

Abstract

How to learn a good representation of data is one of the most important topics in machine learning. Disentanglement of representations, though believed to be a core feature of good representations, has prompted much debate and discussion in recent years. Sorrenson et al. (2020), using techniques developed in nonlinear independent component analysis theory, show that general incompressible-flow networks (GIN) can recover the underlying latent variables that generate the data, and thus provide a compact and disentangled representation. In this paper, however, we point out that the method GIN uses to select informative latent variables is not theoretically supported and can be disproved by experiments. We propose instead to use the mutual information between each learned latent variable and the auxiliary variable to correctly identify the informative latent variables. We directly verify the improvement brought by our method in experiments on synthetic data, and further demonstrate its advantage on various downstream tasks, including classification, outlier detection, and adversarial attack defence, on both synthetic and real data.
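As a minimal sketch of the selection criterion above — not the authors' implementation — one could rank learned latent dimensions by an estimate of their mutual information with the auxiliary variable. The sketch below assumes a discrete auxiliary variable u (e.g., a class label) and uses a simple plug-in histogram estimator; in practice a more careful estimator may be preferable.

```python
import numpy as np

def mutual_information(z, u, n_bins=20):
    """Plug-in histogram estimate of I(z; u) in nats, for a scalar
    latent z and a discrete auxiliary variable u."""
    z, u = np.asarray(z), np.asarray(u)
    edges = np.histogram_bin_edges(z, bins=n_bins)
    # Interior edges give bin indices in [0, n_bins - 1].
    zi = np.digitize(z, edges[1:-1])
    labels = np.unique(u)
    joint = np.zeros((n_bins, labels.size))
    for j, lab in enumerate(labels):
        joint[:, j] = np.bincount(zi[u == lab], minlength=n_bins)
    joint /= joint.sum()                     # joint distribution p(z_bin, u)
    pz = joint.sum(axis=1, keepdims=True)    # marginal of z
    pu = joint.sum(axis=0, keepdims=True)    # marginal of u
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (pz @ pu)[nz])).sum())

# Toy check: a latent shifted by u scores high; pure noise scores near zero.
rng = np.random.default_rng(0)
u = rng.integers(0, 2, size=5000)              # binary auxiliary variable
informative = u + 0.3 * rng.normal(size=5000)  # depends on u
noise = rng.normal(size=5000)                  # independent of u
scores = [mutual_information(z, u) for z in (informative, noise)]
```

An informative dimension here approaches I(z; u) ≈ H(u) = ln 2 nats, while the noise dimension's score reflects only the small positive bias of the histogram estimator.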

1. INTRODUCTION

Representation learning is arguably one of the most important areas in machine learning. Many researchers believe that the ability to extract useful and interpretable features is a crucial advantage of deep networks over other learning models. A data representation can be obtained either via a supervised learning task or an unsupervised one. The former includes the popular ImageNet-pretrained backbones in computer vision, while the latter mainly consists of generative models such as variants of VAEs, GANs, and flow-based models. Among generative models, VAEs and flow-based models can naturally output the representation of the data, or even its density, which is convenient for representation learning. Moreover, label supervision can be integrated into generative models to further improve their performance. General Incompressible-flow Networks (GIN; Sorrenson et al., 2020), the model we consider in this paper, fall into this category.

Disentanglement is a widely discussed concept in representation learning. However, to the best of our knowledge, it has not been given a widely accepted definition (Bengio et al., 2013; Higgins et al., 2018). Many disentangled representation learning algorithms focus on recovering the independent latent variables that generate the data (Burgess et al., 2018; Chen et al., 2018b). However, Locatello et al. (2018) show that without assumptions beyond the independence of the latent variables, it is impossible to recover them from observations of the data. This result is equivalent to the non-identifiability of nonlinear independent component analysis (ICA) (Comon, 1994). In fact, any assumption made solely on the distribution of the latent variables, without reference to the observable data, is insufficient for identifiability (Hyvarinen and Morioka, 2016; Khemakhem et al., 2020). A set of sufficient conditions is proposed under the framework of nonlinear ICA by Khemakhem et al. (2020). The core condition requires that the data, denoted by x, are generated from latent vectors z through a generative model p(x | z), and that, conditioned on an auxiliary variable u, the entries of z are independent and each follows some exponential family distribution. This can be expressed by the
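For reference, this conditionally independent exponential-family assumption is commonly written, following the notation of Khemakhem et al. (2020), as

\[
p_{T,\lambda}(z \mid u) \;=\; \prod_{i=1}^{n} \frac{Q_i(z_i)}{Z_i(u)} \exp\!\left( \sum_{j=1}^{k} T_{i,j}(z_i)\, \lambda_{i,j}(u) \right),
\]

where $Q_i$ is a base measure, $Z_i(u)$ a normalizing constant, the $T_{i,j}$ are the sufficient statistics of the exponential family, and the $\lambda_{i,j}(u)$ are natural parameters that depend on the auxiliary variable $u$; the exact notation may differ from the equation the text refers to.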

