IDENTIFYING WEIGHT-VARIANT LATENT CAUSAL MODELS

Abstract

The task of causal representation learning aims to uncover latent higher-level causal representations that affect lower-level observations. Identifying true latent causal representations from observed data, while allowing instantaneous causal relations among latent variables, remains a challenge, however. To this end, we start from the analysis of three intrinsic indeterminacies in identifying latent space from observations: transitivity, permutation indeterminacy, and scaling indeterminacy. We find that transitivity plays a key role in impeding the identifiability of latent causal representations. To address the unidentifiability caused by transitivity, we introduce a novel identifiability condition in which the underlying latent causal model satisfies a linear-Gaussian model, where both the causal coefficients and the distribution of the Gaussian noise are modulated by an additional observed variable. Under some mild assumptions, we show that the latent causal representations can be identified up to trivial permutation and scaling. Building on this theoretical result, we propose a novel method, termed Structural caUsAl Variational autoEncoder (SuaVE), which directly learns the latent causal representations and the causal relationships among them, together with the mapping from the latent causal variables to the observed ones. We show that SuaVE learns the true parameters asymptotically. Experimental results on synthetic and real data demonstrate the identifiability and consistency results and the efficacy of SuaVE in learning latent causal representations.
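To make the assumed generative process concrete, the following sketch simulates data from a weight-variant linear-Gaussian latent causal model of the kind described above: each latent variable is a linear function of its latent parents plus Gaussian noise, with both the causal coefficients and the noise scale modulated by an additional observed variable `u`, and observations are produced by a nonlinear mixing of the latents. The specific modulation functions and mixing map here are arbitrary illustrative choices, not those analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)


def sample_latent_causal_model(u, n_latent=3):
    """Sample latent causal variables z from a weight-variant
    linear-Gaussian SCM: z_i = sum_j w_{ij}(u) * z_j + sigma_i(u) * eps_i,
    with parents j < i. The modulation functions below are hypothetical
    choices for illustration only."""
    z = np.zeros(n_latent)
    for i in range(n_latent):
        weights = np.sin(u + np.arange(i))        # coefficients w_{ij}(u)
        noise_scale = 0.5 + 0.5 * abs(np.cos(u))  # noise std sigma_i(u)
        z[i] = weights @ z[:i] + noise_scale * rng.standard_normal()
    return z


def observe(z, obs_dim=5):
    """Nonlinear mixing from latent causal variables to observations x
    (a fixed mixing matrix followed by tanh; also illustrative)."""
    A = np.ones((obs_dim, z.size))
    return np.tanh(A @ z)


u = 0.7                           # additional observed modulating variable
z = sample_latent_causal_model(u) # latent causal variables
x = observe(z)                    # low-level observations
```

In this setting, identifiability means recovering `z` (up to permutation and scaling) and the causal coefficients from samples of `(x, u)` alone, without access to `z`.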

1. INTRODUCTION

While there is no universal formal definition, one widely accepted feature of disentangled representations (Bengio et al., 2013) is that a change in one dimension corresponds to a change in one factor of variation in the underlying model of the data, while having little effect on others. The underlying model is rarely available for interrogation, however, which makes learning disentangled representations challenging. Several notable works on disentangled representation learning have been proposed that focus on enforcing independence over the latent variables that control the factors of variation (Higgins et al., 2017; Chen et al., 2018; Locatello et al., 2019; Kim & Mnih, 2018; Locatello et al., 2020). In many applications, however, the latent variables are not statistically independent, which is at odds with the notion of disentanglement above; for example, foot length and body height exhibit a strong positive correlation in observed data (Träuble et al., 2021). Causal representation learning avoids this limitation, as it aims to learn a representation that exposes the unknown high-level causal structural variables, and the relationships between them, from a set of low-level observations (Schölkopf et al., 2021). Unlike disentangled representation learning, it identifies possible causal relations among the latent variables. In fact, disentangled representation learning can be viewed as a special case of causal representation learning in which the latent variables have no causal influences (Schölkopf et al., 2021). One of the most prominent additional capabilities of causal representations is the ability to represent interventions and to make predictions regarding such interventions (Pearl, 2000), which enables the generation of new samples that do not lie within the distribution of the observed data. This can be particularly useful for improving the generalization of the resulting model.
Causal representations also enable answering counterfactual questions, e.g., would a given patient have suffered heart failure if they had started exercising a year earlier (Schölkopf et al., 2021)? 

