IDENTIFYING WEIGHT-VARIANT LATENT CAUSAL MODELS

Abstract

Causal representation learning aims to uncover latent higher-level causal representations that affect lower-level observations. Identifying the true latent causal representations from observed data while allowing instantaneous causal relations among the latent variables, however, remains a challenge. To this end, we start from an analysis of three intrinsic indeterminacies in identifying the latent space from observations: transitivity, permutation indeterminacy, and scaling indeterminacy. We find that transitivity plays a key role in impeding the identifiability of latent causal representations. To address the unidentifiability caused by transitivity, we introduce a novel identifiability condition in which the underlying latent causal model satisfies a linear-Gaussian model whose causal coefficients and Gaussian noise distribution are modulated by an additional observed variable. Under some mild assumptions, we show that the latent causal representations can be identified up to trivial permutation and scaling. Based on this theoretical result, we further propose a novel method, termed Structural caUsAl Variational autoEncoder (SuaVE), which directly learns the latent causal representations and the causal relationships among them, together with the mapping from the latent causal variables to the observed ones. We show that SuaVE learns the true parameters asymptotically. Experimental results on synthetic and real data demonstrate the identifiability and consistency results and the efficacy of SuaVE in learning latent causal representations.

1. INTRODUCTION

While there is no universal formal definition, one widely accepted feature of disentangled representations (Bengio et al., 2013) is that a change in one dimension corresponds to a change in one factor of variation in the underlying model of the data, while having little effect on the others. The underlying model is rarely available for interrogation, however, which makes learning disentangled representations challenging. A number of methods for disentangled representation learning have been proposed that focus on enforcing independence over the latent variables controlling the factors of variation (Higgins et al., 2017; Chen et al., 2018; Locatello et al., 2019; Kim & Mnih, 2018; Locatello et al., 2020). In many applications, however, the latent variables are not statistically independent, which is at odds with the notion of disentanglement above; e.g., foot length and body height exhibit a strong positive correlation in observed data (Träuble et al., 2021). Causal representation learning avoids this limitation, as it aims to learn a representation that exposes the unknown high-level causal structural variables, and the relationships between them, from a set of low-level observations (Schölkopf et al., 2021). Unlike disentangled representation learning, it identifies the possible causal relations among the latent variables; in fact, disentangled representation learning can be viewed as a special case of causal representation learning in which the latent variables have no causal influences on one another (Schölkopf et al., 2021). One of the most prominent additional capabilities of causal representations is the ability to represent interventions and to make predictions regarding them (Pearl, 2000), which enables the generation of new samples that do not lie within the distribution of the observed data. This can be particularly useful for improving the generalization of the resulting model. Causal representations also enable answering counterfactual questions, e.g., would a given patient have suffered heart failure if they had started exercising a year earlier (Schölkopf et al., 2021)?

Despite these advantages, causal representation learning is a notoriously hard problem: without certain assumptions, identifying the true latent causal model from observed data is generally not possible. There are three primary approaches to achieving identifiability: 1) adapting (weakly) supervised methods with given latent causal graphs and/or labels (Kocaoglu et al., 2018; Yang et al., 2021; Von Kügelgen et al., 2021; Brehmer et al., 2022); 2) imposing sparse graphical conditions, e.g., bottleneck graphical conditions (Adams et al., 2021; Xie et al., 2020; Lachapelle et al., 2021); 3) using temporal information (Yao et al., 2021; Lippe et al., 2022). A brief review is provided in Section 2. For the supervised approach, when labels are known, the challenging identifiability problem in latent space is transferred to an identifiability problem in the observed space, for which some commonly used functional classes have been proven to be identifiable (Zhang & Hyvarinen, 2012; Peters et al., 2014); assuming a given latent causal graph, on the other hand, relies heavily on domain knowledge. For the second approach, many true latent causal graphs do not satisfy the assumed sparse graph structure. The temporal approach is only applicable when temporal information or temporally intervened information among the latent factors is available.
In this work, we explore a new direction for the identifiability of latent causal representations by allowing the causal influences among the latent causal variables to change, motivated by recent advances in nonlinear ICA (Hyvarinen et al., 2019; Khemakhem et al., 2020). These works have shown that, with an additional observed variable u modulating the latent independent variables, those latent variables become identifiable. A question then naturally arises: with causal relationships among the latent variables, what additional assumptions are required for identifiability? To answer this question, we start from an analysis of three intrinsic indeterminacies in latent space (see Section 3): transitivity, permutation indeterminacy, and scaling indeterminacy, which give rise to the following insights. 1) Transitivity is the scourge of the identifiability of the latent causal model. 2) Permutation indeterminacy means that the recovered latent variables can take any permutation of the underlying order, owing to the flexibility of the latent space; this property enables us to enforce the correct causal ordering on the learned causal representations via a predefined directed acyclic supergraph, avoiding a troublesome directed acyclic graph (DAG) constraint. 3) Scaling indeterminacy, together with permutation indeterminacy, means that the latent causal variables can be recovered only up to permutation and scaling, not their exact values. To overcome the transitivity challenge, we model the underlying causal representation with weight-variant linear Gaussian models, where both the weights (i.e., the causal coefficients) and the mean and variance of the Gaussian noise are modulated by an additional observed variable u (see Section 4). Under these assumptions, we show that the latent causal representations can be recovered up to a trivial permutation and scaling. The key to identifiability is that the causal influences (weights) among the latent causal variables are allowed to change: intuitively, the changing causal influences provide, in effect, interventional observed data, which enables the identifiability of the latent causal variables. Based on this result, in Section 5 we further propose a novel method, Structural caUsAl Variational autoEncoder (SuaVE), for learning latent causal representations with a consistency guarantee. Section 6 verifies the efficacy of the proposed approach on both synthetic and real fMRI data.
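To make the assumed generative process concrete, the sketch below simulates one weight-variant linear-Gaussian latent causal model in NumPy. It is a minimal illustration under assumed functional forms, not the paper's implementation: the particular dependence of the weights and noise parameters on u, and the random MLP standing in for the nonlinear mixing f, are hypothetical placeholders.

```python
# A minimal sketch (illustrative assumptions, not the paper's code) of a
# weight-variant linear-Gaussian latent causal model: the causal coefficients
# and the Gaussian noise parameters are all modulated by an observed variable
# u, while the nonlinear mixing f from latents to observations stays fixed.
import numpy as np

rng = np.random.default_rng(0)
n_latent, n_obs = 3, 10

# Fixed nonlinear mixing f: z -> x (a random two-layer MLP as a stand-in).
A1 = rng.standard_normal((16, n_latent))
A2 = rng.standard_normal((n_obs, 16))

def sample(u):
    """Draw one pair (z, x) given the auxiliary observed variable u."""
    # u-modulated causal coefficients w_ij(u); strictly lower-triangular so
    # the latent variables follow a fixed causal order (a DAG in topological
    # order). The sinusoidal dependence on u is an arbitrary choice.
    W = np.tril(
        np.sin(u * np.arange(1, n_latent**2 + 1)).reshape(n_latent, n_latent),
        k=-1,
    )
    noise_mean = 0.1 * u * np.ones(n_latent)                       # mean(u)
    noise_std = 0.5 + 0.1 * np.abs(np.cos(u)) * np.ones(n_latent)  # std(u)

    # Linear SEM: z_i = sum_{j<i} w_ij(u) z_j + n_i,
    # with n_i ~ N(noise_mean_i(u), noise_std_i(u)^2).
    z = np.zeros(n_latent)
    for i in range(n_latent):
        z[i] = W[i, :i] @ z[:i] + rng.normal(noise_mean[i], noise_std[i])

    return z, A2 @ np.tanh(A1 @ z)  # x = f(z)

z, x = sample(u=1.3)
```

Drawing samples under several values of u yields data from differently weighted causal models, which is precisely the kind of variation the identifiability result exploits.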

2. RELATED WORK

Due to the challenges of identifiability in causal representation learning, most existing works handle the problem by imposing assumptions. We give a brief review of the related work on this basis.

(Weakly) Supervised Causal Representation Learning. Approaches in this class assume known latent causal graphs or labels. CausalGAN (Kocaoglu et al., 2018) requires a priori knowledge of the structure of the causal graph over the latent variables, which is a significant practical limitation. CausalVAE (Yang et al., 2021) needs additional labels to supervise the learning of latent variables; such labels are not commonly available, however, and manual labeling can be costly and error-prone. Von Kügelgen et al. (2021) use a known but non-trivial causal graph between content and style factors to study self-supervised causal representation learning. Brehmer et al. (2022) learn causal representations in a weakly supervised setting, assuming access to data pairs representing the system before and after a randomly chosen, unknown intervention.

Sparse Graphical Structure. Most recent progress on identifiability relies on sparse graphical structure constraints (Silva et al., 2006; Shimizu et al., 2009; Anandkumar et al., 2013; Frot et al., 2019; Cai et al., 2019; Xie et al., 2020; 2022). Adams et al. (2021) provided a unifying viewpoint of

