LEARNING LATENT STRUCTURAL CAUSAL MODELS

Abstract

Causal learning has long concerned itself with the accurate recovery of underlying causal mechanisms. Such causal modelling enables better explanations of out-of-distribution data. Prior works on causal learning assume that the high-level causal variables are given. However, in machine learning tasks, one often operates on low-level data like image pixels or high-dimensional vectors. In such settings, the entire Structural Causal Model (SCM) -- structure, parameters, and high-level causal variables -- is unobserved and needs to be learnt from low-level data. We treat this problem as Bayesian inference of the latent SCM, given low-level data. For linear Gaussian additive noise SCMs, we present a tractable approximate inference method which performs joint inference over the causal variables, structure and parameters of the latent SCM from random, known interventions. Experiments are performed on synthetic datasets and a causally generated image dataset to demonstrate the efficacy of our approach. We also perform image generation from unseen interventions, thereby verifying out-of-distribution generalization for the proposed causal model.

1. INTRODUCTION

Learning variables of interest and uncovering causal dependencies is crucial for intelligent systems to reason and predict in scenarios that differ from the training distribution. In the causality literature, causal variables and mechanisms are often assumed to be known. This knowledge enables reasoning and prediction under unseen interventions. In machine learning, however, one does not have direct access to the underlying variables of interest, nor to the causal structure and mechanisms corresponding to them. Rather, these have to be learned from observed low-level data, like the pixels of an image, which are usually high-dimensional. Having a learned causal model can then be useful for generalizing to out-of-distribution data (Scherrer et al., 2022; Ke et al., 2021), estimating the effect of interventions (Pearl, 2009; Schölkopf et al., 2021), disentangling underlying factors of variation (Bengio et al., 2012; Wang and Jordan, 2021), and transfer learning (Schoelkopf et al., 2012; Bengio et al., 2019). Structure learning (Spirtes et al., 2000; Zheng et al., 2018) learns the structure and parameters of the Structural Causal Model (SCM) (Pearl, 2009) that best explain some observed high-level causal variables. In causal machine learning and representation learning, however, these causal variables may no longer be observable. This serves as the motivation for our work. We address the problem of learning the entire SCM -- consisting of its causal variables, structure, and parameters -- which is latent, by learning to generate observed low-level data. Since one often operates in low-data regimes or non-identifiable settings, we adopt a Bayesian formulation so as to quantify epistemic uncertainty over the learned latent SCM. Given a dataset, we use variational inference to learn a joint posterior over the causal variables, structure, and parameters of the latent SCM.
To the best of our knowledge, ours is the first work to address the problem of Bayesian causal discovery in linear Gaussian latent SCMs from low-level data, where the causal variables are unobserved. Our contributions are as follows:

• We propose a general algorithm for Bayesian causal discovery in the latent space of a generative model, learning a distribution over the causal variables, structure, and parameters of linear Gaussian latent SCMs with random, known interventions. Figure 1 illustrates an overview of the proposed method.

• By learning the structure and parameters of a latent SCM, we implicitly induce a joint distribution over the causal variables. Hence, sampling from this distribution is equivalent to ancestral sampling through the latent SCM. As such, we address a challenging, simultaneous inference problem over the causal variables, structure, and parameters.

2. BACKGROUND

2.1. STRUCTURAL CAUSAL MODELS

A Structural Causal Model (SCM) is defined by a set of equations which represent the mechanisms by which each endogenous variable Z_i depends on its direct causes Z_{pa_G(i)} and a corresponding exogenous noise variable ϵ_i. The direct causes are subsets of the other endogenous variables. If the causal parent assignment is assumed to be acyclic, then an SCM is associated with a Directed Acyclic Graph (DAG) G = (V, E), where V corresponds to the endogenous variables and E encodes direct cause-effect relationships. The exact value z_i taken on by a causal variable Z_i is given by a local causal mechanism f_i, conditional on the values of its parents z_{pa_G(i)}, the parameters Θ_i, and the node's noise variable ϵ_i, as given in equation 1. For linear Gaussian additive noise SCMs with equal noise variance, i.e., the setting that we focus on in this work, all f_i's are linear functions, and Θ denotes the weighted adjacency matrix W, where each W_ji is the edge weight of j → i. The linear Gaussian additive noise SCM thus reduces to equation 2:

z_i = f_i(z_{pa_G(i)}, Θ_i, ϵ_i) ,   (1)

z_i = Σ_{j ∈ pa_G(i)} W_ji · z_j + ϵ_i .   (2)
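To make equation 2 concrete, the following is a minimal sketch of ancestral sampling through a linear Gaussian additive-noise SCM with equal noise variance. Stacking equation 2 over all nodes gives z = zW + ϵ in row-vector form, so z = ϵ(I − W)^{-1}; the inverse exists because W has DAG support. The function name and the example chain graph are illustrative, not from the paper.

```python
import numpy as np

def ancestral_sample(W, sigma, n_samples, rng=None):
    """Draw samples from a linear Gaussian additive-noise SCM.

    W[j, i] is the edge weight j -> i of a weighted adjacency matrix
    whose support is a DAG; sigma is the (equal) noise standard
    deviation shared by all nodes, as in the equal-variance setting.
    """
    rng = np.random.default_rng(rng)
    d = W.shape[0]
    # Exogenous noise for every node and every sample.
    eps = rng.normal(0.0, sigma, size=(n_samples, d))
    # z_i = sum_j W_ji z_j + eps_i  =>  z = z W + eps  =>  z = eps (I - W)^{-1}.
    # (I - W) is invertible since W is nilpotent under a topological order.
    return eps @ np.linalg.inv(np.eye(d) - W)

# Example: chain SCM z1 -> z2 -> z3 with weights 2.0 and -1.0.
W = np.zeros((3, 3))
W[0, 1] = 2.0
W[1, 2] = -1.0
z = ancestral_sample(W, sigma=0.1, n_samples=5000, rng=0)
```

Here Var(z2) = 2^2 · σ^2 + σ^2 = 0.05, which the sample variance of `z[:, 1]` should approximate.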

2.2. CAUSAL DISCOVERY

Structure learning in prior work refers to learning a DAG according to some optimization criterion, with or without a notion of causality (e.g., He et al. (2019)). The task of causal discovery, on the other hand, is more specific in that it refers to learning the structure (and, in some cases, the parameters) of SCMs, and adopts the notions of causality and interventions of Pearl (2009). That is, these methods aim to estimate (G, Θ). Such approaches often resort to modular likelihood scores over causal variables -- like the BGe score (Geiger and Heckerman, 1994; Kuipers et al., 2022) and BDe
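The modularity of such scores means the score of a DAG decomposes into a sum of per-node terms, each depending only on a node and its parents. As a simplified illustration (not the BGe or BDe score themselves, which additionally integrate out the parameters), the sketch below scores a candidate DAG by the maximized Gaussian log-likelihood of each node regressed on its parent set; all function names are illustrative.

```python
import numpy as np

def node_log_likelihood(Z, i, parents):
    """Max Gaussian log-likelihood of node i given its parents,
    using MLE regression weights and MLE noise variance."""
    n = Z.shape[0]
    y = Z[:, i]
    if parents:
        X = Z[:, parents]
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
    else:
        resid = y
    var = resid.var()
    # Log-likelihood of N(0, var) residuals at the MLE variance.
    return -0.5 * n * (np.log(2 * np.pi * var) + 1.0)

def dag_score(Z, parent_sets):
    """Modular score: a sum of independent per-node terms."""
    return sum(node_log_likelihood(Z, i, pa)
               for i, pa in enumerate(parent_sets))
```

Because the score is a sum over nodes, changing one node's parent set only requires recomputing that node's term, which is what makes score-based search over DAGs tractable.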



Figure 1: Model architecture of the proposed generative model for the Bayesian latent causal discovery task to learn latent SCM from low-level data.

