CAUSAL REASONING IN THE PRESENCE OF LATENT CONFOUNDERS VIA NEURAL ADMG LEARNING

Abstract

Latent confounding has been a long-standing obstacle for causal reasoning from observational data. One popular approach is to model the data using acyclic directed mixed graphs (ADMGs), which describe ancestral relations between variables using directed and bidirected edges. However, existing methods using ADMGs are based on either linear functional assumptions or a discrete search that is complicated to use and lacks computational tractability for large datasets. In this work, we extend the existing body of work and develop a novel gradient-based approach to learning an ADMG with non-linear functional relations from observational data. We first show that the presence of latent confounding is identifiable under the assumptions of bow-free ADMGs with non-linear additive noise models. With this insight, we propose a novel neural causal model based on autoregressive flows for ADMG learning. This not only enables us to determine complex causal structural relationships behind the data in the presence of latent confounding, but also to estimate their functional relationships (and hence treatment effects) simultaneously. We further validate our approach via experiments on both synthetic and real-world datasets, and demonstrate competitive performance against relevant baselines.

1. INTRODUCTION

Learning causal relationships and estimating treatment effects from observational studies is a fundamental problem in causal machine learning, and has important applications in many areas of the social and natural sciences (Pearl, 2010; Spirtes, 2010). Such methods enable us to answer questions of a causal nature; for example, what is the effect on the expected lifespan of a patient if we increase the dose of drug X? However, many existing methods of causal discovery and inference overwhelmingly rely on the assumption that all relevant variables are observed. This assumption is often untenable in practice. Indeed, an important, yet often overlooked, form of causal relationship is latent confounding, that is, when two variables have an unobserved common cause (Verma & Pearl, 1990). If not properly accounted for, latent confounding can lead to incorrect estimates of the causal quantities of interest (Pearl, 2009). Traditional causal discovery methods that account for latent confounding, such as the fast causal inference (FCI) algorithm (Spirtes et al., 2000) and its extensions (Colombo et al., 2012; Claassen et al., 2013; Chen et al., 2021), rely on uncovering an equivalence class of acyclic directed mixed graphs (ADMGs) that share the same conditional independencies. Without additional assumptions, however, these methods might return uninformative results, as they cannot distinguish between members of the same Markov equivalence class (Bellot & van der Schaar, 2021). More recently, causal discovery methods based on structural causal models (SCMs) (Pearl, 1998) have been developed for the latent confounding setting (Nowzohour et al., 2017; Wang & Drton, 2020; Maeda & Shimizu, 2020; 2021; Bhattacharya et al., 2021). By assuming that the causal effects follow specific functional forms, these methods can distinguish between members of the same Markov equivalence class (Glymour et al., 2019).
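The bias that latent confounding introduces into treatment-effect estimates can be illustrated with a minimal sketch. It assumes a hypothetical linear SCM (the coefficients and the helper `ols_slopes` are illustrative choices, not part of this work) in which an unobserved U causes both a treatment X and an outcome Y. Regressing Y on X alone overstates the causal effect, whereas adjusting for U, which is possible here only because the data are simulated, recovers it.

```python
import numpy as np

# Hypothetical linear SCM in which a latent U confounds treatment X and outcome Y:
#   U ~ N(0, 1),  X = U + e_X,  Y = 2 X + 3 U + e_Y,  with e_X, e_Y ~ N(0, 1).
# The true causal effect of X on Y is therefore 2.
rng = np.random.default_rng(0)
n = 200_000
u = rng.normal(size=n)
x = u + rng.normal(size=n)
y = 2.0 * x + 3.0 * u + rng.normal(size=n)


def ols_slopes(features, target):
    """Ordinary least-squares coefficients (intercept fitted, then dropped)."""
    design = np.column_stack([np.ones(len(target)), *features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef[1:]


# Ignoring U: the slope converges to Cov(X, Y) / Var(X) = (2 * 2 + 3) / 2 = 3.5.
naive_effect = ols_slopes([x], y)[0]
# Adjusting for U (only possible because we simulated it) recovers roughly 2.
adjusted_effect = ols_slopes([x, u], y)[0]
print(f"naive: {naive_effect:.2f}, adjusted: {adjusted_effect:.2f}")
```

In real observational data U is not available for adjustment, which is why methods that model latent confounders explicitly, for example through the bidirected edges of an ADMG, are needed.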
Yet existing approaches rely on restrictive linear functional assumptions (Bhattacharya et al., 2021; Maeda & Shimizu, 2020; Bellot & van der Schaar, 2021) and/or on a search over the discrete space of causal graphs (Maeda & Shimizu, 2021), which is computationally burdensome and unintuitive to use. As a result, modeling non-linear causal relationships between variables in the presence of latent confounders in a scalable way remains an open problem. In this work, we utilize recent advances in differentiable causal discovery (Zheng et al., 2018; Bhattacharya et al., 2021) and neural causal models (Lachapelle et al., 2019; Morales-Alvarez et al., 2022; Geffner et al., 2022) to overcome these limitations. Our core contribution is to extend the framework of differentiable ADMG discovery for linear models (Bhattacharya et al., 2021) to non-linear cases using neural causal models. This enables us to build scalable and flexible methods capable of discovering non-linear, potentially confounded relationships between variables and of performing subsequent causal inference. Specifically, our contributions are:

1. Sufficient conditions for ADMG identifiability with non-linear SCMs (Section 4). We assume that: (i) the functional relationships follow a non-linear additive noise SCM; (ii) the effects of observed and latent variables do not modulate each other; and (iii) each latent variable confounds a pair of non-adjacent observed nodes. Under these assumptions, the underlying ground-truth ADMG is identifiable. This serves as a foundation for designing ADMG identification algorithms for flexible, non-linear SCMs based on deep generative models.
2. A novel gradient-based framework for learning ADMGs from observational data (Section 5). Based on our theoretical results, we propose Neural ADMG Learning (N-ADMG), a neural autoregressive-flow-based model capable of learning complex non-linear causal relationships with latent confounding. N-ADMG uses variational inference to approximate posteriors over causal graphs and latent variables, while simultaneously learning the model parameters via gradient-based optimization. This is more efficient and accurate than discrete search methods, and allows us to replace task-specific search procedures with general-purpose optimizers.
3. Empirical evaluation on synthetic and real-world datasets (Section 6). We evaluate N-ADMG on a variety of synthetic and real-world datasets, comparing its performance with a number of existing causal discovery and inference algorithms. We find that N-ADMG provides competitive or state-of-the-art results on a range of causal reasoning tasks.
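The three identifiability assumptions in contribution 1 can be illustrated with a small synthetic example. The sketch below is not the N-ADMG data generator; the graph, the functions tanh, sin and x1**2, and the helpers `is_bow_free` and `sample` are hypothetical. It encodes an ADMG as a directed adjacency matrix plus a symmetric bidirected one, checks assumption (iii) (the latent confounder only joins non-adjacent nodes, i.e. the graph is bow-free), and samples data in which every variable is a non-linear function of its observed parents plus a separate additive latent term and noise, matching assumptions (i) and (ii).

```python
import numpy as np

# Hypothetical ADMG over observed x0, x1, x2: directed edges x0 -> x1 -> x2 and a
# bidirected edge x0 <-> x2, i.e. a latent u confounding the non-adjacent pair (x0, x2).
directed = np.array([[0, 1, 0],
                     [0, 0, 1],
                     [0, 0, 0]])      # directed[i, j] == 1 means x_i -> x_j
bidirected = np.array([[0, 0, 1],
                       [0, 0, 0],
                       [1, 0, 0]])    # symmetric; 1 means x_i <-> x_j


def is_bow_free(directed, bidirected):
    """True if no pair of nodes is joined by both a directed and a bidirected edge."""
    adjacent = (directed + directed.T) > 0
    return not np.any(adjacent & (bidirected > 0))


def sample(n, rng):
    """Draw n observations from a non-linear additive noise SCM on this ADMG.

    Each variable is f(observed parents) + g(latent) + noise, so the observed and
    latent contributions enter additively and never modulate each other.
    """
    u = rng.normal(size=n)                           # latent confounder
    x0 = np.tanh(u) + 0.5 * rng.normal(size=n)
    x1 = np.tanh(2.0 * x0) + 0.5 * rng.normal(size=n)
    x2 = x1 ** 2 + np.sin(u) + 0.5 * rng.normal(size=n)
    return np.column_stack([x0, x1, x2])             # u is not returned: unobserved


data = sample(1_000, np.random.default_rng(0))
print(is_bow_free(directed, bidirected), data.shape)  # prints: True (1000, 3)
```

Adding the directed edge x0 -> x2 to this graph would create a "bow" (a directed and a bidirected edge between the same pair), and `is_bow_free` would then return False.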

2. RELATED WORK

Causal discovery with latent confounding. Constraint-based causal discovery methods in the presence of latent confounding have been well studied (Spirtes et al., 2000; Zhang, 2008; Colombo et al., 2012; Claassen et al., 2013; Chen et al., 2021). Without further assumptions, these approaches can only identify a Markov equivalence class of causal structures (Spirtes et al., 2000). When certain assumptions are made on the data generating process in the form of SCMs (Pearl, 1998), additional constraints can help identify the true causal structure. In the most general case, additional nonparametric constraints have been identified (Verma & Pearl, 1990; Shpitser et al., 2014; Evans, 2016). Further refinement is possible by assuming more restrictive SCMs. For example, in the linear Gaussian additive noise model (ANM) case, Nowzohour et al. (2017) propose a score-based approach for finding an equivalence class of bow-free acyclic path diagrams. Both Maeda & Shimizu (2020) and Wang & Drton (2020) develop approaches based on independence tests for the linear non-Gaussian ANM case, with Maeda & Shimizu (2021) extending this to more general cases.

Differentiable characterization of causal discovery. All of the aforementioned approaches search over a discrete space of causal structures, which often requires task-specific search procedures and imposes a computational burden for large-scale problems. More recently, Zheng et al. (2018) proposed a differentiable acyclicity constraint for directed acyclic graphs (DAGs), framing graph structure learning in the absence of latent confounders as a differentiable constrained optimization task. This was further generalized to the latent confounding case (Bhattacharya et al., 2021) through differentiable algebraic constraints that characterize the space of ADMGs. Nonetheless, this work is limited in that it only considers linear Gaussian ANMs.
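The idea of a differentiable characterization can be illustrated with a short sketch. It implements the acyclicity function h(W) = tr(exp(W ∘ W)) - d of Zheng et al. (2018), where W holds directed edge weights, and adds a bow penalty sum((W ∘ W) ∘ (L ∘ L)) over a symmetric matrix L of bidirected weights. The penalty is an assumed form in the spirit of the algebraic ADMG constraints of Bhattacharya et al. (2021), whose exact formulation may differ, and the function names are hypothetical.

```python
import numpy as np
from scipy.linalg import expm


def acyclicity(W):
    """NOTEARS-style constraint h(W) = tr(exp(W * W)) - d.

    W[i, j] is the weight of the directed edge x_i -> x_j. Because W * W is
    elementwise non-negative, h(W) >= 0, with equality iff the graph is acyclic.
    """
    d = W.shape[0]
    return float(np.trace(expm(W * W)) - d)


def bow_penalty(W, L):
    """Assumed bow penalty: sum((W * W) * (L * L)).

    L is a symmetric matrix of bidirected (latent-confounding) weights. The sum
    is zero iff no pair of nodes has both a directed and a bidirected edge.
    """
    return float(np.sum((W * W) * (L * L)))


def bow_free_constraint(W, L):
    """Zero iff the directed part is acyclic and the ADMG is bow-free."""
    return acyclicity(W) + bow_penalty(W, L)


# Chain x0 -> x1 -> x2 with a latent confounder x0 <-> x2 (non-adjacent pair).
W = np.array([[0.0, 1.5, 0.0],
              [0.0, 0.0, -0.7],
              [0.0, 0.0, 0.0]])
L_ok = np.zeros((3, 3))
L_ok[0, 2] = L_ok[2, 0] = 0.4
# Same graph, but now the adjacent pair x0, x1 is also confounded (a "bow").
L_bow = np.zeros((3, 3))
L_bow[0, 1] = L_bow[1, 0] = 0.4

print(bow_free_constraint(W, L_ok))   # approximately 0
print(bow_free_constraint(W, L_bow))  # approximately 0.36 = 1.5**2 * 0.4**2
```

Because both terms are non-negative and smooth in W and L, a gradient-based learner can drive their sum to zero, for instance with an augmented Lagrangian scheme as in Zheng et al. (2018), instead of enumerating candidate graphs.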

