SCORE-BASED CAUSAL DISCOVERY FROM HETEROGENEOUS DATA

Abstract

Causal discovery has witnessed significant progress over the past decades. Most causal discovery algorithms consider a single domain with a fixed distribution. In practice, however, it is common to encounter heterogeneous data, i.e., data from different domains with distribution shifts. Applying existing methods to such heterogeneous data may lead to spurious edges or incorrect directions in the learned graph. In this paper, we develop a novel score-based approach for causal discovery from heterogeneous data. Specifically, we propose a Multiple-Domain Score Search (MDSS) algorithm, which is guaranteed to find the correct graph skeleton asymptotically. Furthermore, by exploiting distribution shifts, MDSS can detect more causal directions than previous algorithms designed for single-domain data. The proposed MDSS can be readily incorporated into off-the-shelf search strategies, such as greedy search and policy-gradient-based search. Theoretical analyses and extensive experiments on both synthetic and real data demonstrate the efficacy of our method.

1. INTRODUCTION

Discovering causal relations among variables is a fundamental problem in various fields such as economics, biology, drug testing, and commercial decision making. Because conducting randomized controlled trials is usually expensive or even infeasible, discovering causal relations from observational data, i.e., causal discovery (Pearl, 2000; Spirtes et al., 2000), has received much attention over the past few decades. Early causal discovery algorithms can be roughly categorized into two types: constraint-based ones (e.g., PC (Spirtes et al., 2000)) and score-based ones (e.g., GES (Chickering, 2002)). In general, these methods cannot uniquely identify the causal graph but are guaranteed to output a Markov equivalence class. Since the seminal work by Shimizu et al. (2006), several methods have been developed that achieve identifiability of the whole causal structure by making use of constrained Functional Causal Models (FCMs), including the linear non-Gaussian model (Shimizu et al., 2006), the nonlinear additive noise model (Hoyer et al., 2009), and the post-nonlinear model (Zhang & Hyvärinen, 2009). Recently, Zheng et al. (2018) proposed a score-based method that formulates the causal discovery problem as continuous optimization with a structural constraint that ensures acyclicity. Based on this continuous structural constraint, several researchers further proposed to model the causal relations with neural networks (NNs) (Lachapelle et al., 2019; Yu et al., 2019; Zheng et al., 2019). Another recent work (Zhu & Chen, 2019) used reinforcement learning (RL) for causal discovery, where the RL agent searches over the graph space and outputs the graph that best fits the data. The above approaches are designed for data from a single domain with a fixed causal model, with the limitation that many of the edge directions cannot be determined without strong functional constraints.
In addition, the sample size of data from one domain is usually not large enough to guarantee small statistical estimation errors. One way to improve statistical reliability is to combine datasets from multiple domains, as in P-value meta-analyses (Lee, 2015; Marot et al., 2009). The idea of combining multiple-domain data is also commonly seen in learning with mixtures of Bayesian networks (Thiesson et al., 1998). While mixtures of Bayesian networks are usually used for density estimation, the purpose of causal analysis from multiple-domain data is completely different: it aims at discovering the underlying causal graphs for all domains. A key challenge for causal analysis from multiple-domain data is data heterogeneity: the data distribution may vary across domains. For example, in fMRI hippocampus signal analysis, the connection strength among different brain regions may change across different subjects (domains). Due to the distribution shift, directly pooling the data from multiple domains may lead to spurious edges. To tackle this issue, different approaches have been investigated, including sliding windows (Calhoun et al., 2014), online change point detection (Adams & MacKay, 2007), online undirected graph learning (Talih & Hengartner, 2005), the locally stationary structure tracker (Kummerfeld & Danks, 2013), and regime-aware learning (Bendtsen, 2016). However, these methods may suffer from high estimation variance due to sample scarcity, large type II errors, and a large number of statistical tests. Huang et al. (2015) recover causal relations with changing modules by making use of certain types of smoothness of the change, but do not explicitly locate the changing causal modules. Other similar methods, including Xing et al. (2010) and Song et al. (2009), can be reduced to online parameter learning because the causal directions are given.
By utilizing the invariance property (Hoover, 1990; Tian & Pearl, 2001; Peters et al., 2016) and the more general independent change mechanism (Pearl, 2000), Ghassami et al. (2018) recently developed two methods, identical boundaries (IB) and minimal changes (MC), for causal discovery from multi-domain data. However, the proposed methods 1) assume causal sufficiency (i.e., all common causes of variables are measured), which usually does not hold in real circumstances, 2) are designed for linear systems only, and 3) are not capable of identifying causal directions from more than ten domains. Huang et al. (2019) proposed a more general approach called CD-NOD for both linear and nonlinear heterogeneous data, which extends the PC algorithm to tackle the heterogeneity issue. However, inheriting the drawbacks of constraint-based methods, CD-NOD involves a multiple testing problem and is time-consuming due to the large number of independence tests. To overcome the limitations of existing works, we propose a Multiple-Domain Score Search (MDSS) method for causal discovery from heterogeneous data, which enjoys the following properties. (1) To avoid spurious edges when combining multi-domain data, MDSS searches over the space of augmented graphs, which include an additional domain index as a surrogate variable to characterize the distribution shift. (2) The changing causal modules can be immediately identified from the recovered augmented graph. (3) Benefiting from causal invariance and the independent change mechanism, MDSS uses a novel Multiple-Domain Score (MDS) to help identify more causal directions beyond those in the Markov equivalence class from distribution-shifted data. (4) MDSS can be readily incorporated into off-the-shelf search strategies, is time-efficient, and is applicable to both linear and nonlinear data. (5) Theoretically, we show that MDSS is guaranteed to find the correct graph skeleton asymptotically, and that it further identifies more causal directions than traditional score-based and constraint-based algorithms. Empirical studies on both synthetic and real data demonstrate the efficacy of our method.
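
To make the pooling issue concrete, the following is a minimal illustrative sketch and not part of MDSS itself. It assumes a hypothetical toy setup with two domains and two variables that are independent within each domain, while both of their means shift with the domain index C. All names (`simulate_two_domains`, `corr_given_domain`) and parameter values are assumptions chosen for illustration. Pooling the domains produces a strong correlation that would show up as a spurious edge, whereas accounting for the domain index, here by removing per-domain means, makes it vanish.

```python
import numpy as np


def simulate_two_domains(n_per_domain=5000, shift=3.0, seed=0):
    """Toy data (hypothetical): X and Y are independent within each domain,
    but the means of both depend on the domain index C."""
    rng = np.random.default_rng(seed)
    c = np.repeat([0, 1], n_per_domain)        # domain index
    x = shift * c + rng.normal(size=c.size)    # module of X changes with C
    y = shift * c + rng.normal(size=c.size)    # module of Y changes with C
    return c, x, y


def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


def corr_given_domain(c, a, b):
    """Correlation after subtracting per-domain means, a simple way of
    conditioning on the domain index in this toy setup."""
    a_res = a.astype(float).copy()
    b_res = b.astype(float).copy()
    for k in np.unique(c):
        mask = c == k
        a_res[mask] -= a_res[mask].mean()
        b_res[mask] -= b_res[mask].mean()
    return corr(a_res, b_res)


c, x, y = simulate_two_domains()
pooled = corr(x, y)                        # ignores the domain index
conditional = corr_given_domain(c, x, y)   # accounts for the domain index
print(f"pooled corr: {pooled:.3f}, corr given C: {conditional:.3f}")
```

In this sketch the domain index acts as a common cause of X and Y, which is the intuition behind treating it as a surrogate variable in an augmented graph.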

2. METHODOLOGY

In this section, we start with a brief introduction to causal discovery and distribution shifts (Section 2.1), and then introduce our proposed Multiple-Domain Score Search (MDSS) in Sections 2.2 and 2.3. In Section 2.2, MDSS starts with a predefined graph search algorithm to learn the skeleton of the causal graph, using the linear Bayesian information criterion (BIC) score or the nonlinear generalized score (GS; Huang et al., 2018) on the augmented causal system. Then, in Section 2.3, MDSS further identifies causal directions with the Multiple-Domain Score (MDS), based on the skeleton identified in Section 2.2. Both theoretically and empirically, we show that MDSS can identify more directions than algorithms designed for i.i.d. or stationary data.
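
As a rough illustration of scoring on an augmented system, the sketch below uses the standard linear-Gaussian BIC local score. It is not the exact score defined in this paper. It assumes that the domain index C is encoded with one-hot dummy columns (an assumption made here for simplicity), and the function name `linear_bic_local` and the simulated chain V1 -> V2 are hypothetical. The module of V1 changes across domains while the module of V2 given V1 is invariant, so adding C as a parent should raise the score of V1 but not that of V2.

```python
import numpy as np


def linear_bic_local(target, parents):
    """Linear-Gaussian BIC local score (higher is better) of `target`
    regressed on the columns of `parents` (shape (T, p), p may be 0)
    plus an intercept."""
    T = target.shape[0]
    design = np.column_stack([np.ones(T), parents])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    rss = float(np.sum((target - design @ coef) ** 2))
    k = design.shape[1]                      # number of fitted coefficients
    return -0.5 * T * np.log(rss / T) - 0.5 * k * np.log(T)


rng = np.random.default_rng(0)
n_domains, n_per_domain = 3, 2000
c = np.repeat(np.arange(n_domains), n_per_domain)
# One-hot encoding of the domain index, with domain 0 as the baseline.
c_dummies = (c[:, None] == np.arange(1, n_domains)[None, :]).astype(float)

domain_means = np.array([0.0, 2.0, -1.0])
v1 = domain_means[c] + rng.normal(size=c.size)   # module of V1 changes with C
v2 = 1.5 * v1 + rng.normal(size=c.size)          # module of V2 | V1 is invariant

no_parents = np.empty((c.size, 0))
# Score gain from adding C as a parent of V1 (changing module).
gain_v1 = linear_bic_local(v1, c_dummies) - linear_bic_local(v1, no_parents)
# Score gain from adding C as a parent of V2 on top of V1 (invariant module).
gain_v2 = (linear_bic_local(v2, np.column_stack([v1, c_dummies]))
           - linear_bic_local(v2, v1[:, None]))
print(f"gain for C -> V1: {gain_v1:.1f}, gain for C -> V2: {gain_v2:.1f}")
```

A positive gain for V1 and a negative gain for V2 would indicate that, under this toy score, only the module of V1 is flagged as changing across domains.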

2.1. BACKGROUND IN CAUSAL DISCOVERY AND DISTRIBUTION SHIFTS

The basic causal discovery problem can be formulated as follows. Suppose there are $d$ observable random variables, i.e., $V = (V_1, \ldots, V_d)$. Each random variable satisfies the following generating process: $V_i = f_i(PA_i, \epsilon_i)$, where $f_i$ is a function that models the causal relation between $V_i$ and its parents $PA_i$, and $\epsilon_i$ is a noise variable with non-zero variance. All the noise variables are independent of each other. The task of causal discovery is to recover the causal adjacency matrix $B$ given the observed data matrix $X \in \mathbb{R}^{T \times d}$, where $B_{ij} = 1$ indicates that $V_i$ is a parent of $V_j$, and $T$ is the sample size. We denote the underlying causal graph over $V$ as $G_0$. For each $V_i$, we call $P(V_i \mid PA_i)$ its causal module. For a single domain, the joint probability can be factorized as $P(V) = \prod_{i=1}^{d} P(V_i \mid PA_i)$. Suppose there are $n$ domains with distribution shifts (i.e., $P(V)$ changes across domains), which implies that some causal modules change across domains. The changes may be caused by variation in the functional models, causal strengths, or noise variances. Furthermore, we have the following assumptions. Assumption 1. The changes of causal modules can be represented as functions of domain index $C$, denoted by $g(C)$,

