DEPENDENCY STRUCTURE DISCOVERY FROM INTERVENTIONS

Abstract

Promising results have driven a recent surge of interest in continuous optimization methods for Bayesian network structure learning from observational data. However, there are theoretical limitations on the identifiability of underlying structures obtained from observational data alone. Interventional data provide much richer information about the underlying data-generating process, but the extension of methods designed for observational data to include interventions is not straightforward and remains an open problem. In this paper we provide a general framework based on continuous optimization and neural networks to create models for the combination of observational and interventional data. The proposed method is applicable even in the challenging and realistic case where the identity of the intervened-upon variable is unknown. We examine the proposed method in the setting of graph recovery both de novo and from a partially known edge set. We establish strong benchmark results on several structure learning tasks, including structure recovery of synthetic graphs as well as standard graphs from the Bayesian Network Repository.

1. INTRODUCTION

Structure learning concerns the recovery of the graph structure of Bayesian networks (BNs) from data samples. A natural application of Bayesian networks is to describe cause-effect relationships between variables; in that context, one may speak of causal structure learning. Causal structure learning is challenging because purely observational data may be satisfactorily explained by multiple Bayesian networks (a Markov equivalence class), but only one is the most robust to distributional shifts: the one with the correct graph. A more powerful tool than BNs is thus needed to model causal relationships.

Structural Causal Models (SCMs) are that tool. An SCM over a set of random variables is a collection of assignments to these variables and a directed acyclic graph of dependencies between them (Peters et al., 2017, §6.2). Each assignment is a function of only the direct causes of a variable, plus an independent noise source. An SCM entails precisely one (observational) data distribution. Interventions on an SCM's assignments, such as setting a random variable to a fixed value (a hard intervention), entail new interventional data distributions (Peters et al., 2017, §6.3). SCMs can be used to answer higher-order questions of cause and effect, up the ladder of causation (Pearl & Mackenzie, 2018). Causal structure learning using SCMs has been attempted in several disciplines, including biology (Sachs et al., 2005; Hill et al., 2016), weather forecasting (Abramson et al., 1996) and medicine (Lauritzen & Spiegelhalter, 1988; Korb & Nicholson, 2010).

Causal structure is most frequently learned from data drawn from observational distributions, but structure learning methods generally cannot do more than identify the causal graph up to a Markov equivalence class (Spirtes et al., 2000).
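To make these notions concrete, the following sketch performs ancestral sampling from a toy two-variable linear-Gaussian SCM X → Y, then samples under a hard intervention do(Y := 2) and under a soft intervention that replaces Y's conditional distribution. The SCM, its coefficients, and all names are illustrative choices of ours, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_scm(n, intervention=None):
    """Ancestral sampling from the toy SCM  X := N_x,  Y := 2*X + N_y."""
    x = rng.normal(0.0, 1.0, size=n)                 # X := N_x
    if intervention == "hard":
        y = np.full(n, 2.0)                          # do(Y := 2): X -> Y edge severed
    elif intervention == "soft":
        y = 0.5 * x + rng.normal(3.0, 1.0, size=n)   # Y's conditional replaced
    else:
        y = 2.0 * x + rng.normal(0.0, 1.0, size=n)   # Y := 2*X + N_y
    return x, y

x_obs, y_obs = sample_scm(100_000)
x_hard, y_hard = sample_scm(100_000, "hard")

# Intervening on the effect Y leaves the cause X untouched:
print(abs(np.corrcoef(x_obs, y_obs)[0, 1]) > 0.5)  # observationally, X and Y correlate strongly
print(abs(np.mean(x_hard)) < 0.05)                 # under do(Y := 2), X is still ~ N(0, 1)
```

Intervening on the effect Y leaves the marginal distribution of the cause X unchanged, whereas under the reversed graph Y → X it would not; this asymmetry is exactly the extra information that interventional data provide beyond a Markov equivalence class.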
In order to fully identify the true causal graph, a method must either make restrictive assumptions about the underlying data-generating process, such as linear but non-Gaussian data (Shimizu et al., 2006), or must access enough data from outside the observational distribution (i.e., from interventions). Under certain assumptions about the number, diversity, and nature of the interventions, the true underlying causal graph is always identifiable, given that the method knows which intervention was performed (Heckerman et al., 1995). Much of the prior work on causal model induction assumes an experimenter who performs known interventions. In the real world, however, interventions can also be performed by other agents, which leads to unknown interventions (interventions with unknown target variables). A few works have attempted to learn structures from unknown-intervention data (Eaton & Murphy, 2007a; Squires et al., 2020; Huang et al., 2020). A notable such work (Mooij et al., 2016) has been extended in (Kocaoglu et al., 2019; Jaber et al., 2020). Although there is no theoretical guarantee that the true causal graph can be identified in that setting, evidence so far points to that still being the case.

Another common setting is when the graph structure is partially provided but must be completed. An example is protein structure learning in biology, where we may have definitive knowledge of some causal edges in the protein-protein interactome, but the remaining causal edges must be discovered. We call this setting "partial graph completion". It is an easier task than learning the entire graph, since it limits the number of edges that must be learned.

Figure 1: In many areas of science, such as biology, we try to infer the underlying mechanisms and structure through experiments. We can obtain observational data plus interventional data through known interventions (e.g. by targeting a certain variable) or unknown interventions (e.g. when it is unclear where the effect of the intervention will be). Knowledge of existing edges, e.g. through previous experiments, can likewise be included and be considered a special case of causal induction.

Recently, a flurry of work on structure learning using continuous optimization methods has appeared (Zheng et al., 2018; Yu et al., 2019). These methods operate on observational data and are competitive with other methods. Because of the theoretical limitations on identification from purely observational data cited above, it would be interesting to extend these methods to interventional data. However, it is not straightforward to apply continuous optimization methods to structure learning from interventional data. Our key contributions are to answer the following questions experimentally:

1. Can the proposed model recover the true causal structure? Yes; see Figure 4.
2. How does the proposed model compare against state-of-the-art causal methods on real-world datasets? Favourably; see §5.4 and Table 1.
3. Does the proposed model generalize well to unseen interventions? Yes; see §5.5.
4. How does the proposed model perform on partial graph recovery? It scales to ∼50 variables while the other baselines cannot; see §5.7.

2. PRELIMINARIES

Causal modeling. A Structural Causal Model (SCM) (Peters et al., 2017) over a finite number M of random variables X_i is a set of structural assignments

    X_i := f_i(X_pa(i,C), N_i),   ∀i ∈ {0, . . . , M−1},

where pa(i, C) denotes the direct causes (parents) of variable i under graph configuration C, and the N_i are independent noise variables.

Identifiability. In a purely observational setting, it is known that causal graphs can be distinguished only up to a Markov equivalence class. In order to identify the true causal graph structure, interventional data is needed (Eberhardt et al., 2012).

Interventions. There are several common types of intervention which may be available (Eaton & Murphy, 2007b). These are: No intervention: only observational data is obtained from the ground-truth model. Hard/perfect: the value of one or several variables is fixed, and ancestral sampling is then performed on the other variables. Soft/imperfect: the conditional distribution of the variable on which the intervention is performed is changed. Uncertain: the learner is not sure which variable the intervention directly affected. Here we make use of soft interventions because they include hard interventions as a limiting case and hence are more general.

Structure discovery using continuous optimization. Structure discovery is a super-exponential search problem over all possible directed acyclic graphs (DAGs): with M variables, the strategy of considering all possible structural graphs as separate hypotheses is infeasible, since it would require maintaining O(2^(M^2)) models of the data. Previous continuous-optimization structure learning works (Zheng et al., 2018; Yu et al., 2019; Lachapelle et al., 2019) mitigate the problem of searching in this super-exponential set of graph structures by treating the degree to which a hypothesis graph violates "DAG-ness" as an additional penalty to be optimized.
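The "DAG-ness" penalty can be made concrete. A standard choice, introduced by Zheng et al. (2018) for NOTEARS, scores a weighted adjacency matrix A by h(A) = tr(exp(A ∘ A)) − M, which is zero exactly when A describes an acyclic graph. A minimal pure-NumPy sketch (the truncated power series standing in for `scipy.linalg.expm`, and all names, are our own illustration, not the paper's implementation):

```python
import numpy as np

def dagness(A, terms=30):
    """NOTEARS-style acyclicity penalty  h(A) = tr(exp(A * A)) - M.

    h(A) == 0 iff the weighted adjacency matrix A describes a DAG, and it
    grows as cycles appear.  A * A is the elementwise square, which keeps
    the penalty differentiable and nonnegative.  The matrix exponential is
    approximated by a truncated power series; for a DAG the series is
    exact, since A * A is then nilpotent.
    """
    M = A.shape[0]
    B = A * A
    E = np.eye(M)          # running sum of the series for exp(B)
    term = np.eye(M)
    for k in range(1, terms):
        term = term @ B / k
        E = E + term
    return float(np.trace(E) - M)

acyclic = np.array([[0., 1., 1.],
                    [0., 0., 1.],
                    [0., 0., 0.]])   # the DAG 0 -> 1 -> 2 with 0 -> 2
cyclic = acyclic.copy()
cyclic[2, 0] = 1.0                   # adding 2 -> 0 creates cycles

print(dagness(acyclic))  # 0 for a DAG
print(dagness(cyclic))   # positive once cycles exist
```

In the continuous-optimization methods cited, h(A) is added to the training objective (typically via an augmented Lagrangian), steering the learned adjacency matrix toward acyclicity without ever enumerating the super-exponential set of candidate graphs.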

