DEPENDENCY STRUCTURE DISCOVERY FROM INTERVENTIONS

Abstract

Promising results have driven a recent surge of interest in continuous optimization methods for Bayesian network structure learning from observational data. However, there are theoretical limitations on the identifiability of underlying structures obtained from observational data alone. Interventional data provides much richer information about the underlying data-generating process. However, extending methods designed for observational data to incorporate interventions is not straightforward and remains an open problem. In this paper we provide a general framework based on continuous optimization and neural networks to create models from a combination of observational and interventional data. The proposed method is applicable even in the challenging and realistic case where the identity of the intervened-upon variable is unknown. We examine the proposed method in the setting of graph recovery both de novo and from a partially known edge set. We establish strong benchmark results on several structure learning tasks, including structure recovery of both synthetic graphs and standard graphs from the Bayesian Network Repository.

1. INTRODUCTION

Structure learning concerns itself with the recovery of the graph structure of Bayesian networks (BNs) from data samples. A natural application of Bayesian networks is to describe cause-effect relationships between variables. In that context, one may speak of causal structure learning. Causal structure learning is challenging because purely observational data may be satisfactorily explained by multiple Bayesian networks (a Markov equivalence class), but only one is the most robust to distributional shifts: the one with the correct graph. A more powerful tool than BNs is thus needed to model causal relationships. Structural Causal Models (SCMs) are that tool. An SCM over a set of random variables is a collection of assignments to these variables and a directed acyclic graph of dependencies between them (Peters et al., 2017, §6.2). Each assignment is a function of only the direct causes of a variable, plus an independent noise source. An SCM entails precisely one (observational) data distribution. Interventions on an SCM's assignments, such as setting a random variable to a fixed value (a hard intervention), entail new interventional data distributions (Peters et al., 2017, §6.3). SCMs can be used to answer higher-order questions of cause-and-effect, up the ladder of causation (Pearl & Mackenzie, 2018).

Causal structure learning using SCMs has been attempted in several disciplines including biology (Sachs et al., 2005; Hill et al., 2016), weather forecasting (Abramson et al., 1996) and medicine (Lauritzen & Spiegelhalter, 1988; Korb & Nicholson, 2010). Causal structure is most frequently learned from data drawn from observational distributions. Structure learning methods generally cannot do more than identify the causal graph up to a Markov equivalence class (Spirtes et al., 2000).
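To make the SCM definition above concrete, the following sketch simulates a toy two-variable SCM with graph X → Y and applies a hard intervention do(X = 3). The specific assignments (linear mechanism, Gaussian noise, the coefficient 2) are illustrative assumptions for this example only, not a model from the paper; they serve to show that a hard intervention replaces one variable's assignment while leaving the other mechanisms untouched.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_scm(n, do_x=None):
    """Sample from a toy SCM with causal graph X -> Y.

    Illustrative assignments (assumed for this example):
        X := N_x,           N_x ~ Normal(0, 1)
        Y := 2*X + N_y,     N_y ~ Normal(0, 1)

    A hard intervention do(X = do_x) replaces X's assignment with
    the constant do_x; Y's mechanism is left unchanged.
    """
    if do_x is None:
        x = rng.normal(0.0, 1.0, n)          # observational: X follows its own mechanism
    else:
        x = np.full(n, float(do_x))          # interventional: X is clamped to do_x
    y = 2.0 * x + rng.normal(0.0, 1.0, n)    # Y's mechanism is untouched either way
    return x, y

# Observational distribution entailed by the SCM.
x_obs, y_obs = sample_scm(100_000)

# Interventional distribution under do(X = 3): Y's mean shifts to ~6,
# reflecting the causal direction X -> Y.
x_int, y_int = sample_scm(100_000, do_x=3.0)
```

Note that intervening on Y instead would leave X's distribution unchanged; this asymmetry is exactly what lets interventional data distinguish graphs (such as X → Y versus Y → X) that observational data alone cannot separate within a Markov equivalence class.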
In order to fully identify the true causal graph, a method must either make restrictive assumptions about the underlying data-generating process, such as linear but non-Gaussian data (Shimizu et al., 2006), or must access enough data from outside the observational distribution (i.e., from interventions). Under certain assumptions about the number, diversity, and nature of the interventions, the true underlying causal graph is always identifiable, given that the method knows the intervention performed (Heckerman et al., 1995). In much of the prior work on causal model induction it is assumed that

