TRUST YOUR ∇: GRADIENT-BASED INTERVENTION TARGETING FOR CAUSAL DISCOVERY

Abstract

Inferring causal structure from data is a challenging task of fundamental importance in science. Observational data are often insufficient to identify a system's causal structure uniquely. While conducting interventions (i.e., experiments) can improve identifiability, such samples are usually difficult and expensive to obtain. Hence, experimental design approaches for causal discovery aim to minimize the number of interventions by estimating the most informative intervention target. In this work, we propose a novel Gradient-based Intervention Targeting method, abbreviated GIT, that 'trusts' the gradient estimator of a gradient-based causal discovery framework to provide signals for the intervention acquisition function. We provide extensive experiments on simulated and real-world datasets and demonstrate that GIT performs on par with competitive baselines, surpassing them in the low-data regime.

[Figure 1 panels: Gradient-based Causal Discovery → GIT Score → Intervention Acquisition]
Figure 1: Overview of GIT's usage in a gradient-based causal discovery framework. The framework infers a posterior distribution over graphs from observational and interventional data (denoted D_obs and D_int) through gradient-based optimization. The distribution over graphs and the gradient estimator ∇L(·) are then used by GIT to score the intervention targets based on the magnitude of the estimated gradients. The intervention target with the highest score is selected, and the corresponding intervention is performed. New interventional data D_int^new are then collected and the procedure is repeated.

Estimating causal structure from data, commonly known as causal discovery or causal structure learning, is central to the progress of science (Pearl, 2009). Methods for causal discovery have been successfully deployed in various fields, such as biology (Sachs et al., 2005; Triantafillou et al., 2017; Glymour et al., 2019), medicine (Shen et al., 2020; Castro et al., 2020; Wu et al., 2022), earth system science (Ebert-Uphoff & Deng, 2012), and neuroscience (Sanchez-Romero et al., 2019). In general, real-world systems can often be explained as a modular composition of smaller parts connected by causal relationships. Knowing the underlying structure is crucial for making robust predictions about the system after a perturbation (or treatment) is applied (Peters et al., 2016). Moreover, such modular decompositions have been shown to enable sample-efficient learning and fast adaptation to distribution shifts by updating only a subset of parameters (Bengio et al., 2019; Scherrer et al., 2022). To identify a system's causal structure uniquely, observational data (i.e., data obtained directly from the system, without interference) are, in general, insufficient and only allow recovery of the causal structure up to the Markov Equivalence Class (MEC) (Spirtes et al., 2000a; Peters et al., 2017).
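This identifiability limit can be illustrated with a small simulation (a hypothetical bivariate linear-Gaussian example; the variable names and coefficients are illustrative, not from the paper): the two Markov-equivalent graphs X → Y and Y → X achieve the same Gaussian log-likelihood on observational data, while an intervention on X tells them apart.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Ground-truth SCM: X -> Y, linear Gaussian (coefficient 2.0 is illustrative).
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)

def gaussian_ll(residuals):
    """Maximized Gaussian log-likelihood of residuals (MLE mean and variance)."""
    s2 = residuals.var()
    return -0.5 * len(residuals) * (np.log(2 * np.pi * s2) + 1.0)

# Least-squares slopes for the two Markov-equivalent factorizations.
cov_xy = np.cov(x, y, bias=True)[0, 1]
b_yx = cov_xy / x.var()   # regression of y on x, for graph X -> Y
b_xy = cov_xy / y.var()   # regression of x on y, for graph Y -> X

ll_x_to_y = gaussian_ll(x) + gaussian_ll(y - b_yx * x)  # p(x) p(y|x)
ll_y_to_x = gaussian_ll(y) + gaussian_ll(x - b_xy * y)  # p(y) p(x|y)
# The two log-likelihoods coincide (up to floating point): the graphs are
# observationally indistinguishable, i.e., they lie in the same MEC.

# An intervention do(X = 3) breaks the tie: under X -> Y, the mean of Y
# shifts to roughly 2 * 3 = 6, whereas under Y -> X it would stay near 0.
y_do = 2.0 * 3.0 + rng.normal(size=n)
print(ll_x_to_y, ll_y_to_x, y_do.mean())
```

Experimental-design methods such as GIT address exactly this: choosing which variable to intervene on so that each (expensive) experiment removes as much of the remaining ambiguity as possible.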
Such a class contains multiple graphs that explain the observational data equally well. To overcome the

