MODEL-BASED CAUSAL BAYESIAN OPTIMIZATION

Abstract

How should we intervene on an unknown structural causal model to maximize a downstream variable of interest? This optimization of the output of a system of interconnected variables, also known as causal Bayesian optimization (CBO), has important applications in medicine, ecology, and manufacturing. Standard Bayesian optimization algorithms fail to effectively leverage the underlying causal structure. Existing CBO approaches assume noiseless measurements and do not come with guarantees. We propose model-based causal Bayesian optimization (MCBO), an algorithm that learns a full system model instead of only modeling intervention-reward pairs. MCBO propagates epistemic uncertainty about the causal mechanisms through the graph and trades off exploration and exploitation via the optimism principle. We bound its cumulative regret, and obtain the first non-asymptotic bounds for CBO. Unlike in standard Bayesian optimization, our acquisition function cannot be evaluated in closed form, so we show how the reparameterization trick can be used to apply gradient-based optimizers. Empirically we find that MCBO compares favorably with existing state-of-the-art approaches.

1. INTRODUCTION

Many applications, such as drug and material discovery, robotics, agriculture, and automated machine learning, require optimizing an unknown function that is expensive to evaluate. Bayesian optimization (BO) is an efficient framework for sequential optimization of such objectives (Močkus, 1975). The key idea in BO is to quantify uncertainty in the unknown function via a probabilistic model, and then use this to navigate a trade-off between selecting inputs where the function output is favourable (exploitation) and selecting inputs to learn more about the function in areas of uncertainty (exploration). While most standard BO methods focus on a black-box setup (Figure 1a), in practice we often have more structure on the unknown function that can be used to improve sample efficiency. In this paper, we exploit structural knowledge in the form of a causal graph specified by a directed acyclic graph (DAG). In particular, we assume that actions can be modeled as interventions on a structural causal model (SCM) (Pearl, 2009) that contains the reward (function output) as a variable (Figure 1b). While we assume the graph structure to be known, we consider the functional relations in the SCM as unknown. All variables in the SCM are observed along with the reward after each action. This causal BO (CBO) setting has important potential applications, such as optimizing medical and ecological interventions (Aglietti et al., 2020b). For illustrative purposes, consider the example of an agronomist trying to find the optimal Nitrogen fertilizer schedule for maximizing crop yield, described in Figure 1. There, the concentration of Nitrogen in the soil at each timestep causally influences its concentration at later timesteps. To exploit the causal graph structure for optimization, we propose model-based causal Bayesian optimization (MCBO). MCBO explicitly models the full SCM and the accompanying uncertainty of all SCM components.
This allows our algorithm to select interventions based on an optimistic strategy similar to that used by the upper confidence bound algorithm (Srinivas et al., 2010). We show that this strategy leads to the first CBO algorithm with a cumulative regret guarantee. For a practical algorithm, maximizing the upper confidence bound in our setting is computationally more difficult, because uncertainty in all system components must be propagated through the entire estimated SCM to the reward variable. We show that an application of the reparameterization trick allows MCBO to be practically implemented with common gradient-based optimizers. Empirically, MCBO achieves competitive performance on existing CBO benchmarks and a related setting called function network BO (Astudillo & Frazier, 2021b).
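The reparameterization step can be illustrated with a toy NumPy sketch (not the paper's implementation). Assume a hypothetical reward model mu(x) = -(x - 1)^2 reached through a noisy mechanism x = a + 0.3·ε. Fixing the noise samples ε turns the Monte Carlo estimate of the acquisition into a deterministic, differentiable function of the action a, so common gradient-based optimizers apply; a finite-difference check confirms the reparameterized gradient.

```python
import numpy as np

# Toy reparameterization sketch (hypothetical model, not the paper's code):
# the reward model is mu(x) = -(x - 1)^2, and the action a reaches it
# through a noisy mechanism x = a + 0.3 * eps with eps ~ N(0, 1).
rng = np.random.default_rng(0)
eps = rng.standard_normal(1024)  # noise sampled once, then held fixed

def acquisition(a):
    x = a + 0.3 * eps            # reparameterized mechanism samples
    return np.mean(-(x - 1.0) ** 2)

def grad_acquisition(a):
    # Exact gradient of the fixed-noise Monte Carlo estimate.
    return np.mean(-2.0 * (a + 0.3 * eps - 1.0))

# Simple gradient ascent on the action, as a gradient-based optimizer would do.
a = 0.0
for _ in range(200):
    a += 0.1 * grad_acquisition(a)

# Finite-difference check of the reparameterized gradient.
h = 1e-5
fd = (acquisition(a + h) - acquisition(a - h)) / (2 * h)
```

Because the noise is fixed, repeated evaluations of `acquisition` at the same `a` are identical, which is what makes deterministic gradient-based optimization well defined here.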

Contributions

• We introduce MCBO, a model-based algorithm for causal Bayesian optimization that can be applied with very generic classes of interventions.
• Using MCBO, we prove the first sublinear cumulative regret bound for CBO. We show how the bound scales depending on the graph structure, and demonstrate that CBO can lead to a potentially exponential improvement in cumulative regret, with respect to the number of actions, compared to standard BO.
• By an application of the reparameterization technique, we show how our algorithm can be efficiently implemented with popular gradient-based optimizers.
• We evaluate MCBO on existing CBO benchmarks and the related setting of function network BO. Our results show that MCBO performs favorably compared to methods designed specifically for these tasks.

2. BACKGROUND AND PROBLEM STATEMENT

We consider the problem of an agent interacting with an SCM for T rounds in order to maximize the value of a particular target variable. We start by introducing SCMs and the kinds of interventions an agent can perform on an SCM. In the following, we denote by [m] the set of integers {0, . . . , m}.

Structural Causal Models. An SCM is described by a tuple ⟨G, Y, X, F, Ω⟩ of the following elements: G is a known DAG; Y is the reward variable of interest; X = {X_i}_{i=0}^{m-1} is a set of observed random variables; the set F = {f_i}_{i=0}^{m} defines the functional relations between these variables; and Ω = {Ω_i}_{i=0}^{m} is a set of independent noise variables with zero mean and known distribution. We use the notation Y and X_m interchangeably and assume the elements of X are topologically ordered, i.e., X_0 is a root and X_m is a leaf. We write pa_i ⊂ {0, . . . , m} for the indices of the parents of the i-th node, and Z_i = {X_j}_{j∈pa_i} for the parents of the i-th node. We sometimes use X_i to refer to both the i-th node and the i-th random variable.
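As a minimal illustration of the tuple ⟨G, Y, X, F, Ω⟩ and the topological ordering, the sketch below forward-samples a small SCM with m = 3, so that Y = X_3. The parent sets `pa` and the mechanisms `f` are hypothetical choices for illustration only; in our setting the mechanisms F are unknown to the agent.

```python
import math
import random

# Minimal SCM sketch <G, Y, X, F, Omega> with hypothetical mechanisms.
# Nodes are topologically ordered; pa[i] lists the parent indices of node i,
# and the last node X_m plays the role of the reward Y.
pa = {0: [], 1: [0], 2: [0, 1], 3: [2]}           # the DAG G
f = {                                             # the mechanisms F
    0: lambda z: 0.5,                             # root node
    1: lambda z: math.tanh(z[0]),
    2: lambda z: z[0] + 0.5 * z[1],
    3: lambda z: -(z[0] - 1.0) ** 2,              # reward mechanism
}

def sample_scm(rng, noise_std=0.05):
    """Forward-sample all nodes in topological order; returns (X, Y)."""
    x = {}
    for i in sorted(pa):                          # topological order
        z = [x[j] for j in pa[i]]                 # parents Z_i of node i
        x[i] = f[i](z) + rng.gauss(0, noise_std)  # zero-mean noise Omega_i
    return x, x[max(pa)]                          # Y = X_m

x, y = sample_scm(random.Random(0))
```

Because the nodes are topologically ordered, a single pass over `sorted(pa)` suffices: every parent value is available by the time its child is sampled.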



Figure 1: A visual comparison between the modelling assumptions of BO vs. CBO. Circular nodes represent observed variables, squares represent action inputs, and Y is the reward. Algorithms select a before observing X and Y. (a) In standard BO, the DAG has the structure shown regardless of the problem structure. (b) The DAG corresponding to our stylised agronomy example, where we aim to maximize crop yield Y. CBO takes this DAG as input for designing actions. X_0 is an unmodifiable starting property of the soil, and X_1, . . . , X_3 are the measured amounts of Nitrogen in the soil at different timesteps. Each observation is modelled with its own Gaussian process. a_1, . . . , a_3 are possible interventions involving adding Nitrogen fertilizer to the soil.
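To make the agronomy example of Figure 1b concrete, the chain can be written as a small forward-sampling sketch. The structural functions `soil_nitrogen` and `crop_yield` below are hypothetical stand-ins chosen for illustration; the paper treats such mechanisms as unknown and learns them from data.

```python
import math
import random

# Hypothetical structural functions for the stylised agronomy chain:
# X_0 is a fixed starting soil property, each X_t depends on X_{t-1} and
# the fertilizer action a_t, and the yield Y depends on the final level X_3.
def soil_nitrogen(prev_level, fertilizer, noise):
    # Some nitrogen carries over; fertilizer adds more, with diminishing returns.
    return 0.5 * prev_level + (1.0 - math.exp(-fertilizer)) + noise

def crop_yield(final_level, noise):
    # Yield peaks at an intermediate nitrogen concentration.
    return -(final_level - 1.2) ** 2 + noise

def simulate(actions, rng):
    """Forward-sample the chain SCM once for interventions a_1, ..., a_3."""
    x = 0.8  # X_0: unmodifiable starting property of the soil
    for a in actions:
        x = soil_nitrogen(x, a, rng.gauss(0, 0.01))
    return crop_yield(x, rng.gauss(0, 0.01))

y = simulate([0.5, 0.5, 0.5], random.Random(0))
```

An optimizer in this setting would search over the fertilizer schedule `actions` to maximize the expected yield, observing all intermediate nitrogen levels along the way.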

