NORMALIZING FLOWS FOR INTERVENTIONAL DENSITY ESTIMATION

Abstract

Existing machine learning methods for causal inference usually estimate quantities expressed via the mean of potential outcomes (e.g., the average treatment effect). However, such quantities do not capture the full information about the distribution of potential outcomes. In this work, we estimate the density of potential outcomes after interventions from observational data. For this, we propose a novel, fully-parametric deep learning method called Interventional Normalizing Flows. Specifically, we combine two normalizing flows, namely (i) a teacher flow for estimating nuisance parameters and (ii) a student flow for a parametric estimation of the density of potential outcomes. We further develop a tractable optimization objective based on a one-step bias correction for an efficient and doubly robust estimation of the student flow parameters. As a result, our Interventional Normalizing Flows offer a properly normalized density estimator. Across various experiments, we demonstrate that our Interventional Normalizing Flows are expressive and highly effective, and that they scale well with both sample size and high-dimensional confounding. To the best of our knowledge, our Interventional Normalizing Flows are the first fully-parametric, deep learning method for density estimation of potential outcomes.

1. INTRODUCTION

Causal inference increasingly makes use of machine learning methods to estimate treatment effects from observational data (e.g., van der Laan et al., 2011; Künzel et al., 2019; Curth & van der Schaar, 2021; Kennedy, 2022). This is relevant for various fields including medicine (e.g., Bica et al., 2021), marketing (e.g., Yang et al., 2020), and policy-making (e.g., Hünermund et al., 2021). Here, causal inference from observational data promises great value, especially when experiments for determining treatment effects are costly or even unethical. The vast majority of machine learning methods for causal inference estimate averaged quantities expressed by the (conditional) mean of potential outcomes. Examples of such quantities are the average treatment effect (ATE) (e.g., Shi et al., 2019; Hatt & Feuerriegel, 2021), the individual treatment effect (ITE) (e.g., Shalit et al., 2017; Hassanpour & Greiner, 2019; Zhang et al., 2020), and treatment-response curves (e.g., Bica et al., 2020; Nie et al., 2021). Importantly, these estimates only describe averages without distributional properties. However, making decisions based on averaged causal quantities can be misleading and, in some applications, even dangerous (Spiegelhalter, 2017; van der Bles et al., 2019). On the one hand, if potential outcomes differ in their variances or numbers of modes, relying on averaged quantities provides incomplete information about the potential outcomes and may inadvertently lead to local (rather than global) optima during decision-making. On the other hand, distributional knowledge is needed to account for uncertainty in potential outcomes, and thus informs how likely a certain outcome is. For example, in medicine, knowing the distribution of potential outcomes is highly important (Gische & Voelkle, 2021): it gives the probability that the potential outcome lies in a desired range, and thus defines the probability of treatment success or failure.
Motivated by this, we aim to estimate the density of potential outcomes. An example highlighting the need for estimating the density of potential outcomes is shown in Fig. 1. Here, we simulated outcomes according to a given structural causal model (SCM). The potential outcomes Y[a] can be sampled by setting the treatment to a specific value in the equation for A (cf.
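To make the sampling procedure concrete, the following is a minimal sketch (not the paper's actual SCM) of drawing potential outcomes Y[a] by replacing the structural equation for the treatment A with the intervention do(A := a). The specific structural equations, noise scales, and the function name `sample_potential_outcome` are illustrative assumptions; the SCM is chosen so that Y[1] and Y[0] have (nearly) equal means but very different spreads, mirroring the point that averaged quantities alone can be misleading.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_potential_outcome(a, n=10_000):
    """Sample Y[a] from a hypothetical SCM under the intervention do(A := a).

    Hypothetical SCM (illustrative, not the paper's):
        X := U_X                      (confounder)
        A := 1{X + U_A > 0}           (treatment; overridden by do(A := a))
        Y := X + eps_a                (outcome noise scale depends on A)
    """
    x = rng.normal(size=n)  # confounder X := U_X
    # The structural equation for A is replaced by the constant a,
    # so its exogenous noise U_A plays no role under the intervention.
    noise = rng.normal(scale=2.0 if a == 1 else 0.5, size=n)
    return x + noise

y1 = sample_potential_outcome(a=1)
y0 = sample_potential_outcome(a=0)
# E[Y[1]] - E[Y[0]] is close to zero here, yet Y[1] is far more
# dispersed than Y[0]: the ATE alone hides this distributional difference.
```

In this toy example, a decision-maker looking only at the ATE would see no difference between treatments, whereas the densities of Y[1] and Y[0] reveal a large difference in risk.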

