ESTIMATING INDIVIDUAL TREATMENT EFFECTS UNDER UNOBSERVED CONFOUNDING USING BINARY INSTRUMENTS

Abstract

Estimating conditional average treatment effects (CATEs) from observational data is relevant in many fields such as personalized medicine. However, in practice, the treatment assignment is usually confounded by unobserved variables and thus introduces bias. A remedy to remove the bias is the use of instrumental variables (IVs). Such settings are widespread in medicine (e.g., trials where the treatment assignment is used as a binary IV). In this paper, we propose a novel, multiply robust machine learning framework, called MRIV, for estimating CATEs using binary IVs, thereby yielding an unbiased CATE estimator. Different from previous work for binary IVs, our framework estimates the CATE directly via a pseudo-outcome regression. (1) We provide a theoretical analysis in which we show that our framework yields multiply robust convergence rates: our CATE estimator achieves fast convergence even if several nuisance estimators converge slowly. (2) We further show that our framework asymptotically outperforms state-of-the-art plug-in IV methods for CATE estimation, in the sense that it achieves a faster rate of convergence if the CATE is smoother than the individual outcome surfaces. (3) We build upon our theoretical results and propose a tailored deep neural network architecture called MRIV-Net for CATE estimation using binary IVs. Across various computational experiments, we demonstrate empirically that our MRIV-Net achieves state-of-the-art performance. To the best of our knowledge, our MRIV is the first multiply robust machine learning framework tailored to estimating CATEs in the binary IV setting.

1. INTRODUCTION

Conditional average treatment effects (CATEs) are relevant across many disciplines such as marketing (Varian, 2016) and personalized medicine (Yazdani & Boerwinkle, 2015). Knowledge about CATEs provides insights into the heterogeneity of treatment effects and thus helps in making potentially better treatment decisions (Frauen et al., 2023). Many recent works that use machine learning to estimate causal effects, in particular CATEs, are based on the assumption of unconfoundedness (Alaa & van der Schaar, 2017; Lim et al., 2018; Melnychuk et al., 2022a;b). In practice, however, this assumption is often violated because some confounders are commonly not recorded in the data. Typical examples are the income or socioeconomic status of patients, which are not stored in medical files. If the confounding is sufficiently strong, standard methods for estimating CATEs suffer from confounding bias (Pearl, 2009), which may lead to inferior treatment decisions. To handle unobserved confounders, instrumental variables (IVs) can be leveraged to relax the assumption of unconfoundedness and still compute reliable CATE estimates. IV methods were originally developed in economics (Wright, 1928), but only recently has there been growing interest in combining IV methods with machine learning (see Sec. 3). Importantly, IV methods outperform classical CATE estimators if a sufficient amount of confounding is not observed (Hartford et al., 2017). We thus aim at estimating CATEs from observational data under unobserved confounding using IVs. In this paper, we consider the setting where a single binary instrument is available. This setting is widespread in personalized medicine (and other applications such as marketing or public policy) (Bloom et al., 1997). In fact, the setting is encountered in essentially all observational or randomized studies with observed non-compliance (Imbens & Angrist, 1994).
As an example, consider a randomized controlled trial (RCT), where treatments are randomly assigned to patients and their outcomes are observed. Due to some potentially unobserved confounders (e.g., income, education), some patients refuse to take the treatment initially assigned to them. Here, the treatment assignment serves as a binary IV. Moreover, such RCTs have been widely used by public decision-makers, e.g., to analyze the effect of health insurance on health outcomes (see the so-called Oregon health insurance experiment) (Finkelstein et al., 2012) or the effect of military service on lifetime earnings (Angrist, 1990).
We propose a novel machine learning framework (called MRIV) for estimating CATEs using binary IVs. Our framework takes an initial CATE estimator and nuisance parameter estimators as input to perform a pseudo-outcome regression. In contrast to the existing literature, our framework is multiply robust¹, i.e., we show that it is consistent in the union of three different model specifications. This is different from existing methods for CATE estimation using IVs such as Okui et al. (2012), Syrgkanis et al. (2019), or plug-in estimators (Bargagli-Stoffi et al., 2021; Imbens & Angrist, 1994). We provide a theoretical analysis, where we use tools from Kennedy (2022) to show that our framework achieves a multiply robust convergence rate, i.e., our MRIV converges at a fast rate even if several nuisance parameters converge slowly. We further show that, compared to existing plug-in IV methods, the performance of our framework is asymptotically superior. Finally, we leverage our framework and, on top of it, build a tailored deep neural network called MRIV-Net.
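To make the non-compliance example above concrete, the following minimal sketch simulates a hypothetical trial in which an unobserved confounder drives both treatment uptake and the outcome. It compares a naive treated-versus-untreated contrast with the classical Wald (ratio) plug-in estimator, which uses the randomized assignment as a binary IV. This is not MRIV: all function names, coefficients, and the assumption of a homogeneous treatment effect are illustrative choices made only for this sketch.

```python
import random


def simulate_noncompliance_trial(n=200_000, true_effect=2.0, seed=0):
    """Simulate (z, a, y) rows for a hypothetical RCT with non-compliance.

    All coefficients below are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)   # unobserved confounder (e.g., income)
        z = rng.randint(0, 1)     # randomized treatment assignment (binary IV)
        # Uptake depends on the assignment AND on the unobserved confounder.
        a = int(1.5 * z - 0.75 + u + rng.gauss(0.0, 1.0) > 0)
        # Outcome depends on the treatment taken and on the confounder,
        # but not directly on the assignment z.
        y = true_effect * a + 1.5 * u + rng.gauss(0.0, 1.0)
        rows.append((z, a, y))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def naive_difference(rows):
    """Treated-minus-untreated mean outcome; biased under unobserved confounding."""
    treated = mean(y for _, a, y in rows if a == 1)
    untreated = mean(y for _, a, y in rows if a == 0)
    return treated - untreated


def wald_estimate(rows):
    """Classical Wald estimator: effect of z on y divided by effect of z on a."""
    itt_y = mean(y for z, _, y in rows if z == 1) - mean(y for z, _, y in rows if z == 0)
    itt_a = mean(a for z, a, _ in rows if z == 1) - mean(a for z, a, _ in rows if z == 0)
    return itt_y / itt_a


if __name__ == "__main__":
    data = simulate_noncompliance_trial()
    print("naive difference:", round(naive_difference(data), 3))
    print("Wald IV estimate:", round(wald_estimate(data), 3))
```

In this simulated setting the naive contrast overstates the effect because patients with a larger confounder value are both more likely to take the treatment and have higher outcomes, whereas the Wald ratio stays close to the simulated effect. The sketch ignores the covariates X entirely, so it estimates a single average effect rather than a CATE.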

Contributions:

(1) We propose a novel, multiply robust machine learning framework (called MRIV) to learn the CATE in the binary IV setting. To the best of our knowledge, ours is the first that is shown to be multiply robust, i.e., consistent in the union of three model specifications. For comparison, existing works for CATE estimation only show double robustness (Wang & Tchetgen Tchetgen, 2018; Syrgkanis et al., 2019).
(2) We prove that MRIV achieves a multiply robust convergence rate. This is different from methods for IV settings that do not provide robust convergence rates (Syrgkanis et al., 2019). We further show that our MRIV is asymptotically superior to existing plug-in estimators.
(3) We propose a tailored deep neural network, called MRIV-Net, which builds upon our framework to estimate CATEs. We demonstrate that MRIV-Net achieves state-of-the-art performance.

2. PROBLEM SETUP

Data generating process: We observe data $D = (x_i, z_i, a_i, y_i)_{i=1}^{n}$ consisting of $n \in \mathbb{N}$ observations of the tuple $(X, Z, A, Y)$. Here, $X \in \mathcal{X}$ are observed confounders, $Z \in \{0, 1\}$ is a binary instrument, $A \in \{0, 1\}$ is a binary treatment, and $Y \in \mathbb{R}$ is an outcome of interest. Furthermore, we assume the existence of unobserved confounders $U \in \mathcal{U}$, which affect both the treatment $A$ and the outcome $Y$. The causal graph is shown in Fig. 1.

Figure 1: Underlying causal graph. The instrument Z has a direct influence on the treatment A, but does not have a direct effect on the outcome Y. Note that we allow for unobserved confounders for both Z-A (dashed line) and A-Y (given by U).

Applicability: Our proposed framework is widely applicable in practice, namely to all settings with the above data generating process. This includes both (1) observational data and (2) RCTs with non-compliance. For (1), observational data is commonly encountered in, e.g., personalized medicine. Here, modeling treatments as binary variables is consistent with previous literature on causal effect estimation and standard in medical practice (Robins et al., 2000). For (2), our setting is further encountered in RCTs when the instrument Z is a randomized treatment assignment but individuals do not comply with their treatment assignment. Such RCTs have been extensively used by public decision-makers, e.g., to analyze the effect of health insurance on health outcome (Finkelstein et al., 2012) or the effect of military service on lifetime earnings (Angrist, 1990).

We build upon the potential outcomes framework (Rubin, 1974) for modeling causal effects. Let $Y(a, z)$ denote the potential outcome that



¹ For a detailed introduction to multiple robustness and its importance in treatment effect estimation, we refer to Wang & Tchetgen Tchetgen (2018), Section 4.5.

