ESTIMATING INDIVIDUAL TREATMENT EFFECTS UNDER UNOBSERVED CONFOUNDING USING BINARY INSTRUMENTS

Abstract

Estimating conditional average treatment effects (CATEs) from observational data is relevant in many fields such as personalized medicine. However, in practice, the treatment assignment is usually confounded by unobserved variables, which introduces bias. A remedy to remove the bias is the use of instrumental variables (IVs). Such settings are widespread in medicine (e.g., trials where the treatment assignment is used as a binary IV). In this paper, we propose a novel, multiply robust machine learning framework, called MRIV, for estimating CATEs using binary IVs, thus yielding an unbiased CATE estimator. Different from previous work for binary IVs, our framework estimates the CATE directly via a pseudo-outcome regression. (1) We provide a theoretical analysis where we show that our framework yields multiply robust convergence rates: our CATE estimator achieves fast convergence even if several nuisance estimators converge slowly. (2) We further show that our framework asymptotically outperforms state-of-the-art plug-in IV methods for CATE estimation, in the sense that it achieves a faster rate of convergence if the CATE is smoother than the individual outcome surfaces. (3) We build upon our theoretical results and propose a tailored deep neural network architecture called MRIV-Net for CATE estimation using binary IVs. Across various computational experiments, we demonstrate empirically that our MRIV-Net achieves state-of-the-art performance. To the best of our knowledge, our MRIV is the first multiply robust machine learning framework tailored to estimating CATEs in the binary IV setting.

1. INTRODUCTION

Conditional average treatment effects (CATEs) are relevant across many disciplines such as marketing (Varian, 2016) and personalized medicine (Yazdani & Boerwinkle, 2015). Knowledge about CATEs provides insights into the heterogeneity of treatment effects, and thus helps in making potentially better treatment decisions (Frauen et al., 2023). Many recent works that use machine learning to estimate causal effects, in particular CATEs, are based on the assumption of unconfoundedness (Alaa & van der Schaar, 2017; Lim et al., 2018; Melnychuk et al., 2022a; b). In practice, however, this assumption is often violated because some confounders are commonly not reported in the data. Typical examples are the income or socioeconomic status of patients, which are not stored in medical files. If the confounding is sufficiently strong, standard methods for estimating CATEs suffer from confounding bias (Pearl, 2009), which may lead to inferior treatment decisions. To handle unobserved confounders, instrumental variables (IVs) can be leveraged to relax the assumption of unconfoundedness and still compute reliable CATE estimates. IV methods were originally developed in economics (Wright, 1928), but only recently has there been growing interest in combining IV methods with machine learning (see Sec. 3). Importantly, IV methods outperform classical CATE estimators if a sufficient amount of confounding is not observed (Hartford et al., 2017). We thus aim at estimating CATEs from observational data under unobserved confounding using IVs.
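To make the role of a binary IV concrete, the following minimal sketch simulates data with an unobserved confounder and compares a naive difference in means with the classical Wald (ratio) estimator. This is not the MRIV framework proposed in this paper; it is only a textbook illustration under assumptions that are ours: a constant treatment effect of 2.0, a randomly assigned binary instrument, and a hypothetical data-generating process whose names (`simulate_and_estimate`, `u`, `z`, `a`, `y`) and coefficients are chosen purely for illustration.

```python
import numpy as np


def simulate_and_estimate(n=200_000, seed=0):
    """Illustrative simulation (assumed data-generating process, not from the paper)."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)                # unobserved confounder
    z = rng.integers(0, 2, size=n)        # binary instrument, assigned at random
    # Treatment uptake depends on both the instrument and the confounder.
    a = (1.5 * z + u + rng.normal(size=n) > 0.75).astype(float)
    # Outcome: assumed constant treatment effect of 2.0, plus confounding via u.
    y = 2.0 * a + 2.0 * u + rng.normal(size=n)

    # Naive estimator: difference in mean outcomes between treated and untreated.
    naive = y[a == 1].mean() - y[a == 0].mean()

    # Wald estimator: effect of z on y divided by effect of z on a.
    wald = (y[z == 1].mean() - y[z == 0].mean()) / (
        a[z == 1].mean() - a[z == 0].mean()
    )
    return naive, wald


naive, wald = simulate_and_estimate()
print(f"naive difference in means: {naive:.2f}")
print(f"Wald (IV) estimate:        {wald:.2f}")
```

In this simulated setting, the naive estimate is inflated because treated units tend to have larger values of the unobserved confounder, whereas the Wald estimate lies close to the assumed effect of 2.0. The Wald estimator targets a single average effect; estimating effects that vary with covariates, as in the CATE setting considered here, requires more than this ratio.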

