TARGETED VAE: STRUCTURED INFERENCE AND TARGETED LEARNING FOR CAUSAL PARAMETER ESTIMATION

Abstract

Undertaking causal inference with observational data is extremely useful across a wide range of domains, including the development of medical treatments, advertising and marketing, and policy making. There are two main challenges associated with undertaking causal inference using observational data: treatment assignment heterogeneity (i.e., differences between the treated and untreated groups), and an absence of counterfactual data (i.e., not knowing what would have happened if an individual who received treatment had instead not been treated). We address these two challenges by combining structured inference and targeted learning. To our knowledge, the Targeted Variational AutoEncoder (TVAE) is the first method to incorporate targeted learning into deep latent variable models. Results demonstrate competitive and state-of-the-art performance.

1. INTRODUCTION

The estimation of the causal effects of interventions or treatments on outcomes is of the utmost importance across a range of decision-making processes and scientific endeavours, such as policy making (Kreif & DiazOrdaz, 2019), advertisement (Bottou et al., 2013), the development of medical treatments (Petersen et al., 2017), the evaluation of evidence within legal frameworks (Pearl, 2009; Siegerink et al., 2016) and social science (Vowels, 2020; Hernan, 2018; Grosz et al., 2020). Despite the common preference for Randomized Controlled Trial (RCT) data over observational data, this preference is not always justified. Besides the lower cost and fewer ethical concerns, observational data may provide a number of statistical advantages, including greater statistical power and increased generalizability (Deaton & Cartwright, 2018). However, there are two main challenges when dealing with observational data. Firstly, the group that receives treatment is usually not equivalent to the group that does not (treatment assignment heterogeneity), resulting in selection bias and confounding due to associated covariates. For example, young people may prefer surgery, whereas older people may prefer medication. Secondly, we are unable to directly estimate the causal effect of treatment, because only the factual outcome for a given treatment assignment is available. In other words, we do not observe the counterfactual outcome, i.e., the outcome under a treatment assignment different from the one actually given. Treatment effect inference with observational data is concerned with finding ways to estimate the causal effect by considering the expected differences between factual and counterfactual outcomes. We seek to address the two challenges by proposing a method that incorporates targeted learning techniques into a disentangled variational latent model, trained according to the approximate maximum likelihood paradigm.
Doing so enables us to estimate the expected treatment effects, as well as individual-level treatment effects. Estimating the latter is especially important for treatments that interact with patient attributes, whilst also being crucial for enabling individualized treatment assignment. Thus, we propose the Targeted Variational AutoEncoder (TVAE), undertake an ablation study, and compare our method's performance against current alternatives on two benchmark datasets.
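
To make the two challenges concrete, the following is a minimal simulation sketch, not part of TVAE itself. It assumes a single hypothetical confounder (`age`), a linear outcome model, and a constant treatment effect of 0.5. All names and numerical settings are illustrative assumptions rather than values from the paper. Because the data are simulated, both potential outcomes are available, which is never the case with real observational data.

```python
import numpy as np


def simulate(n=100_000, n_bins=20, seed=0):
    """Toy illustration of confounded treatment assignment (all settings assumed)."""
    rng = np.random.default_rng(seed)

    # Hypothetical confounder: normalised age in [0, 1].
    age = rng.uniform(0.0, 1.0, size=n)

    # Treatment assignment heterogeneity: younger individuals are more
    # likely to be treated, so the two groups differ in age.
    p_treat = 0.8 - 0.6 * age
    t = rng.binomial(1, p_treat)

    # Potential outcomes: the outcome falls with age, and treatment adds 0.5.
    y0 = 1.0 - age + rng.normal(0.0, 0.1, size=n)
    y1 = y0 + 0.5

    # Only the factual outcome is observed in practice.
    y = np.where(t == 1, y1, y0)

    # Ground truth, computable here only because the data are simulated.
    true_ate = np.mean(y1 - y0)

    # Naive contrast between groups: biased, because age affects both t and y.
    naive = y[t == 1].mean() - y[t == 0].mean()

    # Simple adjustment: stratify on the confounder and average the
    # within-stratum contrasts, weighted by stratum size.
    inner_edges = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    strata = np.digitize(age, inner_edges)
    adjusted = 0.0
    for b in range(n_bins):
        mask = strata == b
        diff = y[mask & (t == 1)].mean() - y[mask & (t == 0)].mean()
        adjusted += diff * mask.mean()

    return true_ate, naive, adjusted


if __name__ == "__main__":
    true_ate, naive, adjusted = simulate()
    print(f"true ATE: {true_ate:.3f}")
    print(f"naive difference in means: {naive:.3f}")
    print(f"age-stratified estimate: {adjusted:.3f}")
```

Under these assumed settings, the naive difference in means overstates the effect (analytically, it tends towards roughly 0.7 rather than 0.5), whereas stratifying on the confounder brings the estimate close to the true value. Stratification is used here only as the simplest possible adjustment; it does not scale to high-dimensional covariates, which is part of the motivation for model-based estimators.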

