ESTIMATING TREATMENT EFFECTS VIA ORTHOGONAL REGULARIZATION

Abstract

Decision-making often requires accurate estimation of causal effects from observational data. This is challenging because outcomes of alternative decisions are not observed and have to be estimated. Previous methods estimate outcomes based on unconfoundedness but neglect any constraints that unconfoundedness imposes on the outcomes. In this paper, we propose a novel regularization framework in which we formalize unconfoundedness as an orthogonality constraint. We provide theoretical guarantees that this yields an asymptotically normal estimator for the average causal effect. Compared to other estimators, its asymptotic variance is strictly smaller. Based on our regularization framework, we develop deep orthogonal networks for unconfounded treatments (DONUT), which learn outcomes that are orthogonal to the treatment assignment. Using a variety of benchmark datasets for causal inference, we demonstrate that DONUT substantially outperforms the state of the art.

1. INTRODUCTION

Estimating the causal effect of an intervention (i.e., the treatment effect) is integral to individual decision-making in many domains such as marketing (Brodersen et al., 2015; Hatt & Feuerriegel, 2020), economics (Heckman et al., 1997), and epidemiology (Robins et al., 2000). For instance, in order to control an epidemic, it is relevant for public decision-makers to estimate the causal effect of school closures (intervention) on the infection rate (outcome). The causal effect of an intervention can be estimated in two ways: randomized controlled trials (RCTs) and observational studies. RCTs are widely recognized as the gold standard for estimating causal effects, yet conducting RCTs is often infeasible (Robins et al., 2000). For instance, randomly allocating different policy interventions during an epidemic might be unethical and impractical. Unlike RCTs, observational studies use observed data to infer causal effects. For this, the collected covariates must contain all confounders (i.e., variables that affect both treatment and outcome). Observational studies are becoming increasingly common due to ease of access to rich data.

In this paper, we estimate the average causal effect of a treatment from observational data. To do so, the outcome of the alternative treatment has to be estimated. This is challenging, since we do not know what the outcome would have been if another treatment had been applied.

Existing methods for estimating treatment effects use the treatment assignment as a feature and train regression models to estimate the outcomes (Funk et al., 2011; Kallus, 2017b). Methods based on nearest neighbors and matching find similar subjects (Ho et al., 2007; Crump et al., 2008; Kallus, 2017a; 2020). Tree and forest-based methods (Wager & Athey, 2018) estimate the treatment effect at the leaf node and train many weak learners to build expressive ensemble models. Gaussian process-based methods provide uncertainty quantification (Alaa & van der Schaar, 2017; Ray & Szabo, 2019). Weighting-based approaches re-weight the outcomes using weights based on covariate and treatment data (Kallus, 2018). For instance, Fong et al. (2018) and Yiu & Su (2018) seek weights such that the treatment assignment is unassociated with the covariates. However, they do not require the treatment assignment to be unassociated with the potential outcomes. Doubly robust methods combine a model for the outcomes and a model for the treatment propensity in a manner that is robust to misspecification (Funk et al., 2011; Benkeser et al., 2017; Chernozhukov et al., 2018). Recently, deep learning has been successful for this task due to its strong predictive performance and ability to learn representations of the data

