TREATMENT EFFECT ESTIMATION WITH COLLIDER BIAS AND CONFOUNDING BIAS

Anonymous

Abstract

To answer causal questions from observational data, it is important to consider the mechanisms that determine which data values are observed and which are missing. Prior work has considered the treatment assignment mechanism and proposed methods to remove the confounding bias that arises from common causes of treatment and outcome. However, other sample selection issues, commonly overlooked in prior work, can also bias treatment effect estimation, such as censored outcomes, which are a form of collider bias. In this paper, we propose the novel Selection Controlled CounterFactual Regression (SC-CFR) to simultaneously address confounding and collider bias. Specifically, we first calculate the magnitude of the collider bias for different instances by estimating the selection model. Then, when estimating the outcome model, we add a control term to remove the collider bias while learning a balanced representation to remove the confounding bias. Our theoretical analysis shows that, under certain assumptions, we can achieve unbiased treatment effect estimates from observational data with confounding and collider bias. Extensive empirical results on both synthetic and real-world datasets show that our method consistently outperforms benchmarks on treatment effect estimation when both types of bias exist.

1. INTRODUCTION

Causal inference is a powerful statistical modeling tool for explanatory analysis, and a central problem in causal inference is the estimation of treatment effects. The gold standard approach for treatment effect estimation is to conduct Randomized Controlled Trials (RCTs), but RCTs can be expensive (Kohavi & Longbotham, 2011) and sometimes infeasible (Bottou et al., 2013). Therefore, it is important to develop effective approaches to estimate treatment effects from observational data. In observational studies, association does not imply causation, mainly due to the presence of (sample selection) biases in the data. There are two main sources of bias: confounding bias and collider bias (Hernán & Robins, 2020). To define confounding bias and collider bias, we use the causal diagrams in Figure 1 and let X be the observed pre-treatment variables, T be the treatment variable, and Y be the outcome variable.
Confounding bias results from common causes of treatment and outcome (Guo et al., 2020; Greenland, 2003; Hernán & Robins, 2020). As shown in Figure 1(a), there are two sources of association between T and Y: the path T → Y, which represents the causal effect of T on Y, and the path T ← X → Y, which passes through the common cause X and is called the backdoor path (Pearl, 2009). The backdoor path introduces spurious associations into the observational data and results in P(T | X) ≠ P(T). Confounding bias is very common in observational studies and can lead to incorrect treatment effect estimation. For example, when estimating the effect of job training programs on future incomes (LaLonde, 1986), work ability is a confounder that determines both whether an individual participated in the program and the individual's income. Due to the confounding bias, we may draw an incorrect conclusion about the effect of the job training programs on future incomes.
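As a minimal illustration of the backdoor path T ← X → Y (a toy sketch, not part of the proposed method), the following simulation draws a single confounder X that drives both treatment assignment and outcome. The function name, the coefficients, the logistic assignment rule, and the linear outcome model are all assumptions chosen only for illustration. Under these assumptions, the naive difference in mean outcomes overstates the true effect, while a linear regression that adjusts for X recovers it approximately.

```python
import numpy as np


def simulate_confounding(n=100_000, true_effect=1.0, seed=0):
    """Toy data with one confounder X (hypothetical setup for illustration)."""
    rng = np.random.default_rng(seed)

    # X: observed pre-treatment variable (e.g., work ability).
    x = rng.normal(size=n)

    # T depends on X, so P(T | X) != P(T): higher X makes treatment more likely.
    p_treat = 1.0 / (1.0 + np.exp(-2.0 * x))
    t = rng.binomial(1, p_treat)

    # Y depends on both T (causal path) and X (backdoor path).
    y = true_effect * t + 2.0 * x + rng.normal(size=n)

    # Naive estimate: difference in mean outcomes, ignoring X.
    naive = y[t == 1].mean() - y[t == 0].mean()

    # Adjusted estimate: least-squares fit of Y on [1, T, X];
    # the coefficient on T estimates the effect once X is controlled for.
    design = np.column_stack([np.ones(n), t, x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    adjusted = coef[1]

    return naive, adjusted


naive, adjusted = simulate_confounding()
print(f"naive difference in means: {naive:.2f}")
print(f"estimate adjusted for X:   {adjusted:.2f}")
```

In this toy setting the adjustment works because the outcome model is linear in X and X is fully observed; neither holds in general, which is why representation-balancing methods are used for confounding in practice.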
Collider bias is a special case of sample selection bias that results from conditioning on a common effect of T and Y (Greenland et al., 1999; Greenland, 2003; Hernan et al., 2004; Westreich, 2012; Elwert & Winship, 2014), as shown in Figure 1(b), where S is the selection variable indicating whether a unit is selected, i.e., S = 1 when the unit is selected for observation and Y is observable; otherwise, S = 0 and we cannot observe Y (Smith & Elkan, 2004b). Except for the path T → Y, the

