TREATMENT EFFECT ESTIMATION WITH COLLIDER BIAS AND CONFOUNDING BIAS
Anonymous

Abstract

To answer causal questions from observational data, it is important to consider the mechanisms that determine which data values are observed and which are missing. Prior work has considered the treatment assignment mechanism and proposed methods to remove the confounding bias arising from common causes of treatment and outcome. However, other sample selection issues, commonly overlooked in prior work, can also bias treatment effect estimation; one example is censored outcomes, a form of collider bias. In this paper, we propose a novel method, Selection Controlled CounterFactual Regression (SC-CFR), to address confounding and collider bias simultaneously. Specifically, we first quantify the magnitude of the collider bias for each instance by estimating the selection model, and then, when estimating the outcome model, we add a control term to remove the collider bias while learning a balanced representation to remove the confounding bias. Our theoretical analysis shows that, under certain assumptions, unbiased treatment effect estimates can be obtained from observational data with both confounding and collider bias. Extensive empirical results on synthetic and real-world datasets show that our method consistently outperforms baselines in treatment effect estimation when both types of bias are present.

1. INTRODUCTION

Causal inference is a powerful statistical modeling tool for explanatory analysis, and a central problem in causal inference is the estimation of treatment effects. The gold standard approach for treatment effect estimation is to conduct Randomized Controlled Trials (RCTs), but RCTs can be expensive (Kohavi & Longbotham, 2011) and sometimes infeasible (Bottou et al., 2013). Therefore, it is important to develop effective approaches to estimate treatment effects from observational data. In observational studies, association does not imply causation, mainly because of (sample selection) biases in the data. There are two main sources of bias: confounding bias and collider bias (Hernán & Robins, 2020). To define them, we use the causal diagrams in Figure 1, where X denotes the observed pre-treatment variables, T the treatment variable, and Y the outcome variable. Confounding bias results from common causes of treatment and outcome (Guo et al., 2020; Greenland, 2003; Hernán & Robins, 2020). As shown in Figure 1 (a), there are two sources of association between T and Y: the path T → Y, which represents the causal effect of T on Y, and the path T ← X → Y through the common cause X, known as a backdoor path (Pearl, 2009). The backdoor path introduces spurious association into the observational data: because X influences treatment assignment, P(T | X) ≠ P(T). Confounding bias is very common in observational studies and can lead to incorrect treatment effect estimates. For example, when estimating the effect of job training programs on future incomes (LaLonde, 1986), work ability is a confounder that affects both whether an individual participates in the program and the individual's income. Ignoring this confounding bias may lead to an incorrect conclusion about the effect of job training programs on future incomes.
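The effect of the backdoor path T ← X → Y can be made concrete with a small simulation. The sketch below assumes a binary confounder X (for instance, high versus low work ability), a treatment probability of 0.8 or 0.2 depending on X, and a linear outcome Y = 1.0·T + 2.0·X + noise, so the true effect of T is 1.0; all numbers and helper names are illustrative choices, not taken from the paper. The naive difference in means absorbs the backdoor association (about 1 + 2·(0.8 − 0.2) = 2.2), whereas comparing treated and control units within strata of X and averaging over P(X) blocks the backdoor path. Stratification is a textbook adjustment used here only for illustration; it is not SC-CFR.

```python
import numpy as np


def simulate_confounded(n=200_000, seed=0):
    """Toy data with one binary confounder (all parameters are assumptions)."""
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, 0.5, size=n)                  # confounder, e.g. work ability
    t = rng.binomial(1, np.where(x == 1, 0.8, 0.2))   # treatment depends on X
    y = 1.0 * t + 2.0 * x + rng.normal(size=n)        # true effect of T is 1.0
    return x, t, y


def naive_difference(t, y):
    """Difference in mean outcomes; mixes causal and backdoor association."""
    return float(y[t == 1].mean() - y[t == 0].mean())


def stratified_ate(x, t, y):
    """Compare treated vs. control within each stratum of X, weight by P(X = x)."""
    ate = 0.0
    for value in np.unique(x):
        in_stratum = x == value
        contrast = (y[in_stratum & (t == 1)].mean()
                    - y[in_stratum & (t == 0)].mean())
        ate += in_stratum.mean() * contrast
    return float(ate)


x, t, y = simulate_confounded()
print(round(naive_difference(t, y), 2))   # close to 2.2 (biased by the backdoor path)
print(round(stratified_ate(x, t, y), 2))  # close to 1.0 (the true effect)
```

With a single binary confounder, stratification is straightforward; with many continuous covariates it quickly becomes impractical, which is part of what motivates the propensity score, balancing, and representation learning methods discussed in this section.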
Collider bias is a special case of sample selection bias that results from conditioning on a common effect of T and Y (Greenland et al., 1999; Greenland, 2003; Hernan et al., 2004; Westreich, 2012; Elwert & Winship, 2014), as shown in Figure 1 (b).
Currently, many causal inference methods have been proposed to estimate treatment effects directly from observational data with confounding bias, including propensity score based methods (Rosenbaum & Rubin, 1983; Dehejia & Wahba, 2002; Hirano et al., 2003; Hirano & Imbens, 2004; Williamson et al., 2012), confounder balancing methods (Hainmueller, 2012; Kuang et al., 2017; Athey et al., 2018; Fong et al., 2018), and causal representation learning methods (Johansson et al., 2016; Shalit et al., 2017; Yao et al., 2018; Hassanpour & Greiner, 2020). However, existing causal inference work mostly ignores collider bias and therefore fails to handle the most common case, shown in Figure 1 (c), where confounding bias and collider bias are both present. In real-world scenarios, observational data are often affected by both biases. Returning to the job training example, individuals who did not participate in the program and have lower incomes may be unwilling to report their current incomes; whether an income is observed then depends on both the treatment and the outcome, which leads to collider bias. In this case, if we control for only one of the two biases, the other will still distort the estimate. It is therefore necessary to develop an approach that addresses both biases in treatment effect estimation. Most previous work on selection bias (Heckman, 1979; Chib et al., 2009; Marchenko & Genton, 2012; Ding, 2014; Ogundimu & Hutton, 2016; Wiemann et al., 2022), including work that takes confounding bias into account (Bareinboim et al., 2014; Bareinboim & Tian, 2015; Correa & Bareinboim, 2017), addresses only the simpler case in which selection is caused by covariates or by the treatment variable.
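A toy simulation illustrates why adjusting for confounding alone is not enough once outcomes go missing. In this sketch, which uses assumed, illustrative numbers and hypothetical helper names, treatment is confounded by a binary X, and untreated individuals whose income Y falls below a hypothetical threshold of 1.0 do not report it, so the reporting indicator S depends on both T and Y. Stratifying on X recovers the true effect of 1.0 when every Y is available, but applied to reporters only it returns roughly 0.40: conditioning on the collider S drops the low-income untreated units and inflates the control-group means within each stratum.

```python
import numpy as np


def simulate_with_selection(n=200_000, seed=1):
    """Confounded treatment plus outcome-dependent reporting (assumed parameters)."""
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, 0.5, size=n)                  # confounder
    t = rng.binomial(1, np.where(x == 1, 0.8, 0.2))   # confounded treatment
    y = 1.0 * t + 2.0 * x + rng.normal(size=n)        # true effect of T is 1.0
    # Collider S: untreated units with income below the assumed threshold 1.0
    # do not report, so S depends on both T and Y.
    s = ~((t == 0) & (y < 1.0))
    return x, t, y, s


def stratified_ate(x, t, y):
    """Within-stratum treated vs. control contrast, weighted by P(X = x)."""
    ate = 0.0
    for value in np.unique(x):
        in_stratum = x == value
        contrast = (y[in_stratum & (t == 1)].mean()
                    - y[in_stratum & (t == 0)].mean())
        ate += in_stratum.mean() * contrast
    return float(ate)


x, t, y, s = simulate_with_selection()
full = stratified_ate(x, t, y)               # needs Y for everyone; close to 1.0
observed = stratified_ate(x[s], t[s], y[s])  # reporters only; roughly 0.40
print(round(full, 2), round(observed, 2))
```

Because the full-data estimate requires outcomes that are never observed in practice, the analyst only ever sees the biased reporters-only number, even though confounding has been adjusted for.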
However, for the more complex situation of collider bias, only Bareinboim & Pearl (2012) have discussed the feasibility of removing it with the help of auxiliary variables that satisfy certain conditions. To date, there is no established method that can address both kinds of bias simultaneously. In this paper, our theoretical analysis shows that, under certain assumptions, treatment effects can be estimated without bias from observational data even in the presence of both confounding and collider biases. We propose Selection Controlled CounterFactual Regression (SC-CFR) to address both biases simultaneously in treatment effect estimation. In SC-CFR, we first quantify the magnitude of the collider bias for each instance by estimating the selection model; then, when estimating the outcome model, we add a control term to remove the collider bias while learning a balanced representation to remove the confounding bias. We conduct experiments on both synthetic and real-world datasets, and the results demonstrate that our method outperforms the baselines. The main contributions of this paper are as follows: (1) We formulate and investigate the practical problem of treatment effect estimation from observational data with both confounding and collider biases, which, to the best of our knowledge, is still an open problem in causal inference. (2) We propose a novel SC-CFR algorithm to estimate the average treatment effect in observational studies with both confounding bias and collider bias. (3) Our theoretical analysis shows that collider and confounding biases can be removed simultaneously under certain assumptions. (4) Extensive experiments on both synthetic and real-world datasets show that SC-CFR achieves better treatment effect estimation performance than the baselines in observational studies with both biases.
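The two ingredients of SC-CFR described above, a per-instance control term from a selection model and a balance penalty on the learned representation, can be illustrated in isolation. This section does not specify the form of the SC-CFR control term, so the sketch assumes the classical choice from Heckman (1979): the inverse Mills ratio φ(z)/Φ(z) of a probit selection index z, which grows as an instance becomes less likely to be selected. For balance, it uses the squared linear-kernel maximum mean discrepancy between treated and control representations, a simple instance of the integral probability metric penalties used in CFR (Shalit et al., 2017), added to a factual squared-error loss with weight alpha. The function names and the plain mean-squared-error objective are hypothetical simplifications, not the authors' implementation.

```python
import math

import numpy as np


def inverse_mills_ratio(z):
    """phi(z) / Phi(z) for a probit selection index z (assumed control term).

    Adequate for moderate z; very negative z would need a log-space formulation.
    """
    z = np.asarray(z, dtype=float)
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    return pdf / cdf


def linear_mmd2(phi, t):
    """Squared linear-kernel MMD: ||mean(phi | t=1) - mean(phi | t=0)||^2."""
    phi = np.asarray(phi, dtype=float)
    t = np.asarray(t)
    gap = phi[t == 1].mean(axis=0) - phi[t == 0].mean(axis=0)
    return float(gap @ gap)


def balanced_factual_loss(y, y_hat, phi, t, alpha=1.0):
    """Hypothetical objective: factual MSE plus alpha times the balance penalty."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.mean((y_hat - y) ** 2)) + alpha * linear_mmd2(phi, t)


# Control term for a unit with selection index 0 (selected with probability 0.5).
print(round(float(inverse_mills_ratio(0.0)), 4))  # 0.7979
# Toy 2-D representations: control mean (1, 0), treated mean (1, 2).
phi = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 1.0], [1.0, 3.0]])
t = np.array([0, 0, 1, 1])
print(linear_mmd2(phi, t))                                             # 4.0
print(balanced_factual_loss([1, 2, 3, 4], [1, 2, 3, 5], phi, t, 0.5))  # 2.25
```

In a Heckman-style setup the control term would enter the outcome model as an extra regressor, while the balance penalty pushes the treated and control representations toward a common distribution; how SC-CFR actually combines the two is specified by the method itself, not by this sketch.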



Figure 1: Causal diagrams with either confounding bias, collider bias or both.

