SYNCTWIN: TRANSPARENT TREATMENT EFFECT ESTIMATION UNDER TEMPORAL CONFOUNDING

Abstract

Estimating causal treatment effects using observational data is a problem with few solutions when the confounder has a temporal structure, e.g. the history of disease progression might impact both treatment decisions and clinical outcomes. For such a challenging problem, it is desirable for the method to be transparent: able to pinpoint a small subset of data points that contribute most to the estimate and to clearly indicate whether the estimate is reliable. This paper develops a new method, SyncTwin, to overcome temporal confounding in a transparent way. SyncTwin estimates the treatment effect of a target individual by comparing the individual's outcome with that of its synthetic twin, which is constructed to closely match the target in the representation of the temporal confounders. SyncTwin achieves transparency by constraining the synthetic twin to depend only on a weighted combination of a few other individuals in the dataset. Moreover, the quality of the synthetic twin can be assessed by a performance metric, which also indicates the reliability of the estimated treatment effect. Experiments demonstrate that SyncTwin outperforms the benchmarks in clinical observational studies while still being transparent.

1. INTRODUCTION

Estimating the causal individual treatment effect (ITE) on patient outcomes using observational data (observational studies) has become a promising alternative to clinical trials as large-scale electronic health records become increasingly available (Booth & Tannock, 2014). Figure 1 illustrates a common setting in medicine, which is the focus of this work (DiPietro, 2010): an individual may start the treatment at some observed time (black dashed line) and we want to estimate the ITE on the outcomes over time after the treatment starts (shaded area). The key limitation of observational studies is that treatment allocation is not randomised but typically influenced by prior measurable static covariates (e.g. gender, ethnicity) and temporal covariates (e.g. the full history of medical diagnoses and conditions, squares in Figure 1). When the covariates also modulate the patient outcomes, they lead to confounding bias in the direct estimation of the ITE (Psaty et al., 1999). Although a plethora of methods overcome the confounding bias by adjusting for the static covariates (Yoon et al., 2018; Yao et al., 2018; Louizos et al., 2017; Shalit et al., 2017; Li & Fu, 2017; Alaa & van der Schaar, 2017; Johansson et al., 2016), few existing works take advantage of the temporal covariates that are measured irregularly over time (Figure 1) (Bica et al., 2020; Lim et al., 2018; Schulam & Saria, 2017; Roy et al., 2017). Overcoming the confounding bias due to temporal covariates is especially important for medical research as clinical treatment decisions are often based on the temporal progression of a disease.

Transparency is highly desirable in such a challenging problem. Although transparency is a general concept, we will focus on two specific aspects (Arrieta et al., 2020).
(1) Explainability: the method should estimate the ITE of any given individual (the target individual) based on a small subset of other individuals (contributors) whose amount of contribution can be quantified (e.g. using a weight between 0 and 1). Although the estimates for different target individuals may depend on different contributors, the method can always shortlist the few contributors for the expert to understand the



Figure 1: Illustration of a treated individual. Yellow dots represent the outcomes under no treatment.

