TIME SERIES COUNTERFACTUAL INFERENCE WITH HIDDEN CONFOUNDERS

Abstract

We present augmented counterfactual ordinary differential equations (ACODEs), a new approach to counterfactual inference on time series data, with a focus on healthcare applications. ACODEs model interventions in continuous time with differential equations, augmented by auxiliary confounding variables to reduce inference bias. Experiments on a tumor growth simulation and on sepsis patient treatment response show that ACODEs achieve more accurate counterfactual inference than other methods such as counterfactual Gaussian processes, recurrent marginal structural networks, and time series deconfounders. The learned auxiliary variables also reveal new insights into causal interventions and hidden confounders.

1. INTRODUCTION

Decision makers want to know how to produce desired outcomes and act accordingly, which requires an understanding of cause and effect. In this paper, we consider applications in healthcare, where time series data on past features and outcomes are now widely available. Causality in time series has long been studied in statistics (Box et al., 2008), and it allows more powerful analysis than methods designed for time-independent data, such as instrumental variable regression (Stock & Trebbi, 2003). However, temporal causality in statistics and econometrics focuses mainly on passively discovering time-lag structure (Eichler, 2012). In contrast, decision-making applications need concrete interventions, which are better suited to an interventionist approach to causality (Woodward, 2005; Pearl, 2009). For example, electronic health records (EHRs) provide an accessible history of a patient's disease progression over time, together with their treatment records and outcomes. To identify effective treatments, a doctor may want to ask counterfactual questions (Johansson et al., 2016), such as "Would this patient have had lower blood sugar had she received a different medication?" Through such counterfactual analysis, medical professionals may hope to discover new cures and improve existing treatments. Similar situations arise in other domains. For example, a user interface designer may ask "Would the user have clicked on this ad had it been a different color?" and answer it with counterfactual inference on clickstream data or other user behavior. Counterfactual inference in time series has been studied under the assumption that all relevant causal variables are observed (Soleimani et al., 2017; Schulam & Saria, 2017; Lim, 2018). In practice, however, this assumption of perfect observability is untestable and too strong for many real-world scenarios (Bica et al., 2020).
For example, there are many general approaches to treating cancer, but each patient requires a bespoke treatment plan based on the unique characteristics of their case, such as drug resistance and toxic response (Vlachostergios & Faltas, 2018; Kroschinsky et al., 2017; Bica et al., 2020). However, these factors are also likely to be unmeasurable in practice, or otherwise not recorded in EHRs. Detecting these hidden confounding variables is therefore crucial to avoid bias in the estimation of treatment effects. The challenge that confounders pose for counterfactual inference was first studied in the static setting. Wang & Blei (2019) developed a two-step method that first estimates confounders with latent factor models and then infers potential outcomes with bias adjustment. However, confounders in time series can have their own dynamics and can themselves be affected by the history of interventions. Bica et al. (2020) subsequently introduced recurrent neural networks (RNNs) into the factor model to estimate the dynamics of confounders. However, this method only works in the discrete-time setting with a fixed time step, because an RNN advances its hidden state once per step. In this paper, we consider the continuous-time setting, which is more flexible in practice and provides more insight into the underlying mechanisms.
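To make the contrast between discrete and continuous time concrete, the following is a minimal Python sketch, not the ACODE model itself. It uses a toy two-dimensional ODE whose state holds an outcome x and a hypothetical auxiliary variable z (standing in for an unobserved confounder such as drug resistance), driven by a piecewise-constant treatment a(t). The linear dynamics, the parameter names in `theta`, and the fixed-step Runge-Kutta solver are all illustrative assumptions. Because the state is defined at every instant, the solver can report the outcome at arbitrary, irregularly spaced observation times, and a counterfactual is obtained by re-integrating from the same initial state under a different treatment schedule.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

State = Tuple[float, float]  # (outcome x, hypothetical auxiliary variable z)


def dynamics(state: State, a: float, theta: Dict[str, float]) -> State:
    """Hypothetical linear augmented dynamics (illustration only):
    dx/dt = -k_x * x + b * z - c * a   (outcome, reduced by treatment a)
    dz/dt = -k_z * z + d * a           (auxiliary variable, driven by treatment)
    """
    x, z = state
    dx = -theta["k_x"] * x + theta["b"] * z - theta["c"] * a
    dz = -theta["k_z"] * z + theta["d"] * a
    return dx, dz


def rk4_step(state: State, a: float, theta: Dict[str, float], h: float) -> State:
    """One classical Runge-Kutta (RK4) step; treatment a is held fixed within the step."""
    def shift(s: State, k: State, scale: float) -> State:
        return s[0] + scale * k[0], s[1] + scale * k[1]

    k1 = dynamics(state, a, theta)
    k2 = dynamics(shift(state, k1, h / 2), a, theta)
    k3 = dynamics(shift(state, k2, h / 2), a, theta)
    k4 = dynamics(shift(state, k3, h), a, theta)
    return (
        state[0] + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
        state[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]),
    )


def simulate(
    x0: float,
    z0: float,
    treatment: Callable[[float], float],
    obs_times: Sequence[float],
    theta: Dict[str, float],
    max_step: float = 0.01,
) -> List[float]:
    """Integrate from t = 0 and return the outcome x at each observation time.

    obs_times may be irregularly spaced (non-negative); they are processed in sorted order.
    """
    state: State = (x0, z0)
    t = 0.0
    outcomes: List[float] = []
    for t_obs in sorted(obs_times):
        span = t_obs - t
        n_steps = max(1, math.ceil(span / max_step))  # equal sub-steps that land exactly on t_obs
        h = span / n_steps
        for i in range(n_steps):
            state = rk4_step(state, treatment(t + i * h), theta, h)
        t = t_obs
        outcomes.append(state[0])
    return outcomes


# Usage: factual (treated on [1, 2)) versus counterfactual (never treated) trajectories.
theta = {"k_x": 0.1, "b": 0.5, "c": 1.0, "k_z": 0.2, "d": 0.3}  # hypothetical values
obs_times = [0.3, 1.7, 2.5]  # irregular observation times
treated = simulate(10.0, 0.0, lambda t: 1.0 if 1.0 <= t < 2.0 else 0.0, obs_times, theta)
untreated = simulate(10.0, 0.0, lambda t: 0.0, obs_times, theta)
```

Under these assumptions, the two trajectories coincide until treatment starts at t = 1 and diverge afterwards, and no resampling onto a regular grid is needed; a fixed-step RNN would typically require the observations to be aligned to its time step first.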

