IDENTIFYING TREATMENT EFFECTS UNDER UNOBSERVED CONFOUNDING BY CAUSAL REPRESENTATION LEARNING

Abstract

We study the estimation of treatment effects in the presence of unobserved confounding, an important problem in causal inference. Representing the confounder as a latent variable, we propose Counterfactual VAE, a new variant of the variational autoencoder, based on recent advances in the identifiability of representation learning. Combining this identifiability with classical identification results of causal inference, we theoretically show, under mild assumptions on the generative model and with small noise on the outcome, that the confounder is identifiable up to an affine transformation and that the treatment effects can then be identified. Experiments on synthetic and semi-synthetic datasets demonstrate that our method matches the state-of-the-art, even under settings that violate our formal assumptions.

1. INTRODUCTION

Causal inference (Imbens & Rubin, 2015; Pearl, 2009), i.e., estimating the causal effects of interventions, is a fundamental problem across many domains. In this work, we focus on the estimation of treatment effects, e.g., the effects of public policies or a new drug, based on a set of observations consisting of a binary label for treatment / control (non-treated), an outcome, and other covariates. The fundamental difficulty of causal inference is that we never observe counterfactual outcomes: the outcomes that would have resulted had we made the other decision (treatment or control). While the ideal protocol for causal inference is randomized controlled trials (RCTs), they often raise ethical and practical issues, or are prohibitively expensive. Thus, causal inference from observational data is indispensable, though it introduces other challenges. Perhaps the most crucial one is confounding: there might be variables (called confounders) that causally affect both the treatment and the outcome, from which spurious correlation follows.

Most works in causal inference rely on the unconfoundedness assumption: that appropriate covariates are collected so that confounding can be controlled by conditioning on, or adjusting for, those variables. Estimation is still challenging, due to the systematic difference between the distributions of the covariates in the treatment and control groups. One classical way of dealing with this difference is re-weighting (Horvitz & Thompson, 1952). There are semi-parametric methods with better finite-sample performance, e.g., TMLE (Van der Laan & Rose, 2011), and also non-parametric, tree-based methods, e.g., Causal Forests (CF) (Wager & Athey, 2018). Notably, there has been a recent rise of interest in representation learning for causal inference, starting from Johansson et al. (2016). A few lines of work tackle the difficult but important problem of causal inference under unobserved confounding.
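To make the classical re-weighting idea concrete, the following is a minimal sketch of the inverse-propensity (Horvitz & Thompson, 1952) estimator on synthetic data with an observed confounder; the data-generating process and variable names are illustrative assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Synthetic data: a confounder z affects both treatment assignment and outcome.
z = rng.normal(size=n)
e = 1.0 / (1.0 + np.exp(-z))                      # propensity P(t = 1 | z)
t = rng.binomial(1, e)
y = 2.0 * t + z + rng.normal(scale=0.1, size=n)   # true ATE is 2.0

# Naive difference in means is biased: z shifts the two groups apart.
naive = y[t == 1].mean() - y[t == 0].mean()

# Horvitz-Thompson re-weighting with the (here known) propensity
# recovers the average treatment effect.
ate_ipw = np.mean(t * y / e - (1 - t) * y / (1 - e))
```

With the propensity known, `ate_ipw` concentrates around the true effect 2.0 while `naive` overestimates it. When the confounder is unobserved, no such weighting on observed covariates removes the bias, which is precisely the setting addressed in this paper.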
Without covariates we can adjust for, many of these works assume special structures among the variables, such as instrumental variables (IVs) (Angrist et al., 1996), proxy variables (Miao et al., 2018), network structure (Ogburn, 2018), and multiple causes (Wang & Blei, 2019). Among them, instrumental variables and proxy (or surrogate) variables are the most commonly exploited. Instrumental variables are not affected by unobserved confounders and influence the outcome only through the treatment. Proxy variables, on the other hand, are causally connected to unobserved confounders, but do not confound the treatment and outcome by themselves. Other methods use restrictive parametric models (Allman et al., 2009) or only give interval estimates (Manski, 2009; Kallus et al., 2019).

In this work, we address the problem of estimating treatment effects under unobserved confounding. We further discuss the individual-level treatment effect, which measures the treatment effect conditioned on the covariate, for example, on a patient's personal data. To model the problem, we regard the covariate as a proxy variable and the confounder as a latent variable in representation learning. Our method particularly exploits recent advances in the identifiability of representation learning for VAEs (Khemakhem et al., 2020). A hallmark of deep neural networks (NNs) is that they can learn representations of data. It is desirable that the learned representations be interpretable, that is, that they stand in approximately the same relationship to the latent sources across downstream tasks. A principled approach to this is identifiability: when optimizing the learning objective w.r.t. the representation function, only a unique optimum is returned. Our method builds on this and further provides the stronger identifiability of representations that is needed in causal inference. The proposed method is also firmly based on well-established results in causal inference.
In many works exploiting proxies, it is assumed that the proxies are independent of the outcome given the confounder (Greenland, 1980; Rothman et al., 2008; Kuroki & Pearl, 2014). This assumption also motivates our method. Further, our method naturally combines a new VAE architecture with the classical result of Rosenbaum & Rubin (1983) on the information sufficient for identification of treatment effects, yielding identifiability proofs for both the latent representations and the treatment effects. The main contributions of this paper are as follows: 1) interpretable, causal representation learning via a new VAE architecture for estimating treatment effects under unobserved confounding; 2) theoretical analysis of the identifiability of the representation and the treatment effects; 3) an experimental study on diverse settings showing state-of-the-art performance.

2. RELATED WORK

Identifiability of representation learning. With recent advances in nonlinear ICA, identifiability of representations has been proved under a number of settings, e.g., auxiliary tasks for representation learning (Hyvärinen & Morioka, 2016; Hyvärinen et al., 2019) and VAEs (Khemakhem et al., 2020). Recently, Roeder et al. (2020) extended the result to a wide class of state-of-the-art deep discriminative models. These results have been exploited in bivariate causal discovery (Wu & Fukumizu, 2020) and structure learning (Yang et al., 2020). To the best of our knowledge, this work is the first to explore this new possibility in causal inference.

Representation learning for causal inference. Recently, researchers have started to design representation learning methods for causal inference, though mostly limited to unconfounded settings. Some methods focus on learning a balanced covariate representation, e.g., BLR/BNN (Johansson et al., 2016) and TARnet/CFR (Shalit et al., 2017). In addition, Yao et al. (2018) exploits the local similarity between data points. Shi et al. (2019) uses an architecture similar to TARnet, considering the importance of the treatment probability. There are also methods using GANs (Yoon et al., 2018, GANITE) and Gaussian processes (Alaa & van der Schaar, 2017). Our method adds to these by also tackling the harder problem of unobserved confounding.

Causal inference with auxiliary structures. Both our method and CEVAE (Louizos et al., 2017) are motivated by exploiting proxies and use a VAE as the learning method. However, CEVAE assumes a specific causal graph in which the covariates are independent of the treatment given the confounder. Further, CEVAE relies on the assumption that the VAE can recover the true latent distribution. Kallus et al. (2018) uses matrix factorization to infer the confounders from proxy variables, and gives a consistent ATE estimator and its error bound. Miao et al. (2018) established conditions for identification using more general proxies, but without a practical estimation method. Note that two active lines of work in machine learning, exploiting IVs (Hartford et al., 2017) and network structure (Veitch et al., 2019), exist in their own right.

3.1. TREATMENT EFFECTS AND CONFOUNDERS

Following Imbens & Rubin (2015), we begin by introducing the potential outcomes (or counterfactual outcomes) y(t), t = 0, 1, where y(t) is the outcome we would observe if treatment value t were applied.
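The potential-outcomes notation can be made concrete with a toy simulation, shown below; the numbers are purely illustrative and not from the paper. Both y(0) and y(1) are generated for each unit, but only the one matching the assigned treatment would ever be observed in practice.

```python
import numpy as np

# Hypothetical potential outcomes for five units.
y0 = np.array([1.0, 0.5, 2.0, 1.5, 0.0])   # outcomes under control, y(0)
y1 = y0 + 1.0                               # outcomes under treatment, y(1)
t = np.array([0, 1, 1, 0, 1])               # assigned treatments

# The factual (observed) outcome: y = t * y(1) + (1 - t) * y(0).
y_obs = np.where(t == 1, y1, y0)

# Individual and average treatment effects, computable here only
# because we simulated both arms; in reality y(1) - y(0) is never
# directly observed for any single unit.
ite = y1 - y0
ate = ite.mean()
```

This is exactly the fundamental problem stated in the introduction: for each unit, one of the two entries of (y(0), y(1)) is missing, so the individual effect y(1) - y(0) is not directly observable.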

