VCNET AND FUNCTIONAL TARGETED REGULARIZATION FOR LEARNING CAUSAL EFFECTS OF CONTINUOUS TREATMENTS

Abstract

Motivated by the rising abundance of observational data with continuous treatments, we investigate the problem of estimating the average dose-response function (ADRF). Available parametric methods are limited in their model space, and previous attempts to leverage neural networks for greater model expressiveness relied on partitioning the continuous treatment into blocks and using a separate head for each block; in practice, however, this produces discontinuous ADRFs. The question of how to adapt the structure and training of neural networks to estimate ADRFs therefore remains open. This paper makes two contributions. First, we propose a novel varying coefficient neural network (VCNet) that improves model expressiveness while preserving the continuity of the estimated ADRF. Second, to improve finite-sample performance, we generalize targeted regularization to obtain a doubly robust estimator of the whole ADRF curve.

1. INTRODUCTION

Continuous treatments arise in many fields, including healthcare, public policy, and economics. With the widespread accumulation of observational data, estimating the average dose-response function (ADRF) while correcting for confounders has become an important problem (Hirano & Imbens, 2004; Imai & Van Dyk, 2004; Kennedy et al., 2017; Fong et al., 2018). Recently, papers in causal inference (Johansson et al., 2016; Alaa & van der Schaar, 2017; Shalit et al., 2017; Schwab et al., 2019; Farrell et al., 2018; Shi et al., 2019) have used feedforward neural networks for modeling. The success of neural network models lies in the fact that, unlike traditional parametric models, they are flexible enough to model complex causal relationships, as shown by the universal approximation theorem (Csáji et al., 2001). Also, unlike traditional non-parametric models, neural networks have been shown to be powerful when dealing with high-dimensional input (e.g., Masci et al. (2011); Johansson et al. (2016)), which suggests their potential for dealing with high-dimensional confounders. A successful application of neural networks to causal inference requires a specially designed network structure that distinguishes the treatment variable from the other covariates, since otherwise the treatment information might be lost in the high-dimensional latent representation (Shalit et al., 2017). However, most existing network structures are designed for binary treatments and are difficult to generalize to treatments taking values in a continuum. For example, Shalit et al. (2017), Louizos et al. (2017), Schwab et al. (2019), and Shi et al. (2019) used separate prediction heads for the two treatment options; this structure is not directly applicable to continuous treatments, as there is an infinite number of treatment levels. To deal with a continuous treatment, recent work (Schwab et al., 2019) proposed a modification called DRNet. DRNet partitions a continuous treatment into blocks and, for each block, trains a separate head in which the treatment is concatenated into each hidden layer (see Figure 2). Despite the improvements made by the building block of DRNet, this structure does not

