CAUSAL ESTIMATION FOR TEXT DATA WITH (APPARENT) OVERLAP VIOLATIONS

Abstract

Consider the problem of estimating the causal effect of some attribute of a text document; for example: what effect does writing a polite vs. rude email have on response time? To estimate a causal effect from observational data, we need to adjust for confounding aspects of the text that affect both the treatment and the outcome, e.g., the topic or writing level of the text. These confounding aspects are unknown a priori, so it seems natural to adjust for the entirety of the text (e.g., using a transformer). However, causal identification and estimation procedures rely on the assumption of overlap: for all levels of the adjustment variables, there is randomness left over, so that every unit could have received (or not received) treatment. Since the treatment here is itself an attribute of the text, it is perfectly determined by the text, and overlap is apparently violated. The purpose of this paper is to show how to handle causal identification and obtain robust causal estimation in the presence of apparent overlap violations. In brief, the idea is to use supervised representation learning to produce a data representation that preserves confounding information while eliminating information that is predictive only of the treatment. This representation then suffices for adjustment and satisfies overlap. Adapting results on non-parametric estimation, we find that this procedure is robust to misestimation of the conditional outcome, yielding a low-absolute-bias estimator with valid uncertainty quantification under weak conditions. Empirical results show strong improvements in bias and uncertainty quantification relative to the natural baseline. Code, demo data, and a tutorial are available at https://github.com/gl-ybnbxb/TI-estimator.

The contributions of this paper are:
1. We give a formal causal estimand corresponding to the text-attribute question. We show this estimand is causally identified under weak conditions, even in the presence of apparent overlap issues.
2. We show how to efficiently estimate this quantity using an adapted double machine-learning technique (described in the introduction). We show that this estimator admits a central limit theorem at a fast (√n) rate under weak conditions on the rate at which the ML model learns the text-outcome relationship (namely, convergence at an n^{1/4} rate). This implies that the absolute bias decreases rapidly, and yields an (asymptotically) valid procedure for uncertainty quantification.
3. We test the performance of this procedure empirically, finding significant improvements in bias and uncertainty quantification relative to the outcome-model-only baseline.

1. INTRODUCTION

We consider the problem of estimating the causal effect of an attribute of a passage of text on some downstream outcome. For example, what is the effect of writing a polite or rude email on the amount of time it takes to get a response? In principle, we might hope to answer such questions with a randomized experiment. However, this can be difficult in practice, e.g., if poor outcomes are costly or take a long time to gather. Accordingly, in this paper, we are interested in estimating such effects using observational data.

There are three steps to estimating causal effects using observational data (see Chapter 36 of Murphy (2023)). First, we need to specify a concrete causal quantity as our estimand; that is, a formal target of estimation corresponding to the high-level question of interest. The next step is causal identification: we need to prove that this causal estimand can, in principle, be estimated using only observational data. The standard approach to identification relies on adjusting for confounding variables that affect both the treatment and the outcome. For identification to hold, our adjustment variables must satisfy two conditions: unconfoundedness and overlap. The former requires that the adjustment variables contain sufficient information on all common causes. The latter requires that the adjustment variables do not contain enough information about treatment assignment to let us perfectly predict it. Intuitively, to disentangle the effect of treatment from the effect of confounding, we must observe each treatment state at all levels of confounding. The final step is estimation using a finite data sample. Here, overlap also turns out to be critically important as a major determinant of the best possible accuracy (asymptotic variance) of the estimator (Chernozhukov et al., 2016). Since the treatment is a linguistic property, it is often reasonable to assume that the text contains information about all common causes of the treatment and the outcome.
Thus, we may aim to satisfy unconfoundedness in the text setting by adjusting for the whole text as the confounding part. However, doing so violates overlap: since the treatment is a linguistic property determined by the text, the probability of treatment given the text is either 0 or 1. The polite or rude tone is determined by the text itself. Therefore, overlap does not hold if we naively adjust for the whole text. This problem is the main subject of this paper. More precisely, our goal is to find a causal estimand, causal identification conditions, and a robust estimation procedure that together allow us to effectively estimate causal effects even in the presence of such (apparent) overlap violations.

In fact, there is an obvious first approach: simply use a standard plug-in estimation procedure that relies only on modeling the outcome from the text and treatment variables. In particular, make no explicit use of the propensity score, the probability that each unit is treated. Pryzant et al. (2020) use an approach of this kind and show it is reasonable in some situations. Indeed, we will see in Sections 3 and 4 that this procedure can be interpreted as a point estimator of a controlled causal effect. Even once we understand what the implied causal estimand is, this approach has a major drawback: the estimator is only accurate when the text-outcome model converges at a very fast rate. This is particularly an issue in the text setting, where we would like to use large, flexible, deep learning models for this relationship. In practice, we find that this procedure works poorly: the estimator has significant absolute bias, and (the natural approach to) uncertainty quantification almost never covers the true value of the estimand; see Section 5. The contribution of this paper is a method for robustly estimating causal effects in text.
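As a concrete illustration, here is a minimal sketch of such an outcome-model-only plug-in estimator on simulated data. All names, the linear outcome model, and the data-generating process are illustrative assumptions for this sketch, not the implementation of Pryzant et al. (2020); in the text setting the outcome model would be a large ML model rather than least squares.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: x is a crude numeric stand-in for text features, a is the treatment.
n = 2000
x = rng.normal(size=(n, 3))
a = (x[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(float)
y = 2.0 * a + x @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=n)

# Fit an outcome model Q(a, x) by least squares (a flexible ML model in practice).
design = np.column_stack([np.ones(n), a, x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

def Q(a_val, x_mat):
    # Predicted outcome under treatment level a_val for each unit.
    d = np.column_stack([np.ones(len(x_mat)), np.full(len(x_mat), a_val), x_mat])
    return d @ coef

# Plug-in estimate: average the modeled treatment contrast over all units.
tau_hat = np.mean(Q(1.0, x) - Q(0.0, x))
print(tau_hat)  # close to the true effect of 2.0 on this toy data
```

Note that nothing in this procedure touches the propensity score; its accuracy rests entirely on how fast the fitted Q converges to the true outcome function.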
The main idea is to break estimation into a two-stage procedure, where in the first stage we learn a representation of the text that preserves enough information to account for confounding, but throws away enough information to avoid overlap issues. Then, we use this representation as the adjustment variables in a standard double machine-learning estimation procedure (Chernozhukov et al., 2016; 2017a). To establish this method, we make the three contributions listed above.

The most closely related work is Pryzant et al. (2020), which also considers estimation of the causal effect of text attributes. Their focus is primarily on mismeasurement of the treatments, while our motivation is robust estimation. This paper also relates to work on causal estimation with (near) overlap violations. D'Amour et al. (2021) point out that high-dimensional adjustment (e.g., Rassen et al., 2011; Louizos et al., 2017; Li et al., 2016; Athey et al., 2017) suffers from overlap issues. Extra assumptions such as sparsity are often needed to meet the overlap condition. These results do not directly apply here because we assume there exists a low-dimensional summary that suffices to handle confounding. D'Amour & Franks (2021) study summary statistics that suffice for identification, which they call deconfounding scores. The supervised representation learning approach in this paper can be viewed as an extremal case of the deconfounding score. However, they consider the case where ordinary overlap holds with all observed features, with the aim of using both the outcome model and the propensity score to find efficient statistical estimation procedures (in a linear-Gaussian setting). This does not make sense in the setting we consider. Additionally, our main statistical result (robustness to outcome-model estimation error) is new.
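The second stage can be sketched with the standard double-ML (AIPW) score. The helper below is an illustration under stated assumptions: it takes nuisance estimates as given (outcome predictions Q(0, x), Q(1, x) and propensities computed from the learned representation), and the toy check uses oracle nuisances rather than fitted models; the cross-fitting used in practice is omitted for brevity.

```python
import numpy as np

def dml_ate(y, a, q0, q1, g, eps=1e-3):
    """Double-ML (AIPW) point estimate and standard error.

    y, a   : outcomes and binary treatments
    q0, q1 : outcome-model predictions Q(0, x), Q(1, x)
    g      : propensity scores P(A=1 | representation), which must
             satisfy overlap (bounded away from 0 and 1)
    """
    g = np.clip(g, eps, 1 - eps)  # guard against near-violations of overlap
    psi = (q1 - q0
           + a * (y - q1) / g
           - (1 - a) * (y - q0) / (1 - g))
    est = psi.mean()
    se = psi.std(ddof=1) / np.sqrt(len(y))  # basis for a CLT-based interval
    return est, se

# Toy check with oracle nuisances: the true effect is 2.
rng = np.random.default_rng(1)
n = 5000
z = rng.normal(size=n)
g_true = 1 / (1 + np.exp(-z))
a = rng.binomial(1, g_true).astype(float)
y = 2.0 * a + z + rng.normal(size=n)
est, se = dml_ate(y, a, q0=z, q1=2.0 + z, g=g_true)
print(est, se)
```

The bias-correction terms involving g are what make the estimate robust to slow convergence of the outcome model, which is the property the naive plug-in estimator lacks.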

2. NOTATION AND PROBLEM SETUP

We follow the causal setup of Pryzant et al. (2020). We are interested in estimating the causal effect of treatment A on outcome Y. For example, how does writing a negative sentiment (A) review (X) affect product sales (Y)? There are two immediate challenges to estimating such effects with observed text data. First, we do not actually observe A, which is the intent of the writer. Instead, we only observe Ã, a version of A that is inferred from the text itself. In this paper, we will assume that A = Ã almost surely, e.g., a reader can always tell whether a review was meant to be negative or positive. This assumption is often reasonable, and follows Pryzant et al. (2020). The next challenge is that the treatment may be correlated with other aspects of the text (Z) that are also relevant to the outcome, e.g., the product category of the item being reviewed. Such Z can act as confounding variables, and must somehow be adjusted for in a causal estimation problem.

(Figure 1: causal diagram relating the treatment A, other properties Z, the text X, the outcome Y, and the perceived treatment Ã.)

Each unit (A_i, Z_i, X_i, Y_i) is drawn independently and identically from an unknown distribution P. Figure 1 shows the causal relationships among the variables, where solid arrows represent causal relations, and the dotted line represents possible correlation between two variables. We assume that the text X contains all common causes of Ã and the outcome Y.
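To make the setup concrete, the following toy simulation (our own illustrative stand-in; the paper works with real text rather than numeric features) generates units according to this diagram: Z influences both A and the text, the text is built from A- and Z-driven parts, and Ã is read off the text and equals A almost surely.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000

# Z: other properties (e.g., product category), correlated with the treatment.
z = rng.binomial(1, 0.5, size=n)
# A: the writer's intent (treatment), correlated with Z.
a = rng.binomial(1, 0.3 + 0.4 * z).astype(float)
# X: "text" generated from A and Z (numeric columns standing in for text).
x = np.column_stack([a + 0.1 * rng.normal(size=n),        # driven by A
                     a * z + 0.1 * rng.normal(size=n),    # A-Z interaction
                     z + 0.1 * rng.normal(size=n)])       # driven by Z
# Ã: treatment as inferred from the text; here it recovers A (A = Ã a.s.).
a_tilde = (x[:, 0] > 0.5).astype(float)
# Y: outcome depends on the treatment A and the confounder Z.
y = 1.5 * a + 2.0 * z + rng.normal(size=n)

print(np.mean(a_tilde == a))  # fraction of units where Ã = A
```

Because Z affects both A and Y, naively contrasting outcomes between treated and untreated units here would be confounded; this is exactly what the adjustment is for.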

3. IDENTIFICATION AND CAUSAL ESTIMAND

The first task is to translate the qualitative causal question of interest (what is the effect of A on Y?) into a causal estimand. This estimand must both be faithful to the qualitative question and be identifiable from observational data under reasonable assumptions. The key challenges here are that we only observe Ã (not A itself), that there are unknown confounding variables influencing the text, and that Ã is a deterministic function of the text, leading to overlap violations if we naively adjust for all of the text. Our high-level idea is to split the text into abstract (unknown) parts depending on whether they are confounding (affecting both Ã and Y) or affect Ã alone. The part of the text that affects only Ã is not necessary for causal adjustment, and can be thrown away. If this part contains "enough" information about Ã, then throwing it away eliminates our ability to perfectly predict Ã, thus fixing the overlap issue. We now turn to formalizing this idea, showing how it can be used to define an estimand and to identify this estimand from observational data.
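The overlap problem, and the effect of discarding the part of the text that only predicts treatment, can be seen numerically. This is a toy illustration under our own assumed data-generating process, with a binary variable z standing in for the confounding part of the text.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10000
z = rng.binomial(1, 0.5, size=n)        # confounding part of the text
a = rng.binomial(1, 0.3 + 0.4 * z)      # treatment, correlated with z

# Propensity given the full text: the text determines the treatment, so the
# conditional probability is exactly 0 or 1 for every unit -- overlap fails.
p_full = a.astype(float)
print(np.unique(p_full))                # [0. 1.]

# Propensity given only the confounding part z: strictly inside (0, 1),
# so overlap holds once the treatment-only part of the text is discarded.
p_z = np.where(z == 1, a[z == 1].mean(), a[z == 0].mean())
print(p_z.min(), p_z.max())             # roughly 0.3 and 0.7
```

Discarding the treatment-only information costs nothing for adjustment, since that part of the text carries no confounding information, but it restores the randomness that overlap requires.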

Causal model

The first idea is to decompose the text into three parts: one affected only by A, one affected interactively by A and Z, and one affected only by Z. We use X_A, X_{A∧Z}, and X_Z to denote them, respectively; see Figure 2 for the corresponding causal model. Note that there could be additional information in the text beyond these three parts. However, since such information is irrelevant to both A and Z, we do not need to consider it in the model.
k s 7 8 0 k Q T + i Q 8 l D z q i x V u O h X 6 6 4 V X c u s g p e D h X I V e + X v 3 q D m K U R S s M E 1 b r r u Y n x M 6 o M Z w K n p V 6 q M a F s T I f Y t S h p h N r P 5 o t O y Z l 1 B i S M l X 3 S k L n 7 e y K j k d a T K L C d E T U j v V y b m f / V u q k J r / 2 M y y Q 1 K N n i o z A V x M R k d j U Z c I X M i I k F y h S 3 u x I 2 o o o y Y 7 M p 2 R C 8 5 Z N X o X V R 9 S w 3 L i u 1 m z y O I p z A K Z y D B 1 d Q g z u o Q x M Y I D z k N N h Z 8 p v J R F s = " > A A A B 6 H i c b Z B N S 8 N A E I Y n 9 a v W r 6 p H L 4 t F 8 F Q S E f R Y 9 e K x B f s B b S i b 7 a R d u 9 m E 3 Y 1 Q Q n + B F w + K e P U n e f P f u G 1 z 0 N Y X F h 7 e m W F n 3 i A R X B v X / X Y K a + s b m 1 v F 7 d L O 7 t 7 + Q f n w q K X j V D F s s l j E q h N Q j Y J L b B p u B H Y S h T Q K B L a D 8 d 2 s 3 n 5 C p X k s H 8 w k Q T + i Q 8 l D z q i x V u O m X 6 6 4 V X c u s g p e D h X I V e + X v 3 q D m K U R S s M E 1 b r r u Y n x M 6 o M Z w K n p V 6 q M a F s T I f Y t S h p h N r P 5 o t O y Z l 1 B i S M l X 3 S k L n 7 e y K j k d a T K L C d E T U j v V y b m f / V u q k J r / 2 M y y Q 1 K N n i o z A V x M R k d j U Z c I X M i I k F y h S 3 u x I 2 o o o y Y 7 M p 2 R C 8 5 Z N X o X V R 9 S w 3 L i u 1 2 z y O I p z A K Z y D B 1 d Q g 3 u o Q x M Y I D z k N N h Z 8 p v J R F s = " > A A A B 6 H i c b Z B N S 8 N A E I Y n 9 a v W r 6 p H L 4 t F 8 F Q S E f R Y 9 e K x B f s B b S i b 7 a R d u 9 m E 3 Y 1 Q Q n + B F w + K e P U n e f P f u G 1 z 0 N Y X F h 7 e m W F n 3 i A R X B v X / X Y K a + s b m 1 v F 7 d L O 7 t 7 + Q f n w q K X j V D F s s l j E q h N Q j Y J L b B p u B H Y S h T Q K B L a D 8 d 2 s 3 n 5 C p X k s H 8 w k Q T + i Q 8 l D z q i x V u O m X 6 6 4 V X c u s g p e D h X I V e + X v 3 q D m K U R S s M E 1 b r r u Y n x M 6 o M Z w K n p V 6 q M a F s T I f Y t S h p h N r P 5 o t O y Z l 1 B i S M l X 3 S k L n 7 e y K j k d a T K L C d E T U j v V y b m f / V u q k J r / 2 M y y Q 1 K N n i o z A V x M 
R k d j U Z c I X M i I k F y h S 3 u x I 2 o o o y Y 7 M p 2 R C 8 5 Z N X o X V R 9 S w 3 L i u 1 2 z y O I p z A K Z y D B 1 d Q g 3 u o Q x M Y I D z k N N h Z 8 p v J R F s = " > A A A B 6 H i c b Z B N S 8 N A E I Y n 9 a v W r 6 p H L 4 t F 8 F Q S E f R Y 9 e K x B f s B b S i b 7 a R d u 9 m E 3 Y 1 Q Q n + B F w + K e P U n e f P f u G 1 z 0 N Y X F h 7 e m W F n 3 i A R X B v X / X Y K a + s b m 1 v F 7 d L O 7 t 7 + Q f n w q K X j V D F s s l j E q h N Q j Y J L b B p u B H Y S h T Q K B L a D 8 d 2 s 3 n 5 C p X k s H 8 w k Q T + i Q 8 l D z q i x V u O m X 6 6 4 V X c u s g p e D h X I V e + X v 3 q D m K U R S s M E 1 b r r u Y n x M 6 o M Z w K n p V 6 q M a F s T I f Y t S h p h N r P 5 o t O y Z l 1 B i S M l X 3 S k L n 7 e y K j k d a T K L C d E T U j v V y b m f / V u q k J r / 2 M y y Q 1 K N n i o z A V x M R k d j U Z c I X M i I k F y h S 3 u x I 2 o o o y Y 7 M p 2 R C 8 5 Z N X o X V R 9 S w 3 L i u 1 2 z y O I p z A K Z y D B 1 d Q g 3 u o Q x M Y I D z k N N h Z 8 p v J R F s = " > A A A B 6 H i c b Z B N S 8 N A E I Y n 9 a v W r 6 p H L 4 t F 8 F Q S E f R Y 9 e K x B f s B b S i b 7 a R d u 9 m E 3 Y 1 Q Q n + B F w + K e P U n e f P f u G 1 z 0 N Y X F h 7 e m W F n 3 i A R X B v X / X Y K a + s b m 1 v F 7 d L O 7 t 7 + Q f n w q K X j V D F s s l j E q h N Q j Y J L b B p u B H Y S h T Q K B L a D 8 d 2 s 3 n 5 C p X k s H 8 w k Q T + i Q 8 l D z q i x V u O m X 6 6 4 V X c u s g p e D h X I V e + X v 3 q D m K U R S s M E 1 b r r u Y n x M 6 o M Z w K n p V 6 q M a F s T I f Y t S h p h N r P 5 o t O y Z l 1 B i S M l X 3 S k L n 7 e y K j k d a T K L C d E T U j v V y b m f / V u q k J r / 2 M y y Q 1 K N n i o z A V x M R k d j U Z c I X M i I k F y h S 3 u x I 2 o o o y Y 7 M p 2 R C 8 5 Z N X o X V R 9 S w 3 L i u 1 2 z y O I p z A K Z y D B 1 d Q g 3 u o Q x M Y I D z + K M k F M Q q b H k 5 A r Z E a M L V C m u N 2 V s C F V l B k b U c W G 4 C 2 e v A z + K M k F M Q q b H k 5 A r Z E a M L V C m u N 2 V s C F V l B k b 
U c W G 4 C 2 e v A z + K M k F M Q q b H k 5 A r Z E a M L V C m u N 2 V s C F V l B k b U c W G 4 C 2 e v A z F Q S E f R Y 9 O K x B f s h b S i b 7 a R d u 9 m E 3 Y 1 Q Q n + B F w + K e P U n e f P f u G 1 z 0 N Y X F h 7 e m W F n 3 i A R X B v X / X Y K a + s b m 1 v F 7 d L O 7 t 7 + Q f n w q K X j V D F s s l j E q h N Q j Y J L b B p u B H Y S h T Q K B L a D 8 e 2 s 3 n 5 C p X k s 7 8 0 k Q T + i Q 8 l D z q i x V u O h X 6 6 4 V X c u s g p e D h X I V e + X v 3 q D m K U R S s M E 1 b r r u Y n x M 6 o M Z w K n p V 6 q M a F s T I f Y t S h p h N r P 5 o t O y Z l 1 B i S M l X 3 S k L n 7 e y K j k d a T K L C d E T U j v V y b m f / V u q k J r / 2 M y y Q 1 K N n i o z A V x M R k d j U Z c I X M i I k F F Q S E f R Y 9 O K x B f s h b S i b 7 a R d u 9 m E 3 Y 1 Q Q n + B F w + K e P U n e f P f u G 1 z 0 N Y X F h 7 e m W F n 3 i A R X B v X / X Y K a + s b m 1 v F 7 d L O 7 t 7 + Q f n w q K X j V D F s s l j E q h N Q j Y J L b B p u B H Y S h T Q K B L a D 8 e 2 s 3 n 5 C p X k s 7 8 0 k Q T + i Q 8 l D z q i x V u O h X 6 6 4 V X c u s g p e D h X I V e + X v 3 q D m K U R S s M E 1 b r r u Y n x M 6 o M Z w K n p V 6 q M a F s T I f Y t S h p h N r P 5 o t O y Z l 1 B i S M l X 3 S k L n 7 e y K j k d a T K L C d E T U j v V y b m f / V u q k J r / 2 M y y Q 1 K N n i o z A V x M R k d j U Z c I X M i I k F F Q S E f R Y 9 O K x B f s h b S i b 7 a R d u 9 m E 3 Y 1 Q Q n + B F w + K e P U n e f P f u G 1 z 0 N Y X F h 7 e m W F n 3 i A R X B v X / X Y K a + s b m 1 v F 7 d L O 7 t 7 + Q f n w q K X j V D F s s l j E q h N Q j Y J L b B p u B H Y S h T Q K B L a D 8 e 2 s 3 n 5 C p X k s 7 8 0 k Q T + i Q 8 l D z q i x V u O h X 6 6 4 V X c u s g p e D h X I V e + X v 3 q D m K U R S s M E 1 b r r u Y n x M 6 o M Z w K n p V 6 q M a F s T I f Y t S h p h N r P 5 o t O y Z l 1 B i S M l X 3 S k L n 7 e y K j k d a T K L C d E T U j v V y b m f / V u q k J r / 2 M y y Q 1 K N n i o z A V x M R k d j U Z c I X M i I k F F Q S E f R Y 9 O K x 
B f s h b S i b 7 a R d u 9 m E 3 Y 1 Q Q n + B F w + K e P U n e f P f u G 1 z 0 N Y X F h 7 e m W F n 3 i A R X B v X / X Y K a + s b m 1 v F 7 d L O 7 t 7 + Q f n w q K X j V D F s s l j E q h N Q j Y J L b B p u B H Y S h T Q K B L a D 8 e 2 s 3 n 5 C p X k s 7 8 0 k Q T + i Q 8 l D z q i x V u O h X 6 6 4 V X c u s g p e D h X I V e + X v 3 q D m K U R S s M E 1 b r r u Y n x M 6 o M Z w K n p V 6 q M a F s T I f Y t S h p h N r P 5 o t O y Z l 1 B i S M l X 3 S k L n 7 e y K j k d a T K L C d E T U j v V y b m f / V u q k J r / 2 M y y Q 1 K N n i o z A V x M R k d j U Z c I X M i I k F x i + B x / 2 o K u q a + K u 9 f I H c S Y = " > A A A B 7 H i c b Z B N S 8 N A E I Y n 9 a v W r 6 p H L 4 t F 8 F Q S E f R Y 9 e K x g m k L b S i b 7 a R d u t m E 3 Y 1 Q Q n + D F w + K e P U H e f P f u G 1 z 0 N Y X F h 7 e m W F n 3 j A V X B v X / X Z K a + s b m 1 v l 7 c r O 7 t 7 + Q f X w q K W T T D H 0 W S I S 1 Q m p R s E l + o Y b g Z 1 U I Y 1 D g e 1 w f D e r t 5 9 Q a Z 7 I R z N J M Y j p U P K I M 2 q s 5 X f 6 + c 2 0 X 6 2 5 d X c u s g p e A T U o 1 O x X v 3 q D h G U x S s M E 1 b r r u a k J c q o M Z w K n l V 6 m M a V s T I f Y t S h p j D r I 5 8 t O y Z l 1 B i R K l H 3 S k L n 7 e y K n s d a T O L S d M T U j v V y b m f / V u p m J r o O c y z Q z K N n i o y g T x C R k d j k Z c I X M i I k F y h S 3 u x I 2 o o o y Y / O p 2 B C 8 5 Z N X o X V R 9 y w / X N Y a t 0 U c Z T i B U z g H D 6 6 g A f f Q B B 8 Y c H i G V 3 h z p P P i v D s f i 9 a S U 8 w c w x 8 5 n z + 3 m I 6 c < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " q 6 S w H x i + B x / 2 o K u q a + K u 9 f I H c S Y = " > A A A B 7 H i c b Z B N S 8 N A E I Y n 9 a v W r 6 p H L 4 t F 8 F Q S E f R Y 9 e K x g m k L b S i b 7 a R d u t m E 3 Y 1 Q Q n + D F w + K e P U H e f P f u G 1 z 0 N Y X F h 7 e m W F n 3 j A V X B v X / X Z K a + s b m 1 v l 7 c r O 7 t 7 + Q f X w q K W T T D H 0 W S I S 1 Q m p R s E l + o Y b g Z 1 U I Y 1 D g e 1 w f D e r t 5 
9 Q a Z 7 I R z N J M Y j p U P K I M 2 q s 5 X f 6 + c 2 0 X 6 2 5 d X c u s g p e A T U o 1 O x X v 3 q D h G U x S s M E 1 b r r u a k J c q o M Z w K n l V 6 m M a V s T I f Y t S h p j D r I 5 8 t O y Z l 1 B i R K l H 3 S k L n 7 e y K n s d a T O L S d M T U j v V y b m f / V u p m J r o O c y z Q z K N n i o y g T x C R k d j k Z c I X M i I k F y h S 3 u x I 2 o o o y Y / O p 2 B C 8 5 Z N X o X V R 9 y w / X N Y a t 0 U c Z T i B U z g H D 6 6 g A f f Q B B 8 Y c H i G V 3 h z p P P i v D s f i 9 a S U 8 w c w x 8 5 n z + 3 m I 6 c < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " q 6 S w H x i + B x / 2 o K u q a + K u 9 f I H c S Y = " > A A A B 7 H i c b Z B N S 8 N A E I Y n 9 a v W r 6 p H L 4 t F 8 F Q S E f R Y 9 e K x g m k L b S i b 7 a R d u t m E 3 Y 1 Q Q n + D F w + K e P U H e f P f u G 1 z 0 N Y X F h 7 e m W F n 3 j A V X B v X / X Z K a + s b m 1 v l 7 c r O 7 t 7 + Q f X w q K W T T D H 0 W S I S 1 Q m p R s E l + o Y b g Z 1 U I Y 1 D g e 1 w f D e r t 5 9 Q a Z 7 I R z N J M Y j p U P K I M 2 q s 5 X f 6 + c 2 0 X 6 2 5 d X c u s g p e A T U o 1 O x X v 3 q D h G U x S s M E 1 b r r u a k J c q o M Z w K n l V 6 m M a V s T I f Y t S h p j D r I 5 8 t O y Z l 1 B i R K l H 3 S k L n 7 e y K n s d a T O L S d M T U j v V y b m f / V u p m J r o O c y z Q z K N n i o y g T x C R k d j k Z c I X M i I k F y h S 3 u x I 2 o o o y Y / O p 2 B C 8 5 Z N X o X V R 9 y w / X N Y a t 0 U c Z T i B U z g H D 6 6 g A f f Q B B 8 Y c H i G V 3 h z p P P i v D s f i 9 a S U 8 w c w x 8 5 n z + 3 m I 6 c < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " q 6 S w H  x i + B x / 2 o K u q a + K u 9 f I H c S Y = " > A A A B 7 H i c b Z B N S 8 N A E I Y n 9 a v W r 6 p H L 4 t F 8 F Q S E f R Y 9 e K x g m k L b S i b 7 a R d u t m E 3 Y 1 Q Q n + D F w + K e P U H e f P f u G 1 z 0 N Y X F h 7 e m W F n 3 j A V X B v X / X Z K a + s b m 1 v l 7 c r O 7 t 7 + Q f X w q K W T T D H 0 W S I S 1 Q m p R s E l + o Y b g Z 1 U I Y 1 
D g e 1 w f D e r t 5 9 Q a Z 7 I R z N J M Y j p U P K I M 2 q s 5 X f 6 + c 2 0 X 6 2 5 d X c u s g p e A T U o 1 O x X v 3 q D h G U x S s M E 1 b r r u a k J c q o M Z w K n l V 6 m M a V s T I f Y t S h p j D r I 5 8 t O y Z l 1 B i R K l H 3 S k L n 7 e y K n s d a T O L S d M T U j v V y b m f / V u p m J r o O c y z Q z K N n i o y g T x C R k d j k Z c I X M i I k F y h S 3 u x I 2 o o o y Y / O p 2 B C 8 5 Z N X o X V R 9 y w / X N Y a t 0 U c Z T i B U z g H D 6 6 g A f f Q B B 8 Y c H i G V V J k 5 6 H 3 T 2 R M O Q 7 3 L h Q x K 0 f 4 8 6 / s Z P M Q h M v N B x u V V H V 1 0 s E V 9 q 2 v 6 3 C y u r a + k Z x s 7 S 1 v b O 7 V 9 4 / a K o 4 l Q w b L B a x b H t U o e A R N j T X A t u J R B p 6 A l v e 8 G Z a b 4 1 Q K h 5 H 9 3 q c o B v S f s Q D z q g 2 l t v u Z V f d J / T 7 S B 4 m v X L F r t o z k W V w c q h A r n q v / N X 1 Y 5 a G G G k m q F I d x 0 6 0 m 1 G p O R M 4 K X V T h Q l l Q 9 r H j s G I h q j c b H b 0 h J w Y x y d B L M 2 L N J m 5 v y c y G i o 1 D j 3 T G V I 9 U I u 1 q f l f r Z P q 4 N L N e J S k G i M 2 X x S F 7 r d Q X b G m g = " > A A A B 9 H i c b Z D L S g N B E E V r 4 i v G V 9 S l m 8 Y g u A o z I u g y 6 s Z l B P P A Z A g 9 P T V J k 5 6 H 3 T 2 R M O Q 7 3 L h Q x K 0 f 4 8 6 / s Z P M Q h M v N B x u V V H V 1 0 s E V 9 q 2 v 6 3 C y u r a + k Z x s 7 S 1 v b O 7 V 9 4 / a K o 4 l Q w b L B a x b H t U o e A R N j T X A t u J R B p 6 A l v e 8 G Z a b 4 1 Q K h 5 H 9 3 q c o B v S f s Q D z q g 2 l t v u Z V f d J / T 7 S B 4 m v X L F r t o z k W V w c q h A r n q v / N X 1 Y 5 a G G G k m q F I d x 0 6 0 m 1 G p O R M 4 K X V T h Q l l Q 9 r H j s G I h q j c b H b 0 h J w Y x y d B L M 2 L N J m 5 v y c y G i o 1 D j 3 T G V I 9 U I u 1 q f l f r Z P q 4 N L N e J S k G i M 2 X x S k g u i Y T B M g P p f I t B g b o E x y c y t h A y o p 0 y a n k g n B W f z y M j T P q o 7 h u / N K 7 T q P o w h H c A y n 4 M A F 1 O A W 6 t A A B o / w D K / w Z o 2 s F + v d + p i 3 F q x 8 5 h D + y P r 8 
A V e C k c 4 = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " y 7 w J E X D A x 3 l e w 5 3 j s F 7 r d Q X b G m g = " > A A A B 9 H i c b Z D L S g N B E E V r 4 i v G V 9 S l m 8 Y g u A o z I u g y 6 s Z l B P P A Z A g 9 P T V J k 5 6 H 3 T 2 R M O Q 7 3 L h Q x K 0 f 4 8 6 / s Z P M Q h M v N B x u V V H V 1 0 s E V 9 q 2 v 6 3 C y u r a + k Z x s 7 S 1 v b O 7 V 9 4 / a K o 4 l Q w b L B a x b H t U o e A R N j T X A t u J R B p 6 A l v e 8 G Z a b 4 1 Q K h 5 H 9 3 q c o B v S f s Q D z q g 2 l t v u Z V f d J / T 7 S B 4 m v X L F r t o z k W V w c q h A r n q v / N X 1 Y 5 a G G G k m q F I d x 0 6 0 m 1 G p O R M 4 K X V T h Q l l Q 9 r H j s G I h q j c b H b 0 h J w Y x y d B L M 2 L N J m 5 v y c y G i o 1 D j 3 T G V I 9 U I u 1 q f l f r Z P q 4 N L N e J S k G i M 2 X x S k g u i Y T B M g P p f I t B g b o E x y c y t h A y o p 0 y a n k g n B W f z y M j T P q o 7 h u / N K 7 T q P o w h H c A y n 4 M A F 1 O A W 6 t A A B o / w D K / w Z o 2 s F + v d + p i 3 F q x 8 5 h D + y P r 8 A V e C k c 4 = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " y 7 w J E X D A x 3 l e w 5 3 j s A and Z are linguistic properties a writer based on and thus cannot be observed directly from data. When investigating the causal relationship between Ã and Y , (X A∧Z , X Z ) is a confounding part satisfying both unconfoundedness and overlap. 
F 7 r d Q X b G m g = " > A A A B 9 H i c b Z D L S g N B E E V r 4 i v G V 9 S l m 8 Y g u A o z I u g y 6 s Z l B P P A Z A g 9 P T V J k 5 6 H 3 T 2 R M O Q 7 3 L h Q x K 0 f 4 8 6 / s Z P M Q h M v N B x u V V H V 1 0 s E V 9 q 2 v 6 3 C y u r a + k Z x s 7 S 1 v b O 7 V 9 4 / a K o 4 l Q w b L B a x b H t U o e A R N j T X A t u J R B p 6 A l v e 8 G Z a b 4 1 Q K h 5 H 9 3 q c o B v S f s Q D z q g 2 l t v u Z V f d J / T 7 S B 4 m v X L F r t o z k W V w c q h A r n q v / N X 1 Y 5 a G G G k m q F I d x 0 6 0 m 1 G p O R M 4 K X V T h Q l l Q 9 r H j s G I h q j c b H b 0 h J w Y x y d B L M 2 L N J m 5 v y c y G i o 1 D j 3 T G V I 9 U I u 1 q f l f r Z P q 4 N L N e J S k G i M 2 X x S

Controlled direct effect (CDE)

The treatment A affects the outcome through two paths: "directly" through X_A, the part of the text determined by the treatment alone, and through a path passing through X_{A∧Z}, the part of the text that relies on interaction effects with other factors. Our formal causal effect aims to capture the effect of A through only the first, direct, path.

CDE := E_{X_{A∧Z}, X_Z | A=1}[ E[Y | X_{A∧Z}, X_Z, do(A = 1)] - E[Y | X_{A∧Z}, X_Z, do(A = 0)] ].   (3.1)

Here, do is Pearl's do notation, and the estimand is a variant of the controlled direct effect (Pearl, 2009). Intuitively, it is the expected change in the outcome induced by changing the treatment from 1 to 0 while keeping the part of the text affected by Z the same as it would have been had we set A = 1. This is a reasonable formalization of the qualitative "effect of A on Y". It is not the only possible formalization, but its advantage is that, as we will see, it can be identified and estimated under reasonable conditions.

Identification. To identify the CDE we must rewrite the expression in terms of observable quantities. There are three challenges: we need to get rid of the do operator, we do not observe A (only Ã), and the variables X_{A∧Z}, X_Z are unknown (they are latent parts of X). Informally, the identification argument is as follows. First, X_{A∧Z} and X_Z block all backdoor paths (common causes) in Figure 2. Moreover, because we have thrown away X_A, we now satisfy overlap. Accordingly, the do operator can be replaced by conditioning, following the usual causal-adjustment argument. Next, A = Ã almost surely, so we can simply replace A with Ã. Our estimand has now been reduced to:

CDE = E_{X_{A∧Z}, X_Z | Ã=1}[ E[Y | X_{A∧Z}, X_Z, Ã = 1] - E[Y | X_{A∧Z}, X_Z, Ã = 0] ].   (3.2)

The final step is to deal with the unknown X_{A∧Z}, X_Z. To fix this issue, we first define the conditional outcome Q according to:

Q(Ã, X) := E(Y | Ã, X).   (3.3)

A key insight here is that, subject to the causal model in Figure 2, we have Q(Ã, X) = E(Y | Ã, X_{A∧Z}, X_Z). But this is exactly the quantity in (3.2). Moreover, Q(Ã, X) is an observable data quantity (it depends only on the distribution of the observed quantities). In summary:

Theorem 1. Assume the following:
1. (Causal structure) The causal relationships among A, Ã, Z, Y, and X satisfy the causal DAG in Figure 2;
2. (Overlap) 0 < P(A = 1 | X_{A∧Z}, X_Z) < 1;
3. (Intention equals perception) A = Ã almost surely with respect to all interventional distributions.
Then the CDE is identified from observational data as

CDE = τ_CDE := E_{X | Ã=1}[ E[Y | η(X), Ã = 1] - E[Y | η(X), Ã = 0] ],   (3.4)

where η(X) := (Q(0, X), Q(1, X)).

The proof is in Appendix B. We give the result in terms of an abstract sufficient statistic η(X) to emphasize that the actual conditional-expectation model is not required, only some statistic that is informationally equivalent. We emphasize that, whether or not the overlap condition holds, the propensity score of η(X) is accessible and meaningful. Therefore, as long as η(X) is well estimated, we can easily detect when identification fails.
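The informal identification steps above can be written as one compact chain (a restatement of the argument, with each step labeled by the assumption it uses):

```latex
\begin{align*}
\mathrm{CDE}
&= \mathbb{E}_{X_{A\wedge Z},X_Z \mid A=1}\!\left[
     \mathbb{E}[Y \mid X_{A\wedge Z}, X_Z, \mathrm{do}(A=1)]
   - \mathbb{E}[Y \mid X_{A\wedge Z}, X_Z, \mathrm{do}(A=0)] \right] \\
&= \mathbb{E}_{X_{A\wedge Z},X_Z \mid A=1}\!\left[
     \mathbb{E}[Y \mid X_{A\wedge Z}, X_Z, A=1]
   - \mathbb{E}[Y \mid X_{A\wedge Z}, X_Z, A=0] \right]
   && \text{(backdoor adjustment + overlap)} \\
&= \mathbb{E}_{X_{A\wedge Z},X_Z \mid \tilde A=1}\!\left[
     \mathbb{E}[Y \mid X_{A\wedge Z}, X_Z, \tilde A=1]
   - \mathbb{E}[Y \mid X_{A\wedge Z}, X_Z, \tilde A=0] \right]
   && (A=\tilde A \text{ a.s.}) \\
&= \mathbb{E}_{X \mid \tilde A=1}\!\left[
     \mathbb{E}[Y \mid \eta(X), \tilde A=1]
   - \mathbb{E}[Y \mid \eta(X), \tilde A=0] \right]
   && (\eta(X) \text{ informationally equivalent}).
\end{align*}
```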

4. METHOD

Our ultimate goal is to draw a conclusion about whether the treatment has a causal effect on the outcome. Following the previous section, we have reduced this problem to estimating τ CDE , defined in Theorem 1. The task now is to develop an estimation procedure, including uncertainty quantification.

4.1. OUTCOME ONLY ESTIMATOR

We start by introducing the naive outcome-only estimator as a first approach to CDE estimation. The estimator is adapted from Pryzant et al. (2020). The observation here is that, taking η(X) = (Q(0, X), Q(1, X)) in (3.4), we have

τ_CDE = E_{X | A=1}[ E(Y | A = 1, X) - E(Y | A = 0, X) ].   (4.1)

Since Q(A, X) is a function of the whole text X, it is estimable from observational data. Namely, it is the minimizer of the square-error risk:

Q = argmin_{Q̃} E[(Y - Q̃(A, X))²].   (4.2)

With a finite sample, we estimate Q by Q̂, fitting a machine-learning model to minimize the (possibly regularized) empirical square-error risk; that is, we fit a model using mean squared error as the objective function. A straightforward estimator is then:

τ̂_Q := (1/n₁) Σ_{i: A_i=1} [ Q̂₁(X_i) - Q̂₀(X_i) ],   (4.3)

where n₁ is the number of treated units. Note that the model for Q̂ cannot be arbitrary. A significant issue with models that directly regress Y on A and X is that, when overlap does not hold, the model can ignore A and rely on X alone. We therefore need a class of models that forces the use of the treatment A. To address this, we use a two-headed model that regresses Y on X separately for A = 0 and A = 1 (see Section 4.2 and Figure 3). As discussed in Section 1, this estimator yields a consistent point estimate, but does not offer a simple approach to uncertainty quantification. A natural guess for an estimate of its variance is:

v̂ar(τ̂_Q) := (1/n) v̂ar( Q̂₁(X_i) - Q̂₀(X_i) | Q̂ ).   (4.4)

That is, we simply compute the variance of the mean, conditional on the fitted model. However, this procedure yields asymptotically valid confidence intervals only if the outcome model converges extremely quickly; i.e., if E[(Q̂ - Q)²]^{1/2} = o(n^{-1/2}). We could instead bootstrap, refitting Q̂ on each bootstrap sample; with modern language models, however, this can be prohibitively computationally expensive.
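The plug-in estimator (4.3) and the naive variance (4.4) amount to a few lines of numpy; this is a minimal sketch in which the arrays `q0` and `q1` are hypothetical stand-ins for the fitted two-headed model's predictions Q̂₀(Xᵢ), Q̂₁(Xᵢ), not the paper's released code.

```python
import numpy as np

def outcome_only_estimate(a, q0, q1):
    """Plug-in CDE estimate over treated units, with the naive
    model-conditional variance of (4.4).
    a: 0/1 treatment array; q0, q1: fitted predictions Q-hat_0, Q-hat_1."""
    diff = (q1 - q0)[a == 1]                  # Q-hat_1(X_i) - Q-hat_0(X_i), treated only
    tau_hat = diff.mean()                     # tau-hat_Q of (4.3)
    naive_var = diff.var(ddof=1) / len(diff)  # (4.4): ignores uncertainty in Q-hat
    return tau_hat, naive_var

# toy illustration with a known, constant unit-level difference of 2.0
rng = np.random.default_rng(0)
a = rng.integers(0, 2, size=1000)
q0 = rng.normal(size=1000)
q1 = q0 + 2.0
tau_hat, naive_var = outcome_only_estimate(a, q0, q1)
```

Because the variance is computed conditional on the fitted model, it will be overconfident whenever Q̂ is estimated slowly, which is exactly the failure mode documented in Table 1.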

4.2. TREATMENT IGNORANT EFFECT ESTIMATION (TI-ESTIMATOR)

Following Theorem 1, it suffices to adjust for η(X ) = (Q(0, X ), Q(1, X )). Accordingly, we use the following pipeline. We first estimate Q0 (X ) and Q1 (X ) (using a neural language model), as with the outcome-only estimator. Then, we take η(X ) := ( Q0 (X ), Q1 (X )) and estimate ĝη ≈ P(A = 1 | η). That is, we estimate the propensity score corresponding to the estimated representation. Finally, we plug the estimated Q and ĝη into a standard double machine learning estimator (Chernozhukov et al., 2016) . We describe the three steps in detail.
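The final plug-in step of this pipeline can be sketched as follows, assuming arrays `q0` (conditional-outcome predictions) and `g_hat` (propensity estimates on the learned representation) have already been produced by the first two steps. The ATT-style AIPTW form below matches the estimator listed in the appendix; note it only needs Q̂₀. The toy check uses oracle nuisances rather than fitted models.

```python
import numpy as np

def ti_estimate(a, y, q0, g_hat):
    """Step 3 of the pipeline: ATT-style AIPTW estimate of the CDE.
    a: 0/1 treatment; y: outcome; q0: predictions Q-hat_0;
    g_hat: estimated propensity P(A = 1 | eta-hat(X)).
    Returns the point estimate and a Wald standard error."""
    n, n1 = len(a), a.sum()
    psi = a * (y - q0) - (1 - a) * (y - q0) * g_hat / (1 - g_hat)
    phi = psi * (n / n1)              # rescale so tau-hat is a plain mean
    tau_hat = phi.mean()
    se = phi.std(ddof=1) / np.sqrt(n) # influence-function-style Wald SE
    return tau_hat, se

# toy check with oracle nuisances: binary confounder C, true effect 2.0
rng = np.random.default_rng(1)
n = 20_000
c = rng.integers(0, 2, size=n)
g = np.where(c == 1, 0.6, 0.8)        # P(A = 1 | C), as in Section 5
a = rng.binomial(1, g)
y = 2.0 * a + c + rng.normal(size=n)
tau_hat, se = ti_estimate(a, y, q0=c.astype(float), g_hat=g)
```

With oracle nuisances the estimate concentrates around the true effect; in the actual procedure `q0` and `g_hat` come from the cross-fit language model and the propensity model on η̂, and the double-machine-learning argument makes the estimate robust to slow convergence of Q̂.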

Table 1:

The TI-estimator significantly improves both bias and coverage relative to the baseline. The tables show average absolute bias and confidence-interval coverage of CDE estimates over 100 resimulations. The TI estimator τ̂_TI displays smaller absolute bias and much higher coverage proportions than the outcome-only estimator τ̂_Q. The treatment level equals the true CDE, which is 1.0 (with causal effect) or 0.0 (without causal effect). Low and high noise levels correspond to γ set to 1.0 and 4.0; low and high confounding levels correspond to β_c set to 50.0 and 100.0.

Although these choices do not matter asymptotically, we find they have a significant impact in finite-sample estimation. In general, kernel regression works well for propensity-score estimation, and the vanilla Augmented Inverse Probability of Treatment Weighted (AIPTW) estimator corresponding to the CDE works well. Finally, we reproduce the real-data analysis from Pryzant et al. (2020). We find that politeness has a positive effect on reducing email response time.

Dataset

We closely follow the setup of Pryzant et al. (2020). We use publicly available Amazon reviews for music products as the basis for our semi-synthetic data. We include reviews for mp3s, CDs, and vinyl, and among these exclude reviews for products costing more than $100 and reviews shorter than 5 words. The treatment A is whether the review is five stars (A = 1) or one/two stars (A = 0). To have a ground-truth causal effect, we must simulate the outcome. To produce a realistic dataset, we choose a real variable as the confounder: C is whether the product is a CD (C = 1) or not (C = 0). The outcome Y is then generated according to Y ← β_a A + β_c (π(C) - β_o) + γ N(0, 1). The true causal effect is controlled by β_a; we choose β_a = 1.0 and 0.0 to generate data with and without causal effects. In this setting, β_a is the oracle value of our causal estimand. The strength of confounding is controlled by β_c; we choose β_c = 50.0 and 100.0. The ground-truth propensity score is π(C) = P(A = 1 | C); we set π(0) = 0.8 and π(1) = 0.6 (by subsampling the data). β_o is an offset equal to E[π(C)] = π(0)P(C = 0) + π(1)P(C = 1), where P(C = a), a = 0, 1, are estimated from the data. Finally, the noise level is controlled by γ; we choose 1.0 and 4.0 to simulate data with small and large noise. The final dataset has 10,685 entries.

Protocol. For the language model, we use the pretrained distilbert-base-uncased model provided by the transformers package. The model is trained in k-fold fashion with 5 folds. We apply the Adam optimizer (Kingma & Ba, 2014) with a learning rate of 2e-5 and a batch size of 64. The maximum number of epochs is set to 20, with early stopping based on validation loss with a patience of 6. Each experiment is replicated with five different seeds, and the final Q̂(a, x_i) predictions are obtained by averaging the predictions of the 5 resulting models.
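The outcome simulation above can be sketched directly; the propensities, coefficients, and offset follow the text, while the function and variable names are our own.

```python
import numpy as np

def simulate_outcome(a, c, beta_a, beta_c, gamma, rng):
    """Semi-synthetic outcome Y = beta_a*A + beta_c*(pi(C) - beta_o) + gamma*N(0,1),
    with ground-truth propensities pi(0) = 0.8, pi(1) = 0.6 as in the text."""
    pi = np.where(c == 1, 0.6, 0.8)
    beta_o = pi.mean()  # empirical E[pi(C)] = pi(0)P(C=0) + pi(1)P(C=1)
    return beta_a * a + beta_c * (pi - beta_o) + gamma * rng.normal(size=len(a))

rng = np.random.default_rng(0)
c = rng.integers(0, 2, size=10_000)               # confounder: product is a CD or not
a = rng.binomial(1, np.where(c == 1, 0.6, 0.8))   # treatment drawn from pi(C)
y = simulate_outcome(a, c, beta_a=1.0, beta_c=50.0, gamma=1.0, rng=rng)
```

Within each stratum of C the β_c term is constant, so the stratified treated-vs-control difference recovers β_a; across strata the naive difference is confounded, which is what the experiments exploit.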
The propensity model is implemented with Gaussian process classification, using GaussianProcessClassifier from the sklearn package with a DotProduct + WhiteKernel kernel. (We choose different random states for the classifier to guarantee convergence.) The coverage experiment uses 100 replicates.
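A minimal sketch of this propensity step with scikit-learn, using a synthetic two-dimensional array in place of the fitted language-model outputs η̂(X) = (Q̂₀(X), Q̂₁(X)):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import DotProduct, WhiteKernel

rng = np.random.default_rng(0)
n = 300
# stand-in for eta-hat(X); real values come from the fitted two-headed model
eta = rng.normal(size=(n, 2))
a = rng.binomial(1, 1 / (1 + np.exp(-eta[:, 0])))  # treatment tied to eta

gpc = GaussianProcessClassifier(
    kernel=DotProduct() + WhiteKernel(), random_state=0)
gpc.fit(eta, a)
g_hat = gpc.predict_proba(eta)[:, 1]  # estimated propensity P(A = 1 | eta-hat)
```

The resulting `g_hat` is what gets plugged into the AIPTW estimator; since η̂ is only two-dimensional, a nonparametric classifier like this remains tractable even when the text itself is high-dimensional.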

Results

The main question here is the efficacy of the estimation procedure. Table 1 compares the outcome-only estimator τ̂_Q and the TI estimator τ̂_TI. First, the absolute bias of the new method is significantly lower than that of the outcome-only estimator, particularly at moderate to high levels of confounding. Next, we check actual coverage rates over 100 replicates of the experiment. The naive approach for the outcome-only estimator fails completely: its nominal confidence interval almost never includes the true effect, and it is wildly optimistic. By contrast, the confidence intervals from the new method often cover the true value, an enormous improvement over the baseline. Nevertheless, they still do not achieve their nominal (95%) coverage. This may be because the estimate of Q is still not good enough for the asymptotics to kick in, so we are not yet justified in ignoring the uncertainty from model fitting.

5.2. APPLICATION: CONSUMER COMPLAINTS TO THE FINANCIAL PROTECTION BUREAU

We follow the same pipeline as the real-data experiment in (Pryzant et al., 2020, §6.2). The dataset consists of consumer complaints made to the Consumer Financial Protection Bureau. The treatment A is politeness (measured using Yeomans et al. (2018)) and the outcome Y is a binary indicator of whether a complaint receives a response within 15 days. We use the same training procedure as for the simulation data. Table 2 shows point estimates and their 95% confidence intervals. Notice that the naive estimator shows a significant negative effect of politeness on receiving a timely response. On the other hand, the more accurate AIPTW method, as well as the outcome-only estimator, has a confidence interval that covers only positive values, so we conclude that consumers' politeness has a positive effect on receiving a timely response. This matches our intuition that being more polite should increase the probability of receiving a timely reply.

6. DISCUSSION

In this paper, we address the estimation of the causal effect of a text-document attribute using observational data. The key challenge is that we must adjust for the text to handle confounding, but adjusting for all of the text violates overlap. We saw that this issue can be effectively circumvented with a suitable choice of estimand and estimation procedure. In particular, we have seen an estimand that corresponds to the qualitative causal question, and an estimator that is valid even when the outcome model is learned slowly. The procedure also circumvents the need for bootstrapping, which is prohibitively expensive in our setting. There are some limitations. The actual coverage proportion of our estimator is below the nominal level, presumably due to the imperfect fit of the conditional outcome model. Diagnostics (see Appendix D) show that as the conditional outcome estimates become more accurate, the TI estimator becomes less biased and its coverage increases. It seems plausible that the issue could be resolved by using more powerful language models. Although we have focused on text in this paper, the problem of causal estimation with apparent overlap violations exists in any problem where we must adjust for unstructured, high-dimensional covariates. Another interesting direction for future work is to understand how analogous procedures work outside the text setting.

A PROOF OF ASYMPTOTIC NORMALITY

... due to the dominated convergence theorem. By (A.1) and (A.2), we can bound the mean squared error of the estimated propensity score as follows:

E[‖ĝ_η(X) - g_η(X)‖²] ≤ E[‖ĝ_η(X) - f_g(Q̂₀(X), Q̂₁(X))‖²] + E[‖f_g(Q̂₀(X), Q̂₁(X)) - g_η(X)‖²]
= E[‖f̂_g(Q̂₀(X), Q̂₁(X)) - f_g(Q̂₀(X), Q̂₁(X))‖²] + E[‖f_g(Q̂₀(X), Q̂₁(X)) - f_g(Q(0, X), Q(1, X))‖²]
= O(n^{-1/2}),   (A.3)

that is, E[‖ĝ_η(X) - g_η(X)‖²]^{1/2} = O(n^{-1/4}). Before applying the conclusion of Theorem 5.1 in (Chernozhukov et al., 2017b), we verify that all conditions of Assumption 5.1 in Chernozhukov et al. (2017b) hold.
Let C̄ := max{ (2C₁^q + 2^q)^{1/q}, C₂ }.

(a) E[Y - Q(A, X) | η(X), A] = 0 and E[A - g_η(X) | η(X)] = 0 are easily checked by invoking the definitions of Q and g_η.

(b) E[|Y|^q]^{1/q} ≤ C̄, E[(Y - Q(A, X))²]^{1/2} ≥ c, and sup_{η ∈ supp(η(X))} E[(Y - Q(A, X))² | η(X) = η] ≤ C̄.

(c)-(d) ( E[‖Q̂₁(X) - Q(1, X)‖^q] + E[‖Q̂₀(X) - Q(0, X)‖^q] + E[‖ĝ_η(X) - g_η(X)‖^q] )^{1/q} ≤ (C₁^q + C₁^q + 2^q)^{1/q} ≤ C̄.

(e) Based on (A.3) and condition 1 of the theorem,

( E[‖Q̂₁(X) - Q(1, X)‖²] + E[‖Q̂₀(X) - Q(0, X)‖²] + E[‖ĝ_η(X) - g_η(X)‖²] )^{1/2} ≤ ( o(n^{-1/2}) + o(n^{-1/2}) + O(n^{-1/2}) )^{1/2} ≤ O(n^{-1/4}),

and E[‖Q̂₀(X) - Q(0, X)‖²]^{1/2} · E[‖ĝ_η(X) - g_η(X)‖²]^{1/2} = o(n^{-1/2}).

(f) Based on condition 3 of the theorem, sup_{x ∈ supp(X)} E[‖ĝ_η(X) - P(A = 1 | η(X))‖² | η(X) = η(x)] = O(n^{-1/2}). Consider a smaller positive constant ε < ϵ; then we still have P(ε ≤ g_η(X) ≤ 1 - ε) = 1. Then,

P( sup_{x ∈ supp(X)} |ĝ_η(x) - 1/2| > 1/2 - ε )
≤ P( inf_{x ∈ supp(X)} ĝ_η(x) < ε ) + P( sup_{x ∈ supp(X)} ĝ_η(x) > 1 - ε )
≤ P( inf_{x ∈ supp(X)} P(A = 1 | η(X) = η(x)) - inf_{x ∈ supp(X)} ĝ_η(x) > ϵ - ε ) + P( sup_{x ∈ supp(X)} ĝ_η(x) - sup_{x ∈ supp(X)} P(A = 1 | η(X) = η(x)) > 1 - ε - (1 - ϵ) )
≤ E[ | inf_{x} ĝ_η(x) - inf_{x} P(A = 1 | η(X) = η(x)) |² ] / (ϵ - ε)² + E[ | sup_{x} ĝ_η(x) - sup_{x} P(A = 1 | η(X) = η(x)) |² ] / (ϵ - ε)²
≤ 2 sup_{x ∈ supp(X)} E[ | ĝ_η(X) - P(A = 1 | η(X) = η(x)) |² ] / (ϵ - ε)²
= O(n^{-1/2}).

Hence, P( sup_{x ∈ supp(X)} |ĝ_η(x) - 1/2| ≤ 1/2 - ε ) ≥ 1 - O(n^{-1/2}).

With (a)-(f), we can invoke the conclusion of Theorem 5.1 in (Chernozhukov et al., 2017b) and obtain the asymptotic normality of the TI estimator.

B PROOF OF CAUSAL IDENTIFICATION

Theorem 1. Assume the following: 1. (Causal structure) The causal relationships among A, Ã, Z, Y, and X satisfy the causal DAG in Figure 2; 2. (Overlap) 0 < P(A = 1 | X_{A∧Z}, X_Z) < 1; 3. (Intention equals perception) A = Ã almost surely with respect to all interventional distributions. Then, the CDE is identified from observational data as

CDE = τCDE := E_{X | Ã=1}[ E[Y | η(X), Ã = 1] − E[Y | η(X), Ã = 0] ],   (3.4)

where η(X) := (Q(0, X), Q(1, X)).

Proof. We first prove that the two-dimensional confounding part η(X) satisfies positivity. Since (Q(0, X), Q(1, X)) = ( E[Y | A = 0, X_{A∧Z}, X_Z], E[Y | A = 1, X_{A∧Z}, X_Z] ) is a function of (X_{A∧Z}, X_Z), the following equations hold:

P(A = 1 | Q(0, X), Q(1, X)) = E[A | Q(0, X), Q(1, X)]
= E[ E(A | X_{A∧Z}, X_Z) | Q(0, X), Q(1, X) ]
= E[ P(A = 1 | X_{A∧Z}, X_Z) | Q(0, X), Q(1, X) ].   (B.1)

As 0 < P(A = 1 | X_{A∧Z}, X_Z) < 1, we have 0 < P(A = 1 | Q(0, X), Q(1, X)) < 1. Furthermore, we have 0 < P(Ã = 1 | Q(0, X), Q(1, X)) < 1 due to the almost-everywhere equivalence of A and Ã.

The ATT AIPTW τ̂TI is comparable with the other double machine learning estimators in most cases. For the coverage proportion of confidence intervals, though it has lower rates in some cases, τ̂TI consistently has the best performance; especially in high-confounding situations, its advantage is clear.

Estimators. For each dataset, we compute the estimators as follows, where n1 and n0 stand for the numbers of individuals in the treated and control groups, and n = n1 + n0 is the total number of individuals.

• "Unadjusted" baseline estimator: τ̂naive = (1/n1) Σ_{i: Ai=1} Yi − (1/n0) Σ_{i: Ai=0} Yi
• "Outcome-only" estimator: τ̂Q = (1/n1) Σ_{i: Ai=1} (Q̂1,i − Q̂0,i)
• ATT AIPTW: τ̂TI = (1/n1) Σ_i [ Ai(Yi − Q̂0,i) − (1 − Ai)(Yi − Q̂0,i) ĝi/(1 − ĝi) ]

In this section, we discuss why the confidence intervals we obtain (see Table 1) have lower coverage than the nominal 95% level. We conduct diagnostics and find that inaccuracy in the estimates of Q is responsible for the low coverage. We compute absolute biases, variances, and coverages of τ̂TI at different mean squared errors Ê[(Q̂ − Q)²], obtained by varying the number of datasets used. According to Figures 4 and 5, as the mean squared error of Q̂ increases, the bias of τ̂TI grows and its coverage drops. Notably, the highest coverage in each setting is close to the nominal 95% (using the 50 datasets with the most accurate conditional outcome estimates). In practice, one direct way to improve the TI estimator's accuracy is to apply better NLP models, so that more accurate conditional outcome estimates can be obtained.
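As a concrete illustration, the three estimators above can be written in a few lines of numpy. This is a minimal sketch, not the released implementation; the array names (`y`, `a`, `q0`, `q1`, `g`) are hypothetical stand-ins for the outcomes, treatments, estimated conditional outcomes, and estimated propensity scores.

```python
import numpy as np

def unadjusted(y, a):
    """Difference in sample means between the treated and control groups."""
    return y[a == 1].mean() - y[a == 0].mean()

def outcome_only(a, q0, q1):
    """tau_Q: the average of Q1-hat minus Q0-hat over the treated units."""
    return (q1 - q0)[a == 1].mean()

def att_aiptw(y, a, q0, g):
    """ATT AIPTW tau_TI: outcome-model residuals of the treated, minus
    inverse-propensity-weighted residuals of the controls, divided by n1."""
    n1 = (a == 1).sum()
    return (a * (y - q0) - (1 - a) * (y - q0) * g / (1 - g)).sum() / n1
```

On data where the outcome model is exact and ĝ is constant, all three estimators coincide, which is a useful sanity check when implementing them.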



Figure 1: The causal DAG of the problem. A writer writes a text document X based on linguistic properties A and Z, where A is the treatment in the causal problem. A and Z cannot be observed directly in the data and are only seen via the text. The dotted line represents a possible correlation between A and Z. A reader perceives the treatment Ã from the text. The perceived treatment Ã, together with the contents of X, determines the outcome Y.


Figure 2: A more sophisticated causal model with a decomposition of the text X. X_A, X_{A∧Z}, and X_Z are the parts of the text affected by only A, by both A and Z, and by only Z, respectively. A and Z are linguistic properties that the writer bases the text on and thus cannot be observed directly from the data. When investigating the causal relationship between Ã and Y, (X_{A∧Z}, X_Z) is a confounding part satisfying both unconfoundedness and overlap.

(b) is guaranteed by the fourth condition in the theorem. (c) P(ϵ ≤ gη(X) ≤ 1 − ϵ) = 1 is the second condition in the theorem. (d) follows because the propensity score function and its estimate are bounded by 1.

Figure 4: Absolute biases and variances increase, and coverages decrease, as the mean squared error of Q̂ (the Q loss) becomes larger. This experiment uses 100 datasets with β_t = 1 (with causal effect), β_c = 50.0 (low confounding), and γ = 4.0 (high noise).

Politeness has a positive causal effect on response time. The table displays the different CDE estimates and their 95% confidence intervals. The unadjusted estimate is the difference in sample means between the treatment (polite) group and the control group. The confidence interval of τ̂TI covers only positive values, which means that politeness increases the probability of a timely response.

The choice of nonparametric model for the TI-estimator is significant. The tables show the average absolute bias and the 95% confidence interval coverage of τ̂TI when different nonparametric models are applied in the second stage. Gaussian process regression with the dot-product plus white-noise kernel has the best performance (lowest absolute bias and highest coverage proportion). The treatment level is equal to the true CDE, which is either 1.0 (with causal effect) or 0.0 (without causal effect). The low and high noise levels correspond to γ = 1.0 and 4.0, and the low and high confounding levels correspond to β_c = 50.0 and 100.0.

The ATT AIPTW is consistently the best double machine learning estimator for this causal problem. The tables show the average absolute bias and the 95% confidence interval coverage of different causal estimators. The ATT AIPTW τ̂TI shows consistently the lowest absolute bias and the highest coverage rate. For propensity score estimation, Gaussian process regression with the dot-product plus white-noise kernel is applied for all estimators. The treatment level is equal to the true CDE/true ATE, which is either 1.0 (with causal effect) or 0.0 (without causal effect). The low and high noise levels correspond to γ = 1.0 and 4.0, and the low and high confounding levels correspond to β_c = 50.0 and 100.0.

ACKNOWLEDGEMENT

Thanks to Alexander D'Amour for feedback on an earlier draft. We acknowledge the University of Chicago's Research Computing Center for providing computing resources. This work was partially supported by Open Philanthropy.

Q-Net

In the first stage, we estimate the conditional outcomes and hence obtain the estimated two-dimensional confounding vector η̂(X). For concreteness, we will use the dragonnet architecture of Shi et al. (2019). Specifically, we train DistilBERT (Sanh et al., 2019) modified to include three heads, as shown in Figure 3. Two of the heads correspond to Q̂0(X) and Q̂1(X) respectively. As discussed in Section 4.1, applying two heads forces the model to use the treatment A. The final head is a single linear layer predicting the treatment. This propensity score prediction head can help prevent (implicit) regularization of the model from throwing away X_{A∧Z} information that is necessary for identification. The output of this head is not used for estimation, since its purpose is only to force the DistilBERT representation to preserve all confounding information. This has been shown to improve causal estimation (Shi et al., 2019; Veitch et al., 2019).

We train the model by minimizing an objective of the form

L(θ) = Σ_i [ (Yi − Q̂_{Ai}(Xi; θ))² + α CrossEntropy(Ai, ĝ(Xi; θ)) + β mlm(Xi; θ) ],

where θ are the model parameters, α and β are hyperparameters, and mlm(·) is the masked language modeling objective of DistilBERT.

The Q-Net architecture is adapted from dragonnet (Shi et al., 2019) for the estimation of Q. Specifically, given representations λ(X) from the input text data, the Q-Net predicts Y for samples with A = 0 and A = 1 using two separate heads. A third head predicting A is also included for training, though its predictions are not used for estimation. The parameters of DistilBERT and the three prediction heads are trained together in an end-to-end manner.

There is a final nuance. In practice, we split the data into K folds. For each fold j, we train a model Q̂^{−j} on the other K − 1 folds. Then, we make predictions for the data points in fold j using Q̂^{−j}. Slightly abusing notation, we use Q̂_a(x) to denote the predictions obtained in this manner.

Propensity score estimation. Next, we define η̂(x) := (Q̂0(x), Q̂1(x)) and estimate the propensity score ĝη(x) ≈ P(A = 1 | η̂(x)).
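The three-headed first stage can be sketched schematically in numpy. This is an illustrative stand-in only: in the actual model, `lam` is the learned DistilBERT representation λ(X) and all parameters are trained end-to-end; the linear weights `W0`, `W1`, `wg` and the `mlm_loss` placeholder are hypothetical.

```python
import numpy as np

def qnet_heads(lam, W0, W1, wg):
    """Three heads on a shared representation lam (the role played by the
    DistilBERT output): Q0-hat, Q1-hat, and the propensity head g-hat."""
    q0 = lam @ W0                        # outcome head used when A = 0
    q1 = lam @ W1                        # outcome head used when A = 1
    g = 1.0 / (1.0 + np.exp(-(lam @ wg)))  # propensity head (training only)
    return q0, q1, g

def qnet_loss(lam, y, a, W0, W1, wg, alpha=1.0, beta=1.0, mlm_loss=0.0):
    """Schematic training objective: squared error on the outcome head
    matching the observed treatment, plus alpha times cross-entropy for the
    propensity head, plus beta times the MLM term (a placeholder here)."""
    q0, q1, g = qnet_heads(lam, W0, W1, wg)
    q_obs = np.where(a == 1, q1, q0)     # two heads force the model to use A
    sq = np.mean((y - q_obs) ** 2)
    ce = -np.mean(a * np.log(g) + (1 - a) * np.log(1 - g))
    return sq + alpha * ce + beta * mlm_loss
```

The propensity head's output `g` enters only the loss, mirroring the point above that its predictions are discarded at estimation time.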
To do this, we fit a nonparametric estimator to the binary classification task of predicting A from η̂(X), again in a cross-fitting (K-fold) fashion. The important insight here is that since η̂(X) is 2-dimensional, nonparametric estimation is possible at a fast rate. In Section 5, we try several methods and find that kernel regression usually works well. We also define gη(X) := P(A = 1 | η(X)) as the idealized propensity score. The idea is that as η̂ → η, we will also have ĝη → gη so long as we have a valid nonparametric estimate.
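As a sketch of this step, here is a cross-fitting k-nearest-neighbor propensity estimator on the 2-dimensional features. KNN is chosen here only because the analysis notes that even naive KNN attains a fast rate in 2 dimensions; the experiments favor kernel and Gaussian process regression. All names are hypothetical.

```python
import numpy as np

def knn_propensity(eta_train, a_train, eta_query, k=5):
    """k-nearest-neighbor estimate of P(A = 1 | eta) on the 2-d feature
    eta = (Q0-hat, Q1-hat): average the treatment labels of the k nearest
    training points for each query point."""
    d = np.linalg.norm(eta_train[None, :, :] - eta_query[:, None, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :k]
    return a_train[idx].mean(axis=1)

def cross_fit_propensity(eta, a, n_folds=5, k=5):
    """Cross-fitting: each unit's g-hat comes from a fold that excludes it."""
    n = len(a)
    folds = np.arange(n) % n_folds
    g_hat = np.empty(n)
    for j in range(n_folds):
        test = folds == j
        g_hat[test] = knn_propensity(eta[~test], a[~test], eta[test], k=k)
    return g_hat
```

On well-separated treated and control clusters in η-space, the cross-fit estimates recover the group labels exactly, which is a quick way to check the fold logic.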

CDE estimation

The final stage is to combine the estimated outcome model and propensity score into a CDE estimator. To that end, we define the influence curve of τCDE as

φ(Y, A, X) = (1/p) [ A(Y − Q(0, X)) − (1 − A)(Y − Q(0, X)) gη(X)/(1 − gη(X)) ] − (A/p) τCDE,

where p = P(A = 1). Then, the standard double machine learning estimator of τCDE (Chernozhukov et al., 2016) is the plug-in estimator τ̂TI given above, and the α-level confidence interval of this estimator is τ̂TI ± z_{α/2} ŝd(φ̂)/√n, where z_{α/2} is the α/2-upper quantile of the standard normal and ŝd(·) is the sample standard deviation.

Validity. We now have an estimation procedure. It remains to give conditions under which this procedure is valid; in particular, we require that it yield a consistent estimate and asymptotically correct confidence intervals.

Theorem 2. Assume the following.
1. The mis-estimation of the conditional outcomes can be bounded as E[(Q̂1(X) − Q(1, X))²] = o(n^{-1/2}) and E[(Q̂0(X) − Q(0, X))²] = o(n^{-1/2}).   (4.9)
2. The propensity score function P(A = 1 | ·, ·) is Lipschitz continuous on ℝ², and there exists ϵ > 0 such that P(ϵ ≤ gη(X) ≤ 1 − ϵ) = 1.
3. The propensity score estimate converges at least as quickly as k-nearest-neighbor regression, i.e., at the O(n^{-1/2}) mean-squared-error rate of Györfi et al. (2002).
4. There exist positive constants C1, C2, c, and q > 2 such that the moment conditions in Appendix A hold (in particular, E[|Y|^q]^{1/q} ≤ C2, E[(Y − Q(A, X))²]^{1/2} ≥ c, and E[|Q̂_a(X) − Q(a, X)|^q]^{1/q} ≤ C1 for a ∈ {0, 1}).

Then, the estimator τ̂TI is consistent and √n (τ̂TI − τCDE) converges in distribution to N(0, σ²), where σ² = E[φ(Y, A, X)²]. The proof is provided in Appendix A.

The key point of this theorem is that we get asymptotic normality at the (fast) √n rate while requiring only a slow (n^{1/4}) convergence rate for Q̂. Intuitively, the reason is simply that, because η(X) is only 2-dimensional, it is always possible to nonparametrically estimate the propensity score from η̂ at a fast rate; even naive KNN works! Effectively, this means the rate at which we estimate the true propensity score gη(X) = P(A = 1 | η(X)) is dominated by the rate at which we estimate η(X), which is in turn determined by the rate for Q̂. Now, the key property of the double ML estimator is that its convergence depends only on the product of the convergence rates of Q̂ and ĝ.
Accordingly, this procedure is robust in the sense that we only need to estimate Q at the square root of the rate required by the naive Q-only procedure. This is much more plausible in practice. As we will see in Section 5, the TI-estimator dramatically improves the quality of the estimated confidence intervals and reduces the absolute bias of the estimation. Remark 3. In addition to robustness to noisy estimation of Q, this estimation procedure inherits some other advantages from the double ML estimator. If Q̂ is consistent, then the estimator is nonparametrically efficient in the sense that no other nonparametric estimator has a smaller asymptotic variance. That is, the procedure uses the data as efficiently as possible.
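A minimal numpy sketch of this final stage, combining Q̂0 and ĝ into a point estimate and a normal-approximation confidence interval via the estimated influence curve. The plug-in form of φ below follows the ATT AIPTW expression from the appendix and is an assumption of this sketch, not a copy of the released code; all array names are hypothetical.

```python
import numpy as np

def ti_estimate_ci(y, a, q0, g, z=1.96):
    """Double-ML point estimate and normal-approximation CI for the CDE.

    psi averages to the ATT AIPTW estimate (p_hat = n1/n cancels into the
    1/n1 normalization); phi holds centered influence-curve values whose
    sample standard deviation gives the standard error."""
    n = len(y)
    p_hat = a.mean()  # estimate of p = P(A = 1)
    psi = (a * (y - q0) - (1 - a) * (y - q0) * g / (1 - g)) / p_hat
    tau = psi.mean()
    phi = psi - (a / p_hat) * tau
    se = phi.std(ddof=1) / np.sqrt(n)
    return tau, (tau - z * se, tau + z * se)
```

The default z = 1.96 gives a 95% interval; for another level α, substitute the corresponding upper α/2 normal quantile.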

5. EXPERIMENTS

We empirically study the method's capability to provide accurate causal estimates with good uncertainty quantification. Testing on semi-synthetic data (where ground-truth causal effects are known), we find that the estimation procedure yields accurate causal estimates and confidence intervals. In particular, the TI-estimator has significantly lower absolute bias and vastly better uncertainty quantification than the Q-only method. Additionally, we study the effect of the choice of nonparametric propensity score estimator, the effect of the choice of double machine learning estimator, and the method's robustness to miscalibration of Q̂. These results are reported in Appendices C and D.

A PROOF OF ASYMPTOTIC NORMALITY

Theorem 2. Assume conditions 1-4 as stated in Section 4: the conditional-outcome error bound (4.9); Lipschitz continuity and boundedness of the propensity score; convergence of the propensity score estimate at the k-nearest-neighbor rate of Györfi et al. (2002); and the existence of positive constants C1, C2, c, and q > 2 satisfying the moment conditions. Then, the estimator τ̂TI is consistent and asymptotically normal.

Proof. We first prove that the misestimation of the propensity score has rate n^{-1/4}. For simplicity, we use f_g, f̂_g : (u, v) ∈ ℝ² → ℝ to denote the conditional probability P(A = 1 | u, v) = f_g(u, v) and the estimated propensity function obtained by running the nonparametric regression. Since ĝη(x) = f̂_g(η̂(x)) and f_g is Lipschitz continuous, the error of ĝη decomposes into the nonparametric regression error and the error in η̂. Since the true propensity function f_g is Lipschitz continuous on ℝ², the mean squared error rate of the k-nearest-neighbor estimator is O(n^{-1/2}) (Györfi et al., 2002). In addition, since the propensity score function and its estimate are bounded by 1, the corresponding higher-moment bound also holds.

Since A = Ã, we can rewrite (3.1) by replacing A with Ã in the following form:

CDE = E_{X | Ã=1}[ E[Y | do(Ã = 1), X_{A∧Z}, X_Z] − E[Y | do(Ã = 0), X_{A∧Z}, X_Z] ]
= E_{X | Ã=1}[ E[Y | Ã = 1, X_{A∧Z}, X_Z] − E[Y | Ã = 0, X_{A∧Z}, X_Z] ]
= E_{X | Ã=1}[ Q(1, X) − Q(0, X) ]
= E_{X | Ã=1}[ E[Y | η(X), Ã = 1] − E[Y | η(X), Ã = 0] ].   (B.2)

The equivalence of the first and second lines holds because X_{A∧Z} and X_Z block all backdoor paths between Ã and Y (see Figure 2) and 0 < P(Ã = 1 | Q(0, X), Q(1, X)) < 1; thus, the do-operation in the first line can be safely removed. The equivalence of the second and third lines is due to Q(Ã, X) = E[Y | Ã, X_{A∧Z}, X_Z], which follows from the causal model in Figure 2. The last equality is based on the fact that η(X) is a function of only X_{A∧Z} and X_Z (this can be checked directly from the definition of conditional expectation). (B.2) shows that (Q(0, X), Q(1, X)) is a two-dimensional confounding variable such that the CDE is identifiable when we adjust for it as the confounding part.

Note that if f and h are two invertible functions on ℝ, then (f(Q(0, X)), h(Q(1, X))) also suffices for identification of the CDE, since the generated sigma-algebras are the same for (Q(0, X), Q(1, X)) and (f(Q(0, X)), h(Q(1, X))), i.e., σ(Q(0, X), Q(1, X)) = σ(f(Q(0, X)), h(Q(1, X))). Hence, we have the same identification result under any such invertible transformation of η(X).

C ADDITIONAL EXPERIMENTS

We conduct additional experiments to show how the estimation of the causal effect changes 1) across different nonparametric models for the propensity score estimation, and 2) when different double machine learning estimators are used for causal estimation. Specifically, for the first study, we apply different nonparametric models, as well as logistic regression, to the estimated confounding part η̂(X) = (Q̂0(X), Q̂1(X)) to obtain propensity scores; we use the ATT AIPTW in all of these cases for causal effect estimation. For the second study, we fix the first two stages of the TI estimator, i.e., we apply the Q-Net for the conditional outcomes and compute propensity scores with Gaussian process regression whose kernel is the sum of a dot-product kernel and a white-noise kernel. The estimated conditional outcomes and propensity scores are then plugged into different double machine learning estimators. We draw the following conclusions from the results of these experiments.

The choice of nonparametric model is significant. Table 3 summarizes the results of applying different regression models for propensity estimation. We can see that a suitable nonparametric model strongly increases the coverage proportion of the true causal estimand. We therefore conclude that the accuracy of the causal estimate is highly dependent on the choice of nonparametric model. In practice, when there is some prior information about the propensity score function, we should apply the most suitable nonparametric model to increase the reliability of the causal estimate.

The ATT AIPTW is consistently the best double machine learning estimator. Table 4 shows the results of applying different double machine learning estimators. We apply estimators for both the average treatment effect (ATE) and the controlled direct effect (CDE). The bias of the "unadjusted" estimator τ̂naive is also included in Table 4(a). For absolute bias,
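For reference, the two families of double machine learning estimators compared in Table 4 can be sketched as follows. The ATE AIPTW here is the textbook doubly robust form, not necessarily the exact variant used in the experiments, and the array names are hypothetical; the ATT AIPTW restates the form given in Appendix D.

```python
import numpy as np

def ate_aiptw(y, a, q0, q1, g):
    """Textbook doubly robust (AIPTW) estimator of the ATE: outcome-model
    contrast plus inverse-propensity-weighted residual corrections."""
    return np.mean(q1 - q0
                   + a * (y - q1) / g
                   - (1 - a) * (y - q0) / (1 - g))

def att_aiptw(y, a, q0, g):
    """The ATT AIPTW form used for tau_TI (restated for comparison)."""
    n1 = (a == 1).sum()
    return (a * (y - q0) - (1 - a) * (y - q0) * g / (1 - g)).sum() / n1
```

With the same plug-in nuisances (Q̂0, Q̂1, ĝ), the two estimators generally give different numbers because they target different estimands (ATE vs. the treated-population contrast).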

