

Abstract

Selective rationalization improves the explainability of neural networks by selecting a subsequence of the input (i.e., a rationale) to explain the prediction results. Although existing methods have achieved promising results, they still suffer from adopting spurious correlations in data (a.k.a. shortcuts) to compose rationales and make predictions. Inspired by causal theory, in this paper we develop an interventional rationalization method (Inter-RAT) to discover causal rationales. Specifically, we first analyse the causalities among the input, rationales and results with a structural causal model. Then, we identify the confounder in these causalities, which induces spurious correlations between the input and rationales and between rationales and results. Next, based on the backdoor adjustment, we propose a causal intervention method to remove the spurious correlations between the input and rationales. Further, we discuss why spurious correlations between the selected rationales and results exist by analysing the limitations of the sparsity constraint in rationalization, and employ the same causal intervention method to remove these correlations. Extensive experimental results on three real-world datasets clearly validate the effectiveness of our proposed method.

Under review as a conference paper at ICLR 2023

1. INTRODUCTION

The remarkable success of deep neural networks (DNNs) in natural language understanding tasks has prompted interest in how to explain their results. Among the related tasks, selective rationalization Lei et al. (2016); Yu et al. (2019; 2021) has received increasing attention, answering the question "Which features have a significant impact on the prediction results of the model?". Specifically, the goal of selective rationalization is to extract a small subset of the input (i.e., the rationale) to support and explain the prediction results when yielding them. Existing methods often generate rationales with a conventional framework consisting of a selector (a.k.a. rationale generator) and a predictor Lei et al. (2016). As shown in Figure 1, given the input X, the selector and the predictor generate rationales R and prediction results Y cooperatively (i.e., P(Y|X) = P(Y|R)P(R|X)). Among them, the selector (P(R|X)) first extracts a subsequence of the input. Then, the predictor (P(Y|R)) yields results based only on the selected tokens, and the selected subsequence is defined as the rationale. Despite the appeal of rationalization methods, current implementations are prone to exploiting spurious correlations (a.k.a. shortcuts) between the input and labels when yielding predictions and selecting rationales Chang et al. (2020); Wu et al. (2022). We illustrate this problem with an example from charge prediction¹. Considering Figure 1, although this case corresponds to Manslaughter, a DNN model readily predicts the charge as Intentional homicide. Specifically, since Intentional homicide occurs more frequently than Manslaughter² and is often accompanied by tokens denoting violence and death, DNNs do not need to learn the real correlations between the case facts and the charge to yield the result.
Instead, it is much easier to exploit spurious correlations in the data to achieve high accuracy (i.e., predicting the charge as Intentional homicide directly upon identifying tokens about violence and death). As a result, when facing cases such as the example in Figure 1, the effectiveness of such DNNs tends to degrade (e.g., the underlined tokens in Figure 1, which indicate that the offence was negligent, will be ignored during rationale extraction, and the charge will be misjudged). Therefore, DNNs that depend on spurious correlations in data fail to reveal the truly critical subsequences for predicting labels. To address this, Chang et al. (2020) propose an environment-invariant method (INVRAT) to discover causal rationales. They argue that causal rationales should remain stable as the environment shifts, while spurious correlations between input and labels vary. Although this method performs well in selecting rationales, since the environment in rationalization is hard to observe and obtain, we argue that this "causal pattern" can be further explored to improve rationalization. Specifically, we first analyse the causalities among the input X, rationales R and results Y with the structural causal model (SCM) shown in Figure 2(a). Then, we identify the confounder C in this SCM, which opens two backdoor paths X ← C → R and R ← C → Y, making X and R, as well as R and Y, spuriously correlated. Next, we address these correlations respectively. For spurious correlations between X and R, we assume the confounder is observed and intervene on X (i.e., calculating P(R|do(X)) instead of P(R|X)) to block the backdoor path and remove the spurious correlations based on the backdoor adjustment Glymour et al. (2016). Here, the do-operation denotes the pursuit of the real causality from X to R. For spurious correlations between R and Y, by the definition of R (rationales are the only basis for yielding prediction results), we argue that there should be no spurious correlations between R and Y. However, in practice, we discover that the sparsity constraint commonly defined in rationalization Lei et al. (2016
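The backdoor adjustment used above can be illustrated on a toy discrete example (a hypothetical sketch with made-up numbers, not the paper's implementation): with an observed confounder C, P(R|do(X)) = Σ_c P(R|X, C=c) P(C=c), whereas the observational P(R|X) weights the confounder by P(C|X) and thus inherits the spurious dependence.

```python
# Toy backdoor adjustment with binary X, R and one binary confounder C.
# All probabilities are illustrative assumptions, not the paper's model.

p_c = {0: 0.5, 1: 0.5}                            # prior P(C=c)
p_x1_given_c = {0: 0.9, 1: 0.2}                   # P(X=1 | C=c)
p_r1_given_xc = {(1, 0): 0.8, (1, 1): 0.3,        # P(R=1 | X=x, C=c)
                 (0, 0): 0.1, (0, 1): 0.1}

def p_c_given_x(c, x):
    """Posterior P(C=c | X=x) via Bayes' rule."""
    def px(cc):
        p = p_x1_given_c[cc]
        return p if x == 1 else 1 - p
    return px(c) * p_c[c] / sum(px(cc) * p_c[cc] for cc in p_c)

def p_r_obs(x):
    """Observational P(R=1 | X=x): confounder weighted by P(C | X)."""
    return sum(p_r1_given_xc[(x, c)] * p_c_given_x(c, x) for c in p_c)

def p_r_do(x):
    """Interventional P(R=1 | do(X=x)): backdoor adjustment, weight by P(C)."""
    return sum(p_r1_given_xc[(x, c)] * p_c[c] for c in p_c)

print(p_r_obs(1))  # 0.709..., inflated by the backdoor path X <- C -> R
print(p_r_do(1))   # 0.55, the causal effect with the backdoor path blocked
```

The gap between the two quantities is exactly the spurious correlation contributed by the open backdoor path.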

2. THE CONVENTIONAL FRAMEWORK OF RATIONALIZATION

This section formally defines the problem of rationalization, and then presents the details of the conventional rationalization framework consisting of a selector and a predictor, where these two components are trained cooperatively to generate rationales and yield the prediction results.

2.1. PROBLEM FORMULATION

Consider a text classification task in which only the text input X = {x_1, x_2, . . . , x_n}, where x_i represents the i-th token, and the discrete ground truth Y are observed during training, while the rationale R is unavailable. The goal of selective rationalization is to first adopt the selector to learn a binary mask variable M = {m_1, m_2, . . . , m_n}, where m_j ∈ {0, 1}, and thereby select a subsequence of the input R = M ⊙ X = {m_1 · x_1, m_2 · x_2, . . . , m_n · x_n}, and then employ the predictor to re-encode the masked input R to yield the results. Finally, the whole process of rationalization is defined as:
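The masking step can be sketched as follows (a toy illustration: the keyword rule stands in for the learned neural selector, and the keyword set is a hypothetical example): the selector produces a binary mask M over the tokens, and the rationale R = M ⊙ X keeps exactly the tokens with m_j = 1.

```python
# Toy illustration of rationale selection via a binary mask M.
# The keyword-based "selector" is a hypothetical stand-in for the
# learned neural selector described in the text.

def select_mask(tokens, keywords):
    """Selector: return a binary mask m_j in {0, 1}, one entry per token."""
    return [1 if t in keywords else 0 for t in tokens]

def apply_mask(tokens, mask):
    """R = M ⊙ X: masked-out tokens are dropped (equivalently, zeroed)."""
    return [t for t, m in zip(tokens, mask) if m == 1]

x = ["the", "defendant", "punched", "the", "victim", "accidentally"]
mask = select_mask(x, {"punched", "accidentally"})
rationale = apply_mask(x, mask)
print(mask)       # [0, 0, 1, 0, 0, 1]
print(rationale)  # ['punched', 'accidentally']
```

The predictor then sees only `rationale`, never the full input `x`, which is what makes the selected subsequence an explanation of the prediction.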



¹ Charge prediction: predicting the charge, such as Robbery or Theft, based on the case fact. A detailed definition of charge prediction is given in Section 4.3.
² https://wenshu.court.gov.cn
³ The definition of the token F1-score can be found in Section 4.1.1.



Figure 1: Conventional framework of rationalization presented in this paper. In the charge prediction, the input X represents the case fact and the result Y denotes the charge.

); Cao et al. (2020); Chang et al. (2020); Yu et al. (2019), which ensures that the selector extracts short rationales, results in spurious correlations between R and Y. Therefore, we further analyse this discovery and employ the causal intervention to remove these correlations. Our experiments are conducted on a multi-aspect sentiment analysis dataset, BeerAdvocate McAuley et al., a movie review prediction dataset, MovieReview Zaidan & Eisner (2008), and a legal judgment prediction dataset, CAIL Xiao et al. (2018). The experimental results validate the effectiveness of removing spurious correlations with causal interventions: our proposed approach gains an average improvement of 8.6 token F1-score³ over the INVRAT baseline on BeerAdvocate, 7.4 token F1-score on MovieReview, and 4.3 F1-score on CAIL.

P(Y|X) = P(Y|R) P(R|X),    (1)

where P(R|X) denotes the selector and P(Y|R) denotes the predictor.

Along this research line, in this paper we propose an interventional rationalization (Inter-RAT) method, which removes the spurious correlations via causal intervention Glymour et al. (2016).

