

Abstract

Selective rationalization improves the explainability of neural networks by selecting a subsequence of the input (i.e., a rationale) to explain the prediction results. Although existing methods have achieved promising results, they still suffer from adopting spurious correlations in data (a.k.a. shortcuts) to compose rationales and make predictions. Inspired by causal theory, in this paper we develop an interventional rationalization method (Inter-RAT) to discover causal rationales. Specifically, we first analyse the causalities among the input, rationales and results with a structural causal model. Then we discover spurious correlations between the input and rationales, and between rationales and results, respectively, by identifying the confounder in these causalities. Next, based on the backdoor adjustment, we propose a causal intervention method to remove the spurious correlations between the input and rationales. Further, we discuss why spurious correlations between the selected rationales and results exist by analysing the limitations of the sparsity constraint in rationalization, and employ the causal intervention method to remove these correlations as well. Extensive experimental results on three real-world datasets clearly validate the effectiveness of our proposed method.
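As a pointer for readers unfamiliar with the backdoor adjustment mentioned above: in its standard form from causal inference, given a confounder $Z$ of the effect of $X$ on $Y$, the interventional distribution is obtained by stratifying over the confounder,

\[
P(Y \mid do(X)) \;=\; \sum_{z} P(Y \mid X, Z=z)\, P(Z=z),
\]

which replaces the confounded conditional $P(Y \mid X)$ with a deconfounded estimate; this is the general form of the intervention applied in this paper.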

1. INTRODUCTION

The remarkable success of deep neural networks (DNNs) in natural language understanding tasks has prompted interest in explaining the results of DNNs. Among explainability tasks, selective rationalization Lei et al. (2016); Yu et al. (2019; 2021) has received increasing attention, answering the question "Which features have a significant impact on the prediction results of the model?". Specifically, the goal of selective rationalization is to extract a small subset of the input (i.e., the rationale) that supports and explains the prediction results. Existing methods often generate rationales with a conventional framework consisting of a selector (a.k.a. rationale generator) and a predictor Lei et al. (2016). As shown in Figure 1, given the input X, the selector and the predictor generate the rationale R and the prediction result Y cooperatively (i.e., P(Y|X) = P(Y|R)P(R|X)). The selector (P(R|X)) first extracts a subsequence of the input. Then, the predictor (P(Y|R)) yields results based only on the selected tokens, and the selected subsequence is defined as the rationale. Despite the appeal of rationalization methods, current implementations are prone to exploiting spurious correlations (a.k.a. shortcuts) between the input and labels when yielding predictions and selecting rationales Chang et al. (2020); Wu et al. (2022). We illustrate this problem with an example from charge prediction¹. Consider Figure 1: although this case corresponds to Manslaughter, a DNN model readily predicts the charge as Intentional homicide. Specifically, since Intentional homicide occurs more frequently than Manslaughter² and is often accompanied by tokens denoting violence and death, a DNN does not need to learn the real correlations between the case facts and the charge to yield the result.
Instead, it is much easier to exploit spurious correlations in the data to achieve high accuracy (i.e., predicting the charge as Intentional homicide directly upon identifying tokens about violence and death). As a result, when facing cases such as the example in Figure 1, the effectiveness of such DNNs tends to degrade (e.g., the underlined tokens in Figure 1, which indicate that the offence is negligent, will be ignored during rationale extraction and the charge will be misjudged). Therefore, DNNs that depend on spurious correlations in data fail to reveal the truly critical subsequences for predicting labels.
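To make the selector-predictor factorization P(Y|X) = P(Y|R)P(R|X) concrete, the following is a minimal, hypothetical numpy sketch. All names, dimensions and the deterministic top-k masking are our own illustrative choices; real rationalization models (e.g., Lei et al., 2016) use trained recurrent or transformer encoders and sample the binary mask stochastically.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions for illustration only.
VOCAB, EMB, CLASSES, SEQ = 50, 8, 2, 10

# Hypothetical (untrained) parameters of the two modules.
emb = rng.normal(size=(VOCAB, EMB))        # token embedding table
w_sel = rng.normal(size=EMB)               # selector's scoring vector
w_pred = rng.normal(size=(EMB, CLASSES))   # predictor's classifier weights

def selector(x, sparsity=0.3):
    """P(R|X): score each token and keep the top-k as the rationale mask."""
    scores = emb[x] @ w_sel
    k = max(1, int(sparsity * len(x)))     # sparsity constraint on the rationale
    mask = np.zeros(len(x))
    mask[np.argsort(scores)[-k:]] = 1.0    # select the k highest-scoring tokens
    return mask

def predictor(x, mask):
    """P(Y|R): classify using only the selected (masked) tokens."""
    masked = emb[x] * mask[:, None]        # zero out unselected tokens
    logits = masked.mean(axis=0) @ w_pred
    e = np.exp(logits - logits.max())
    return e / e.sum()                     # softmax over classes

x = rng.integers(0, VOCAB, size=SEQ)       # a random "input sentence"
mask = selector(x)                         # the rationale R
probs = predictor(x, mask)                 # the prediction Y, based only on R
```

The predictor never sees the unselected tokens, which is what makes the mask an explanation; the shortcut problem described above arises because nothing in this objective stops the selector from choosing spuriously correlated tokens.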



¹ Charge prediction: predicting the charge, such as Robbery or Theft, based on the case fact. A detailed definition of charge prediction is given in Section 4.3.
² https://wenshu.court.gov.cn

