CAUSAL EXPLANATIONS OF STRUCTURAL CAUSAL MODELS

Anonymous authors
Paper under double-blind review

Abstract

In explanatory interactive learning (XIL), the user queries the learner, the learner explains its answer to the user, and the loop repeats. XIL is attractive for two reasons: (1) the learner improves and (2) the user's trust increases. For both to hold, the learner's explanations must be useful to the user and the user must be allowed to ask useful questions. Ideally, both questions and explanations should be grounded in a causal model, since this avoids fallacies based on spurious correlations. Ultimately, we seem to seek a causal variant of XIL. We consider the question part, on the user's end, to be solved, since the user's mental model can provide the causal model. But how would the learner provide causal explanations? In this work we show that existing explanation methods are not guaranteed to be causal even when provided with a Structural Causal Model (SCM). Specifically, we use the popular, purportedly causal explanation method CXPlain to illustrate how the generated explanations leave open the question of truly causal explanations. As a step towards causal XIL, we therefore propose a remedy for this lack of causal explanations: we derive from first principles an explanation method that makes full use of a given SCM, which we refer to as SCE (the E standing for explanation). Since SCEs make use of structural information, any causal graph learner can now provide human-readable explanations. We conduct several experiments, including a user study with 22 participants, to investigate the virtue of SCEs as causal explanations of SCMs.

1. INTRODUCTION

There has been an exponential rise in the use of machine learning, especially deep learning, in real-world applications such as medical image analysis (Ker et al., 2017), particle physics (Bourilkov, 2019), drug discovery (Chen et al., 2018) and cybersecurity (Xin et al., 2018), to name a few. While it has been argued that deep models are interpretable, the practical reality is quite the contrary: the very reason for the extraordinary discriminative power of deep models (namely, their depth) is also the reason for their lack of interpretability. To alleviate this shortcoming, interpretable and explainable AI/ML (Chen et al., 2019; Molnar, 2020) has gained traction as a way to explain model predictions and thereby increase trust in deployed models. However, providing explanations to increase user trust is only part of the problem. Ultimately, explanations or interpretations (however one defines these otherwise ill-posed terms) are a means for humans to understand something, in this case the deployed AI model. Therefore, a closed feedback loop between user and model is necessary both for boosting trust through understanding and transparency and for improving model robustness by exposing and correcting shortcomings. The paradigm of XIL (Teso & Kersting, 2019) offers exactly this: a model can be "right or wrong for the right or wrong reasons", and depending on the specific scenario the user-model interaction adapts (e.g., giving the right answer and a correction when the model is "wrong for the wrong reasons"). Now the question arises: what would constitute a good explanation in line with human reasoning? In their seminal book, Pearl & Mackenzie (2018) argue that causal reasoning is the most important factor for machines to achieve true human-level intelligence and ultimately constitutes the way humans reason.
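To make the role of SCMs in such causal reasoning concrete, the following minimal sketch (our own illustration; the three-variable model, its coefficients, and the helper `sample` are hypothetical and not the SCE method proposed here) shows how an SCM answers an interventional query, do(Y := y), that observational statistics alone do not:

```python
import random

# A minimal structural causal model (SCM), purely illustrative:
# X -> Y -> Z with independent Gaussian noise terms U_X, U_Y, U_Z.
def sample(do_y=None):
    """Draw one sample from the SCM; optionally intervene do(Y := do_y)."""
    u_x, u_y, u_z = (random.gauss(0, 1) for _ in range(3))
    x = u_x                                          # X := U_X
    y = do_y if do_y is not None else 2 * x + u_y    # Y := 2X + U_Y, unless set by intervention
    z = y - x + u_z                                  # Z := Y - X + U_Z
    return x, y, z

random.seed(0)
obs = [sample() for _ in range(10_000)]             # observational regime
intv = [sample(do_y=1.0) for _ in range(10_000)]    # interventional regime do(Y := 1)

mean = lambda vals: sum(vals) / len(vals)
# Observationally E[Z] = E[X] ~ 0, but under do(Y := 1) the arrow into Y
# is cut, so E[Z | do(Y=1)] = 1 - E[X] ~ 1.
print(round(mean([z for _, _, z in obs]), 1))   # ~ 0.0
print(round(mean([z for _, _, z in intv]), 1))  # ~ 1.0
```

The point of the sketch is that the interventional answer follows from the structural equations themselves, which is exactly the kind of information a model-agnostic explanation method never sees.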
Several works in cognitive science indeed support Pearl's counterfactual theory of causation as a powerful tool for capturing important aspects of human reasoning (Gerstenberg et al., 2015; 2017) and thereby also of how humans provide explanations (Lagnado et al., 2013). The authors in

