CAUSAL EXPLANATIONS OF STRUCTURAL CAUSAL MODELS

Anonymous authors
Paper under double-blind review

Abstract

In explanatory interactive learning (XIL), the user queries the learner, the learner explains its answer to the user, and the loop repeats. XIL is attractive for two reasons: (1) the learner improves and (2) the user's trust increases. For both to hold, the learner's explanations must be useful to the user and the user must be allowed to ask useful questions. Ideally, both questions and explanations should be grounded in a causal model, since causal grounding avoids spurious fallacies. Ultimately, we seem to seek a causal variant of XIL. The question part, on the user's end, we believe to be solved, since the user's mental model can provide the causal model. But how would the learner provide causal explanations? In this work we show that existing explanation methods are not guaranteed to be causal even when provided with a Structural Causal Model (SCM). Specifically, we use the popular, proclaimedly causal explanation method CXPlain to illustrate how the generated explanations leave open the question of truly causal explanations. Thus, as a step towards causal XIL, we propose a solution to the lack of causal explanations. We solve this problem by deriving from first principles an explanation method that makes full use of a given SCM, which we refer to as Structural Causal Explanation (SCE). Since SCEs make use of structural information, any causal graph learner can now provide human-readable explanations. We conduct several experiments, including a user study with 22 participants, to investigate the virtue of SCEs as causal explanations of SCMs.

1. INTRODUCTION

There has been an exponential rise in the use of machine learning, especially deep learning, in real-world applications such as medical image analysis (Ker et al., 2017), particle physics (Bourilkov, 2019), drug discovery (Chen et al., 2018) and cybersecurity (Xin et al., 2018), to name a few. While several arguments claim that deep models are interpretable, the practical reality is much to the contrary. The very reason for the extraordinary discriminative power of deep models (namely, their depth) is also the reason for their lack of interpretability. To alleviate this shortcoming, interpretable and explainable AI/ML (Chen et al., 2019; Molnar, 2020) has gained traction as a way to explain algorithmic predictions and thereby increase trust in deployed models. However, providing explanations to increase user trust is only part of the problem. Ultimately, explanations or interpretations (however one defines these otherwise ill-posed terms) are a means for humans to understand something, in this case the deployed AI model. Therefore, a closed feedback loop between user and model is necessary both for boosting trust through understanding and transparency, and for improving model robustness by exposing and correcting shortcomings. The new paradigm of XIL (Teso & Kersting, 2019) offers exactly this setting: a model can be "right or wrong for the right or wrong reasons", and depending on the specific scenario the user-model interaction adapts (e.g., giving the right answer plus a correction when the model is "wrong for the wrong reasons"). Now the question arises: what would constitute a good explanation in line with human reasoning? In their seminal book, Pearl & Mackenzie (2018) argue that causal reasoning is the most important factor for machines to achieve true human-level intelligence and ultimately constitutes the way humans reason.
Several works in cognitive science indeed support Pearl's counterfactual theory of causation as a tool to capture important aspects of human reasoning (Gerstenberg et al., 2015; 2017) and thereby also how humans provide explanations (Lagnado et al., 2013). The authors of Hofman et al. (2021) even argue that systems proficient in both causality and explanation are urgently needed. Questions of the form "What if?" and "Why?" have been shown to be used by children to learn and explore their environment (Gopnik, 2012; Buchsbaum et al., 2012) and are essential for human survival (Byrne, 2016). These forms of causal inference are part of the human mental model, which can be defined as the illustration of one's thought process regarding the understanding of world dynamics (see also discussions in Simon (1961); Nersessian (1992); Chakraborti et al. (2017)). All these views make understanding and reasoning about causality an inherently important problem and suggest that what we truly seek is a causal variant of XIL, in which the explanations of the model are grounded in a causal model and the user is also allowed to give feedback about causal facts. While acknowledging the difficulty of the problem, we address it pragmatically by leveraging qualitative (partial) knowledge of the SCM (Pearl, 2009), justifying the name of our (by construction) truly causal explanations, which we will refer to as Structural Causal Explanations (SCE). The motivation behind this work is the observation that spurious associations in the training data inevitably lead to the failure of non-causal models and explanation methods. For example, an image classifier trained on watermarked images will have high accuracy on test data from the same distribution, but it will be "right for the wrong reasons"; furthermore, the "right" part is brittle, as it fails when moving out-of-distribution (Lapuschkin et al., 2019).
In psychology this phenomenon of spurious-association fallacy is known as "Clever Hans" behavior, named after the early-20th-century Orlov Trotter horse Hans that was wrongly believed to be able to perform arithmetic (Pfungst, 1911). Works such as Stammer et al. (2021) moved beyond basic methods (like heat maps for image data) by using an XIL setup to overcome "Clever Hans" fallacies. Other works, focused purely on the explanation part, made an effort to devise a "causal" explanation algorithm that avoids spuriousness (Schwab & Karlen, 2019); however, as we show in this work, they still leave open the question of truly causal explanations, a gap that we fill. With SCE we provide a new, natural-language-expressible (thus human-understandable) explanation algorithm. Overall, we make several contributions: (I) we devise from first principles a new algorithm (SCE) for computing explanations from an SCM, making them truly causal explanations by construction; (II) we showcase how SCE fixes several shortcomings of previous explainers; (III) we apply the SCE algorithm to several popular causal inference methods; (IV) we discuss, using a synthetic toy data set, how one could use SCE to improve model learning; and finally (V) we perform a survey with 22 participants to investigate the difference between user and algorithmic SCEs. We make our code repository publicly available at: https://anonymous.4open.science/r/Structural-Causal-Explanations-D0E7/

2. BACKGROUND AND RELATED WORK

We briefly review key concepts from previous and related work to establish a high-level understanding of the basics needed for the discussion in this paper.

Causality. Following the Pearlian notion of causality (Pearl, 2009), an SCM is defined as a 4-tuple M := ⟨U, V, F, P(U)⟩, where the so-called structural equations (which are deterministic functions) v_i ← f_i(pa_i, u_i) ∈ F assign values (denoted by lowercase letters) to the respective endogenous/system variables V_i ∈ V based on the values of their parents Pa_i ⊆ V \ V_i and the values of some exogenous variables U_i ⊆ U (sometimes also referred to as unmodelled or nature terms), and P(U) denotes the probability function defined over U.

The SCM formalism comes with several interesting properties. An SCM induces a causal graph G; it induces an observational/associational distribution over V (typical question "What is?", e.g. "What do the symptoms tell us about the disease?"); and it can generate infinitely many interventional/hypothetical distributions (typical question "What if?", e.g. "What if I take an aspirin, will my headache be cured?") and counterfactual/retrospective distributions (typical question "Why?", e.g. "Was it the aspirin that cured my headache?") by using the do-operator, which "overwrites" structural equations.

Note that, as opposed to the Markovian SCMs discussed in, for instance, Peters et al. (2017), the definition of M is semi-Markovian, thus allowing for shared U between the different V_i. Such a shared U is also called a hidden confounder, since it is a common cause of at least two V_i, V_j (i ≠ j). In contrast, a "common" confounder would be a common cause from within V. In the case of a linear SCM, where the structural equations f_i are linear in their arguments, we call the coefficients "dependency terms".
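To make the definitions above concrete, the following is a minimal sketch (our own illustration, not the paper's code) of a semi-Markovian linear SCM with causal graph X → Y → Z, where a shared exogenous term U_c acts as a hidden confounder between X and Z; the do-operator is modeled by simply overwriting the structural equation for X.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n, do_x=None):
    """Draw n samples from the SCM; do_x (if given) overwrites X's equation."""
    u_x, u_y, u_z = rng.normal(size=(3, n))
    u_c = rng.normal(size=n)                 # shared exogenous term (hidden confounder)
    # Structural equations F (linear, so coefficients are "dependency terms"):
    x = u_x + u_c if do_x is None else np.full(n, float(do_x))  # do(X = x)
    y = 2.0 * x + u_y                        # dependency term of Y on X is 2.0
    z = -1.5 * y + u_c + u_z                 # Z depends on Y and the confounder U_c
    return x, y, z

# Observational distribution ("What is?") vs. interventional ("What if?"):
_, y_obs, _ = sample(100_000)                # E[Y] under P(V), close to 0 here
_, y_int, _ = sample(100_000, do_x=1.0)      # E[Y | do(X=1)], close to 2.0 here
print(y_obs.mean(), y_int.mean())
```

Counterfactual queries would additionally require abducting the exogenous values U from observed evidence before intervening, which this sketch omits.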

