CAUSAL SCREENING TO INTERPRET GRAPH NEURAL NETWORKS

Anonymous authors
Paper under double-blind review

Abstract

With the growing success of graph neural networks (GNNs), the explainability of GNNs is attracting considerable attention. However, current works on feature attribution, which frame explanation generation as attributing a prediction to the graph features, mostly focus on statistical interpretability. They may struggle to distinguish causal from noncausal effects of features and to quantify redundancy among features, resulting in unsatisfactory explanations. In this work, we focus on causal interpretability in GNNs and propose a method, Causal Screening, from a cause-effect perspective. It incrementally selects the graph feature (i.e., edge) with the largest causal attribution, which is formulated as the individual causal effect on the model outcome. As a model-agnostic tool, Causal Screening can generate faithful and concise explanations for any GNN model. Further, through extensive experiments on three graph classification datasets, we observe that Causal Screening achieves significant improvements over state-of-the-art approaches w.r.t. two quantitative metrics, predictive accuracy and contrastivity, and safely passes sanity checks.

1. INTRODUCTION

Graph neural networks (GNNs) (Gilmer et al., 2017; Hamilton et al., 2017; Velickovic et al., 2018; Dwivedi et al., 2020) have exhibited impressive performance in a wide range of tasks. Such success comes from their powerful representation learning, which incorporates the graph structure with node and edge features in an end-to-end fashion. With the growing interest in GNNs, the explainability of GNNs is attracting considerable attention. A prevalent technique is to offer post-hoc explanations via feature attribution. Attributions in GNNs are typically defined as the contributions of input features (e.g., nodes and edges) to the model's outcome; thereafter, by selecting the most important features with top attributions, an explanatory subgraph is constructed to answer "Why does this GNN model make such a prediction?". In this line, current works roughly fall into two categories: (1) decomposing the outcome prediction to graph structures via backpropagating gradient-like signals (Pope et al., 2019; Baldassarre & Azizpour, 2019); and (2) approximating the decision boundary via structure perturbations (Huang et al., 2020) or structure masking (Ying et al., 2019). However, these works mostly focus on statistical interpretability (Pearl, 2018; Moraffah et al., 2020), which can fail to uncover the causation of model predictions reliably. The key reason is that they approach the input-outcome relationships from an associational standpoint, without distinguishing between causal and noncausal effects. Using correlation as causation to interpret feature importance results in unfaithful explanations, as shown in the running example.

Running Example. Consider the example in Figure 1, where SA (Baldassarre & Azizpour, 2019) and GNNExplainer (Ying et al., 2019) use gradients and masks as attributions, respectively, to explain why the scene type of a scene graph is predicted as Surfing by APPNP (Klicpera et al., 2019).
Two limitations of statistical interpretability are: (1) Confounding association. The edges with large gradients (e.g., (shorts, on, man)) or masks (e.g., (man, has, hand)) are highly correlated with the prediction, rather than causing it (Moraffah et al., 2020). Such confounding associations distort the estimation of the causation (e.g., (standing, on, surfboard)). (2) Redundancy. As the graph structure is highly entangled in GNNs, the gradient-like signal of an edge is influenced, even scaled, by the connected edges. This brings redundant edges (e.g., (man, on, ocean) and (man, riding, waves)) into the top explanations, while forgoing other edges with unique information.

In this work, we focus on causal interpretability (Pearl, 2018; Moraffah et al., 2020) in GNNs. From a cause-effect standpoint, we need to answer causality-related questions like "Was it a specific edge that caused the GNN's prediction?". Technically, explanatory subgraphs of interpretable features should account for two desirable characteristics: (1) Causal association. We need to identify important features that may plausibly be causal determinants of the model outcome, apart from those associated with the outcome due to confounding. As Figure 1 shows, (standing, on, surfboard) should be ranked at the top, instead of (shorts, on, man). (2) Briefness. We need brief explanations that avoid redundancy. For example, (man, on, surfboard) is informative but redundant, since it can be replaced by the top selection (standing, on, surfboard). Hence, it is important to consider the dependency among edges. Although causality has been considered in very recent works in other domains (Moraffah et al., 2020), these methods have yet to be applied to GNNs and fall short of the ability to explain them. See Section 2 for an exhaustive list and full discussion. To the best of our knowledge, causal interpretability is yet unexplored in GNNs.
Here we propose a novel method, Causal Screening, to take a cause-effect look at feature attribution. It takes a graph of interest, along with the prediction made by a trained GNN model, and aims to return an explanatory subgraph that plausibly has the largest causal attribution on the prediction. Towards this end, Causal Screening starts from an empty set as the explanatory subgraph and adopts a screening strategy to incrementally select edges into the subgraph, one edge per step. At its core is quantifying the causal attribution of an edge candidate as its individual causal effect (ICE) (Pearl, 2009), answering "What would happen to the prediction if we added this edge to the GNN's input?". This coincides with maximizing an information-theoretic measure at each step: the conditional mutual information (Ay & Polani, 2008; Janzing et al., 2013; O'Shaughnessy et al., 2020) between the edge and the prediction, conditioned on the previously selected edges. Specifically, at each step, ICE is formulated as the difference between two outcomes, where the edge receives treatment (i.e., combining it with the previous selection as the GNN input) or control (i.e., feeding the previous selection alone into the GNN). Last but not least, we propose an efficient variant that considers the cause-effect of edge groups, rather than single edges, to speed up the exhaustive search.

We apply Causal Screening to multiple graph classification datasets, generating qualitative results that showcase the effectiveness of our explanatory subgraphs, which are more consistent, concise, and faithful to the predictions than those of existing methods. Contributions of this study can be summarized as:

• We propose a model-agnostic method, Causal Screening, to provide a cause-effect perspective for explaining GNN models, i.e., uncovering causal relationships between graph features and model predictions.

• We conduct extensive experiments on three datasets, showcasing the effectiveness of our method w.r.t. predictive accuracy, contrastivity, and sanity checks.
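To make the treatment-versus-control formulation of ICE concrete, the greedy screening loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `gnn_score` is a hypothetical stand-in for running the trained GNN with every edge outside the given subset masked out and returning the score of the originally predicted class.

```python
def causal_screening(edges, gnn_score, k):
    """Greedily build an explanatory subgraph of at most k edges.

    At each step, every remaining candidate edge is scored by its
    individual causal effect (ICE): the model score under treatment
    (previous selection plus the candidate) minus the score under
    control (previous selection alone). The candidate with the
    largest ICE is added to the explanation.

    gnn_score(edge_subset) -> float is assumed to return the trained
    GNN's score for the predicted class when only `edge_subset` is
    kept as input (a stand-in for structure masking).
    """
    explanation = []
    remaining = list(edges)
    for _ in range(min(k, len(remaining))):
        # Control outcome: feed the previous selection alone.
        control = gnn_score(frozenset(explanation))
        # Treatment outcome per candidate; pick the largest ICE.
        best = max(
            remaining,
            key=lambda e: gnn_score(frozenset(explanation + [e])) - control,
        )
        explanation.append(best)
        remaining.remove(best)
    return explanation
```

In practice each `gnn_score` call is a forward pass of the GNN on a masked graph, so the loop costs O(k * |E|) forward passes; the grouped variant mentioned above trades exactness for fewer passes by scoring edge groups instead of single edges.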



Figure 1: An example of explaining the scene graph classification. (a) An input image with bounding boxes; (b) The scene graph predicted as Surfing; (c) Explanations of SA, GNNExplainer, and our Causal Screening.
2. RELATED WORK

Interpretability in Non-Graph Neural Networks. We focus preliminarily on feature attribution methods that generate post-hoc explanations for neural networks, especially convolutional neural networks (CNNs). We roughly categorize current works into two groups: (1) Studies that decompose the model prediction to the input features via backpropagating gradient-like signals. Early works like Gradient (Simonyan et al., 2014) and Gradient*Input (Shrikumar et al., 2016) directly use gradients w.r.t. inputs as feature importance. Some follow-on studies, such as LRP (Bach

