CAUSAL SCREENING TO INTERPRET GRAPH NEURAL NETWORKS

Anonymous authors
Paper under double-blind review

Abstract

With the growing success of graph neural networks (GNNs), the explainability of GNNs is attracting considerable attention. However, current works on feature attribution, which frame explanation generation as attributing a prediction to the graph features, mostly focus on statistical interpretability. They may struggle to distinguish causal from noncausal effects of features, and to quantify redundancy among features, thus resulting in unsatisfactory explanations. In this work, we focus on causal interpretability in GNNs and propose a method, Causal Screening, from a cause-effect perspective. It incrementally selects a graph feature (i.e., an edge) with large causal attribution, which is formulated as the individual causal effect on the model outcome. As a model-agnostic tool, Causal Screening can be used to generate faithful and concise explanations for any GNN model. Further, through extensive experiments on three graph classification datasets, we observe that Causal Screening achieves significant improvements over state-of-the-art approaches w.r.t. two quantitative metrics, predictive accuracy and contrastivity, and safely passes sanity checks.

1. INTRODUCTION

Graph neural networks (GNNs) (Gilmer et al., 2017; Hamilton et al., 2017; Velickovic et al., 2018; Dwivedi et al., 2020) have exhibited impressive performance in a wide range of tasks. This success stems from their powerful representation learning, which incorporates the graph structure with node and edge features in an end-to-end fashion. With the growing interest in GNNs, their explainability is attracting considerable attention. A prevalent technique is to offer post-hoc explanations via feature attribution. Attributions in GNNs are typically defined as the contributions of input features (e.g., nodes and edges) to the model's outcome; thereafter, by selecting the most important features with the top attributions, an explanatory subgraph is constructed to answer "Why does this GNN model make such predictions?". In this line, current works roughly fall into two categories: (1) decomposing the outcome prediction over graph structures by backpropagating gradient-like signals (Pope et al., 2019; Baldassarre & Azizpour, 2019); and (2) approximating the decision boundary via structure perturbations (Huang et al., 2020) or structure masking (Ying et al., 2019). However, these works mostly focus on statistical interpretability (Pearl, 2018; Moraffah et al., 2020), which can fail to reliably uncover the causation behind model predictions. The key reason is that they approach the input-outcome relationships from an associational standpoint, without distinguishing between causal and noncausal effects. Mistaking correlation for causation when interpreting feature importance results in unfaithful explanations, as shown in the running example.

Running Example. Consider the example in Figure 1, where SA (Baldassarre & Azizpour, 2019) and GNNExplainer (Ying et al., 2019) use gradients and masks as attributions, respectively, to explain why the scene type of a scene graph is predicted as Surfing by APPNP (Klicpera et al., 2019).
This example exposes two limitations of statistical interpretability: (1) Confounding association. The edges with large gradients (e.g., (shorts, on, man)) or masks (e.g., (man, has, hand)) are highly correlated with the prediction, rather than causing it (Moraffah et al., 2020). Such confounding associations distort the estimation of the causation (e.g., (standing, on, surfboard)); (2) Redundancy. As the graph structure is highly entangled within GNNs, the gradient-like signals of edges are influenced, even scaled, by the connected edges. This causes redundant edges (e.g., (man, on, ocean) and (man, riding, waves)) to be included in the top explanations, while other edges carrying unique information are excluded.
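To make the incremental selection idea concrete, the following is a minimal sketch of screening edges by individual causal effect (ICE): at each step, every remaining edge is scored by how much adding it to the current explanatory subgraph changes the model outcome, and the edge with the largest effect is kept. All names here (ice, causal_screening, the toy scorer) are hypothetical illustrations, not the paper's actual implementation, and the toy model stands in for a trained GNN.

```python
def ice(model, subgraph, edge):
    """Individual causal effect of adding `edge` to `subgraph`:
    the change in the model outcome caused by this single edge."""
    return model(subgraph | {edge}) - model(subgraph)

def causal_screening(model, edges, k):
    """Greedily build a k-edge explanatory subgraph, selecting at each
    step the candidate edge with the largest individual causal effect."""
    selected = set()
    candidates = set(edges)
    for _ in range(k):
        best = max(candidates, key=lambda e: ice(model, selected, e))
        selected.add(best)
        candidates.remove(best)
    return selected

# Toy stand-in "model": scores a subgraph by summing fixed edge weights,
# so the screen should recover the highest-weight edges first.
weights = {"e1": 0.9, "e2": 0.1, "e3": 0.5}
toy_model = lambda subgraph: sum(weights[e] for e in subgraph)

explanation = causal_screening(toy_model, weights.keys(), k=2)
print(explanation)  # {'e1', 'e3'}
```

Because each candidate is evaluated against the subgraph selected so far, an edge whose information is already covered by the selected edges yields a small ICE, which is how this style of screening suppresses the redundancy problem noted above.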

