HARD MASKING FOR EXPLAINING GRAPH NEURAL NETWORKS

Abstract

Graph Neural Networks (GNNs) are a flexible and powerful family of models that build representations of nodes on irregular graph-structured data. This paper focuses on explaining or interpreting the rationale underlying a given prediction of an already trained graph neural network for the node classification task. Existing approaches for interpreting GNNs try to find subsets of important features and nodes by learning a continuous mask. Our objective is instead to find discrete masks, which are arguably more interpretable, while minimizing the expected deviation from the underlying model's prediction. We empirically show that our explanations are both more predictive and sparser. Additionally, we find that multiple diverse explanations are possible, each of which sufficiently explains a prediction. Finally, we analyze the explanations to find the effect of network homophily on the decision-making process of GNNs.

1. INTRODUCTION

Graph Neural Networks (GNNs) are a flexible and powerful family of models that build representations of nodes or edges on irregular graph-structured data and have received significant attention in recent years. These methods are based on the so-called "neighborhood aggregation" scheme, in which a node's representation is learned by aggregating features from its neighbors, and have achieved state-of-the-art performance on node and graph classification tasks. Despite their popularity, approaches investigating their interpretability have received limited attention. This paper focuses on explaining or interpreting the rationale underlying a given prediction of an already trained graph neural network.

There have been numerous approaches proposed in the literature for the general interpretability of machine learning models. The most popular are feature attribution methods that attribute importance to input features given an input prediction, either agnostic to the model parameters (Ribeiro et al., 2018; 2016) or using model-specific attribution approaches (Xu et al., 2015; Binder et al., 2016; Sundararajan et al., 2017). However, models learned over graph-structured data pose some unique challenges. Specifically, predictions on graphs are induced by a complex combination of nodes and paths of edges between them, in addition to the node features. Thus, explanations for a prediction should ideally be a small subgraph of the input graph and a small subset of node features that are most influential for the prediction (Ying et al., 2019). The only existing approach for GNN explainability proposes to learn a real-valued graph mask that selects the important subgraph of the GNN's computation graph so as to maximize the mutual information with the GNN's prediction (Ying et al., 2019). We identify two crucial limitations of such an approach.
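To make the "neighborhood aggregation" scheme concrete, the following is a minimal sketch of a single mean-aggregation layer in numpy. The function name `aggregate_layer`, the choice of mean aggregation with self-loops, and the ReLU nonlinearity are illustrative assumptions, not a specific architecture from the literature:

```python
import numpy as np

def aggregate_layer(adj, features, weight):
    """One neighborhood-aggregation step: each node averages the features
    of its neighbors (and itself), then applies a linear map and ReLU."""
    # Add self-loops so a node's own features participate in the average.
    adj_hat = adj + np.eye(adj.shape[0])
    # Row-normalize: mean over the self-augmented neighborhood.
    deg = adj_hat.sum(axis=1, keepdims=True)
    h = (adj_hat / deg) @ features @ weight
    return np.maximum(h, 0.0)  # ReLU

# Tiny 3-node path graph: 0 - 1 - 2 (hypothetical example data).
adj = np.array([[0., 1., 0.],
                [1., 0., 1.],
                [0., 1., 0.]])
X = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])
W = np.eye(2)  # identity weight, for illustration only
H = aggregate_layer(adj, X, W)
```

Stacking k such layers makes a node's prediction depend on its k-hop neighborhood, which is why an explanation naturally takes the form of a subgraph of that computational neighborhood.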
Firstly, although mathematically tractable, a continuous mask does not ensure sparsity compared to a discrete mask, a desirable property for interpretability. Secondly, suitable notions of what constitutes an explanation in a GNN model, and of how to evaluate one, are missing. This paper proposes an alternate notion of interpretability for GNNs grounded in ideas from data compression in information theory. Specifically, we consider an explanation as a compressed form of the original feature matrix. The goodness of the explanation is measured by the expected deviation from the prediction of the underlying model. We formalize this idea of interpreting GNN decisions as an explicit optimization problem in a rate-distortion framework. A subgraph of the node's computational graph and its set of features are relevant for a classification decision if the expected classifier score remains nearly the same when the remaining features are randomized. This
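The rate-distortion criterion above can be sketched as follows: hold the masked-in features fixed, resample the remaining features from a reference distribution, and measure how much the model's score moves. Everything here, including the function `expected_deviation`, the standard-normal reference distribution, and the toy classifier `f`, is a hypothetical illustration of the evaluation idea, not the paper's actual procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

def expected_deviation(f, x, mask, n_samples=100):
    """Monte Carlo estimate of the distortion of a hard (binary) mask:
    features with mask=True are kept, the rest are randomized, and we
    average the absolute change in the model's score f."""
    base = f(x)
    devs = []
    for _ in range(n_samples):
        noise = rng.standard_normal(x.shape)  # reference distribution (assumed)
        x_pert = np.where(mask, x, noise)     # hard mask: keep or randomize
        devs.append(abs(f(x_pert) - base))
    return float(np.mean(devs))

# Toy "classifier" whose score depends only on the first feature.
f = lambda x: float(x[0])
x = np.array([2.0, -1.0, 0.5])

# A mask that keeps the relevant feature yields near-zero distortion ...
good = expected_deviation(f, x, np.array([True, False, False]))
# ... while a mask that randomizes it yields large distortion.
bad = expected_deviation(f, x, np.array([False, True, True]))
```

A good explanation is then a mask that is both sparse (low rate) and low in expected deviation (low distortion), which is exactly the trade-off the optimization problem formalizes.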

