MOTIFEXPLAINER: A MOTIF-BASED GRAPH NEURAL NETWORK EXPLAINER

Abstract

We consider the explanation problem for Graph Neural Networks (GNNs). Most existing GNN explanation methods identify the most important edges or nodes but fail to consider substructures, which are more important for graph data. The one method that does consider subgraphs searches all possible subgraphs and identifies the most significant ones. However, the identified subgraphs may not be recurrent or statistically important for interpretation. This work proposes a novel method, named MotifExplainer, to explain GNNs by identifying important motifs, which are recurrent and statistically significant patterns in graphs. Our proposed motif-based method can provide better human-understandable explanations than methods based on nodes, edges, and regular subgraphs. Given an instance graph and a pre-trained GNN model, our method first extracts motifs from the graph using domain-specific motif extraction rules. Then, a motif embedding is encoded by feeding each motif into the pre-trained GNN. Finally, we employ an attention-based method to identify the most influential motifs as explanations for the prediction results. Empirical studies on both synthetic and real-world datasets demonstrate the effectiveness of our method.

1. INTRODUCTION

Graph neural networks (GNNs) have shown strong capability in solving various challenging tasks on graph data, such as node classification, graph classification, and link prediction. Although many GNN models (Kipf & Welling, 2016; Gao et al., 2018; Xu et al., 2018; Gao & Ji, 2019; Liu et al., 2020) have achieved state-of-the-art performance on various tasks, they are still considered black boxes whose decisions are difficult to explain. Inadequate interpretation of GNN decisions severely hinders the applicability of these models in critical decision-making contexts where both predictive performance and interpretability are essential. A good explainer allows us to scrutinize GNN decisions and reveals where algorithmic decisions may be biased or discriminatory. In addition, precise explanations can support other scientific tasks such as fragment generation: a fragment library is a key component in drug discovery, and accurate explanations may aid its construction. Several methods have been proposed to explain GNNs; they can be divided into instance-level explainers and model-level explainers. Most existing instance-level explainers, such as GNNExplainer (Ying et al., 2019), PGExplainer (Luo et al., 2020), Gem (Lin et al., 2021), and ReFine (Wang et al., 2021), produce an explanation for every graph instance. These methods explain pre-trained GNNs by identifying important edges or nodes but fail to consider substructures, which are more important for graph data. The only method that considers subgraphs is SubgraphX (Yuan et al., 2021), which searches all possible subgraphs and identifies the most significant one. However, the identified subgraphs may not be recurrent or statistically important, which limits the applicability of the produced explanations. For example, fragment-based drug discovery (FBDD) (Erlanson et al., 2004) has proven powerful for developing potent small-molecule compounds.
FBDD relies on fragment libraries, which contain fragments or motifs identified by domain experts as relevant to the target property. Using a motif-based GNN explainer, we can directly identify relevant fragments or motifs that are ready to be used when generating drug-like lead compounds in FBDD. In addition, searching and scoring all possible subgraphs is time-consuming and inefficient. We claim that using motifs, i.e., recurrent and statistically significant subgraphs, to explain GNNs can provide more intuitive explanations than methods based on nodes, edges, or arbitrary subgraphs. This work proposes a novel GNN explanation method named MotifExplainer, which identifies significant motifs to explain an instance graph. In particular, our method first extracts motifs from a given graph using domain-specific motif extraction rules based on domain knowledge. Then, motif embeddings of the extracted motifs are generated by feeding the motifs into the target GNN model. After that, an attention model selects relevant motifs based on attention weights. The selected motifs serve as an explanation of the target GNN model on the instance graph. To our knowledge, the proposed method is the first attempt to apply the attention mechanism to explain GNNs from the motif-level perspective. We evaluate our method with both qualitative and quantitative experiments, which show that MotifExplainer generates better explanations than previous GNN explainers. In addition, efficiency studies demonstrate the advantage of our method in terms of much shorter training and inference times.
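The three-step pipeline described above (motif extraction, motif embedding, attention-based selection) can be sketched in plain Python. The triangle-based extraction rule, the mean-pooling stand-in for the pre-trained GNN encoder, and the graph-level query vector used here are all illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def extract_motifs(edges):
    # Hypothetical extraction rule: treat each triangle as a motif.
    # Real rules are domain-specific (e.g., rings or functional groups
    # in molecular graphs).
    nodes = {u for e in edges for u in e}
    adj = {u: set() for u in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    motifs = set()
    for u in nodes:
        for v in adj[u]:
            for w in adj[v]:
                if w != u and w in adj[u]:
                    motifs.add(tuple(sorted((u, v, w))))
    return [list(m) for m in sorted(motifs)]

def motif_embedding(node_feats, motif):
    # Stand-in for feeding the motif subgraph through the pre-trained GNN:
    # here we simply mean-pool the node features of the motif.
    return node_feats[motif].mean(axis=0)

def attention_scores(motif_embs, query):
    # Softmax of dot-product attention between a graph-level query and
    # the motif embeddings; the highest-weight motifs form the explanation.
    logits = motif_embs @ query
    w = np.exp(logits - logits.max())
    return w / w.sum()

# Toy graph: two triangles sharing node 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)]
feats = np.arange(10, dtype=float).reshape(5, 2)
motifs = extract_motifs(edges)
embs = np.stack([motif_embedding(feats, m) for m in motifs])
weights = attention_scores(embs, feats.mean(axis=0))
explanation = motifs[int(weights.argmax())]
```

On this toy graph the two triangles are extracted as motifs and the one whose pooled embedding best matches the query receives the larger attention weight.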

2. PROBLEM FORMULATION

This section formulates the problem of explaining graph neural networks. Let G_i = {V, E} ∈ G = {G_1, G_2, ..., G_N} denote a graph, where V = {v_1, v_2, ..., v_n} is the node set and E is the edge set. G_i is associated with a set of d-dimensional node features X = {x_1, x_2, ..., x_n}, where x_i ∈ R^d is the feature vector of node v_i. Without loss of generality, we consider the problem of explaining a GNN-based downstream classification task. For a node classification task, each node v_i of a graph G is associated with a label y_i ∈ Y = {l_1, ..., l_c}, where c is the number of classes. For a graph classification task, each graph G_i is assigned a corresponding label.

2.1. BACKGROUND ON GRAPH NEURAL NETWORKS

Most Graph Neural Networks (GNNs) follow a neighborhood aggregation learning scheme. At layer ℓ, a GNN performs three steps. First, it computes the messages to be passed between each node pair. The message for a node pair (v_i, v_j) is computed by a function θ(·): b^ℓ_ij = θ(x^{ℓ-1}_i, x^{ℓ-1}_j, e_ij), where e_ij is the edge feature vector, and x^{ℓ-1}_i and x^{ℓ-1}_j are the representations of nodes v_i and v_j at the previous layer, respectively. Second, for each node v_i, the GNN aggregates all messages from its neighborhood N_i using an aggregation function ϕ(·): B^ℓ_i = ϕ({b^ℓ_ij | v_j ∈ N_i}). Finally, the GNN combines the aggregated message B^ℓ_i with node v_i's representation from the previous layer, x^{ℓ-1}_i, and applies a non-linear activation function to obtain the representation of v_i at layer ℓ: x^ℓ_i = f(x^{ℓ-1}_i, B^ℓ_i). Formally, the ℓ-th GNN layer can be written as x^ℓ_i = f(x^{ℓ-1}_i, ϕ({θ(x^{ℓ-1}_i, x^{ℓ-1}_j, e_ij) | v_j ∈ N_i})).
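A minimal NumPy sketch of one such layer follows. The concrete choices of θ (sum of endpoint and edge features), ϕ (sum aggregation), and f (ReLU of a linear combination) are illustrative assumptions for demonstration, not any particular GNN architecture.

```python
import numpy as np

def theta(x_i, x_j, e_ij):
    # Message function θ: here, the sum of the two endpoint features
    # and the edge feature (an illustrative choice).
    return x_i + x_j + e_ij

def phi(messages):
    # Aggregation function ϕ: sum over all neighborhood messages.
    return np.sum(messages, axis=0)

def gnn_layer(x, edges, edge_feats, W_self, W_agg):
    # One layer: x^l_i = f(x^{l-1}_i, B^l_i), with f a ReLU
    # of a linear combination of the self and aggregated terms.
    n = x.shape[0]
    neighbors = {i: [] for i in range(n)}
    for (i, j), e in zip(edges, edge_feats):
        neighbors[i].append((j, e))
        neighbors[j].append((i, e))
    out = np.zeros_like(x @ W_self)
    for i in range(n):
        msgs = [theta(x[i], x[j], e) for j, e in neighbors[i]]
        B_i = phi(msgs) if msgs else np.zeros(x.shape[1])
        out[i] = np.maximum(0.0, x[i] @ W_self + B_i @ W_agg)
    return out

# Tiny path graph 0-1-2 with zero edge features and identity weights.
x = np.array([[1., 0.], [0., 1.], [1., 1.]])
edges = [(0, 1), (1, 2)]
edge_feats = [np.zeros(2), np.zeros(2)]
out = gnn_layer(x, edges, edge_feats, np.eye(2), np.eye(2))
```

With identity weights, each output row is the ReLU of the node's own features plus the summed messages from its neighbors.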

2.2. GRAPH NEURAL NETWORK EXPLANATIONS

In a GNN explanation task, we are given a pre-trained GNN model, represented by Ψ(·), and its corresponding dataset D. The task is to obtain an explanation model Φ(·) that provides fast and accurate explanations of the given GNN model. Most existing GNN explanation approaches fall into two branches: instance-level methods and model-level methods. Instance-level methods provide an explanation for each input graph, while model-level methods are input-independent and analyze graph patterns without input data. Following previous works (Luo et al., 2020; Yuan et al., 2021; Lin et al., 2021; Wang et al., 2021; Bajaj et al., 2021), we focus on instance-level methods with explanations based on graph substructures. Also, our approach is model-agnostic. In particular, given an input graph, our explanation model generates the subgraph that is most important to the outcome of a pre-trained GNN on any downstream graph-related task, such as graph classification.
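To make the Ψ/Φ interface concrete, a toy sketch: `toy_model` stands in for the pre-trained model Ψ (here, a trivial triangle detector), and `explain` is a brute-force stand-in for Φ that returns the smallest edge subset preserving the model's prediction. Both are illustrative assumptions, not the proposed method.

```python
from itertools import combinations

def toy_model(edges):
    # Stand-in Ψ: classifies a graph by whether it contains a triangle.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return any(w != u and w in adj[u]
               for u in adj for v in adj[u] for w in adj[v])

def explain(model, edges):
    # Stand-in instance-level Φ: exhaustively search for the smallest
    # edge subset on which the model makes the same prediction as on
    # the full graph (exponential; for illustration only).
    target = model(edges)
    for k in range(1, len(edges) + 1):
        for subset in combinations(edges, k):
            if model(list(subset)) == target:
                return list(subset)
    return edges

# Triangle with a pendant edge; the triangle alone explains the prediction.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
explanation = explain(toy_model, edges)
```

The returned explanation is itself a subgraph, matching the substructure-based setting above; MotifExplainer replaces this exhaustive search with motif extraction plus attention scoring.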

3. MOTIF-BASED GRAPH NEURAL NETWORK EXPLAINER

Most existing GNN explainers (Ying et al., 2019; Luo et al., 2020) identify the most important nodes or edges. SubgraphX (Yuan et al., 2021) is the first work that proposed a method to explain GNN models by generating the most significant subgraph for an input graph. However, the subgraphs

