A DIFFERENTIAL GEOMETRIC VIEW AND EXPLAINABILITY OF GNN ON EVOLVING GRAPHS

Abstract

Graphs are ubiquitous in social networks and biochemistry, where Graph Neural Networks (GNN) are the state-of-the-art models for prediction. Graphs can evolve, and it is vital to formally model and understand how a trained GNN responds to graph evolution. We propose a smooth parameterization of the GNN predicted distributions using axiomatic attribution, where the distributions lie on a low-dimensional manifold within a high-dimensional embedding space. We exploit this differential geometric viewpoint to model distributional evolution as smooth curves on the manifold. We reparameterize families of curves on the manifold and design a convex optimization problem to find a unique curve that concisely approximates the distributional evolution for human interpretation. Extensive experiments on node classification, link prediction, and graph classification tasks with evolving graphs demonstrate that the proposed method outperforms state-of-the-art methods in sparsity, faithfulness, and intuitiveness.

1. INTRODUCTION

Graph neural networks (GNN) are now the state-of-the-art method for graph representation in many applications, such as social network modeling Kipf & Welling (2017), molecule property prediction Wu et al. (2018), pose estimation in computer vision Yang et al. (2021), smart cities Ye et al. (2020), fraud detection Wang et al. (2019), and recommendation systems Ying et al. (2018). A GNN outputs a probability distribution Pr(Y |G; θ) of Y , the class random variable of a node (node classification), a link (link prediction), or a graph (graph classification), using trained parameters θ. Graphs can be evolving, with edges/nodes added and removed. For example, social networks undergo constant updates Xu et al. (2020a); graphs representing chemical compounds are constantly tweaked and tested during molecule design. In a sequence of graph snapshots, without loss of generality, let G0 → G1 be any two snapshots where the source graph G0 evolves to the destination graph G1. Pr(Y |G0; θ) will evolve to Pr(Y |G1; θ) accordingly, and we aim to model and explain the evolution of Pr(Y |G; θ) with respect to G0 → G1 to help humans understand the evolution Ying et al. (2019); Schnake et al. (2020); Pope et al. (2019); Ren et al. (2021); Liu et al. (2021). For example, a GNN's prediction of whether a chemical compound is promising for a target disease during compound design can change as the compound is fine-tuned, and it is useful for the designers to understand how the GNN's prediction evolves with respect to compound perturbations.

To model graph evolution, existing work Leskovec et al. (2007; 2008) analyzed the macroscopic change in graph properties, such as graph diameter, density, and power law, but did not analyze how a parametric model responds to graph evolution. Recent work Kumar et al. (2019); Rossi et al. (2020); Kazemi et al. (2020); Xu et al. (2020b;a) investigated learning a model for each graph snapshot, so that the model itself evolves, while we focus on modeling a fixed GNN model over evolving graphs. A more fundamental drawback of the above work is its discrete viewpoint of graph evolution, where individual edges and nodes are added or deleted. Such discrete modeling fails to describe the corresponding change in Pr(Y |G; θ), which is generated by a computation graph that can be perturbed by an infinitesimal amount and can be understood as a sufficiently smooth function. The smoothness can help identify subtle infinitesimal changes that contribute significantly to the change in Pr(Y |G; θ), and thus explain the change more faithfully.
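As a concrete illustration of how Pr(Y |G; θ) shifts when the graph evolves, the following sketch runs a minimal one-layer mean-aggregation GNN on a toy 3-node graph before and after an edge is added. The node features and weight matrix are invented purely for illustration; this is a stand-in for a trained GNN, not the paper's model.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def gnn_predict(adj, feats, W, node):
    # One round of mean aggregation over the node and its neighbours,
    # followed by a linear layer and a softmax over classes.
    neigh = adj[node].astype(bool)
    agg = (feats[node] + feats[neigh].sum(axis=0)) / (1 + neigh.sum())
    return softmax(agg @ W)

# Hypothetical features and weights (not from the paper).
feats = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]])
W = np.array([[2.0, -1.0], [-1.0, 2.0]])  # 2-d embedding -> 2 classes

adj0 = np.zeros((3, 3))
adj0[0, 1] = adj0[1, 0] = 1.0             # G0: edge (0, 1)
adj1 = adj0.copy()
adj1[0, 2] = adj1[2, 0] = 1.0             # G1: edge (0, 2) added

p0 = gnn_predict(adj0, feats, W, 0)       # Pr(Y | G0) for node 0
p1 = gnn_predict(adj1, feats, W, 0)       # Pr(Y | G1) for node 0
```

With a fixed θ (here, `W`), the same model maps the two snapshots to two different class distributions; explaining *why* `p0` moved to `p1` is the goal of the paper.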

Figure 1: G0 at time s = 0 is updated to G1 at time s = 1 after the edge (J, K) is added, and the predicted class distribution Pr(Y |G0) of node J changes accordingly. The contributions of each path p on a computation graph to Pr(Y = j|G) for class j give the coordinates of Pr(Y |G) in a high-dimensional Euclidean space, with axes indexed by (p, j). Pr(Y |G) varies smoothly on a low-dimensional manifold, where multiple curves γ(s) can explain the evolution from Pr(Y |G0) to Pr(Y |G1) at a very fine-grained level. We select a γ(s) that uses a sparse set of axes to explain the prediction evolution. Edge deletion, mixtures of additions and deletions, link prediction, and graph classification are handled similarly.
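To make the idea of a curve γ(s) between two predicted distributions concrete, the sketch below computes one classical choice of such a curve: the Fisher-Rao geodesic on the probability simplex, obtained by mapping distributions onto the unit sphere via p → √p and interpolating along the great circle. This is only an illustrative curve; the paper instead selects a sparse curve via a convex optimization problem.

```python
import numpy as np

def fisher_rao_geodesic(p0, p1, s):
    # Point gamma(s) on the Fisher-Rao geodesic between distributions
    # p0 and p1: embed the simplex onto the sphere with sqrt, follow
    # the great-circle arc, and square back to the simplex.
    a, b = np.sqrt(p0), np.sqrt(p1)
    theta = np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    if np.isclose(theta, 0.0):
        return np.asarray(p0, dtype=float).copy()
    g = (np.sin((1 - s) * theta) * a + np.sin(s * theta) * b) / np.sin(theta)
    return g ** 2

# Hypothetical endpoint distributions, e.g. Pr(Y|G0) and Pr(Y|G1).
p0 = np.array([0.7, 0.2, 0.1])
p1 = np.array([0.2, 0.3, 0.5])
curve = [fisher_rao_geodesic(p0, p1, s) for s in np.linspace(0.0, 1.0, 5)]
```

Each point of `curve` is a valid distribution, so the whole path stays on the manifold; γ(0) recovers Pr(Y |G0) and γ(1) recovers Pr(Y |G1).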

