CHARACTERIZING THE INFLUENCE OF GRAPH ELEMENTS

Abstract

The influence function, a method from robust statistics, measures the change of model parameters, or of functions of model parameters, with respect to the removal or modification of training instances. It is an efficient and useful post-hoc method for studying the interpretability of machine learning models without the need for expensive model retraining. Recently, graph convolutional networks (GCNs), which operate on graph data, have attracted a great deal of attention. However, there is no preceding research on influence functions for GCNs to shed light on the effects of removing training nodes or edges from an input graph. Since the nodes and edges of a graph are interdependent in GCNs, it is challenging to derive influence functions for GCNs. To fill this gap, we started with the simple graph convolution (SGC) model that operates on an attributed graph, and formulated an influence function to approximate the change of model parameters when a node or an edge is removed from the graph. Moreover, we theoretically analyzed the error bound of the estimated influence of removing an edge. We experimentally validated the accuracy and effectiveness of our influence estimation function. In addition, we showed that the influence function of an SGC model can be used to estimate the impact of removing training nodes or edges on the test performance of the SGC without retraining the model. Finally, we demonstrated how to use influence functions to effectively guide adversarial attacks on GCNs.

1. INTRODUCTION

Graph data is pervasive in real-world applications such as online recommendation (Shalaby et al., 2017; Huang et al., 2021; Li et al., 2021), drug discovery (Takigawa & Mamitsuka, 2013; Li et al., 2017), and knowledge management (Rizun, 2019; Wang et al., 2018), to name a few. The growing need to analyze huge amounts of graph data has inspired work that brings deep learning to graphs via graph neural networks (GNNs) (Gori et al., 2005; Scarselli et al., 2005; Li et al., 2016; Hamilton et al., 2017; Xu et al., 2019b; Jiang et al., 2019). Graph Convolutional Networks (GCNs) (Kipf & Welling, 2017; Zhang & Chen, 2018; Fan et al., 2019), the most-cited GNN architecture, adopt convolution and message-passing mechanisms. To better understand GCNs from a data-centric perspective, we consider the following question: Without model retraining, how can we estimate the changes of the parameters of a GCN when the graph used for learning is perturbed by edge or node removals? This question asks for an estimate of the counterfactual effect of manipulating basic graph elements on the parameters of a well-trained model, where the ground truth of such an effect would have to be obtained by model retraining. With a computational tool as the answer, we can efficiently manipulate the edges or nodes of a graph to control the change of the parameters of trained GCNs. Such a solution would further enable applications like graph data rectification, improved model generalization, and graph data poisoning attacks in a purely data-centric way. Yet, current methods for training GCNs offer limited interpretability of the interactions between the training graph and the GCN model. More specifically, we fall short of understanding the influence of the input graph elements on both the changes in model parameters and the generalizability of a trained model (Ying et al., 2019; Huang et al., 2022; Yuan et al., 2021; Xu et al., 2019a; Zheng et al., 2021).
In the regime of robust statistics, an analysis tool called influence functions (Hampel, 1974; Koh & Liang, 2017) was proposed to study the counterfactual effect of training data on model performance. For independent and identically distributed (i.i.d.) data, influence functions offer an approximate estimate of the model's change under an infinitesimal perturbation of the training distribution, e.g., a reweighting of some training instances. However, unlike with i.i.d. data, a manipulation on a graph incurs a knock-on effect through GCNs. For example, removing an edge breaks all message passing that is supposed to travel through that edge, and consequently changes node representations and affects the final model optimization. Therefore, introducing influence functions to graph data and GCNs is non-trivial and requires extra considerations. In this work, we aim to derive influence functions for GCNs. As a first attempt in this direction, we focus on Simple Graph Convolution (SGC) (Wu et al., 2019). Our contributions are three-fold:

• We derived influence functions for Simple Graph Convolution. Based on influence functions, we developed computational approaches to estimate the changes in model parameters caused by two basic perturbations: edge removal and node removal.

• We derived theoretical error bounds that characterize the gap between the estimated and the actual changes in model parameters for both edge and node removal.

• We show that our influence analysis on the graph can be used to (1) rectify the training graph to improve model test performance, and (2) guide adversarial attacks on SGC, or conduct grey-box attacks on GCNs via a surrogate SGC.

Code is publicly available at https://github.com/Cyrus9721/Characterizing_graph_influence.
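Since our analysis builds on Simple Graph Convolution, a brief illustration of its core operation may be helpful: SGC drops the nonlinearities of a GCN, so the model reduces to K steps of feature propagation over the normalized adjacency matrix, followed by a linear classifier. The sketch below (hypothetical NumPy code on a toy graph, not the authors' implementation) shows only the propagation step.

```python
import numpy as np

def sgc_features(A, X, k=2):
    """K-step SGC propagation S^k X, where S is the symmetrically
    normalized adjacency matrix with self-loops (Wu et al., 2019)."""
    n = A.shape[0]
    A_tilde = A + np.eye(n)                 # add self-loops
    d = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(d ** -0.5)
    S = D_inv_sqrt @ A_tilde @ D_inv_sqrt   # S = D^{-1/2} (A + I) D^{-1/2}
    H = X.copy()
    for _ in range(k):                      # repeated neighborhood smoothing
        H = S @ H
    return H

# toy graph: a path 0 - 1 - 2, with one-hot node features
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
X = np.eye(3)
H = sgc_features(A, X, k=2)
```

The propagated features H are then fed to an ordinary (multinomial) logistic regression, so the trainable part of SGC stays convex — the property that makes a precise influence analysis tractable.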

2. PRELIMINARIES

In the following sections, we use a lowercase $x$ for a scalar or an entity, an uppercase $X$ for a constant or a set, a bold lowercase $\mathbf{x}$ for a vector, and a bold uppercase $\mathbf{X}$ for a matrix.

Influence Functions. Influence functions (Hampel, 1974) estimate the change in model parameters when the empirical weight distribution of i.i.d. training samples is perturbed infinitesimally. Such estimations are computationally efficient compared with leave-one-out retraining over every training sample. For $N$ training instances with features $x$ and labels $y$, consider the empirical risk minimization (ERM)

$$\hat{\theta} = \arg\min_{\theta \in \Theta} \frac{1}{N} \sum_{x, y} \ell(x, y) + \frac{\lambda}{2} \lVert \theta \rVert_2^2$$

for some loss function $\ell(\cdot, \cdot)$ of a model parameterized by $\theta$, with a regularization term. When down-weighting a training sample $(x_i, y_i)$ by an infinitesimally small fraction $\epsilon$, the perturbed ERM can be expressed as

$$\hat{\theta}(x_i; -\epsilon) = \arg\min_{\theta \in \Theta} \frac{1}{N} \sum_{x, y} \ell(x, y) - \epsilon \, \ell(x_i, y_i) + \frac{\lambda}{2} \lVert \theta \rVert_2^2 .$$

Influence functions estimate the actual change $\mathcal{I}^*(x_i; -\epsilon) = \hat{\theta}(x_i; -\epsilon) - \hat{\theta}$ for a strictly convex and twice-differentiable $\ell(\cdot, \cdot)$:

$$\mathcal{I}(x_i; -\epsilon) = \lim_{\epsilon \to 0} \frac{\hat{\theta}(x_i; -\epsilon) - \hat{\theta}}{-\epsilon} = -H_{\hat{\theta}}^{-1} \nabla_{\theta} \ell(x_i, y_i),$$

where $H_{\hat{\theta}} := \frac{1}{N} \sum_{i=1}^{N} \nabla_{\theta}^2 \ell(x_i, y_i) + \lambda I$ is the Hessian matrix with regularization at parameter $\hat{\theta}$. For a differentiable model evaluation function $f: \Theta \to \mathbb{R}$, such as the total model loss over a test set, the change of the evaluation result caused by down-weighting $(x_i, y_i)$ by $\epsilon$ can be approximated by $\epsilon \, \nabla_{\theta} f(\hat{\theta})^{\top} H_{\hat{\theta}}^{-1} \nabla_{\theta} \ell(x_i, y_i)$. When the size $N$ of the training data is large, setting $\epsilon = \frac{1}{N}$ and linearly extrapolating from $\frac{1}{N}$ to $0$, the change of $\hat{\theta}$ incurred by removing an entire training sample is approximated by $\mathcal{I}(-x_i) := \mathcal{I}^*(x_i; -\frac{1}{N}) \approx \frac{1}{N} H_{\hat{\theta}}^{-1} \nabla_{\theta} \ell(x_i, y_i)$. In terms of the estimated influence $\mathcal{I}$, removing a training sample has the opposite value of adding the same training sample, $\mathcal{I}(-x_i) = -\mathcal{I}(+x_i)$. In our work, we shall assume additivity of influence functions when several samples are removed, e.g., for two samples, $\mathcal{I}(-x_i, -x_j) = \mathcal{I}(-x_i) + \mathcal{I}(-x_j)$.

Though efficient, influence functions on non-convex models suffer from estimation errors caused by varying local minima, and usually require a computational approximation of $H_{\hat{\theta}}^{-1}$ when the Hessian is not invertible. To introduce influence functions from i.i.d. data to graphs and to precisely characterize the influence of graph elements on the changes of model parameters, we consider a convex model from the GCN family: Simple Graph Convolution.
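The removal approximation can be sanity-checked numerically on any small convex model. The sketch below (synthetic data and hypothetical variable names; not the paper's code) uses l2-regularized linear regression with squared loss: the influence estimate (1/N) H^{-1} grad l(x_i, y_i) is compared against the actual parameter change obtained by re-fitting the perturbed ERM without sample i, keeping the 1/N normalization fixed as in the down-weighted objective.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, lam = 200, 3, 0.1

# synthetic regression data; squared loss keeps the objective strictly convex
X = rng.normal(size=(N, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=N)

def ridge(X_tr, y_tr, n, lam):
    """Minimize (1/n) * sum_i 0.5*(x_i . theta - y_i)^2 + (lam/2)*||theta||^2.
    The 1/n factor is held fixed to match the perturbed ERM."""
    p = X_tr.shape[1]
    return np.linalg.solve(X_tr.T @ X_tr / n + lam * np.eye(p), X_tr.T @ y_tr / n)

theta = ridge(X, y, N, lam)

# regularized Hessian at theta: (1/N) * sum_i x_i x_i^T + lam * I
H = X.T @ X / N + lam * np.eye(d)

i = 7                                          # sample to remove
grad_i = (X[i] @ theta - y[i]) * X[i]          # gradient of the loss on sample i

# influence estimate of removing sample i (epsilon = 1/N)
est_change = np.linalg.solve(H, grad_i) / N

# ground truth: re-fit the perturbed ERM without sample i
theta_wo = ridge(np.delete(X, i, axis=0), np.delete(y, i), N, lam)
actual_change = theta_wo - theta

rel_err = np.linalg.norm(est_change - actual_change) / np.linalg.norm(actual_change)
```

For squared loss, the only error is the linearization of the change in the Hessian, so for moderate N the estimate closely tracks the retrained solution; the residual gap is the kind of error the edge- and node-removal bounds later in the paper quantify for graphs.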

