HOW POWERFUL IS IMPLICIT DENOISING IN GRAPH NEURAL NETWORKS

Abstract

Graph Neural Networks (GNNs), which aggregate features from neighbors, are widely used for processing graph-structured data due to their powerful representation learning capabilities. It is generally believed that GNNs can implicitly remove non-predictive noise. However, the analysis of this implicit denoising effect in graph neural networks remains open. In this work, we conduct a comprehensive theoretical study and analyze when and why implicit denoising happens in GNNs. Specifically, we study the convergence properties of the noise matrix. Our theoretical analysis suggests that implicit denoising largely depends on the connectivity, the graph size, and the GNN architecture. Moreover, we formally define and propose the adversarial graph signal denoising (AGSD) problem by extending the graph signal denoising problem. By solving this problem, we derive a robust graph convolution in which the smoothness of the node representations and the implicit denoising effect are enhanced. Extensive empirical evaluations verify our theoretical analyses and the effectiveness of our proposed model.

1. INTRODUCTION

Graph Neural Networks (GNNs) (Kipf & Welling, 2017; Veličković et al., 2018; Hamilton et al., 2017) have been widely used in graph learning and have achieved remarkable performance on graph-based tasks, such as traffic prediction (Guo et al., 2019), drug discovery (Dai et al., 2019), and recommender systems (Ying et al., 2018). A general principle behind GNNs is to perform a message passing operation that aggregates node features over neighborhoods, so that the smoothness of the learned node representations on the graph is enhanced. By promoting graph smoothness, the message passing and aggregation mechanism naturally leads to GNN models whose predictions depend not only on the features of one specific node but also on the features of a set of neighboring nodes. Therefore, this mechanism can, to a certain extent, protect GNN models from noise: real-world graphs are usually noisy, e.g., Gaussian white noise may be present on node features (Zhou et al., 2021); however, the influence of feature noise on the model's output can be counteracted by the feature aggregation operation in GNNs. We term this effect implicit denoising. While many works have empirically explored GNNs, relatively few advances have been made in theoretically studying this denoising effect. Early GNN models, such as the vanilla GCN (Kipf & Welling, 2017), GAT (Veličković et al., 2018), and GraphSAGE (Hamilton et al., 2017), propose different designs of aggregation functions, but the denoising effect is not discussed in these works.
Some recent attempts (Ma et al., 2021b) have been made to mathematically establish the connection between a variety of GNNs and the graph signal denoising (GSD) problem (Chen et al., 2014): q(F) = min_F ∥F − X∥_F² + λ tr(F^⊤ L F), where X = X* + η is the observed noisy feature matrix, η ∈ R^{n×d} is the noise matrix, X* is the clean feature matrix, and L is the graph Laplacian. The second term encourages the smoothness of the filtered feature matrix F over the graph, i.e., nearby vertices should have similar vertex features. By regarding the feature aggregation process in GNNs as solving a GSD problem, more advanced GNNs have been proposed, such as GLP (Li et al., 2019), S²GC (Zhu & Koniusz, 2021), and IRLS (Yang et al., 2021). Despite these prior attempts, little effort has been made to rigorously study the denoising effect of the message passing and aggregation operation. This urges us to consider a fundamental but not clearly answered question: why and when does implicit denoising happen in GNNs? In this work, we focus on the non-predictive stochasticity of noise in GNNs' aggregated features and analyze its properties. We prove that as the graph size and the graph connectivity factor increase, this stochasticity tends to diminish, which we call the "denoising effect" in this work. We address this question using tools from concentration inequalities and matrix theory, which concern the convergence of the noise matrix; this offers a new framework to study the properties of graphs and GNNs in terms of the denoising effect. To facilitate our theoretical analysis, we derive Neumann Graph Convolution (NGC) from GSD.
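The GSD objective above can be checked numerically on a toy problem. The sketch below (a minimal illustration, not from the paper; the 6-node cycle graph and the signals are invented for the example) builds a noisy signal X = X* + η and applies the closed-form GSD minimizer F = (I + λL)^{-1} X, verifying that filtering moves the signal closer to the clean one:

```python
# Toy demonstration of graph signal denoising (GSD):
# minimize ||F - X||_F^2 + lambda * tr(F^T L F)  =>  F = (I + lambda*L)^{-1} X.
import numpy as np

rng = np.random.default_rng(0)

# A 6-node cycle graph: adjacency A, degree D, unnormalized Laplacian L = D - A.
n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1
D = np.diag(A.sum(axis=1))
L = D - A

# A perfectly smooth clean signal (constant over the graph) plus Gaussian noise.
X_clean = np.ones((n, 1))
eta = 0.5 * rng.standard_normal((n, 1))
X = X_clean + eta

lam = 2.0
F = np.linalg.solve(np.eye(n) + lam * L, X)  # (I + lambda*L)^{-1} X

# Filtering should shrink the noise: the constant signal is a fixed point of
# the filter, while non-constant (noisy) components are attenuated.
err_noisy = np.linalg.norm(X - X_clean)
err_filtered = np.linalg.norm(F - X_clean)
print(err_noisy, err_filtered)
```

Because (I + λL)^{-1} leaves the constant eigenvector untouched and shrinks all other spectral components, the filtered error is strictly smaller whenever the noise has a non-constant component.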
Specifically, to study the convergence rate, we introduce an insightful measurement on the convolution operator, termed the high-order graph connectivity factor, which reveals how uniformly the nodes are distributed in the neighborhood and reflects how strongly information is diluted over a single neighboring node during the feature aggregation step. Intuitively, as the general Hoeffding inequality (Hoeffding, 1994) (Lemma D.1) suggests, a larger high-order graph connectivity factor, i.e., nodes distributed more uniformly in the neighborhood, accelerates the convergence of the noise matrix, and a larger graph size leads to faster convergence. Besides, GNN architectures also affect the convergence rate: deeper GNNs can converge faster. To further strengthen the denoising effect, inspired by adversarial training (Madry et al., 2018), we propose the adversarial graph signal denoising problem (AGSD). By solving this problem, we derive a robust graph convolution model based on the correlation between node features and graph structure, which increases the high-order graph connectivity factor and thereby improves the denoising performance. Extensive experimental results on standard graph learning tasks verify our theoretical analyses and the effectiveness of our derived robust graph convolution model.

Notations. Let G = (V, E) denote an undirected graph, where V = {v_1, …, v_n} is the set of vertices with |V| = n and E is the set of edges. The adjacency matrix is A ∈ {0, 1}^{n×n}, with A_{i,j} = 1 if and only if (v_i, v_j) ∈ E. Let N_i = {v_j | A_{i,j} = 1} denote the neighborhood of node v_i and D the diagonal degree matrix, where D_{i,i} = Σ_{j=1}^{n} A_{i,j}. The feature matrix is X ∈ R^{n×d}, where each node v_i is associated with a d-dimensional feature vector X_i. Y ∈ {0, 1}^{n×c} denotes the label matrix, where each Y_i ∈ {0, 1}^c is a one-hot vector with Σ_{j=1}^{c} Y_{i,j} = 1 for every v_i ∈ V.
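The quantities defined above can be computed directly from an adjacency matrix. The following sketch (a toy 4-node graph invented for illustration) forms the degree matrix D, the neighborhoods N_i, and the two normalized operators used later, D^{-1/2} A D^{-1/2} and D^{-1} A:

```python
# Building the basic notation from a binary adjacency matrix A.
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

deg = A.sum(axis=1)                        # D_ii = sum_j A_ij
D_inv_sqrt = np.diag(deg ** -0.5)
A_sym = D_inv_sqrt @ A @ D_inv_sqrt        # symmetric normalization D^{-1/2} A D^{-1/2}
A_rw = np.diag(1.0 / deg) @ A              # random-walk normalization D^{-1} A

# Neighborhood N_i = {v_j | A_ij = 1}.
neighbors = {i: np.flatnonzero(A[i]).tolist() for i in range(len(A))}
print(neighbors[2])         # node v_2 is adjacent to v_0, v_1, v_3
print(A_rw.sum(axis=1))     # each row of D^{-1} A sums to 1
```

Note that D^{-1/2} A D^{-1/2} is symmetric (and hence has a real spectrum), while D^{-1} A is row-stochastic; both forms appear in the Neumann expansion in the next section.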

2. A SIMPLE UNIFYING FRAMEWORK: NEUMANN GRAPH CONVOLUTION

A General Framework. In this section, we present a simple yet general framework for solving the graph signal denoising problem, namely Neumann Graph Convolution (NGC). Note that NGC is not a new GNN architecture; similar architectures already exist, such as GLP (Li et al., 2019), S²GC (Zhu & Koniusz, 2021), and GaussianMRF (Jia & Benson, 2022). In this work we focus on the theoretical analysis of the denoising effect in GNNs, and NGC facilitates this analysis. By setting the derivative ∇q(F) = 2λLF + 2(F − X) to zero, we obtain the solution of the GSD optimization problem as follows:

F = (I + λL)^{-1} X.    (2)

To avoid the expensive computation of the matrix inverse, we can use a Neumann series expansion (Stewart, 1998) to approximate Eq. (2) up to the S-th order:

(I + λL)^{-1} = (1/(λ+1)) (I − (λ/(λ+1)) Ã)^{-1} ≈ (1/(λ+1)) Σ_{s=0}^{S} (λ/(λ+1))^s Ã^s,

where L = I − Ã, Ã can take the form Ã = D^{-1/2} A D^{-1/2} or Ã = D^{-1} A, and the proof can be found in Appendix B. Based on the Neumann series expansion of the solution of GSD, we introduce a general graph convolution model, Neumann Graph Convolution, defined as the following expansion:

H = Ã_S X W = (1/(λ+1)) Σ_{s=0}^{S} (λ/(λ+1))^s Ã^s X W,
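The truncation quality of the Neumann expansion can be checked numerically. The sketch below (an illustrative toy, not the paper's implementation; the graph, λ, and S are invented, and the weight matrix W is taken as the identity) compares the exact GSD solution (I + λL)^{-1} X against the S-term series with Ã = D^{-1/2} A D^{-1/2}:

```python
# Truncated Neumann series vs. exact inverse for the GSD solution:
# (I + lambda*L)^{-1} X  ≈  (1/(lambda+1)) * sum_{s=0}^{S} (lambda/(lambda+1))^s * A_t^s X,
# with L = I - A_t and A_t = D^{-1/2} A D^{-1/2}.
import numpy as np

rng = np.random.default_rng(1)

# An 8-node cycle with one chord, so every node has positive degree.
n, d, S, lam = 8, 3, 20, 1.0
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1
A[0, 4] = A[4, 0] = 1

deg = A.sum(axis=1)
D_inv_sqrt = np.diag(deg ** -0.5)
A_t = D_inv_sqrt @ A @ D_inv_sqrt      # symmetrically normalized adjacency
L_t = np.eye(n) - A_t                  # normalized Laplacian

X = rng.standard_normal((n, d))

# Exact solution of the GSD problem.
F_exact = np.linalg.solve(np.eye(n) + lam * L_t, X)

# S-term Neumann approximation, accumulating A_t^s X iteratively
# so no explicit matrix power is ever formed.
coef = lam / (lam + 1.0)
P = X.copy()                # A_t^0 X
F_ngc = P.copy()
for s in range(1, S + 1):
    P = A_t @ P             # A_t^s X
    F_ngc = F_ngc + coef ** s * P
F_ngc = F_ngc / (lam + 1.0)

print(np.max(np.abs(F_exact - F_ngc)))  # small truncation error
```

Since the spectral radius of Ã is at most 1 and λ/(λ+1) < 1, the series converges geometrically at rate λ/(λ+1), so the truncation error shrinks by roughly that factor with each additional order S.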

