HOW POWERFUL IS IMPLICIT DENOISING IN GRAPH NEURAL NETWORKS

Abstract

Graph Neural Networks (GNNs), which aggregate features from neighbors, are widely used for processing graph-structured data due to their powerful representation learning capabilities. It is generally believed that GNNs can implicitly remove non-predictive noise. However, a formal analysis of the implicit denoising effect in graph neural networks remains open. In this work, we conduct a comprehensive theoretical study and analyze when and why implicit denoising happens in GNNs. Specifically, we study the convergence properties of the noise matrix. Our theoretical analysis suggests that implicit denoising largely depends on the connectivity, the graph size, and the GNN architecture. Moreover, we formally define and propose the adversarial graph signal denoising (AGSD) problem by extending the graph signal denoising problem. By solving this problem, we derive a robust graph convolution, through which the smoothness of the node representations and the implicit denoising effect are enhanced. Extensive empirical evaluations verify our theoretical analyses and the effectiveness of our proposed model.

1. INTRODUCTION

Graph Neural Networks (GNNs) (Kipf & Welling, 2017; Veličković et al., 2018; Hamilton et al., 2017) have been widely used in graph learning and have achieved remarkable performance on graph-based tasks such as traffic prediction (Guo et al., 2019), drug discovery (Dai et al., 2019), and recommender systems (Ying et al., 2018). A general principle behind GNNs is to perform a message passing operation that aggregates node features over neighborhoods, so that the smoothness of the learned node representations on the graph is enhanced. By promoting graph smoothness, the message passing and aggregation mechanism naturally yields GNN models whose predictions depend not only on the features of one specific node but also on the features of a set of neighboring nodes. This mechanism can therefore, to a certain extent, protect GNN models from noise: real-world graphs are usually noisy, e.g., Gaussian white noise may corrupt node features (Zhou et al., 2021); however, the influence of feature noise on the model's output can be counteracted by the feature aggregation operation in GNNs. We term this effect implicit denoising. While many empirical studies of GNNs have been conducted, relatively few advances have been made in theoretically studying this denoising effect. Early GNN models, such as the vanilla GCN (Kipf & Welling, 2017), GAT (Veličković et al., 2018), and GraphSAGE (Hamilton et al., 2017), propose different designs of aggregation functions, but the denoising effect is not discussed in these works.
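The averaging intuition above can be illustrated with a minimal NumPy sketch (not taken from the paper): applying a row-normalized aggregation operator to pure i.i.d. Gaussian noise shrinks its spread, since each node averages the independent noise of its neighbors. The ring-lattice graph and all sizes (n, d, k) below are hypothetical choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 16, 8  # nodes, feature dim, neighbors per node (hypothetical)

# Ring lattice: each node connects to its k nearest neighbors on a cycle.
A = np.zeros((n, n))
for i in range(n):
    for j in range(1, k // 2 + 1):
        A[i, (i + j) % n] = A[i, (i - j) % n] = 1.0
P = A / A.sum(axis=1, keepdims=True)  # row-normalized aggregation operator

eta = rng.normal(size=(n, d))   # pure Gaussian noise on node features
smoothed = P @ P @ eta          # two rounds of neighborhood averaging

# Averaging independent noise over neighborhoods reduces its standard deviation.
print(eta.std(), smoothed.std())
```

The same operator applied to a smooth clean signal leaves it nearly unchanged, which is the sense in which aggregation denoises implicitly rather than by an explicit filtering step.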
Some recent attempts (Ma et al., 2021b) have been made to mathematically establish the connection between a variety of GNNs and the graph signal denoising (GSD) problem (Chen et al., 2014):

min_F ∥F − X∥²_F + λ tr(F⊤LF),

where X = X* + η is the observed noisy feature matrix, η ∈ R^{n×d} is the noise matrix, X* is the clean feature matrix, and L is the graph Laplacian. The second term encourages the smoothness of the filtered feature matrix F over the graph, i.e., nearby vertices should have similar features. By regarding the feature aggregation process in GNNs as solving a GSD problem, more advanced GNNs have been proposed, such as GLP (Li et al., 2019), S²GC (Zhu & Koniusz, 2021), and IRLS (Yang et al.,

