A UNIFIED VIEW ON GRAPH NEURAL NETWORKS AS GRAPH SIGNAL DENOISING

Anonymous authors
Paper under double-blind review

Abstract

Graph Neural Networks (GNNs) have risen to prominence in learning representations for graph-structured data. A single GNN layer typically consists of a feature transformation and a feature aggregation operation. The former normally uses feed-forward networks to transform features, while the latter aggregates the transformed features over the graph. Numerous recent works have proposed GNN models with different designs for the aggregation operation. In this work, we establish mathematically that the aggregation processes in a group of representative GNN models, including GCN, GAT, PPNP, and APPNP, can be regarded as (approximately) solving a graph denoising problem with a smoothness assumption. Such a unified view across GNNs not only provides a new perspective for understanding a variety of aggregation operations but also enables us to develop a unified graph neural network framework, UGNN. To demonstrate its promising potential, we instantiate a novel GNN model, ADA-UGNN, derived from UGNN, to handle graphs with adaptive smoothness across nodes. Comprehensive experiments show the effectiveness of ADA-UGNN.

1. INTRODUCTION

Graph Neural Networks (GNNs) have shown great capacity in learning representations for graph-structured data and thus have facilitated many downstream tasks such as node classification (Kipf & Welling, 2016; Veličković et al., 2017; Ying et al., 2018a; Klicpera et al., 2018) and graph classification (Defferrard et al., 2016; Ying et al., 2018b). Like traditional deep learning models, a GNN model is usually composed of several stacked GNN layers. Given a graph G with N nodes, a GNN layer typically contains a feature transformation and a feature aggregation operation:

Feature Transformation: X'_in = f_trans(X_in);  Feature Aggregation: X_out = f_agg(X'_in; G)  (1)

where X_in ∈ R^{N×d_in} and X_out ∈ R^{N×d_out} denote the input and output features of the GNN layer, with d_in and d_out as the corresponding dimensions. Note that the non-linear activation is not included in Eq. (1) to ease the discussion. The feature transformation operation f_trans(·) transforms the input X_in into its output X'_in ∈ R^{N×d_out}; the feature aggregation operation f_agg(·; G) then updates the node features by aggregating the transformed features over the graph G. In general, different GNN models share similar feature transformations (often a single feed-forward layer) while adopting different designs for the aggregation operation. We thus raise a natural question: is there an intrinsic connection among these feature aggregation operations and their underlying assumptions? The significance of a positive answer to this question is two-fold. First, it offers a new perspective that unifies our understanding of representative aggregation operations. Second, it enables us to develop a general GNN framework that not only provides a unified view of multiple existing representative GNN models, but also has the potential to inspire new ones.
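The two-step layer structure above can be sketched concretely. The following NumPy sketch instantiates Eq. (1) with a linear feature transformation and a GCN-style symmetrically normalized aggregation; the function name `gnn_layer` and the normalization choice are illustrative, not prescribed by the text.

```python
import numpy as np

def gnn_layer(X_in, A, W):
    """One GNN layer following Eq. (1): transformation, then aggregation.

    X_in : (N, d_in) input node features
    A    : (N, N) adjacency matrix of G (no self-loops)
    W    : (d_in, d_out) weights of a single feed-forward layer
    """
    # Feature transformation: X'_in = f_trans(X_in) = X_in W
    X_trans = X_in @ W
    # GCN-style aggregation: X_out = D^{-1/2} (A + I) D^{-1/2} X'_in,
    # where D is the degree matrix of A + I (self-loops added)
    A_hat = A + np.eye(A.shape[0])
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt @ X_trans
```

For two connected nodes with scalar features [1, 3] and identity weights, both nodes end up with the value 2: the aggregation averages each node's feature with its neighbor's, illustrating the smoothing behavior the paper analyzes.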
In this paper, we aim to build the connection among the feature aggregation operations of representative GNN models including GCN (Kipf & Welling, 2016), GAT (Veličković et al., 2017), and PPNP and APPNP (Klicpera et al., 2018). In particular, we mathematically establish that the aggregation operations in these models can be unified as the process of exactly, or in some cases approximately, solving a graph signal denoising problem with Laplacian regularization (Shuman et al., 2013). This connection suggests that these aggregation operations share a unified goal: to ensure feature smoothness among connected nodes. With this understanding, we propose a general GNN framework, UGNN, which not only provides a straightforward, unified view of many existing aggregation operations, but also suggests promising directions for building new aggregation operations suited to distinct applications. To demonstrate this potential, we build an instance of UGNN called ADA-UGNN, which handles varying smoothness across nodes, and conduct experiments to show its effectiveness.
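To make the claimed connection tangible, the numerical sketch below uses the standard Laplacian-regularized denoising objective min_F ||F − X||_F² + c·tr(Fᵀ L F), with L = I − Ã and Ã the symmetrically normalized adjacency with self-loops; the constant c and the step size 1/(2c) are assumptions for illustration. It checks that a single gradient-descent step on this objective, started from F = X, reproduces the GCN-style aggregation Ã X.

```python
import numpy as np

# Denoising objective (assumed notation):
#   J(F) = ||F - X||_F^2 + c * tr(F^T L F),   L = I - A_tilde.
# Gradient: dJ/dF = 2(F - X) + 2c L F. At F = X this is 2c L X,
# so one step with step size 1/(2c) gives F <- X - L X = A_tilde X,
# i.e., exactly the GCN aggregation.

rng = np.random.default_rng(0)
N, d, c = 5, 3, 1.0
A = rng.integers(0, 2, size=(N, N))
A = np.triu(A, 1); A = A + A.T                 # random undirected graph
A_hat = A + np.eye(N)                          # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
A_tilde = D_inv_sqrt @ A_hat @ D_inv_sqrt      # normalized adjacency
L = np.eye(N) - A_tilde                        # normalized Laplacian
X = rng.normal(size=(N, d))                    # noisy input signal

grad_at_X = 2 * (X - X) + 2 * c * (L @ X)      # gradient of J at F = X
F_one_step = X - grad_at_X / (2 * c)           # one gradient-descent step
assert np.allclose(F_one_step, A_tilde @ X)    # one step == GCN aggregation
```

This is only a sanity check of the approximate-solution view; the paper's formal treatment covers the exact solvers (PPNP) and attention-based variants (GAT) as well.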

