A UNIFIED VIEW ON GRAPH NEURAL NETWORKS AS GRAPH SIGNAL DENOISING

Anonymous authors
Paper under double-blind review

Abstract

Graph Neural Networks (GNNs) have risen to prominence in learning representations for graph-structured data. A single GNN layer typically consists of a feature transformation and a feature aggregation operation. The former normally uses feed-forward networks to transform features, while the latter aggregates the transformed features over the graph. Numerous recent works have proposed GNN models with different designs in the aggregation operation. In this work, we establish mathematically that the aggregation processes in a group of representative GNN models, including GCN, GAT, PPNP, and APPNP, can be regarded as (approximately) solving a graph denoising problem with a smoothness assumption. Such a unified view across GNNs not only provides a new perspective to understand a variety of aggregation operations but also enables us to develop a unified graph neural network framework, UGNN. To demonstrate its promising potential, we instantiate a novel GNN model, ADA-UGNN, derived from UGNN, to handle graphs with adaptive smoothness across nodes. Comprehensive experiments show the effectiveness of ADA-UGNN.

1. INTRODUCTION

Graph Neural Networks (GNNs) have shown great capacity in learning representations for graph-structured data and thus have facilitated many downstream tasks such as node classification (Kipf & Welling, 2016; Veličković et al., 2017; Ying et al., 2018a; Klicpera et al., 2018) and graph classification (Defferrard et al., 2016; Ying et al., 2018b). Like traditional deep learning models, a GNN model is usually composed of several stacked GNN layers. Given a graph G with N nodes, a GNN layer typically contains a feature transformation operation and a feature aggregation operation:

Feature Transformation: X'_in = f_trans(X_in);  Feature Aggregation: X_out = f_agg(X'_in; G)   (1)

where X_in ∈ R^{N×d_in} and X_out ∈ R^{N×d_out} denote the input and output features of the GNN layer, with d_in and d_out the corresponding dimensions. Note that the non-linear activation is not included in Eq. (1) to ease the discussion. The feature transformation operation f_trans(·) transforms the input X_in to X'_in ∈ R^{N×d_out} as its output; the feature aggregation operation f_agg(·; G) updates the node features by aggregating the transformed node features over the graph G. In general, different GNN models share similar feature transformations (often, a single feed-forward layer) while adopting different designs for the aggregation operation. We raise a natural question: is there an intrinsic connection among these feature aggregation operations and their underlying assumptions? The significance of a positive answer to this question is two-fold. First, it offers a new perspective to create a uniform understanding of representative aggregation operations. Second, it enables us to develop a general GNN framework that not only provides a unified view on multiple existing representative GNN models, but also has the potential to inspire new ones.
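To make the two-step decomposition in Eq. (1) concrete, the following minimal NumPy sketch implements a generic GNN layer; the mean-over-neighbors aggregation is one illustrative choice for f_agg, and the function names are ours, not from the paper:

```python
import numpy as np

def gnn_layer(X_in, A, W):
    """A generic GNN layer in the form of Eq. (1): transform, then aggregate.

    Activation is omitted, matching the simplified form in the text.
    Mean aggregation is an illustrative stand-in for f_agg.
    """
    X_trans = X_in @ W                              # f_trans: feed-forward transform
    deg = A.sum(axis=1, keepdims=True).clip(min=1)  # node degrees (avoid div by 0)
    return (A @ X_trans) / deg                      # f_agg: average over neighbors
```

Different GNN models discussed below correspond to different choices of the aggregation step while keeping this overall two-step structure.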
In this paper, we aim to build the connection among feature aggregation operations of representative GNN models including GCN (Kipf & Welling, 2016) , GAT (Veličković et al., 2017), PPNP and APPNP (Klicpera et al., 2018) . In particular, we mathematically establish that the aggregation operations in these models can be unified as the process of exactly, and sometimes approximately, addressing a graph signal denoising problem with Laplacian regularization (Shuman et al., 2013) . This connection suggests that these aggregation operations share a unified goal: to ensure feature smoothness of connected nodes. With this understanding, we propose a general GNN framework, UGNN, which not only provides a straightforward, unified view for many existing aggregation operations, but also suggests various promising directions to build new aggregation operations suitable for distinct applications. To demonstrate its potential, we build an instance of UGNN called ADA-UGNN, which is suited for handling varying smoothness properties across nodes, and conduct experiments to show its effectiveness.

2. REPRESENTATIVE GRAPH NEURAL NETWORKS

In this section, we introduce notation for graphs and briefly summarize several representative GNN models. A graph is denoted as G = {V, E}, where V and E are its node and edge sets. The connections in G can be represented as an adjacency matrix A ∈ R^{N×N}, with N the number of nodes in the graph. The Laplacian matrix of the graph G is denoted as L and defined as L = D − A, where D is the diagonal degree matrix corresponding to A. There are also normalized versions of the Laplacian matrix, such as the symmetric normalized Laplacian L = I − D^{-1/2} A D^{-1/2}. In this work, we sometimes adopt different Laplacians to establish connections between different GNNs and the graph denoising problem, and we clarify in the text which one is used. In this section, we generally use X_in ∈ R^{N×d_in} and X_out ∈ R^{N×d_out} to denote the input and output features of GNN layers. Next, we describe a few representative GNN models.
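As a concrete illustration of the notation, the short NumPy sketch below builds the unnormalized Laplacian L = D − A and its symmetric normalized variant for a toy 3-node path graph (the example graph is ours, for illustration only):

```python
import numpy as np

# Toy 3-node path graph: 0 -- 1 -- 2
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
D = np.diag(A.sum(axis=1))        # diagonal degree matrix
L = D - A                         # unnormalized Laplacian L = D - A
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D)))
L_sym = np.eye(3) - D_inv_sqrt @ A @ D_inv_sqrt  # I - D^{-1/2} A D^{-1/2}
```

Both Laplacians are positive semi-definite, and each row of L sums to zero, which is what makes the quadratic form xᵀLx a natural smoothness measure in the denoising problem discussed later.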

2.1. GRAPH CONVOLUTIONAL NETWORKS (GCN)

Following Eq. (1), a single layer in GCN (Kipf & Welling, 2016) can be written as follows:

Feature Transformation: X'_in = X_in W;  Feature Aggregation: X_out = Ã X'_in   (2)

where W ∈ R^{d_in×d_out} is a feature transformation matrix, and Ã is a normalized adjacency matrix which includes a self-loop, defined as follows:

Ã = D̃^{-1/2} (A + I) D̃^{-1/2},   (3)

where D̃ is the diagonal degree matrix of A + I. In practice, multiple GCN layers can be stacked, where each layer takes the output of its previous layer as input. Non-linear activation functions are included between consecutive layers.
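The GCN layer above can be sketched in a few lines of NumPy; this is an illustrative re-implementation of the transformation and aggregation steps in Eq. (2), not the authors' code:

```python
import numpy as np

def gcn_layer(X_in, A, W):
    """One GCN layer: X_out = Ã (X_in W), with self-loop-augmented
    symmetric normalization Ã = D̃^{-1/2}(A + I)D̃^{-1/2}.
    Activation between stacked layers is omitted here.
    """
    N = A.shape[0]
    A_hat = A + np.eye(N)                       # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_tilde = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    X_trans = X_in @ W                          # feature transformation
    return A_tilde @ X_trans                    # feature aggregation
```

Note that with self-loops included, each node's output mixes its own transformed features with those of its neighbors, weighted by inverse square-root degrees.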

2.2. GRAPH ATTENTION NETWORKS (GAT)

Graph Attention Networks (GAT) (Veličković et al., 2017) adopt the same feature transformation operation as GCN in Eq. (2). The feature aggregation operation (written node-wise) for a node i is:

X_out[i, :] = Σ_{j ∈ Ñ(i)} α_ij X'_in[j, :]   (4)

where Ñ(i) = N(i) ∪ {i} denotes the neighbors (self-inclusive) of node i, and X_out[i, :] is the i-th row of the matrix X_out, i.e., the output features of node i. In this aggregation operation, α_ij is a learnable attention score to differentiate the importance of distinct nodes in the neighborhood. Specifically, α_ij is a normalized form (via softmax over Ñ(i)) of e_ij, which is modeled as:

e_ij = LeakyReLU(aᵀ [X'_in[i, :] ∥ X'_in[j, :]])   (5)

where [· ∥ ·] denotes the concatenation operation and a ∈ R^{2d_out} is a learnable vector. Similar to GCN, a GAT model usually consists of multiple stacked GAT layers.
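The GAT aggregation above can be sketched as follows in NumPy; this is an illustrative single-head version written for clarity rather than efficiency (function names are ours):

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gat_aggregate(X_trans, A, a):
    """GAT feature aggregation: attention scores e_ij are computed from
    concatenated transformed features, normalized by softmax over the
    self-inclusive neighborhood Ñ(i), then used as aggregation weights.

    X_trans: transformed features X'_in (N x d_out)
    a:       learnable attention vector of length 2 * d_out
    """
    N = X_trans.shape[0]
    X_out = np.zeros_like(X_trans)
    for i in range(N):
        nbrs = [j for j in range(N) if A[i, j] > 0 or j == i]  # Ñ(i)
        e = np.array([leaky_relu(a @ np.concatenate([X_trans[i], X_trans[j]]))
                      for j in nbrs])
        alpha = softmax(e)                                     # α_ij
        X_out[i] = sum(w * X_trans[j] for w, j in zip(alpha, nbrs))
    return X_out
```

When the attention vector a is zero, all scores are equal and the operation reduces to a uniform average over the self-inclusive neighborhood, which highlights how attention generalizes fixed-weight aggregation.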

2.3. PERSONALIZED PROPAGATION OF NEURAL PREDICTIONS (PPNP)

Personalized Propagation of Neural Predictions (PPNP) (Klicpera et al., 2018) introduces an aggregation operation based on Personalized PageRank (PPR). Specifically, the PPR matrix is defined as α(I − (1 − α)Ã)^{-1}, where α ∈ (0, 1) is a hyper-parameter and Ã is the normalized adjacency matrix in Eq. (3). The ij-th element of the PPR matrix specifies the influence of node i on node j. The feature transformation operation is modeled as a Multi-Layer Perceptron (MLP). The PPNP model can be written in the form of Eq. (1) as follows:

Feature Transformation: X'_in = MLP(X_in);  Feature Aggregation: X_out = α(I − (1 − α)Ã)^{-1} X'_in   (6)

Unlike GCN and GAT, PPNP only consists of a single feature aggregation layer, but with a potentially deep feature transformation. Since the matrix inverse in Eq. (6) is costly, Klicpera et al. (2018) also introduce a practical, approximate version of PPNP, called APPNP, where the aggregation operation is performed in an iterative way:

X^(k+1)_out = (1 − α) Ã X^(k)_out + α X'_in,  with X^(0)_out = X'_in,   (7)

where X^(K)_out after K iterations is the output of the feature aggregation operation. As proved in Klicpera et al. (2018), X^(K)_out converges to the solution obtained by PPNP, i.e., X_out in Eq. (6), as K → ∞.
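The relationship between the PPNP closed form (the matrix inverse in Eq. (6)) and the APPNP iteration can be checked numerically with a short NumPy sketch; this is an illustrative implementation, not the authors' code:

```python
import numpy as np

def normalized_adj(A):
    """Ã = D̃^{-1/2}(A + I)D̃^{-1/2}: self-loop-augmented normalization."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]

def ppnp_aggregate(X_trans, A_tilde, alpha=0.1):
    """PPNP closed form: X_out = α(I - (1-α)Ã)^{-1} X'_in.
    Using solve() instead of an explicit inverse for numerical stability."""
    N = A_tilde.shape[0]
    return alpha * np.linalg.solve(np.eye(N) - (1 - alpha) * A_tilde, X_trans)

def appnp_aggregate(X_trans, A_tilde, alpha=0.1, K=100):
    """APPNP iteration: X^(k+1) = (1-α) Ã X^(k) + α X'_in, X^(0) = X'_in.
    Converges to the PPNP closed form as K grows."""
    X = X_trans.copy()
    for _ in range(K):
        X = (1 - alpha) * (A_tilde @ X) + alpha * X_trans
    return X
```

The iteration only needs sparse matrix-vector products with Ã, which is why APPNP scales to graphs where forming the dense PPR matrix would be infeasible; the approximation error shrinks geometrically at rate (1 − α) per iteration.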

3. GNNS AS GRAPH SIGNAL DENOISING

In this section, we aim to establish the connections between the introduced GNN models and a graph signal denoising problem with Laplacian regularization. We first introduce the problem.

