WIDE GRAPH NEURAL NETWORK

Abstract

Graph neural networks (GNNs) from the spatial and spectral domains often suffer from the following problems: over-smoothing, poor flexibility, and weak performance on heterophilic graphs. In this paper, we provide a unified view of GNNs from the perspective of matrix space analysis to identify potential reasons for these problems, and propose a new GNN framework, called Wide Graph Neural Network (WGNN), to address them. We formulate GNNs as two components: one constructs a non-parametric feature space, and the other learns parameters to re-weight that space. For instance, spatial GNNs encode adjacency matrix multiplication as the feature space and stack layers to re-weight it, while spectral GNNs sum polynomials to build the feature space and learn shared model weights. Instead, WGNN constructs the space by concatenating all polynomial terms and re-weights them individually. The concatenation removes unnecessary constraints on the feature space, which avoids over-smoothing, while the independent parameters provide better flexibility. Beyond parameter independence, WGNN enjoys further flexibility in augmenting the feature space with matrices of arbitrary columns. For instance, by taking the principal components of the adjacency matrix, we can significantly improve the representation of heterophilic graphs. We provide a detailed theoretical analysis and conduct extensive experiments on eight datasets to show the superiority of the proposed WGNN.¹

1. INTRODUCTION

Graph neural networks (GNNs) have demonstrated great potential in representation learning for graph-structured data, such as social networks, transportation networks, protein interaction networks, and chemical structures (Fan et al., 2019; Wu et al., 2020; Zheng et al., 2022). Despite this success, existing GNNs still suffer from the following issues. First, spatial GNNs aggregate information from connected nodes, resulting in the well-known over-smoothing problem (Cai & Wang, 2020). Second, spatial models assume that the features of connected nodes are similar; however, this assumption does not hold in heterophilic graphs (Zheng et al., 2022). Third, spectral GNNs use polynomials to approximate arbitrary graph filters (He et al., 2021; Klicpera et al., 2019; Defferrard et al., 2016). In the absence of layer stacking, spectral GNNs are exempt from over-smoothing. However, they still perform poorly on heterophilic graphs, since each polynomial term carries the same assumption of similarity between neighbors. In addition, spectral methods share the parameters across polynomial terms, leading to a less flexible architecture. To better understand the problems in both the spatial and spectral domains, existing efforts integrate GNNs, e.g., from the perspective of optimization objectives (Ma et al., 2021; Zhu et al., 2021). However, they focus on summarizing general formulas while lacking a clear explanation of these problems.


In this paper, we propose a unified view of both spectral and spatial GNNs from the matrix space analysis point of view, to investigate possible reasons for these problems and contribute a new way to address them. Specifically, for the sake of theoretical investigation, we first abstract a linear approximation of GNNs, following Wu et al. (2019a); Xu et al. (2018a). Then, as shown in the mathematical formulation and implementation structure of Figure 1, we decompose the linear approximation into the components with and without parameters, where the latter is regarded as a feature space built from node attributes and graph structure (e.g., adjacency or Laplacian matrices), and the former denotes the learnable parameters that re-weight the features. Spatial GNNs 1) build the feature space by taking powers of the adjacency matrix and 2) form the parameter space as the product of the weight matrices. Spectral GNNs, in contrast, sum the polynomials to compose the feature space and share the parameters of each term. Based on this view, we can identify the causes of the issues above. When the feature space is formed by powers of the adjacency matrix, over-smoothing arises from feature space compression. The parameter-sharing scheme of spectral GNNs limits the flexibility of their architectures. Moreover, the common issue of poor performance on heterophilic graphs stems from the construction of each feature sub-space, which embodies the similarity of neighboring nodes in both families of methods. The primary contribution of this work is a wide architecture of GNNs, named Wide Graph Neural Network (WGNN), whose basic architecture is shown in Figure 1. In particular, it constructs the feature space by concatenating the polynomial terms of the adjacency matrix. This concatenation avoids the space compression caused by powers in the spatial domain and alleviates the over-smoothing problem.
To account for a feature space with multiple polynomial terms, WGNN re-weights each term with an independent parameter matrix. Unlike spectral GNNs, which use a single parameter matrix for all polynomial terms, WGNN gains flexibility by allowing different parameters for each. WGNN also supports augmenting the feature space with matrices of arbitrary width. With this characteristic, we can improve performance on heterophilic graphs by adding the principal components of the adjacency matrix. This augmentation reduces the dependence of the feature space on the similarity of adjacent nodes, since the principal components extract only the graph structure. Comprehensive experiments on both homophilic and heterophilic datasets demonstrate the superiority of WGNN. Contributions. (1) We provide a unified view of both spatial and spectral GNNs, which formulates a GNN as the joint construction of a feature space and the learning of parameters to re-weight it. (2) We propose a new architecture, WGNN, which avoids over-smoothing, enjoys flexibility, and alleviates heterophily problems, and we provide a detailed theoretical analysis. (3) We conduct experiments on homophilic and heterophilic datasets and achieve significant improvements, e.g., an average accuracy increase of 32% on heterophilic graphs.
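This construction can be illustrated with a short NumPy sketch (our own minimal illustration under assumed shapes and names such as `wgnn_forward`; it is not the released implementation). Note that concatenating the blocks and applying one independent weight matrix per block is algebraically the same as summing the individually projected blocks:

```python
import numpy as np

def renormalized_adj(A):
    """Compute the re-normalized adjacency (D+I)^{-1/2} (A+I) (D+I)^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def wgnn_forward(X, A, weights, K, r):
    """Sketch of the WGNN idea: concatenate the polynomial blocks
    [X, A_hat X, ..., A_hat^K X] plus the top-r principal directions of A,
    and re-weight each block with its own parameter matrix."""
    A_hat = renormalized_adj(A)
    blocks, feat = [], X
    for _ in range(K + 1):
        blocks.append(feat)       # block k holds A_hat^k X
        feat = A_hat @ feat
    # structural block: top-r left singular vectors of A (PCA-style),
    # which depend only on the graph structure, not on neighbor similarity
    U, _, _ = np.linalg.svd(A)
    blocks.append(U[:, :r])
    # [B_0 | B_1 | ...] @ [W_0; W_1; ...]  ==  sum_k B_k W_k
    return sum(B @ W for B, W in zip(blocks, weights))
```

Here `weights` holds K+2 independent matrices, one per polynomial block plus one for the principal-component block; this per-block independence is exactly what shared-parameter spectral models lack.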

2. PRELIMINARIES

In this paper, we focus on an undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, along with node attributes $X \in \mathbb{R}^{n \times d}$ for $\mathcal{V}$ and an adjacency matrix $A \in \mathbb{R}^{n \times n}$ representing $\mathcal{E}$. GNNs take the node attributes and the adjacency matrix as input and output hidden node representations, $H = \mathrm{GNN}(X, A) \in \mathbb{R}^{n \times d}$. By default, we employ the cross-entropy loss in the node classification task to minimize the difference between the node labels $Y$ and the obtained representations, $\mathcal{L}(H, Y) = -\sum_i Y_i \log \mathrm{softmax}(H_i)$.

2.1 SPATIAL AND SPECTRAL GNNS

Spatial GNNs mostly fall into the message-passing paradigm. For any given node, they aggregate features from its neighbors and update the aggregated feature,

$$H_i^{(k+1)} = \sigma\Big(\mathrm{upd}\big(H_i^{(k)}, \mathrm{agg}\big(\hat{A}_{ij}, H_j^{(k)};\; j \in \mathcal{N}(i)\big)\big)\Big),$$

where $\sigma(\cdot)$ is a non-linear activation function, $H^{(k)}$ denotes the hidden representation at the $k$-th layer, $\mathrm{agg}$ and $\mathrm{upd}$ are the aggregation and update functions (Balcilar et al., 2021), $\hat{A} = (D + I)^{-1/2}(A + I)(D + I)^{-1/2}$ is the re-normalized adjacency matrix built from the degree matrix $D$, and $\mathcal{N}(\cdot)$ denotes the 1-hop neighbors. Here, we provide two examples to specify this general expression. One is the vanilla GCN (Kipf & Welling, 2017), which adopts mean-aggregation and average-update, as shown in the left part of Figure 1. Its formulation is

$$H^{(k+1)} = \sigma\big(\hat{A} H^{(k)} W^{(k)}\big),$$

where $W^{(k)}$ is the learnable weight matrix of the $k$-th layer.
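As a concrete reference point, the re-normalized adjacency and one such mean-aggregation layer can be sketched in NumPy (an illustrative snippet; the function name `gcn_layer` and the choice of ReLU for $\sigma$ are our own):

```python
import numpy as np

def gcn_layer(X, A, W):
    """One vanilla GCN layer: H = ReLU(A_hat @ X @ W), where
    A_hat = (D+I)^{-1/2} (A+I) (D+I)^{-1/2} is the re-normalized adjacency."""
    A_tilde = A + np.eye(A.shape[0])              # add self-loops: A + I
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    A_hat = A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_hat @ X @ W, 0.0)         # sigma = ReLU
```

Stacking such layers multiplies powers of $\hat{A}$ into the representation, which is precisely the mechanism analyzed later as feature space compression.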
¹ The implementation of WGNN is available at https://drive.google.com/drive/folders/1A6VWiPmKRhCNfdcuFJvnxTiTgzgbJIZ6?usp=sharing



Figure 1: WGNN compared with current GNNs

