VEM-GCN: TOPOLOGY OPTIMIZATION WITH VARIATIONAL EM FOR GRAPH CONVOLUTIONAL NETWORKS

Abstract

Over-smoothing has emerged as a severe problem for node classification with graph convolutional networks (GCNs). From the message-passing perspective, the over-smoothing issue is caused by the observed noisy graph topology, which propagates information along inter-class edges and consequently over-mixes the features of nodes in different classes. In this paper, we propose a novel architecture, namely VEM-GCN, to address this problem by employing the variational EM algorithm to jointly optimize the graph topology and learn desirable node representations for classification. Specifically, variational EM approximates a latent adjacency matrix parameterized by the assortative-constrained stochastic block model (SBM) to enhance intra-class connection and suppress inter-class interaction in the observed noisy graph. In the variational E-step, graph topology is optimized by approximating the posterior probability distribution of the latent adjacency matrix with a neural network learned from node embeddings. In the M-step, node representations are learned using the graph convolutional network based on the refined graph topology for the downstream task of classification. VEM-GCN is demonstrated to outperform existing strategies for tackling over-smoothing and optimizing graph topology in node classification on seven benchmark datasets.

1. INTRODUCTION

Complex graph-structured data are ubiquitous in the real world, ranging from social networks to chemical molecules. Inspired by the remarkable performance of convolutional neural networks (CNNs) in processing data with regular grid structures (e.g., images), a myriad of studies on GCNs have emerged to execute "convolution" in the graph domain (Niepert et al., 2016; Kipf & Welling, 2017; Gilmer et al., 2017; Hamilton et al., 2017; Monti et al., 2017; Gao et al., 2018). Many of these approaches follow a neighborhood aggregation mechanism (a.k.a., message passing scheme) that updates the representation of each node by iteratively aggregating the transformed messages sent from its neighboring nodes. Commencing with the pioneering works (Kipf & Welling, 2017; Gilmer et al., 2017), numerous strategies have been developed to improve the vanilla message passing scheme, such as introducing the self-attention mechanism (Veličković et al., 2018; Zhang et al., 2020), incorporating local structural information (Zhang et al., 2020; Jin et al., 2019; Ye et al., 2020), and leveraging link attributes (Gong & Cheng, 2019; Li et al., 2019; Jiang et al., 2019). Despite significant success in many fundamental tasks of graph-based machine learning, almost all message passing-based GCNs treat the observed graph structure as ground truth and might suffer from the over-smoothing problem (Li et al., 2018), which seriously affects node classification performance. Given the observed noisy graph topology (i.e., excessive inter-class edges are linked while many intra-class edges are missing), when multiple message passing layers are stacked to enlarge the receptive field (the maximum hop of neighborhoods), the features of neighboring nodes from different classes come to dominate the aggregation. Node representations are thus corrupted by this harmful noise, degrading the discriminability of graph nodes.
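The neighborhood-aggregation update described above can be illustrated with a minimal numpy sketch; the mean-aggregation weights and the `message_passing_layer` function are illustrative choices for a generic GCN layer, not the paper's specific model:

```python
import numpy as np

def message_passing_layer(H, A, W, sigma=np.tanh):
    """One generic message-passing update: each node aggregates
    transformed features from its neighbors (self-loops included)."""
    N = A.shape[0]
    A_hat = A + np.eye(N)                       # add self-loops
    deg = A_hat.sum(axis=1, keepdims=True)
    alpha = A_hat / deg                         # mean-aggregation weights
    return sigma(alpha @ H @ W)                 # sigma(sum_j alpha_ij W h_j)

# toy 3-node path graph with 2-d input features
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H = np.random.randn(3, 2)
W = np.random.randn(2, 4)
H_next = message_passing_layer(H, A, W)
print(H_next.shape)  # (3, 4)
```

Stacking several such layers enlarges each node's receptive field, which is precisely where the over-smoothing risk discussed above arises.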
The over-smoothing phenomenon in GCNs has already been studied from different aspects. Li et al. (2018) first interpreted over-smoothing from the perspective of Laplacian smoothing, while Xu et al. (2018) and Klicpera et al. (2019a) associated it with the limit distribution of random walk. Furthermore, Chen et al. (2020a) developed quantitative metrics to measure over-smoothness from the topological view. They argued that the key factor leading to over-smoothing is the noise passing between nodes of different categories, and that the classification performance of GCNs is positively correlated with the proportion of intra-class node pairs among all edges. In this paper, we propose VEM-GCN, a novel architecture to address the over-smoothing problem with topology optimization for uncertain graphs. Considering that a "clearer" graph with more intra-class edges and fewer inter-class edges would improve the node classification performance of GCNs (Yang et al., 2019; Chen et al., 2020a), VEM-GCN approximates a latent adjacency matrix parameterized by the assortative-constrained stochastic block model (SBM), in which nodes sharing the same label tend to be linked and inter-class edges are cut off. To jointly refine the latent graph structure and learn desirable node representations for classification, the variational EM algorithm (Neal & Hinton, 1998) is adopted to optimize the evidence lower bound (ELBO) of the likelihood function. In the inference procedure (E-step), graph topology is optimized by approximating the posterior probability distribution of the latent adjacency matrix with a neural network learned from node embeddings. In the learning procedure (M-step), a conventional GCN is trained to maximize the log-likelihood of the observed node labels based on the learned latent graph structure. The E-step and M-step optimize the graph topology and improve the classification of unlabeled nodes in an alternating fashion. The proposed VEM-GCN architecture is flexible and general.
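The alternating E-step/M-step loop can be sketched schematically as follows; the sigmoid-of-similarity posterior in `e_step` and the single mean-aggregation step in `m_step` are hypothetical stand-ins for the paper's inference network and GCN, chosen only to make the alternation concrete:

```python
import numpy as np

rng = np.random.default_rng(0)

def e_step(Z, tau=1.0):
    """E-step sketch: posterior edge probabilities q(a_ij = 1) from
    embedding similarity (stand-in for the inference network)."""
    S = Z @ Z.T / tau
    return 1.0 / (1.0 + np.exp(-S))          # sigmoid of similarity

def m_step(Q, X, threshold=0.5):
    """M-step sketch: one mean aggregation over the refined graph
    (stand-in for training the GCN on the optimized topology)."""
    A = (Q > threshold).astype(float)
    np.fill_diagonal(A, 1.0)                 # keep self-loops
    return A @ X / A.sum(axis=1, keepdims=True)

Z = rng.normal(size=(5, 3))                  # node embeddings
X = rng.normal(size=(5, 4))                  # node features
for _ in range(3):                           # alternate E- and M-steps
    Q = e_step(Z)                            # refine topology
    Z_new = m_step(Q, X)                     # learn representations
    # in the real method, Z would be updated from the GCN's hidden states
print(Q.shape)  # (5, 5)
```

In the actual architecture, both steps optimize the same ELBO objective rather than the ad hoc updates shown here.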
In the E-step, the neural network can support arbitrary desirable node embeddings generated by algorithms such as node2vec (Grover & Leskovec, 2016), struc2vec (Ribeiro et al., 2017), and GCNs, or the raw node attributes. The GCN in the M-step can also be substituted with arbitrary graph models. Furthermore, recent strategies for relieving the over-smoothing issue, i.e., AdaEdge (Chen et al., 2020a) and DropEdge (Rong et al., 2020), are shown to be specific cases of VEM-GCN under certain conditions. For empirical evaluation, we conduct extensive experiments on seven benchmarks for node classification, including four citation networks, two Amazon co-purchase graphs, and one Microsoft Academic graph. Experimental results demonstrate the effectiveness of the proposed VEM-GCN architecture in optimizing graph topology and mitigating the over-smoothing problem for GCNs.

2. BACKGROUND AND RELATED WORKS

Problem Setting. This paper focuses on the task of graph-based transductive node classification. A simple attributed graph is defined as a tuple G_obs = (V, A_obs, X), where V = {v_i}_{i=1}^N is the node set, A_obs = [a_ij^obs] ∈ {0,1}^{N×N} is the observed adjacency matrix, and X ∈ R^{N×f} represents the collection of attributes, with each row corresponding to the features of an individual node. Given the labels Y_l = [y_ic] ∈ {0,1}^{|V_l|×C} for a subset of graph nodes V_l ⊂ V assigned to C classes, the task is to infer the classes Y_u = [y_jc] ∈ {0,1}^{|V_u|×C} of the unlabeled nodes V_u = V \ V_l based on G_obs.

Graph Convolutional Networks (GCNs). The core of most GCNs is the message passing scheme, where each node updates its representation by iteratively aggregating features from its neighbors. Denote by W^(l) the learnable weights in the l-th layer, N(i) the set of neighboring node indices for node v_i, and σ(·) the nonlinear activation function. A basic message passing layer takes the following form:

    h_i^(l+1) = σ( Σ_{j ∈ N(i) ∪ {i}} α_ij^(l) W^(l) h_j^(l) ).    (1)

Here, h_j^(l) is the input feature of node v_j in the l-th layer, W^(l) h_j^(l) is the corresponding transformed message, and α_ij^(l) is the aggregation weight for the message passing from node v_j to node v_i. Existing GCNs mainly differ in the mechanism for computing α_ij^(l) (Kipf & Welling, 2017; Veličković et al., 2018; Ye et al., 2020; Hamilton et al., 2017; Zhang et al., 2020).

Stochastic Block Model (SBM). SBM (Holland et al., 1983) is a generative model for producing graphs with community structures. It parameterizes the edge probability between each node pair by

    ā_ij | y_i, y_j ~ Bernoulli(p_0), if y_i = y_j;  Bernoulli(p_1), if y_i ≠ y_j,    (2)

where ā_ij is an indicator variable for the edge linking nodes v_i and v_j, y_i and y_j denote their corresponding communities (classes), and p_0 and p_1 are termed community link strength and cross-
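The SBM generative process in Eq. (2) can be sketched in a few lines of numpy; the function name `sample_sbm` and the symmetric, self-loop-free sampling convention are assumptions for illustration:

```python
import numpy as np

def sample_sbm(labels, p0, p1, seed=0):
    """Sample an undirected adjacency matrix from an assortative SBM:
    edge prob p0 within a community, p1 across communities (p0 > p1)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(labels)
    same = (y[:, None] == y[None, :])       # intra-class indicator
    P = np.where(same, p0, p1)              # Bernoulli parameter per pair
    U = rng.random(P.shape)
    A = (np.triu(U, 1) < np.triu(P, 1)).astype(int)  # sample upper triangle
    return A + A.T                          # symmetrize, no self-loops

A = sample_sbm([0, 0, 1, 1], p0=0.9, p1=0.05)
print(A.shape)  # (4, 4)
```

The assortative constraint corresponds to choosing p0 close to 1 and p1 close to 0, which is exactly the regime the latent adjacency matrix of VEM-GCN is driven toward.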

