ADAGCN: ADABOOSTING GRAPH CONVOLUTIONAL NETWORKS INTO DEEP MODELS

Abstract

The design of deep graph models remains underexplored; a crucial question is how to explore and exploit knowledge from different hops of neighbors efficiently. In this paper, we propose a novel RNN-like deep graph neural network architecture that incorporates AdaBoost into the computation of the network. The proposed graph convolutional network, called AdaGCN (Adaboosting Graph Convolutional Network), efficiently extracts knowledge from high-order neighbors of the current nodes and then integrates knowledge from different hops of neighbors into the network in an AdaBoost manner. Unlike other graph neural networks that directly stack many graph convolution layers, AdaGCN shares the same base neural network architecture among all "layers" and is recursively optimized, similar to an RNN. We also theoretically establish the connection between AdaGCN and existing graph convolutional methods, presenting the benefits of our proposal. Finally, extensive experiments demonstrate consistent state-of-the-art prediction performance on graphs across different label rates, as well as the computational advantage of AdaGCN.¹

1. INTRODUCTION

Recently, research on learning from graph-structured data has gained considerable attention in the machine learning community. Graph neural networks (Gori et al., 2005; Hamilton et al., 2017; Veličković et al., 2018), particularly graph convolutional networks (Kipf & Welling, 2017; Defferrard et al., 2016; Bruna et al., 2014), have demonstrated remarkable ability on node classification (Kipf & Welling, 2017), link prediction (Zhu et al., 2016) and clustering tasks (Fortunato, 2010). Despite their enormous success, almost all of these models have shallow architectures with only two or three layers. The shallow design of GCN appears counterintuitive, as deep versions of these models in principle have access to more information, yet perform worse. Oversmoothing (Li et al., 2018) has been proposed to explain why deep GCN fails: by repeatedly applying Laplacian smoothing, GCN may mix the node features from different clusters and make them indistinguishable. This also indicates that, when too many graph convolutional layers are stacked, the embedding of each node in GCN tends to converge to a certain value (Li et al., 2018), making classification harder. These shallow model architectures restricted by oversmoothing issue

