DEEPERGCN: TRAINING DEEPER GCNS WITH GENERALIZED AGGREGATION FUNCTIONS

Abstract

Graph Convolutional Networks (GCNs) have been drawing significant attention owing to their power of representation learning on graphs. Recent works have developed frameworks to train deep GCNs, showing impressive results on tasks such as point cloud classification and segmentation, and protein interaction prediction. In this work, we study the performance of such deep models on large-scale graph datasets from the Open Graph Benchmark (OGB). In particular, we look at the effect of adequately choosing an aggregation function and its impact on final performance. Common choices of aggregation are mean, max, and sum. It has been shown that GCNs are sensitive to the choice of aggregation when applied to different datasets. We further validate this point and propose to alleviate it by introducing a novel generalized aggregation function. Our new aggregation not only covers all commonly used ones, but can also be tuned to learn customized functions for different tasks. It is fully differentiable, so its parameters can be learned in an end-to-end fashion. We add our generalized aggregation to a deep GCN framework and show that it achieves state-of-the-art results on six benchmarks from OGB.
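One well-known family with the properties described above is a softmax-weighted mean with an inverse-temperature parameter: it is differentiable in its parameter and interpolates between mean and max. The sketch below is purely illustrative (the function name, signature, and exact form are ours, not necessarily the paper's definition):

```python
import math

def softmax_agg(msgs, beta):
    """Softmax-weighted aggregation over neighbor messages, per coordinate.

    For beta -> 0 this approaches the mean; for large beta it approaches
    the max. A single differentiable parameter thus spans common
    aggregators. (Illustrative form; the paper's exact definition may differ.)
    """
    dim = len(msgs[0])
    out = []
    for i in range(dim):
        xs = [m[i] for m in msgs]
        mx = max(xs)  # subtract the max to stabilize the exponentials
        ws = [math.exp(beta * (x - mx)) for x in xs]
        z = sum(ws)
        out.append(sum(w * x for w, x in zip(ws, xs)) / z)
    return out
```

Because the weights depend smoothly on beta, the trade-off between mean-like and max-like behavior can be learned from data rather than fixed in advance.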

1. INTRODUCTION

The rising availability of non-Euclidean data (Bronstein et al., 2017) has recently sparked interest in the topic of Graph Convolutional Networks (GCNs). GCNs provide powerful deep learning architectures for irregular data, such as point clouds and graphs. They have proven valuable for applications in social networks (Tang & Liu, 2009), drug discovery (Zitnik & Leskovec, 2017; Wale et al., 2008), recommendation engines (Monti et al., 2017b; Ying et al., 2018), and point clouds (Wang et al., 2018; Li et al., 2019b). Recent works have explored frameworks to train deeper GCN architectures (Li et al., 2019b;a). These works demonstrate how increased depth leads to state-of-the-art performance on tasks such as point cloud classification and segmentation, and protein interaction prediction. The power of deep models becomes more evident with the introduction of more challenging and large-scale graph datasets, such as those recently introduced in the Open Graph Benchmark (OGB) (Hu et al., 2020) for node classification, link prediction, and graph classification.

Graph convolutions in GCNs are based on the notion of message passing (Gilmer et al., 2017). To compute a new node feature at each GCN layer, information is aggregated from the node and its connected neighbors. Given the nature of graphs, aggregation functions must be permutation invariant; this property guarantees invariance/equivariance on isomorphic graphs (Battaglia et al., 2018; Xu et al., 2019b; Maron et al., 2019a). Popular choices for aggregation functions are mean (Kipf & Welling, 2016), max (Hamilton et al., 2017), and sum (Xu et al., 2019b). Recent works suggest that different aggregations have different performance impact depending on the task. For example, mean and sum perform best in node classification (Kipf & Welling, 2016), while max is favorable for dealing with 3D point clouds (Qi et al., 2017; Wang et al., 2019).
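The three common aggregators above can be sketched as coordinate-wise reductions over a multiset of neighbor feature vectors (function names are ours, chosen for illustration):

```python
# Each aggregator reduces a list of neighbor feature vectors (all of the
# same dimension) coordinate-wise, so the result does not depend on the
# order in which neighbors are visited.

def mean_agg(msgs):
    n = len(msgs)
    return [sum(m[i] for m in msgs) / n for i in range(len(msgs[0]))]

def max_agg(msgs):
    return [max(m[i] for m in msgs) for i in range(len(msgs[0]))]

def sum_agg(msgs):
    return [sum(m[i] for m in msgs) for i in range(len(msgs[0]))]
```

Since each function reduces a multiset rather than a sequence, permuting the neighbors leaves the output unchanged, which is exactly the permutation invariance needed for consistent behavior on isomorphic graphs.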
Currently, all works rely on empirical analysis to choose aggregation functions. In DeepGCNs (Li et al., 2019b), the authors complement aggregation functions with residual and dense connections, as well as dilated convolutions, in order to train very deep GCNs. Equipped with these new modules, GCNs with more than 100 layers can be trained reliably. Despite the potential of these modules (Kipf & Welling, 2016; Hamilton et al., 2017; Veličković et al., 2018; Xu et al., 2019a), it is still unclear whether they are the ideal choice for deep GCNs when handling large-scale graphs.
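A minimal sketch of the residual idea (our own simplification, not the authors' code): each layer adds its aggregated message back onto the input feature, so gradients have a shortcut path through deep stacks of layers. For simplicity, the learnable transform is omitted and a mean aggregator is assumed:

```python
def mean_agg(vectors):
    """Permutation-invariant mean over a list of feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def residual_layer(features, neighbors):
    """One residual message-passing layer: h_v' = h_v + AGG({h_u}).

    features:  dict mapping node id -> feature vector (list of floats)
    neighbors: dict mapping node id -> list of neighbor node ids
    """
    out = {}
    for v, h in features.items():
        msgs = [features[u] for u in neighbors[v]] + [h]  # neighbors + self
        agg = mean_agg(msgs)
        out[v] = [a + b for a, b in zip(h, agg)]  # residual connection
    return out
```

Stacking many such layers keeps an identity path from input to output, which is the property DeepGCNs exploit to train networks of 100+ layers.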

