COMBINING LABEL PROPAGATION AND SIMPLE MODELS OUT-PERFORMS GRAPH NEURAL NETWORKS

ABSTRACT

Graph Neural Networks (GNNs) are a predominant technique for learning over graphs. However, there is relatively little understanding of why GNNs are successful in practice and whether they are necessary for good performance. Here, we show that for many standard transductive node classification benchmarks, we can exceed or match the performance of state-of-the-art GNNs by combining shallow models that ignore the graph structure with two simple post-processing steps that exploit correlation in the label structure: (i) an "error correlation" that spreads residual errors in training data to correct errors in test data, and (ii) a "prediction correlation" that smooths the predictions on the test data. We call this overall procedure Correct and Smooth (C&S), and the post-processing steps are implemented via simple modifications to standard label propagation techniques that have long been used in graph-based semi-supervised learning. Our approach exceeds or nearly matches the performance of state-of-the-art GNNs on a wide variety of benchmarks, with just a small fraction of the parameters and orders of magnitude faster runtime. For instance, we exceed the best-known GNN performance on the OGB-Products dataset with 137 times fewer parameters and more than 100 times faster training. The performance of our methods highlights how directly incorporating label information into the learning algorithm (as is common in traditional methods) yields easy and substantial performance gains. We can also incorporate our techniques into big GNN models, providing modest gains in some cases.

1. INTRODUCTION

Following the success of neural networks in computer vision and natural language processing, there is now a wide range of graph neural networks (GNNs) for making predictions involving relational data (Battaglia et al., 2018; Wu et al., 2020). These models have had much success and sit atop leaderboards such as the Open Graph Benchmark (Hu et al., 2020). Often, methodological developments for GNNs revolve around creating strictly more expressive architectures than basic variants such as the Graph Convolutional Network (GCN) (Kipf & Welling, 2017) or GraphSAGE (Hamilton et al., 2017a); examples include Graph Attention Networks (Veličković et al., 2018), Graph Isomorphism Networks (Xu et al., 2018), and various deep models (Li et al., 2019; Rong et al., 2019; Chen et al., 2020). Many ideas for new GNN architectures are adapted from new architectures in models for language (e.g., attention) or vision (e.g., deep CNNs), with the hope that this success will translate to graphs. However, as these models become more complex, understanding their performance gains is a major challenge, and scaling them to large datasets is difficult.

Here, we see how far we can get by combining much simpler models, with an emphasis on understanding where there are easy opportunities for performance improvements in graph learning, particularly transductive node classification. We propose a simple pipeline with three main parts (Figure 1): (i) a base prediction made with node features that ignores the graph structure (e.g., a shallow multi-layer perceptron or just a linear model); (ii) a correction step, which propagates uncertainties from the training data across the graph to correct the base prediction; and (iii) a smoothing of the predictions over the graph. Steps (ii) and (iii) are post-processing steps, implemented with classical methods for graph-based semi-supervised learning, namely, label propagation techniques.
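To make the pipeline concrete, the sketch below shows the two post-processing steps in Python (NumPy/SciPy), assuming a symmetrically normalized adjacency matrix and class-probability outputs from a graph-agnostic base model. This is a minimal illustration under those assumptions, not the reference implementation: the function names, the hyperparameters `alpha1`, `alpha2`, and `scale`, and the fixed-scale residual correction are illustrative choices.

```python
import numpy as np
import scipy.sparse as sp


def normalized_adjacency(A):
    """Symmetric normalization S = D^(-1/2) A D^(-1/2) of a sparse adjacency matrix."""
    deg = np.asarray(A.sum(axis=1)).ravel()
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    return sp.diags(d_inv_sqrt) @ A @ sp.diags(d_inv_sqrt)


def propagate(S, X, alpha, num_iters=50):
    """Label-propagation-style smoothing: iterate X <- (1 - alpha) * X0 + alpha * S X."""
    X0, out = X, X
    for _ in range(num_iters):
        out = (1.0 - alpha) * X0 + alpha * (S @ out)
    return out


def correct_and_smooth(A, base_probs, y_train_onehot, train_idx,
                       alpha1=0.8, alpha2=0.8, scale=1.0):
    """Illustrative sketch of the Correct and Smooth post-processing steps.

    A: (n, n) sparse adjacency matrix.
    base_probs: (n, c) class probabilities from a graph-agnostic base model.
    y_train_onehot: (n_train, c) one-hot labels of the training nodes.
    train_idx: indices of the training nodes.
    """
    S = normalized_adjacency(A)

    # (ii) Correct: spread the residual errors observed on training nodes over the graph.
    errors = np.zeros_like(base_probs)
    errors[train_idx] = y_train_onehot - base_probs[train_idx]
    corrected = base_probs + scale * propagate(S, errors, alpha1)

    # (iii) Smooth: clamp training nodes to their true labels, then smooth all predictions.
    corrected[train_idx] = y_train_onehot
    final = propagate(S, corrected, alpha2)
    return final.argmax(axis=1)
```

The correction step propagates residuals rather than raw predictions: the idea is that the base model's errors, like the labels themselves, are correlated along edges, so observed training errors carry information about likely errors on nearby test nodes.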

