SCALABLE GRAPH NEURAL NETWORKS FOR HETEROGENEOUS GRAPHS

Abstract

Graph neural networks (GNNs) are a popular class of parametric models for learning over graph-structured data. Recent work has argued that GNNs primarily use the graph for feature smoothing, and has shown competitive results on benchmark tasks by simply operating on graph-smoothed node features, rather than using end-to-end learned feature hierarchies that are challenging to scale to large graphs. In this work, we ask whether these results can be extended to heterogeneous graphs, which encode multiple types of relationships between different entities. We propose Neighbor Averaging over Relation Subgraphs (NARS), which trains a classifier on neighbor-averaged features for randomly-sampled subgraphs of the "metagraph" of relations. We describe optimizations that allow these sets of node features to be computed in a memory-efficient way, both at training and inference time. NARS achieves a new state-of-the-art accuracy on several benchmark datasets, outperforming more expensive GNN-based methods.

1. INTRODUCTION

In recent years, deep learning on graphs has attracted a great deal of interest, with new applications ranging from social networks and recommender systems, to biomedicine, scene understanding, and modeling of physics (Wu et al., 2020). One popular branch of graph learning is based on the idea of stacking learned "graph convolutional" layers that perform feature transformation and neighbor aggregation (Kipf & Welling, 2017), and has led to an explosion of variants collectively referred to as Graph Neural Networks (GNNs) (Hamilton et al., 2017; Xu et al., 2018; Velickovic et al., 2018). Most benchmarks for learning on graphs focus on very small graphs, but the relevance of such models to large-scale social network and e-commerce datasets was quickly recognized (Ying et al., 2018). Since the computational cost of training and inference on GNNs scales poorly to large graphs, a number of sampling approaches have been proposed that improve the time and memory cost of GNNs by operating on subsets of graph nodes or edges (Hamilton et al., 2017; Chen et al., 2017; Zou et al., 2019; Zeng et al., 2019; Chiang et al., 2019). Recently, several papers have argued that on a range of benchmark tasks (social network and e-commerce tasks in particular) GNNs primarily derive their benefits from performing feature smoothing over graph neighborhoods rather than learning non-linear hierarchies of features as implied by the analogy to CNNs (Wu et al., 2019; NT & Maehara, 2019; Chen et al., 2019; Rossi et al., 2020). Surprisingly, Rossi et al. (2020) demonstrate that a one-layer MLP operating on concatenated N-hop averaged features, which they call Scalable Inception Graph Network (SIGN), performs competitively with state-of-the-art GNNs on large web datasets while being more scalable and simpler to use than sampling approaches. Neighbor-averaged features can be precomputed, reducing GNN training and inference to a standard classification task.
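The precomputation idea can be made concrete with a minimal sketch. The code below (an illustrative NumPy toy, not the paper's implementation; real systems would use sparse matrix operations) computes 0- through N-hop neighbor-averaged feature matrices by repeatedly multiplying by a row-normalized adjacency matrix, then concatenates them into the fixed input that a simple classifier would consume:

```python
import numpy as np

def neighbor_averaged_features(adj, feats, num_hops):
    """Precompute [X, AX, A^2 X, ...] where A is the row-normalized adjacency.

    adj:   (N, N) dense adjacency matrix (toy sketch; large graphs need sparse ops)
    feats: (N, D) node feature matrix
    Returns a list of num_hops + 1 feature matrices, one per hop count.
    """
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0                    # guard isolated nodes against divide-by-zero
    norm_adj = adj / deg                   # each row averages over that node's neighbors
    out = [feats]
    for _ in range(num_hops):
        out.append(norm_adj @ out[-1])     # one additional hop of neighbor averaging
    return out

# Toy example: 3-node path graph 0 - 1 - 2, with one-hot node features.
adj = np.array([[0., 1., 0.],
                [1., 0., 1.],
                [0., 1., 0.]])
feats = np.eye(3)
hops = neighbor_averaged_features(adj, feats, num_hops=2)
X = np.concatenate(hops, axis=1)           # (3, 9): input to a standard MLP classifier
```

Because this multiplication happens once, before training, each epoch of the downstream classifier touches only dense feature matrices and never the graph itself, which is what makes the approach scale.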
However, in practice the large graphs used in web-scale classification problems are often heterogeneous, encoding many types of relationships between different entities (Lerer et al., 2019). While GNNs extend naturally to these multi-relation graphs (Schlichtkrull et al., 2018) and specialized methods further improve the state of the art on them (Hu et al., 2020b; Wang et al., 2019b), it is not clear how to extend neighbor-averaging approaches like SIGN to these graphs. In this work, we investigate whether neighbor-averaging approaches can be applied to heterogeneous graphs (HGs). We propose Neighbor Averaging over Relation Subgraphs (NARS), which computes neighbor-averaged features for random subsets of relation types, and combines them into a single

