AND THE REASONS FOR ITS SUCCESS

Abstract

What is the simplest, but still effective, graph neural network (GNN) that we can design, say, for node classification? Einstein said that we should "make everything as simple as possible, but not simpler." We rephrase this as the 'careful simplicity' principle: a carefully designed simple model can outperform sophisticated ones on real-world tasks, where data are scarce, noisy, and spuriously correlated. Based on this principle, we propose SlenderGNN, which exhibits four desirable properties. It is (a) accurate, winning or tying on 11 out of 13 real-world datasets; (b) robust, being the only model that handles all settings (heterophily, random structure, useless features, etc.); (c) fast and scalable, with up to 18× faster training on million-scale graphs; and (d) interpretable, thanks to the linearity and sparsity we impose. We explain the success of SlenderGNN via a systematic study of existing models, comprehensive sanity checks, and ablation studies of its design decisions.

1. INTRODUCTION

What is the simplest, and still performant, graph neural network (GNN) that we can design? GNNs (Kipf & Welling, 2017; Hamilton et al., 2017; Gilmer et al., 2017) have succeeded in various graph mining tasks such as node classification, clustering, and link prediction. However, given the large number of GNN variants, it is difficult for a practitioner to choose a proper model for each task without spending extensive time on searching, tuning, and training models. Given all these variants, which one should a practitioner try first? What are the strong and weak points of each variant? Could we design a variant that matches all of the strong points and avoids all the weak ones?

In response to the questions above, we propose SlenderGNN based on the 'careful simplicity' principle: a simple, but carefully designed, model can be more accurate than complex ones due to better generalizability, robustness, and easier training. The design decisions of SlenderGNN (D1-4 in Section 4.2) are carefully made to follow this principle by observing and addressing the pain points of existing GNNs; for example, we generate various forms of graph-based features and combine them (D1), propose structural features (D2), remove redundancy in the generated features (D3), and make the propagator function contain no hyperparameters (D4). The resulting model, SlenderGNN, is our main contribution (C1), and it exhibits the following desirable properties:

• C1.1 - Accurate on both real-world and synthetic datasets, almost always winning or tying for first place (see Figure 1b, Table 2, and Table 3).
• C1.2 - Robust, handling numerous real settings such as homophily, heterophily, no network effects, and graphs with useless features (see Figure 1a and Table 2).
• C1.3 - Fast and scalable, using few, carefully chosen features; it takes only 32 seconds on million-scale graphs (ogbn-Products) on a stock server (see Figure 1b).
• C1.4 - Interpretable, learning the largest weights on informative features and ignoring noisy ones, thanks to its linear decision function (see Figure 2).

The natural question that arises from the success of SlenderGNN is:

Q: "How is it possible that a simpler model is more accurate than a sophisticated, more expressive one?"

Our intuitive justification for the success of SlenderGNN is Occam's razor: since a statistical model tries to 'explain' the given labels, the simplest adequate explanation tends to perform best in general. In addition to this intuitive argument, our extensive experiments provide hard evidence in favor of the 'careful simplicity' principle: SlenderGNN outperforms complex GNNs on both synthetic (Table 2) and real-world (Table 3) datasets, and even beats its own variants that use nonlinear feature transformations on 9 of the 13 real-world datasets (Table 4).

[Figure 1a: sanity-check table; ✓ marks which checks each model passes, comparing SlenderGNN against S2GC, G2CN, GCN, SAGE, GCNII, APPNP, GPR, and GAT.]
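To convey the flavor of design decisions D1-D4 (the exact feature set and propagator of SlenderGNN are specified in Section 4.2), the following is only our illustrative numpy sketch: several propagated views of the features are concatenated (D1-style), a simple structural feature is appended (D2-style), redundant columns are pruned via an SVD truncation (D3-style; the paper's exact mechanism may differ), and the propagator is fixed with no tunable knobs (D4-style). All variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 8
A = rng.integers(0, 2, size=(n, n))
A = np.triu(A, 1); A = A + A.T          # random undirected graph
X = rng.standard_normal((n, d))

# D4-flavor: a fixed, hyperparameter-free propagator (symmetric
# normalization with self-loops) -- no depth/decay knobs to tune.
deg = A.sum(1) + 1.0
A_hat = (A + np.eye(n)) / np.sqrt(np.outer(deg, deg))

# D1-flavor: several views of the graph, concatenated rather than stacked.
views = [X, A_hat @ X, A_hat @ (A_hat @ X)]
# D2-flavor: a simple structural feature (here: log-degree).
views.append(np.log1p(A.sum(1, keepdims=True)))
F = np.concatenate(views, axis=1)

# D3-flavor: drop redundancy among the generated columns via a
# PCA-style truncation of the SVD.
F = F - F.mean(0)
U, S, Vt = np.linalg.svd(F, full_matrices=False)
k = int((S > 1e-8 * S[0]).sum())
F_slim = U[:, :k] * S[:k]

print(F.shape, F_slim.shape)
```

A linear classifier trained on `F_slim` would then complete such a model, which is what makes the learned weights directly interpretable.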


Not only do we propose a carefully designed, high-performing GNN; we also explain the reasons for its success, thanks to our two additional contributions (C2-C3):

• C2 - Explanation: We propose GNNLIN, a framework for the systematic linearization of existing GNNs. As shown in Table 1 and Section 3, GNNLIN highlights the similarities, differences, strengths, and weaknesses of successful GNN baselines.
• C3 - Sanity checks: We propose a wide range of scenarios (homophily, heterophily, block communities, bipartite-graph communities, etc.) that reveal the strong and weak points of each GNN variant: see Figure 1a, with more details in Table 2 and Section 5.

Reproducibility: Our code is available at https://bit.ly/3fhWJfK, along with our datasets for 'sanity checks' and our real-world homophily and heterophily graphs.
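GNNLIN itself is defined in Section 3; as background for what "linearization of a GNN" means, the classic example (the SGC simplification of GCN by Wu et al., 2019) shows that dropping the nonlinearities of a two-layer GCN collapses it to a single linear map over a twice-propagated feature matrix. A minimal sketch of that known identity (our notation, not the paper's framework):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, h, c = 50, 6, 4, 3
A = rng.integers(0, 2, (n, n)); A = np.triu(A, 1); A = A + A.T
deg = A.sum(1) + 1.0
A_hat = (A + np.eye(n)) / np.sqrt(np.outer(deg, deg))   # GCN propagator
W1 = rng.standard_normal((d, h))
W2 = rng.standard_normal((h, c))
X = rng.standard_normal((n, d))

relu = lambda Z: np.maximum(Z, 0)

gcn = A_hat @ relu(A_hat @ X @ W1) @ W2   # 2-layer GCN (pre-softmax)
sgc = A_hat @ A_hat @ X @ (W1 @ W2)       # its linearization: one weight matrix

# With the ReLU removed, the two coincide (up to floating-point error):
gcn_linear = A_hat @ (A_hat @ X @ W1) @ W2
assert np.allclose(gcn_linear, sgc)
```

Comparing GNNs after this kind of reduction is what makes their shared propagation structure, and their differences, visible side by side.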

2. PROBLEM DEFINITION AND RELATED WORKS

We introduce the problem definition of semi-supervised node classification, the symbols frequently used in this paper, and related work on graph neural networks (GNNs).

Problem definition

We define the problem of semi-supervised node classification as follows:

• Given: an undirected graph G = (A, X), where A ∈ R^{n×n} is the adjacency matrix, X ∈ R^{n×d} is the node feature matrix, n is the number of nodes, and d is the number of features;
• Given: labels y ∈ {1, ..., c}^m for m nodes, where m ≪ n and c is the number of classes;
• Predict: the labels of the remaining n − m unlabeled nodes.



SlenderGNN wins on both accuracy and training time: it is the red star in (left) ogbn-arXiv, (middle) ogbn-Products, and (right) Pokec, which are large real-world graphs (1.2M, 61.9M, and 30.6M edges, respectively). Several baselines run out of memory ('crossed out').

Figure 1: SlenderGNN outperforms existing GNN models, is fast, and passes all sanity checks. See our main results for details (sanity checks in Section 5; real-world experiments in Section 6).

SlenderGNN succeeds in all sanity checks, while none of the existing models does. The table is generated from the results of our actual experiments in Table 2: ✓ means success (accuracy ≥ 80%).

