AND THE REASONS FOR ITS SUCCESS

Abstract

What is the simplest, but still effective, graph neural network (GNN) that we can design, say, for node classification? Einstein said that we should "make everything as simple as possible, but not simpler." We rephrase this into the 'careful simplicity' principle: a carefully designed simple model can outperform sophisticated ones on real-world tasks, where data are scarce, noisy, and spuriously correlated. Based on this principle, we propose SlenderGNN, which exhibits four desirable properties. It is (a) accurate, winning or tying on 11 out of 13 real-world datasets; (b) robust, being the only one that handles all settings (heterophily, random structure, useless features, etc.); (c) fast and scalable, with up to 18× faster training on million-scale graphs; and (d) interpretable, thanks to the linearity and sparsity we impose. We explain the success of SlenderGNN via a systematic study of existing models, comprehensive sanity checks, and ablation studies on its design decisions.

1. INTRODUCTION

What is the simplest, yet still performant, graph neural network (GNN) that we can design? GNNs (Kipf & Welling, 2017; Hamilton et al., 2017; Gilmer et al., 2017) have succeeded in various graph mining tasks such as node classification, clustering, and link prediction. However, given the large number of GNN variants, it is difficult for a practitioner to choose a proper model for each task without spending extensive time on searching, tuning, and training models. Which variant should a practitioner try first? What are the strong and weak points of each variant? Could we design a variant that matches all of the strong points and avoids all the weak ones?

In response to these questions, we propose SlenderGNN based on the 'careful simplicity' principle: a simple but carefully designed model can be more accurate than complex ones due to better generalizability, robustness, and easier training. The design decisions of SlenderGNN (D1-D4 in Section 4.2) follow this principle by observing and addressing the pain points of existing GNNs: we generate various forms of graph-based features and combine them (D1), propose structural features (D2), remove redundancy in the generated features (D3), and make the propagator function free of hyperparameters (D4). The resulting model, SlenderGNN, is our main contribution (C1) and exhibits the following desirable properties:

• C1.1 - Accurate on both real-world and synthetic datasets, almost always winning or tying for first place (see Figure 1b, Table 2, and Table 3).

• C1.2 - Robust, handling numerous real settings such as homophily, heterophily, no network effects, and graphs with useless features (see Figure 1a and Table 2).

• C1.3 - Fast and scalable: using few, carefully chosen features, it takes only 32 seconds on million-scale graphs (ogbn-Products) on a stock server (see Figure 1b).
• C1.4 - Interpretable, learning the largest weights on informative features and ignoring noisy ones, thanks to its linear decision function (see Figure 2).

The natural question that arises from the success of SlenderGNN is:

Q: "How is it possible that a simpler model is more accurate than a sophisticated, more expressive one?"

Our intuitive justification for the success of SlenderGNN is as follows: (a) Occam's razor: since a statistical model tries to 'explain' the given labels, the simplest explanation generally performs best. 1
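To make the 'careful simplicity' idea concrete, the following is a minimal sketch of the general recipe behind D1 and D4: precompute hyperparameter-free graph propagations of the node features, concatenate them, and feed the result to a linear classifier (whose weights are then directly interpretable, as in C1.4). This is only an illustrative sketch of the feature-propagation idea; the function name `propagated_features` and the specific choice of symmetric normalization are our assumptions, not the paper's exact design.

```python
import numpy as np

def propagated_features(A, X, k=2):
    """Concatenate X, P X, ..., P^k X, where P is the symmetrically
    normalized adjacency with self-loops. The propagator has no learnable
    or tunable parameters (cf. D4); concatenation keeps every propagation
    order available to the linear classifier (cf. D1). Illustrative sketch
    only, not the authors' exact model."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    P = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # D^{-1/2} A_hat D^{-1/2}
    feats, H = [X], X
    for _ in range(k):
        H = P @ H                              # one parameter-free propagation step
        feats.append(H)
    return np.concatenate(feats, axis=1)       # node-wise concatenation

# Toy example: a 4-node path graph with 3-dimensional node features.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.random.default_rng(0).normal(size=(4, 3))
Z = propagated_features(A, X, k=2)
print(Z.shape)  # (4, 9): original features plus two propagated copies
```

A linear (e.g., logistic-regression) classifier trained on `Z` then assigns one weight per feature column, which is what makes the largest weights directly readable as the most informative propagated features.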

