NODE CLASSIFICATION BEYOND HOMOPHILY: TOWARDS A GENERAL SOLUTION

Abstract

Graph neural networks (GNNs) have become core building blocks for a myriad of graph learning tasks. The vast majority of existing GNNs are built, either implicitly or explicitly, upon the homophily assumption, which does not always hold; when it is violated, the performance of learning tasks can degrade heavily. In response, GNNs tailored for heterophilic graphs have been developed. However, most existing works address heterophily by designing specific GNN models and therefore lack generality. In this paper, we study the problem from the structure learning perspective and propose a family of general solutions named ALT. ALT can work hand in hand with most existing GNNs to handle graphs with either low or high homophily. The core of our method is learning to (1) decompose a given graph into two components, (2) extract complementary graph signals from these two components, and (3) adaptively merge the graph signals for node classification. Moreover, analysis based on graph signal processing shows that our framework can empower a broad range of existing GNNs to have adaptive filter characteristics and further modulate the input graph signals, which is critical for handling complex homophilic/heterophilic patterns. The proposed ALT brings significant and consistent performance improvement in node classification for a wide range of GNNs over a variety of real-world datasets.

1. INTRODUCTION

Graph neural networks (GNNs) have demonstrated great power as building blocks for a variety of graph learning tasks, such as node classification (Kipf & Welling, 2017), graph classification (Xu et al., 2018), link prediction (Zhang & Chen, 2018), clustering (Bianchi et al., 2020), and many more. Most existing GNNs follow the homophily assumption, i.e., edges tend to connect nodes with the same labels and similar node features. This assumption holds for networks such as citation networks (Yang et al., 2016; Bojchevski & Günnemann, 2018), where a paper tends to cite related literature. However, heterophilic settings arise in many other cases. For instance, to form a protein structure, different types of amino acids are more likely to be linked together (Zhu et al., 2020). On such heterophilic networks, the performance of classic GNN models (Klicpera et al., 2018; Veličković et al., 2018; Hamilton et al., 2017) can degrade greatly and may be even worse than that of an MLP, which does not use any topology information at all (Zhu et al., 2020). In response, researchers have analyzed the limitations of existing GNNs in the presence of node heterophily and proposed specific models to address it from both the spatial and spectral perspectives. For instance, an important design choice in H2GCN (Zhu et al., 2020) is that high-order neighbors should be considered during message aggregation. GPRGNN (Chien et al., 2021) also aggregates messages from multi-hop neighbors, but it emphasizes that messages can also be negative via a set of learnable aggregation weights. From the spectral perspective, FAGCN (Bo et al., 2021) points out that low-pass filter-based GNNs smooth the node representations between connected nodes, which is not desirable in heterophilic settings where connected nodes are more likely to have different labels.
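The smoothing effect described above can be illustrated with a minimal sketch. It is not the implementation of FAGCN or of any cited model. It assumes a GCN-style normalized adjacency with self-loops as the low-pass filter and its complement (identity minus that matrix) as the high-pass filter, applied to a one-dimensional signal equal to the node labels. The two four-node toy graphs and all function names are hypothetical and chosen only for illustration.

```python
import math


def normalized_adjacency(n, edges):
    """Return D^{-1/2} (A + I) D^{-1/2} as a dense list-of-lists matrix."""
    # Start from the identity so that every node has a self-loop.
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:  # undirected, unweighted edges
        a[u][v] = a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def low_pass(a_hat, x):
    """Smooth the signal: each node takes a normalized neighborhood average."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in a_hat]


def high_pass(a_hat, x):
    """Keep the part of the signal that differs from the smoothed version."""
    return [xi - si for xi, si in zip(x, low_pass(a_hat, x))]


# Toy signal: the node labels themselves (+1 for one class, -1 for the other).
x = [1.0, 1.0, -1.0, -1.0]

# Assumed toy graphs: edges only within classes vs. only across classes.
homophilic_edges = [(0, 1), (2, 3)]
heterophilic_edges = [(0, 2), (0, 3), (1, 2), (1, 3)]

for name, edges in [("homophilic", homophilic_edges),
                    ("heterophilic", heterophilic_edges)]:
    a_hat = normalized_adjacency(4, edges)
    print(name, "low-pass:", low_pass(a_hat, x), "high-pass:", high_pass(a_hat, x))
```

On the homophilic toy graph, the low-pass output reproduces the labels and the high-pass output vanishes. On the heterophilic toy graph, the low-pass output shrinks to about one third of its original magnitude and flips sign, while the high-pass output keeps the correct sign and a larger class margin. Neither fixed filter suits both graphs under these assumptions.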
Hence, FAGCN (Bo et al., 2021) adaptively mixes the low-pass graph filter with the high-pass graph filter via an attention mechanism to tackle this problem. A more detailed review of related work can be found in Section 5. Despite the theoretical insights and empirical performance gains, most existing works focus on the model level, i.e., they aim to propose better GNN models to handle heterophilic graphs. In

