NODE CLASSIFICATION BEYOND HOMOPHILY: TOWARDS A GENERAL SOLUTION

Abstract

Graph neural networks (GNNs) have become core building blocks behind a myriad of graph learning tasks. The vast majority of existing GNNs are built, either implicitly or explicitly, upon the homophily assumption, which does not always hold and whose violation can heavily degrade performance on learning tasks. In response, GNNs tailored for heterophilic graphs have been developed. However, most existing works address heterophily by designing specific GNN models, which limits their generality. In this paper, we study the problem from the structure learning perspective and propose a family of general solutions named ALT. It can work hand in hand with most existing GNNs to handle graphs with either low or high homophily. The core of our method is learning to (1) decompose a given graph into two components, (2) extract complementary graph signals from these two components, and (3) adaptively merge the graph signals for node classification. Moreover, analysis based on graph signal processing shows that our framework can empower a broad range of existing GNNs to have adaptive filter characteristics and further modulate the input graph signals, which is critical for handling complex homophilic/heterophilic patterns. The proposed ALT brings significant and consistent performance improvement in node classification for a wide range of GNNs over a variety of real-world datasets.

1. INTRODUCTION

Graph neural networks (GNNs) have demonstrated great power as building blocks for a variety of graph learning tasks, such as node classification (Kipf & Welling, 2017), graph classification (Xu et al., 2018), link prediction (Zhang & Chen, 2018), clustering (Bianchi et al., 2020), and many more. Most existing GNNs follow the homophily assumption, i.e., edges tend to connect nodes with the same labels and similar node features. This assumption holds for networks such as citation networks (Yang et al., 2016; Bojchevski & Günnemann, 2018), where a paper tends to cite related literature. However, in many other cases, heterophilic settings arise. For instance, to form a protein structure, different types of amino acids are more likely to be linked together (Zhu et al., 2020). On such heterophilic networks, the performance of classic GNN models (Klicpera et al., 2018; Veličković et al., 2018; Hamilton et al., 2017) could degrade greatly and might be even worse than that of an MLP, which does not utilize any topology information at all (Zhu et al., 2020). In response, researchers have analyzed the limitations of existing GNNs in the presence of node heterophily and proposed specific models to address it from both the spatial and spectral perspectives. For instance, an important design choice in H2GCN (Zhu et al., 2020) is that high-order neighbors should be considered during message aggregation. GPRGNN (Chien et al., 2021) also aggregates messages from multi-hop neighbors, but it emphasizes that messages can also be negative via a set of learnable aggregation weights. From the spectral perspective, FAGCN (Bo et al., 2021) points out that low-pass filter-based GNNs smooth the node representations between connected nodes, which is not desirable in heterophilic settings where connected nodes are more likely to have different labels.
Hence, FAGCN (Bo et al., 2021) adaptively mixes the low-pass graph filter with the high-pass graph filter via an attention mechanism to tackle this problem. A more detailed review of related work can be found in Section 5.

Despite the theoretical insights and empirical performance gains, most existing works focus on the model level, i.e., they aim to propose better GNN models to handle heterophilic graphs. In other words, the success of these methods relies on specific designs of GNN models. In this paper, we take a step further and ask: how can we develop a generic method that benefits a broad range of GNNs for node classification beyond homophily, even if they were not originally tailored for heterophilic graphs? To this end, we address this problem from a structure learning (Zhu et al., 2021b) perspective, that is, we optimize the given graph structure to benefit downstream tasks (e.g., node classification). Different from existing approaches that refine specific GNN models, our approach focuses on the data level by optimizing the input graph topology to tackle heterophily.

Challenges. In pursuing such a data-centric general solution, two key challenges arise. First (model diversity), our goal is to strengthen a broad range of established GNNs so that they can handle graphs with arbitrary homophily. However, the aggregation mechanisms and the graph convolution kernels differ across GNN models, and it is unknown how to accommodate diverse GNNs seamlessly. Second (theoretical foundation), analyses of the success of some specific GNNs on heterophilic graphs have recently emerged (e.g., from the graph signal processing perspective (Shuman et al., 2013)). However, few works focus on the theoretical foundation of structure learning and its connection to dealing with graphs with low homophily.
Our main contributions are listed as follows: (1) We propose a general graph structure learning-based framework named duAL sTructure learning (ALT), which can accommodate a variety of GNN models. Specifically, after removing the activation function from the last layer, any GNN can be plugged into our framework and be trained end-to-end with common optimizers. (2) We provide a detailed analysis from the graph signal processing perspective. Our analysis guides the design of ALT and validates its effectiveness theoretically. (3) Experiments show that, with the help of ALT, the node classification accuracy of a broad range of existing GNNs is boosted on heterophilic graphs while remaining competitive on homophilic graphs.

2. PRELIMINARIES

Notations. We use bold uppercase letters for matrices (e.g., $\mathbf{A}$), bold lowercase letters for column vectors (e.g., $\mathbf{u}$), lowercase and uppercase letters in regular font for scalars (e.g., $d$, $K$), and calligraphic letters for sets (e.g., $\mathcal{T}$). We use $\mathbf{A}[i, j]$ to represent the entry of matrix $\mathbf{A}$ at the $i$-th row and the $j$-th column, $\mathbf{A}[i, :]$ to represent the $i$-th row of matrix $\mathbf{A}$, and $\mathbf{A}[:, j]$ to represent the $j$-th column of matrix $\mathbf{A}$. Similarly, $\mathbf{u}[i]$ denotes the $i$-th entry of vector $\mathbf{u}$. Superscript $\top$ denotes the transpose of matrices and vectors. $\odot$ denotes the Hadamard product. An attributed graph can be represented as $\mathcal{G} = \{\mathbf{A}, \mathbf{X}\}$, which is composed of an adjacency matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$ and an attribute matrix $\mathbf{X} \in \mathbb{R}^{n \times d}$, where $n$ is the number of nodes and $d$ is the node feature dimension. In total, nodes can be categorized into a set of classes $\mathcal{C}$. The normalized Laplacian matrix is $\mathbf{L} = \mathbf{I} - \mathbf{D}^{-\frac{1}{2}} \mathbf{A} \mathbf{D}^{-\frac{1}{2}}$, where $\mathbf{D}$ is the diagonal degree matrix of $\mathbf{A}$. It can be decomposed as $\mathbf{L} = \mathbf{U} \mathbf{\Lambda} \mathbf{U}^{\top}$, where $\mathbf{U} \in \mathbb{R}^{n \times n}$ is the eigenvector matrix and $\mathbf{\Lambda} \in \mathbb{R}^{n \times n}$ is the diagonal eigenvalue matrix. In graph signal processing (Shuman et al., 2013), the diagonal entries of $\mathbf{\Lambda}$ represent frequencies, with $\mathbf{\Lambda}[i, i] = \lambda_i$. Given a signal $\mathbf{x} \in \mathbb{R}^{n}$, its graph Fourier transform (Shuman et al., 2013) is $\hat{\mathbf{x}} = \mathbf{U}^{\top} \mathbf{x}$, and the inverse graph Fourier transform is $\mathbf{x} = \mathbf{U} \hat{\mathbf{x}}$. For a diffusion matrix $\mathbf{C} \in \mathbb{R}^{n \times n}$, its frequency response (or profile (Balcilar et al., 2021)) is defined as $\Phi_{fp} = \mathrm{diag}^{-1}(\mathbf{U}^{\top} \mathbf{C} \mathbf{U})$, where $\mathrm{diag}^{-1}(\cdot)$ returns the diagonal entries. This frequency response is also known as the filter or the convolution kernel. Semi-supervised Node Classification. In this paper, we study semi-supervised node classification (Yang et al., 2016; Kipf & Welling, 2017), where the graph topology $\mathbf{A}$, all node features $\mathbf{X}$, and a subset of the node labels are given, and our goal is to predict the labels of the unlabelled nodes.
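The spectral quantities in the Notations paragraph can be made concrete with a small numerical sketch. The NumPy code below is illustrative only and is not taken from the ALT implementation: the 4-node path graph, the helper names `normalized_laplacian` and `frequency_response`, and the choice of diffusion matrix $\mathbf{C} = \mathbf{I} - \mathbf{L}$ are all assumptions made for the example. It builds the normalized Laplacian, applies the graph Fourier transform and its inverse, and evaluates the frequency response of $\mathbf{C}$.

```python
import numpy as np


def normalized_laplacian(A):
    """Return L = I - D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix A."""
    deg = A.sum(axis=1)
    # Isolated nodes (degree 0) get a scaling of 0 to avoid division by zero.
    inv_sqrt = np.zeros_like(deg, dtype=float)
    inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    return np.eye(A.shape[0]) - inv_sqrt[:, None] * A * inv_sqrt[None, :]


def frequency_response(C, U):
    """Return the diagonal entries of U^T C U (hypothetical helper name)."""
    return np.diag(U.T @ C @ U)


# Toy example (assumed for illustration): the path graph 0 - 1 - 2 - 3.
A = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)

L = normalized_laplacian(A)
# eigh handles symmetric matrices: eigenvalues come back in ascending order
# and the columns of U are orthonormal eigenvectors, so L = U diag(lam) U^T.
lam, U = np.linalg.eigh(L)

x = np.array([1.0, 2.0, 3.0, 4.0])  # a graph signal, one value per node
x_hat = U.T @ x                      # graph Fourier transform
x_rec = U @ x_hat                    # inverse graph Fourier transform

# Illustrative diffusion matrix C = I - L = D^{-1/2} A D^{-1/2}.
C = np.eye(A.shape[0]) - L
phi = frequency_response(C, U)       # equals 1 - lam for this choice of C
```

Because $\mathbf{U}$ is orthogonal, `x_rec` recovers `x` up to floating-point error, and for this particular $\mathbf{C}$ the response at frequency $\lambda_i$ is $1 - \lambda_i$, since $\mathbf{U}^{\top} \mathbf{L} \mathbf{U} = \mathbf{\Lambda}$.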
Numerous works (Kipf & Welling, 2017; Veličković et al., 2018; Klicpera et al., 2018) achieve impressive performance on this problem. However, recent studies show that their success relies heavily on the homophily assumption of the given graphs (Zheng et al., 2022; Zhu et al., 2020). In general, homophily describes to what extent edges tend to link nodes with the same labels and similar features. Following previous works (Zhu et al., 2020; Pei et al., 2019), this paper focuses on node label homophily. There are various homophily metrics, and we introduce one of them, named edge homophily (Zhu et al., 2020), as $h(\mathcal{G}) = \frac{\sum_{i,j:\, \mathbf{A}[i,j]=1} [\![ \mathbf{y}[i] = \mathbf{y}[j] ]\!]}{\sum_{i,j} \mathbf{A}[i,j]} \in [0, 1]$, where $\mathbf{y}[i]$ is the label of node $i$ and $[\![ x ]\!] = 1$ if $x$ is true and 0 otherwise. The more homophilic a given graph is, the closer its $h(\mathcal{G})$ is to 1.
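As a minimal sketch of the edge homophily metric defined above, the NumPy snippet below computes $h$ for a dense, binary, symmetric adjacency matrix. The function name `edge_homophily`, the toy path graph, and the two label vectors are assumptions chosen for illustration; they do not come from the paper's experiments.

```python
import numpy as np


def edge_homophily(A, y):
    """Fraction of edges whose two endpoints share a label.

    A: (n, n) binary adjacency matrix; y: (n,) integer label vector.
    For a symmetric A every undirected edge is counted twice in both the
    numerator and the denominator, so the ratio is unaffected.
    """
    A = np.asarray(A)
    y = np.asarray(y)
    total = A.sum()
    if total == 0:
        raise ValueError("edge homophily is undefined for a graph with no edges")
    same_label = y[:, None] == y[None, :]  # same_label[i, j] is True iff y[i] == y[j]
    return float((A * same_label).sum() / total)


# Toy example (assumed for illustration): the path graph 0 - 1 - 2 - 3.
A = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
)

y_homophilic = np.array([0, 0, 1, 1])    # edges (0,1) and (2,3) match, (1,2) does not
y_heterophilic = np.array([0, 1, 0, 1])  # no edge connects same-label nodes

h_high = edge_homophily(A, y_homophilic)
h_low = edge_homophily(A, y_heterophilic)
```

On this toy graph two of the three edges connect same-label nodes under `y_homophilic`, giving $h = 2/3$, while the alternating labels in `y_heterophilic` give $h = 0$.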

