ORTHOREG: IMPROVING GRAPH-REGULARIZED MLPS VIA ORTHOGONALITY REGULARIZATION

Abstract

Graph Neural Networks (GNNs) currently dominate the modeling of graph-structured data, yet their heavy reliance on graph structure at inference time significantly impedes their widespread application. By contrast, Graph-regularized MLPs (GR-MLPs) implicitly inject graph structure information into the model weights, but their performance can hardly match that of GNNs on most tasks. This motivates us to study the causes of the limited performance of GR-MLPs. In this paper, we first demonstrate that node embeddings learned by conventional GR-MLPs suffer from dimensional collapse, a phenomenon in which a few largest eigenvalues dominate the embedding space, when a linear encoder is used. As a result, the expressive power of the learned node representations is constrained. We further propose ORTHO-REG, a novel GR-MLP model, to mitigate the dimensional collapse issue. Through a soft regularization loss on the correlation matrix of node embeddings, ORTHO-REG explicitly encourages orthogonal node representations and thus naturally avoids dimensionally collapsed representations. Experiments on traditional transductive semi-supervised classification tasks and on inductive node classification in cold-start scenarios demonstrate its effectiveness and superiority.

1. INTRODUCTION

Graph Machine Learning (GML) has been attracting increasing attention due to its wide applications in many real-world scenarios, such as social network analysis (Fan et al., 2019), recommender systems (van den Berg et al., 2017; Wu et al., 2019b), chemical molecules (Wang et al., 2021; Stärk et al., 2022), and biological structures. Graph Neural Networks (GNNs) (Kipf & Welling, 2017; Hamilton et al., 2017; Velickovic et al., 2018; Xu et al., 2019) are currently the dominant models for GML thanks to their powerful representation capability, obtained by iteratively aggregating information from neighbors. Despite their success, this explicit use of graph structure information hinders GNNs from being widely applied in industry-level tasks. On the one hand, GNNs rely on layer-wise message passing to aggregate features from the neighborhood, which is computationally inefficient during inference, especially when the model becomes deep (Zhang et al., 2021). On the other hand, recent studies have shown that GNN models cannot perform satisfactorily in cold-start scenarios where the connections of newly arriving nodes are few or unknown (Zheng et al., 2021). By contrast, Multi-Layer Perceptrons (MLPs) involve no dependence between pairs of nodes, so they can infer much faster than GNNs (Zhang et al., 2021). Moreover, they treat all nodes uniformly regardless of their number of connections and can therefore predict more reasonably when neighborhoods are missing (Zheng et al., 2021). However, it remains challenging to inject knowledge of the graph structure into the learning of MLPs. One classical and popular approach to this problem is Graph-Regularized MLPs (GR-MLPs for short).
Generally, besides the basic supervised loss (e.g., cross-entropy), GR-MLPs employ an additional regularization term on the final node embeddings or predictions based on the graph structure (Ando & Zhang, 2006; Zhou et al., 2003; Yang et al., 2021; Hu et al., 2021). Though the formulations differ, the basic idea is to make node embeddings/predictions smooth over the graph structure. Even though these GR-MLP models can implicitly encode graph structure information into the model parameters, there is still a considerable gap between their performance and that of GNNs (Ando & Zhang, 2006; Yang et al., 2021). Recently, another line of work, GNN-to-MLP knowledge distillation methods (termed KD-MLPs) (Zhang et al., 2021; Zheng et al., 2021), has been explored to incorporate graph structure into MLPs. In KD-MLPs, a student MLP is trained using a supervised loss together with a knowledge-distillation loss from a well-trained teacher GNN. Empirical results demonstrate that with merely node features as input, the performance of KD-MLPs can still match that of GNNs as long as they are appropriately learned. However, the two-step training of KD-MLPs is undesirable, and they still require a well-trained GNN as a teacher. This motivates us to rethink the failure of previous GR-MLPs on graph-related applications and to study the reasons that limit their performance.

Presented work: In this paper, we first demonstrate that node embeddings learned by existing GR-MLPs suffer from dimensional collapse (Hua et al., 2021; Jing et al., 2022), a phenomenon in which the embedding space of nodes is dominated by the largest (few) eigenvalue(s). Our theoretical analysis shows that the dimensional collapse in GR-MLPs is due to the irregular feature interaction caused by the graph Laplacian matrix (see Lemma 1).
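To make the two notions above concrete, the following minimal sketch (an illustration only; the cited works use varying regularizers) shows a classic Laplacian smoothness penalty, tr(HᵀLH), together with a simple spectrum check for dimensional collapse: when embeddings collapse, a few singular values of the centered embedding matrix carry almost all of the energy.

```python
import numpy as np

def laplacian_smoothness(H, A):
    """Classic graph-regularization term: sum over edges of ||h_i - h_j||^2 = tr(H^T L H)."""
    deg = A.sum(axis=1)
    L = np.diag(deg) - A  # unnormalized graph Laplacian
    return np.trace(H.T @ L @ H)

def spectrum(H):
    """Singular values of the centered embedding matrix, in descending order.
    Dimensional collapse shows up as a few dominant values and a tail near zero."""
    Hc = H - H.mean(axis=0, keepdims=True)
    return np.linalg.svd(Hc, compute_uv=False)

# Toy check: embeddings that (almost) all point in one direction are "collapsed".
rng = np.random.default_rng(0)
direction = rng.normal(size=(1, 8))
H_collapsed = rng.normal(size=(100, 1)) @ direction + 0.01 * rng.normal(size=(100, 8))
s = spectrum(H_collapsed)
print(s[0] / s.sum())  # the top singular value carries almost all of the energy
```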
We then propose Orthogonality Regularization (ORTHO-REG for short), a novel GR-MLP model, to mitigate the dimensional collapse issue in semi-supervised node representation learning. The key design of ORTHO-REG is an additional regularization term on the output node embeddings that makes them orthogonal, so that different embedding dimensions learn to express different aspects of information. Besides, ORTHO-REG extends the traditional first-order proximity-preserving target to a more flexible one, improving the model's expressive power and its generalization to non-homophilous graphs. We provide a thorough evaluation of ORTHO-REG on various node classification tasks. The empirical results demonstrate that ORTHO-REG achieves competitive or even better performance than GNNs. Besides, using merely node features to make predictions, ORTHO-REG infers much faster on large-scale graphs and predicts more reasonably for new nodes without connections. In Fig. 1 we present the performance of ORTHO-REG compared with GNNs and other MLPs on Pubmed, where ORTHO-REG achieves SOTA performance with the fastest inference speed. We summarize our contributions as follows: 1) We are the first to examine the limited representation power of existing GR-MLP models from the perspective of dimensional collapse, and we provide theoretical analysis and empirical studies to justify our claims. 2) To mitigate the dimensional collapse problem, we design a novel GR-MLP model named ORTHO-REG, which encourages node embeddings to be orthogonal through an explicit soft regularization and thus naturally avoids dimensional collapse. 3) We conduct experiments on traditional transductive semi-supervised node classification and on inductive node classification under cold-start scenarios on public datasets of various scales. The numerical results and analysis demonstrate that by learning orthogonal node representations, ORTHO-REG can outperform GNN models on these tasks.
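The core idea can be sketched as follows (a hedged illustration, not the paper's exact objective or hyperparameters): standardize each embedding dimension, form the empirical correlation matrix of the node embeddings, and penalize its distance from the identity, so that perfectly decorrelated dimensions incur zero penalty.

```python
import torch

def ortho_reg_loss(H, eps=1e-6):
    """Soft orthogonality regularization on the embedding correlation matrix.

    Pushes the D x D correlation matrix of the node embeddings H (N x D)
    toward the identity, encouraging different dimensions to carry
    decorrelated information and thereby discouraging dimensional collapse.
    """
    H = H - H.mean(dim=0, keepdim=True)
    H = H / (H.std(dim=0, keepdim=True) + eps)
    N, D = H.shape
    C = (H.T @ H) / (N - 1)          # empirical correlation matrix, D x D
    I = torch.eye(D, device=H.device)
    return ((C - I) ** 2).sum() / D  # zero iff dimensions are perfectly decorrelated

# The penalty is differentiable, so it can simply be added to the supervised loss.
H = torch.randn(64, 16, requires_grad=True)
loss = ortho_reg_loss(H)
loss.backward()
```

A weighted sum such as `total = ce_loss + lam * ortho_reg_loss(H)` (with `lam` a tuning knob, a hypothetical name here) is the usual way such a soft regularizer enters training.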

2. BACKGROUND AND RELATED WORK

2.1 PROBLEM FORMULATION

We mainly study a general semi-supervised node classification task on a single homogeneous graph with only one type of node and edge. We denote a graph by G = (V, E), where V is the node set and E is the edge set. For a graph with N nodes (i.e., |V| = N), we denote the node feature matrix by X ∈ R^{N×D} and the adjacency matrix by A ∈ R^{N×N}. In semi-supervised node classification, only a small portion of the nodes are labeled, and the task is to infer the labels of the unlabeled nodes using the node features and the graph structure. Denote the labeled node set by V_L and the unlabeled node set by V_U; then V_L ∩ V_U = ∅ and V_L ∪ V_U = V. Denote the one-hot ground-truth labels of the nodes by Ŷ ∈ R^{N×C} and the predicted labels by Y. One can learn node embeddings H using the node features X and the adjacency matrix A, and use the learned embeddings to predict the labels of the unlabeled nodes.
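As a minimal illustration of this setup (with hypothetical toy dimensions and a random split, not the paper's model), a plain MLP baseline trains on the labeled set V_L only, while producing predictions for every node:

```python
import torch
import torch.nn.functional as F

# Hypothetical toy dimensions: N nodes, D features, C classes.
N, D, C = 100, 16, 3
X = torch.randn(N, D)                      # node feature matrix
y = torch.randint(0, C, (N,))              # ground-truth labels
labeled = torch.zeros(N, dtype=torch.bool)
labeled[:10] = True                        # only a small portion of nodes are labeled (V_L)

mlp = torch.nn.Sequential(torch.nn.Linear(D, 32), torch.nn.ReLU(), torch.nn.Linear(32, C))
opt = torch.optim.Adam(mlp.parameters(), lr=1e-2)

for _ in range(50):
    opt.zero_grad()
    logits = mlp(X)                        # forward pass for all nodes, labeled or not
    loss = F.cross_entropy(logits[labeled], y[labeled])  # supervised loss on V_L only
    loss.backward()
    opt.step()

# Predicted labels for the unlabeled nodes V_U come from the same forward pass.
pred = mlp(X).argmax(dim=1)
```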



Figure 1: As an MLP model, our method performs even better than GNN models on Pubmed, with a much faster inference speed. GRAND (Feng et al., 2020) is one of the SOTA GNN models on this task. Circled markers denote MLP baselines, and squared markers denote GNN baselines.

