ORTHO-REG: IMPROVING GRAPH-REGULARIZED MLPS VIA ORTHOGONALITY REGULARIZATION

Abstract

Graph Neural Networks (GNNs) currently dominate the modeling of graph-structured data, but their heavy reliance on graph structure at inference time significantly impedes their widespread application. By contrast, Graph-regularized MLPs (GR-MLPs) implicitly inject graph structure information into the model weights, yet their performance can hardly match that of GNNs on most tasks. This motivates us to study the causes of the limited performance of GR-MLPs. In this paper, we first demonstrate that node embeddings learned by conventional GR-MLPs suffer from dimensional collapse, a phenomenon in which a few largest eigenvalues dominate the embedding space, when a linear encoder is used. As a result, the expressive power of the learned node representations is constrained. We further propose ORTHO-REG, a novel GR-MLP model, to mitigate the dimensional collapse issue. Through a soft regularization loss on the correlation matrix of node embeddings, ORTHO-REG explicitly encourages orthogonal node representations and thus naturally avoids dimensionally collapsed representations. Experiments on traditional transductive semi-supervised classification tasks and on inductive node classification in cold-start scenarios demonstrate its effectiveness and superiority.
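The core idea of the abstract, penalizing off-diagonal entries of the embedding correlation matrix so that no small set of directions dominates, can be illustrated with a minimal sketch. The function name `ortho_reg_loss` and the exact squared-Frobenius form are our own illustrative choices, not necessarily the paper's precise formulation:

```python
import numpy as np

def ortho_reg_loss(Z):
    """Soft orthogonality penalty on the embedding correlation matrix.

    Z: (n, d) node-embedding matrix. Each dimension is standardized,
    the d x d correlation matrix C is formed, and the squared Frobenius
    distance between C and the identity is returned. Large off-diagonal
    correlations (the signature of dimensional collapse, where a few
    directions carry all the variance) are thereby penalized.
    """
    Zc = Z - Z.mean(axis=0, keepdims=True)
    Zc = Zc / (Zc.std(axis=0, keepdims=True) + 1e-8)  # standardize dims
    C = (Zc.T @ Zc) / Z.shape[0]                      # correlation matrix
    return np.sum((C - np.eye(Z.shape[1])) ** 2)
```

For fully collapsed (rank-one) embeddings every pair of dimensions is perfectly correlated, so the penalty is large; for decorrelated embeddings the correlation matrix is close to the identity and the penalty is near zero.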

1. INTRODUCTION

Graph Neural Networks (GNNs) are currently the dominant models for graph machine learning, thanks to their powerful representation capability obtained by iteratively aggregating information from neighbors. Despite their successes, this explicit use of graph structure information hinders GNNs from being widely applied in industry-level tasks. On the one hand, GNNs rely on layer-wise message passing to aggregate features from the neighborhood, which is computationally inefficient during inference, especially when the model becomes deep (Zhang et al., 2021). On the other hand, recent studies have shown that GNN models cannot perform satisfactorily in cold-start scenarios where the connections of new incoming nodes are few or unknown (Zheng et al., 2021). By contrast, Multi-Layer Perceptrons (MLPs) involve no dependence between pairs of nodes, so they can infer much faster than GNNs (Zhang et al., 2021). Moreover, they treat all nodes alike regardless of the number of connections, and thus can make more reasonable predictions when neighborhoods are missing (Zheng et al., 2021). However, it remains challenging to inject knowledge of the graph structure into the learning of MLPs. One classical and popular approach to this problem is Graph-Regularized MLPs (GR-MLPs for short). Generally, besides the basic supervised loss (e.g., cross-entropy), GR-MLPs employ
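The GR-MLP training objective described above, a supervised cross-entropy term plus a graph-based regularizer, can be sketched schematically. The function `graph_reg_objective` and the Laplacian-style smoothness penalty below are illustrative assumptions (a common choice in this family of methods), not a specific model's exact loss:

```python
import numpy as np

def graph_reg_objective(Z, edges, labels, labeled_idx, lam=0.5):
    """Schematic GR-MLP loss: supervised cross-entropy + graph regularizer.

    Z: (n, k) class logits produced by an MLP from node features alone;
    edges: list of (i, j) node pairs; labels: int labels for labeled_idx.
    The regularizer pulls the outputs of connected nodes together, which
    injects graph structure into the MLP weights during training while
    inference still needs no graph at all.
    """
    # cross-entropy on the labeled nodes (numerically stable softmax)
    logits = Z[labeled_idx]
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    ce = -np.mean(np.log(probs[np.arange(len(labeled_idx)), labels] + 1e-12))
    # Laplacian-style smoothness: squared distance across each edge
    reg = np.mean([np.sum((Z[i] - Z[j]) ** 2) for i, j in edges])
    return ce + lam * reg
```

Because the graph appears only in the training loss, a trained GR-MLP predicts for a new node from its features alone, which is why this family of models is attractive for cold-start and latency-sensitive settings.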



Figure 1: As an MLP model, our method performs even better than GNN models on Pubmed, with much faster inference. GRAND (Feng et al., 2020) is one of the SOTA GNN models on this task. Circled markers denote MLP baselines, and square markers denote GNN baselines.

