LEARNING MLPS ON GRAPHS: A UNIFIED VIEW OF EFFECTIVENESS, ROBUSTNESS, AND EFFICIENCY

Abstract

While Graph Neural Networks (GNNs) have demonstrated their efficacy in dealing with non-Euclidean structural data, they are difficult to deploy in real applications due to the scalability constraint imposed by multi-hop data dependency. Existing methods attempt to address this scalability issue by training student multi-layer perceptrons (MLPs) exclusively on node content features, using labels derived from teacher GNNs. However, the trained MLPs are neither effective nor robust. In this paper, we ascribe the lack of effectiveness and robustness to three significant challenges: 1) the misalignment between the content feature and label spaces, 2) the strict hard matching to the teacher's output, and 3) the sensitivity to node feature noise. To address these challenges, we propose NOSMOG, a novel method to learn NOise-robust Structure-aware MLPs On Graphs, with remarkable effectiveness, robustness, and efficiency. Specifically, we first address the misalignment by complementing node content with position features that capture graph structural information. We then design an innovative representational similarity distillation strategy to inject soft node similarities into the MLPs. Finally, we introduce adversarial feature augmentation to ensure stable learning against feature noise. Extensive experiments and theoretical analyses demonstrate the superiority of NOSMOG by comparing it to GNNs and the state-of-the-art method in both transductive and inductive settings across seven datasets. Codes are available at

1. INTRODUCTION

Graph Neural Networks (GNNs) have shown exceptional effectiveness in handling non-Euclidean structural data and have achieved state-of-the-art performance across a broad range of graph mining tasks (Hamilton et al., 2017; Kipf & Welling, 2017; Veličković et al., 2018). The success of modern GNNs relies on the message passing architecture, which aggregates and learns node representations based on each node's (multi-hop) neighborhood (Wu et al., 2020; Zhou et al., 2020). However, message passing is time-consuming and computation-intensive, making it challenging to apply GNNs to large-scale real-world applications that are constrained by latency and require the deployed model to infer quickly (Zhang et al., 2020; 2022a). To meet the latency requirement, multi-layer perceptrons (MLPs) continue to be the first choice (Zhang et al., 2022b), despite the fact that they perform poorly on non-Euclidean data and focus exclusively on node content information. Inspired by the performance advantage of GNNs and the latency advantage of MLPs, researchers have explored combining GNNs and MLPs to enjoy the advantages of both (Zhang et al., 2022b; Zheng et al., 2022; Chen et al., 2021). One effective approach is knowledge distillation (KD) (Hinton et al., 2015), where the learned knowledge is transferred from GNNs to MLPs through soft labels (Phuong & Lampert, 2019). Only the MLPs are then deployed for inference, with node content features as input. In this way, MLPs can perform well by mimicking the output of GNNs without requiring explicit message passing, thus obtaining fast inference speed (Hu et al., 2021). Nevertheless, existing methods are neither effective nor robust, with three major drawbacks: (1) MLPs cannot fully align the input content feature to the label space, especially when node labels are correlated with the graph structure; (2) MLPs rely on the teacher's output to learn a
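To make the GNN-to-MLP distillation idea above concrete, the following is a minimal NumPy sketch of the standard soft-label KD objective: the teacher GNN's logits are converted into temperature-scaled soft labels, and the student MLP is trained to minimize the KL divergence to them. This is a generic illustration of KD (Hinton et al., 2015), not the full NOSMOG method; the function names and the temperature value are our own illustrative choices.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax along the last axis."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) over soft labels, averaged across nodes.

    A higher temperature T smooths the teacher's distribution so the
    student also learns from the relative probabilities of non-target
    classes ("dark knowledge")."""
    p = softmax(teacher_logits, T)  # teacher soft labels
    q = softmax(student_logits, T)  # student predictions
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))

# Toy example: 3 nodes, 4 classes, random logits standing in for model outputs.
rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=(3, 4))
student_logits = rng.normal(size=(3, 4))

loss = distillation_loss(student_logits, teacher_logits)
perfect = distillation_loss(teacher_logits, teacher_logits)  # 0 when outputs match
```

In practice this KD term is combined with the usual cross-entropy on ground-truth labels, and only the distilled MLP is kept at deployment time, avoiding neighborhood fetching entirely.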

https://github.com/meettyj

