LINKLESS LINK PREDICTION VIA RELATIONAL DISTILLATION

Abstract

Graph Neural Networks (GNNs) have been widely used on graph data and have shown exceptional performance in the task of link prediction. Despite their effectiveness, GNNs often suffer from high latency due to non-trivial neighborhood data dependencies in practical deployments. To address this issue, researchers have proposed methods based on knowledge distillation (KD) to transfer the knowledge from teacher GNNs to student MLPs, which are known to be efficient even with industrial-scale data, and have shown promising results on node classification. Nonetheless, using KD to accelerate link prediction remains unexplored. In this work, we start by exploring two direct analogs of traditional KD for link prediction, i.e., predicted logit-based matching and node representation-based matching. Upon observing that these direct KD analogs do not perform well for link prediction, we propose a relational KD framework, Linkless Link Prediction (LLP). Unlike simple KD methods that match independent link logits or node representations, LLP distills relational knowledge that is centered around each (anchor) node to the student MLP. Specifically, we propose two matching strategies that complement each other: rank-based matching and distribution-based matching. Extensive experiments demonstrate that LLP boosts the link prediction performance of MLPs by significant margins, and even outperforms the teacher GNNs on 6 out of 9 benchmarks. LLP also achieves a 776.37× speedup in link prediction inference compared to GNNs on the large-scale OGB-Citation2 dataset.

1. INTRODUCTION

Graph neural networks (GNNs) have been widely used for machine learning on graph-structured data (Kipf & Welling, 2016a; Hamilton et al., 2017). They have shown strong performance in various applications, such as node classification (Veličković et al., 2017; Chen et al., 2020), graph classification (Zhang et al., 2018; Ying et al., 2018b), graph generation (You et al., 2018; Shiao & Papalexakis, 2021), and link prediction (Zhang & Chen, 2018). Of these, link prediction is a particularly important problem in the graph machine learning community, which aims to predict the likelihood of any two nodes forming a link. It has broad practical applications such as knowledge graph completion (Schlichtkrull et al., 2018; Nathani et al., 2019; Vashishth et al., 2020), friend recommendation on social platforms (Sankar et al., 2021; Tang et al., 2022; Fan et al., 2022), and item recommendation for users on service and commerce platforms (Koren et al., 2009; Ying et al., 2018a; He et al., 2020). With the rising popularity of GNNs, state-of-the-art link prediction methods adopt encoder-decoder style models, where the encoders are GNNs and the decoders are applied directly on pairs of node representations learned by the GNNs (Kipf & Welling, 2016b; Zhang & Chen, 2018; Cai & Ji, 2020; Zhao et al., 2022). The success of GNNs is typically attributed to the explicit use of contextual information from nodes' surrounding neighborhoods (Zhang et al., 2020e). However, this induces a heavy reliance on neighborhood fetching and aggregation schemes, which can lead to high time cost in training and inference compared to tabular models, such as multi-layer perceptrons (MLPs), especially owing to neighbor explosion (Zhang et al., 2020b; Jia et al., 2020; Zhang et al., 2021b; Zeng et al., 2019).
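To make the encoder-decoder setup concrete, the following is a minimal numpy sketch (not the implementation used in this work): a one-layer GCN-style encoder produces node embeddings via symmetrically normalized neighborhood aggregation, and a dot-product decoder scores a candidate node pair. All function names, the toy graph, and the dimensions are illustrative assumptions.

```python
import numpy as np

def gcn_encoder(A, X, W):
    """One GCN-style propagation step: D^{-1/2} (A + I) D^{-1/2} X W, then ReLU.

    This is the source of the neighborhood dependency: computing a node's
    embedding requires fetching and aggregating its neighbors' features.
    """
    A_hat = A + np.eye(A.shape[0])              # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))      # symmetric normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)

def dot_decoder(H, u, v):
    """Decode a link score for pair (u, v) from the learned node embeddings."""
    return 1.0 / (1.0 + np.exp(-(H[u] @ H[v])))  # sigmoid of inner product

# Toy 4-node path graph 0-1-2-3 with one-hot features and random weights.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.eye(4)
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8)) * 0.1

H = gcn_encoder(A, X, W)       # encoder: per-node embeddings, shape (4, 8)
score = dot_decoder(H, 0, 1)   # decoder: probability-like score for link (0, 1)
print(score)
```

Note that at inference time the encoder must touch the adjacency structure for every query, whereas an MLP encoder in the same pipeline would read only `X`, which is the efficiency gap that motivates distilling GNNs into MLPs.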
Compared to GNNs, MLPs do not require any graph topology information, making them more suitable for new or isolated nodes (e.g., in cold-start settings), but usually resulting in worse general task performance as encoders, which we also empirically validate in Section 4. Nonetheless, having

