MASKED LABEL PREDICTION: UNIFIED MESSAGE PASSING MODEL FOR SEMI-SUPERVISED CLASSIFICA-TION

Abstract

Graph neural network (GNN) and label propagation algorithm (LPA) are both message passing algorithms, which have achieved superior performance in semisupervised classification. GNN performs feature propagation by a neural network to make predictions, while LPA uses label propagation across graph adjacency matrix to get results. However, there is still no good way to combine these two kinds of algorithms. In this paper, we proposed a new Unified Message Passaging Model (UniMP) that can incorporate feature propagation and label propagation with a shared message passing network, providing a better performance in semisupervised classification. First, we adopt a Graph Transformer jointly label embedding to propagate both the feature and label information. Second, to train UniMP without overfitting in self-loop label information, we propose a masked label prediction strategy, in which some percentage of training labels are simply masked at random, and then predicted. UniMP conceptually unifies feature propagation and label propagation and be empirically powerful. It obtains new state-of-the-art semi-supervised classification results in Open Graph Benchmark (OGB).

1. INTRODUCTION

There are various scenarios in the world, e.g., recommending related news and products, discovering new drugs, or predicting social relations, which can be described as graph structures. Many methods have been proposed to optimize these graph-based problems and achieved significant success in many related domains such as predicting the properties of nodes (Yang et al., 2016; Kipf & Welling, 2016 ), links (Grover & Leskovec, 2016; Battaglia et al., 2018), and graphs (Duvenaud et al., 2015; Niepert et al., 2016; Bojchevski et al., 2018) . In the task of semi-supervised node classification, we are required to learn with labeled examples and then make predictions for those unlabeled ones. To better classify the nodes' labels in the graph, based on the Laplacian smoothing assumption (Li et al., 2018; Xu et al., 2018b) , the message passing models were proposed to aggregate the information from its connected neighbors in the graph, acquiring enough facts to produce a more robust prediction for unlabeled nodes. Generally, there are two kinds of practical methods to implement message passing model, the Graph Neural Networks (GNNs) (Kipf & Welling, 2016; Hamilton et al., 2017; Xu et al., 2018b; Liao et al., 2019; Xu et al., 2018a; Qu et al., 2019) and the Label Propagation Algorithms (LPAs) (Zhu, 2005; Zhu et al., 2003; Zhang & Lee, 2007; Wang & Zhang, 2007; Karasuyama & Mamitsuka, 2013; Gong et al., 2016; Liu et al., 2019) . GNNs combine graph structures by propagating and aggregating nodes features through several neural layers, which get predictions from feature propagation. While LPAs make predictions for unlabeled instances by label propagation iteratively. Since GNN and LPA are based on the same assumption, making semi-supervised classifications by information propagation, there is an intuition that incorporating them together for boosting performance. Some superior studies have proposed their graph models based on it. For example, APPNP (Klicpera et al., 2019) and TPN (Liu et al., 2019) integrate GNN and LPA by concatenating them together, and GCN-LPA (Wang & Leskovec, 2019) uses LPA to regularize their GCN model. How-

