TOWARDS ROBUST GRAPH NEURAL NETWORKS AGAINST LABEL NOISE

Abstract

Massive labeled data are used to train deep neural networks, so label noise has become an important issue. Although learning with noisy labels has made great progress on image datasets in recent years, it has not yet been studied for graph node classification with Graph Neural Networks (GNNs). In this paper, we propose a method, named LPM, to address the problem using Label Propagation (LP) and Meta learning. Different from previous methods designed for image datasets, our method exploits a special attribute of graph-structured data, label smoothness: neighboring nodes in a graph tend to have the same label. A pseudo label is computed from the neighboring labels of each node in the training set using LP; meta learning is then used to learn a proper aggregation of the original and pseudo labels as the final label. Experimental results demonstrate that LPM outperforms state-of-the-art methods on the graph node classification task with both synthetic and real-world label noise. Source code to reproduce all results will be released.

1. INTRODUCTION

Deep Neural Networks (DNNs) have achieved great success in various domains, but collecting large amounts of samples with high-quality labels is both expensive and time-consuming. To address this problem, cheaper alternatives have emerged. For example, the onerous labeling process can be completed on a crowdsourcing system such as Amazon Mechanical Turk¹. Besides, we can collect labeled samples from the web with search engines and social media. However, all these methods are prone to produce noisy labels of low quality. As shown in recent research (Zhang et al., 2016b), an intractable problem is that DNNs can easily overfit to noisy labels, which dramatically degrades generalization performance. Therefore, it is necessary and urgent to design valid methods for solving this problem.

Graph Neural Networks (GNNs) have aroused keen research interest in recent years, resulting in rapid progress in graph-structured data analysis (Kipf & Welling, 2016; Velickovic et al., 2017; Xu et al., 2018; Hou et al., 2019; Wang & Leskovec, 2020). Graph node classification is the most common task for GNNs. However, almost all previous work on label noise focuses on image classification, and handling noisy labels when classifying graph nodes with GNNs has not been studied yet. Fortunately, most edges in graph-structured datasets are intra-class edges (Wang & Leskovec, 2020), indicating that a node's label can be estimated from its neighbors' labels. In this paper, we utilize this special attribute of graph data to alleviate the damage caused by noisy labels. Moreover, the meta-learning paradigm serves as a useful tool to learn a proper aggregation of original labels and pseudo labels as the final labels.
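The intra-class-edge property can be made concrete with a toy sketch: estimate a node's label by a majority vote over its neighbors' (possibly noisy) labels. The dict-based graph format and function name below are illustrative, not the paper's actual implementation.

```python
from collections import Counter

def neighbor_pseudo_label(adj, labels, node):
    """Estimate a node's label by majority vote over its neighbors' labels.

    adj: dict mapping node -> list of neighbor nodes (toy format).
    labels: dict mapping node -> observed (possibly noisy) label.
    """
    votes = Counter(labels[u] for u in adj[node])
    return votes.most_common(1)[0][0]

# Toy graph: node 3 is mislabeled as class 1, but its neighborhood says class 0.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
labels = {0: 0, 1: 0, 2: 0, 3: 1}
print(neighbor_pseudo_label(adj, labels, 3))  # -> 0
```

Because most edges connect same-class nodes, such neighborhood evidence can override an individual corrupted label.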
The key contributions of this paper are as follows:
• To the best of our knowledge, we are the first to focus on label noise in classifying graph nodes with GNNs, which may serve as a starting point for future research towards robust GNNs against label noise.
• We utilize meta-learning to learn how to aggregate original labels and pseudo labels properly to obtain more credible supervision, instead of learning to re-weight different samples.
• We experimentally show that our LPM outperforms state-of-the-art algorithms on graph node classification with both synthetic and real-world label noise.

2. RELATED WORK

2.1. GRAPH NEURAL NETWORKS

To start, we use G = (V, E, X) to denote a graph with node set V and edge set E, where X ∈ R^{n×d} is the input feature matrix, n denotes the number of nodes in the graph, and d is the dimension of the input feature vector of each node. We use e_{u,v} ∈ E to denote the edge that connects nodes u and v. For each node v ∈ V, its neighbor set is denoted as N(v) = {u : e_{u,v} ∈ E}. For the node classification task, the goal of GNNs is to learn an optimal mapping function f(·) to predict the class label y_v for node v. Generally speaking, GNNs follow a framework of aggregation and combination in each layer, and different GNNs have proposed different instantiations of these two steps. In general, the k-th layer of a GNN reads

$$a_v^{(k)} = \text{Aggregate}^{(k)}\big(\{h_u^{(k-1)} : u \in N(v)\}\big), \qquad h_v^{(k)} = \text{Combine}^{(k)}\big(h_v^{(k-1)}, a_v^{(k)}\big),$$

where h_v^{(k)} is the output of the k-th layer for node v and h_v^{(0)} is the input feature vector of node v.
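The aggregate/combine framework can be sketched as a single layer with mean aggregation and a linear combine step followed by ReLU (a GraphSAGE-style instantiation; the weight matrices and toy graph below are assumptions for illustration, not a specific model from the paper).

```python
import numpy as np

def gnn_layer(adj, H, W_self, W_neigh):
    """One GNN layer with mean aggregation:
    a_v = mean({h_u : u in N(v)});  h_v' = ReLU(W_self h_v + W_neigh a_v)."""
    n = H.shape[0]
    A = np.zeros_like(H)
    for v in range(n):
        if adj[v]:                                 # Aggregate step
            A[v] = H[adj[v]].mean(axis=0)
    H_new = H @ W_self.T + A @ W_neigh.T           # Combine step
    return np.maximum(H_new, 0.0)                  # ReLU nonlinearity

# Toy 3-node path graph with 2-dimensional features.
adj = {0: [1], 1: [0, 2], 2: [1]}
H = np.array([[1., 0.], [0., 1.], [1., 1.]])
out = gnn_layer(adj, H, np.eye(2), np.eye(2))
```

Stacking k such layers lets each node's representation depend on its k-hop neighborhood.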

2.2. LABEL PROPAGATION

In Label Propagation (LP), node labels are propagated and aggregated along the edges of the graph (Zhou et al., 2004; Zhu et al., 2005; Wang & Zhang, 2007; Karasuyama & Mamitsuka, 2013). Several works have been designed to improve the performance of label propagation. For example, Park et al. (2020) proposed a novel framework to propagate the label information of the sampled (reliable) data to adjacent data along a similarity-based graph. Compared to these methods, we utilize the intrinsic graph structure instead of a handcrafted graph to propagate clean label information, which is more reliable for graph-structured data. Besides, we utilize GNNs to extract features and classify nodes for graph-structured data.
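A minimal LP iteration in the style of Zhou et al. (2004) can be sketched as follows: repeatedly diffuse label mass over the normalized adjacency while anchoring labeled nodes. The function name and toy chain graph are illustrative assumptions.

```python
import numpy as np

def label_propagation(A, Y, alpha=0.9, iters=50):
    """Propagate labels along edges: F <- alpha * S @ F + (1 - alpha) * Y,
    where S is the symmetrically normalized adjacency matrix and the rows
    of Y are one-hot for labeled nodes and all-zero for unlabeled nodes."""
    d = A.sum(axis=1)
    d[d == 0] = 1.0                       # guard isolated nodes
    S = A / np.sqrt(np.outer(d, d))       # D^{-1/2} A D^{-1/2}
    F = Y.copy()
    for _ in range(iters):
        F = alpha * (S @ F) + (1 - alpha) * Y
    return F.argmax(axis=1)

# Chain 0-1-2-3 with labeled endpoints: node 0 is class 0, node 3 is class 1.
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
Y = np.zeros((4, 2)); Y[0, 0] = 1.0; Y[3, 1] = 1.0
pred = label_propagation(A, Y)  # -> [0, 0, 1, 1]
```

The interior nodes receive the label of the nearer endpoint, illustrating how labels flow along intra-class edges.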

2.3. META-LEARNING BASED METHODS AGAINST NOISY LABELS

Meta-learning aims to learn not only neural network weights, but also aspects of the learning process itself, such as hand-designed hyperparameters and optimizers (Andrychowicz et al., 2016; Finn et al., 2017). Several recent methods utilize the meta-learning paradigm to re-weight samples, i.e., to weight samples with clean labels more and mislabeled samples less. The weighting factors are optimized by gradient descent, or generated by a network, so as to minimize the loss on a small set of samples with correct labels. In contrast, we utilize the meta-learning paradigm to learn how to aggregate original labels and pseudo labels properly: combining the original label information with the label information provided by LP yields more credible supervision.
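The label-aggregation idea can be sketched with a single global mixing weight chosen against a small trusted label set. Note this is a simplified stand-in: the grid search below replaces the meta-gradient update, and the real method would learn finer-grained (e.g., per-node) weights; all function names are hypothetical.

```python
import numpy as np

def soft_labels(alpha, y_orig, y_pseudo):
    """Aggregate original (possibly noisy) labels with LP pseudo labels."""
    return alpha * y_orig + (1 - alpha) * y_pseudo

def meta_select_alpha(y_orig, y_pseudo, y_meta, alphas=np.linspace(0, 1, 11)):
    """Pick the mixing weight whose aggregated labels best agree with a small
    trusted (meta) label set, mimicking the meta objective by grid search."""
    def meta_loss(a):
        y = soft_labels(a, y_orig, y_pseudo)
        return -np.sum(y_meta * np.log(y + 1e-12))  # cross-entropy to meta labels
    return min(alphas, key=meta_loss)

# 4 nodes, 2 classes: pseudo labels are correct, half the original labels are not.
y_pseudo = np.eye(2)[[0, 1, 0, 1]]
y_meta = y_pseudo.copy()
y_orig = np.eye(2)[[0, 1, 1, 0]]       # nodes 2 and 3 mislabeled
best = meta_select_alpha(y_orig, y_pseudo, y_meta)  # -> 0.0
```

Since the pseudo labels match the trusted labels exactly here, the meta objective pushes the mixing weight fully towards LP's output; with cleaner original labels, an intermediate weight would be selected.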



¹ https://www.mturk.com/



For example, Gong et al. (2016) proposed a novel iterative label propagation algorithm which explicitly optimizes the propagation quality by manipulating the propagation sequence to move from simple to difficult examples; Zhang et al. (2020) introduced a triple matrix recovery mechanism to remove noise from the estimated soft labels during propagation. Label propagation has also been applied to semi-supervised image classification. For example, Gong et al. (2017) used a weighted K-nearest-neighbor graph to bridge the data points so that label information can be propagated from the scarce labeled examples to unlabeled examples along the graph edges.
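Building such a handcrafted weighted kNN graph is straightforward to sketch: connect each point to its k nearest neighbors and weight edges with a Gaussian kernel. The function name and parameters below are illustrative assumptions, not a specific construction from the cited works.

```python
import numpy as np

def knn_graph(X, k=2, sigma=1.0):
    """Weighted k-nearest-neighbor graph over feature vectors X (n x d):
    connect each point to its k closest points, with edge weights
    exp(-dist^2 / (2 * sigma^2)), then symmetrize."""
    n = X.shape[0]
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)  # pairwise distances
    W = np.zeros((n, n))
    for i in range(n):
        nn = np.argsort(D[i])[1:k + 1]                    # skip self (dist 0)
        W[i, nn] = np.exp(-D[i, nn] ** 2 / (2 * sigma ** 2))
    return np.maximum(W, W.T)                             # symmetric weights

# Two well-separated clusters: edges form only within each cluster.
X = np.array([[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]], float)
W = knn_graph(X, k=2)
```

On plain feature data this graph plays the role that the intrinsic edge set plays for graph-structured data, which is why LPM can propagate labels without constructing one.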

