TOWARDS ROBUST GRAPH NEURAL NETWORKS AGAINST LABEL NOISE

Abstract

Massive amounts of labeled data are used to train deep neural networks, so label noise has become an important issue. Although learning with noisy labels has made great progress on image datasets in recent years, it has not yet been studied for node classification with Graph Neural Networks (GNNs). In this paper, we propose a method, named LPM, that addresses the problem using Label Propagation (LP) and Meta learning. Unlike previous methods designed for image datasets, our method builds on a special attribute of graph-structured data, label smoothness: neighboring nodes in a graph tend to have the same label. For each node in the training set, a pseudo label is computed from the neighboring labels using LP; meta learning is then used to learn a proper aggregation of the original and pseudo labels as the final label. Experimental results demonstrate that LPM outperforms state-of-the-art methods on the graph node classification task under both synthetic and real-world label noise. Source code to reproduce all results will be released.

1. INTRODUCTION

Deep Neural Networks (DNNs) have achieved great success in various domains, but collecting a large number of samples with high-quality labels is both expensive and time-consuming. To address this problem, cheaper alternatives have emerged. For example, the onerous labeling process can be completed on a crowdsourcing system such as Amazon Mechanical Turk¹. Labeled samples can also be collected from the web through search engines and social media. However, all these methods are prone to producing noisy, low-quality labels. As shown in recent research (Zhang et al., 2016b), DNNs can easily overfit to noisy labels, which dramatically degrades generalization performance. It is therefore necessary to design effective methods for this problem.

Graph Neural Networks (GNNs) have attracted keen research interest in recent years, resulting in rapid progress in graph-structured data analysis (Kipf & Welling, 2016; Velickovic et al., 2017; Xu et al., 2018; Hou et al., 2019; Wang & Leskovec, 2020). Graph node classification is the most common task for GNNs. However, almost all previous work on label noise focuses on image classification, and handling noisy labels in graph node classification with GNNs has not yet been studied. Fortunately, most edges in graph-structured datasets are intra-class edges (Wang & Leskovec, 2020), indicating that a node's label can be estimated from the labels of its neighbors. In this paper, we utilize this special attribute of graph data to alleviate the damage caused by noisy labels. Moreover, the meta learning paradigm serves as a useful tool for learning a proper aggregation of the original labels and pseudo labels into the final labels.

The key contributions of this paper are as follows:

• To the best of our knowledge, we are the first to focus on label noise in graph node classification with GNNs, which may serve as a starting point for future research towards robust GNNs against label noise.

• We utilize meta learning to learn how to aggregate original labels and pseudo labels properly to obtain more credible supervision, instead of learning to re-weight different samples.
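
The idea of deriving a pseudo label from neighboring labels and blending it with the given label can be illustrated with a minimal sketch. It is not the LPM implementation: the function names (`one_hot`, `propagate_labels`, `aggregate_labels`), the damped propagation rule with symmetric normalization, the parameter `alpha`, and the toy graph are all assumptions made for illustration. In particular, the aggregation weight is set by hand here, whereas the paper learns the aggregation with meta learning.

```python
import numpy as np


def one_hot(labels, num_classes):
    """One-hot encode integer labels; a label of -1 marks an unlabeled node."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.shape[0], num_classes))
    known = labels >= 0
    encoded[known, labels[known]] = 1.0
    return encoded


def propagate_labels(adj, labels, num_classes, alpha=0.9, iterations=50):
    """Pseudo labels from iterating F <- alpha * S @ F + (1 - alpha) * Y.

    S is the symmetrically normalized adjacency matrix and Y the one-hot
    (possibly noisy) labels. This update rule is an illustrative choice.
    """
    adj = np.asarray(adj, dtype=float)
    seed = one_hot(labels, num_classes)
    degree = adj.sum(axis=1)
    degree[degree == 0] = 1.0  # avoid division by zero for isolated nodes
    inv_sqrt = 1.0 / np.sqrt(degree)
    norm_adj = adj * inv_sqrt[:, None] * inv_sqrt[None, :]
    scores = seed.copy()
    for _ in range(iterations):
        scores = alpha * (norm_adj @ scores) + (1.0 - alpha) * seed
    totals = scores.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # rows with no label mass stay all-zero
    return scores / totals


def aggregate_labels(original, pseudo, weight):
    """Convex combination of original and pseudo labels.

    `weight` is a hand-set scalar or per-node array (an assumption);
    the paper learns the aggregation via meta learning instead.
    """
    weight = np.asarray(weight, dtype=float).reshape(-1, 1)
    return weight * original + (1.0 - weight) * pseudo


# Toy graph: two disconnected 4-cliques. Nodes 0-3 truly belong to class 0,
# nodes 4-7 to class 1; node 1 carries a noisy label (1 instead of 0).
clique = np.ones((4, 4)) - np.eye(4)
empty = np.zeros((4, 4))
adj = np.block([[clique, empty], [empty, clique]])
noisy_labels = np.array([0, 1, 0, 0, 1, 1, 1, 1])

pseudo = propagate_labels(adj, noisy_labels, num_classes=2)
final = aggregate_labels(one_hot(noisy_labels, 2), pseudo, weight=0.2)
```

In this toy case the three correctly labeled neighbors of node 1 outvote its noisy label, so its pseudo label favors class 0. Whether the final label does too depends on the aggregation weight, which is why learning that weight rather than fixing it is central to the approach described above.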



¹ https://www.mturk.com/

