LEARNING FROM NOISY DATA WITH ROBUST REPRESENTATION LEARNING

Abstract

Learning from noisy data has attracted much attention, where most methods focus on label noise. In this work, we propose a new framework which simultaneously addresses three types of noise commonly seen in real-world data: label noise, out-of-distribution input, and input corruption. In contrast to most existing methods, we combat noise by learning robust representations. Specifically, we embed images into a low-dimensional subspace by training an autoencoder on the deep features. We regularize the geometric structure of the subspace with robust contrastive learning, which includes an unsupervised consistency loss and a supervised mixup prototypical loss. Furthermore, we leverage the structure of the learned subspace for noise cleaning by aggregating information from neighboring samples. Experiments on multiple benchmarks demonstrate the state-of-the-art performance of our method and the robustness of the learned representations. Our code will be released.¹

1. INTRODUCTION

Data in real life is noisy. However, deep models with remarkable performance are mostly trained on clean datasets with high-quality human annotations. Manual data cleaning and labeling is an expensive process that is difficult to scale. On the other hand, a nearly infinite amount of noisy data is available online. It is therefore crucial that deep neural networks (DNNs) can harvest noisy training data. However, it has been shown that DNNs are susceptible to overfitting to noise (Zhang et al., 2017).

As shown in Figure 1, a real-world noisy image dataset often contains multiple types of noise. Label noise refers to samples that are wrongly labeled as another class (e.g. a flower labeled as an orange). Out-of-distribution input refers to samples that do not belong to any known class. Input corruption refers to image-level distortion (e.g. low brightness) that causes a data shift between training and test.

Most methods in the literature focus on addressing the more detrimental label noise. Two dominant approaches are: (1) identifying clean samples as those with smaller loss and assigning larger weights to them (Han et al., 2018; Yu et al., 2019; Shen & Sanghavi, 2019; Arazo et al., 2019); (2) relabeling noisy samples using the model's predictions (Reed et al., 2015; Ma et al., 2018; Tanaka et al., 2018; Yi & Wu, 2019). The recently proposed DivideMix (Li et al., 2020a) integrates both approaches in a co-training framework, but it also increases the computational cost. Previous methods that focus on label noise do not consider out-of-distribution input or input corruption, which limits their performance in real-world scenarios. Furthermore, using a model's own predictions to relabel samples can cause confirmation bias, where prediction errors accumulate and harm performance. We propose a new direction for effective learning from noisy data.
Our method embeds images into noise-robust low-dimensional representations, and regularizes the geometric structure of the representations with contrastive learning. Specifically, our algorithmic contributions include:

• We propose noise-robust contrastive learning, which introduces two contrastive losses. The first is an unsupervised consistency contrastive loss. It enforces inputs with perturbations to have similar normalized embeddings, which helps learn robust and discriminative representations.

• Our second contrastive loss is a weakly-supervised mixup prototypical loss. We compute class prototypes as normalized mean embeddings, and enforce each sample's embedding to be closer to its class prototype than to the other class prototypes.
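The two contrastive losses above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the paper's implementation: the function names, the temperature value, and the plain (non-mixup) form of the prototypical loss are our simplifications.

```python
import numpy as np

def normalize(z, eps=1e-8):
    """L2-normalize embeddings along the last axis."""
    return z / (np.linalg.norm(z, axis=-1, keepdims=True) + eps)

def consistency_contrastive_loss(z_a, z_b, tau=0.3):
    """Unsupervised consistency loss: two perturbed views of the same
    image should have similar normalized embeddings. Implemented here
    as an InfoNCE-style loss where view b of image i is the positive
    for view a of image i, and all other images serve as negatives."""
    z_a, z_b = normalize(z_a), normalize(z_b)
    logits = z_a @ z_b.T / tau                    # (n, n) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))            # positives on the diagonal

def prototypical_loss(z, labels, prototypes, tau=0.3):
    """Weakly-supervised prototypical loss: pull each embedding toward
    its class prototype (a normalized mean embedding) and away from
    the other class prototypes."""
    z, prototypes = normalize(z), normalize(prototypes)
    logits = z @ prototypes.T / tau               # (n, num_classes)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(log_prob[np.arange(len(labels)), labels])
```

In the full method, the prototypical loss would additionally operate on Mixup-interpolated inputs and labels; the plain version above only shows the pull-toward-prototype structure.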


For each sample, we aggregate information from its top-k neighbors to create a pseudo-label. A subset of training samples with confident pseudo-labels is selected to compute the weakly-supervised losses. This process can effectively clean both label noise and out-of-distribution (OOD) noise.

Our experimental contributions include:

• We experimentally show that our method is robust to label noise, OOD input, and input corruption. Experiments are performed on multiple datasets with controlled noise and real-world noise, where our method achieves state-of-the-art performance.

• We demonstrate that the proposed noise cleaning method can effectively clean a majority of the label noise. It also learns a curriculum that gradually leverages more samples to compute the weakly-supervised losses as the pseudo-labels become more accurate.

• We validate the robustness of the learned low-dimensional representations by showing that (1) k-nearest neighbor classification outperforms the softmax classifier, and (2) OOD samples can be separated from in-distribution samples. The efficacy of the proposed autoencoder is also verified.
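The neighbor-aggregation step described above can be sketched as follows. This is a minimal NumPy version with hypothetical names (`knn_pseudo_labels`, the `threshold` parameter) that are not taken from the paper.

```python
import numpy as np

def knn_pseudo_labels(embeddings, soft_labels, k=5, threshold=0.8):
    """Create a pseudo-label for each sample by averaging the (soft)
    labels of its top-k nearest neighbors in the normalized embedding
    space, then keep only samples whose pseudo-label is confident.
    `soft_labels` is an (n, num_classes) matrix, e.g. one-hot noisy
    labels or smoothed network predictions. Returns the pseudo-labels
    and a boolean mask marking the confident subset."""
    z = embeddings / (np.linalg.norm(embeddings, axis=1, keepdims=True) + 1e-8)
    sim = z @ z.T                              # cosine similarities
    np.fill_diagonal(sim, -np.inf)             # exclude the sample itself
    nn = np.argsort(-sim, axis=1)[:, :k]       # indices of top-k neighbors
    pseudo = soft_labels[nn].mean(axis=1)      # aggregate neighbor labels
    confident = pseudo.max(axis=1) >= threshold
    return pseudo, confident
```

A mislabeled sample surrounded by correctly labeled neighbors gets its pseudo-label voted back to the true class, while samples in ambiguous regions (e.g. OOD inputs) fall below the confidence threshold and are excluded.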

2. RELATED WORK

Label noise learning. Learning from noisy labels has been extensively studied in the literature. While some methods require access to a small set of clean samples (Xiao et al., 2015; Vahdat, 2017; Veit et al., 2017; Lee et al., 2018; Hendrycks et al., 2018), most methods focus on the more challenging scenario where no clean labels are available. These methods can be categorized into two major types. The first type performs label correction using predictions from the network (Reed et al., 2015; Ma et al., 2018; Tanaka et al., 2018; Yi & Wu, 2019). The second type tries to separate clean samples from corrupted samples and trains the model on the clean samples (Han et al., 2018; Arazo et al., 2019; Jiang et al., 2018; 2020; Wang et al., 2018; Chen et al., 2019; Lyu & Tsang, 2020). Such methods, however, do not enjoy the same level of robustness as ours.

Contrastive learning. Contrastive learning is at the core of recent self-supervised representation learning methods (Chen et al., 2020; He et al., 2019; Oord et al., 2018; Wu et al., 2018). In self-supervised contrastive learning, two randomly augmented images are generated for each input image. A contrastive loss is then applied to pull embeddings from the same source image closer, while pushing embeddings from different source images apart. Recently, prototypical contrastive learning (PCL) (Li et al., 2020b) has been proposed, which uses cluster centroids as prototypes, and trains the network by pulling an image embedding closer to its assigned prototypes.
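As a concrete illustration of the PCL idea, here is a minimal spherical k-means sketch that produces cluster centroids usable as prototypes. The function name and the even-stride initialization heuristic are ours, not from the PCL paper.

```python
import numpy as np

def kmeans_prototypes(z, num_clusters, iters=10):
    """Minimal spherical k-means: cluster L2-normalized embeddings and
    return the normalized centroids, which PCL-style methods use as
    prototypes (each image is pulled toward its assigned centroid)."""
    z = z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-8)
    step = max(1, len(z) // num_clusters)
    centers = z[::step][:num_clusters].copy()   # evenly spaced init (simple heuristic)
    for _ in range(iters):
        assign = np.argmax(z @ centers.T, axis=1)  # nearest centroid by cosine
        for c in range(num_clusters):
            members = z[assign == c]
            if len(members):
                centers[c] = members.mean(axis=0)
        centers /= np.linalg.norm(centers, axis=1, keepdims=True) + 1e-8
    return centers, assign
```

The centroids returned here could be plugged into a prototype-based loss in place of the class prototypes, which is the unsupervised setting PCL targets.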



¹Code is in the supplementary material.



Figure 1: Google search images from the WebVision (Li et al., 2017) dataset with the keyword "orange", illustrating label noise, out-of-distribution input, and input corruption.

The recently proposed DivideMix (Li et al., 2020a) effectively combines label correction and sample selection with the Mixup (Zhang et al., 2018) data augmentation under a co-training framework. However, it costs 2× the computational resources of our method.
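For reference, the Mixup augmentation mentioned above trains on convex combinations of input pairs and their labels. A minimal sketch (the function name and default `alpha` are illustrative choices, not values from the cited papers):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=4.0, rng=None):
    """Mixup (Zhang et al., 2018): interpolate a pair of inputs and
    their (soft) labels with lam ~ Beta(alpha, alpha), so the model is
    trained on convex combinations rather than raw noisy samples."""
    rng = rng or np.random.RandomState(0)
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
```

Because the mixed label is a convex combination, a wrongly labeled sample contributes only a fraction of its incorrect label to any training target, which is one reason Mixup helps under label noise.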

