SET-LEVEL SELF-SUPERVISED LEARNING FROM NOISILY-LABELED DATA

Abstract

Noisy labels are inevitably present in real-world datasets due to labeling errors or visual content ambiguity. Existing methods generally approach noisy label learning (NLL) either by properly regularizing the model or by reweighting clean/noisy labeled samples. While self-supervised learning (SSL) has been applied to pre-train deep neural networks without label supervision, downstream tasks like image classification still require clean labeled data. Moreover, most SSL strategies are performed at the instance level, without access to labels. In this paper, we propose set-level self-supervised learning (SLSSL), which performs SSL at the mini-batch level with observed noisy labels. By corrupting the labels of each training mini-batch, our SLSSL enforces sufficient robustness in the learned model. In addition, the proposed SLSSL can also serve as a sample reweighting technique. As a result, the proposed learning scheme can be applied as an expectation-maximization (EM) algorithm during model training. Extensive experiments on synthetic and real-world noisy label data confirm the effectiveness of our framework.

1. INTRODUCTION

Deep learning has shown tremendous success in numerous computer vision and machine learning tasks. However, collecting a large amount of precisely labeled data for training a deep neural network (DNN) is typically time-consuming and labor-intensive. Moreover, in practice, real-world datasets are usually annotated with noisy labels. In order to alleviate possible overfitting problems (Arpit et al., 2017; Zhang et al., 2017), noisy-label learning (NLL) has attracted attention from researchers in related fields (Frenay & Verleysen, 2014; Song et al., 2021). Recent deep-learning based NLL approaches can be categorized into two groups (Song et al., 2021; Karim et al., 2022). The first group focuses on loss correction (Patrini et al., 2017; Hendrycks et al., 2018; Xia et al., 2019; Wang et al., 2020; Yao et al., 2020), which learns a class-wise noise transition matrix to counteract the noise effect during training, so that the predicted labels can be updated accordingly. The second group of NLL works presents various sample selection algorithms (Li et al., 2020; Nishi et al., 2021; Karim et al., 2022), aiming at filtering out noisy samples. Once the noisy labels are removed, semi-supervised learning techniques can be applied for training learning models. While promising performances have been reported, the above learning strategies rely on the prediction of the derived noise transition matrix or instance weights, which require proper learning and estimation using the training data and their noisy labels. Instead of directly utilizing the noisy labels, self-supervised learning (SSL) has recently been applied to NLL tasks (Hendrycks et al., 2019; Ghosh & Lan, 2021; Yao et al., 2021; Ortego et al., 2021). By properly designing pretext tasks, an additional supervisory signal can be derived to improve the robustness of the model against label noise.
However, existing SSL approaches design pretext tasks by manipulating samples at the instance level, regardless of the correctness of their labels. While instance-level pretext tasks (e.g., rotation prediction or contrastive-based instance discrimination) are expected to produce proper representations from unlabeled data, it is not clear whether such techniques would result in robust representations when tackling the problem of noisy label learning (NLL). In this paper, we propose a unique SSL approach for NLL. More precisely, we present a set-level self-supervised learning (SLSSL) strategy for training NLL models.

Figure 1: Illustration of our set-level self-supervised learning (SLSSL). Unlike instance-level SSL approaches, our SLSSL augments an image set (e.g., mini-batch) by manipulating its labels. By maximizing the agreement between the two augmented versions, our SLSSL results in learning models which are robust to noisily-labeled data.

As illustrated in Fig. 1, given a set (mini-batch) of training samples, our SLSSL augments noisily labeled data by corrupting a portion of its labels for updating the DNN through a single-step optimization, while the updated model is enforced to maximize the performance agreement between the different augmented versions. As detailed later in Sect. 3, our SLSSL objective is formed to estimate the class-wise noise transition matrix, allowing us to enhance the robustness of the learned model. In addition, we show that the proposed SLSSL can be utilized to reweight samples for sample selection purposes. Unlike existing works that perform sample selection by assuming that instances with small losses carry clean labels, our SLSSL learns to assign larger weights to those samples resulting in significant performance degradation under label corruption, identifying the data with clean labels accordingly.
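The paper does not provide an implementation of the set-level label-corruption augmentation at this point; a minimal NumPy sketch of the core idea, corrupting a fraction of a mini-batch's observed labels, might look as follows (the function name `corrupt_labels`, the symmetric-corruption choice, and the corruption ratio are our own illustrative assumptions):

```python
import numpy as np

def corrupt_labels(labels, num_classes, corrupt_ratio, rng):
    """Return an augmented copy of a mini-batch's labels in which a
    random fraction `corrupt_ratio` is flipped to a different class
    chosen uniformly at random (symmetric corruption)."""
    labels = np.asarray(labels).copy()
    n = len(labels)
    n_corrupt = int(round(corrupt_ratio * n))
    idx = rng.choice(n, size=n_corrupt, replace=False)
    for i in idx:
        # sample a replacement label different from the current one
        choices = [c for c in range(num_classes) if c != labels[i]]
        labels[i] = rng.choice(choices)
    return labels

rng = np.random.default_rng(0)
batch_labels = [0, 1, 2, 3, 4, 0, 1, 2]
aug = corrupt_labels(batch_labels, num_classes=5, corrupt_ratio=0.25, rng=rng)
# exactly 2 of the 8 labels differ from the originals
```

In the full framework, the model would be updated once on each augmented label set and the agreement between the resulting predictions maximized; the sketch above only covers the set-level augmentation step itself.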
Finally, we demonstrate that our SLSSL can be realized as an expectation-maximization (EM)-like algorithm, with E-steps focusing on training the model with noisily labeled data, and M-steps identifying clean data samples for training. As verified in our experiments, this alternating training strategy further boosts the performance of our framework. The contributions of this paper are highlighted below:
• We propose set-level self-supervised learning (SLSSL) to tackle noisy-label learning (NLL) tasks, which augments image sets and enforces the model to be robust to noisy labels.
• By systematically corrupting the labels during training and enforcing prediction consistency between the associated models, our SLSSL can be applied to estimate the noise transition matrix, which introduces sufficient robustness against noisy labels into the learned model.
• Our SLSSL can further be utilized to identify the label quality of each training sample, so that sample selection for NLL can be performed accordingly.
• Our proposed learning strategy can be viewed as an EM-like algorithm, which alternates between model training and sample reweighting for improved NLL.
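The alternating scheme above can be sketched in a few lines. This is a hypothetical skeleton, not the paper's implementation: `slssl_weights`, `em_round`, and the normalization of the loss degradation are our own naming and design choices, and the training/evaluation routines are left as callbacks.

```python
import numpy as np

def slssl_weights(losses_clean_aug, losses_corrupt_aug):
    """Hypothetical M-step: samples whose loss degrades most when their
    labels are corrupted are assumed to carry clean labels, so they
    receive larger (normalized) weights."""
    degradation = np.maximum(losses_corrupt_aug - losses_clean_aug, 0.0)
    total = degradation.sum()
    if total > 0:
        return degradation / total
    return np.full_like(degradation, 1.0 / len(degradation))

def em_round(train_step, eval_losses, data, weights):
    """One EM-like round: the E-step trains the model on weighted data,
    the M-step recomputes per-sample weights from the loss degradation
    under label corruption. `train_step(data, weights)` and
    `eval_losses(model, data, corrupt)` are assumed user-supplied callbacks."""
    model = train_step(data, weights)                      # E-step
    l_clean = eval_losses(model, data, corrupt=False)
    l_corrupt = eval_losses(model, data, corrupt=True)
    return model, slssl_weights(l_clean, l_corrupt)        # M-step
```

Iterating `em_round` alternates model updates with sample reweighting, mirroring the E-/M-step structure described above.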

2. RELATED WORKS

Loss Correction for NLL A number of NLL works (Goldberger & Ben-Reuven, 2017; Patrini et al., 2017; Hendrycks et al., 2018; Xia et al., 2019; Wang et al., 2020; Yao et al., 2020; Zhu et al., 2022) focus on estimating the class-wise noise transition matrix of noisy training data, which describes the relationships between noisy labels and their ground-truth counterparts and thus can be applied to refine the predicted outputs accordingly. It is shown in (Patrini et al., 2017) that minimizing such a corrected loss toward noisy labels is equivalent to optimizing the DNN toward the ground-truth labels. However, despite their theoretical foundation, how to accurately estimate the noise transition matrix remains a challenging problem, especially when no clean training/validation sets are available.
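To make the loss-correction idea concrete, the following is a minimal NumPy sketch of forward correction in the spirit of Patrini et al. (2017): the model's clean-class posterior is mapped through the transition matrix before computing cross-entropy against the observed noisy label. The function name and the toy 2-class matrix are our own illustration, not the cited implementation.

```python
import numpy as np

# T[i, j] = P(noisy label = j | true label = i): the class-wise noise
# transition matrix that loss-correction methods try to estimate.

def forward_corrected_ce(p_clean, noisy_label, T):
    """Cross-entropy of the noise-adjusted prediction T^T p against the
    observed noisy label (forward correction)."""
    p_noisy = T.T @ p_clean          # map clean posterior into noisy-label space
    return -np.log(p_noisy[noisy_label] + 1e-12)

# Toy 2-class example: 30% of class-0 samples are mislabeled as class 1.
T = np.array([[0.7, 0.3],
              [0.0, 1.0]])
p = np.array([0.9, 0.1])             # model believes the true class is 0
loss = forward_corrected_ce(p, noisy_label=1, T=T)
# corrected loss tolerates the noisy "1": P(noisy=1) = 0.9*0.3 + 0.1*1.0 = 0.37
```

With a well-estimated T, a confident clean-class prediction incurs only a moderate loss on a plausibly-flipped noisy label, which is exactly the counteracting effect described above; the hard part, as noted, is estimating T without clean data.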

