BAYESIAN METRIC LEARNING FOR ROBUST TRAINING OF DEEP MODELS UNDER NOISY LABELS

Abstract

Label noise is a natural consequence of data collection and annotation, and it has been shown to significantly degrade the performance of deep learning models by reducing accuracy and increasing sample complexity. This paper develops a novel, theoretically sound Bayesian deep metric learning method that is robust against noisy labels. Our proposed approach is inspired by linear Bayesian large margin nearest neighbor classification, and combines Bayesian learning, triplet loss-based deep metric learning, and variational inference. We theoretically show the robustness of our proposed method under label noise. Experimental results on benchmark data sets containing both synthetic and realistic label noise show a considerable improvement in classification accuracy over both linear Bayesian metric learning and point estimate deep metric learning.

1. INTRODUCTION

Deep learning has emerged as a dominant learning framework in various domains of machine learning and computer vision. One of its major limitations is that it often requires relatively clean data sets free of label noise, which naturally arises from human labeling errors, measurement errors, subjective biases, and other issues (Frénay et al., 2014; Ghosh et al., 2017; Algan & Ulusoy, 2019). The performance of a machine learning method can be significantly affected by noisy labels, both through a reduction in accuracy and an increase in sample complexity. For deep learning in particular, a deep neural network (DNN) can generalize poorly when trained on sets containing a high proportion of noisy labels, since a DNN can overfit such noisy training data (Zhang et al., 2016; Algan & Ulusoy, 2020). Developing deep learning methods that perform well on noisy training data is essential, since it enables the use of deep models in many real-life applications. Several approaches have been proposed to handle learning issues caused by label noise, for example: data cleaning (Angelova et al., 2005; Chu et al., 2016), label correction (Reed et al., 2014), additional linear correction layers (Sukhbaatar et al., 2014), dimensionality-driven learning (Ma et al., 2018), bootstrapping (Reed et al., 2014), curriculum learning-based approaches such as MentorNet (Jiang et al., 2018) or Co-teaching (Han et al., 2018), loss correction (or noise-tolerant losses) (Masnadi-Shirazi & Vasconcelos, 2009; Ghosh et al., 2017; Zhang & Sabuncu, 2018; Thulasidasan et al., 2019; Ma et al., 2020), and combinations of the techniques above (Li et al., 2020; Nguyen et al., 2019).
Relevant to this paper is an existing theoretically sound approach, Bayesian large margin nearest neighbor classification (BLMNN) (Wang & Tan, 2018), which employs Bayesian inference to improve the robustness of a point estimation-based linear metric learning method. BLMNN introduces a method to approximate the posterior distribution of the underlying distance parameter given the triplet data using stochastic variational inference. More importantly, BLMNN (Wang & Tan, 2018) also provides a theoretical guarantee of robustness, showing that it can work with non-uniform label noise. However, although BLMNN has been mathematically shown to be robust against label noise, it only considers a simple linear Mahalanobis distance, which cannot capture the nonlinear relationships between data points that deep metric learning exploits (Lu et al., 2017). In this paper, we introduce a Bayesian deep metric learning framework that is robust against noisy labels. Our proposed method (depicted in Fig. 1) is inspired by BLMNN (Wang & Tan, 2018), deep metric learning (Hoffer & Ailon, 2015; Hu et al., 2015; Wang et al., 2017; Lu et al., 2017; Do et al., 2019), and Bayes by Backprop (Blundell et al., 2015). Compared to BLMNN, which only considers linear metric learning, our framework can handle non-linear deep metric learning, which is useful for many real-life applications. Moreover, directly applying variational Bayes learning (Wang & Tan, 2018) to deep learning is challenging, since it requires sampling from a distribution over the neural network parameters. Instead, we adapt the variational inference of Blundell et al. (Blundell et al., 2015), which allows the parameters of a Bayesian neural network to be sampled efficiently using a backpropagation-compatible algorithm. We also theoretically show the robustness of our proposed method when working with label noise.

Figure 1: An overview of our proposed Bayesian Deep Metric Learning method.
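The backpropagation-compatible sampling of Blundell et al. (2015) rests on the reparameterization trick: a weight is drawn as a deterministic function of the variational parameters and parameter-free Gaussian noise, so gradients can flow to the variational parameters by ordinary backpropagation. The following is a minimal NumPy sketch of that sampling step only (not the authors' full method); the names `sample_weights`, `mu`, and `rho` are illustrative choices, not from the paper.

```python
import numpy as np

def sample_weights(mu, rho, rng):
    """Draw one weight sample via the reparameterization trick of
    Bayes by Backprop: w = mu + softplus(rho) * eps.
    The softplus keeps the standard deviation positive, and the noise
    eps is independent of the variational parameters (mu, rho), so
    gradients w.r.t. mu and rho are obtained by backpropagation."""
    sigma = np.log1p(np.exp(rho))          # softplus ensures sigma > 0
    eps = rng.standard_normal(mu.shape)    # parameter-free Gaussian noise
    return mu + sigma * eps

# Toy usage: variational parameters for a 3x2 weight matrix.
rng = np.random.default_rng(0)
mu = np.zeros((3, 2))
rho = -3.0 * np.ones((3, 2))               # small initial sigma = log(1 + e^-3)
w = sample_weights(mu, rho, rng)
print(w.shape)                             # (3, 2)
```

In training, a fresh sample of the weights would be drawn for each forward pass, and the loss (here, a triplet loss plus the KL term of the variational objective) would be backpropagated through `mu` and `rho`.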
Experimental results on several noisy data sets show that our proposed method generalizes better than the linear BLMNN (Wang & Tan, 2018) and point estimation-based deep metric learning (Hoffer & Ailon, 2015; Lu et al., 2017), especially as the noise level increases. It is important to emphasize that the motivation of our method is to produce a better calibrated model that is more robust to training with noisy labels and, as a result, less likely to overfit the training set than the linear BLMNN (Wang & Tan, 2018) or point estimation-based deep metric learning (Hoffer & Ailon, 2015; Lu et al., 2017). This paper therefore introduces a new theoretical framework for noisy label learning, rather than presenting a method designed to compete with the field's best approaches (such as MentorNet (Jiang et al., 2018) or Co-teaching (Han et al., 2018)) on large-scale data sets (e.g., WebVision (Li et al., 2017) and Clothing1M (Xiao et al., 2015)). Furthermore, deep metric learning has in fact been considered in Bayesian settings before (Ishfaq et al., 2018; Karaletsos et al., 2015), and more recently in (Lin et al., 2018), but not in the context of noisy labels. Consequently, our proposed framework can be used by other methods that deal with noisy label learning, although extending those methods with our approach is out of the scope of this paper.

2.1. POINT ESTIMATION-BASED DISTANCE METRIC LEARNING

The goal of distance metric learning (or metric learning) is to learn a distance function that measures the similarity between training samples. Metric learning has achieved great success in many visual applications such as face recognition, image classification, visual search, visual tracking, and person re-identification (Lu et al., 2017). In principle, a supervised metric learning method aims to learn a distance metric that pulls together samples from the same class while pushing apart those from different classes. Based on the complexity of the distance, metric learning methods can be classified into two types: linear methods, focusing on linear distances (e.g., Mahalanobis), which often fail to capture the nonlinear relationships between data points (Lu et al., 2017); and non-linear methods, which nowadays are mostly based on deep learning.
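To make the pull/push objective concrete, the following is a minimal sketch of the two ingredients discussed above: the squared Mahalanobis distance that linear methods such as BLMNN learn, and a hinge-style triplet loss over (anchor, positive, negative) samples. The function names and the margin value are illustrative, not taken from any of the cited methods.

```python
import numpy as np

def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance d_M(x, y)^2 = (x - y)^T M (x - y).
    M must be positive semi-definite for this to be a valid metric;
    M = I recovers the plain squared Euclidean distance."""
    d = x - y
    return float(d @ M @ d)

def triplet_loss(anchor, positive, negative, M, margin=1.0):
    """Hinge-style triplet loss: zero when the positive (same class)
    is closer to the anchor than the negative (different class) by at
    least `margin`; otherwise it penalizes the violation."""
    return max(0.0, mahalanobis_sq(anchor, positive, M)
                    - mahalanobis_sq(anchor, negative, M) + margin)

# Toy usage with the identity metric.
M = np.eye(2)
a = np.array([0.0, 0.0])   # anchor
p = np.array([1.0, 0.0])   # positive: same class as the anchor
n = np.array([3.0, 0.0])   # negative: different class
print(mahalanobis_sq(a, p, M))   # 1.0
print(triplet_loss(a, p, n, M))  # max(0, 1 - 9 + 1) = 0.0
```

In deep metric learning, the fixed matrix M is replaced by a learned embedding network, and the same triplet loss is applied to the embedded samples; this is the non-linear setting the rest of the paper builds on.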

