LEARNING TO SEGMENT FROM NOISY ANNOTATIONS: A SPATIAL CORRECTION APPROACH

Abstract

Noisy labels can significantly affect the performance of deep neural networks (DNNs). In medical image segmentation tasks, annotations are error-prone due to the high demands on annotation time and annotator expertise. Existing methods mostly assume that noisy labels in different pixels are i.i.d. However, segmentation label noise usually has strong spatial correlation and prominent bias in its distribution. In this paper, we propose a novel Markov model for noisy segmentation annotations that encodes both spatial correlation and bias. Further, to mitigate such label noise, we propose a label correction method that progressively recovers the true labels. We provide theoretical guarantees of the correctness of the proposed method. Experiments show that our approach outperforms current state-of-the-art methods on both synthetic and real-world noisy annotations.¹

1. INTRODUCTION

Noisy annotations are inevitable in large-scale datasets and can heavily impair the performance of deep neural networks (DNNs) due to their strong memorization power (Zhang et al., 2016; Arpit et al., 2017). Image segmentation also suffers from the label noise problem. For medical images, segmentation quality is highly dependent on the annotators' expertise and the time they spend. In practice, medical students and residents in training are often recruited to annotate, potentially introducing errors (Gurari et al., 2015; Kohli et al., 2017). We also note that even among experts, there can be poor consensus on objects' locations and boundaries (Menze et al., 2014; Joskowicz et al., 2018; Zhang et al., 2020a). Furthermore, segmentation annotations require detailed pixel/voxel-level delineations of the objects of interest. Annotating objects with complex boundaries and structures is especially time-consuming. Thus, errors are naturally introduced when annotating at scale. Segmentation is the first step of most analysis pipelines. Inaccurate segmentation can introduce error into measurements such as morphology, which can be important for downstream diagnostic and prognostic tasks (Wang et al., 2019a; Nafe et al., 2005). Therefore, it is important to develop training methods that are robust to segmentation label noise. However, despite the many existing methods addressing label noise in classification tasks (Patrini et al., 2017; Yu et al., 2019; Zhang & Sabuncu, 2018; Li et al., 2020; Liu et al., 2020; Zhang et al., 2021; Xia et al., 2021), limited progress has been made in the context of image segmentation. A few existing approaches to segmentation label noise (Zhu et al., 2019; Zhang et al., 2020b;a) directly apply methods from classification label noise. However, these methods assume the label noise for each pixel is i.i.d. (independent and identically distributed).
This assumption is not realistic in the segmentation context, where annotation is often done with brushes, and error is usually introduced near the boundary of objects. Regions further away from the boundary are less likely to be mislabeled (see Fig. 1c for an illustration). Therefore, in segmentation tasks, label noise across pixels is spatially correlated. An i.i.d. label noise results in unrealistic annotations, as in Fig. 1b. We propose a novel label noise model for segmentation annotations. Our model simulates the real annotation scenario, in which an annotator uses a brush to delineate the boundary of an object. The noisy boundary can be considered a random yet continuous distortion of the true boundary. To capture this noise behavior, we propose a Markov process model. At each step of the process, two Bernoulli variables control the expansion/shrinkage decision and the spatially dependent expansion/shrinkage strength along the boundary. This ensures that the noisy label is a continuous distortion of the ground truth label along the boundary, as shown in Fig. 1c. Our model also includes a random flipping noise, which allows random (yet sparse) mislabels to appear even in regions far from the boundary. Based on our Markov label noise model, we propose a novel algorithm to recover the true labels by removing the bias. Since correcting model bias without any reference is almost impossible (Massart & Nédélec, 2006), our algorithm requires a clean validation set, i.e., a set of well-curated annotations, to estimate and correct the bias introduced by the label noise. We prove theoretically that only a small amount of validation data is needed to fully correct the bias and clean the noise. Empirically, we show that a single annotated validation image is enough for the bias correction, which is quite reasonable in practice. Furthermore, we generalize our algorithm to an iterative method that repeatedly trains a segmentation model and corrects labels until convergence. Since our algorithm, called Spatial Correction (SC), is separate from the DNN training process, it is agnostic to the backbone DNN architecture and can be combined with any segmentation model.
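As a rough illustration (not the paper's exact formulation; the disk geometry, parameter names, and step sizes here are our own), the boundary-distortion idea can be sketched in Python: a Markov chain over discretized boundary angles makes neighbouring boundary pixels expand or shrink together, while a second, sparse Bernoulli variable adds isolated flips anywhere in the image:

```python
import numpy as np

def markov_boundary_noise(size=64, radius=20, p_dir=0.5, p_flip=0.002, seed=0):
    """Toy sketch of spatially correlated segmentation noise on a disk.

    A Markov chain over boundary angles perturbs the radius: at each
    angular step a Bernoulli draw decides expansion vs. shrinkage, so
    adjacent boundary pixels are distorted together. A second, sparse
    Bernoulli adds isolated flips far from the boundary.
    """
    rng = np.random.default_rng(seed)
    cy = cx = size // 2
    yy, xx = np.mgrid[:size, :size]
    r = np.hypot(yy - cy, xx - cx)
    theta = np.arctan2(yy - cy, xx - cx)          # angle in [-pi, pi]
    clean = r <= radius                            # ground-truth disk mask

    # Markov chain over discretized angles: offset_{t} = offset_{t-1} +/- 1
    n_ang = 360
    offset = np.zeros(n_ang)
    for t in range(1, n_ang):
        direction = 1 if rng.random() < p_dir else -1   # expand or shrink
        offset[t] = np.clip(offset[t - 1] + direction, -5, 5)

    # each pixel inherits the radial perturbation of its angular bin
    ang_idx = ((theta + np.pi) / (2 * np.pi) * n_ang).astype(int) % n_ang
    noisy = r <= radius + offset[ang_idx]

    # sparse i.i.d. flips, which may land anywhere (interior/exterior)
    flips = rng.random(clean.shape) < p_flip
    noisy = np.where(flips, ~noisy, noisy)
    return clean, noisy
```

Because consecutive offsets differ by at most one pixel, the distorted boundary stays continuous, matching the brush-annotation behavior described above, while `p_flip` controls the sparse off-boundary mislabels.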
On a variety of benchmarks, our method demonstrates superior performance over different state-of-the-art (SOTA) baselines. To summarize, our contributions are threefold.
• We propose a Markov model for segmentation label noise. To the best of our knowledge, this is the first noise model tailored to the segmentation task that accounts for spatial correlation.
• We propose an algorithm to correct the Markov label noise. Although a validation set is required to combat bias, we prove that the algorithm needs only a small amount of validation data to fully recover the clean labels.
• We extend the algorithm to an iterative approach (SC) that can handle more general label noise, and show on various benchmarks that it outperforms SOTA baselines.
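The iterative train-then-correct scheme mentioned above can be illustrated with a deliberately simplified stand-in (all names and the correction rule are ours, not the paper's implementation): here a 3×3 majority filter plays the role of the spatial label-correction step, and the segmentation-training step is elided, to show only the outer loop's repeat-until-convergence structure:

```python
import numpy as np

def majority_smooth(label, k=3):
    """Stand-in for a label-correction step: a k-by-k majority filter
    removes isolated flips and smooths jagged boundary noise."""
    h, w = label.shape
    pad = k // 2
    p = np.pad(label.astype(int), pad, mode="edge")
    votes = np.zeros((h, w), dtype=int)
    for dy in range(k):
        for dx in range(k):
            votes += p[dy:dy + h, dx:dx + w]
    return votes > (k * k) // 2

def iterative_correction(noisy, rounds=5):
    """Toy sketch of an iterative correction loop: repeatedly correct
    the labels (the training step is omitted here) until they stop
    changing, then return the final labels."""
    labels = noisy
    for _ in range(rounds):
        new = majority_smooth(labels)
        if np.array_equal(new, labels):
            break
        labels = new
    return labels
```

In the actual algorithm the correction step is driven by a trained segmentation model and the validation-estimated bias rather than a fixed filter; this sketch only conveys the alternating structure.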

2. RELATED WORK

Methods addressing classification label noise can be categorized into two classes: model re-calibration and data re-calibration. Model re-calibration methods focus on training a robust network from the given noisy labels. Some estimate a noise matrix through special designs of network architectures (Sukhbaatar et al., 2015; Goldberger & Ben-Reuven, 2017) or loss functions (Patrini et al., 2017; Hendrycks et al., 2018). Others design loss functions that are robust to label noise (Zhang & Sabuncu, 2018; Wang et al., 2019b; Liu & Guo, 2020; Lyu & Tsang, 2020; Ma et al., 2020).
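As a concrete instance of such a robust loss, the generalized cross entropy (GCE) of Zhang & Sabuncu (2018) is L_q(p, y) = (1 − p_y^q)/q: it recovers cross entropy as q → 0 and mean absolute error (up to scaling) at q = 1. A minimal numpy sketch (function and argument names are ours):

```python
import numpy as np

def gce_loss(probs, labels, q=0.7):
    """Generalized cross entropy (Zhang & Sabuncu, 2018).

    L_q = (1 - p_y^q) / q, averaged over the batch. Small q behaves
    like cross entropy (strong gradients on hard examples); q = 1
    behaves like mean absolute error (bounded, noise-robust).

    probs:  (N, C) softmax outputs
    labels: (N,)   integer class indices
    """
    p_y = probs[np.arange(len(labels)), labels]  # probability of the given label
    return np.mean((1.0 - p_y ** q) / q)
```

The single hyperparameter q thus trades the classification strength of cross entropy against the noise robustness of mean absolute error.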



Code is available at https://github.com/michaelofsbu/SpatialCorrection.



Figure 1: (a) Original image with the true segmentation boundary (blue dashed line). (b) A classification label noise model is unrealistic in the segmentation context: the label noise (small squares) spreads all over the mask. (c) A realistic segmentation noise generated by our noise model. The noise mostly consists of distortions of the boundary; a few random flips appear in the interior/exterior.

For example, generalized cross entropy (GCE) (Zhang & Sabuncu, 2018) and symmetric cross entropy (SCE) (Wang et al., 2019b) combine the robustness of mean absolute error with the classification strength of cross entropy loss. Other methods (Xia et al., 2021; Liu et al., 2020; Wei et al., 2021) add a regularization term to prevent the network from overfitting to noisy labels. Model re-calibration methods usually rely on strong assumptions and have limited performance when the noise rate is high. Data re-calibration methods achieve SOTA performance by either selecting trustworthy data or correcting labels that are suspected to be noisy. Methods such as Co-teaching (Han et al., 2018) and others (Jiang et al., 2018; Yu et al., 2019) filter out noisy labels and train the network only on clean samples. Most recently, Tanaka et al. (2018); Zheng et al. (2020); Zhang et al. (2021) propose methods that correct noisy labels using network predictions. Li et al. (2020) extend these methods by maintaining two networks and relabeling each sample with a linear combination of the original label and the confidence of the peer network on augmented input.

Training Segmentation Models with Label Noise. Most existing methods adapt classification approaches to the segmentation task. Zhu et al. (2019) use the sample re-weighting technique to train a robust model by placing more weight on reliable samples. Zhang et al. (2020c) extend Co-teaching (Han et al., 2018) to Tri-teaching. Three networks are trained jointly, and each pair of

