DEEP LEARNING FROM CROWDSOURCED LABELS: COUPLED CROSS-ENTROPY MINIMIZATION, IDENTIFIABILITY, AND REGULARIZATION

Abstract

Using noisy crowdsourced labels from multiple annotators, a deep learning-based end-to-end (E2E) system aims to learn the label correction mechanism and the neural classifier simultaneously. To this end, many E2E systems concatenate the neural classifier with multiple annotator-specific "label confusion" layers and co-train the two parts in a parameter-coupled manner. The formulated coupled cross-entropy minimization (CCEM)-type criteria are intuitive and work well in practice. Nonetheless, theoretical understanding of the CCEM criterion has been limited. The contribution of this work is twofold: First, performance guarantees of the CCEM criterion are presented. Our analysis reveals for the first time that the CCEM can indeed correctly identify the annotators' confusion characteristics and the desired "ground-truth" neural classifier under realistic conditions, e.g., when only incomplete annotator labeling and finite samples are available. Second, based on the insights learned from our analysis, two regularized variants of the CCEM are proposed. The regularization terms provably enhance the identifiability of the target model parameters in various more challenging cases. A series of synthetic and real data experiments are presented to showcase the effectiveness of our approach.
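To make the coupled architecture concrete, the following NumPy sketch illustrates a CCEM-type criterion under stated assumptions: a linear map plus softmax stands in for the neural classifier, the confusion layers are row-stochastic matrices, and all variable names and toy dimensions are hypothetical, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

K, M, D, N = 3, 4, 5, 8                  # classes, annotators, features, items (toy sizes)
W = rng.normal(size=(D, K))              # stand-in for the neural classifier's parameters
X = rng.normal(size=(N, D))              # feature vectors
Y = rng.integers(0, K, size=(M, N))      # toy noisy labels, one row per annotator

# Annotator-specific "confusion layers": A[m, j, k] = P(annotator m outputs k | true class j);
# each row of A[m] is a probability vector.
A = rng.dirichlet(np.ones(K), size=(M, K))

def ccem_loss(W, A, X, Y):
    """Coupled cross-entropy: the classifier posterior f(x) is pushed through
    every annotator's confusion layer and scored against that annotator's labels."""
    f = softmax(X @ W)                   # (N, K) classifier posterior
    loss = 0.0
    for m in range(A.shape[0]):
        p_m = f @ A[m]                   # predicted distribution over annotator m's labels
        loss += -np.log(p_m[np.arange(X.shape[0]), Y[m]] + 1e-12).mean()
    return loss / A.shape[0]
```

In the E2E systems discussed above, the classifier parameters and the confusion layers are optimized jointly by gradient descent on such a loss; the sketch only shows the forward criterion, not the training loop or the identifiability-enhancing regularizers.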

1. INTRODUCTION

The success of deep learning has escalated the demand for labeled data to an unprecedented level. Some learning tasks can easily consume millions of labeled data items (Najafabadi et al., 2015; Goodfellow et al., 2016). However, acquiring data labels is a nontrivial task: it often requires a pool of annotators with sufficient domain expertise to manually label the data items. For example, the popular Microsoft COCO dataset contains 2.5 million images, and around 20,000 annotator work hours were spent on its category labeling (Lin et al., 2014).

Crowdsourcing is considered an important working paradigm for data labeling. In crowdsourcing platforms, e.g., Amazon Mechanical Turk (Buhrmester et al., 2011), Crowdflower (Wazny, 2017), and ClickWork (Vakharia & Lease, 2013), data items are dispatched to and labeled by many annotators; the annotations are then integrated to produce reliable labels. A notable challenge is that annotator-provided labels can be considerably noisy, and training machine learning models with noisy labels can seriously degrade system performance (Arpit et al., 2017; Zhang et al., 2016a). In addition, the labels provided by individual annotators are often largely incomplete, as a dataset is typically divided and dispatched to different annotators. Early crowdsourcing methods often treat annotation integration and downstream operations, e.g., classification, as separate tasks; see, e.g., (Dawid & Skene, 1979; Karger et al., 2011a; Whitehill et al., 2009; Snow et al., 2008; Welinder et al., 2010; Liu et al., 2012; Zhang et al., 2016b; Ibrahim et al., 2019; Ibrahim & Fu, 2021). This pipeline estimates the annotators' confusion parameters (e.g., the confusion matrices under the Dawid & Skene (DS) model (Dawid & Skene, 1979)) in the first stage.
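The DS-model setting described above can be illustrated with a small simulation. The sketch below is hypothetical throughout: the confusion matrices, reliability level, and observation rate are toy choices, and plain majority voting stands in for more refined first-stage estimators (e.g., EM under the DS model). It simulates incomplete, confusion-matrix-corrupted annotations, aggregates them, and then estimates one annotator's confusion matrix from the aggregated labels, mirroring the first stage of the classic pipeline.

```python
import numpy as np

rng = np.random.default_rng(1)
K, M, N = 3, 5, 2000                     # classes, annotators, items (toy sizes)
p_obs = 0.6                              # each annotator labels ~60% of items (incompleteness)

y_true = rng.integers(0, K, size=N)

# DS-model confusion matrices: C[m, j, k] = P(annotator m outputs k | true label j).
# Diagonally dominant rows model mostly-reliable annotators (70% accurate here).
C = np.full((M, K, K), 0.3 / (K - 1))
C[:, np.arange(K), np.arange(K)] = 0.7

# Incomplete annotation: -1 marks "not labeled by this annotator".
Y = -np.ones((M, N), dtype=int)
for m in range(M):
    for i in range(N):
        if rng.random() < p_obs:
            Y[m, i] = rng.choice(K, p=C[m, y_true[i]])

# Stage 1 (aggregation): majority vote over the available annotations.
votes = np.zeros((N, K))
for m in range(M):
    mask = Y[m] >= 0
    votes[mask, Y[m, mask]] += 1
labeled = votes.sum(axis=1) > 0          # items that received at least one annotation
y_hat = votes.argmax(axis=1)
acc = (y_hat[labeled] == y_true[labeled]).mean()

# With aggregated labels in hand, annotator confusion can be estimated; e.g., annotator 0:
obs = (Y[0] >= 0) & labeled
C0_hat = np.zeros((K, K))
for j in range(K):
    sel = obs & (y_hat == j)
    for k in range(K):
        C0_hat[j, k] = np.mean(Y[0, sel] == k) if sel.any() else 0.0
```

Aggregating the redundant annotations pushes accuracy well above any single 70%-reliable annotator, and the estimated confusion matrix is diagonally dominant, as the simulated annotators are. The two-stage structure, first aggregate and estimate confusions, then train a downstream classifier, is exactly what the E2E approaches studied in this paper replace with joint training.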

