IDENTIFIABILITY OF LABEL NOISE TRANSITION MATRIX

Abstract

The noise transition matrix plays a central role in the problem of learning with noisy labels. Among many other reasons, a large number of existing solutions rely on access to it. Identifying and estimating the transition matrix without ground-truth labels is a critical yet challenging task. When the label noise depends on each instance, identifying the instance-dependent noise transition matrix becomes substantially more challenging. Despite recent works proposing solutions for learning from instance-dependent noisy labels, the field lacks a unified understanding of when such a problem remains identifiable. The goal of this paper is to characterize the identifiability of the label noise transition matrix. Building on Kruskal's identifiability results, we show the necessity of multiple noisy labels in identifying the noise transition matrix for the generic case at the instance level. We further instantiate the results to relate to the successes of state-of-the-art solutions and show how additional assumptions alleviate the requirement of multiple noisy labels. Our results also reveal that disentangled features are helpful in the above identification task, for which we provide empirical evidence.

1. INTRODUCTION

The literature on learning with noisy labels concerns the scenario where the observed labels Ỹ can differ from the true labels Y. The noise transition matrix T(X), defined as the transition probability from Y to Ỹ given X, plays a central role in this problem. Among many other benefits, knowledge of T(X) has demonstrated its use in performing risk corrections (Natarajan et al., 2013; Patrini et al., 2017a), label corrections (Patrini et al., 2017a), and constraint corrections (Wang et al., 2021a). Beyond these, it also finds applications in ranking small-loss samples (Han et al., 2020) and detecting corrupted samples (Zhu et al., 2021a). On the other hand, applying the wrong transition matrix T(X) can lead to a number of issues. The literature has well-documented evidence that a wrongly inferred transition matrix can lead to performance drops (Natarajan et al., 2013; Liu & Wang; Xia et al., 2019; Zhu et al., 2021c) and a false sense of fairness (Wang et al., 2021a; Liu & Wang). Knowing whether a T(X) is identifiable helps us understand whether the underlying noisy learning problem is indeed learnable. Prior works have documented challenges in estimating noise transition matrices when the quality of the available training information is unclear. For instance, Zhu et al. (2022) show that when the quality of representations drops, the estimation error in T(X) increases significantly (Figure 1 therein). Other references have documented these challenges as well (Xia et al., 2019). We also provide experiments validating this argument in Appendix C.4.
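To make the role of T(X) concrete, the following is a minimal sketch of forward loss correction in the spirit of Patrini et al. (2017a), for the simplified class-dependent case where T(X) is a single known matrix T. All names and the toy numbers below are our own illustration, not the paper's method.

```python
import numpy as np

def forward_corrected_loss(clean_posterior, noisy_label, T):
    """Cross-entropy on the observed (noisy) label after pushing the
    model's clean-label posterior through T, where
    T[i, j] = P(noisy label = j | clean label = i):
        p_noisy = T^T @ p_clean.
    """
    noisy_posterior = T.T @ clean_posterior
    return -np.log(noisy_posterior[noisy_label])

# A toy 2-class example: class 0 flips to 1 with prob. 0.2,
# class 1 flips to 0 with prob. 0.3.
T = np.array([[0.8, 0.2],
              [0.3, 0.7]])
p_clean = np.array([0.9, 0.1])  # model's belief over the clean label

# Corrected loss when the observed noisy label is 1:
# noisy posterior for label 1 is 0.9*0.2 + 0.1*0.7 = 0.25.
loss = forward_corrected_loss(p_clean, noisy_label=1, T=T)
```

Minimizing this corrected loss on noisy data recovers, in expectation, the clean-label classifier — which is exactly why using the wrong T (or wrong T(X)) propagates errors into the learned model.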


Earlier results have focused on the class- but not instance-dependent transition matrix T(X) ≡ T := [P(Ỹ = j|Y = i)]_{i,j}, ∀X. The literature has discussed the identifiability of T under the mixture proportion estimation setup (Scott, 2015) and has identified a reducibility condition for inferring the inverse noise rate. Later works developed a sequence of solutions to estimate T under a variety of assumptions, including irreducibility (Scott, 2015), anchor points (Liu & Tao, 2016; Xia et al., 2019; Yao et al., 2020a), separability (Cheng et al., 2020), rankability (Northcutt et al., 2017; 2021), redundant labels/tensors (Liu et al., 2020; Traganitis et al., 2018; Zhang et al., 2014), and clusterability (Zhu et al., 2021c), among others (Zhang et al., 2021; Li et al., 2021).
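The anchor-point assumption mentioned above can be illustrated with a short synthetic sketch. If x is an anchor point for class i, i.e. P(Y = i|x) = 1, then P(Ỹ = j|x) = Σ_k P(Ỹ = j|Y = k) P(Y = k|x) = T_{i,j}, so row i of T can be read off from the noisy-label posterior at x. The simulation below is our own illustration under this assumption, not an implementation from any of the cited works.

```python
import numpy as np

# Ground-truth class-dependent transition matrix:
# T_true[i, j] = P(noisy label = j | clean label = i).
T_true = np.array([[0.9, 0.1],
                   [0.2, 0.8]])

rng = np.random.default_rng(0)

def estimate_row(clean_class, n=200_000):
    """Estimate row `clean_class` of T by observing many noisy labels
    drawn at a perfect anchor point of that class, where the noisy-label
    distribution is exactly T_true[clean_class]."""
    noisy = rng.choice(2, size=n, p=T_true[clean_class])
    return np.bincount(noisy, minlength=2) / n

# Stack the per-class estimates into an estimated transition matrix.
T_hat = np.vstack([estimate_row(0), estimate_row(1)])
# T_hat approximates T_true up to sampling noise.
```

In practice the two difficulties are that true anchor points must be found (or assumed to exist) in the data, and that the noisy-label posterior at those points must itself be estimated by a model — both of which break down in the instance-dependent setting this paper studies.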

