WHEN SOURCE-FREE DOMAIN ADAPTATION MEETS LEARNING WITH NOISY LABELS

Abstract

Recent state-of-the-art source-free domain adaptation (SFDA) methods have focused on learning meaningful cluster structures in the feature space, which has succeeded in adapting knowledge from the source domain to the unlabeled target domain without accessing the private source data. However, existing methods rely on the pseudo-labels generated by source models, which can be noisy due to domain shift. In this paper, we study SFDA from the perspective of learning with label noise (LLN). Unlike the label noise in the conventional LLN scenario, we prove that the label noise in SFDA follows a different distribution assumption. We further prove that this difference makes existing LLN methods, which rely on their distribution assumptions, unable to address the label noise in SFDA. Empirical evidence confirms that only marginal improvements are achieved when applying existing LLN methods to the SFDA problem. On the other hand, despite this fundamental difference between the two scenarios, we demonstrate theoretically that the early-time training phenomenon (ETP), previously observed in conventional label noise settings, can also be observed in the SFDA problem. Extensive experiments demonstrate significant improvements to existing SFDA algorithms by leveraging ETP to address the label noise in SFDA.

1. INTRODUCTION

Deep learning demonstrates strong performance on various tasks across different fields. However, it is limited by the requirement of large-scale labeled and independent and identically distributed (i.i.d.) data. Unsupervised domain adaptation (UDA) is thus proposed to mitigate the distribution shift between the labeled source and unlabeled target domain. In view of the importance of data privacy, it is crucial to be able to adapt a pre-trained source model to the unlabeled target domain without accessing the private source data, which is known as Source-Free Domain Adaptation (SFDA). The current state-of-the-art SFDA methods (Liang et al., 2020; Yang et al., 2021a;b) mainly focus on learning meaningful cluster structures in the feature space, and the quality of the learned cluster structures hinges on the reliability of pseudo labels generated by the source model. Among these methods, SHOT (Liang et al., 2020) purifies pseudo labels of target data based on nearest centroids, and the purified pseudo labels are then used to guide self-training. G-SFDA (Yang et al., 2021b) and NRC (Yang et al., 2021a) further refine pseudo labels by encouraging similar predictions between a data point and its neighbors. For a single target data point, when most of its neighbors are correctly predicted, these methods can provide an accurate pseudo label to the data point. However, as illustrated in Figure 1i (a-b), when the majority of its neighbors are incorrectly predicted to a category, it will be assigned an incorrect pseudo label, misleading the learning of cluster structures. The experimental result on VisDA (Peng et al., 2017), shown in Figure 1ii, further verifies this phenomenon. By directly applying the pre-trained source model on each target domain instance (central instance), we collect its neighbors and evaluate their quality.
We observed that for each class a large proportion of the neighbors are misleading (i.e., the neighbors' pseudo labels are different from the central instance's true label), some even with high confidence (e.g., the over-confident misleading neighbors whose prediction score is larger than 0.75). Based on this observation, we can conclude that: (1) the pseudo labels leveraged in current SFDA methods can be heavily noisy; (2) some pseudo-label purification methods utilized in SFDA, which severely rely on the quality of the pseudo label itself, will be affected by such label noise, and the prediction error will accumulate as the training progresses. More details can be found in Appendix A.

In this paper, we address the aforementioned problem by formulating SFDA as learning with label noise (LLN).
Unlike existing studies that heuristically rely on cluster structures or neighbors, we investigate the properties of label noise in SFDA and show that there is an intrinsic discrepancy between the SFDA and the LLN problems. Specifically, in conventional LLN scenarios, the label noise is generated by human annotators or image search engines (Patrini et al., 2017; Xiao et al., 2015; Xia et al., 2020a), where the underlying distribution assumption is that the mislabeling rate for a sample is bounded. However, in SFDA scenarios, the label noise is generated by the source model due to the distribution shift, and we prove that the mislabeling rate for a sample is much higher and can approach 1. We term the former label noise in LLN as bounded label noise and the latter label noise in SFDA as unbounded label noise. Moreover, we theoretically show that most existing LLN methods, which rely on the bounded label noise assumption, are unable to address the label noise in SFDA due to this fundamental difference (Section 3). To this end, we leverage the early-time training phenomenon (ETP) in LLN to address the unbounded label noise and to improve the efficiency of existing SFDA algorithms. Specifically, ETP indicates that classifiers can predict mislabeled samples with relatively high accuracy during the early learning phase, before they start to memorize the mislabeled data (Liu et al., 2020). Although ETP has been previously observed, it has only been studied under bounded random label noise in conventional LLN scenarios. In this work, we theoretically and empirically show that ETP still exists in the unbounded label noise scenario of SFDA. Moreover, we also empirically justify that existing SFDA algorithms can be substantially improved by leveraging ETP, which opens up a new avenue for SFDA.
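The distinction between the two noise models can be made concrete with a toy simulation (not from the paper; the 1-D Gaussian target domain and the mis-placed source decision boundary at x = 2 are illustrative assumptions). Bounded noise flips every label with a fixed rate below 0.5, whereas pseudo-labels from a shifted source model can be wrong for nearly every sample in a region of the input space:

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth 1-D binary problem: class = 1 iff x > 0.
x_target = rng.normal(loc=1.5, size=10_000)   # shifted target domain
y_true = (x_target > 0).astype(int)

# (a) Bounded label noise (conventional LLN): each label is flipped
# with a fixed rate rho < 0.5, independently of x.
rho = 0.3
flip = rng.random(x_target.size) < rho
y_bounded = np.where(flip, 1 - y_true, y_true)

# (b) Unbounded label noise (SFDA): pseudo-labels come from a source
# model whose decision boundary (x > 2) is wrong on the target domain.
y_pseudo = (x_target > 2).astype(int)

# The bounded noise rate stays near rho everywhere, while the pseudo-label
# mislabeling rate is 1 for every sample falling in 0 < x < 2.
region = (x_target > 0) & (x_target < 2)
print("bounded noise rate in region:  ", (y_bounded != y_true)[region].mean())
print("unbounded noise rate in region:", (y_pseudo != y_true)[region].mean())
```

Under these assumptions the mislabeling rate of the source model's pseudo-labels reaches 1 on the region between the true and the source decision boundaries, which is the regime that breaks the bounded-noise assumption of existing LLN methods.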
As an instantiation, we incorporate a simple early learning regularization (ELR) term (Liu et al., 2020) into existing SFDA objective functions, achieving consistent improvements on four different SFDA benchmark datasets. As a comparison, we also apply other existing LLN methods, including Generalized Cross Entropy (GCE) (Zhang & Sabuncu, 2018), Symmetric Cross Entropy Learning (SL) (Wang et al., 2019b), Generalized Jensen-Shannon Divergence (GJS) (Englesson & Azizpour, 2021), and Progressive Label Correction (PLC) (Zhang et al., 2021), to SFDA. Our empirical evidence shows that they are inappropriate for addressing the label noise in SFDA, consistent with our theoretical results (Section 4). Our main contributions can be summarized as follows: (1) We establish the connection between SFDA and LLN. Compared with the conventional LLN problem that assumes bounded label noise, the problem in SFDA can be viewed as LLN with unbounded label noise. (2)
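A minimal NumPy sketch of the ELR term described above may help fix ideas: ELR maintains an exponential moving average of each sample's predicted probabilities and penalizes deviation from it via a log(1 - <p, t>) term, which exploits ETP by anchoring training to the (more accurate) early predictions. The hyperparameter values and class interface below are illustrative, not the paper's implementation:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

class ELRLoss:
    """Sketch of Early Learning Regularization (Liu et al., 2020).

    targets[i] is an EMA of sample i's past predictions; the regularizer
    log(1 - <p_i, t_i>) discourages the model from drifting away from its
    early-phase predictions toward memorizing noisy (pseudo-)labels.
    """

    def __init__(self, n_samples, n_classes, beta=0.7, lam=3.0):
        self.targets = np.zeros((n_samples, n_classes))
        self.beta, self.lam = beta, lam

    def __call__(self, idx, logits, labels):
        p = softmax(logits)
        # EMA update of per-sample soft targets (treated as constants
        # w.r.t. the gradient in the actual method).
        self.targets[idx] = self.beta * self.targets[idx] + (1 - self.beta) * p
        ce = -np.log(p[np.arange(len(idx)), labels] + 1e-12).mean()
        reg = np.log(1.0 - (p * self.targets[idx]).sum(axis=1) + 1e-12).mean()
        return ce + self.lam * reg

# Example: one step on a batch of 4 samples with 3 classes.
loss_fn = ELRLoss(n_samples=4, n_classes=3)
logits = np.random.default_rng(0).normal(size=(4, 3))
loss = loss_fn(np.arange(4), logits, np.array([0, 1, 2, 0]))
```

In practice this term is simply added to an existing SFDA objective, so it composes with methods such as SHOT or NRC without changing their pseudo-labeling machinery.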




Figure 1: (i) (a) The SFDA problem can be formulated as an LLN problem. (b) The existing SFDA algorithms using local cluster information cannot address label noise due to its unbounded nature (Section 3). (c) We prove that ETP exists in SFDA, which can be leveraged to address the unbounded label noise (Section 4). (ii) Observed label noise phenomena on the VisDA dataset.

