JOINT ATTENTION-DRIVEN DOMAIN FUSION AND NOISE-TOLERANT LEARNING FOR MULTI-SOURCE DOMAIN ADAPTATION

Abstract

Multi-source Unsupervised Domain Adaptation (MUDA) transfers knowledge from multiple source domains with labeled data to an unlabeled target domain. Recently, endeavors have been made to establish connections among different domains to enable feature interaction. However, as these approaches essentially enhance category information, they lack the transfer of domain-specific information. Moreover, little research has explored the connection between pseudo-label generation and the framework's learning capability, which is crucial for ensuring robust MUDA. In this paper, we propose a novel framework that significantly reduces the domain discrepancy and achieves new state-of-the-art performance. In particular, we first propose a Contrary Attention-based Domain Merge (CADM) module that enables interaction among features so as to mix domain-specific information across domains instead of focusing on category information. Second, to enable the network to correct pseudo labels during training, we propose an adaptive and reverse cross-entropy loss, which adaptively imposes constraints on the pseudo-label generation process. We conduct experiments on four benchmark datasets, showing that our approach can efficiently fuse all domains for MUDA while performing much better than prior methods.

1. INTRODUCTION

Deep neural networks (DNNs) have achieved excellent performance on various vision tasks under the assumption that training and test data come from the same distribution. However, different scenes have different illumination, viewing angles, and styles, which may cause the domain shift problem (Zhu et al., 2019; Tzeng et al., 2017; Long et al., 2016). This can eventually lead to a significant performance drop on the target task. Unsupervised Domain Adaptation (UDA) aims at addressing this issue by transferring knowledge from the source domain to the unlabeled target domain (Saenko et al., 2010). Early research mostly focused on Single-source Unsupervised Domain Adaptation (SUDA), which transfers knowledge from one source domain to the target domain. Accordingly, some methods align the feature distributions of the source and target domains (Tzeng et al., 2014), while others (Tzeng et al., 2017) learn domain invariants through adversarial learning. Liang et al. (2020) use label information to maintain a robust training process. However, data is usually collected from multiple domains in real-world scenarios, which gives rise to a more practical task, i.e., Multi-source Unsupervised Domain Adaptation (MUDA) (Duan et al., 2012). MUDA leverages all of the available data and thus enables performance gains; nonetheless, it introduces the new challenge of reducing domain shift between all source and target domains. To this end, some research (Peng et al., 2019) builds on SUDA, aiming to extract common domain-invariant features for all domains. Moreover, some works, e.g., Venkat et al. (2021); Zhou et al. (2021), focus on the classifier's predictions to achieve domain alignment. Recently, some approaches (Li et al., 2021; Wen et al., 2020) take advantage of the MUDA setting to create connections between domains. Overall, since the main challenge of MUDA is to eliminate the differences between all domains, there are two main ways to achieve this.
One is to extract domain-invariant features among all domains, i.e., to filter out domain-specific information for each domain. The other is to mix domain-specific information from different domains so that all domains share the mixed information and thus fuse into one domain. Previous approaches have mostly followed the former; however, filtering the domain-specific information of multiple domains can be difficult and often results in losing discrimination ability. For the latter, few methods have been proposed to address MUDA in this way, and there is a lack of effective frameworks to achieve such domain fusion, which is the main problem addressed by our proposed approach. Moreover, existing methods ignore the importance of generating reliable pseudo labels, as noisy pseudo labels can lead to the accumulation of prediction errors. Consequently, it is imperative to design robust pseudo-label generation within the MUDA framework.

In this paper, we propose a novel framework that better reduces the domain discrepancy and achieves new state-of-the-art (SoTA) MUDA performance, as shown in Fig. 1. Our method rests on two pivotal technical contributions. First, we propose a Contrary Attention-based Domain Merge (CADM) module (Sec. 4.2), whose role is to perform the domain fusion. Self-attention (Vaswani et al., 2017; Dosovitskiy et al., 2020) captures higher-order correlations between features and emphasizes the most relevant feature information, e.g., the semantically closest information. In contrast, our CADM uses contrary attention, enabling each domain to pay more attention to the semantically different, domain-specific information of other domains. By integrating this domain-specific information, each domain moves toward the other domains, thus resulting in domain fusion.
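The contrary-attention idea can be illustrated with a minimal sketch. This is our own simplification, not the paper's CADM implementation: instead of a softmax over pairwise similarities (as in standard self-attention), we take a softmax over *negated* similarities, so each feature aggregates most heavily from the features least like it.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def contrary_attention(feats):
    """Sketch of contrary attention over per-domain features.

    feats: (n, d) array, one feature vector per domain.
    Standard self-attention uses softmax(F F^T / sqrt(d)); here the
    similarity is negated, so each feature attends most to the
    *dissimilar* features, mixing in domain-specific information
    from the other domains.
    """
    sim = feats @ feats.T / np.sqrt(feats.shape[1])  # (n, n) scaled similarity
    attn = softmax(-sim, axis=-1)                    # contrary: negate similarity
    return attn @ feats                              # fused features, (n, d)
```

For two orthogonal features, the contrary weights put more mass on the other feature than on the feature itself, which is the opposite of ordinary self-attention.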
Second, to enable the network to correct the pseudo labels during training, we take pseudo-label generation as an optimization objective of the network by proposing an adaptive and reverse cross-entropy (AR-CE) loss (Sec. 4.3). It imposes optimizable constraints on pseudo-label generation, enabling the network to correct pseudo labels that tend to be wrong and reinforce pseudo labels that tend to be correct. We conduct extensive experiments on four benchmark datasets. The results show that our method achieves new state-of-the-art performance and is especially advantageous on large-scale datasets. In summary, we make the following contributions:

• We propose CADM to fuse the features of the source and target domains, bridging the gap between different domains and enhancing the discriminability of different categories.

• We propose a loss function, the AR-CE loss, to diminish the negative impact of noisy pseudo labels during training.

• Our method demonstrates new state-of-the-art (SoTA) MUDA performance on multiple benchmark datasets.
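The reverse-cross-entropy component can be sketched as follows. This is a generic noise-robust formulation, not the paper's exact AR-CE loss (the adaptive weighting is defined in Sec. 4.3 and is not reproduced here); the clamp value for log 0 is an assumption. Swapping the roles of prediction and pseudo label bounds the penalty that a confidently wrong pseudo label can contribute.

```python
import numpy as np

def reverse_cross_entropy(pred, pseudo_onehot, clip=-4.0):
    """Sketch of a reverse cross-entropy term: -sum_k pred_k * log(label_k).

    pred:          predicted class probabilities, shape (..., K).
    pseudo_onehot: one-hot pseudo label, shape (..., K).
    The zero entries of the one-hot label would give log(0) = -inf;
    they are clamped to `clip` (an assumed constant), so a noisy
    pseudo label yields a bounded, not infinite, loss.
    """
    log_label = np.where(pseudo_onehot > 0,
                         np.log(np.clip(pseudo_onehot, 1e-7, 1.0)),
                         clip)
    return -(pred * log_label).sum(axis=-1)
```

With clip = -4, a prediction agreeing with its pseudo label incurs a small loss, while a confident disagreement incurs a larger but still finite one, which is what lets training tolerate and gradually correct noisy pseudo labels.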



Figure 1: Overview of our proposed framework, which consists of three primary elements: (1) The feature extractor extracts features from various domains. (2) CADM is proposed to implement message passing and fuse features from different domains. The same domain is represented by the same color, while different shapes represent different classes. (3) We propose the AR-CE loss and use the maintained memory to compute the soft label and pseudo label (hard label) for the target domain.

