PROGRESSIVE MIXUP AUGMENTED TEACHER-STUDENT LEARNING FOR UNSUPERVISED DOMAIN ADAPTATION

Abstract

Unsupervised Domain Adaptation (UDA) aims to transfer knowledge learned from a labeled source domain to an unlabeled target domain, mostly by learning a domain-invariant feature representation. Currently, the best-performing UDA methods use category-level domain alignment to capture fine-grained information, resulting in significantly improved performance over global alignment. While successful, category-level UDA methods suffer from unreliable pseudo-labels for target data. Additionally, most UDA methods adapt directly from the source to the target domain without regard for the large domain discrepancy. In this paper, we propose a UDA approach with teacher-student learning in which the teacher network provides more reliable target pseudo-labels for the student during training. Furthermore, we use a progressive mixup augmentation strategy that generates intermediate samples which become increasingly target-dominant as training progresses. Aligning the source and intermediate domains allows the model to gradually transfer fine-grained domain knowledge from the source to the target domain while minimizing the negative impact of noisy target pseudo-labels. This progressive mixup augmented teacher-student (PMATS) training strategy, along with class subset sampling and clustering-based pseudo-label refinement, achieves state-of-the-art performance on two public UDA benchmark datasets: Office-31 and Office-Home.

1. INTRODUCTION

Unsupervised Domain Adaptation (UDA) has become a popular research topic because it is necessary for applying deep learning models to real-world scenarios. Often, there exists a domain gap (Quinonero-Candela et al., 2008) between training data and real-world test data that degrades model performance at test time. Collecting and labeling data from various domains is impractical because it is both time consuming and labor intensive. Though semi-supervised learning (Berthelot et al., 2019; Sohn et al., 2020) and unsupervised learning (Gidaris et al., 2018) have been studied to improve generalizability to unseen and unlabeled data, these approaches assume that the labeled and unlabeled data come from a similar domain. UDA specifically tackles the problem of data distribution shift, or domain shift, between the labeled source data and the unlabeled target data by transferring knowledge learned from the source domain to the target domain.

UDA methods can mostly be split into two categories. Adversarial adaptation methods (Ganin & Lempitsky, 2015; Tzeng et al., 2017), inspired by generative adversarial networks (GANs), introduce a domain discriminator to encourage domain confusion in the feature generator through domain-adversarial objectives. This minimizes the gap between source and target distributions by learning domain-invariant features. Statistical adaptation methods (Long et al., 2015; 2017) align source and target domain distributions by minimizing a statistical discrepancy measure, such as maximum mean discrepancy (MMD) or joint MMD (JMMD), between the two domains. Most domain adaptation methods in these two categories directly align source and target domain distributions (Ganin & Lempitsky, 2015; Tzeng et al., 2017; Long et al., 2015; 2017) without consideration for the large domain discrepancy, e.g., from synthetic images to real images (Peng et al., 2017) or from art to real images (Venkateswara et al., 2017).
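For concreteness, the statistical discrepancy mentioned above can be illustrated with the standard biased empirical estimate of the squared MMD. The notation here is an assumption introduced only for illustration and is not taken from this paper: $x_i^s$ and $x_j^t$ denote source and target features, $n_s$ and $n_t$ the sample counts, and $k$ a positive-definite kernel such as a Gaussian kernel. The cited methods may use variants of this estimate (for example, multiple kernels or several network layers).

```latex
% Biased empirical estimate of the squared MMD between source and target features.
% Assumed notation: x_i^s, x_j^t are features, n_s, n_t are sample counts, k is a kernel.
\widehat{\mathrm{MMD}}^2(X_s, X_t)
  = \frac{1}{n_s^2} \sum_{i=1}^{n_s} \sum_{j=1}^{n_s} k\!\left(x_i^s, x_j^s\right)
  + \frac{1}{n_t^2} \sum_{i=1}^{n_t} \sum_{j=1}^{n_t} k\!\left(x_i^t, x_j^t\right)
  - \frac{2}{n_s n_t} \sum_{i=1}^{n_s} \sum_{j=1}^{n_t} k\!\left(x_i^s, x_j^t\right)
```

The first two terms measure within-domain similarity and the third measures cross-domain similarity, so the estimate is small when source and target features are hard to tell apart under $k$.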
Even when Na et al. (2021) and Hua & Guo (2020) attempt to address the large domain gap, they only create

