

Abstract

Given multiple source datasets with labels, how can we train a target model with no labeled data? Multi-source domain adaptation (MSDA) aims to train a model using multiple source datasets different from a target dataset in the absence of target data labels. MSDA is a crucial problem applicable to many practical cases where labels for the target data are unavailable due to privacy issues. Existing MSDA frameworks are limited since they align data without considering the conditional distributions p(x|y) of each domain. They also do not fully utilize the unlabeled target data, and rely on limited feature extraction with a single extractor. In this paper, we propose MULTI-EPL, a novel method for multi-source domain adaptation. MULTI-EPL exploits label-wise moment matching to align conditional distributions p(x|y), uses pseudolabels for the unavailable target labels, and introduces an ensemble of multiple feature extractors for accurate domain adaptation. Extensive experiments show that MULTI-EPL provides the state-of-the-art performance for multi-source domain adaptation tasks in both image and text domains.

1. INTRODUCTION

Given multiple source datasets with labels, how can we train a target model with no labeled data? Large training data are essential for training deep neural networks. Collecting abundant data is unfortunately an obstacle in practice; even if enough data are obtained, manually labeling them is prohibitively expensive. Using other available or much cheaper datasets would be a solution to these limitations; however, indiscriminate usage of other datasets often incurs severe generalization error due to the presence of dataset shifts (Torralba & Efros (2011)). Unsupervised domain adaptation (UDA) tackles this problem, where no labeled data from the target domain are available but labeled data from other source domains are provided. Finding domain-invariant features has been the focus of UDA since it allows knowledge transfer from the labeled source dataset to the unlabeled target dataset.

There have been many efforts to transfer knowledge from a single source domain to a target one. Most recent frameworks minimize the distance between the two domains with deep neural networks and distance-based techniques such as discrepancy regularizers (Long et al. (2015; 2016; 2017)), adversarial networks (Ganin et al. (2016); Tzeng et al. (2017)), and generative networks (Liu et al. (2017); Zhu et al. (2017); Hoffman et al. (2018b)).

While the above-mentioned approaches consider a single source, we address multi-source domain adaptation (MSDA), which is more practical in real-world applications as well as more challenging. MSDA can bring significant performance enhancement by virtue of access to multiple datasets, as long as the multiple domain shift problems are resolved. Previous works have extensively presented both theoretical analyses (Ben-David et al. (2010); Mansour et al. (2008); Crammer et al. (2008); Hoffman et al. (2018a); Zhao et al. (2018); Zellinger et al. (2020)) and models (Zhao et al. (2018); Xu et al. (2018); Peng et al. (2019)) for MSDA. MDAN (Zhao et al. (2018)) and DCTN (Xu et al. (2018)) build adversarial networks for each source domain to generate features domain-invariant enough to confound domain classifiers. However, these approaches do not encompass the shifts among source domains, counting only the shifts between source and target domains. M³SDA (Peng et al. (2019)) adopts a moment matching strategy but makes the unrealistic assumption that matching the marginal probability p(x) would guarantee the alignment of the conditional probability p(x|y). Most of these methods also do not fully exploit the knowledge of the target domain, owing to the inaccessibility of the target labels. Furthermore, all these methods leverage a single feature extractor, which possibly misses important information regarding label classification.

In this paper, we propose MULTI-EPL (Multi-source domain adaptation with Ensemble of feature extractors, Pseudolabels, and Label-wise moment matching), a novel MSDA framework which mitigates the limitations of previous methods: not explicitly considering the conditional probability p(x|y), not fully utilizing the unlabeled target data, and relying on only one feature extractor. The model architecture is illustrated in Figure 1. MULTI-EPL aligns the conditional probability p(x|y) by label-wise moment matching. We employ pseudolabels for the inaccessible target labels to maximize the usage of the target data. Moreover, generating an ensemble of features from multiple feature extractors gives abundant label information to the extracted features. Extensive experiments show the superiority of our method. Our contributions are summarized as follows:

• Method. We propose MULTI-EPL, a novel approach for MSDA that effectively obtains domain-invariant features from multiple domains by matching the conditional probability p(x|y), utilizing pseudolabels for the inaccessible target labels to fully exploit the target data, and using an ensemble of multiple feature extractors. It extracts domain-invariant features while capturing the intrinsic differences between different labels.
• Analysis. We theoretically prove that minimizing the label-wise moment matching loss is relevant to bounding the target error.
• Experiments. We conduct extensive experiments on image and text datasets. We show that 1) MULTI-EPL provides the state-of-the-art accuracy, and 2) each of our main ideas significantly contributes to the superior performance.

2. RELATED WORK

Single-source Domain Adaptation. Given a labeled source dataset and an unlabeled target dataset, single-source domain adaptation aims to train a model that performs well on the target domain. The challenge of single-source domain adaptation is to reduce the discrepancy between the two domains and to obtain appropriate domain-invariant features. Various discrepancy measures such as Maximum Mean Discrepancy (MMD) (Tzeng et al. (2014); Long et al. (2015; 2016; 2017); Ghifary et al. (2016)) and KL divergence (Zhuang et al. (2015)) have been used as regularizers. Inspired by the insight that domain-invariant features should exclude clues about their domain, constructing adversarial networks against domain classifiers has shown superior performance. Liu et al. (2017) and Hoffman et al. (2018b) deploy GANs to transform data across the source and target domains, while Ganin et al. (2016) and Tzeng et al. (2017) leverage adversarial networks to extract common features of the two domains. Unlike these works, we focus on multiple source domains.

Multi-source Domain Adaptation. Single-source domain adaptation should not be naively employed for multiple source domains due to the shifts between source domains. Many previous works have tackled MSDA problems theoretically. Mansour et al. (2008) establish the distribution weighted combining rule: the weighted combination of source hypotheses is a good approximation for the target hypothesis. The rule is further extended to a stochastic case with a joint distribution over the input and output spaces in Hoffman et al. (2018a). Crammer et al. (2008) propose a general theory of how to sift appropriate samples out of multi-source data using expected loss. Efforts to find transferable knowledge from multiple sources from the causal viewpoint are made in Zhang et al. (2015). There have been salient studies on the learning bounds for MSDA. Ben-David et al. (2010) found generalization bounds based on H∆H-divergence, which are further tightened by Zhao et al. (2018). Frameworks for MSDA have been presented as well. Zhao et al. (2018) propose learning algorithms based on the generalization bounds for MSDA. DCTN (Xu et al. (2018)) resolves domain and category shifts between source and target domains via adversarial networks. M³SDA (Peng et al. (2019)) associates all the domains into a common distribution by aligning the moments of the feature distributions of multiple domains. Lin et al. (2020) focus on visual sentiment classification tasks and attempt to find the common latent space of source and target domains. Wang et al. (2020) consider the interactions among multiple domains and reflect the information by constructing a knowledge graph. However, all these methods do not consider multimode structures (Pei et al. (2018)): differently labeled data follow distinct distributions even if they are drawn from the same domain. Also, the domain-invariant features in these methods contain the label information for only one label classifier, which leads these methods to miss a large amount of label information.

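To illustrate how label-wise moment matching differs from matching the marginal p(x), the sketch below compares class-conditional feature moments between a source domain and a pseudolabeled target domain, summing the gaps over classes. This is a simplified NumPy sketch, not the paper's implementation: the function name, the use of raw moments up to order k, and the Euclidean distance between moments are illustrative assumptions.

```python
import numpy as np

def labelwise_moment_distance(feat_s, y_s, feat_t, y_t_pseudo, num_classes, k=2):
    """Sum over classes of the gaps between the first k feature moments of
    source examples and target examples sharing that (pseudo)label."""
    total = 0.0
    for c in range(num_classes):
        fs = feat_s[y_s == c]          # source features with label c
        ft = feat_t[y_t_pseudo == c]   # target features with pseudolabel c
        if len(fs) == 0 or len(ft) == 0:
            continue                   # skip classes absent from a batch
        for order in range(1, k + 1):
            ms = (fs ** order).mean(axis=0)   # order-th raw moment per dimension
            mt = (ft ** order).mean(axis=0)
            total += np.linalg.norm(ms - mt)  # Euclidean gap between moments
    return total
```

Because the comparison is restricted to examples of the same (pseudo)label, driving this quantity to zero aligns the conditional distributions p(x|y) rather than only the marginal p(x).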

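For comparison, the discrepancy regularizers used in the single-source methods above, such as MMD, measure the distance between whole feature distributions without reference to labels. A minimal NumPy sketch of the biased squared-MMD estimator with an RBF kernel, for illustration only (the bandwidth parameter gamma is an arbitrary choice here):

```python
import numpy as np

def mmd_rbf(x, y, gamma=1.0):
    """Biased squared MMD between samples x and y with an RBF kernel:
    mean k(x,x') + mean k(y,y') - 2 * mean k(x,y)."""
    def kernel(a, b):
        # pairwise squared distances, then RBF kernel values
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()
```

Minimizing such a statistic aligns only the marginal p(x); samples of different classes can still be mixed together, which is the limitation that label-wise matching addresses.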