UNIFIED PRINCIPLES FOR MULTI-SOURCE TRANSFER LEARNING UNDER LABEL SHIFTS

Abstract

We study the label shift problem in multi-source transfer learning and derive new generic principles. Our proposed framework unifies the principles of conditional feature alignment, label distribution ratio estimation, and domain relation weight estimation. Guided by these principles, we provide a unified practical framework for three multi-source label shift transfer scenarios: learning with limited target data, unsupervised domain adaptation, and partial unsupervised domain adaptation. We evaluate the proposed method on these scenarios through extensive experiments and show that our algorithm significantly outperforms the baselines.

1. INTRODUCTION

Transfer learning (Pan & Yang, 2009) is based on the motivation that learning a new task is easier after having learned several similar tasks. By learning the inductive bias from a set of related source domains (S_1, ..., S_T) and then leveraging the shared knowledge upon learning the target domain T, the prediction performance can be significantly improved. Based on this, transfer learning arises in deep learning applications such as computer vision (Zhang et al., 2019; Tan et al., 2018; Hoffman et al., 2018b), natural language processing (Ruder et al., 2019; Houlsby et al., 2019) and biomedical engineering (Raghu et al., 2019; Lundervold & Lundervold, 2019; Zhang & An, 2017). To ensure a reliable transfer, it is critical to understand the theoretical assumptions between the domains. One implicit assumption in most transfer learning algorithms is that the label proportions remain unchanged across different domains (Du Plessis & Sugiyama, 2014), i.e., S(y) = T(y). However, in many real-world applications, the label distributions can vary markedly (i.e., label shift) (Wen et al., 2014; Lipton et al., 2018; Li et al., 2019b), in which case existing approaches cannot guarantee a small target generalization error, as recently proved by Combes et al. (2020). Moreover, transfer learning becomes more challenging when transferring knowledge from multiple sources to build a model for the target domain, as this requires effectively selecting and leveraging the most useful source domains when label shift occurs. This is not only theoretically interesting but also commonly encountered in real-world applications. For example, in medical diagnostics, the disease distribution changes across countries (Liu et al., 2004; Geiss et al., 2014). Considering the task of diagnosing a disease in a country without sufficient data, how can we leverage the information from different countries with abundant data to help with the diagnosis?
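To make the label shift setting concrete, the following sketch illustrates how the label distribution ratio w(y) = T(y)/S(y) can be recovered from a black-box classifier's predictions alone, in the spirit of Lipton et al. (2018). This is a minimal illustration under the label shift assumption (identical conditionals across domains), not the method proposed in this paper; the function and variable names are our own.

```python
import numpy as np

def estimate_label_ratio(joint_conf, target_pred_dist):
    """Estimate w(y) = T(y) / S(y) from predictions only.

    joint_conf[i, j]     = P_S(predict i, true label j), measured on
                           held-out labeled source data.
    target_pred_dist[i]  = fraction of unlabeled target points the same
                           classifier predicts as class i.

    Under label shift, P(predict i | y = j) is the same in both domains,
    so the ratio w satisfies the linear system  joint_conf @ w = target_pred_dist.
    """
    w = np.linalg.solve(joint_conf, target_pred_dist)
    return np.clip(w, 0.0, None)  # a probability ratio cannot be negative

# Toy check with 2 classes and a fixed per-class prediction profile A,
# which label shift assumes is shared across domains.
A = np.array([[0.9, 0.2],
              [0.1, 0.8]])           # A[i, j] = P(predict i | y = j)
p_S = np.array([0.5, 0.5])           # source label marginal
p_T = np.array([0.8, 0.2])           # shifted target label marginal
joint_conf = A * p_S                 # P_S(predict i, y = j)
mu_T = A @ p_T                       # prediction distribution on target
print(estimate_label_ratio(joint_conf, mu_T))  # recovers p_T / p_S = [1.6, 0.4]
```

In practice `joint_conf` and `mu_T` are empirical estimates, so the recovered ratio is noisy; the point of the toy check is only that, when the conditionals truly match, the ratio is identifiable without any target labels.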
Obviously, naïvely combining all the sources and applying a one-to-one single-source transfer learning algorithm can lead to undesired results, as it can include low-quality or even untrusted data from certain sources, which can severely influence the performance. In this paper, we study the label shift problem in multi-source transfer learning, where S_t(y) ≠ T(y). We propose unified principles that are applicable to three common transfer scenarios: unsupervised Domain Adaptation (DA) (Ben-David et al., 2010), limited target labels (Mansour et al., 2020) and partial unsupervised DA with supp(T(y)) ⊆ supp(S_t(y)) (Cao et al., 2018), where prior works generally treated them as separate scenarios. It should be noted that this work deals with target shift without assuming that the semantic conditional distributions are identical (i.e., S_t(x|y) = T(x|y)), which is more realistic for real-world problems. Our contributions in this paper are twofold: (I) We propose to use the Wasserstein distance (Arjovsky et al., 2017) to develop a new target generalization risk upper bound (Theorem 1), which reveals the importance of label distribution ratio

