IDENTIFYING LATENT CAUSAL CONTENT FOR MULTI-SOURCE DOMAIN ADAPTATION

Anonymous

Abstract

Multi-source domain adaptation (MSDA) learns to predict labels for target-domain data in the setting where data from multiple source domains are labelled and data from the target domain are unlabelled. Most methods for this task focus on learning representations that are invariant across domains. However, their success relies heavily on the assumption that the label distribution remains consistent across domains, which may not hold in general real-world problems. In this paper, we propose a new and more flexible assumption, termed latent covariate shift, in which a latent content variable z_c and a latent style variable z_s are introduced into the generative process, with the marginal distribution of z_c changing across domains and the conditional distribution of the label given z_c remaining invariant across domains. We show that although (completely) identifying the proposed latent causal model is challenging, the latent content variable can be identified up to scaling by exploiting its dependence on the labels from the source domains, together with the identifiability conditions of nonlinear ICA. This motivates us to propose a novel method for MSDA that learns the invariant label distribution conditional on the latent content variable, instead of learning invariant representations. Empirical evaluations on simulated and real data demonstrate the effectiveness of the proposed method.

1. INTRODUCTION

Traditional machine learning requires the training and testing data to be independent and identically distributed (Vapnik, 1999). This strict assumption may not be fulfilled in many real-world applications. For example, in medical applications it is common to train a model on patients from a few hospitals and seek to generalize it to a new hospital (Zech et al., 2018). In this case, it is often reasonable to consider that the distributions of data from the training hospitals differ from that of the new hospital (Koh et al., 2021). Domain adaptation is a promising research area for handling such problems. In this work, we focus on the multi-source DA (MSDA) setting, where source data are collected from multiple domains. Formally, let x denote the input (e.g., an image), y denote the label in the source and target domains, and D denote the domain index. We observe labelled data pairs (x_S, y_S) drawn from the joint distributions p(x, y | D = 1), ..., p(x, y | D = m), ..., p(x, y | D = M) in the source domains, and unlabelled inputs x_T drawn from the marginal distribution p(x | D = T) in the target domain. The training phase of MSDA uses the sets of (x_S, y_S) and x_T to train a predictor that provides a satisfactory estimate of y_T in the target domain. The key to MSDA is to understand how the joint distribution p_D(x, y) changes across the source and target domains. Most early methods assume that the change of the joint distribution results from Covariate Shift (Huang et al., 2006; Bickel et al., 2007; Sugiyama et al., 2007; Wen et al., 2014), i.e., p_D(x, y) = p_D(y|x) p_D(x) as depicted in Figure 1(a): p_D(x) changes across domains, while the conditional distribution p_D(y|x) is invariant. Such an assumption may not always hold in real applications, e.g., image classification. For example, invariance of p_D(y|x) implies that p_D(y) should change as p_D(x) changes.
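The implication at the end of the paragraph above can be checked numerically: if p(y|x) is fixed across domains, then the label marginal p_D(y) = E_{x~p_D(x)}[p(y|x)] necessarily moves when p_D(x) moves. The following is a minimal sketch; the logistic labelling rule and the Gaussian input marginals are illustrative assumptions, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def p_y_given_x(x):
    # Invariant labelling rule shared by all domains: P(y = 1 | x).
    return 1.0 / (1.0 + np.exp(-x))

# Two domains that differ only in the input marginal p_D(x).
x_src = rng.normal(loc=-1.0, scale=1.0, size=100_000)
x_tgt = rng.normal(loc=+1.0, scale=1.0, size=100_000)

# Marginal label probability p_D(y = 1) = E_x[ P(y = 1 | x) ].
py_src = p_y_given_x(x_src).mean()
py_tgt = p_y_given_x(x_tgt).mean()

print(f"p(y=1) source: {py_src:.3f}")  # well below 0.5
print(f"p(y=1) target: {py_tgt:.3f}")  # well above 0.5
```

Under pure covariate shift the two printed values cannot coincide unless p(y|x) is constant, which is exactly why keeping p_D(y) fixed while changing only style information violates the covariate-shift assumption.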
However, we can easily change style information (e.g., hue, view) in images to change p_D(x) while keeping p_D(y) unchanged, which is common in classification but violates the assumption. In contrast to covariate shift, most current works consider Conditional Shift, as depicted in Figure 1(b). It assumes that the conditional p_D(x|y) changes while p_D(y) is invariant across domains (Zhang et al., 2013; 2015; Schölkopf et al., 2012; Stojanov et al., 2021; Peng et al., 2019). This setting motivates a popular class of methods that learn representations invariant across domains to approximate the latent content variable z_c in Figure 1(b) (Ganin et al., 2016; Zhao et al., 2018; Saito et al., 2018; Mancini et al., 2018; Yang et al., 2020; Wang et al., 2020; Li et al., 2021; Stojanov et al., 2021). However, the label distribution p_D(y) may change across domains in many real application scenarios (Tachet des Combes et al., 2020; Lipton et al., 2018; Zhang et al., 2013), in which case learning invariant representations may degrade performance. In theory, there exists an upper bound on the performance of learning invariant representations when the label distribution changes across domains (Zhao et al., 2019). To better understand and handle such changes, we consider latent covariate shift (LCS) and propose a latent causal model that formulates the data- and label-generating process by introducing a latent style variable z_s to complement z_c, as depicted in Figure 2. To analyse the identifiability of the proposed causal model, we also introduce latent noise variables n_c and n_s, which represent unmeasured factors influencing z_c and z_s, respectively. As a result, we can leverage recent progress on the identifiability of nonlinear ICA (Hyvarinen et al., 2019; Khemakhem et al., 2020) to analyse the identifiability of the proposed latent causal model.
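The generative process described above can be sketched with a toy instantiation: exogenous noises n_c and n_s, a domain-dependent content variable z_c, an invariant conditional p(y|z_c), and an observation x that nonlinearly mixes content and style. Everything below (the Gaussian noises, the logistic p(y|z_c), and the tanh mixing) is an illustrative assumption chosen for the sketch, not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_domain(mu_c, mu_s, n=50_000):
    """One domain of a toy latent-covariate-shift process."""
    n_c = rng.normal(size=n)              # exogenous noise for content
    n_s = rng.normal(size=n)              # exogenous noise for style
    z_c = mu_c + n_c                      # p_D(z_c) changes with the domain
    z_s = mu_s + n_s                      # style also changes with the domain
    # Invariant conditional: P(y = 1 | z_c) = sigmoid(2 * z_c) in every domain.
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-2.0 * z_c))).astype(int)
    # Observation nonlinearly mixes content and style.
    x = np.tanh(np.stack([z_c + 0.5 * z_s, z_s - 0.3 * z_c]))
    return z_c, y, x

z1, y1, _ = sample_domain(mu_c=-1.0, mu_s=0.0)
z2, y2, _ = sample_domain(mu_c=+1.0, mu_s=2.0)

# p_D(z_c) shifts across domains, so the label marginal p_D(y) shifts too ...
print(z1.mean(), z2.mean())
print(y1.mean(), y2.mean())
# ... while p(y = 1 | z_c) evaluated at the same z_c value stays the same:
bin1 = y1[np.abs(z1) < 0.1].mean()
bin2 = y2[np.abs(z2) < 0.1].mean()
print(bin1, bin2)  # both close to 0.5, the invariant value at z_c = 0
```

This is exactly the structure that makes a predictor based on z_c transferable: the label marginal p_D(y) differs between the two simulated domains, yet conditioning on the content variable recovers the same p(y|z_c) in both.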
We show that although completely identifying the proposed latent causal model is often not possible without further assumptions, due to transitivity in the latent space, partially identifying the latent content variable z_c up to scaling is tractable, by integrating the identifiability results of nonlinear ICA with the dependence between n_c and y. This motivates us to propose a novel method that learns the invariant conditional distribution p_D(y|z_c) under LCS, instead of learning invariant representations. Relying on the guaranteed identifiability of z_c, the proposed method provides a principled way to ensure that z_c can be identified on the target-domain data, and that the learned predictor p_D(y|z_c) generalizes to the target domain. Empirical evaluations on synthetic and real data demonstrate the effectiveness of the proposed method compared with state-of-the-art methods. Overall, our main contributions can be summarized as follows: (i) Differing from the commonly-used Conditional Shift shown in Figure 1(b), which assumes the label distribution to be the same across domains, we propose a new problem setting, latent covariate shift, as shown in Figure 1(c). (ii) We propose a latent causal model for latent covariate shift. Leveraging existing identifiability results of nonlinear ICA, we provide an analysis of the identifiability of the proposed latent causal graph, which provides a guarantee for identifying the latent causal content variable z_c. (iii) Under this identifiability result, we design a new method for domain adaptation, and empirically evaluate it on simulated and real data.

Figure 1: Illustration of three different assumptions for MSDA. (a) Covariate Shift: p_D(x) changes across domains, while p_D(y|x) is invariant across domains. (b) Conditional Shift: p_D(y) is invariant, while p_D(x|y) changes across domains. (c) Latent Covariate Shift: p_D(z_c) changes across domains, while p_D(y|z_c) is invariant across domains.

