CAKE: CAUSAL AND COLLABORATIVE PROXY-TASKS LEARNING FOR SEMI-SUPERVISED DOMAIN ADAPTATION

Abstract

Semi-supervised domain adaptation (SSDA) adapts a learner to a new domain by effectively utilizing source domain data and a few labeled target samples. It is a practical yet under-investigated research topic. In this paper, we analyze the SSDA problem from two perspectives that have previously been overlooked, and correspondingly decompose it into two key subproblems: robust domain adaptation (DA) learning and maximal cross-domain data utilization. (i) From a causal theoretical view, a robust DA model should distinguish the invariant "concept" (the key clue to the image label) from the nuisance of confounding factors across domains. To achieve this goal, we propose to generate concept-invariant samples that enable the model to classify samples through causal intervention, yielding improved generalization guarantees; (ii) Based on the robust DA theory, we aim to maximally utilize the rich source domain data and the few labeled target samples to boost SSDA further. Consequently, we propose a collaboratively debiasing learning framework that utilizes two complementary semi-supervised learning (SSL) classifiers to mutually exchange their unbiased knowledge, which helps unleash the potential of the source and target domain training data, thereby producing more convincing pseudo-labels. Such labels facilitate cross-domain feature alignment and further improve the invariant concept learning. In our experimental study, we show that the proposed model significantly outperforms SOTA methods in terms of effectiveness and generalizability on SSDA datasets.



), affecting the model's feature alignment capability. Further, in the SSDA setting, we have three sets of data, i.e., source domain data, labeled target samples, and unlabeled target domain data. A single model for SSDA may be hard to generalize to the three sets with their different label distributions. Thus, the premise of better utilizing labeled target samples is to mitigate undesirable bias and reasonably exploit the multiple sets. Summing up, these limitations call for a reexamination of SSDA and its solutions. To alleviate the aforementioned limitations, we propose a framework called CAusal collaborative proxy-tasKs lEarning (CAKE), illustrated in Figure 1(c). In the first step, we formalize the DA task using a causal graph. Then, leveraging causal tools, we identify the "style" as the confounder and derive the invariant concepts across domains. In the subsequent steps, we build two classifiers based on the invariant concept to utilize the rich information in cross-domain data for better SSDA. In this way, CAKE explicitly decomposes SSDA into two proxy subroutines, namely the Invariant Concept Learning Proxy (ICL) and the Collaboratively Debiasing Learning Proxy (CDL). In ICL, we identify that the key to robust DA is that the underlying concepts are consistent across domains, while the confounder is the style, which prevents the model from learning the invariant concept (C) needed for accurate DA. Therefore, a robust DA model should be an invariant predictor, P(Y | X, D = D_T) = P(Y | X, D = D_S), under causal interventions. To address the problem, we devise a causal factor generator (CFG) that produces concept-invariant samples X with different styles, enabling the DA model to effectively learn the invariant concept. As such, our ICL may be regarded as an improved version of Invariant Risk Minimization (IRM) Arjovsky et al. (2019) for SSDA, which equips the model with the ability to learn concept features that are invariant to styles.
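Since ICL is framed as an improved IRM for SSDA, the underlying objective can be sketched with the IRMv1 penalty of Arjovsky et al. (2019): each style (produced, e.g., by the CFG) is treated as a training environment, and the classifier is penalized when a per-environment gradient with respect to a fixed scalar classifier w = 1.0 is nonzero. The snippet below is a minimal numpy sketch under these assumptions, using a binary logistic loss; the function names (`irm_penalty`, `irm_objective`) are ours, not the paper's.

```python
import numpy as np

def irm_penalty(logits, labels):
    """IRMv1-style penalty for one environment (style): squared gradient of
    the risk w.r.t. a fixed scalar classifier w = 1.0 (Arjovsky et al., 2019).
    Binary logistic loss; labels in {0, 1}."""
    s = 2.0 * labels - 1.0          # map labels to {-1, +1}
    w = 1.0                         # frozen "dummy" classifier
    z = s * w * logits
    # d/dw of mean(log(1 + exp(-s * w * logits))) evaluated at w = 1
    grad = np.mean(-s * logits / (1.0 + np.exp(z)))
    return grad ** 2

def irm_objective(env_batches, lam=1.0):
    """Sum of per-environment risks plus lam times the invariance penalty.
    env_batches: list of (logits, labels) pairs, one per style/environment."""
    risk, penalty = 0.0, 0.0
    for logits, labels in env_batches:
        z = (2.0 * labels - 1.0) * logits
        risk += np.mean(np.log1p(np.exp(-z)))   # logistic risk
        penalty += irm_penalty(logits, labels)
    return risk + lam * penalty
```

A predictor that is optimal within every style environment simultaneously incurs zero penalty, which is the invariance property P(Y | X, D = D_T) = P(Y | X, D = D_S) asks for.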
In CDL, with the invariant concept learning as the foundation, we aim to unleash the potential of the three sets of cross-domain data for better SSDA. Specifically, we build two correlated and complementary pseudo-labeling-based semi-supervised learning (SSL) classifiers for D_S and D_T with self-penalization. These two classifiers mutually exchange knowledge to expand the number of "labeled" samples
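One round of the mutual knowledge exchange between the two SSL classifiers can be sketched as below. This is an illustrative simplification under our own assumptions (a shared unlabeled pool, a confidence threshold `tau`), not the paper's exact CDL procedure: each classifier trains only on samples that its peer labels confidently, so one model's bias is filtered by the other.

```python
import numpy as np

def exchange_pseudo_labels(probs_a, probs_b, tau=0.9):
    """One round of mutual pseudo-label exchange between two classifiers.
    probs_a, probs_b: (N, C) class-probability matrices from classifiers A
    and B over the same unlabeled pool. Returns, for each classifier, the
    indices and pseudo-labels contributed by its peer's confident
    predictions (confidence >= tau)."""
    conf_a, lab_a = probs_a.max(axis=1), probs_a.argmax(axis=1)
    conf_b, lab_b = probs_b.max(axis=1), probs_b.argmax(axis=1)
    for_a = np.where(conf_b >= tau)[0]   # A trains on B's confident labels
    for_b = np.where(conf_a >= tau)[0]   # B trains on A's confident labels
    return (for_a, lab_b[for_a]), (for_b, lab_a[for_b])
```

Each round enlarges the effective "labeled" set on both sides; the self-penalization term mentioned above would additionally down-weight each classifier's own low-quality predictions.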



Domain Adaptation (DA) aims to transfer training knowledge to a new domain (target, D = D_T) using the labeled data available from the original domain (source, D = D_S), which can alleviate the poor generalization of learned deep neural networks when the data distribution significantly deviates from the original domain Wang & Deng (2018); You et al. (2019); Tzeng et al. (2017). In the DA community, recent works Saito et al. (2019) have shown that the presence of a few labeled samples from the target domain can significantly boost the performance of deep learning-based models. This observation led to the formulation of Semi-Supervised Domain Adaptation (SSDA), a variant of Unsupervised Domain Adaptation (UDA) Venkateswara et al. (2017) that facilitates model training with rich labels from D_S and a few labeled samples from D_T. Since such additional labels on the target data can be easily collected in real-world applications, SSDA has the potential to render the adaptation problem more practical and promising than UDA. Broadly, most contemporary approaches Ganin et al. (2016); Jiang et al. (2020); Kim & Kim (2020); Yoon et al. (2022) handle the SSDA task based on two domain shift assumptions, where X and Y respectively denote the samples and their corresponding labels: (i) Covariate Shift, P(X | D = D_S) ≠ P(X | D = D_T), i.e., the marginal input distributions differ across domains; (ii) Conditional Shift, P(Y | X, D = D_S) ≠ P(Y | X, D = D_T), i.e., the conditional label distributions differ across domains. Intuitively, one straightforward solution for SSDA is to learn common features that mitigate the domain shift. Further quantitative analyses, however, indicate that a model trained with supervision on a few labeled target samples and labeled source data can only ensure partial cross-domain feature alignment Kim & Kim (2020). That is, it only aligns the features of labeled target samples and their correlated nearby samples with the corresponding feature clusters in the source domain.
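The covariate-shift assumption can be made concrete with a toy synthetic example: the input marginals P(X | D) differ between the two domains, while the labeling rule that a robust model should capture is shared. This is purely an illustrative sketch with data and thresholds of our choosing, not an experiment from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1-D toy domains: source and target inputs are drawn from different
# Gaussians, so P(X | D = D_S) != P(X | D = D_T) (covariate shift) ...
x_src = rng.normal(loc=-1.0, scale=1.0, size=5000)   # source inputs
x_tgt = rng.normal(loc=+1.5, scale=1.0, size=5000)   # target inputs

# ... while the concept mapping inputs to labels is identical in both
# domains (no conditional shift in this toy setup).
def label(x):
    return (x > 0).astype(int)

# A simple witness of the marginal gap: the domain means differ by ~2.5.
marginal_gap = abs(x_src.mean() - x_tgt.mean())
```

A classifier fit only on `x_src` would see mostly negative inputs and misjudge the target region, which is exactly why a few labeled target samples (the SSDA setting) are so valuable.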

Figure 1: (a) Four DA cases ("Clipart" → "Real"). (b) Class-wise distribution of source domain and target domain. (c) A simplified version that indicates how our proposed model facilitates the SSDA.

