SOURCE-FREE DOMAIN ADAPTATION VIA DISTRIBUTIONAL ALIGNMENT BY MATCHING BATCH NORMALIZATION STATISTICS

Abstract

In this paper, we propose a novel domain adaptation method for the source-free setting. In this setting, we cannot access source data during adaptation; instead, unlabeled target data and a model pretrained on source data are given. Due to the lack of source data, we cannot directly match the data distributions between domains, unlike typical domain adaptation algorithms. To cope with this problem, we propose utilizing the batch normalization statistics stored in the pretrained model to approximate the distribution of the unobserved source data. Specifically, we fix the classifier part of the model during adaptation and fine-tune only the remaining feature-encoder part so that the batch normalization statistics of the features extracted by the encoder match those stored in the fixed classifier. Additionally, we maximize the mutual information between the features and the classifier's outputs to further boost classification performance. Experimental results on several benchmark datasets show that our method achieves performance competitive with state-of-the-art domain adaptation methods even though it does not require access to source data.

1. INTRODUCTION

In typical statistical machine learning algorithms, test data are assumed to stem from the same distribution as training data (Hastie et al., 2009). However, this assumption is often violated in practical situations, and the trained model then performs unexpectedly poorly (Quionero-Candela et al., 2009). This situation is called domain shift, and many researchers have worked intensely on domain adaptation (Csurka, 2017; Wilson & Cook, 2020) to overcome it. A common approach for domain adaptation is to jointly minimize a distributional discrepancy between domains in a feature space as well as the prediction error of the model (Wilson & Cook, 2020), as shown in Fig. 1(a).

Many domain adaptation algorithms assume that they can access labeled source data as well as target data during adaptation. This assumption is essentially required to evaluate the distributional discrepancy between domains as well as the accuracy of the model's predictions. However, it can be unreasonable in some cases, for example, due to data privacy issues or source datasets too large to be handled in the environment where adaptation is conducted. To tackle this problem, a few recent studies (Kundu et al., 2020; Li et al., 2020; Liang et al., 2020) have proposed source-free domain adaptation methods that do not need access to the source data. In source-free domain adaptation, a model trained with source data is given instead of the source data themselves, and it is fine-tuned with unlabeled target data so that the fine-tuned model works well in the target domain. Since it seems quite hard to evaluate the distributional discrepancy between unobservable source data and given target data, previous studies mainly focused on how to minimize the prediction error of the model with unlabeled target data, for example, by using pseudo-labeling (Liang et al., 2020) or a conditional generative model (Li et al., 2020).
However, due to the lack of distributional alignment, those methods depend heavily on noisy target labels obtained during adaptation, which can result in unstable performance. In this paper, we propose a novel method for source-free domain adaptation. Figure 1(b) shows our setup in comparison with that of typical domain adaptation methods shown in Fig. 1(a). In our method, we explicitly minimize the distributional discrepancy between domains by utilizing the batch normalization (BN) statistics stored in the pretrained model. Since we fix the pretrained classifier during adaptation, the BN statistics stored in the classifier can be regarded as representing the distribution of source features extracted by the pretrained encoder. Based on this idea, to minimize the discrepancy, we train the target-specific encoder so that the BN statistics of the target features extracted by the encoder match those stored in the classifier. We also adopt information maximization as in Liang et al. (2020) to further boost the classification performance of the classifier in the target domain. Our method is simple yet effective; indeed, we validate its advantage through extensive experiments on several benchmark datasets.
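To make the two training signals concrete, they can be sketched as simple per-batch losses. The NumPy code below is an illustrative sketch, not the paper's implementation: `bn_matching_loss` penalizes the gap between the batch statistics of target features and the source statistics stored in the frozen classifier's BN layers (here via a per-channel Gaussian KL divergence, one plausible choice of discrepancy), and `info_max_loss` expresses the information-maximization term (confident per-sample predictions, diverse predictions on average). Both function names are hypothetical.

```python
import numpy as np

def bn_matching_loss(features, stored_mean, stored_var, eps=1e-5):
    """Discrepancy between the batch statistics of target features and the
    BN statistics stored in the pretrained (frozen) classifier, measured as
    the KL divergence between per-channel Gaussians, summed over channels."""
    mu = features.mean(axis=0)   # per-channel batch mean
    var = features.var(axis=0)   # per-channel batch variance
    # KL( N(mu, var) || N(stored_mean, stored_var) ) per channel
    return 0.5 * np.sum(
        np.log((stored_var + eps) / (var + eps))
        + (var + (mu - stored_mean) ** 2) / (stored_var + eps)
        - 1.0
    )

def info_max_loss(probs, eps=1e-8):
    """Negated mutual information between inputs and predicted labels:
    minimizing it makes each prediction confident (low conditional entropy)
    while keeping the average prediction diverse (high marginal entropy)."""
    cond_ent = -np.mean(np.sum(probs * np.log(probs + eps), axis=1))
    mean_p = probs.mean(axis=0)
    marg_ent = -np.sum(mean_p * np.log(mean_p + eps))
    return cond_ent - marg_ent
```

In training, the two terms would be combined with a weighting hyperparameter and minimized with respect to the encoder parameters only, while the classifier and its stored statistics stay fixed.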

2. RELATED WORK

In this section, we introduce existing work on domain adaptation related to ours and also present a formulation of batch normalization.

2.1. DOMAIN ADAPTATION

Given source and target data, the goal of domain adaptation is to obtain a prediction model that performs well in the target domain (Csurka, 2017; Wilson & Cook, 2020). Importantly, the data distributions differ significantly between the domains, which means that we cannot simply train the model with source data to maximize its performance on target data. Therefore, in addition to minimizing the prediction error using labeled source data, many domain adaptation algorithms try to align the data distributions between domains by adversarial training (Ganin et al., 2016; Tzeng et al., 2017; Deng et al., 2019; Xu et al., 2019) or by explicitly minimizing a distributional-discrepancy measure (Long et al., 2015; Bousmalis et al., 2016; Long et al., 2017). This approach has empirically shown excellent performance and is also closely connected to theoretical analysis (Ben-David et al., 2010). However, since this distribution alignment requires access to source data, these methods cannot be directly applied to the source-free domain adaptation setting.

In source-free domain adaptation, we can only access target data but not source data, and a model pretrained with the source data is given instead of the source data. This challenging problem has been tackled in recent studies. Li et al. (2020) proposed jointly training the target model and a conditional GAN (generative adversarial network) (Mirza & Osindero, 2014) that generates annotated target data. Liang et al. (2020) explicitly divided the pretrained model into two modules, a feature encoder and a classifier, and fine-tuned only the encoder while keeping the classifier fixed.
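As a concrete illustration of the explicit-discrepancy approach mentioned above, a minimal empirical estimate of the squared maximum mean discrepancy (MMD) under an RBF kernel, in the spirit of Long et al. (2015), can be written as follows. This is a hedged sketch for intuition (the function name and the fixed bandwidth `gamma` are our own choices), not the implementation used in the cited works:

```python
import numpy as np

def mmd_rbf(x, y, gamma=1.0):
    """Biased empirical estimate of the squared maximum mean discrepancy
    between samples x and y under an RBF kernel exp(-gamma * ||a - b||^2).
    Values near zero indicate that the two feature distributions are similar."""
    def k(a, b):
        # pairwise squared Euclidean distances, then the RBF kernel
        d = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d)
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()
```

Typical domain adaptation methods evaluate such a measure between source and target feature batches and minimize it jointly with the source prediction loss; the difficulty of the source-free setting is precisely that the source batch is unavailable.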



Deep neural networks (DNNs) are particularly popular for this joint training, and recent methods using DNNs have demonstrated excellent performance under domain shift (Wilson & Cook, 2020).

Figure 1: Comparison between typical domain adaptation methods and our method. (a) General setup commonly adopted in recent typical domain adaptation methods; this visualization is inspired by (Wilson & Cook, 2020). (b) Our setup for source-free domain adaptation. A rectangle with solid lines represents a trainable component, while a rectangle with dotted lines represents a fixed component during adaptation.

