KEY DESIGN CHOICES FOR DOUBLE-TRANSFER IN SOURCE-FREE UNSUPERVISED DOMAIN ADAPTATION

Abstract

Fine-tuning and Domain Adaptation have emerged as effective strategies for efficiently transferring deep learning models to new target tasks. However, in many real-world scenarios target domain labels are not accessible. This led to the development of Unsupervised Domain Adaptation (UDA) methods, which require no target labels and rely only on unlabeled target samples. Furthermore, efficiency and privacy requirements may also prevent the use of source domain data during the adaptation stage. This particularly challenging setting, known as Source-free Unsupervised Domain Adaptation (SF-UDA), is still understudied. In this paper, we systematically analyze the impact of the main design choices in SF-UDA through a large-scale empirical study of 500 models and 74 domain pairs. We identify the normalization approach, pre-training strategy, and backbone architecture as the most critical factors. Based on our observations, we propose recipes to best tackle SF-UDA scenarios. Moreover, we show that SF-UDA performs competitively beyond standard benchmarks and backbone architectures, performing on par with UDA at a fraction of the data and computational cost. Experimental data and code will be released.

1. INTRODUCTION

The recent success of deep neural networks (DNNs) in many tasks and domains often relies on the availability of large annotated datasets. This requirement can be mitigated by pre-training DNNs on a large dataset and then fine-tuning their weights on target task data (Huh et al., 2016; Yosinski et al., 2014; Chu et al., 2016). Furthermore, fine-tuning is usually simpler and faster than training the model from scratch, the dataset can be smaller, and the final performance is typically higher (with some exceptions: see (Kornblith et al., 2019)). This approach is very convenient: the model requires a single expensive pre-training and can later be re-used for multiple downstream tasks. This is a good example of transfer learning (Zhuang et al., 2021), which leverages the information acquired from one task to improve accuracy on another task of interest. Two relevant examples of transfer learning are Domain Adaptation (DA), which, given different yet related tasks, exploits source domain(s) data to improve performance on different known target domain(s), and Domain Generalization (DG), which aims to generalize to unknown target(s). As opposed to fine-tuning, in which the pre-training and downstream tasks can be significantly different, DA and DG require stronger assumptions on the similarity between tasks, e.g., leveraging synthetic images to improve the classification of real images that share the same label space. DA is also related to Multi-task Learning (MTL) (Caruana, 1997; Ciliberto et al., 2017) and Multi-domain Learning (MDL) (Joshi et al., 2012). In fact, domains can be seen as tasks in MTL or MDL. However, in MTL and MDL an explicit task or domain label is provided and annotated examples are available for each task.
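The pre-train-then-fine-tune workflow described above can be sketched with a toy model. The following is a minimal, standard-library-only illustration using a 1-D logistic-regression "model" trained by gradient descent; the tasks, data, and hyper-parameters are made up for illustration and are not taken from this paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(xs, ys, w=0.0, b=0.0, lr=0.5, steps=500):
    """Full-batch gradient descent on the logistic loss.

    (w, b) may be initialized from a previously trained model, in which
    case this call performs fine-tuning rather than training from scratch.
    """
    n = len(xs)
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # gradient of the loss w.r.t. the logit
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

# "Pre-training" task: decision boundary at x = 0, plenty of labeled data.
src_x = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
src_y = [0, 0, 0, 1, 1, 1]
w_pre, b_pre = train(src_x, src_y)

# Downstream task: same concept, boundary shifted to x = 1.
# Fine-tune starting from the pre-trained weights with a small labeled set.
tgt_x = [0.2, 0.5, 1.5, 2.0]
tgt_y = [0, 0, 1, 1]
w_ft, b_ft = train(tgt_x, tgt_y, w=w_pre, b=b_pre)

# The fine-tuned decision boundary (-b/w) lands between the two target classes.
boundary = -b_ft / w_ft
```

The key point is the initialization: the downstream model starts from the pre-trained parameters instead of zeros, which is exactly what makes fine-tuning cheaper and more data-efficient than training from scratch.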
A particularly challenging and useful setting in practice is Unsupervised Domain Adaptation (UDA) (Tzeng et al., 2017; Ganin et al., 2016), in which labeled samples from a source domain are used together with unlabeled samples from the target domain to improve performance on the latter. This work focuses on Source-Free Unsupervised Domain Adaptation (SF-UDA) (Liang et al., 2020) for the image classification task. SF-UDA is a two-step sequential version of UDA in which the source-domain labeled data is only accessible in the first training phase. Adaptation to the new domain is carried out in a second stage, where only the unlabeled data from the target domain is available. SF-UDA nicely matches applications where continual adaptation is required under computational and memory constraints, or where privacy policies prevent access to the source data. Since
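The two-stage SF-UDA protocol can be made concrete with a small, self-contained sketch. It uses a toy nearest-centroid classifier and a pseudo-labeling adaptation step; the data, the classifier, and the adaptation rule are illustrative assumptions, not the method studied in this paper. What matters structurally is that stage 2 touches only the model and the unlabeled target samples, never the source data.

```python
def fit_centroids(xs, ys, n_classes):
    """Stage 1: train on labeled source data (compute per-class 2-D centroids)."""
    sums = [[0.0, 0.0] for _ in range(n_classes)]
    counts = [0] * n_classes
    for (x0, x1), y in zip(xs, ys):
        sums[y][0] += x0
        sums[y][1] += x1
        counts[y] += 1
    return [(s[0] / c, s[1] / c) for s, c in zip(sums, counts)]

def predict(centroids, x):
    """Assign a point to its nearest centroid (squared Euclidean distance)."""
    d = [(x[0] - c0) ** 2 + (x[1] - c1) ** 2 for c0, c1 in centroids]
    return d.index(min(d))

def adapt_source_free(centroids, target_xs, n_rounds=3):
    """Stage 2: adapt using only unlabeled target data.

    The source samples are no longer available; we alternate between
    pseudo-labeling the target points with the current centroids and
    re-estimating the centroids from those pseudo-labels.
    """
    n_classes = len(centroids)
    for _ in range(n_rounds):
        pseudo = [predict(centroids, x) for x in target_xs]
        centroids = fit_centroids(target_xs, pseudo, n_classes)
    return centroids

# Toy source domain: two well-separated classes.
src_x = [(0.0, 0.0), (0.2, 0.1), (2.0, 2.0), (2.1, 1.9)]
src_y = [0, 0, 1, 1]
centroids = fit_centroids(src_x, src_y, n_classes=2)

# Toy target domain: same classes, shifted by +0.5 on both axes (no labels used).
tgt_x = [(0.5, 0.6), (0.6, 0.4), (2.5, 2.6), (2.4, 2.5)]
adapted = adapt_source_free(centroids, tgt_x)
```

After adaptation, the centroids have moved toward the shifted target clusters even though no target label and no source sample was used in the second stage.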

