ANYDA: ANYTIME DOMAIN ADAPTATION

Abstract

Unsupervised domain adaptation is an open and challenging problem in computer vision. While existing research shows encouraging results in addressing cross-domain distribution shift on common benchmarks, existing methods are often constrained to testing under a specific target setting, limiting their impact for many real-world applications. In this paper, we introduce a simple yet effective framework for anytime domain adaptation that is executable under dynamic resource constraints to achieve accuracy-efficiency trade-offs under domain shift. We achieve this by training a single shared network, using both labeled source and unlabeled target data, whose depth, width and input resolution can be switched on the fly to enable testing under a wide range of computation budgets. Starting with a teacher network trained on a label-rich source domain, we utilize bootstrapped recursive knowledge distillation as a nexus between the source and target domains to jointly train the student network with its switchable subnetworks. Experiments on multiple datasets demonstrate the superiority of our approach over state-of-the-art methods.

1. INTRODUCTION

Unsupervised Domain Adaptation (UDA), which aims to adapt models trained on a labeled source domain to an unlabeled target domain, has attracted intense attention in recent years. However, recent successful UDA approaches (Carlucci et al., 2019; Ganin et al., 2016; Li et al., 2020a; Prabhu et al., 2021; Sun et al., 2019; Tan et al., 2020; Tzeng et al., 2015; 2017) often rely on complicated network architectures and are limited to testing under a specific target setting, which makes them ill-suited for applications across a wide range of platforms with different resource constraints (see Figure 1a). While adapting the trained model independently for every testing scenario in the target domain, each with drastically different resource requirements, may look like a viable option at first glance, it is neither efficient nor economical, because of the time-consuming training and benchmarking required for each of these adaptation settings. Preferably, we want to be able to adjust the model, without re-training or re-adaptation in the target domain, to run in a high-accuracy mode when resources are sufficient and switch to a low-accuracy mode when they are limited. Motivated by this, in this paper we investigate the problem of anytime domain adaptation, where we have labeled training data from a source domain, no labeled data in the target domain, and testing must be possible across a wide range of resource settings (e.g., see Figure 1b). Specifically, we aim to train a single network using both labeled source and unlabeled target data that can directly run at an arbitrary resource budget while being invariant to distribution shifts across the two domains. This is a highly relevant problem to address, as it opens the door to more practical and efficient domain adaptation across scenarios with different resource budgets.
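As a rough illustration of the anytime setting (the switch options and cost model below are our own illustrative assumptions, not the paper's exact configuration), each subnetwork can be indexed by a (width, depth, resolution) triple whose relative cost grows roughly as width² × depth × resolution²; at test time one simply executes the largest configuration that fits the current budget:

```python
from itertools import product

def relative_cost(width, depth, resolution):
    """Approximate cost of a conv subnetwork relative to the full model.
    Cost grows quadratically with width and resolution and linearly with
    depth (a common rule of thumb, not the paper's exact cost model)."""
    return (width ** 2) * depth * (resolution ** 2)

# Hypothetical switch options, as fractions of the full (super)network.
WIDTHS = (0.25, 0.5, 0.75, 1.0)
DEPTHS = (0.5, 1.0)
RESOLUTIONS = (0.5, 0.75, 1.0)

def best_config(budget):
    """Return the largest-cost configuration that still fits the budget
    (budget = 1.0 corresponds to the full supernet)."""
    feasible = [cfg for cfg in product(WIDTHS, DEPTHS, RESOLUTIONS)
                if relative_cost(*cfg) <= budget]
    return max(feasible, key=lambda cfg: relative_cost(*cfg))

print(best_config(1.0))    # full supernet: (1.0, 1.0, 1.0)
print(best_config(0.125))  # a much smaller subnetwork
```

Because all subnetworks share parameters with the supernet, switching among them requires no re-training; only the slice of weights and the input resolution change.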
Recently, anytime prediction methods (Cai et al., 2019; Huang et al., 2018; Jie et al., 2019), which train a network to carry out inference under varying budget constraints, have seen great success in many vision tasks. However, all of these methods assume that models are trained and tested on data from the same fixed distribution, and they generalize poorly when the two distributions differ. The twin goals of aligning two domains and operating at different constrained computation budgets bring additional challenges for anytime domain adaptation. To this end, we propose a simple yet effective method for anytime domain adaptation, called AnyDA, that performs domain alignment while varying both network (width and depth) and input (resolution) scales to enable testing under a wide range of computation budgets. Varying width, depth and resolution jointly enables a tighter and finer-grained accuracy-efficiency trade-off than prior works that cover only one or two of the three dimensions (Li et al., 2021a; Yang et al., 2020; Yu et al., 2019b) for in-domain data. In particular, we adopt a switchable network in which individual subnetworks, executable at variable computation budgets, share parameters with the full-budget network, known as the supernet. However, the inability of small subnetworks to leverage the higher capacity of complex networks may cause them to severely underperform, leading to suboptimal performance across different resource budgets. To alleviate this, we propose to distill (Ba & Caruana, 2014; Hinton et al., 2015) richer alignment information from higher-capacity models to networks with limited computation. Concretely, AnyDA adopts two switchable networks, a teacher and a student, that interact and learn from each other: the student subnetworks are trained recursively to fit the output logits of an ensemble of larger subnetworks of the teacher.
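The distillation target can be sketched as follows: each student subnetwork matches the averaged soft prediction of the larger teacher subnetworks. The minimal plain-Python sketch below operates on a single sample's logits; the actual method works on batches, and details such as temperature scaling and the exact subnetwork ordering follow the paper rather than this sketch.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distill_loss(student_logits, teacher_logits_list):
    """Cross-entropy between the student's soft prediction and the
    ensemble (average) of soft predictions from larger teacher subnets."""
    teacher_probs = [softmax(t) for t in teacher_logits_list]
    k = len(teacher_probs)
    ensemble = [sum(p[c] for p in teacher_probs) / k
                for c in range(len(student_logits))]
    student = softmax(student_logits)
    return -sum(q * math.log(p + 1e-12) for q, p in zip(ensemble, student))

# Hypothetical logits for one 3-class sample:
student = [1.0, 0.2, -0.5]
teachers = [[2.0, 0.1, -1.0], [1.5, 0.3, -0.8]]  # larger teacher subnets
print(round(distill_loss(student, teachers), 4))
```

Minimizing this loss pulls every budget-constrained student subnetwork toward the higher-capacity teacher ensemble, which is how alignment information learned at large budgets reaches the small subnetworks.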
Such recursive distillation within a single network with only adaptive width has been shown to improve generalization and reduce the performance gap between high- and low-capacity networks (Li et al., 2021a). Starting with the labeled source data, we build the teacher as an exponential moving average (EMA) of past iterations of the student network. The bootstrapped teacher provides the targets used to train the student toward an enhanced representation. Once the target data is available, the bootstrapped recursive distillation not only brings the target features close to the source but also transfers the learned knowledge to smaller networks for efficient inference. Moreover, we harness categorical information by leveraging self-supervision through a pseudo-label loss on the student supernet, ensuring a discriminative latent space for the unlabeled target images. Interestingly, without any component for explicit domain alignment (e.g., a domain discriminator), our approach trades performance gracefully with decreasing subnetwork budgets. Our extensive experiments on 4 benchmark datasets show a very minimal drop in performance across a wide range of computation budgets (e.g., a maximum drop of only 1.1% when the computation budget becomes 8× smaller during testing on the Office-31 dataset (Saenko et al., 2010)). Our work forges a connection between two literatures that have evolved mostly independently: anytime prediction and domain adaptation. This connection allows us to leverage progress in unsupervised representation learning to address the very practical problem of anytime domain adaptation. To summarize, our key contributions include:

• We introduce a novel approach for anytime domain adaptation that is executable under dynamic resource constraints to achieve accuracy-efficiency trade-offs under domain shift. We achieve this by training two networks, a teacher and a student, with switchable depth, width and input resolution to enable testing under a wide range of computation budgets.

• We propose a bootstrapped recursive distillation approach that trains the student subnets with knowledge from the teacher network, which not only brings the target features close to the source but also transfers the learned knowledge to smaller networks for efficient inference.

• We perform extensive experiments on 4 benchmark datasets and demonstrate that AnyDA achieves superior performance over state-of-the-art methods, most significantly at lower computation budgets. We also include comprehensive ablation studies highlighting the importance of each module of our framework.
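The bootstrapped teacher above is maintained as an exponential moving average of the student's weights. A minimal sketch, where the parameter dictionary and the momentum value 0.99 are illustrative assumptions rather than the paper's exact setup:

```python
def ema_update(teacher_params, student_params, momentum=0.99):
    """In-place EMA update of teacher weights from the student:
    teacher <- m * teacher + (1 - m) * student, per parameter."""
    for name, w_s in student_params.items():
        teacher_params[name] = momentum * teacher_params[name] + (1.0 - momentum) * w_s
    return teacher_params

# Toy scalar "weights": the teacher slowly tracks the student.
teacher = {"layer1.weight": 0.0}
student = {"layer1.weight": 1.0}
for _ in range(1000):
    ema_update(teacher, student, momentum=0.99)
print(teacher["layer1.weight"])  # close to 1.0 after many updates
```

Because the teacher is a slow-moving average of the student, it provides stable distillation targets without requiring a separately trained source model at every adaptation step.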



Project page: https://cvir.github.io/projects/anyda



Figure 1: Instead of conventional adaptation under a specific computation budget, anytime domain adaptation focuses on training a model, using both labeled source and unlabeled target data, that can directly run at an arbitrary resource budget in the target domain while being invariant to distribution shifts across both domains.

