FEED-FORWARD LATENT DOMAIN ADAPTATION

Abstract

We study the highly practical but comparatively under-studied problem of latent domain adaptation, where a source model should be adapted to a target dataset that contains a mixture of unlabelled domain-relevant and domain-irrelevant examples. Furthermore, motivated by data-privacy requirements and the need for embedded and resource-constrained devices of all kinds to adapt to local data distributions, we focus on the setting of feed-forward source-free domain adaptation, where adaptation should neither require access to the source dataset nor use back-propagation. Our solution is to meta-learn a network capable of embedding the mixed-relevance target dataset and dynamically adapting inference for target examples using cross-attention. The resulting framework leads to consistent improvements over strong ERM baselines. We also show that our framework sometimes even improves on the upper bound of domain-supervised adaptation, where only domain-relevant instances are provided for adaptation. This suggests that human-annotated domain labels may not always be optimal, and raises the possibility of doing better through automated instance selection.

1. INTRODUCTION

Domain shift presents a real-world challenge for the application of machine learning models because performance degrades when deployment data are not drawn from the training data distribution. This issue is ubiquitous, as it is often impossible or prohibitively costly to pre-collect and annotate training data that is sufficiently representative of test data statistics. The field of domain adaptation (Kouw & Loog, 2021; Csurka et al., 2022) has therefore attracted a lot of attention with the promise of adapting models during deployment to perform well using only unlabeled deployment data. The main body of work in deep domain adaptation assumes that there is a pre-specified source domain and a pre-specified target domain. An unlabeled adaptation set is provided from the target domain, and various methods define different learning objectives that update a deep model on the unlabeled adaptation set, with the aim of improving performance on new test data drawn from the target domain. In this paper we make two main contributions: a conceptual contribution, a highly practical variant of the domain adaptation problem; and an algorithm for effective domain adaptation in this condition.

A motivating scenario  Let us introduce one illustrative application scenario that motivates the variant of the domain adaptation problem that we propose here. Suppose that a robot or other mobile embedded vision system needs to recognise objects. Because it is mobile, it may encounter objects in different unconstrained contexts, e.g., indoor or outdoor backgrounds, sunny or rainy weather, rooms with lights on or lights off, etc. The robot's object recognition model should adapt in order to maintain strong performance across all these conditions, for example by adapting based on a buffer of recently experienced unlabelled images.
However, unlike standard pre-defined domain adaptation benchmarks with neatly curated domains, there are two new challenges: 1) Using such a buffer as the adaptation set means that the adaptation data can be of mixed relevance to the test image to be processed at any given instant. For example, the recent history used for adaptation may span multiple rooms, while any individual test image comes from a specific room. 2) The adaptation needs to happen on-board the robot and ideally happen in real-time as the adaptation set itself is updated over time. The first challenge is the latent domain challenge, wherein uncurated adaptation sets do not have consistent relevance to a given test image (Fig. 1). The second challenge requires adaptation to take place without back-propagation, which is too costly and not supported on most embedded platforms; in other words, adaptation should be feed-forward.

Latent domain adaptation  While domain adaptation is very well studied (Kouw & Loog, 2021; Csurka et al., 2022), the vast majority of work assumes that instances have been pre-grouped into one or more subsets (domains) that differ statistically across groups, while being similar within groups. We join a growing minority (Mancini et al., 2021; Deecke et al., 2022; Hoffman et al., 2014; Wang et al., 2022) in arguing that this is an overly restrictive assumption that does not hold in most real applications of interest. Some collection processes may not provide meta-data suitable for defining domain groupings. Alternatively, for other data sources that come with rich meta-data there may be no obviously correct grouping, and existing domain definitions may be sub-optimal (Deecke et al., 2022). Consider the popular iWildCam (Beery et al., 2020) benchmark for animal detection within the WILDS (Koh et al., 2021) suite. The default setup within WILDS defines domains by camera ID.
But given that images span different weather conditions and day/night cycles as well as cameras, such domains may neither be internally homogeneous, nor similarly distinct. There may be more transferability between images from nearby cameras at similar times of day than between images from the same camera taken on a sunny day vs a snowy night. As remarked by Hoffman et al. (2014) and Wang et al. (2022), domains may more naturally define a continuum, rather than discrete groups. That continuum may even be multi-dimensional, such as the timestamp of an image and the spatial proximity of cameras. Our latent domain formulation of the domain adaptation problem spans all these situations where domains are hard to define, while aligning with the requirements of real use cases.

Feed-forward domain adaptation  Unsupervised domain adaptation aims to adapt models from source datasets (e.g., ImageNet) to the peculiarities of target data distributions in the wild. The mainstream line of work here updates models by backpropagation on an adaptation set from the target data distribution (Kouw & Loog, 2021; Csurka et al., 2022), and often simultaneously uses the source data (Liang et al., 2020). We consider adaptation under the practical constraints of an edge device, namely that neither the hardware capability nor the software stack supports back-propagation. Therefore we focus on the feed-forward condition, where adaptation algorithms should use only feed-forward operations, and only the target dataset (the source-free condition (Liang et al., 2020)). For example, simply updating batch normalisation statistics, which can be done without back-propagation, provides a strong baseline for backprop-free adaptation (Schneider et al., 2020; Zhang et al., 2021).

Our solution  To solve the challenge posed above, we propose a feed-forward adaptation framework based on cross-attention between test instances and the adaptation set.
The cross-attention module is meta-learned on a set of training domains, inspired by Zhang et al. (2021). This is a one-off cost paid up-front and performed on a server, after which the actual adaptation is fast. The deployed recognition model flexibly enables each inference operation to draw upon any part of the target adaptation set, exploiting each adaptation instance to a continuous degree. This can improve performance both by discounting adaptation instances that would conventionally be in-domain yet lead to negative transfer (e.g., same camera, opposite time of day), and by exploiting adaptation instances that would conventionally be out-of-domain but benefit transfer (e.g., similar images, different camera). Our experiments show that our cross-attention approach provides useful adaptation in this highly practical setting across a variety of synthetic and real benchmarks.
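The core mechanism can be sketched in a few lines. The following is a minimal, single-head illustration of cross-attention between test (query) features and adaptation-set features; the function names and the projection-free form are our simplifications for exposition, not the authors' actual architecture, which is meta-learned and embedded in a full network.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(query_feats, support_feats):
    """Re-weight adaptation-set features per test instance.

    query_feats:   (n_q, d) features of test instances.
    support_feats: (n_s, d) features of the unlabelled, mixed-relevance
                   adaptation set (the robot's recent-history buffer).
    Returns (n_q, d): for each query, a relevance-weighted combination of
    adaptation features. Irrelevant instances receive low attention weight,
    so selection is continuous rather than a hard domain grouping.
    """
    d = query_feats.shape[-1]
    scores = query_feats @ support_feats.T / np.sqrt(d)  # (n_q, n_s)
    attn = softmax(scores, axis=-1)                      # relevance weights
    return attn @ support_feats                          # (n_q, d)

rng = np.random.default_rng(0)
q = rng.standard_normal((2, 8))   # two test instances
s = rng.standard_normal((5, 8))   # five adaptation instances
out = cross_attend(q, s)
print(out.shape)  # (2, 8)
```

Note that the whole operation is feed-forward: no gradients flow at deployment time, which is what makes it suitable for embedded platforms without back-propagation support.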

2. BACKGROUND AND RELATED WORK

Test-time domain adaptation  TTDA has emerged as a topical adaptation scenario that focuses on model adaptation without access to source data at adaptation time (i.e., the source-free condition), and further adapts to each mini-batch at test time, aligning with an online adaptation scenario. A meta-learning framework for TTDA has recently been proposed under the name adaptive risk minimization (ARM) (Zhang et al., 2021). ARM provides a variety of options for how TTDA is done, including a context network that embeds information from the whole mini-batch, updates to batch normalization statistics, and gradient-based fine-tuning on the mini-batch. ARM learns to do TTDA by meta-learning across a large number of tasks. TENT (Wang et al., 2021) is another TTDA method, based on optimizing channel-wise affine transformations according to the current mini-batch.
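To make the batch-normalization option concrete, here is a minimal per-feature sketch of the backprop-free BN-statistics update used as a baseline in this setting. The momentum value and function names are illustrative assumptions, not the exact interpolation scheme of Schneider et al. (2020) or ARM.

```python
import numpy as np

def update_bn_stats(running_mean, running_var, batch, momentum=0.5):
    """Blend stored source-domain BN statistics with statistics of the
    unlabelled target mini-batch. Purely feed-forward: no gradients.

    batch: (n, d) activations from the target mini-batch.
    """
    batch_mean = batch.mean(axis=0)
    batch_var = batch.var(axis=0)
    new_mean = (1 - momentum) * running_mean + momentum * batch_mean
    new_var = (1 - momentum) * running_var + momentum * batch_var
    return new_mean, new_var

def bn_forward(x, mean, var, gamma=1.0, beta=0.0, eps=1e-5):
    # Standard BN inference with the (adapted) statistics.
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
# Target data whose distribution is shifted relative to the source (mean 0, var 1).
target_batch = rng.standard_normal((32, 4)) * 3.0 + 1.0
mean, var = update_bn_stats(np.zeros(4), np.ones(4), target_batch)
adapted = bn_forward(target_batch, mean, var)
print(adapted.mean(axis=0))  # closer to 0 than the raw shifted batch
```

With momentum 0 this recovers the unadapted source model; with momentum 1 it fully trusts the target mini-batch, which re-centres the shifted activations.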

Latent domains

In settings with latent domains, information about domains is not available, i.e., there are no domain labels. Further, some domains may be more similar to each other than others, so the boundaries between domains are often blurred. Various approaches have been proposed to deal with latent domains, e.g., sparse latent adapters (Deecke et al., 2022) and domain-agnostic learning (Peng et al., 2019), which disentangles domain-specific features from class information using an autoencoder.

