VISUAL PROMPT TUNING FOR TEST-TIME DOMAIN ADAPTATION

Anonymous

Abstract

Models should be able to adapt to unseen data at test time to avoid the performance drops caused by inevitable distribution shifts in real-world deployment. In this work, we tackle the practical yet challenging test-time adaptation (TTA) problem, where a model adapts to the target domain without accessing the source data. We propose a simple recipe called Data-efficient Prompt Tuning (DePT) with two key ingredients. First, DePT plugs visual prompts into the vision Transformer and tunes only these source-initialized prompts during adaptation. We find that such parameter-efficient fine-tuning can effectively adapt the model representation to the target domain without overfitting to the noise in the learning objective. Second, DePT bootstraps the source representation to the target domain via memory-bank-based online pseudo-labeling. A hierarchical self-supervised regularization, specially designed for prompts, is jointly optimized to alleviate error accumulation during self-training. With far fewer tunable parameters, DePT demonstrates not only state-of-the-art performance on the major adaptation benchmarks VisDA-C, ImageNet-C, and DomainNet-126, but also superior data efficiency: adapting with only 1% or 10% of the target data incurs little performance degradation compared to using 100%. In addition, DePT readily extends to online and multi-source TTA settings.
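The memory-bank-based online pseudo-labeling mentioned above can be illustrated with a toy sketch. This is not the paper's implementation; the names (`refine_pseudo_label`, `BANK_SIZE`, `K`) and the nearest-neighbor voting scheme are illustrative assumptions: a queue of recent target features and their predicted classes serves as the bank, and each incoming prediction is refined by a majority vote among its nearest neighbors.

```python
# Toy sketch (illustrative names and scheme, not the paper's exact method) of
# memory-bank-based online pseudo-labeling: each incoming target feature is
# pseudo-labeled by majority vote among its nearest neighbors in a memory
# bank that stores recent target features with their model predictions.

from collections import Counter, deque
import math

BANK_SIZE = 6   # queue capacity (a hyperparameter)
K = 3           # number of neighbors used for the vote

bank = deque(maxlen=BANK_SIZE)  # holds (feature, predicted_class) pairs

def refine_pseudo_label(feature, model_pred):
    """Return a refined pseudo-label via k-NN voting over the bank."""
    if len(bank) < K:
        return model_pred  # not enough neighbors yet; trust the model
    neighbors = sorted(bank, key=lambda item: math.dist(feature, item[0]))[:K]
    votes = Counter(cls for _, cls in neighbors)
    return votes.most_common(1)[0][0]

# Simulated stream: two well-separated clusters (class 0 near the origin,
# class 1 near (5, 5)); the last model prediction is deliberately noisy.
stream = [([0.1, 0.2], 0), ([0.0, 0.3], 0), ([5.1, 4.9], 1),
          ([5.0, 5.2], 1), ([0.2, 0.1], 0), ([0.1, 0.1], 1)]
labels = []
for feat, pred in stream:
    labels.append(refine_pseudo_label(feat, pred))
    bank.append((feat, pred))
print(labels[-1])  # noisy prediction 1 is voted down to class 0
```

In an actual TTA pipeline the bank would hold high-dimensional encoder features and soft predictions, and the refined pseudo-labels would supervise the self-training loss; the queue structure is what makes the labeling online.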

1. INTRODUCTION

Deep neural networks achieve excellent performance when the test data (target domain) follow the same distribution as the training data (source domain). However, their generalization on the test data is not guaranteed when a distribution shift occurs between source and target. Even simple domain shifts, such as common corruptions (Hendrycks & Dietterich, 2019) or appearance variations (Geirhos et al., 2018), can lead to a significant performance drop. Solving the problem of domain shift is non-trivial. On one hand, it is impossible to train a single model that covers an infinite number of domains. On the other hand, training an individual model for each domain requires many annotated samples, which incurs significant data collection and labeling costs.

In this paper, we tackle the practical yet challenging test-time domain adaptation (TTA) problem. Compared with conventional unsupervised domain adaptation (UDA) (Long et al., 2015), TTA adapts a source-initialized model with unlabeled target domain data during testing, without access to the source data. TTA avoids the privacy issues of sharing source data and is desirable for real-world applications. We focus on both offline and online TTA settings. In offline adaptation, also known as source-free adaptation (Kundu et al., 2020; Liang et al., 2020), the model is first updated with unlabeled target data and then used for inference. In online adaptation, the model updates and performs inference simultaneously on test data that arrives in a stream.

The key challenges of TTA are twofold. First, how can the source-initialized model be effectively modulated given a noisy unsupervised learning objective? Tent (Wang et al., 2020) optimizes only the parameters of batch normalization layers, which is parameter-efficient but limits adaptation capacity. At the other extreme, SHOT (Liang et al., 2020) tunes the feature encoder, and AdaContrast (Chen et al., 2022) trains the whole model.
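The prompt-only alternative that DePT adopts can be sketched in a few lines. This is a framework-free toy, not the paper's implementation: `PromptedViT`, `NUM_PROMPTS`, and `EMBED_DIM` are made-up names, and the backbone itself is omitted. The point is the parameter budget: learnable prompt tokens are prepended to the patch-token sequence, and only those tokens are exposed to the optimizer while the backbone stays frozen.

```python
# Minimal, framework-free sketch of visual prompt tuning (illustrative
# names, not the paper's implementation): learnable prompt tokens are
# prepended to a vision Transformer's patch embeddings, and only the
# prompts are updated during adaptation while the backbone is frozen.

from dataclasses import dataclass, field
import random

EMBED_DIM = 8       # toy embedding size (real ViTs use e.g. 768)
NUM_PROMPTS = 4     # number of learnable prompt tokens (a hyperparameter)

@dataclass
class PromptedViT:
    """Toy ViT wrapper: frozen backbone tokens plus tunable prompts."""
    prompts: list = field(
        default_factory=lambda: [[0.0] * EMBED_DIM for _ in range(NUM_PROMPTS)]
    )

    def forward_tokens(self, patch_tokens):
        # [CLS] handling omitted; prompts are inserted before the patch
        # tokens, so frozen self-attention layers can attend to them.
        return [list(p) for p in self.prompts] + [list(t) for t in patch_tokens]

    def tunable_parameters(self):
        # Only the prompt tokens are handed to the optimizer; the backbone
        # weights (not modeled in this toy) would be excluded entirely.
        return self.prompts

model = PromptedViT()
patches = [[random.random() for _ in range(EMBED_DIM)] for _ in range(16)]
seq = model.forward_tokens(patches)
print(len(seq))                                      # 4 prompts + 16 patches = 20
print(len(model.tunable_parameters()) * EMBED_DIM)   # 32 tunable scalars
```

In a real ViT the tunable budget would be `NUM_PROMPTS × 768` scalars per prompted layer, orders of magnitude smaller than tuning the encoder or the full model, which is what makes this middle ground between Tent's BN-only updates and full fine-tuning attractive.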
Given today's over-parameterized models, these methods are prone to overfitting to the unreliable unsupervised learning objective, especially when the amount of target domain data is limited. We present more evidence in Appendix A to illustrate our motivation for using visual prompt tuning. Second, given only unlabeled target domain data, use

