VARIATIONAL PSEUDO LABELS FOR META TEST-TIME ADAPTATION

Abstract

Test-time model adaptation has shown great effectiveness in generalizing over domain shifts. One of the most successful tactics for test-time adaptation further optimizes the model on the target data using the predictions of the source-trained model. However, due to domain shifts, the source-trained model predictions themselves can be largely inaccurate, which results in a model misspecified to the target data and therefore damages its adaptation ability. In this paper, we address test-time adaptation from a probabilistic perspective. We formulate model adaptation as a probabilistic inference problem, which incorporates the uncertainty of source-model predictions by modeling pseudo labels as distributions. Based on this probabilistic formalism, we propose variational pseudo labels that explore the information of neighboring target samples to improve pseudo labels and achieve a model better specified to the target data. By a meta-learning paradigm, we train our model by simulating domain shifts and the test-time adaptation procedure. In doing so, our model learns the ability to generate more accurate pseudo-label distributions and to adapt to new domains. Experiments on five widely used datasets demonstrate the effectiveness of our proposal.

1. INTRODUCTION

Deep neural networks exhibit generalizability problems and suffer from performance degradation as soon as test data distributions differ from those experienced during training (Geirhos et al., 2018; Recht et al., 2019). To deal with distribution shift, domain adaptation, e.g., (Saenko et al., 2010; Long et al., 2015; Lu et al., 2020; Li et al., 2021), and domain generalization, e.g., (Muandet et al., 2013; Motiian et al., 2017; Li et al., 2017; 2020), have proven effective tactics. However, these two settings either require a large number of (unlabeled) target data during training or do not consider any target information during generalization at all; neither assumption necessarily holds in realistic scenarios. Test-time adaptation, e.g., (Sun et al., 2020; Varsavsky et al., 2020; Wang et al., 2021), goes beyond these two settings and introduces a new learning paradigm, which trains a model on source data and further optimizes it on the unlabeled target data at test time to adapt to the target domain. One widely applied strategy for test-time adaptation updates the model parameters by self-supervision (Liang et al., 2020; Wang et al., 2021; Iwasawa & Matsuo, 2021; Niu et al., 2022). However, due to domain shifts, the source-model predictions on the target samples can be uncertain and inaccurate. As self-supervision-based test-time adaptation is often achieved by optimizing with pseudo labels or by entropy minimization based on the source-trained model predictions, the model can become overconfident in some mispredictions. As a result, the adapted model becomes unreliable and misspecified (Wilson & Izmailov, 2020) to the target data.

In this paper we make three contributions. First, we address test-time adaptation in a probabilistic framework by formulating it as a variational inference problem. We define pseudo labels as stochastic variables and estimate a distribution over them by variational inference.
By doing so, the uncertainty in the source-trained model predictions is incorporated into the adaptation to the target data at test time. Second, thanks to the proposed probabilistic formalism, it is natural and convenient to utilize variational distributions to leverage extra information. Building on this benefit, we design variational pseudo labels that incorporate the neighboring information of target samples into the inference of the pseudo-label distributions. By doing so, the variational pseudo labels become more accurate, which enables the source-trained model to be better specified to the target data and is therefore conducive to model adaptation. Third, we adopt a meta-learning paradigm for optimization that simulates test-time adaptation on the source domains. More specifically, the model is iteratively exposed to domain shifts and optimized to learn the ability to adapt to unseen domains. We conduct experiments on three widely used benchmarks to demonstrate the promise and effectiveness of our method for test-time adaptation.

2.1. PRELIMINARY

We are given data from different domains defined on the joint space $\mathcal{X} \times \mathcal{Y}$, where $\mathcal{X}$ and $\mathcal{Y}$ denote the data space and label space, respectively. The domains are split into several source domains $\mathcal{D}_s = \{(x_s, y_s)_i\}_{i=1}^{N_s}$ and target domains $\mathcal{D}_t = \{(x_t, y_t)_i\}_{i=1}^{N_t}$. The goal is to train a model on the source domains that generalizes well to the (unseen) target domains. To this end, test-time adaptation methods, e.g., (Wang et al., 2021; Zhang et al., 2021; Niu et al., 2022), have recently been proposed. These methods adapt the source-trained model to target domains by optimization at test time. A common strategy in these methods is to first train the model $\theta$ on the source data $\mathcal{D}_s$ by minimizing a supervised loss $\mathcal{L}_{\mathrm{train}}(\theta) = \mathbb{E}_{(x_s, y_s)_i \in \mathcal{D}_s}\left[\mathcal{L}_{\mathrm{CE}}(x_s, y_s; \theta)\right]$, and then at test time adapt the source-trained model $\theta_s$ to the target domain by optimizing a surrogate loss, e.g., entropy minimization, on the unlabeled test data, which is formulated as $\mathcal{L}_{\mathrm{test}}(\theta) = \mathbb{E}_{x_t \in \mathcal{D}_t}\left[\mathcal{L}_{E}(x_t; \theta_s)\right]$, where the entropy is computed on the source-model predictions. However, test samples from the target domain can be largely misclassified by the source model due to the domain shift, resulting in large uncertainty in the predictions. Moreover, entropy minimization tends to update the model with high confidence even on wrong predictions, which causes a model misspecified for the target domain. To solve these problems, in this work we address test-time model adaptation from a probabilistic perspective. We propose a probabilistic inference framework that models the uncertainty of the source-model predictions by defining distributions over pseudo labels. Moreover, under this probabilistic formalism, we propose variational pseudo labels, which enable the model to incorporate the neighboring information of test samples to combat domain shifts.
We adopt a meta-learning paradigm for optimization, which simulates the domain shifts and the adaptation procedure. By doing so, the model acquires the ability to further adapt itself with pseudo labels to unseen target domains. We provide a graphical illustration highlighting the differences between common test-time adaptation and our proposal in Figure 1.
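The episodic simulation described above can be sketched as a MAML-style bi-level step: adapt on a held-out meta-target batch with an unsupervised inner loss, then score the adapted parameters with the batch's true labels. This is a simplified sketch under our own assumptions (a single linear layer so the adapted parameters can be applied functionally, entropy as the inner loss, hypothetical names `inner_lr` and `meta_step`), not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def meta_step(model, x_tgt, y_tgt, inner_lr=0.01):
    """Simulate one episode: adapt without labels, evaluate with labels.

    For brevity the functional forward assumes `model` is a single
    torch.nn.Linear, so its parameters are (weight, bias).
    """
    params = list(model.parameters())
    logits = model(x_tgt)
    probs = F.softmax(logits, dim=1)
    # Inner (unsupervised) loss on the simulated target domain.
    unsup_loss = -(probs * torch.log(probs + 1e-8)).sum(dim=1).mean()
    # create_graph=True lets the outer loss backprop through the update.
    grads = torch.autograd.grad(unsup_loss, params, create_graph=True)
    adapted = [p - inner_lr * g for p, g in zip(params, grads)]
    # Outer loss: evaluate the adapted parameters with held-out labels.
    logits_adapted = F.linear(x_tgt, adapted[0], adapted[1])
    return F.cross_entropy(logits_adapted, y_tgt)
```

Calling `meta_step(...).backward()` then accumulates gradients in the original parameters, so the initialization is trained to produce useful unsupervised updates at test time.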

2.2. PROBABILISTIC TEST-TIME ADAPTATION WITH LATENT PSEUDO LABELS

We first provide a probabilistic formulation for test-time adaptation based on pseudo labels. Given the target sample $x_t$ and the source-trained model $\theta_s$, we would like to make predictions on the target sample. To this end, we formulate the predictive likelihood as: $p(y_t|x_t, \theta_s) = \int p(y_t|x_t, \theta_t)\, p(\theta_t|x_t, \theta_s)\, d\theta_t \approx p(y_t|x_t, \theta_t^*)$, where we use the value $\theta_t^*$ obtained by maximum a posteriori (MAP) estimation to approximate the integral (Finn et al., 2018). Intuitively, the MAP approximation can be interpreted as inferring the posterior over $\theta_t$ as $p(\theta_t|x_t, \theta_s) \approx \delta(\theta_t = \theta_t^*)$, which we obtain by adapting $\theta_s$ using the target data $x_t$. To model the uncertainty of predictions for more robust test-time adaptation, we treat pseudo labels as stochastic variables in the probabilistic framework, as shown in Figure 1 (b). The pseudo labels are obtained from the source-model predictions and follow categorical distributions. Then we
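To illustrate the idea of pseudo labels as categorical variables refined by neighboring target samples, the sketch below averages each sample's softmax prediction with those of its nearest neighbors in feature space. The k-NN averaging rule is an illustrative choice of ours, not necessarily the paper's exact variational posterior; the function name and cosine-similarity metric are assumptions.

```python
import torch
import torch.nn.functional as F

def neighbor_pseudo_label_dist(features, logits, k=3):
    """Return one categorical pseudo-label distribution per sample,
    averaged over each sample's k nearest neighbors (cosine similarity,
    including the sample itself)."""
    probs = F.softmax(logits, dim=1)        # per-sample p(y|x_t)
    f = F.normalize(features, dim=1)
    sim = f @ f.t()                         # pairwise cosine similarities
    idx = sim.topk(k, dim=1).indices        # k nearest neighbors per row
    return probs[idx].mean(dim=1)           # refined label distribution
```

Because the output rows remain valid probability vectors, they can serve directly as soft targets for adapting $\theta_s$, while the averaging tempers individually overconfident mispredictions.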

