TEST-TIME ADAPTATION VIA SELF-TRAINING WITH NEAREST NEIGHBOR INFORMATION

Abstract

Test-time adaptation (TTA) aims to adapt a trained classifier using only online unlabeled test data, without any information related to the training procedure. Most existing TTA methods adapt the trained classifier by using the classifier's predictions on the test data as pseudo-labels. However, under test-time domain shift, the accuracy of these pseudo-labels cannot be guaranteed, and TTA methods thus often suffer performance degradation in the adapted classifier. To overcome this limitation, we propose a novel test-time adaptation method, called Test-time Adaptation via Self-Training with nearest neighbor information (TAST), which consists of the following procedures: (1) add trainable adaptation modules on top of the trained feature extractor; (2) define a new pseudo-label distribution for the test data using nearest neighbor information; (3) train these modules for only a few steps during test time to match the nearest neighbor-based pseudo-label distribution and a prototype-based class distribution for the test data; and (4) predict the label of each test sample using the average predicted class distribution across these modules. The pseudo-label generation is based on the intuition that a test sample and its nearest neighbors in the embedding space are likely to share the same label under domain shift. By utilizing multiple randomly initialized adaptation modules, TAST extracts information useful for classifying the test data under domain shift from the nearest neighbor information. TAST outperformed state-of-the-art TTA methods on two standard benchmarks: domain generalization (VLCS, PACS, OfficeHome, and TerraIncognita) and image corruption (CIFAR-10/100-C).
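Steps (2) and (3) above can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy version, not the paper's exact formulation: the function names, the choice of cosine similarity, the value of k, and the temperature `tau` are all illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(x):
    # L2-normalize rows so dot products become cosine similarities
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def nn_pseudo_labels(test_z, support_z, support_y, k=3):
    """Step (2), sketched: average the labels of the k nearest support
    embeddings (by cosine similarity) to form a pseudo-label distribution."""
    sims = normalize(test_z) @ normalize(support_z).T   # (n_test, n_support)
    nn_idx = np.argsort(-sims, axis=1)[:, :k]           # indices of top-k neighbors
    return support_y[nn_idx].mean(axis=1)               # (n_test, n_classes)

def prototype_distribution(test_z, support_z, support_y, tau=0.1):
    """Step (3), sketched: class distribution from a softmax over cosine
    similarity to class prototypes (mean support embedding per class)."""
    protos = normalize(support_y.T @ support_z / support_y.sum(0, keepdims=True).T)
    logits = normalize(test_z) @ protos.T / tau
    logits -= logits.max(axis=1, keepdims=True)         # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

# Toy data: 2 well-separated classes in a 4-d embedding space
support_z = np.vstack([rng.normal(+2, 0.1, (10, 4)), rng.normal(-2, 0.1, (10, 4))])
support_y = np.vstack([np.tile([1., 0.], (10, 1)), np.tile([0., 1.], (10, 1))])
test_z = rng.normal(+2, 0.1, (5, 4))                    # test samples near class 0

q = nn_pseudo_labels(test_z, support_z, support_y)       # pseudo-label distribution
p = prototype_distribution(test_z, support_z, support_y) # prototype-based distribution
# Cross-entropy matching p to q, i.e., the training signal of step (3)
loss = -(q * np.log(p + 1e-12)).sum(axis=1).mean()
```

In the full method this loss would backpropagate into the adaptation modules; here the embeddings are fixed toy data, so only the two distributions and their matching loss are shown.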

1. INTRODUCTION

Deep neural networks often suffer significant performance degradation under domain shift (i.e., distribution shift). This phenomenon has been observed in various tasks including classification (Taori et al., 2020; Wang et al., 2021b), visual recognition (Saenko et al., 2010; Csurka, 2017), and reinforcement learning (Cobbe et al., 2019; Mendonca et al., 2020; Lee and Chung, 2021b). Two broad classes of domain adaptation methods attempt to solve this problem: supervised domain adaptation (SDA) (Tzeng et al., 2015; Motiian et al., 2017) and unsupervised domain adaptation (UDA) (Ganin and Lempitsky, 2015; Long et al., 2016; Sener et al., 2016). Both SDA and UDA methods aim to obtain domain-invariant representations by closely aligning the representations of training and test data in the embedding space. During testing, UDA methods require the training dataset, and SDA methods additionally require labeled data from the test domain. In practice, however, it is often difficult to access training datasets or labeled test-domain data during test time, due to data security concerns or labeling cost. Test-time adaptation (TTA) (Iwasawa and Matsuo, 2021; Wang et al., 2021a) is a prominent approach to alleviate the problems caused by domain shift. TTA methods aim to adapt the trained model to the test domain without a labeled dataset in the test domain or any information related to the training procedure (e.g., the training dataset or feature statistics of the training domain (Sun et al., 2020)).

Our code is available at https://github.com/mingukjang/

