TEST-TIME ADAPTATION AND ADVERSARIAL ROBUSTNESS

Abstract

This paper studies test-time adaptation in the context of adversarial robustness. We formulate an adversarial threat model for test-time adaptation, under which the defender may have a unique advantage: the adversarial game becomes a maximin game, instead of the minimax game of the classic adversarial robustness threat model. We then study whether the maximin threat model admits more "good solutions" than the minimax threat model, and is thus strictly weaker. For this purpose, we first present a provable separation between the two threat models in a natural Gaussian data model. For deep learning, while we do not have a proof, we propose a candidate: Domain Adversarial Neural Networks (DANN), an algorithm designed for unsupervised domain adaptation. We show that DANN provides nontrivial robustness in the test-time maximin threat model against strong transfer attacks and adaptive attacks. This is somewhat surprising, since DANN is not designed specifically for adversarial robustness (e.g., against norm-based attacks) and provides no robustness in the minimax model. Complementing these results, we show that recent data-oblivious test-time adaptations can be easily attacked even with simple transfer attacks. We conclude with various future directions for studying adversarially robust test-time adaptation.

1. INTRODUCTION

There is a surge of interest in studying test-time adaptation to help generalization to unseen domains (e.g., recent work by Sun et al. (2020); Wang et al. (2020); Nado et al. (2020)). At a high level, a generic test-time adaptation can be modeled as an algorithm Γ which accepts an (optional) labeled training dataset D, an (optional) model F trained on D (usually used as a starting point), and an unlabeled test feature set U, and outputs a model F′ = Γ(F, D, U), with the goal of achieving high test accuracy on U. For a large test set U, test-time adaptation can be viewed as a form of transductive learning (Joachims (1999); Vapnik (1998)), i.e., using D and U to train a model to predict the specific instances in U, which is argued to be easier than the more traditional inductive learning.

This paper studies test-time adaptation in the context of adversarial robustness, i.e., there is an active agent who tries to fool the test-time adaptation by perturbing the input so that F′ gives wrong predictions. There are several motivations for pursuing this direction. First, the question is of practical interest: many practical ML pipelines run in a batch mode¹, where they first collect a set of unlabeled data points and then send them to a model (e.g., Nado et al. (2020)). In such cases, data in the batch may have been adversarially perturbed, and it is natural to ask whether we can leverage the large batch size and test-time adaptation to enhance adversarial robustness. Second, from a purely theoretical point of view, since test-time adaptation is a form of transductive learning, it is intriguing to ask whether transductive adversarial learning can be easier, given that traditional adversarial robustness is formulated in the inductive learning setting (e.g., Madry et al. (2018)). To this end, a recent work by Goldwasser et al.
(2020) shows that, with transductive learning, one can achieve nontrivial guarantees for hypothesis classes of bounded VC dimension under arbitrary train and test distributions. The current work complements theirs in the setting of deep learning.

To study these questions, we formalize a threat model for the adversarial robustness of test-time adaptation, which we call the (test-time) maximin threat model. Recall that the classic adversarial robustness threat model is a minimax game.
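Concretely, the order of play distinguishes the two games. Writing N(U) for the set of allowed perturbations of the test features U and L(F, U') for the test loss of a model F on a perturbed set U' (this notation is illustrative; the formal definitions are given where the threat model is developed), the classic threat model has the defender commit to a model first:

    \min_{F} \; \max_{U' \in N(U)} \; L(F, U')

With test-time adaptation, the attacker must instead commit to the perturbed set U' before the defender adapts, yielding

    \max_{U' \in N(U)} \; \min_{F'} \; L(F', U')

where the inner minimization is (approximately) realized by the adaptation F' = Γ(F, D, U'). Since max-min never exceeds min-max over the same loss, the maximin game can only be easier for the defender; the substantive question is whether it is strictly easier.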



¹ For example, Instagram collects a large batch of photos before sending them to a model to tag people.

