TEST-TIME ADAPTATION AND ADVERSARIAL ROBUSTNESS

Abstract

This paper studies test-time adaptation in the context of adversarial robustness. We formulate an adversarial threat model for test-time adaptation, where the defender may have a unique advantage as the adversarial game becomes a maximin game, instead of a minimax game as in the classic adversarial robustness threat model. We then study whether the maximin threat model admits more "good solutions" than the minimax threat model, and is thus strictly weaker. For this purpose, we first present a provable separation between the two threat models in a natural Gaussian data model. For deep learning, while we do not have a proof, we propose Domain Adversarial Neural Networks (DANN), an algorithm designed for unsupervised domain adaptation, as a candidate, and show that it provides nontrivial robustness in the test-time maximin threat model against strong transfer attacks and adaptive attacks. This is somewhat surprising, since DANN is not designed specifically for adversarial robustness (e.g., against norm-based attacks) and provides no robustness in the minimax model. Complementing these results, we show that recent data-oblivious test-time adaptation methods can be easily attacked even with simple transfer attacks. We conclude the paper with various future directions for studying adversarially robust test-time adaptation.

1. INTRODUCTION

There is a surge of interest in test-time adaptation as a way to improve generalization to unseen domains (e.g., recent work by Sun et al. (2020); Wang et al. (2020); Nado et al. (2020)). At a high level, a generic test-time adaptation can be modeled as an algorithm Γ which accepts an (optional) labeled training dataset D, an (optional) model F trained on D (usually used as a starting point), and an unlabeled test feature set U, and outputs a model F′ = Γ(F, D, U), with the goal of achieving high test accuracy on U. For a large test set U, test-time adaptation can be viewed as a form of transductive learning (Joachims (1999); Vapnik (1998)), i.e., using D and U to train a model to predict the specific instances in U, which is argued to be easier than the more traditional inductive learning.

This paper studies test-time adaptation in the context of adversarial robustness, i.e., when an active adversary tries to fool the test-time adaptation by perturbing the input so that F′ gives wrong predictions. There are several motivations for pursuing this direction. First, the question is of practical interest: many practical ML pipelines run in a batch mode[1], where they first collect a set of unlabeled data points and then send them to a model (e.g., Nado et al. (2020)). In such cases, data in the batch may have been adversarially perturbed, and it is natural to ask whether we can leverage the large batch size and test-time adaptation to enhance adversarial robustness. Second, from a purely theoretical point of view, since test-time adaptation is a form of transductive learning, it is intriguing to ask whether transductive adversarial learning can be easier, given that traditional adversarial robustness is formulated in the inductive learning setting (e.g., Madry et al. (2018)). To this end, a recent work by Goldwasser et al. (2020) shows that, with transductive learning, one can achieve nontrivial guarantees for classes of bounded VC dimension with arbitrary train and test distributions. The current work complements their paper in the setting of deep learning.

To study these questions, we formalize a threat model, which we call the (test-time) maximin threat model, for the adversarial robustness of test-time adaptation. Recall that the classic adversarial robustness game is a minimax game min_F E_V[max_{V′} L(F, V′)], where V is randomly sampled data, V′ is the perturbed data generated from V by the adversary, and L(F, V′) is the loss of the model F on V′. By contrast, in the maximin threat model, we allow V to be sampled from a different domain, and the game is maximin: E_V[max_{U′} min_{F′} L(F′, V′)], where U′ is the perturbed feature set of V (subject to the attack type), V′ is the correspondingly perturbed labeled data, and F′ is the model output by the adaptation algorithm (see Definition 2). By the maximin inequality, this threat model is no harder for the defender than the minimax model (to allow source and target domains to differ, we need to generalize the classic minimax model; see Definition 3).

We then move on to the focus of this work: whether the maximin threat model is "strictly weaker" than the minimax threat model. Note that any good defender solution in the minimax game (a robust model) induces a good defender solution in the maximin game (an adaptation algorithm that outputs that robust model); thus, intuitively, the good defender solutions of the minimax model form a subset of the good defender solutions of the maximin model. We ask whether this containment is proper: that is, whether there exists a defender solution that is good in the maximin threat model but bad in the minimax threat model. The existence of such a solution would demonstrate that the maximin threat model admits strictly more good solutions.
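The two objectives discussed above can be written side by side; the following is a sketch in LaTeX using the notation of this section (primes denote perturbed or adapted quantities):

```latex
% Classic (inductive) adversarial robustness: a minimax game.
% The defender commits to a single model F before seeing the attack.
\min_{F} \; \mathbb{E}_{V}\Big[ \max_{V'} L(F, V') \Big]

% Test-time (transductive) threat model: a maximin game.
% The adversary commits to perturbed features U' first; the defender
% then adapts, choosing F' = \Gamma(F, D, U') with knowledge of U'.
\mathbb{E}_{V}\Big[ \max_{U'} \; \min_{F'} L(F', V') \Big]

% By the maximin inequality, for any fixed V,
%   \max_{U'} \min_{F'} L(F', V') \le \min_{F'} \max_{U'} L(F', V'),
% so the maximin game is no harder for the defender.
```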
Besides its theoretical interest, this question is also of practical importance, since these "new" solutions may possess desirable properties that good solutions in the minimax threat model may lack. For example, one such property is that the defender solution is attack agnostic (Goodfellow (2018), pp. 30): that is, the solution does not directly optimize the performance measure for a particular type of perturbation[2]. To this end, we first present a provable separation between the maximin and minimax threat models in a natural Gaussian data model. In fact, the separation holds even when U contains only a single point, indicating the power of transductive learning. We then move to deep learning. While we do not have provable guarantees, we empirically examine Domain Adversarial Neural Networks (DANN) (Ganin et al. (2017)), an algorithm designed for unsupervised domain adaptation (UDA), as a candidate for the separation. Specifically, we demonstrate that DANN provides nontrivial test-time adversarial robustness against both transfer attacks and adaptive attacks, in both homogeneous and inhomogeneous cases. This is somewhat surprising, as DANN is attack agnostic as mentioned above and has not previously been considered for adversarial robustness. Not surprisingly, as we hypothesized for a separation, the accuracy becomes very low when evaluating F′ in the minimax model. Complementing the above result, we explore the maximin robustness of recent data-oblivious adaptation algorithms (namely, adaptation algorithms that do not use D, but only the pretrained model F and the unlabeled test set U). Specifically, we consider Test-Time Training (TTT) by Sun et al. (2020)[3]. We show that TTT can be easily attacked using simple transfer attacks. While this is not surprising, as the authors of Sun et al. (2020) have cautioned that TTT is not designed for adversarial robustness, the situation is in sharp contrast to our results with DANN.
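The intuition behind the separation can be conveyed with a toy example. The following sketch is ours, not the paper's construction: a 1-D threshold classifier and a batch-level mean-shift "attack" stand in for real models and norm-based attacks, and all names and numbers are hypothetical illustrative choices. A fixed (inductive) model is broken by the shift, while a transductive defender that re-centers on the perturbed batch recovers.

```python
# Toy illustration (not the paper's construction): why a transductive
# defender can survive an attack that breaks a fixed inductive model.

def fixed_model(x, threshold=0.5):
    """Inductive defender: a 1-D threshold classifier fixed before test time."""
    return 1 if x > threshold else 0

def adapt_then_predict(U_prime):
    """Transductive defender: sees the (possibly perturbed) test features U'
    and re-centers the threshold at their midpoint before predicting."""
    t = (min(U_prime) + max(U_prime)) / 2
    return [1 if x > t else 0 for x in U_prime]

# Clean test batch: class 0 clusters near 0.0, class 1 near 1.0.
features = [0.0, 0.1, 0.2, 0.9, 1.0, 1.1]
labels   = [0,   0,   0,   1,   1,   1]

# Adversary shifts every feature by a constant delta, a simple
# batch-level attack the fixed threshold cannot absorb.
delta = 0.7
U_prime = [x + delta for x in features]

acc_fixed = sum(fixed_model(x) == y for x, y in zip(U_prime, labels)) / len(labels)
acc_adapt = sum(p == y for p, y in zip(adapt_then_predict(U_prime), labels)) / len(labels)

print(acc_fixed)  # the fixed model now labels every class-0 point as 1
print(acc_adapt)  # re-centering recovers the clean decision boundary
```

Of course, a real attacker aware of the adaptation rule would attack it adaptively; the point of the toy is only that the defender's last-move advantage in the maximin game can matter.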
The rest of the paper is organized as follows: Section 2 presents the setup. In Section 3 we define the threat models. In Section 4 we present theoretical results on the separation and examine DANN as a candidate separation in deep learning. Finally, Section 5 explores the maximin robustness of oblivious test-time adaptation and concludes the paper with future directions.

2. PRELIMINARIES

Let F be a model. For a data point (x, y) ∈ X × Y, a loss function ℓ(F; x, y) gives the loss of F on x given the true label y. Let V be a set of labeled data points. We use the notation L(F, V) = (1/|V|) Σ_{(x,y)∈V} ℓ(F; x, y) to denote the empirical loss of F on V. For example, if we use the binary loss ℓ_{0,1}(F; x, y) = 1[F(x) ≠ y], this gives the test error of F on V. We use the notation V|_X to denote the projection of V to its features, that is, {(x_i, y_i)}_{i=1}^m → {x_1, . . . , x_m}.

Threat model for classic adversarial robustness. To formulate the threat model for test-time adaptation, we first present a threat model for the classic adversarial robustness. Although the classic adversarial robustness can be written down succinctly as a minimax objective, namely

min_F E_V [ max_{V′} L(F, V′) ],

where V is sampled from the data distribution and V′ is a perturbation of V allowed by the attack type,

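The definitions above translate directly into code. The following is our own minimal sketch (not code from the paper) of the empirical loss L(F, V) under the 0-1 loss and the feature projection V|_X; the toy classifier and data points are hypothetical.

```python
# Minimal transcription of the preliminaries: empirical loss L(F, V)
# under the 0-1 loss, and the feature projection V|_X.

def zero_one_loss(F, x, y):
    """Binary loss l_{0,1}(F; x, y) = 1[F(x) != y]."""
    return 1.0 if F(x) != y else 0.0

def empirical_loss(F, V, loss=zero_one_loss):
    """L(F, V) = (1/|V|) * sum of loss(F; x, y) over (x, y) in V.
    With the 0-1 loss this is exactly the test error of F on V."""
    return sum(loss(F, x, y) for x, y in V) / len(V)

def project_features(V):
    """V|_X: drop the labels, keeping only the features {x_1, ..., x_m}."""
    return [x for x, _ in V]

# Usage with a toy sign classifier on labeled 1-D points.
F = lambda x: 1 if x > 0 else 0
V = [(-2.0, 0), (-1.0, 0), (0.5, 1), (1.5, 0)]  # last point is mislabeled for F
print(empirical_loss(F, V))   # 1 error out of 4 points
print(project_features(V))
```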
Footnotes:

[1] For example, Instagram collects a large batch of photos before sending them to a model to tag people.

[2] Another consideration, which is beyond the scope of this paper, is the computational feasibility of finding a good solution, given the hardness of minimax optimization (Katz et al. (2017); Daskalakis et al. (2020)).

[3] While TTT does not use the training data D at test time, it has a special self-training component, and the joint architecture is a Y-structure. A more domain-agnostic approach is discussed in Wang et al. (2020).

