TENT: FULLY TEST-TIME ADAPTATION BY ENTROPY MINIMIZATION

Abstract

A model must adapt itself to generalize to new and different data during testing. In this setting of fully test-time adaptation the model has only the test data and its own parameters. We propose to adapt by test entropy minimization (tent): we optimize the model for confidence as measured by the entropy of its predictions. Our method estimates normalization statistics and optimizes channel-wise affine transformations online on each batch of test data. Tent reduces generalization error for image classification on corrupted ImageNet and CIFAR-10/100 and reaches a new state-of-the-art error on ImageNet-C. Tent handles source-free domain adaptation on digit recognition from SVHN to MNIST/MNIST-M/USPS, on semantic segmentation from GTA to Cityscapes, and on the VisDA-C benchmark. These results are achieved in one epoch of test-time optimization without altering training.

1. INTRODUCTION

Deep networks can achieve high accuracy on training and testing data from the same distribution, as evidenced by tremendous benchmark progress (Krizhevsky et al., 2012; Simonyan & Zisserman, 2015; He et al., 2016). However, generalization to new and different data is limited (Hendrycks & Dietterich, 2019; Recht et al., 2019; Geirhos et al., 2018). Accuracy suffers when the training (source) data differ from the testing (target) data, a condition known as dataset shift (Quionero-Candela et al., 2009). Models can be sensitive to shifts during testing that were not known during training, whether natural variations or corruptions such as unexpected weather or sensor degradation. Nevertheless, it can be necessary to deploy a model on different data distributions, so adaptation is needed.

During testing, the model must adapt given only its parameters and the target data. This fully test-time adaptation setting cannot rely on source data or supervision. Neither is practical when the model first encounters new testing data, before it can be collected and annotated, as inference must go on. Real-world usage motivates fully test-time adaptation by data, computation, and task needs:

1. Availability. A model might be distributed without source data for bandwidth, privacy, or profit.
2. Efficiency. It might not be computationally practical to (re-)process source data during testing.
3. Accuracy. A model might be too inaccurate without adaptation to serve its purpose.

To adapt during testing we minimize the entropy of model predictions. We call this objective the test entropy and name our method tent after it. We choose entropy for its connections to error and shift. Entropy is related to error, as more confident predictions are on the whole more correct (Figure 1). Entropy is related to shifts due to corruption, as more corruption results in more entropy, with a strong rank correlation to the loss for image classification as the level of corruption increases (Figure 2).
To minimize entropy, tent normalizes and transforms inference on target data by estimating statistics and optimizing affine parameters batch-by-batch. This choice of low-dimensional, channel-wise feature modulation is efficient to adapt during testing, even for online updates. Tent does not restrict or alter model training: it is independent of the source data given the model parameters. If the model can be run, it can be adapted. Most importantly, tent effectively reduces not just entropy but error. Our results evaluate generalization to corruptions for image classification, to domain shift for digit recognition, and to simulation-to-real shift for semantic segmentation. For context with more data and optimization, we evaluate methods for robust training, domain adaptation, and self-supervised learning given the labeled source data. Tent can achieve less error given only the target data, and it improves on the state-of-the-art for the ImageNet-C benchmark. Analysis experiments support our entropy objective, check sensitivity to the amount of data and the choice of parameters for adaptation, and back the generality of tent across architectures.
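The batch-by-batch procedure above can be sketched end-to-end on a toy model. The following is a minimal NumPy illustration, not the paper's implementation: the fixed linear classifier W, the learning rate, and the analytic entropy gradient are assumptions made for a self-contained example. Only the channel-wise scale and shift are updated by gradients, while normalization statistics are re-estimated on each test batch.

```python
import numpy as np

def softmax(z):
    # Shift logits for numerical stability before exponentiating.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def tent_step(x, W, gamma, beta, lr=0.05, eps=1e-5):
    """One online tent-style update on a test batch x of shape (B, D).
    W is a fixed (D, C) linear classifier standing in for the trained model;
    only the channel-wise affine parameters (gamma, beta) are updated."""
    # 1) Normalize with the current test batch's statistics.
    mu, var = x.mean(axis=0), x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    # 2) Predict with modulated features.
    z = (gamma * x_hat + beta) @ W              # logits, shape (B, C)
    p = softmax(z)
    logp = np.log(p + 1e-12)
    H = -(p * logp).sum(axis=-1)                # per-example entropy
    # 3) Analytic gradient of the mean entropy w.r.t. the logits:
    #    dH/dz_k = -p_k (log p_k + H)
    dz = -p * (logp + H[:, None]) / len(x)
    dpre = dz @ W.T                             # grad w.r.t. gamma*x_hat + beta
    dgamma = (dpre * x_hat).sum(axis=0)
    dbeta = dpre.sum(axis=0)
    # 4) Gradient descent on the modulation parameters only.
    return gamma - lr * dgamma, beta - lr * dbeta, H.mean()
```

Repeating the step over incoming batches is the online mode; one pass over the whole test set is the offline mode.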

Our contributions

• We highlight the setting of fully test-time adaptation with only target data and no source data. To emphasize practical adaptation during inference we benchmark with offline and online updates.
• We examine entropy as an adaptation objective and propose tent: a test-time entropy minimization scheme to reduce generalization error by reducing the entropy of model predictions on test data.
• For robustness to corruptions, tent reaches 44.0% error on ImageNet-C, better than the state-of-the-art for robust training (50.2%) and the strong baseline of test-time normalization (49.9%).
• For domain adaptation, tent is capable of online and source-free adaptation for digit classification and semantic segmentation, and can even rival methods that use source data and more optimization.

2. SETTING: FULLY TEST-TIME ADAPTATION

Adaptation addresses generalization from source to target. A model f_θ(x) with parameters θ trained on source data and labels x^s, y^s may not generalize when tested on shifted target data x^t. Table 1 summarizes adaptation settings, their required data, and types of losses. Our fully test-time adaptation setting uniquely requires only the model f_θ and unlabeled target data x^t for adaptation during inference.

Existing adaptation settings extend training given more data and supervision. Transfer learning by fine-tuning (Donahue et al., 2014; Yosinski et al., 2014) needs target labels to (re-)train with a supervised loss L(x^t, y^t). Without target labels, our setting denies this supervised training. Domain adaptation (DA) (Quionero-Candela et al., 2009; Saenko et al., 2010; Ganin & Lempitsky, 2015; Tzeng et al., 2015) needs both the source and target data to train with a cross-domain loss L(x^s, x^t). Test-time training (TTT) (Sun et al., 2019b) adapts during testing but first alters training to jointly optimize its supervised loss L(x^s, y^s) and self-supervised loss L(x^s). Without source, our setting denies joint training across domains (DA) or losses (TTT).

Existing settings have their purposes, but do not cover all practical cases when source, target, or supervision are not simultaneously available. Unexpected target data during testing requires test-time adaptation. TTT and our setting adapt the model by optimizing an unsupervised loss during testing L(x^t). During training, TTT jointly optimizes this same loss on source data L(x^s) with a supervised loss L(x^s, y^s), to ensure the parameters θ are shared across losses for compatibility with adaptation by L(x^t). Fully test-time adaptation is independent of the training data and training loss given the parameters θ. By not changing training, our setting has the potential to require less data and computation for adaptation.
Table 1: Characterization of adaptation settings by their data and losses.

setting                      source data   target data   training loss               testing loss
fine-tuning                  -             x^t, y^t      L(x^t, y^t)                 -
domain adaptation            x^s, y^s      x^t           L(x^s, y^s) + L(x^s, x^t)   -
test-time training           x^s, y^s      x^t           L(x^s, y^s) + L(x^s)        L(x^t)
fully test-time adaptation   -             x^t           -                           L(x^t)

[Figure: (a) training fits the model f(·; θ) to source data and labels (x^s, y^s) with a supervised loss; (b) fully test-time adaptation updates the trained parameters θ by Δ to minimize the entropy of predictions ŷ^t on target data x^t.]

3. METHOD: TEST ENTROPY MINIMIZATION VIA FEATURE MODULATION

We optimize the model during testing to minimize the entropy of its predictions by modulating its features. We call our method tent for test entropy. Tent requires a compatible model, an objective to minimize (Section 3.1), and parameters to optimize over (Section 3.2) to fully define the algorithm (Section 3.3). Figure 3 outlines our method for fully test-time adaptation. The model to be adapted must be trained for the supervised task, probabilistic, and differentiable. No supervision is provided during testing, so the model must already be trained. Measuring the entropy of predictions requires a distribution over predictions, so the model must be probabilistic. Gradients are required for fast iterative optimization, so the model must be differentiable. Typical deep networks for supervised learning satisfy these model requirements.

3.1. ENTROPY OBJECTIVE

Our test-time objective L(x t ) is to minimize the entropy H(ŷ) of model predictions ŷ = f θ (x t ). In particular, we measure the Shannon entropy (Shannon, 1948), H(ŷ) = −Σ_c p(ŷ_c) log p(ŷ_c), for the probability ŷ_c of class c. Note that optimizing a single prediction has a trivial solution: assign all probability to the most probable class. We prevent this by jointly optimizing batched predictions over parameters that are shared across the batch. Entropy is an unsupervised objective because it only depends on predictions and not annotations. However, as a measure of the predictions it is directly related to the supervised task and model. In contrast, proxy tasks for self-supervised learning are not directly related to the supervised task. Proxy tasks derive a self-supervised label y′ from the input x t without the task label y. Examples of these proxies include rotation prediction (Gidaris et al., 2018), context prediction (Doersch et al., 2015), and cross-channel auto-encoding (Zhang et al., 2017). Too much progress on a proxy task could interfere with performance on the supervised task, and self-supervised adaptation methods have to limit or mix updates accordingly (Sun et al., 2019b; a). As such, care is needed to choose a proxy compatible with the domain and task, to design the architecture for the proxy model, and to balance optimization between the task and proxy objectives. Our entropy objective does not need such efforts.
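As an illustrative sketch of this objective (not the paper's released implementation), the batched Shannon entropy loss can be written in a few lines of numpy; the `logits` arrays here are hypothetical model outputs:

```python
import numpy as np

def softmax(logits):
    # subtract the per-row max for numerical stability
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def entropy_loss(logits):
    """Mean Shannon entropy H(y) = -sum_c p(y_c) log p(y_c) over a batch."""
    p = softmax(logits)
    return float(-(p * np.log(p + 1e-12)).sum(axis=1).mean())

# A confident batch has lower entropy than a maximally uncertain (uniform) one.
confident = np.array([[10.0, 0.0, 0.0], [0.0, 12.0, 0.0]])
uniform = np.zeros((2, 3))  # equal logits give uniform predictions
assert entropy_loss(confident) < entropy_loss(uniform)
```

Note the loss is averaged over the batch, matching the joint optimization over batched predictions that rules out the trivial one-prediction solution.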

3.2. MODULATION PARAMETERS

The model parameters θ are a natural choice for test-time optimization, and these are the choice of prior work for train-time entropy minimization (Grandvalet & Bengio, 2005; Dhillon et al., 2020; Carlucci et al., 2017). However, θ is the only representation of the training/source data in our setting, and altering θ could cause the model to diverge from its training. Furthermore, f can be nonlinear and θ can be high dimensional, making optimization too sensitive and inefficient for test-time usage.

For stability and efficiency, we instead only update feature modulations that are linear (scales and shifts) and low-dimensional (channel-wise). Figure 4 shows the two steps of our modulations: normalization by statistics and transformation by parameters. Normalization centers and standardizes the input x into x̂ = (x − µ)/σ by its mean µ and standard deviation σ. Transformation turns x̂ into the output x′ = γx̂ + β by affine parameters for scale γ and shift β. Note that the statistics µ, σ are estimated from the data while the parameters γ, β are optimized by the loss. For implementation, we simply repurpose the normalization layers of the source model. We update their normalization statistics and affine parameters for all layers and channels during testing. In practice, adapting γ, β is efficient because they make up <1% of model parameters.

Figure 4: Tent modulates features during testing by estimating normalization statistics µ, σ and optimizing transformation parameters γ, β. Normalization and transformation apply channel-wise scales and shifts to the features. The statistics and parameters are updated on target data without use of source data.
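A minimal numpy sketch of this two-step modulation, assuming 2-D features of shape (batch, channels) for brevity (in a convolutional network the statistics would be taken per channel over batch and spatial positions):

```python
import numpy as np

def modulate(x, gamma, beta, eps=1e-5):
    """Channel-wise feature modulation: normalize by batch statistics,
    then apply the affine transform x' = gamma * x_hat + beta.
    x: (batch, channels); gamma, beta: (channels,)."""
    mu = x.mean(axis=0)               # estimated from the data
    sigma = x.std(axis=0)             # estimated from the data
    x_hat = (x - mu) / (sigma + eps)  # normalization
    return gamma * x_hat + beta       # transformation (optimized by the loss)

# Shifted, rescaled features stand in for target data under dataset shift.
x = np.random.default_rng(0).normal(size=(128, 8)) * 5.0 + 2.0
out = modulate(x, gamma=np.ones(8), beta=np.zeros(8))
# With identity affine parameters, the output is standardized per channel.
assert np.allclose(out.mean(axis=0), 0.0, atol=1e-6)
assert np.allclose(out.std(axis=0), 1.0, atol=1e-3)
```

Only `gamma` and `beta` receive gradients during adaptation; `mu` and `sigma` are recomputed on each test batch rather than carried over from training.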

3.3. ALGORITHM

Initialization The optimizer collects the affine transformation parameters {γ l,k , β l,k } for each normalization layer l and channel k in the source model. The remaining parameters θ \ {γ l,k , β l,k } are fixed. The normalization statistics {µ l,k , σ l,k } from the source data are discarded. Iteration Each step updates the normalization statistics and transformation parameters on a batch of data. The normalization statistics are estimated for each layer in turn, during the forward pass. The transformation parameters γ, β are updated by the gradient of the prediction entropy ∇H(ŷ), during the backward pass. Note that the transformation update follows the prediction for the current batch, and so it only affects the next batch (unless forward is repeated). This needs just one gradient per point of additional computation, so we use this scheme by default for efficiency. Termination For online adaptation, no termination is necessary, and iteration continues as long as there is test data. For offline adaptation, the model is first updated and then inference is repeated. Adaptation may of course continue by updating for multiple epochs.
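One iteration of this algorithm can be sketched end-to-end on a toy model. Everything here is assumed for illustration: the random linear classifier, the learning rate, and the finite-difference gradient that stands in for a framework's backward pass. It is not the paper's implementation, but it shows a single tent step lowering the batch entropy while the classifier weights stay fixed.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))                # frozen classifier weights (part of theta)
x = rng.normal(size=(64, 8)) * 3.0 + 1.0   # one batch of shifted test features
params = np.concatenate([np.ones(8), np.zeros(8)])  # [gamma, beta] to adapt

def entropy(params):
    gamma, beta = params[:8], params[8:]
    mu, sigma = x.mean(axis=0), x.std(axis=0)       # forward: estimate statistics
    logits = (gamma * (x - mu) / sigma + beta) @ W  # modulate features, then classify
    z = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=1).mean()

# Backward: gradient of the prediction entropy w.r.t. gamma, beta only.
# Central finite differences stand in for autograd in this sketch.
grad = np.zeros_like(params)
h = 1e-5
for i in range(params.size):
    d = np.zeros_like(params)
    d[i] = h
    grad[i] = (entropy(params + d) - entropy(params - d)) / (2 * h)

updated = params - 0.01 * grad             # one SGD step on the batch
assert entropy(updated) < entropy(params)  # predictions became more confident
```

In the online setting this step repeats on each incoming batch, with the update computed alongside the prediction so only one gradient per point is needed.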

4. EXPERIMENTS

We evaluate tent for corruption robustness on CIFAR-10/CIFAR-100 and ImageNet, and for domain adaptation on digit adaptation from SVHN to MNIST/MNIST-M/USPS. Our implementation is in PyTorch (Paszke et al., 2019) with the pycls library (Radosavovic et al., 2019).

Datasets We run on image classification datasets for corruption and domain adaptation conditions. For large-scale experiments we choose ImageNet (Russakovsky et al., 2015), with 1,000 classes, a training set of 1.2 million images, and a validation set of 50,000. For experiments at an accessible scale we choose CIFAR-10/CIFAR-100 (Krizhevsky, 2009), with 10/100 classes, a training set of 50,000, and a test set of 10,000. For domain adaptation we choose SVHN (Netzer et al., 2011) as source and MNIST (LeCun et al., 1998)/MNIST-M (Ganin & Lempitsky, 2015)/USPS (Hull, 1994) as targets, with ten classes for the digits 0-9. SVHN has color images of house numbers from street views with a training set of 73,257 and a test set of 26,032. MNIST/MNIST-M/USPS have handwritten digits with training sets of 60,000/60,000/7,291 and test sets of 10,000/10,000/2,007.

Models For corruption we use residual networks (He et al., 2016) with 26 layers (R-26) on CIFAR-10/100 and 50 layers (R-50) on ImageNet. For domain adaptation we use the R-26 architecture. For fair comparison, all methods in each experimental condition share the same architecture. Our networks are equipped with batch normalization (Ioffe & Szegedy, 2015). For the source model without adaptation, the normalization statistics are estimated during training on the source data. For all test-time adaptation methods, we estimate these statistics during testing on the target data, as done in concurrent work on adaptation by normalization (Schneider et al., 2020; Nado et al., 2020).

Optimization We optimize the modulation parameters γ, β following the training hyperparameters for the source model with few changes. On ImageNet we optimize by SGD with momentum; on other datasets we optimize by Adam (Kingma & Ba, 2015). We lower the batch size (BS) to reduce memory usage for inference, then lower the learning rate (LR) by the same factor to compensate (Goyal et al., 2017). On ImageNet, we set BS = 64 and LR = 0.00025, and on other datasets we set BS = 128 and LR = 0.001. We control for ordering by shuffling and sharing the order across methods.

Baselines We compare to domain adaptation, self-supervision, normalization, and pseudo-labeling:
• source applies the trained classifier to the test data without adaptation,
• adversarial domain adaptation (RG) reverses the gradients of a domain classifier on source and target to optimize for a domain-invariant representation (Ganin & Lempitsky, 2015),
• self-supervised domain adaptation (UDA-SS) jointly trains self-supervised rotation and position tasks on source and target to optimize for a shared representation (Sun et al., 2019a),
• test-time training (TTT) jointly trains for supervised and self-supervised tasks on source, then keeps training the self-supervised task on target during testing (Sun et al., 2019b),
• test-time normalization (BN) updates batch normalization statistics (Ioffe & Szegedy, 2015) on the target data during testing (Schneider et al., 2020; Nado et al., 2020),
• pseudo-labeling (PL) tunes a confidence threshold, assigns predictions over the threshold as labels, and then optimizes the model to these pseudo-labels before testing (Lee, 2013).

Only test-time normalization (BN), pseudo-labeling (PL), and tent (ours) are fully test-time adaptation methods. See Section 2 for an explanation and contrast with domain adaptation and test-time training.

4.1. ROBUSTNESS TO CORRUPTIONS

To benchmark robustness to corruption, we make use of common image corruptions (see Appendix A for examples). The CIFAR-10/100 and ImageNet datasets are turned into the CIFAR-10/100-C and ImageNet-C corruption benchmarks by duplicating their test/validation sets and applying 15 types of corruptions at five severity levels (Hendrycks & Dietterich, 2019).

Tent improves more with less data and computation. Table 2 reports errors averaged over corruption types at the most severe level of corruption. On CIFAR-10/100-C we compare all methods, including those that require joint training across domains or losses, given the convenient sizes of these datasets. Adaptation is offline for fair comparison with offline baselines. Tent improves not only on the fully test-time adaptation baselines (BN, PL) but also on the domain adaptation (RG, UDA-SS) and test-time training (TTT) methods that need several epochs of optimization on source and target.

Tent consistently improves across corruption types. Figure 5 plots the error for each corruption type averaged over corruption levels on ImageNet-C. We compare the most efficient methods (source, normalization, and tent) given the large scale of the source data (>1 million images) needed by other methods and the 75 target combinations of corruption types and levels. Tent and BN adapt online to rival the efficiency of inference without adaptation. Tent reaches the least error for most corruption types without increasing the error on the original data.

4.2. SOURCE-FREE DOMAIN ADAPTATION

We benchmark digit adaptation (Ganin & Lempitsky, 2015; Tzeng et al., 2015; 2017; Shu et al., 2018) for shifts from SVHN to MNIST/MNIST-M/USPS. Recall that unsupervised domain adaptation makes use of the labeled source data and unlabeled target data, while our fully test-time adaptation setting denies use of source data. Adaptation is offline for fair comparison with offline baselines. Tent adapts to target without source.

Tent scales to semantic segmentation. To show scalability to large models and inputs, we evaluate semantic segmentation (pixel-wise classification) on a domain shift from a simulated source to a real target. The source is GTA (Richter et al., 2017), a video game in an urban environment, and the target is Cityscapes (Cordts et al., 2016), an urban autonomous driving dataset. The model is HRNet-W18, a fully convolutional network (Shelhamer et al., 2017) with a high-resolution architecture (Wang et al., 2020). The target intersection-over-union scores (higher is better) are source 28.8%, BN 31.4%, and tent 35.8% with offline optimization by Adam. For adaptation to a single image, tent reaches 36.4% in 10 iterations with episodic optimization. See the appendix for a qualitative example (Appendix B).

Tent scales to the VisDA-C challenge. To show adaptation on a more difficult benchmark, we evaluate on the VisDA-C challenge (Peng et al., 2017). The task is object recognition for 12 classes where the source data is synthesized by rendering 3D models and the target data is collected from real scenes. The validation error for our source model (ResNet-50, pretrained on ImageNet) is 56.1%, while tent reaches 45.6%, and improves to 39.6% by updating all layers except for the final classifier as done by Liang et al. (2020). Although offline source-free adaptation by model adaptation (Li et al., 2020) or SHOT (Liang et al., 2020) can reach lower error with more computation and tuning, tent can adapt online during testing.

4.3. ANALYSIS

Tent reduces entropy and error. Figure 6 verifies that tent does indeed reduce the entropy and the task loss (softmax cross-entropy). We plot changes in entropy and loss on CIFAR-100-C for all 75 corruption type/level combinations. Both axes are normalized by the maximum entropy of a prediction (log 100) and clipped to ±1. Most points have lower entropy and error after adaptation.

Tent needs feature modulation. We ablate the normalization and transformation steps of feature modulation. Not updating the normalization statistics increases errors, and can fail to improve over BN and PL. Not updating the transformation parameters reduces the method to test-time normalization. Updating only the last layer of the model can improve but then degrades with further optimization. Updating the full model parameters θ never improves over the unadapted source model.

Tent generalizes across target data. Adaptation could be limited to the points used for updates. We check that adaptation generalizes across points by adapting on target train and not target test. Test errors drop: CIFAR-100-C error goes from 37.3% to 34.2% and SVHN-to-MNIST error goes from 8.2% to 6.5%. (Train is larger than test; when subsampling to the same size, errors differ by <0.1%.) Therefore the adapted modulation is not point specific but general.

Tent modulation differs from normalization. Modulation normalizes and transforms features. We examine the combined effect. Figure 7 contrasts adapted features on corrupted data against reference features on uncorrupted data. We plot features from the source model, normalization, tent, and an oracle that optimizes on the target labels. Normalization makes features more like the reference, but tent does not. Instead, tent makes features more like the oracle. This suggests a different and task-specific effect. See the appendix for visualizations of more layers (Appendix C).

Tent adapts alternative architectures. Tent is architecture agnostic in principle. To gauge its generality in practice, we evaluate new architectures based on self-attention (SAN) (Zhao et al., 2020) and equilibrium solving (MDEQ) (Bai et al., 2020) for corruption robustness on CIFAR-100-C. Table 4 shows that tent reduces error with the same settings as convolutional residual networks.

5. RELATED WORK

We relate tent to existing adaptation, entropy minimization, and feature modulation methods.

Train-Time Adaptation Domain adaptation jointly optimizes on source and target by cross-domain losses L(x s , x t ) to mitigate shift. These losses optimize feature alignment (Gretton et al., 2009; Sun et al., 2017), adversarial invariance (Ganin & Lempitsky, 2015; Tzeng et al., 2017), or shared proxy tasks (Sun et al., 2019a). Transduction (Gammerman et al., 1998; Joachims, 1999; Zhou et al., 2004) jointly optimizes on train and test to better fit specific test instances. While effective in their settings, neither applies when joint use of source/train and target/test is denied. Tent adapts on target alone. Recent "source-free" methods (Li et al., 2020; Kundu et al., 2020; Liang et al., 2020) also adapt without source data. Li et al. (2020); Kundu et al. (2020) rely on generative modeling and optimize multiple models with multiple losses. Kundu et al. (2020); Liang et al. (2020) also alter training. Tent does not need generative modeling, nor does it alter training, and so it can be deployed more generally to adapt online with much more computational efficiency. SHOT (Liang et al., 2020) adapts by information maximization (entropy minimization and diversity regularization), but differs in its other losses and its parameterization. These source-free methods optimize offline with multiple losses for multiple epochs, which requires more tuning and computation than tent, but may achieve higher accuracy in return. Tent optimizes online with just one loss and an efficient parameterization of modulation to emphasize fully test-time adaptation during inference. We encourage examination of each of these works on the frontier of adaptation without source data. Chidlovskii et al. (2016) are the first to motivate adaptation without source data for legal, commercial, or technical concerns. They adapt predictions by applying denoising auto-encoders while we adapt models by entropy minimization. We share their motivations, but the methods and experiments differ.

Test-Time Adaptation Tent adapts by test-time optimization and normalization to update the model. Test-time adaptation of predictions, through which harder and uncertain cases are adjusted based on easier and certain cases (Jain & Learned-Miller, 2011), provides inspiration for certainty-based model adaptation schemes like our own. Test-time training (TTT) (Sun et al., 2019b) also optimizes during testing, but differs in its loss and must alter training. TTT relies on a proxy task, such as recognizing rotations of an image, and so its loss depends on the choice of proxy. (Indeed, its authors caution that the proxy must be "both well-defined and non-trivial in the new domain".) TTT alters training to optimize this proxy loss on source before adapting to target. Tent adapts without proxy tasks and without altering training. Normalizing feature statistics is common for domain adaptation (Gretton et al., 2009; Sun et al., 2017). For batch normalization, Li et al. (2017); Carlucci et al. (2017) separate source and target statistics during training. Schneider et al. (2020); Nado et al. (2020) estimate target statistics during testing to improve generalization. Tent builds on test-time normalization to further reduce generalization error.

Entropy Minimization Entropy minimization is a key regularizer for domain adaptation (Carlucci et al., 2017; Shu et al., 2018; Saito et al., 2019; Roy et al., 2019), semi-supervised learning (Grandvalet & Bengio, 2005; Lee, 2013; Berthelot et al., 2019), and few-shot learning (Dhillon et al., 2020). Regularizing entropy penalizes decisions at high densities in the data distribution to improve accuracy for distinct classes (Grandvalet & Bengio, 2005). These methods regularize entropy during training in concert with other supervised and unsupervised losses on additional data. Tent is the first to minimize entropy during testing, for adaptation to dataset shifts, without other losses or data. Entropic losses are common; our contribution is to exhibit entropy as the sole loss for fully test-time adaptation.

Feature Modulation Modulation makes a model vary with its input. We optimize modulations that are simpler than the full model for stable and efficient adaptation. We modulate channel-wise affine transformations, for their effectiveness in tandem with normalization (Ioffe & Szegedy, 2015; Wu & He, 2018), and for their flexibility in conditioning for different tasks (Perez et al., 2018). These normalization and conditioning methods optimize the modulation during training by a supervised loss, but keep it fixed during testing. We optimize the modulation during testing by an unsupervised loss, so that it can adapt to different target data.

6. DISCUSSION

Tent reduces generalization error on shifted data by test-time entropy minimization. In minimizing entropy, the model adapts itself to feedback from its own predictions. This is truly self-supervised self-improvement. Self-supervision of this sort is totally defined by the supervised task, unlike proxy tasks designed to extract more supervision from the data, and yet it remarkably still reduces error. Nevertheless, errors due to corruption and other shifts remain, and therefore more adaptation is needed. Next steps should pursue test-time adaptation on more and harder types of shift, over more general parameters, and by more effective and efficient losses.

Shifts Tent reduces error for a variety of shifts including image corruptions, simple changes in appearance for digits, and simulation-to-real discrepancies. These shifts are popular as standardized benchmarks, but other real-world shifts exist. For instance, the CIFAR-10.1 and ImageNetV2 test sets (Recht et al., 2018; 2019), made by reproducing the dataset collection procedures, entail natural but unknown shifts. Although error is higher on both sets, indicating the presence of shift, tent does not improve generalization. Adversarial shifts (Szegedy et al., 2014) also threaten real-world usage, and attackers keep adapting to defenses. While adversarial training (Madry et al., 2018) makes a difference, test-time adaptation could help counter such test-time attacks.

Parameters Tent modulates the model by normalization and transformation, but much of the model stays fixed. Test-time adaptation could update more of the model, but the issue is to identify parameters that are both expressive and reliable, and this may interact with the choice of loss. TTT adapts multiple layers of features shared by supervised and self-supervised models, and SHOT adapts all but the last layer(s) of the model. These choices depend on the model architecture, the loss, and tuning. For tent, modulation is reliable, but the larger shift on VisDA is better addressed by the SHOT parameterization. Jointly adapting the input could be a more general alternative. If a model can adapt itself on target, then perhaps its input gradients might optimize spatial transformations or image translations to reduce shift without source data.

Losses Tent minimizes entropy. For more adaptation, is there an effective loss for general but episodic test-time optimization? Entropy is general across tasks but limited in scope. It needs batches for optimization, and cannot update episodically on one point at a time. TTT can do so, but only with the right proxy task. For less computation, is there an efficient loss for more local optimization? Tent and TTT both require full (re-)computation of the model for updates because they depend on its predictions. If the loss were instead defined on the representation, then updates would require less forward and backward computation. Returning to entropy specifically, this loss may interact with calibration (Guo et al., 2017), as better uncertainty estimation could drive better adaptation. We hope that the fully test-time adaptation setting can promote new methods for equipping a model to adapt itself, just as tent yields a new model with every update.

Figure 10: Adapted features on CIFAR-100-C with Gaussian noise (front) and reference features without corruption (back). Corruption shifts the source features from the reference. BN shifts the features back to be more like the reference. Tent shifts features to be less like the reference, and more like an oracle that optimizes on target labels.



We exclude DIRT-T from our experiments because of incomparable differences in architecture and model selection. DIRT-T tunes with labeled target data, but we do not. Please refer to Shu et al. (2018) for more detail.



Figure 1: Predictions with lower entropy have lower error rates on corrupted CIFAR-100-C. Certainty can serve as supervision during testing.
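The certainty in question is measured by the Shannon entropy of the softmax prediction: confident predictions have low entropy. A small self-contained sketch (function names are illustrative) computes this per-prediction objective:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy H(p) = -sum_c p_c log p_c, the tent objective
    for a single prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0)

confident = softmax([5.0, 0.0, 0.0])  # peaked distribution
uncertain = softmax([1.0, 0.9, 0.8])  # near-uniform distribution
assert entropy(confident) < entropy(uncertain)
```

Minimizing this quantity over the test batch pushes predictions toward confidence, which Figure 1 suggests correlates with correctness under shift.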

Figure 3: Method overview. Tent does not alter training (a), but minimizes the entropy of predictions during testing (b) over a constrained modulation ∆, given the parameters θ and target data x_t.


Figure 5: Corruption benchmark on ImageNet-C: error for each type averaged over severity levels. Tent improves on the prior state-of-the-art, adversarial noise training (Rusak et al., 2020), by fully test-time adaptation without altering training.

Figure 6: Tent reduces the entropy and loss. We plot changes in entropy ∆H and loss ∆L for all of CIFAR-100-C. Change in entropy rank-correlates with change in loss: note the dark diagonal and the rank correlation coefficient of 0.22.

Figure 7: Adapted features on CIFAR-100-C with Gaussian noise (front) and reference features without corruption (back). Corruption shifts features away from the reference, but BN reduces the shifts. Tent instead shifts features more, and closer to an oracle that optimizes on target labels.

also adapt without source data. Li et al. (2020); Kundu et al. (2020) rely on generative modeling and optimize multiple models with multiple losses. Kundu et al. (2020); Liang et al. (

For batch normalization Li et al. (2017); Carlucci et al. (2017) separate source and target statistics during training. Schneider et al. (2020); Nado et al. (2020) estimate target statistics during testing to improve generalization. Tent builds on test-time normalization to further reduce generalization error.

Figure 9: Adaptation for semantic segmentation with simulation-to-real shift from GTA (Richter et al., 2017) to Cityscapes (Cordts et al., 2016). Tent only uses the target data, and optimizes over a single image as a dataset of pixel-wise predictions. This episodic optimization in effect fits a custom model to each image of the target domain. In only 10 iterations our method suppresses noise (see the completion of the street segment, in purple) and recovers missing classes (see the motorcycle and rider, center).

Adaptation settings differ by their data, and therefore by their losses, during training and testing. Of the source (s) and target (t) data x and labels y, our fully test-time setting only needs the target data x_t.

Table 2: Corruption benchmark on CIFAR-10-C and CIFAR-100-C at the highest severity. Tent has the least error, with less optimization than domain adaptation (RG, UDA-SS) and test-time training (TTT), and improves on test-time normalization (BN).

Table 3: Digit domain adaptation from SVHN to MNIST/MNIST-M/USPS. Source-free adaptation is not only feasible, but more efficient. Tent always improves on normalization (BN), and in 2/3 cases achieves less error than domain adaptation (RG, UDA-SS) without joint training on source & target.

Tent reaches a new state-of-the-art without altering training. The state-of-the-art methods for robustness extend training with adversarial noise (ANT) (Rusak et al., 2020) for 50.2% error or mixtures of data augmentations (AugMix) (Hendrycks et al., 2020) for 51.7% error. Combined with stylization from external images (SIN) (Geirhos et al., 2019), ANT+SIN reaches 47.4%. Tent reaches a new state-of-the-art of 44.0% by online adaptation and 42.3% by offline adaptation. It improves on ANT for all corruption types except noise, on which ANT is trained. This requires just one gradient per test point, without more optimization on the training set (ANT, AugMix) or use of external images (SIN). Among fully test-time adaptation methods, tent reduces the error beyond test-time normalization for an 18% relative improvement. In concurrent work, Schneider et al. (2020) report 49.3% error for test-time normalization, over which tent still gives a 14% relative improvement.

Table 3 reports the target errors for domain adaptation and fully test-time adaptation methods. Test-time normalization (BN) marginally improves, while adversarial domain adaptation (RG) and self-supervised domain adaptation (UDA-SS) improve more by joint training on source and target. Tent always has lower error than the source model and BN, and it achieves the lowest error in 2/3 cases, even in just one epoch and without use of source data.

While encouraging for fully test-time adaptation, unsupervised domain adaptation remains necessary for the highest accuracy and harder shifts. For SVHN-to-MNIST, DIRT-T (Shu et al., 2018) achieves a remarkable 0.6% error. For MNIST-to-SVHN, a difficult shift with source-only error of 71.3%, DIRT-T reaches 45.5% and UDA-SS reaches 38.7%. Tent fails on this shift and increases error to 79.8%. In this case success presently requires joint optimization over source and target.

Tent needs less computation, but still improves with more. Tent adapts efficiently on target data alone with just one gradient per point. RG & UDA-SS also use the source data (SVHN train), which is ∼7× the size of the target data (MNIST test), and optimize for 10 epochs. Tent adapts with ∼80× less computation. With more updates, tent reaches 8.2% error in 10 epochs and 6.5% in 100 epochs. With online updates, tent reaches 12.5% error in one epoch and 8.4% error in 10 epochs.

Tent adapts alternative architectures on CIFAR-100-C without tuning. Results are error (%).

ACKNOWLEDGMENTS

We thank Eric Tzeng for discussions on domain adaptation, Bill Freeman for comments on the experiments, Yu Sun for consultations on test-time training, and Kelsey Allen for feedback on the exposition. We thank the anonymous reviewers of ICLR 2021 for their feedback, which certainly improved the latest adaptation of the paper.

APPENDIX

This supplement summarizes the image corruptions used in our experiments, highlights a qualitative example of instance-wise adaptation for semantic segmentation, and visualizes feature shifts across more layers.

A ROBUSTNESS TO CORRUPTIONS

In Section 4.1 we evaluate methods on a common image corruptions benchmark. Table 2 reports errors on the most severe level of corruption, level 5, and Figure 5 reports errors for each corruption type averaged across severity levels 1-5. We summarize these corruption types by example in Figure 8.

B SOURCE-FREE ADAPTATION FOR SEMANTIC SEGMENTATION

Figure 9 shows a qualitative result on source-free adaptation for semantic segmentation (pixel-wise classification) with simulation-to-real (sim-to-real) shift. For this sim-to-real condition, the source data is simulated while the target data is real. Our source data is GTA (Richter et al., 2017), a visually sophisticated video game set in an urban environment, and our target data is Cityscapes (Cordts et al., 2016), an urban autonomous driving dataset. The supervised model is HRNet-W18, a fully convolutional network (Shelhamer et al., 2017) in the high-resolution network family (Wang et al., 2020). For this qualitative example, we run tent on a single image for multiple iterations, because an image is in effect a batch of pixels. This demonstrates adaptation to a single target instance, without any further access to the target domain through multiple images from the target distribution.
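Because a single image supplies H×W pixel-wise predictions, the tent objective for segmentation is the average entropy over this batch of pixels. A minimal sketch (the function name and the toy per-pixel distributions are illustrative, not model outputs):

```python
import math

def mean_pixel_entropy(prob_maps):
    """Average Shannon entropy over a list of per-pixel class
    distributions -- a single image treated as a batch of pixels."""
    def h(p):
        return -sum(q * math.log(q) for q in p if q > 0)
    return sum(h(p) for p in prob_maps) / len(prob_maps)

# Toy 2-pixel, 3-class examples: noisy predictions vs. confident ones.
noisy = [[0.4, 0.3, 0.3], [0.5, 0.25, 0.25]]
confident = [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1]]
assert mean_pixel_entropy(confident) < mean_pixel_entropy(noisy)
```

Iterating gradient updates on this per-image average is what lets tent fit a custom model to each target image in only a few iterations.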

