DISTRIBUTIONALLY ROBUST LEARNING FOR UNSUPERVISED DOMAIN ADAPTATION

Abstract

We propose a distributionally robust learning (DRL) method for unsupervised domain adaptation (UDA) that scales to modern computer-vision benchmarks. DRL can be naturally formulated as a competitive two-player game between a predictor and an adversary that is allowed to corrupt the labels, subject to certain constraints; under the standard log loss, the game reduces to incorporating a density ratio between the source and target domains. This formulation motivates the use of two jointly trained neural networks: a discriminative network between the source and target domains for density-ratio estimation, in addition to the standard classification network. The density ratio in DRL prevents the model from being overconfident on target inputs far away from the source domain. Thus, DRL provides conservative confidence estimation in the target domain, even when target labels are not available. This conservatism motivates the use of DRL for sample selection in self-training, and we term the approach distributionally robust self-training (DRST). In our experiments, DRST generates more calibrated probabilities and achieves state-of-the-art self-training accuracy on benchmark datasets. We demonstrate that DRST captures shape features more effectively and reduces the extent of distributional shift during self-training.

1. INTRODUCTION

In many real-world applications, the target domain in which a machine-learning (ML) model is deployed can differ significantly from the source training domain. Furthermore, labels in the target domain are often more expensive to obtain than in the source domain. An example is synthetic training, where the source domain has complete supervision while the target domain of real images may not be labeled. Unsupervised domain adaptation (UDA) aims to maximize performance on the target domain by utilizing both the labeled source data and the unlabeled target data. A popular framework for UDA obtains proxy labels in the target domain through self-training (Zou et al., 2019). Self-training starts with a classifier trained on the labeled source data, and then iteratively obtains pseudo-labels in the target domain using predictions from the current ML model. However, this process is brittle, since wrong pseudo-labels in the target domain can lead to catastrophic failure in early iterations (Kumar et al., 2020). To avoid this, self-training needs to be conservative and select only pseudo-labels with a sufficiently high confidence level, which entails accurate knowledge of the confidence levels. Accurate confidence estimation is a challenge for current deep-learning models, which tend to produce over-confident and misleading probabilities, even when predicting on the same distribution (Guo et al., 2017a; Gal & Ghahramani, 2016). Some attempts to remedy this issue include temperature scaling (Platt et al., 1999), Monte-Carlo sampling (Gal & Ghahramani, 2016), and Bayesian inference (Blundell et al., 2015; Riquelme et al., 2018). However, Snoek et al. (2019) have shown that the uncertainty estimates from these models cannot be trusted under domain shifts.
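To make the conservatism requirement concrete, the following sketch illustrates confidence-thresholded self-training on a toy nearest-centroid classifier. The model, data dimensions, threshold, and function names here are illustrative choices for exposition, not the architecture or hyperparameters used in this paper; the point is only that pseudo-labels enter the training set solely when predicted confidence clears a threshold.

```python
import numpy as np

def fit_centroids(x, y, n_classes):
    # Toy "model": one mean vector per class (nearest-centroid classifier).
    return np.stack([x[y == c].mean(axis=0) for c in range(n_classes)])

def predict_proba(centroids, x):
    # Softmax over negative squared distances to the class centroids.
    d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    logits = -d
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def self_train(x_s, y_s, x_t, n_classes=2, threshold=0.9, n_rounds=5):
    # Start from a source-only model, then iteratively add only those
    # target pseudo-labels whose confidence exceeds `threshold`.
    centroids = fit_centroids(x_s, y_s, n_classes)
    for _ in range(n_rounds):
        probs = predict_proba(centroids, x_t)
        conf, pseudo = probs.max(axis=1), probs.argmax(axis=1)
        keep = conf >= threshold  # conservative selection step
        if not keep.any():
            break
        x_cur = np.concatenate([x_s, x_t[keep]])
        y_cur = np.concatenate([y_s, pseudo[keep]])
        centroids = fit_centroids(x_cur, y_cur, n_classes)
    return centroids
```

If the model's confidences are miscalibrated, the `keep` mask admits wrong pseudo-labels and errors compound across rounds, which is exactly the failure mode that motivates conservative confidence estimation.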
In this paper, we instead consider the distributionally robust learning (DRL) framework (Liu & Ziebart, 2014; 2017), which provides a principled approach for uncertainty quantification under domain shifts. DRL can be formulated as a two-player adversarial risk minimization game, as depicted in Figure 1(a). Recall that the standard framework of empirical risk minimization (ERM) directly learns a predictor P(Y|X) from training data. In contrast, DRL also includes an adversary Q(Y|X) that is allowed to perturb the labels, subject to certain feature-matching constraints to ensure data compatibility. Formally, the minimax game for DRL is:

$$\min_{P(\hat{Y}|X)}\;\max_{Q(\check{Y}|X)}\;\mathbb{E}_{P_t(X)\,Q(\check{Y}|X)}\!\left[\mathrm{loss}\!\left(P(\hat{Y}|X),\check{Y}\right)\right]\quad\text{s.t.}\quad\mathbb{E}_{P_s(X)\,Q(\check{Y}|X)}\!\left[\Phi(X,\check{Y})\right]=\mathbb{E}_{P_s(X,Y)}\!\left[\Phi(X,Y)\right],\tag{1}$$

where the adversary Q(Y|X) is constrained to match the evaluation of a set of features Φ(x, y) to that of the source distribution (see Section 2 for details). Note that the loss in (1) is evaluated under the target input distribution P_t(X), and the predictor does not have direct access to the source data {X_s, Y_s}. Instead, the predictor optimizes the target loss by playing a game with an adversary constrained by source data.

Figure 1: (a) DRL formulated as a two-player game. (b) An instantiation of (a) using neural networks: a classification network and a discriminative (binary classifier) network with features Φ(x, y). The expected target loss cannot be evaluated due to the lack of target labels in the UDA setting. Instead, we compute the gradients directly for training the networks. We present the details in Sec. 2.2.

A special case of UDA is the covariate shift setting, where the label-generating distribution P(Y|X) is assumed to be the same in both source and target domains. Under this assumption, with log-loss and a linear predictor parameterized by θ and features Φ(x, y), (1) reduces to:

$$P_\theta(\hat{y}\,|\,x)\;\propto\;\exp\!\left(\frac{P_s(x)}{P_t(x)}\,\theta^\top\Phi(x,\hat{y})\right).\tag{2}$$

Intuitively, the density ratio P_s(x)/P_t(x) prevents the model from being overconfident on target inputs far away from the source domain: where P_t(x) greatly exceeds P_s(x), the exponent is scaled down and the prediction moves toward the uniform distribution. Thus, the DRL framework is a principled approach for conservative confidence estimation.

Previous works have shown that DRL is highly effective in safety-critical applications such as safe exploration in control systems (Liu et al., 2020) and safe trajectory planning (Nakka et al., 2020). However, these works only consider estimating the density ratio in low dimensions (e.g. control inputs) using a standard kernel density estimator (KDE), and extending it to high-dimensional inputs such as images remains an open challenge. Moreover, it is not clear whether the covariate-shift assumption holds for common high-dimensional settings such as images, which we investigate in this paper.

In this paper, we propose a novel deep-learning method based on the DRL framework that provides accurate uncertainties and scales to modern domain-adaptation tasks in computer vision.
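As a concrete illustration of the density-ratio machinery, the sketch below estimates the ratio P_s(x)/P_t(x) with a logistic-regression domain discriminator (source vs. target) and uses it to scale the predictor's logits, which is how the density ratio induces conservative confidence. The plain-NumPy logistic regression and all function names here are illustrative simplifications; the method in this paper uses deep discriminative and classification networks trained end-to-end.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_discriminator(x_s, x_t, lr=0.1, steps=500):
    # Logistic regression separating source (label 1) from target (label 0).
    x = np.concatenate([x_s, x_t])
    d = np.concatenate([np.ones(len(x_s)), np.zeros(len(x_t))])
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(steps):
        p = sigmoid(x @ w + b)
        g = p - d                       # gradient of the cross-entropy loss
        w -= lr * (x.T @ g) / len(x)
        b -= lr * g.mean()
    return w, b

def density_ratio(x, w, b, n_s, n_t):
    # With c(x) = P(source | x), Bayes' rule gives
    # P_s(x)/P_t(x) = (n_t/n_s) * c(x) / (1 - c(x)).
    c = np.clip(sigmoid(x @ w + b), 1e-6, 1 - 1e-6)
    return (n_t / n_s) * c / (1.0 - c)

def ratio_scaled_predict(logits, ratio):
    # Scale each input's logits by its density ratio before the softmax:
    # inputs far from the source get ratio ~ 0, so the prediction
    # approaches the uniform distribution (conservative confidence).
    z = logits * ratio[:, None]
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)
```

On a target input well inside the source support, the ratio is large and the prediction stays sharp; on a target input far from the source, the ratio collapses toward zero and the softmax flattens, mirroring the conservatism argument above.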

Summary of Contributions:

1. We develop differentiable density-ratio estimation as part of the DRL framework to enable efficient end-to-end training. See Figure 1(b).
2. We employ DRL's confidence estimation in the self-training framework for domain adaptation and term the approach distributionally robust self-training (DRST). See Figure 2.
3. We further combine DRST with the automated synthetic-to-real generalization (ASG) framework of Chen et al. (2020b) to improve generalization in the real target domain when the source domain consists of synthetic images.
4. We demonstrate that DRST generates more calibrated probabilities. DRST-ASG achieves competitive accuracy on the VisDA2017 dataset (Peng et al., 2017), with a 1% improvement over the baseline class-regularized self-training (CRST) that uses the standard softmax confidence measure.

