f-DOMAIN-ADVERSARIAL LEARNING: THEORY AND ALGORITHMS FOR UNSUPERVISED DOMAIN ADAPTATION WITH NEURAL NETWORKS

Abstract

The problem of unsupervised domain adaptation arises in a variety of practical applications where the distribution of the training samples differs from the distribution encountered at test time. Existing theory for domain adaptation derives generalization bounds from divergence measures that are hard to optimize in practice, which has led to a large disconnect between theory and state-of-the-art methods. In this paper, we propose a novel domain-adversarial framework that introduces new theory for domain adaptation and leads to practical learning algorithms with neural networks. In particular, we derive a novel generalization bound that utilizes a new measure of discrepancy between distributions based on a variational characterization of f-divergences. We show that our bound recovers the theoretical results of Ben-David et al. (2010a) as a special case with a particular choice of divergence, and also supports divergences typically used in practice. We derive a general algorithm for domain-adversarial learning for the complete family of f-divergences. We provide empirical results for several f-divergences and show that some, not previously considered in domain-adversarial learning, achieve state-of-the-art results in practice. We also provide empirical insights into how the choice of divergence affects transfer performance on real-world datasets. By further recognizing the optimization problem as a Stackelberg game, we utilize the latest optimizers from the game-optimization literature, achieving additional performance boosts in our training algorithm. We show that our f-domain-adversarial framework achieves state-of-the-art results on the challenging Office-31 and Office-Home datasets without extra hyperparameters.
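To make the variational characterization of f-divergences referenced above concrete, the sketch below numerically checks, for the KL divergence between two small discrete distributions, the lower bound D_f(P||Q) ≥ E_P[T] − E_Q[f*(T)] (in the style of Nguyen et al.), where f* is the convex conjugate of f. The distributions, the witness functions, and their names are illustrative choices, not quantities from this paper.

```python
import numpy as np

# Two hypothetical discrete distributions over three outcomes.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.5, 0.3])

# Direct KL divergence: D_KL(P || Q) = sum_x p(x) log(p(x)/q(x)).
kl_direct = np.sum(p * np.log(p / q))

# Variational form for KL: f(t) = t log t has conjugate f*(u) = exp(u - 1),
# so D_KL(P||Q) >= E_P[T] - E_Q[exp(T - 1)] for any witness function T.
# The bound is tight at the optimal witness T*(x) = 1 + log(p(x)/q(x)).
T_opt = 1.0 + np.log(p / q)
kl_variational = np.sum(p * T_opt) - np.sum(q * np.exp(T_opt - 1.0))

# A suboptimal witness (here, shifted by a constant) yields a strictly
# looser lower bound, illustrating why the discriminator must be optimized.
T_sub = np.log(p / q)
kl_loose = np.sum(p * T_sub) - np.sum(q * np.exp(T_sub - 1.0))

print(kl_direct, kl_variational, kl_loose)
```

In domain-adversarial learning, the supremum over witness functions T is approximated by a neural discriminator, which is what makes this bound optimizable in practice.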

1. INTRODUCTION

The ability to learn new concepts and skills from general-purpose data and transfer them to similar scenarios is critical in many modern applications. For example, it is often the case that the learner has access only to a small (unlabeled) set of data in its domain of interest, but has access to a larger labeled dataset (for the same task) in a domain similar to the target domain. If the gap between these two domains is not too large, we may expect to train a model using both the labeled and unlabeled data and have it generalize well to the target dataset. This scenario is called unsupervised domain adaptation, and it is the focus of this paper.



Figure 1: Domain Adaptation. A learner is trained on abundant labeled data and is expected to perform well in the target domain (marked as +). Decision boundaries correspond to a two-layer neural network trained using f-DAL.

The paramount importance of domain adaptation (DA) has led to remarkable advances in the field. From a theoretical point of view, the seminal works of Ben-David et al. (2007; 2010a;b) and Mansour et al. (2009) provided generalization bounds for unsupervised DA based on discrepancy measures that are reductions of the total variation (TV) distance. More recently, Zhang et al. (2019) took a step further and proposed the Margin Disparity Discrepancy (MDD) with the aim of closing the gap between theory and algorithms. Their notion of discrepancy is tailored to margin losses and builds on the observation of only taking a single supremum over the class set

