REDUCING IMPLICIT BIAS IN LATENT DOMAIN LEARNING

Anonymous authors
Paper under double-blind review

Abstract

A fundamental shortcoming of deep neural networks is their specialization to a single task and domain. While recent techniques in multi-domain learning enable the learning of more domain-agnostic features, their success relies firmly on the presence of domain labels, typically requiring manual annotation and careful curation of datasets. Here we focus on latent domain learning, a highly realistic, yet less explored scenario: learning from data from different domains, without access to domain annotations. This is a particularly challenging problem, since standard models exhibit an implicit bias toward learning only the large domains in data, while disregarding smaller ones. To address this issue, we propose dynamic residual adapters that adaptively account for latent domains, and weighted domain transfer, a novel augmentation strategy designed specifically for this setting. Our techniques are evaluated on image classification tasks containing multiple unannotated domains, and we demonstrate that they enhance performance, in particular on the smallest of these.

1. INTRODUCTION

While the performance of deep learning has surpassed that of humans in a range of tasks (He et al., 2016; Silver et al., 2017), machine learning models perform best when the learning objective is narrowly defined. Practical realities, however, often require the learning of joint models over semantically different examples. In this case, the best performance is usually obtained by fitting a collection of models, with each model solving an individual subproblem. This is somewhat disappointing, seeing how humans and other biological systems are capable of flexibly adapting to a large number of scenarios (Kaiser et al., 2017). Past solutions to this problem tend to fall into some category of multi-domain learning (Nam & Han, 2016; Bulat et al., 2019; Schoenauer-Sebag et al., 2019). In this setting, models are learned over diverse datasets, each associated with an underlying distribution. Multi-domain learning, however, relies firmly on the availability of domain annotations, for example to control domain-specific architectural elements (Rebuffi et al., 2017; 2018; Liu et al., 2019; Guo et al., 2019). Reliance on domain annotations is not limited to the multi-domain scenario: their presence is also required in domain adaptation, where models transfer between related tasks (Ganin et al., 2016; Tzeng et al., 2017; Hoffman et al., 2018; Xu et al., 2018; Peng et al., 2019a; Sun et al., 2019b), continual learning (Kirkpatrick et al., 2017; Lopez-Paz & Ranzato, 2017; Riemer et al., 2019), meta-learning over multiple tasks (Finn et al., 2017; Li et al., 2018a), and generalization to previously unseen domains (Li et al., 2018b; 2019b;a; Gulrajani & Lopez-Paz, 2020). The above approaches have established the notion that the presence of domain labels improves generalization. In the real world, however, such labels can often be difficult or expensive to obtain. Consider images that were scraped from the web.
Most image datasets such as Pascal VOC (Li et al., 2018a) or ImageNet (Deng et al., 2009) already rely on expensive manual filtering and the removal of different-looking images. Existing multi-domain approaches require that the scraped images are further annotated for the mixture of content types they contain, such as real-world images or studio photos (Saenko et al., 2010), clipart, or sketches (Li et al., 2017). This can be an expensive process; moreover, it is not clear which variations (indoor/outdoor, urban/rural, etc.) should be grouped.


Figure 1: Changes in performance without access to domain labels, relative to the multi-domain baseline of learning an individual model on each dataset (sizes in %). For ResNet26 (•), performance losses on the smallest domains (yellow) are significant. Our proposed solutions (•) recover a large portion of the gap to full domain-supervision. Best viewed in color.

Here we consider the task of learning in the absence of domain labels, or latent domain learning. This scenario encompasses any task where we have inadequate resources to assign domain labels to all data, but have reason to believe that such a partitioning of the data would in principle make sense. As our experiments show, even when domain labels already exist, there is no guarantee that they are optimal for a given model. In this paper, we therefore argue and demonstrate that learning implicit domain assignments end-to-end, alongside the rest of the network, is the superior option. In Figure 1 we display accuracy for latent domain learning, obtained by jointly learning a single model over datasets from the Visual Decathlon benchmark, which contains images from distinct domains with mutually exclusive classes. Per-domain performance is measured relative to that of 9× models learned individually on each domain, a common baseline in the multi-domain setting (Rebuffi et al., 2018; Liu et al., 2019). This highlights the central challenge of latent domain learning: a significant loss of performance for the joint model (•) on small domains. While the performance drop is minor on large domains, relative accuracy is reduced by 15-20% on Aircraft and Ucf101, for example. Latent domain learning therefore requires customized solutions, because standard models have an implicit bias through which they disregard the smallest domains in data.
The mechanisms we propose throughout this paper (shown in •) help overcome this: dynamic residual adapters (Section 3.3), coupled with weighted domain transfer (Section 3.4), obtain robust performance on small domains without trading off performance on larger ones. Our proposed solutions can be incorporated and trained seamlessly with existing architectures, and are able to surpass the performance of domain-supervised approaches that have access to human-annotated domain labels (Section 4.2). Moreover, qualitative analysis demonstrates that they partition latent domains in a highly intuitive way (Figures 2 and 3).
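The full layer is defined in Section 3.3; the basic idea, however, can be sketched in a few lines. The sketch below is our own illustrative rendering, not the paper's exact parameterization: it assumes K shared 1x1-convolution corrections, mixed per example by a softmax gate over globally pooled features, so that the soft latent-domain assignment is learned without any domain labels.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

class DynamicResidualAdapter:
    """Illustrative sketch of a dynamic residual adapter: K small
    1x1-conv corrections (channel-mixing matrices) are softly combined
    by a per-example gate, then added back to the feature map."""

    def __init__(self, channels, num_latent=4, seed=0):
        rng = np.random.default_rng(seed)
        # K candidate corrections, one per latent domain (K x C x C).
        self.corrections = rng.normal(0, 0.01, (num_latent, channels, channels))
        # Gate parameters: map pooled features to K logits.
        self.gate_w = rng.normal(0, 0.01, (num_latent, channels))

    def __call__(self, x):
        # x: feature map of shape (channels, height, width)
        pooled = x.mean(axis=(1, 2))            # global average pool
        alpha = softmax(self.gate_w @ pooled)   # soft latent-domain assignment
        # Mix the K corrections, then apply as a residual 1x1 conv.
        mixed = np.einsum("k,kcd->cd", alpha, self.corrections)
        return x + np.einsum("cd,dhw->chw", mixed, x), alpha
```

Because the corrections are initialized near zero, the layer starts close to the identity, which lets it be dropped into a pretrained residual network without disrupting its features.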

2. RELATED WORK

Multi-domain learning relates most closely to our paper. The state of the art introduces small convolutional corrections in residual networks to account for individual domains (Rebuffi et al., 2017; 2018). Stickland & Murray (2019) extend this approach to obtain efficient multi-task models for related language tasks. Other recent work makes use of task-specific attention mechanisms (Liu et al., 2019), attempts to scale task-specific losses (Kendall et al., 2018), or addresses tasks at the level of gradients (Chen et al., 2017). Crucially, these approaches all rely firmly on domain labels. A lack of domain labels has previously attracted interest in domain adaptation: Hoffman et al. (2012) use hierarchical clustering to uncover latent domains, while other work investigates kernel-based clustering (Gong et al., 2013), exemplar SVMs (Xu et al., 2014), or mutual information (Xiong et al., 2014). Different from these works, we propose tackling latent domains in an end-to-end fashion, which no longer requires a clustering ansatz. In another line of work, Mancini et al. (2018) estimate the batch statistics of domain adaptation layers with Gaussian mixture models, using only few domain labels. Peng et al. (2019b) study the shift from some source domain to a target distribution
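The domain-supervised residual corrections discussed above can be sketched minimally as follows. This is our own illustrative rendering (function names and tensor shapes are assumptions, not the original implementation); its point is that selecting the correction requires exactly the explicit domain annotation that latent domain learning does without.

```python
import numpy as np

def supervised_residual_adapter(x, domain_id, adapters):
    """Illustrative sketch of a domain-supervised residual adapter:
    a per-domain 1x1 convolution (a channels-by-channels mixing
    matrix) whose output is added back to the feature map."""
    # x: (channels, height, width); adapters: (num_domains, C, C)
    correction = adapters[domain_id]  # requires an annotated domain label
    return x + np.einsum("cd,dhw->chw", correction, x)
```

With all corrections set to zero the layer reduces to the identity, so the pretrained backbone's behavior is preserved at initialization.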

