DOMAIN-FREE ADVERSARIAL SPLITTING FOR DOMAIN GENERALIZATION

Abstract

Domain generalization is an approach that utilizes several source domains to train a learner that generalizes well to unseen target domains, thereby tackling the domain shift issue. It has drawn much attention in the machine learning community. This paper aims to learn to generalize well to an unseen target domain without relying on knowledge of the number of source domains or domain labels. We unify adversarial training and meta-learning in a novel framework, Domain-Free Adversarial Splitting (DFAS). In this framework, we model domain generalization as a learning problem that enforces the learner to generalize well for any train/val subset splitting of the training dataset. To achieve this goal, we propose a min-max optimization problem that can be solved by an iterative adversarial training process. In each iteration, the process adversarially splits the training dataset into train/val subsets to maximize the domain shift between them under the current learner, and then updates the learner on this splitting so that it generalizes from the train-subset to the val-subset using a meta-learning approach. Extensive experiments on three benchmark datasets under three different settings of source and target domains show that our method achieves state-of-the-art results, and an ablation study confirms its effectiveness. We also derive a generalization error bound for a theoretical understanding of our method.

1. INTRODUCTION

The deep learning approach has achieved great success in image recognition (He et al., 2016; Krizhevsky et al., 2012; Simonyan & Zisserman, 2014). However, deep learning methods mostly succeed when the training and test data are sampled from the same distribution (i.e., the i.i.d. assumption). This assumption is often violated in real-world applications, since the equipment/environments that generate data often differ between training and test datasets. When there is a distribution difference (domain shift (Torralba & Efros, 2011)) between training and test datasets, the performance of the trained model, i.e., the learner, degrades significantly. To tackle the domain shift issue, the domain adaptation approach (Pan & Yang, 2010; Daume III & Marcu, 2006; Huang et al., 2007) learns a learner transferable from a source domain to a target domain. Domain adaptation methods align distributions of different domains either in feature space (Long et al., 2015; Ganin et al., 2016) or in raw pixel space (Hoffman et al., 2018), which relies on unlabeled data from the target domain at training time. However, in many applications it is unrealistic to access unlabeled target data; this precludes the use of domain adaptation in such settings and motivates research on domain generalization. The domain generalization (DG) approach (Blanchard et al., 2011; Muandet et al., 2013) commonly uses several source domains to train a learner that can generalize to an unseen target domain. The underlying assumption is that there exists a latent domain-invariant feature space shared across the source domains and the unseen target domain. To learn domain-invariant features, (Muandet et al., 2013; Ghifary et al., 2015; Li et al., 2018b) explicitly align distributions of different source domains in feature space.
(Balaji et al., 2018; Li et al., 2019b; 2018a; Dou et al., 2019) split source domains into meta-train and meta-test sets to simulate domain shift and train the learner in a meta-learning manner. (Shankar et al., 2018; Carlucci et al., 2019; Zhou et al., 2020; Ryu et al., 2020) augment images or features to enhance the generalization capability of the learner. Conventional domain generalization methods assume that domain labels are available, but in a more realistic scenario the domain labels may be unknown (Wang et al., 2019). To handle this domain-free setting, Carlucci et al. (2019) combine supervised learning with self-supervised learning by solving jigsaw puzzles of the training images. In this work, we focus on a general learning scenario of domain generalization as follows. First, we do not know the domain label of each sample and do not assume that the training dataset contains several domains. Second, we do not assume that the training and test data are from different domains (e.g., styles). Previous domain-free DG methods (Matsuura & Harada, 2020) are commonly evaluated on datasets (e.g., PACS) composed of several domains, although they do not use domain labels in training. In our domain-free setting, we neither assume nor know the domains in the training dataset; we therefore model domain generalization as a learning problem in which the learner should generalize well for any train/val subset splitting, i.e., synthetic source/target domains, of the training dataset. This explicitly enforces that the trained learner be generalizable to any possible domain shift within the training dataset. Since enumerating all splittings is intractable, we propose an adversarial splitting model formulated as a min-max optimization problem. In this min-max problem, we adversarially split the training dataset into train/val subsets by maximizing the domain shift between them under the given learner, and then, given the splitting, update the learner by minimizing the prediction error on the val-subset using a meta-learning approach.
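The adversarial splitting step above can be sketched with a simple proxy. This is a minimal, hypothetical sketch: it ranks samples by distance from the global feature mean and assigns the farthest ones to the val-subset, a cheap stand-in for "maximize domain shift"; the paper's actual shift measure depends on the current learner, and the function name and criterion here are illustrative only.

```python
import numpy as np

def adversarial_split(features, val_fraction=0.5):
    """Proxy for the adversarial splitting step: the val-subset collects
    the samples whose features deviate most from the overall statistics,
    simulating a shifted synthetic target domain."""
    center = features.mean(axis=0)
    # Samples far from the global feature mean form the synthetic
    # "target" (val) subset; the remainder is the train-subset.
    dist = np.linalg.norm(features - center, axis=1)
    order = np.argsort(-dist)
    n_val = int(len(features) * val_fraction)
    return order[n_val:], order[:n_val]  # train indices, val indices

# Toy check: two feature clusters; the outlying cluster lands in val.
feats = np.vstack([np.zeros((6, 2)), np.ones((2, 2)) * 5.0])
train_idx, val_idx = adversarial_split(feats, val_fraction=0.25)
```

In the full method, this splitting is re-solved in each iteration under the current learner before the meta-learning update, so the learner is always trained against its current worst-case shift.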
By optimizing this min-max problem, we enforce the learner to generalize well even under the worst-case splitting. We also investigate L2-normalization of features in our domain generalization method. Surprisingly, we find that L2-normalization improves the performance of the learner and mitigates gradient explosion in the meta-learning process of DG. We further theoretically analyze the underlying reasons for this finding. The proposed domain generalization approach is dubbed Domain-Free Adversarial Splitting, i.e., DFAS. To verify the effectiveness of our method, we conduct extensive experiments on the benchmark datasets PACS, Office-Home and CIFAR-10 under different settings with multiple/single source domains. In experiments where the training data are from several source domains, our method achieves state-of-the-art results on both PACS and Office-Home. We also find that our method significantly outperforms baselines in experiments where the training data are from a single source domain on PACS and CIFAR-10. We further confirm the effectiveness of our method by an ablation study. Based on domain adaptation theory, we derive an upper bound on the generalization error on the unseen target domain, and show that the terms in this upper bound are implicitly minimized by our method. This theoretical analysis partially explains the success of our method.
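The L2-normalization mentioned above is a one-line operation; a minimal sketch (the `eps` guard against zero vectors is our addition, not from the paper):

```python
import numpy as np

def l2_normalize(features, eps=1e-12):
    """Project each feature vector onto the unit sphere. Bounding the
    feature magnitudes this way is one plausible mechanism for the
    mitigation of gradient explosion observed during meta-learning."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return features / np.maximum(norms, eps)

feats = np.array([[3.0, 4.0], [0.5, 0.0]])
normed = l2_normalize(feats)
```

In a deep-learning implementation this would typically be applied to the feature extractor's output before the classifier head.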

2. RELATED WORKS

We summarize and compare related domain generalization (DG) methods from two perspectives: DG with domain labels and DG without domain labels. DG with domain labels. When domain labels are available, there are three categories of DG methods. First, (Muandet et al., 2013; Ghifary et al., 2015; Li et al., 2018b; Piratla et al., 2020) learn domain-invariant features by aligning feature distributions or by common/specific feature decomposition. Second, (Li et al., 2019a; Balaji et al., 2018; Li et al., 2019b; 2018a; Dou et al., 2019; Du et al., 2020a; b) are based on the meta-learning approach, which splits the given source domains into meta-train and meta-test domains and trains the learner in an episodic training paradigm. Third, (Shankar et al., 2018; Carlucci et al., 2019; Zhou et al., 2020; Wang et al., 2020) augment fake domain data to enhance the generalization capability of the learner. Our method is most closely related to the second category. Differently, however, we consider the DG problem in the domain-free setting and adversarially split the training dataset to synthesize domain shift via a principled min-max optimization, instead of using the leave-one-domain-out splitting of these methods.
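The episodic meta-learning paradigm used by the second category can be sketched as follows. This is an MLDG-style, first-order sketch on a linear model (real methods use deep networks, and in DFAS the meta-train/meta-test splits come from adversarial splitting rather than leave-one-domain-out); the function name, learning rates, and squared loss are illustrative assumptions.

```python
import numpy as np

def episodic_update(w, X_tr, y_tr, X_te, y_te, inner_lr=0.05, outer_lr=0.05):
    """One episode: adapt on meta-train, then update so that the adapted
    model also does well on meta-test (first-order approximation)."""
    # Inner step: gradient of squared loss on meta-train, then adapt.
    g_tr = 2 * X_tr.T @ (X_tr @ w - y_tr) / len(X_tr)
    w_adapted = w - inner_lr * g_tr
    # Outer step: meta-test loss is evaluated at the adapted weights,
    # and both gradients drive the update of the original weights.
    g_te = 2 * X_te.T @ (X_te @ w_adapted - y_te) / len(X_te)
    return w - outer_lr * (g_tr + g_te)

# Toy run: noiseless linear data; the two halves act as meta-train/test.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0])
X = rng.normal(size=(40, 2))
y = X @ w_true
w = np.zeros(2)
for _ in range(200):
    w = episodic_update(w, X[:20], y[:20], X[20:], y[20:])
```

The key design point is that the meta-test loss is measured after the inner adaptation, so the update rewards parameters whose adapted version transfers across the simulated domain shift.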



DG without domain labels. Carlucci et al. (2019) combine supervised learning and self-supervised learning that solves jigsaw puzzles of the training images. Matsuura & Harada (2020) divide samples into several latent domains via clustering and train a domain-invariant feature extractor via adversarial training. Huang et al. (2020) discard the dominant activated features, forcing the learner to activate the remaining features that correlate with labels. Another line of work (Volpi et al., 2018; Qiao et al., 2020) tackles the single-source setting, in which the training set comprises a single domain and the train and test data are from different domains.

