DOMAIN GENERALISATION VIA DOMAIN ADAPTATION: AN ADVERSARIAL FOURIER AMPLITUDE APPROACH

Abstract

We tackle the domain generalisation (DG) problem by posing it as a domain adaptation (DA) task in which we adversarially synthesise the worst-case 'target' domain and adapt a model to that worst-case domain, thereby improving the model's robustness. To synthesise data that is challenging yet semantics-preserving, we generate Fourier amplitude images and combine them with source-domain phase images, exploiting the widely believed conjecture from signal processing that amplitude spectra mainly determine image style, while phase spectra mainly capture image semantics. To synthesise a worst-case domain for adaptation, we train the classifier and the amplitude generator adversarially. Specifically, we exploit the maximum classifier discrepancy (MCD) principle from DA, which relates target-domain performance to the discrepancy of classifiers in the model hypothesis space. By Bayesian hypothesis modelling, we express the model hypothesis space effectively as a posterior distribution over classifiers given the source domains, making adversarial MCD minimisation feasible. On the DomainBed benchmark, including the large-scale DomainNet dataset, the proposed approach yields significantly improved domain generalisation performance over the state of the art.

1. INTRODUCTION

Contemporary machine learning models perform well when training and testing data are identically distributed. However, in practice it is often impossible to obtain an unbiased sample of real-world data for training, and therefore distribution shift inevitably exists between training and deployment. Performance can degrade dramatically under such domain shift (Koh et al., 2021), and this is often the cause of poor performance in real-world deployments (Geirhos et al., 2020). This important issue has motivated a large amount of research into domain generalisation (DG) (Zhou et al., 2021a), which addresses training models with increased robustness to distribution shift. DG approaches span a diverse set of strategies, including architectural innovations (Chattopadhyay et al., 2020), novel regularisation (Balaji et al., 2018), alignment (Sun & Saenko, 2016) and learning (Li et al., 2019) objectives, and data augmentation (Zhou et al., 2021b) to make the available training data more representative of potential testing data. However, the problem remains essentially unsolved, especially as measured by recent carefully designed benchmarks (Gulrajani & Lopez-Paz, 2021). Our approach is related to existing lines of work on data-augmentation solutions to DG (Zhou et al., 2021b; Shankar et al., 2018), which synthesise more data for model training, and on alignment-based approaches to Domain Adaptation (Sun & Saenko, 2016; Saito et al., 2018), which adapt a source model to an unlabelled target set but cannot address the DG problem, where the target set is unavailable. We improve on both by providing a unified framework for stronger data synthesis and domain alignment. Our framework combines two key innovations: a Bayesian approach to maximum classifier discrepancy, and a Fourier-analysis approach to data augmentation. We start from the perspective of maximum classifier discrepancy (MCD) from domain adaptation (Ben-David et al., 2007; 2010; Saito et al., 2018).
This bounds the target-domain error as a function of the discrepancy between multiple source-domain classifiers. It is not obvious how to apply MCD to the DG problem, where we have no access to target-domain data. A key insight is that MCD provides a principled objective that we can maximise in order to synthesise a worst-case target domain, and also minimise in order to train a model that is adapted to that worst-case domain. Specifically, we take a Bayesian approach that learns a distribution over source-domain classifiers, with which we can compute MCD. This simplifies the model by eliminating the need for the adversarial classifier training used in previous applications of MCD (Saito et al., 2018), which leaves us free to adversarially train the worst-case target domain. To enable challenging worst-case augmentations to be generated without the risk of altering image semantics, our augmentation strategy operates in the Fourier amplitude domain. It synthesises amplitude images, which can be combined with phase images from source-domain data to produce images that are substantially different in style (amplitude) while retaining the original semantics (phase). Our overall strategy, termed Adversarial Generation of Fourier Amplitude (AGFA), is illustrated in Fig. 1. In summary, we make the following main contributions: (1) We provide a novel and principled perspective on DG by drawing upon the MCD principle from DA. (2) We provide AGFA, an effective algorithm for DG based on variational Bayesian learning of the classifier and Fourier-based synthesis of the worst-case domain for robust learning. (3) Our empirical results show clear improvements over the previous state of the art on the rigorous DomainBed benchmark.
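To make the amplitude/phase intuition concrete, the style-semantics recombination can be sketched with a plain 2D FFT: take the amplitude spectrum from one image and the phase spectrum from another, then invert. This is a generic illustration of the Fourier heuristic (function names and toy data are our own), not the paper's generator:

```python
import numpy as np

def recombine(amplitude, phase):
    # Build a complex spectrum from a given amplitude and phase, then invert.
    spectrum = amplitude * np.exp(1j * phase)
    return np.real(np.fft.ifft2(spectrum))

def stylise(content_img, style_img):
    # Keep the phase (semantics) of content_img, take the amplitude (style)
    # of style_img, and reconstruct the restyled image.
    content_spec = np.fft.fft2(content_img)
    style_spec = np.fft.fft2(style_img)
    return recombine(np.abs(style_spec), np.angle(content_spec))
```

Recombining an image's own amplitude and phase recovers the original image, which is a quick sanity check that the decomposition is lossless.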

2. PROBLEM SETUP AND BACKGROUND

We follow the standard setup for the Domain Generalisation (DG) problem. As training data, we are given labelled data $S = \{(x, y) \mid (x, y) \sim D_i,\ i = 1, \dots, N\}$, where $x \in \mathcal{X}$ and $y \in \mathcal{Y} = \{1, \dots, C\}$. Although the source domain $S$ consists of different domains $\{D_i\}_{i=1}^{N}$ with domain labels available, we simply take their union without using the originating domain labels. This is because in practice the number of domains ($N$) is typically small, and it is rarely possible to estimate a meaningful population distribution for the empirical $S$ from a few different domains. What distinguishes DG from the closely related (unsupervised) Domain Adaptation (DA) is that in DG the target domain ($T$), on which the model's prediction performance is measured, is unknown, whereas in DA the input data $x$ from the target domain are revealed (without class labels $y$). Below we briefly summarise the MCD principle and Ben-David's theorem, one of the key theorems in DA, as we exploit them to tackle DG. Ben-David's theorem and the MCD principle in DA. In unsupervised DA, Ben-David's theorem (Ben-David et al., 2010; 2007) provides an upper bound on the target-domain generalisation error of a model (hypothesis). We focus on the tighter version of the bound, which states that for any classifier $h$ in the hypothesis space $\mathcal{H} = \{h \mid h : \mathcal{X} \to \mathcal{Y}\}$, the following holds (without the sampling error term): $$e_T(h) \le e_S(h) + \sup_{h, h' \in \mathcal{H}} \big| d_S(h, h') - d_T(h, h') \big| + e^*(\mathcal{H}; S, T),$$ where $e_S(h) := \mathbb{E}_{(x,y)\sim S}[\mathbb{1}(h(x) \neq y)]$ is the error rate of $h(\cdot)$ on the source domain $S$, $d_S(h, h') := \mathbb{E}_{x\sim S}[\mathbb{1}(h(x) \neq h'(x))]$ denotes the discrepancy between two classifiers $h$ and $h'$ on $S$ (similarly for $e_T(h)$ and $d_T(h, h')$), and $e^*(\mathcal{H}; S, T) := \min_{h \in \mathcal{H}} \big( e_S(h) + e_T(h) \big)$. Thus we can provably reduce the target-domain generalisation error by simultaneously minimising the three terms in the upper bound¹, namely the source-domain error $e_S(h)$, the classifier discrepancy, and the minimal source-target error.
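The quantities in the bound are simply disagreement rates, so their empirical estimates are one-liners. A minimal numpy sketch (the toy predictions and function names are our own illustration, not from the paper):

```python
import numpy as np

def error_rate(preds, labels):
    # Empirical e_S(h): fraction of inputs where h's prediction differs from the label.
    return float(np.mean(preds != labels))

def discrepancy(preds_h, preds_h_prime):
    # Empirical d_S(h, h'): fraction of inputs on which two classifiers disagree.
    return float(np.mean(preds_h != preds_h_prime))

# Toy predictions from two hypothetical classifiers on 6 source samples.
labels  = np.array([0, 1, 2, 0, 1, 2])
h_preds = np.array([0, 1, 2, 0, 2, 2])   # one mistake (index 4)
g_preds = np.array([0, 1, 1, 0, 2, 2])   # disagrees with h on one input (index 2)

e_S = error_rate(h_preds, labels)        # 1/6
d_S = discrepancy(h_preds, g_preds)      # 1/6
```

The same `discrepancy` estimator evaluated on target-domain inputs gives $d_T(h, h')$; no target labels are needed, which is what makes the discrepancy term usable in unsupervised DA.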
Previous approaches (Saito et al., 2018; Kim et al., 2019) aim to minimise the upper bound, and one reasonable strategy is to constrain the hypothesis space H in such a way that it contains only those



¹ Some recent work, such as Vedantam et al. (2021), has however empirically studied the potential looseness of this bound in certain scenarios.



Figure 1: Overall training flow of the proposed approach (AGFA). We generate target-domain data by synthesising Fourier amplitude images trained adversarially. See main text in Sec. 3 for details.
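The training flow in Fig. 1 hinges on measuring classifier discrepancy under a posterior over classifiers rather than between two adversarially trained heads. A toy numpy sketch of that measurement, under our own illustrative assumption of a Gaussian posterior over linear classifier weights (not the paper's variational model):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_classifiers(mean_W, std, n_samples):
    # Draw linear classifier weights from a (diagonal) Gaussian posterior.
    return mean_W + std * rng.standard_normal((n_samples,) + mean_W.shape)

def predict(W, X):
    # W: (k, d, C) sampled weight matrices; X: (n, d) inputs -> (k, n) labels.
    return np.argmax(np.einsum('nd,kdc->knc', X, W), axis=-1)

def expected_discrepancy(W, X):
    # Average pairwise disagreement rate among the sampled classifiers:
    # a Monte Carlo estimate of MCD under the posterior.
    preds = predict(W, X)
    k = preds.shape[0]
    pairs = [np.mean(preds[i] != preds[j])
             for i in range(k) for j in range(i + 1, k)]
    return float(np.mean(pairs))

# A worst-case generator would ascend this value w.r.t. the synthesised
# inputs X, while the classifier posterior is trained to descend it.
mean_W = rng.standard_normal((5, 3))   # d=5 features, C=3 classes (toy sizes)
W = sample_classifiers(mean_W, 0.5, 8)
X = rng.standard_normal((32, 5))
mcd = expected_discrepancy(W, X)       # a disagreement rate in [0, 1]
```

With a degenerate posterior (std = 0) all sampled classifiers coincide and the discrepancy collapses to zero, matching the intuition that MCD measures spread in the hypothesis space.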

