VARIATIONAL INVARIANT LEARNING FOR BAYESIAN DOMAIN GENERALIZATION

Abstract

Domain generalization addresses the out-of-distribution problem, which is challenging due to the domain shift and the uncertainty caused by the inaccessibility of data from the target domains. In this paper, we propose variational invariant learning, a probabilistic inference framework that jointly models domain invariance and uncertainty. We introduce variational Bayesian approximation into both the feature representation and classifier layers to facilitate invariant learning for better generalization across domains. Within this probabilistic modeling framework, we introduce a domain-invariant principle that explores invariance across domains in a unified way. We incorporate the principle into the variational Bayesian layers of neural networks, achieving domain-invariant representations and a domain-invariant classifier. We empirically demonstrate the effectiveness of our proposal on four widely used cross-domain visual recognition benchmarks. Ablation studies demonstrate the benefits of our proposal, and on all benchmarks our variational invariant learning consistently delivers state-of-the-art performance.

1. INTRODUCTION

Domain generalization (Muandet et al., 2013), as an out-of-distribution problem, aims to train a model on several source domains and have it generalize well to unseen target domains. The major challenge stems from the large distribution shift between the source and target domains, which is further complicated by the prediction uncertainty (Malinin & Gales, 2018) introduced by the inaccessibility of data from target domains during training. Previous approaches focus on learning domain-invariant features using novel loss functions (Muandet et al., 2013; Li et al., 2018a) or specific architectures (Li et al., 2017a; D'Innocente & Caputo, 2018). Meta-learning based methods were proposed to achieve similar goals by leveraging an episodic training strategy (Li et al., 2017b; Balaji et al., 2018; Du et al., 2020). Most of these methods are built on deep neural network backbones (Krizhevsky et al., 2012; He et al., 2016). However, while deep neural networks have achieved remarkable success in various vision tasks, their performance is known to degrade considerably when the test samples are out of the training data distribution (Nguyen et al., 2015; Ilse et al., 2019), due to their poorly calibrated behavior (Guo et al., 2017; Kristiadi et al., 2020). As an attractive solution, Bayesian learning naturally represents prediction uncertainty (Kristiadi et al., 2020; MacKay, 1992), possesses better generalizability to out-of-distribution examples (Louizos & Welling, 2017), and provides an elegant formulation for transferring knowledge across different datasets (Nguyen et al., 2018). Further, approximate Bayesian inference has been demonstrated to improve prediction uncertainty (Blundell et al., 2015; Louizos & Welling, 2017; Atanov et al., 2019), even when only applied to the last network layer (Kristiadi et al., 2020). These properties make it appealing to introduce Bayesian learning into the challenging and unexplored scenario of domain generalization.
In this paper, we propose variational invariant learning (VIL), a Bayesian inference framework that jointly models domain invariance and uncertainty for domain generalization. We apply variational Bayesian approximation to the last two network layers, for both the representation and the classifier, by placing prior distributions over their weights. This adapts Bayesian neural networks to domain generalization, enjoying the representational power of deep neural networks while facilitating better generalization. To further improve robustness to domain shifts, we introduce a domain-invariant principle under the Bayesian inference framework, which enables us to explore domain invariance for both the feature representations and the classifier in a unified way. We evaluate our method on four widely used benchmarks for cross-domain visual object classification. Our ablation studies demonstrate the effectiveness of the variational Bayesian domain-invariant features and classifier for domain generalization. Results further show that our method achieves the best performance on all four benchmarks.

2. METHODOLOGY

We explore Bayesian inference for domain generalization. In this task, samples from the target domains are never seen during training and are usually out of the data distribution of the source domains, which leads to uncertainty when making predictions on the target domains. Bayesian inference offers a principled way to represent this predictive uncertainty in neural networks (MacKay, 1992; Kristiadi et al., 2020). We first briefly review approximate Bayesian inference, and then present our variational invariant learning for domain generalization.

2.1. APPROXIMATE BAYESIAN INFERENCE

Given a dataset {x^(i), y^(i)}_{i=1}^{N} of N input-output pairs and a model parameterized by weights θ with a prior distribution p(θ), Bayesian neural networks aim to infer the true posterior distribution p(θ|x, y). As exact inference of the true posterior is computationally intractable, Hinton & Camp (1993) and Graves (2011) recommended learning a variational distribution q(θ) to approximate p(θ|x, y) by minimizing the Kullback-Leibler (KL) divergence between them:

θ* = arg min_θ D_KL[q(θ) || p(θ|x, y)].    (1)

The above optimization is equivalent to minimizing the loss function

L_Bayes = -E_{q(θ)}[log p(y|x, θ)] + D_KL[q(θ) || p(θ)],    (2)

which is the negative of the evidence lower bound (ELBO) (Blei et al., 2017).
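To make the objective in Eq. (2) concrete, the following is a minimal numerical sketch of the negative ELBO for a hypothetical one-weight Bayesian logistic model with a mean-field Gaussian posterior q(w) = N(μ, σ²) and prior p(w) = N(0, 1). The toy data, the single-weight model, and the specific values of μ and σ are illustrative assumptions, not the paper's architecture; the paper applies variational layers to the last two layers of a deep network.

```python
import math
import random

random.seed(0)

def kl_gaussian(mu_q, sigma_q, mu_p, sigma_p):
    # Closed-form KL divergence between univariate Gaussians:
    # KL(q || p) = log(sigma_p/sigma_q) + (sigma_q^2 + (mu_q - mu_p)^2) / (2 sigma_p^2) - 1/2
    return (math.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sigma_p ** 2) - 0.5)

def sample_weight(mu, sigma):
    # Reparameterization trick: w = mu + sigma * eps, with eps ~ N(0, 1)
    return mu + sigma * random.gauss(0.0, 1.0)

# Variational posterior q(w) = N(mu, sigma^2); prior p(w) = N(0, 1).
mu, sigma = 0.8, 0.3
data = [(2.0, 1), (-1.5, 0), (1.0, 1), (-0.5, 0)]  # toy (x, y) pairs

def neg_elbo(num_samples=1000):
    # Monte Carlo estimate of L_Bayes = -E_q[log p(y|x, w)] + KL[q(w) || p(w)]
    nll = 0.0
    for _ in range(num_samples):
        w = sample_weight(mu, sigma)
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-w * x))  # Bernoulli likelihood
            p = min(max(p, 1e-12), 1.0 - 1e-12)
            nll -= math.log(p if y == 1 else 1.0 - p)
    nll /= num_samples
    return nll + kl_gaussian(mu, sigma, 0.0, 1.0)

loss = neg_elbo()
```

In practice the expectation is estimated with one or a few reparameterized samples per mini-batch, and the KL term is computed in closed form per weight when both q and p are Gaussian, as above.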

2.2. VARIATIONAL DOMAIN-INVARIANT LEARNING

In domain generalization, let D = {D_i}_{i=1}^{|D|} = S ∪ T be a set of domains, where S and T denote the source domains and target domains, respectively. S and T do not overlap but share the same label space. For each domain D_i ∈ D, we can define a joint distribution p(x_i, y_i) over the input space X and the output space Y. We aim to learn a model f : X → Y on the source domains S that generalizes well to the target domains T.

The fundamental problem in domain generalization is to achieve robustness to the domain shift between source and target domains; that is, we aim to learn a model invariant to the distributional shift between them. In this work, we focus on the invariance property across domains instead of exploring general invariance properties (Nalisnick & Smyth, 2018). We therefore introduce a formal definition of domain invariance, which is easily incorporated as a criterion into the Bayesian framework to achieve domain-invariant learning.

Provided that all domains in D lie in the same domain space, for any input sample x_s in domain D_s we assume there exists a domain-transform function g_ζ(·), defined as a mapping that projects x_s into a different domain D_ζ according to the parameter ζ, where ζ ∼ q(ζ) and different values of ζ lead to different post-transformation domains D_ζ. Usually the exact form of g_ζ(·) is not known. Under this assumption, we introduce the definition of domain invariance, which we will incorporate into the Bayesian layers of neural networks for domain-invariant learning.

Definition 2.1 (Domain Invariance). Let x_s be a given sample from domain D_s ∈ D, and let x_ζ = g_ζ(x_s) be its transformation in another domain D_ζ, where ζ ∼ q(ζ). Let p_θ(y|x) denote the output distribution of the model θ for input x. The model θ is domain-invariant if

p_θ(y_s | x_s) = p_θ(y_ζ | x_ζ),  ∀ζ ∼ q(ζ).    (3)
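Definition 2.1 requires the predictive distributions of a sample and its cross-domain counterpart to coincide, which suggests penalizing their divergence during training. The sketch below measures the violation of Eq. (3) with a symmetrized KL divergence between two categorical predictive distributions; the function names and the toy probability vectors are illustrative assumptions, not the paper's exact training objective.

```python
import math

def kl_categorical(p, q, eps=1e-12):
    # KL(p || q) for categorical distributions given as probability lists.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def invariance_penalty(p_s, p_zeta):
    # Symmetrized KL between the predictive distribution of a source sample
    # x_s and that of its cross-domain counterpart x_zeta = g_zeta(x_s).
    # Definition 2.1 holds exactly when this penalty is zero.
    return 0.5 * (kl_categorical(p_s, p_zeta) + kl_categorical(p_zeta, p_s))

# Toy predictive distributions over 3 classes for the same underlying sample
# viewed in two domains (values are illustrative, not from the paper).
p_source = [0.7, 0.2, 0.1]
p_transformed = [0.6, 0.3, 0.1]

penalty = invariance_penalty(p_source, p_transformed)
identical = invariance_penalty(p_source, p_source)  # zero when Eq. (3) holds
```

Since g_ζ(·) is unknown in practice, such a penalty would be estimated from samples of the same class observed in different source domains rather than from an explicit domain transform.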

