GATED DOMAIN UNITS FOR MULTI-SOURCE DOMAIN GENERALIZATION

Anonymous

Abstract

Distribution shift (DS) is a common problem that deteriorates the performance of learning machines. To tackle this problem, we postulate that real-world distributions are composed of elementary distributions that remain invariant across different environments. We call this the invariant elementary distribution (I.E.D.) assumption. The I.E.D. assumption implies an invariant structure in the solution space that enables knowledge transfer to unseen domains. To exploit this property in domain generalization (DG), we developed a modular neural network layer that consists of Gated Domain Units (GDUs). Each GDU learns an embedding of an individual elementary distribution, which allows us to encode domain similarities during training. During inference, the GDUs compute similarities between an observation and each of the corresponding elementary distributions, which are then used to form a weighted ensemble of learning machines. Because our layer is trained with backpropagation, it can naturally be integrated into existing deep learning frameworks. Our evaluation on image, text, graph, and time-series data shows a significant improvement in performance on out-of-training target domains, without domain information or any access to data from the target domains. This finding supports the practicality of the I.E.D. assumption and demonstrates that our GDUs can learn to represent these elementary distributions.

1. INTRODUCTION

A fundamental assumption in machine learning is that training and test data are independently and identically distributed (I.I.D.). This assumption underpins consistency results from statistical learning theory, meaning that the learning machine obtained from empirical risk minimization (ERM) attains the lowest achievable risk as the sample size grows (Vapnik, 1998; Schölkopf, 2019). Unfortunately, a considerable amount of research and real-world applications in the past decades has provided staggering evidence against this assumption (Zhao et al., 2018; 2020; Ren et al., 2019; Taori et al., 2020; see D'Amour et al. (2020) for case studies). The violation of the I.I.D. assumption is usually caused by a distribution shift (DS) and can result in inconsistent learning machines (Sugiyama & Kawanabe, 2012), implying the loss of performance guarantees for machine learning models in the real world. Therefore, to tackle DS, recent work advocates for domain generalization (DG) (Blanchard et al., 2011; Muandet et al., 2013; Li et al., 2017; 2018b; Zhou et al., 2021a). This generalization to entirely unseen domains is crucial for robust deployment of models in practice, especially when new, unforeseeable domains emerge after model deployment. However, the most important question that DG seeks to answer is how to identify the right invariance that allows for generalization.

The contribution of this work is twofold. First, we advocate that real-world distributions are composed of smaller "units" called invariant elementary distributions that remain invariant across different domains; see Section 2.1. Second, we propose to implement this hypothesis through so-called gated domain units (GDUs). Specifically, we developed a modular neural network layer that consists of GDUs. Each GDU learns an embedding of an individual elementary domain that allows us to express the domain similarities during training.
For this purpose, we adopt the theoretical framework of reproducing kernel Hilbert space (RKHS) to retrieve a geometrical representation of each distribution in the form of a kernel mean embedding (KME) without information loss (Berlinet & Thomas-Agnan, 2004; Smola et al., 2007; Sriperumbudur et al., 2010; Muandet et al., 2017). This representation accommodates methods based on analytical geometry to measure similarities between distributions.
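To make the idea of comparing distributions through their KMEs concrete, the following minimal sketch estimates the squared RKHS distance between two empirical KMEs, a quantity commonly known as the squared maximum mean discrepancy. It is an illustration under stated assumptions and not the GDU layer itself: the Gaussian kernel, the bandwidth of 1.0, the toy Gaussian samples, and all function names are hypothetical choices made here.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)).

    The kernel and its bandwidth are assumptions made for this sketch.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth=1.0):
    """Average of k(x, y) over all pairs.

    This equals the RKHS inner product of the empirical KMEs of xs and ys.
    """
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def squared_mmd(xs, ys, bandwidth=1.0):
    """Squared RKHS distance between the empirical KMEs (biased estimate)."""
    return (mean_kernel(xs, xs, bandwidth)
            - 2.0 * mean_kernel(xs, ys, bandwidth)
            + mean_kernel(ys, ys, bandwidth))


def sample_gaussian(rng, n, mean, dim=2):
    """Draw n points from an isotropic unit-variance Gaussian (toy data)."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
source_a = sample_gaussian(rng, 100, 0.0)  # sample from distribution P
source_b = sample_gaussian(rng, 100, 0.0)  # second sample from the same P
shifted = sample_gaussian(rng, 100, 2.0)   # sample from a shifted distribution Q

mmd_same = squared_mmd(source_a, source_b)
mmd_shifted = squared_mmd(source_a, shifted)
print("same distribution:   ", mmd_same)
print("shifted distribution:", mmd_shifted)
```

Two samples drawn from the same distribution should yield a distance close to zero, while the shifted sample should yield a clearly larger one. The same `mean_kernel` helper, called with a single observation as `xs`, gives the inner product between that observation's feature map and an empirical KME, which is one possible way to score how similar an observation is to a distribution.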

