EQUIVARIANT DISENTANGLED TRANSFORMATION FOR DOMAIN GENERALIZATION UNDER COMBINATION SHIFT

Abstract

Machine learning systems may encounter unexpected problems when the data distribution changes in the deployment environment. A major reason is that certain combinations of domains and labels are not observed during training but appear in the test environment. Although various invariance-based algorithms can be applied, we find that the performance gain is often marginal. To formally analyze this issue, we provide a unique algebraic formulation of the combination shift problem based on the concepts of homomorphism, equivariance, and a refined definition of disentanglement. The algebraic requirements naturally lead to a simple yet effective method, referred to as equivariant disentangled transformation (EDT), which augments the data based on the algebraic structures of labels and makes the transformation satisfy the equivariance and disentanglement requirements. Experimental results demonstrate that invariance may be insufficient, and that it is important to exploit the equivariance structure in the combination shift problem.

1. INTRODUCTION

The way we humans perceive the world is combinatorial: we tend to cognize a complex object or phenomenon as a combination of simpler factors of variation. Further, we have the ability to recognize, imagine, and process novel combinations of factors that we have never observed, so that we can survive in this rapidly changing world. Such ability is usually referred to as generalization. However, despite recent super-human performance on certain tasks, machine learning systems still lack this generalization ability, especially when only a limited subset of all combinations of factors is observable (Sagawa et al., 2020; Träuble et al., 2021; Goel et al., 2021; Wiles et al., 2022). In risk-sensitive applications such as driver-assistance systems (Alcorn et al., 2019; Volk et al., 2019) and computer-aided medical diagnosis (Castro et al., 2020; Bissoto et al., 2020), performing well only on a given subset of combinations but not on unobserved combinations may cause unexpected and catastrophic failures in a deployment environment. A more practical but still challenging learning problem is one in which all domains and all labels appear during training, but only a limited subset of the domain-label combinations is available. We refer to the usual setting of domain generalization as domain shift and to this new setting as combination shift. An illustration is given in Fig. 1. Combination shift is more feasible because all domains are at least partially observable during training, but it is also more challenging because the distribution of labels can vary significantly across domains. The learning goal is to improve generalization with as few combinations as possible.
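To make the setting concrete, the following is a minimal sketch of how a combination-shift split could be constructed. The domain and label names here are hypothetical placeholders, not the datasets used in the paper; the key property is that every domain and every label appears in training, while some (domain, label) pairs are held out for testing.

```python
from itertools import product

# Hypothetical domains and labels for illustration only.
domains = ["photo", "sketch", "cartoon"]
labels = ["dog", "cat", "horse"]

all_combinations = set(product(domains, labels))

# Hold out a subset of (domain, label) combinations for testing.
# Unlike domain shift, no entire domain is removed from training.
held_out = {("photo", "horse"), ("sketch", "cat"), ("cartoon", "dog")}
train_combinations = all_combinations - held_out

# Sanity checks: this is a combination shift, not a domain shift --
# every domain and every label still occurs in some training pair.
train_domains = {d for d, _ in train_combinations}
train_labels = {y for _, y in train_combinations}
assert train_domains == set(domains)
assert train_labels == set(labels)
```

Under such a split, a model must generalize to held-out pairs like ("photo", "horse") even though both "photo" and "horse" were individually seen during training.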



Domain generalization (Wang et al., 2021a) is a problem where we need to deal with combinations of two factors: domains and labels. Recently, Gulrajani & Lopez-Paz (2021) questioned the progress of domain generalization research, claiming that several algorithms are not significantly superior to an empirical risk minimization (ERM) baseline. In addition to the model selection issue raised by Gulrajani & Lopez-Paz (2021), we conjecture that this is due to the ambitious goal of the usual domain generalization setting: generalizing to a completely unknown domain. Is it really possible to understand art if we have only seen photographs (Li et al., 2017)? Besides, the datasets used for evaluation usually have almost uniformly distributed domains and classes for training, which may be unrealistic to expect in real-world applications.

