LEARNING ROBUST MODELS USING THE PRINCIPLE OF INDEPENDENT CAUSAL MECHANISMS

Anonymous

Abstract

Standard supervised learning breaks down under data distribution shift. However, the principle of independent causal mechanisms (ICM; Peters et al., 2017) can turn this weakness into an opportunity: one can take advantage of distribution shift between different environments during training in order to obtain more robust models. We propose a new gradient-based learning framework whose objective function is derived from the ICM principle. We show theoretically and experimentally that neural networks trained in this framework focus on relations that remain invariant across environments and ignore unstable ones. Moreover, we prove that the recovered stable relations correspond to the true causal mechanisms under certain conditions. In both regression and classification, the resulting models generalize well to unseen scenarios where traditionally trained models fail.

1. INTRODUCTION

Standard supervised learning has shown impressive results when training and test samples follow the same distribution. However, many real-world applications do not conform to this setting, so research successes do not readily translate into practice (Lake et al., 2017). The task of Domain Generalization (DG) addresses this problem: it aims at training models that generalize well under domain shift. In contrast to domain adaptation, where a few labeled and/or many unlabeled examples are provided for each target test domain, in DG no data at all is available from the test domains' distributions, making the problem unsolvable in general.

In this work, we view the problem of DG specifically through the lens of causal discovery. To make the problem of DG well-posed from this viewpoint, we assume that there exists a feature vector h(X) whose relation to the target variable Y is invariant across all environments. Consequently, the conditional probability p(Y | h(X)) has predictive power in each environment. From a causal perspective, changes between domains or environments can be described as interventions; and causal relationships, unlike purely statistical ones, remain invariant across environments unless explicitly changed by an intervention. This is due to the fundamental principle of "Independent Causal Mechanisms", which will be discussed in Section 3. From a causal standpoint, finding robust models is therefore a causal discovery task (Bareinboim & Pearl, 2016; Meinshausen, 2018).

Taking a causal perspective on DG, we aim at identifying features which (i) have an invariant relationship to the target variable Y and (ii) are maximally informative about Y. This problem has already been addressed, under some simplifying assumptions and with a discrete combinatorial search, in Magliacane et al. (2018) and Rojas-Carulla et al. (2018); we make weaker assumptions and use gradient-based optimization.
Gradient-based optimization is attractive because it readily scales to high dimensions and offers the possibility to learn very informative features, instead of merely selecting among predefined ones. Approaches to invariant relations similar to ours were taken by Ghassami et al. (2017), who restrict themselves to linear relations, and by Arjovsky et al. (2019) and Krueger et al. (2020), who minimize an invariant empirical risk objective. Problems (i) and (ii) are quite intricate because the search space has combinatorial complexity and testing for conditional independence in high dimensions is notoriously difficult. Our main contributions to this problem are the following:

• By connecting invariant (causal) relations with normalizing flows, we propose a differentiable two-part objective of the form I(Y; h(X)) + λ_I L_I, where I is the mutual information
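To make the invariant empirical risk idea of the cited prior work concrete, the following is a minimal sketch in the spirit of Krueger et al. (2020), not the objective proposed in this paper: the loss is the mean empirical risk over training environments plus a penalty on the variance of those per-environment risks. The penalty weight `lam` and the toy risk values are illustrative assumptions.

```python
import numpy as np

def vrex_objective(env_risks, lam=1.0):
    """V-REx-style objective (Krueger et al., 2020): mean empirical risk
    across training environments plus a penalty on the variance of those
    risks. The variance term pushes the model toward relations that perform
    equally well in every environment, i.e. toward stable relations."""
    risks = np.asarray(env_risks, dtype=float)
    return risks.mean() + lam * risks.var()

# Two hypothetical predictors with the same average risk: the one whose
# risk is spread unevenly across environments is penalized more heavily.
uneven = vrex_objective([0.1, 0.9], lam=10.0)  # mean 0.5, variance 0.16
even = vrex_objective([0.5, 0.5], lam=10.0)    # mean 0.5, variance 0.0
```

As `lam` grows, the minimizer is driven toward predictors whose risks are equalized across environments, which is one way of operationalizing the invariance requirement (i) above.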

