THE RISKS OF INVARIANT RISK MINIMIZATION

Abstract

Invariant Causal Prediction (Peters et al., 2016) is a technique for out-of-distribution generalization which assumes that some aspects of the data distribution vary across the training set but that the underlying causal mechanisms remain constant. Recently, Arjovsky et al. (2019) proposed Invariant Risk Minimization (IRM), an objective based on this idea for learning deep, invariant features of data which are a complex function of latent variables; many alternatives have subsequently been suggested. However, formal guarantees for all of these works are severely lacking. In this paper, we present the first analysis of classification under the IRM objective, as well as these recently proposed alternatives, under a fairly natural and general model. In the linear case, we give simple conditions under which the optimal solution succeeds or, more often, fails to recover the optimal invariant predictor. We furthermore present the very first results in the non-linear regime: we demonstrate that IRM can fail catastrophically unless the test data are sufficiently similar to the training distribution, which is precisely the issue that it was intended to solve. Thus, in this setting we find that IRM and its alternatives fundamentally do not improve over standard Empirical Risk Minimization.

1. INTRODUCTION

Prediction algorithms are evaluated by their performance on unseen test data. In classical machine learning, it is common to assume that such data are drawn i.i.d. from the same distribution as the data set on which the learning algorithm was trained; in the real world, however, this is often not the case. When this discrepancy occurs, algorithms with strong in-distribution generalization guarantees, such as Empirical Risk Minimization (ERM), can fail catastrophically. In particular, while deep neural networks achieve superhuman performance on many tasks, there is evidence that they rely on statistically informative but non-causal features in the data (Beery et al., 2018; Geirhos et al., 2018; Ilyas et al., 2019). As a result, such models are prone to errors under surprisingly minor distribution shift (Su et al., 2019; Recht et al., 2019). To address this, researchers have investigated alternative objectives for training predictors which are robust to possibly egregious shifts in the test distribution. The task of generalizing under such shifts, known as Out-of-Distribution (OOD) Generalization, has led to many separate threads of research. One approach is Bayesian deep learning, which accounts for a classifier's uncertainty at test time (Neal, 2012). Another technique that has shown promise is data augmentation; this includes both automated data modifications which help prevent overfitting (Shorten & Khoshgoftaar, 2019) and specific counterfactual augmentations to ensure invariance in the resulting features (Volpi et al., 2018; Kaushik et al., 2020). A strategy which has recently gained particular traction is Invariant Causal Prediction (ICP; Peters et al. 2016), which views the task of OOD generalization through the lens of causality. This framework assumes that the data are generated according to a Structural Equation Model (SEM; Bollen 2005), which consists of a set of so-called mechanisms or structural equations that specify each variable as a function of its parents.
ICP moreover assumes that the data can be partitioned into environments, where each environment corresponds to interventions on the SEM (Pearl, 2009), but where the mechanism by which the target variable is generated from its direct parents is unaffected. Thus the causal mechanism of the target variable is unchanging, but other aspects of the distribution can vary broadly. As a result, learning mechanisms that are the same across environments ensures recovery of the invariant features, which generalize under arbitrary interventions. In this work, we consider objectives that attempt to learn what we refer to as the "optimal invariant predictor": the classifier which uses, and is optimal with respect to, only the invariant features in the SEM. By definition, such a classifier does not overfit to environment-specific properties of the data distribution, so it will generalize even under major distribution shift at test time. In particular, we focus our analysis on one of the more popular objectives, Invariant Risk Minimization (IRM; Arjovsky et al., 2019), but our results can easily be extended to similar recently proposed alternatives. Various works on invariant prediction (Muandet et al., 2013; Ghassami et al., 2017; Heinze-Deml et al., 2018; Rojas-Carulla et al., 2018; Subbaswamy et al., 2019; Christiansen et al., 2020) consider regression in both the linear and non-linear settings, but they exclusively focus on learning with fully or partially observed covariates or some other source of side information. Under such a condition, results from causal inference (Maathuis et al., 2009; Peters et al., 2017) allow for formal guarantees of identification of the invariant features, or at least a strict subset of them.
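As a concrete illustration of this setup (our own simplified sketch, not a construction taken from Peters et al. 2016), consider a linear-Gaussian SEM in which the environment $e$ affects only a non-causal feature:

\begin{align*}
y &\leftarrow \langle w^*, x_c \rangle + \epsilon, & \epsilon &\sim \mathcal{N}(0, \sigma^2), \\
x_e &\leftarrow y \cdot \mu_e + \eta, & \eta &\sim \mathcal{N}(0, \sigma_e^2).
\end{align*}

The mechanism generating $y$ from its direct parents $x_c$ is identical in every environment, while the anti-causal feature $x_e$ depends on $e$ through $\mu_e$ and $\sigma_e^2$. A predictor that regresses on $x_e$ may achieve lower risk on the training environments, but it fails under interventions on $x_e$; the invariant predictor $x_c \mapsto \langle w^*, x_c \rangle$ generalizes under any intervention that preserves the causal mechanism of $y$.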
With the rise of deep learning, more recent literature has developed objectives for learning invariant representations when the data are a non-linear function of unobserved latent factors, a common assumption when working with complex, high-dimensional data such as images. Causal discovery and inference with unobserved confounders or latents is a much harder problem (Peters et al., 2017), so while empirical results seem encouraging, these objectives come with few formal guarantees. IRM is one such objective for invariant representation learning. The goal of IRM is to learn a feature embedder such that the optimal linear predictor on top of these features is the same for every environment, the idea being that only the invariant features will admit an optimal predictor that is itself invariant. Recent works have pointed to shortcomings of IRM and have suggested modifications which they claim prevent these failures. However, these alternatives are compared in broad strokes, with little in the way of theory. In this work, we present the first formal analysis of classification under the IRM objective under a fairly natural and general model which carefully formalizes the intuition behind the original work. Our results show that despite being inspired by invariant prediction, this objective can frequently be expected to perform no better than ERM. In the linear setting, we present simple, exact conditions under which solving to optimality succeeds or, more often, breaks down in recovering the optimal invariant predictor. We also demonstrate another major failure case: under mild conditions, there exists a feasible point that uses only non-invariant features and achieves lower empirical risk than the optimal invariant predictor; it will thus appear as a more attractive solution, yet its reliance on non-invariant features means it will fail to generalize. As corollaries, we present similar settings where all recently suggested alternatives to IRM likewise fail.
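For reference, the objective as stated by Arjovsky et al. (2019) is the bi-level program

\begin{align*}
\min_{\Phi,\, \hat{\beta}} \quad & \sum_{e \in \mathcal{E}} R^e(\hat{\beta} \circ \Phi) \\
\text{s.t.} \quad & \hat{\beta} \in \arg\min_{\beta}\, R^e(\beta \circ \Phi) \quad \forall e \in \mathcal{E},
\end{align*}

where $R^e$ is the risk on environment $e$, $\Phi$ is the learned featurizer, and $\hat{\beta}$ is a linear classifier constrained to be simultaneously optimal for every training environment. Since this constraint is intractable in general, the same work proposes the practical relaxation IRMv1, which penalizes the gradient of each environment's risk with respect to a fixed scalar classifier $w = 1.0$:

\[
\min_{\Phi}\; \sum_{e \in \mathcal{E}} R^e(\Phi) + \lambda \left\lVert \nabla_{w \mid w = 1.0}\, R^e(w \cdot \Phi) \right\rVert^2 .
\]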
Furthermore, we present the first results in the non-linear regime: we demonstrate the existence of a classifier with exponentially small suboptimality which nevertheless relies heavily on non-invariant features on most test inputs, resulting in worse-than-chance performance on distributions that are sufficiently dissimilar from the training environments. These findings strongly suggest that existing approaches to ICP for high-dimensional latent variable models do not cleanly achieve their stated objective and that future work would benefit from a more formal treatment.

2. RELATED WORK

Works on learning deep invariant representations vary considerably: some search for a domain-invariant representation (Muandet et al., 2013; Ganin et al., 2016), i.e. invariance of the distribution p(Φ(x)), typically used for domain adaptation (Ben-David et al., 2010; Ganin & Lempitsky, 2015; Zhang et al., 2015; Long et al., 2018) with assumed access to labeled or unlabeled data from the target distribution. Other works instead hope to find representations that are conditionally domain-invariant, i.e. invariance of p(Φ(x) | y) (Gong et al., 2016; Li et al., 2018). However, there is evidence that invariance may not be sufficient for domain adaptation (Zhao et al., 2019; Johansson et al., 2019). In contrast, this paper focuses instead on domain generalization (Blanchard et al., 2011; Rosenfeld et al., 2021), where access to the test distribution is not assumed. Recent works on domain generalization, including the objectives discussed in this paper, suggest invariance of the feature-conditioned label distribution. In particular, Arjovsky et al. (2019) only assume invariance of E[y | Φ(x)]; follow-up works rely on a stronger assumption of invariance of higher conditional moments (Krueger et al., 2020; Xie et al., 2020; Jin et al., 2020; Mahajan et al., 2020; Bellot & van der Schaar, 2020). Though this approach has become popular in the last year, it is somewhat similar to the existing concept of covariate shift (Shimodaira, 2000; Bickel et al., 2009).

