CONTINUAL INVARIANT RISK MINIMIZATION

Abstract

Empirical risk minimization can lead to poor generalization behavior on unseen environments if the learned model does not capture invariant feature representations. Invariant risk minimization (IRM) is a recent proposal for discovering environment-invariant representations. It was introduced by Arjovsky et al. (2019) and extended by Ahuja et al. (2020). IRM assumes that all environments are available to the learning system at the same time. With this work, we generalize the concept of IRM to scenarios where environments are observed sequentially. We show that existing approaches, including those designed for continual learning, fail to identify the invariant features and models across sequentially presented environments. We extend IRM under a variational Bayesian and bilevel framework, creating a general approach to continual invariant risk minimization. We also describe a strategy for solving the resulting optimization problems using a variant of the alternating direction method of multipliers (ADMM). We show empirically, on multiple datasets and with multiple sequential environments, that the proposed methods outperform or are competitive with prior approaches.

1. INTRODUCTION

Empirical risk minimization (ERM) is the predominant principle for designing machine learning models. In numerous application domains, however, the test data distribution can differ from the training data distribution. For instance, at test time, the same task might be observed in a different environment. Neural networks trained by minimizing ERM objectives over the training distribution tend to generalize poorly in these situations. Improving the generalization of learning systems has become a major research topic in recent years, with many different threads of research including, but not limited to, robust optimization (e.g., Hoffman et al. (2018)) and domain adaptation (e.g., Johansson et al. (2019)). Both of these research directions, however, have their own intrinsic limitations (Ahuja et al., 2020). Recently, approaches have been proposed that learn environment-invariant representations. The motivating idea is that if a model's behavior is invariant across environments, the model is more likely to have captured a causal relationship between features and prediction targets, which in turn should lead to better generalization. Invariant risk minimization (IRM; Arjovsky et al., 2019), which pioneered this idea, introduces a new optimization loss function to identify non-spurious causal feature-target interactions. Invariant risk minimization games (IRMG; Ahuja et al., 2020) expands on IRM from a game-theoretic perspective. IRM and its extensions, however, assume that all environments are available to the learning system at the same time, which is unrealistic in numerous applications: a learning agent often experiences environments sequentially rather than concurrently. For instance, in a federated learning scenario with patient medical records, each hospital's (environment's) data might be used to train a shared machine learning model, which receives the data from these environments in a sequential manner.
The model might then be applied to data from an additional hospital (environment) that was not available at training time. Unfortunately, both IRM and IRMG are incompatible with such a continual learning setup, in which the learner receives training data from environments presented sequentially. As already noted by Javed et al. (2020), "IRM (Arjovsky et al., 2019) requires sampling data from multiple environments simultaneously for computing a regularization term pertinent to its learning objective, where different environments are defined by intervening on one or more variables of the world." The same applies to IRMG (Ahuja et al., 2020).
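To make the incompatibility concrete, the following is a minimal PyTorch sketch of the IRMv1 objective from Arjovsky et al. (2019): each environment contributes its empirical risk plus a gradient penalty taken at a fixed dummy classifier w = 1.0, and the total objective sums these terms over all environments in the same training step. The function and variable names (`irm_penalty`, `irm_objective`, `envs`, `lam`) are illustrative, not taken from any released code; the sketch assumes binary classification with logit outputs.

```python
import torch
import torch.nn.functional as F

def irm_penalty(logits, y):
    # Gradient of the per-environment risk w.r.t. a fixed dummy
    # classifier w = 1.0; its squared norm measures how far w = 1.0
    # is from being risk-optimal for this environment.
    w = torch.tensor(1.0, requires_grad=True)
    loss = F.binary_cross_entropy_with_logits(logits * w, y)
    grad = torch.autograd.grad(loss, [w], create_graph=True)[0]
    return grad.pow(2)

def irm_objective(model, envs, lam=1.0):
    # envs: list of (x, y) batches, one per environment. All of them
    # are consumed in a single objective evaluation -- exactly the
    # simultaneous access that a sequential setting forbids.
    total = 0.0
    for x, y in envs:
        logits = model(x).squeeze(-1)
        total = total + F.binary_cross_entropy_with_logits(logits, y) \
                      + lam * irm_penalty(logits, y)
    return total / len(envs)
```

The per-environment risks could in principle be accumulated one environment at a time, but the penalty couples the shared representation to every environment's data within each update, which is why a naive sequential variant of this objective fails.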

