OUT-OF-DISTRIBUTION GENERALIZATION ANALYSIS VIA INFLUENCE FUNCTION

Anonymous

Abstract

The mismatch between training and target data is a major challenge for current machine learning systems. When training data is collected from multiple domains and the target domains include all training domains as well as new domains, we face an Out-of-Distribution (OOD) generalization problem that aims to find a model with the best OOD accuracy. One common definition of OOD accuracy is the worst-domain accuracy. In general, the set of target domains is unknown, and the worst target domain may be unobserved when the number of observed domains is limited. In this paper, we show that the worst accuracy over the observed domains may dramatically fail to identify the OOD accuracy. To this end, we introduce the influence function, a classical tool from robust statistics, into the OOD generalization problem, and propose the variance of the influence function as an index to monitor the stability of a model across training domains. We show that the accuracy on test domains and the proposed index together can help us discern whether OOD algorithms are needed and whether a model achieves good OOD generalization.
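As a rough illustration of the kind of index the abstract describes, the sketch below computes the classical influence value of a training point on a test loss, $-\nabla \ell_{\text{test}}^\top H^{-1} \nabla \ell_z$, and takes its variance across domains. The function names and the way per-domain gradients enter the variance are our own illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def influence_function(grad_z, hessian_inv, grad_test):
    """Classical influence of a training point z on a test loss:
    I(z) = -grad_test^T H^{-1} grad_z."""
    return -grad_test @ hessian_inv @ grad_z

def domain_influence_variance(domain_grads, hessian_inv, grad_test):
    """Variance of per-domain influence values (illustrative index).
    A large variance suggests the model reacts very differently to
    perturbations of different training domains, i.e. is unstable."""
    influences = [influence_function(g, hessian_inv, grad_test)
                  for g in domain_grads]
    return float(np.var(influences))
```

In practice the gradients and the inverse Hessian would come from the trained model's loss; here they are stand-in arrays to show the arithmetic only.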

1. INTRODUCTION

Most machine learning systems assume that training and test data are independently and identically distributed, which does not always hold in practice (Bengio et al. (2019)). Consequently, performance is often greatly degraded when the test data come from a different domain (distribution). A classical example is the problem of identifying cows and camels (Beery et al. (2018)), where empirical risk minimization (ERM, Vapnik (1992)) may classify images by background color instead of object shape. As a result, when the test domain is "out-of-distribution" (OOD), e.g. when the background color is changed, performance drops significantly. OOD generalization aims to obtain a predictor that is robust against this distribution shift. Suppose that we have training data collected from $m$ domains: $S = \{S^e : e \in \mathcal{E}_{tr}\}$ with $|\mathcal{E}_{tr}| = m$ and $S^e = \{z^e_1, z^e_2, \dots, z^e_{n_e}\}$, where $z^e_i \sim P^e$, $P^e$ is the distribution corresponding to domain $e$, and $\mathcal{E}_{tr}$ is the set of all available domains, including validation domains. The OOD problem we consider is to find a model $f_{\text{OOD}}$ such that
$$f_{\text{OOD}} = \arg\min_f \sup_{P^e \in \mathcal{E}_{all}} \mathcal{L}(f, P^e),$$
where $\mathcal{E}_{all}$ is the set of all target domains and $\mathcal{L}(f, P^e)$ is the expected loss of $f$ on domain $P^e$. Recent algorithms address this OOD problem by recovering invariant (causal) features and building the optimal model on top of these features, such as Invariant Risk Minimization (IRM, Arjovsky et al. (2019)), Risk Extrapolation (REx, Krueger et al. (2020)), Group Distributionally Robust Optimization (gDRO, Sagawa et al. (2019)) and Inter-domain Mixup (Mixup, Xu et al. (2020); Yan et al. (2020); Wang et al. (2020)). Most works evaluate on Colored MNIST (see 5.1 for details), where we can directly obtain the worst-domain accuracy over $\mathcal{E}_{all}$. Gulrajani & Lopez-Paz (2020) assembled many algorithms and multi-domain datasets, and found that OOD algorithms cannot outperform ERM in some domain generalization tasks, e.g. VLCS (Torralba & Efros (2011)) and PACS (Li et al.). This is not surprising, since these tasks only require high performance on certain domains, while an OOD algorithm is expected to learn truly invariant features.
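The worst-domain objective above can be sketched in a few lines. Note that, as the paper argues, evaluating the minimum only over *observed* domains (as below) may fail to reflect the true OOD accuracy over $\mathcal{E}_{all}$; the function and data layout here are illustrative assumptions, not the paper's evaluation code:

```python
import numpy as np

def worst_domain_accuracy(model, domains):
    """Accuracy on the worst domain among those observed.
    `model` maps a feature array X to predicted labels;
    `domains` is a list of (X, y) pairs, one per domain in E_tr."""
    accuracies = [np.mean(model(X) == y) for X, y in domains]
    return min(accuracies)
```

For example, a classifier that thresholds the first feature would score 1.0 on a domain whose labels follow that feature and lower on a domain whose labels do not, and the returned value would be the lower of the two.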

