LEARNING ROBUST MODELS BY COUNTERING SPURIOUS CORRELATIONS

Abstract

Machine learning has demonstrated remarkable prediction accuracy over i.i.d. data, but this accuracy often drops when the model is tested on data from another distribution. One reason for the drop is that models rely on features that are associated with the label in the training distribution but not in the test distribution. This problem is usually known as spurious correlation, confounding factors, or dataset bias. In this paper, we formally study the generalization error bound for this setting, assuming knowledge of how the spurious features are associated with the label. We compare our analysis to the widely accepted domain adaptation error bound and show that our bound can be tighter, at the cost of additional assumptions that we consider realistic. Further, our analysis naturally suggests a set of solutions to this problem, connected to established methods across several areas of robustness research; all of these solutions require some understanding of how the spurious features are associated with the label. Finally, we briefly discuss a method that does not require such an understanding.

1. INTRODUCTION

Machine learning, especially deep neural networks, has demonstrated remarkable empirical success on various benchmarks. One promising next step is to extend these achievements beyond i.i.d. benchmarks: if we train a model with data from one distribution (i.e., the source distribution), how can we guarantee a small error over other unseen but related distributions (i.e., target distributions)? Quantifying the generalization error over two arbitrary distributions is not useful, so we require the distributions under study to be similar but different: similar in the sense that there exists a common function achieving zero error over both distributions, and different in the sense that there exists another function achieving zero error over the training distribution but not the test distribution. This problem is nontrivial because empirical risk minimization (ERM) may lead the model to learn this second function, a phenomenon studied under different names such as spurious correlations (Vigen, 2015), confounding factors (McDonald, 2014), or dataset bias (Torralba & Efros, 2011). As a result, a small empirical error does not guarantee that the model learns what we expect (Geirhos et al., 2019; Wang et al., 2020), so the model may not perform consistently over other related data. Our view of the challenges in this topic is illustrated with a toy example in Figure 1, where a model is trained on source-domain data to classify triangle vs. circle and tested on target-domain data. Because color coincides with shape in the source domain, the model may learn either the desired function (relying on shape) or the spurious function (relying on color). The spurious function will not classify the target-domain data correctly while the desired function will, but ERM cannot differentiate between them.
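The failure mode in the toy example can be reproduced with a small synthetic sketch (a hypothetical setup for illustration, not the paper's experiment). Here a "shape" feature is noisily but truly predictive in both domains, while a "color" feature is cleanly correlated with the label in the source distribution and decorrelated in the target; a standard ERM classifier latches onto color and its accuracy collapses under the shift.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000

# Source domain: color coincides with the label (spurious but clean),
# shape is truly predictive but noisy.
y = rng.integers(0, 2, n)
shape_src = y + rng.normal(0.0, 2.0, n)   # desired feature, high noise
color_src = y + rng.normal(0.0, 0.1, n)   # spurious feature, low noise
X_src = np.column_stack([shape_src, color_src])

# ERM on the source distribution: the classifier will mostly rely on color,
# since color alone separates the source data almost perfectly.
clf = LogisticRegression().fit(X_src, y)

# Target domain: the shape-label relation is unchanged,
# but color is now independent of the label.
y_tgt = rng.integers(0, 2, n)
shape_tgt = y_tgt + rng.normal(0.0, 2.0, n)
color_tgt = rng.integers(0, 2, n) + rng.normal(0.0, 0.1, n)
X_tgt = np.column_stack([shape_tgt, color_tgt])

src_acc = clf.score(X_src, y)
tgt_acc = clf.score(X_tgt, y_tgt)
print(f"source accuracy: {src_acc:.3f}, target accuracy: {tgt_acc:.3f}")
```

Under this construction the source accuracy is near perfect while the target accuracy falls toward chance, even though a shape-only classifier would generalize across both domains, mirroring the gap between the desired and spurious functions described above.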
As one may expect, whether shape or color is considered desired or spurious is subjective, depending on the task or the data, and is in general irrelevant to the statistical nature of the problem. Therefore, our error bound will require knowledge of the spurious function. While this is a toy example, the same scenario surely exists in real-world tasks (e.g., Jo & Bengio, 2017; Geirhos et al., 2019; Wang et al., 2020). The contributions of this paper are:
• We analyze the cross-distribution generalization error bound of a model trained on a distribution with spuriously correlated features, formalized as the main theorem of this paper.

