LEARNING ROBUST MODELS BY COUNTERING SPURIOUS CORRELATIONS

Abstract

Machine learning has demonstrated remarkable prediction accuracy over i.i.d. data, but this accuracy often drops when the model is tested with data from another distribution. One reason behind this drop is the model's reliance on features that are associated with the label in the training distribution, but not in the test distribution. This problem is usually known as spurious correlation, confounding factors, or dataset bias. In this paper, we formally study the generalization error bound for this setup given knowledge of how the spurious features are associated with the label. We also compare our analysis to the widely-accepted domain adaptation error bound and show that our bound can be tighter, under additional assumptions that we consider realistic. Further, our analysis naturally offers a set of solutions to this problem, linked to established solutions in various topics about robustness in general; these solutions all require some understanding of how the spurious features are associated with the label. Finally, we briefly discuss a method that does not require such an understanding.

1. INTRODUCTION

Machine learning, especially deep neural networks, has demonstrated remarkable empirical success over various benchmarks. One promising next step is to extend such empirical achievements beyond i.i.d. benchmarks: if we train a model with data from one distribution (i.e., the source distribution), how can we guarantee the error to be small over other unseen, but related, distributions (i.e., target distributions)? Quantifying the generalization error over two arbitrary distributions is not useful; thus, we require the distributions under study to be similar but different: similar in the sense that there exists a common function that can achieve zero error over both distributions, and different in the sense that there exists another function that can achieve zero error over the training distribution, but not the test distribution. This problem is not trivial because the empirical risk minimizer (ERM) may lead the model to learn this second function, a topic studied under different terminologies such as spurious correlations (Vigen, 2015), confounding factors (McDonald, 2014), or dataset bias (Torralba & Efros, 2011). As a result, a small empirical error may not mean the model learns what we expect (Geirhos et al., 2019; Wang et al., 2020), and thus the model may not perform consistently over other related data. In particular, our view of the challenges in this topic is illustrated with a toy example in Figure 1, where the model is trained on source domain data to classify triangle vs. circle and tested on target domain data. However, color coincides with shape in the source domain, so the model may learn either the desired function (relying on shape) or the spurious function (relying on color). The spurious function will not classify the target domain data correctly while the desired function will, but ERM cannot differentiate them.
As one may expect, whether shape or color is considered desired or spurious is subjective, dependent on the task or the data, and in general irrelevant to the statistical nature of the problem. Therefore, our error bound will require knowledge of the spurious function. While this is a toy example, this scenario surely exists in real-world tasks (e.g., Jo & Bengio, 2017; Geirhos et al., 2019; Wang et al., 2020). The contributions of this paper are: • We analyze the cross-distribution generalization error bound of a model when the model is trained on a distribution with spuriously correlated features, which is formalized as the main theorem of this paper. • We compare our bound to the widely-accepted domain adaptation one (Ben-David et al., 2010) and show that our bound can be tighter under assumptions that we consider realistic. • Our main theorem naturally offers principled solutions to this problem, and the solutions are linked to many previously established methods for robustness in a broader context. • As the principled solutions all require some knowledge of the task or the data, our main theorem also leads to a new heuristic absent of such knowledge. This new method may be on a par with the principled solutions, and can outperform vanilla training empirically.

Figure 1: A toy example of the main problem studied in this paper (panels: Source Domain, Target Domain). The spurious labeling function (classifier) of the shape relies on color, which coincides with the shape in the source distribution; it will not predict the shape correctly over target domain data. The desired labeling function relies on the shape itself and will predict the shape correctly over target domain data.
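The failure mode of ERM in the toy example can be sketched numerically. The following is an illustrative sketch (not from the paper): a hypothetical two-feature dataset where feature 0 plays the role of "shape" (desired) and feature 1 plays the role of "color" (spurious); the color-label association holds in the source domain and is reversed in the target domain. All names and thresholds here are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

def make_domain(color_agrees: bool):
    y = rng.integers(0, 2, n)                 # label: triangle (0) vs. circle (1)
    shape = y + 0.1 * rng.standard_normal(n)  # desired feature: tracks the label in both domains
    color = (y if color_agrees else 1 - y) + 0.1 * rng.standard_normal(n)
    return np.stack([shape, color], axis=1), y

X_src, y_src = make_domain(color_agrees=True)   # source: color is spuriously predictive
X_tgt, y_tgt = make_domain(color_agrees=False)  # target: the color correlation is reversed

# Two labeling functions that are indistinguishable to ERM on the source:
spurious_w = np.array([0.0, 1.0])   # uses only the "color" feature
desired_w = np.array([1.0, 0.0])    # uses only the "shape" feature

def accuracy(w, X, y):
    return np.mean((X @ w > 0.5) == y)

print(accuracy(spurious_w, X_src, y_src))  # ~1.0: zero source error
print(accuracy(spurious_w, X_tgt, y_tgt))  # ~0.0: fails on the target
print(accuracy(desired_w, X_tgt, y_tgt))   # ~1.0: the desired function transfers
```

Both labeling functions achieve (near-)zero error on the source domain, so the empirical risk alone cannot separate them; only the desired function keeps its accuracy on the target domain.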

2. RELATED WORK

There is a rich history of learning robust models. We first discuss works in three topics, all centering around the concept of invariance, where invariance intuitively means that the model's prediction is preserved under certain shifts of the data. We then highlight works related to our theoretical discussion.

Cross-domain Generalization

This line of work probably originates from domain adaptation (Ben-David et al., 2007), which studies the problem of training a model over one distribution and testing it over another. Since Ganin et al. (2016), recent advances along this topic have mainly centered around the concept of invariance: most techniques leverage different regularizers to learn representations that are invariant to the marginals of these two distributions (e.g., Ghifary et al., 2016; Rozantsev et al., 2018). Further, the community aims beyond the situation where a model trained with domain adaptation may only be applicable to one distribution, and focuses on domain generalization (Muandet et al., 2013), which studies the problem of training a model over a collection of distributions and testing it over distributions unseen during training. Similarly, most recent methods aim to learn representations invariant to the marginals of the training distributions (e.g., Motiian et al., 2017; Li et al., 2018; Carlucci et al., 2018). Recently, the community has extended the study to domain generalization without domain IDs, to address real-world situations where domain IDs are unavailable (Wang et al., 2019b); this again focuses on learning representations invariant to specifically designed functions.

Adversarially Robust Models

The study of robustness against adversarial examples was popularized by the empirical observation that small perturbations of image data can significantly alter a model's prediction (Szegedy et al., 2013; Goodfellow et al., 2015). This observation initiated a line of works building models invariant to such small perturbations (the rigorous definitions of "small perturbations" will not be discussed in detail here) (e.g., Lee et al., 2017; Akhtar et al., 2018), and adversarial training (Madry et al., 2018) is currently the most widely-accepted method in terms of empirical defense.
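The perturbation idea behind this line of work can be sketched with the fast gradient sign method of Goodfellow et al. (2015): move the input a small step along the sign of the loss gradient. The sketch below applies it to a fixed linear logistic classifier; the weights, input, and step size are our own illustrative assumptions, not values from any cited work.

```python
import numpy as np

w = np.array([1.0, -2.0, 0.5])  # a fixed linear classifier's weights
b = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_grad_x(x, y):
    # Gradient of the logistic loss -[y log p + (1-y) log(1-p)] w.r.t. x
    # is (p - y) * w for this linear model.
    p = sigmoid(w @ x + b)
    return (p - y) * w

x = np.array([0.3, -0.2, 0.8])  # a clean input the model classifies as y = 1
y = 1
eps = 0.4                       # perturbation budget per coordinate

# FGSM: a single signed-gradient step that maximally increases the loss
# within an L-infinity ball of radius eps.
x_adv = x + eps * np.sign(loss_grad_x(x, y))

print(sigmoid(w @ x + b) > 0.5)      # prediction on the clean input: True
print(sigmoid(w @ x_adv + b) > 0.5)  # prediction after the perturbation: False
```

Adversarial training replaces the clean input with such perturbed inputs during optimization, so the learned model's prediction becomes invariant to these small shifts.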
On the other hand, the community also aims to develop methods that are provably robust to predefined perturbations (e.g., Wong & Kolter, 2018; Croce & Hein, 2020), which links back to the works on distributionally robust models (e.g., Abadeh et al., 2015; Sagawa* et al., 2020), whose central goal is to train models invariant to a predefined shift of distributions. Recent evidence shows that key challenges of learning adversarially robust models are spuriously correlated features (Ilyas et al., 2019; Wang et al., 2020), connecting adversarial robustness to the next topic.

Countering Spurious Correlation

Works along this line usually connect the robustness of a model to its ability to ignore the spurious correlations in the data, which has also been studied under the terminologies of confounding factors or dataset bias. With different concrete definitions of the spurious correlation, methods have been developed for various applications, such as image/video classification (e.g., Goyal et al., 2017; Wang et al., 2019a;b; Bahng et al., 2019; Shi et al., 2020), text classification (e.g., He et al., 2019; Clark et al., 2019; Bras et al., 2020; Zhou & Bansal, 2020;

