AGREE TO DISAGREE: DIVERSITY THROUGH DISAGREEMENT FOR BETTER TRANSFERABILITY

Abstract

Gradient-based learning algorithms have an implicit simplicity bias which, in effect, can limit the diversity of predictors sampled by the learning procedure. This behavior can hinder the transferability of trained models by (i) favoring the learning of simpler but spurious features, present in the training data but absent from the test data, and (ii) leveraging only a small subset of the predictive features. This effect is especially magnified when the test distribution does not exactly match the training distribution, a setting referred to as the out-of-distribution (OOD) generalization problem. However, given only the training data, it is not always possible to assess a priori whether a given feature is spurious or transferable. Instead, we advocate learning an ensemble of models which capture a diverse set of predictive features. To this end, we propose a new algorithm, D-BAT (Diversity-By-disAgreement Training), which enforces agreement among the models on the training data but disagreement on the OOD data. We show how D-BAT naturally emerges from the notion of generalized discrepancy, and demonstrate in multiple experiments how the proposed method can mitigate shortcut learning, enhance uncertainty and OOD detection, and improve transferability.

1. INTRODUCTION

While gradient-based learning algorithms such as Stochastic Gradient Descent (SGD) are nowadays ubiquitous in the training of Deep Neural Networks (DNNs), it is well known that the resulting models (i) are brittle when exposed to small distribution shifts (Beery et al., 2018; Sun et al., 2016; Amodei et al., 2016), (ii) can easily be fooled by small adversarial perturbations (Szegedy et al., 2014), (iii) tend to pick up spurious correlations, present in the training data but absent from the downstream task (McCoy et al., 2019; Oakden-Rayner et al., 2020; Geirhos et al., 2020), and (iv) fail to provide adequate uncertainty estimates (Kim et al., 2016; van Amersfoort et al., 2020; Liu et al., 2021b). Recently, these learning algorithms have been investigated for their implicit bias toward simplicity, known as the Simplicity Bias (SB), which is seen as one of the reasons behind their strong generalization properties (Arpit et al., 2017; Dziugaite & Roy, 2017). While for deep neural networks simpler decision boundaries are often seen as less likely to overfit, Shah et al. (2020) and Pezeshki et al. (2021) demonstrated that the SB can still cause the aforementioned issues. In particular, they show how the SB can be extreme, compelling predictors to rely only on the simplest feature available, despite the presence of equally or even more predictive complex features. Its effect is greatly amplified in the more realistic out-of-distribution (OOD) setting (Ben-Tal et al., 2009), in which the source and target distributions differ, a notoriously challenging problem (Sagawa et al., 2020; Krueger et al., 2021). The difference between the two domains can be categorized either as a distribution shift, e.g. a lack of samples in certain parts of the data manifold due to limitations of the data collection pipeline, or as the two distributions being entirely different.
In the first case, the SB in its extreme form increases the chances of learning to rely on spurious features, i.e. shortcuts that do not generalize to the target distribution. Classic manifestations of this in vision applications occur when models learn to rely mostly on textures or backgrounds instead of more complex, and likely more generalizable, semantic features such as shape (Beery et al., 2018; Ilyas et al., 2019; Geirhos et al., 2020). In the second case, by relying only on the simplest feature and being invariant to more complex ones, the SB causes confident predictions (low uncertainty) on completely OOD samples, even when the ignored complex features could have signaled the distribution shift.

Correspondence to matteo.pagliardini@epfl.ch and sp.karimireddy@berkeley.edu
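To make the agreement/disagreement idea concrete, the following is a minimal, self-contained sketch, not the full D-BAT method: the toy data, hyperparameters, and variable names are all illustrative choices. Two logistic models are fit on training data where both features are predictive but one is "simpler" (less noisy); plain ERM leans on the simple feature, while a second model adds a binary disagreement term of the form -log(p1(1-p2) + (1-p1)p2) on an unlabeled OOD pool, pushing it toward the feature the first model largely ignores, while still agreeing with the labels on the training data.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Toy training data: both features predict y, but feature 1 is less noisy
# (the "simple" feature an ERM-trained model will prefer).
n = 400
y = rng.integers(0, 2, n).astype(float)
X_tr = np.stack([2 * y - 1 + 0.1 * rng.standard_normal(n),   # simple feature
                 2 * y - 1 + 0.8 * rng.standard_normal(n)],  # complex feature
                axis=1)
# Unlabeled OOD pool: the two features point in opposite directions, so a
# model relying on feature 1 and one relying on feature 2 must disagree here.
b = rng.integers(0, 2, n).astype(float)
X_ood = np.stack([2 * b - 1, 1 - 2 * b], axis=1)

def train(X, y, X_ood=None, p1_ood=None, alpha=0.0, steps=2000, lr=0.5):
    """Logistic regression via full-batch gradient descent, optionally with a
    disagreement penalty -log(p1*(1-p2) + (1-p1)*p2) on the OOD pool."""
    w = np.zeros(2)
    for _ in range(steps):
        p = sigmoid(X @ w)
        g = X.T @ (p - y) / len(y)  # cross-entropy gradient on train data
        if p1_ood is not None:
            p2 = sigmoid(X_ood @ w)
            D = p1_ood * (1 - p2) + (1 - p1_ood) * p2  # disagreement prob.
            # Gradient of -log(D) w.r.t. w (chain rule through p2).
            g += alpha * X_ood.T @ ((2 * p1_ood - 1) * p2 * (1 - p2) / D) / len(p2)
        w -= lr * g
    return w

w1 = train(X_tr, y)                  # plain ERM: relies mostly on feature 1
p1_ood = sigmoid(X_ood @ w1)         # first model's (frozen) OOD predictions
w2 = train(X_tr, y, X_ood, p1_ood, alpha=1.0)  # second model: disagree on OOD
```

In this sketch both models retain high training accuracy (both features are predictive on the training distribution), yet they make opposite predictions on nearly the whole OOD pool, since the disagreement term makes the second model rely on the feature the first one downweights.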

