LEARNING FROM OTHERS' MISTAKES: AVOIDING DATASET BIASES WITHOUT MODELING THEM

Abstract

State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface-form correlations instead of features that target the intended underlying task. Previous work has demonstrated effective methods to circumvent these issues when knowledge of the bias is available. We consider cases where the bias issues are not explicitly identified and present a method for training models that learn to ignore these problematic correlations. Our approach relies on the observation that models with limited capacity primarily learn to exploit biases in the dataset. We can leverage the errors of such limited-capacity models to train a more robust model in a product of experts, thus bypassing the need to hand-craft a biased model. We show that this method retains improvements in out-of-distribution settings even when no particular bias is explicitly targeted by the biased model.

1. INTRODUCTION

The natural language processing community has made tremendous progress in using pre-trained language models to improve predictive accuracy (Devlin et al., 2019; Raffel et al., 2019). Models have now surpassed human performance on language understanding benchmarks such as SuperGLUE (Wang et al., 2019). However, studies have shown that these results are partially driven by models detecting superficial cues that correlate well with labels but may not be useful for the intended underlying task (Jia & Liang, 2017; Schwartz et al., 2017). This brittleness leads to overestimating model performance on the artificially constructed tasks and to poor performance on out-of-distribution or adversarial examples.

A well-studied example of this phenomenon is the natural language inference dataset MNLI (Williams et al., 2018). The generation of this dataset introduced spurious surface patterns that correlate noticeably with the labels. Poliak et al. (2018) highlight that negation words ("not", "no", etc.) are often associated with the contradiction label. Gururangan et al. (2018), Poliak et al. (2018), and Tsuchiya (2018) show that a model trained solely on the hypothesis, completely ignoring the intended signal, reaches strong performance. We refer to these surface patterns as dataset biases, since the conditional distribution of the labels given such biased features is likely to change on examples outside the training data distribution (as formalized by He et al. (2019)).

A major challenge in representation learning for NLP is to produce models that are robust to these dataset biases. Previous work (He et al., 2019; Clark et al., 2019; Mahabadi et al., 2020) has targeted removing dataset biases by explicitly factoring them out of models. These studies explicitly construct a biased model, for instance a hypothesis-only model for NLI experiments, and use it to improve the robustness of the main model.
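To make the hypothesis-only shortcut concrete, the following toy sketch (our own illustration; the data generator and the one-feature classifier are invented here and are not the paper's setup) shows how a model with very limited capacity, trained with a standard cross-entropy loss, can reach strong accuracy purely by exploiting a negation-word correlation:

```python
import math
import random

NEGATIONS = {"not", "no", "never", "nothing"}

def make_example(rng):
    # Synthetic "hypothesis" with an injected spurious correlation:
    # negation words co-occur with the contradiction label 90% of the time.
    contradiction = rng.random() < 0.5
    has_neg = rng.random() < (0.9 if contradiction else 0.1)
    tokens = ["the", "man", "is", "outside"]
    if has_neg:
        tokens.insert(2, "not")
    return tokens, int(contradiction)

def featurize(tokens):
    # A single hypothesis-only feature: is a negation word present?
    return 1.0 if any(t in NEGATIONS for t in tokens) else 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_weak_learner(examples, lr=0.2, epochs=10):
    # One-feature logistic regression trained with plain cross-entropy.
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for tokens, y in examples:
            x = featurize(tokens)
            p = sigmoid(w * x + b)
            w -= lr * (p - y) * x
            b -= lr * (p - y)
    return w, b

rng = random.Random(0)
train = [make_example(rng) for _ in range(2000)]
w, b = train_weak_learner(train)
accuracy = sum((sigmoid(w * featurize(t) + b) > 0.5) == bool(y)
               for t, y in train) / len(train)
```

Despite never seeing a premise, this learner scores close to the 90% reliability of the shortcut on the biased distribution; it has learned the correlation, not the task.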
The core idea is to encourage the main model to find a different explanation on examples where the biased model is wrong. During training, product-of-experts ensembling (Hinton, 2002) is used to factor out the biased model. While these works show promising results, the assumed knowledge of the underlying dataset bias is quite restrictive. Finding dataset biases in established datasets is a costly and time-consuming process, and may require access to private details about the annotation procedure, while actively reducing surface correlations during the collection of new datasets is challenging given the number of potential biases (Zellers et al., 2019; Sakaguchi et al., 2020).

In this work, we explore methods for learning from biased datasets that do not require such an explicit formulation of the dataset biases. We first show how a model with limited capacity, which we call a weak learner, trained with a standard cross-entropy loss, learns to exploit biases in the dataset. We then investigate the biases on which this weak learner relies and show that they match several previously manually identified biases.
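As a minimal sketch of the product-of-experts mechanism (our own scalar implementation with invented function names; real training operates on batched tensors with automatic differentiation), the two models are combined by summing log-probabilities, and the cross-entropy gradient reaching the main model vanishes on examples the frozen biased model already classifies confidently and correctly:

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over a list of logits.
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def poe_grad_wrt_main(main_logits, weak_log_probs, label):
    # Product of experts: p(y) ∝ p_main(y) · p_weak(y), i.e. the combined
    # logits are the sum of the main logits and the weak log-probabilities.
    combined = [m + w for m, w in zip(main_logits, weak_log_probs)]
    probs = [math.exp(lp) for lp in log_softmax(combined)]
    # Softmax + cross-entropy gradient w.r.t. the main logits only;
    # the weak model is frozen and receives no updates.
    return [p - (1.0 if i == label else 0.0) for i, p in enumerate(probs)]

uniform_main = [0.0, 0.0, 0.0]                  # main model has not learned yet
confident_weak = log_softmax([8.0, 0.0, 0.0])   # weak model sure of label 0
uncertain_weak = log_softmax([0.0, 0.0, 0.0])   # weak model has no opinion

g_bias_aligned = poe_grad_wrt_main(uniform_main, confident_weak, label=0)
g_hard = poe_grad_wrt_main(uniform_main, uncertain_weak, label=0)
```

On the bias-aligned example the gradient is nearly zero, so the main model is pushed to explain the examples the weak learner gets wrong rather than re-learn the shortcut.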
Based on this observation, we leverage such limited-capacity models in a product-of-experts ensemble to train a more robust model, and we evaluate our approach in settings ranging from toy datasets up to large crowd-sourced benchmarks: a controlled synthetic bias setup (He et al., 2019; Clark et al., 2019), natural language inference (McCoy et al., 2019b), extractive question answering (Jia & Liang, 2017), and fact verification (Schuster et al., 2019).

Our contributions are the following: (a) we show that weak learners are prone to relying on shallow heuristics and highlight how they rediscover previously human-identified dataset biases; (b) we demonstrate that we do not need to explicitly know or model dataset biases to train more robust models that generalize better to out-of-distribution examples; (c) we discuss the design choices for weak learners and show a trade-off in which higher out-of-distribution performance comes at the expense of in-distribution performance.
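The controlled synthetic bias setup cited above can be sketched as follows (a hedged illustration in the spirit of those papers, with invented names; the exact injection scheme varies between them): a marker token that usually leaks the gold label is added to the training examples, creating a shortcut whose reliability is known by construction:

```python
import random

def inject_synthetic_bias(examples, p_leak=0.9, num_labels=3, rng=None):
    # Prepend a marker token that equals the gold label with probability
    # p_leak, and a random label otherwise, so the token is a strong but
    # imperfect predictor of the label.
    rng = rng or random.Random(0)
    biased = []
    for tokens, label in examples:
        marker = label if rng.random() < p_leak else rng.randrange(num_labels)
        biased.append(([f"LABEL_{marker}"] + tokens, label))
    return biased

clean = [(["some", "tokens"], i % 3) for i in range(3000)]
biased = inject_synthetic_bias(clean)
leak_rate = sum(t[0] == f"LABEL_{y}" for t, y in biased) / len(biased)
```

Because the strength of the injected correlation is controlled by `p_leak`, such a setup lets one measure exactly how much a training method suppresses a known shortcut.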

2. RELATED WORK

Many studies have reported dataset biases in various settings. Examples include visual question answering (Jabri et al., 2016; Zhang et al., 2016), story completion (Schwartz et al., 2017), and reading comprehension (Kaushik & Lipton, 2018; Chen et al., 2016). Towards better evaluation methods, researchers have proposed collecting "challenge" datasets that account for surface correlations a model might adopt (Jia & Liang, 2017; McCoy et al., 2019b). Standard models without specific robust training methods often drop in performance when evaluated on these challenge sets.

While these works have focused on data collection, another approach is to develop methods allowing models to ignore dataset biases during training. Several active areas of research tackle this challenge through adversarial training (Belinkov et al., 2019a;b; Stacey et al., 2020), example forgetting (Yaghoobzadeh et al., 2019), and dynamic loss adjustment (Cadène et al., 2019). Previous work (He et al., 2019; Clark et al., 2019; Mahabadi et al., 2020) has shown the effectiveness of products of experts for training unbiased models. In our work, we show that we do not need to explicitly model biases to apply these de-biasing methods, and we can use a more general setup than previously presented.

Orthogonal to these evaluation and optimization efforts, data augmentation has attracted interest as a way to reduce model biases by explicitly modifying the dataset distribution (Min et al., 2020; Belinkov & Bisk, 2018), either by leveraging human knowledge about dataset biases, such as swapping male and female entities (Zhao et al., 2018), or by developing dynamic data collection and benchmarking (Nie et al., 2020). Our work is mostly orthogonal to these efforts and alleviates the need for the human-in-the-loop setup common to such data-augmentation approaches.

Large pre-trained language models have contributed to improved out-of-distribution generalization (Hendrycks et al., 2020). However, out-of-distribution generalization remains a challenge in natural language processing in practice (Linzen, 2020; Yogatama et al., 2019), and our work aims at out-of-distribution robustness without significantly compromising in-distribution performance. Finally, the parallel work of Utama et al. (2020) presents a related de-biasing method that leverages the mistakes of weakened models without the need to explicitly model dataset biases. Our approach differs in several ways; in particular, we advocate using a limited-capacity weak learner, while Utama et al. (2020) use the same architecture as the robust model trained on a few thousand examples. We investigate the trade-off between the weak learner's capacity and the resulting performance, as well as the few-shot learning regime arising in the limit of a high-capacity weak model.




Funding

* Supported by the Viterbi Fellowship in the Center for Computer Engineering at the Technion.

