DATA FEEDBACK LOOPS: MODEL-DRIVEN AMPLIFICATION OF DATASET BIASES

Abstract

Datasets scraped from the internet have been critical to large-scale machine learning. Yet their very success puts the utility of future internet-derived datasets at risk, as model outputs begin to replace human annotations as a source of supervision. In this work, we formalize a system in which interactions with one model are recorded as history and later scraped as training data. We then analyze its stability over time by tracking changes to a test-time bias statistic (e.g., the gender bias of model predictions). We find that the degree of bias amplification is closely linked to whether a model's outputs behave like samples from its training distribution, a behavior we characterize and define as consistent calibration. Experiments in three conditional prediction scenarios (image classification, visual role-labeling, and language generation) demonstrate that models exhibiting sampling-like behavior are better calibrated and thus more stable. Based on this insight, we propose an intervention to help calibrate and stabilize unstable feedback systems.

1. INTRODUCTION

Due to the successes of large-scale training in machine learning (He et al., 2016; Brown et al., 2020; Radford et al., 2021), datasets derived from publicly available internet data have become indispensable to the machine learning community. For example, without relying on internet scraping, it would be cost-prohibitive to manually construct key datasets such as ImageNet (Deng et al., 2009), The Pile (Gao et al., 2020), or YFCC100M (Thomee et al., 2016). While the internet has served as a large, easily accessible source of human-generated data in the past, the growing deployment of machine learning systems puts this procedure at risk. As models begin to create and annotate a significant fraction of internet content, the utility of the internet as a data source may decrease rapidly. As an example in visual role-labeling, consider a classifier trained on public photos and their associated tags, as depicted in Figure 1. Instead of manually tagging photos, some users may choose to auto-tag their photos with the model. These photos, now stored in internet history, may be scraped as training data for an updated iteration of the image-tagging model. Any systematic biases introduced by the model, such as consistently mislabeling female doctors as nurses as in Figure 1, are now encoded into the training data. This data feedback gradually degrades the quality of the internet as a data source, since supervision becomes driven by model outputs rather than human annotation. Issues stemming from the inclusion of previously model-generated content in training data have already been encountered in machine translation (Venugopal et al., 2011) and speech recognition (Radford et al., 2022). These concerns are especially important in situations where model predictions may exacerbate existing toxicity, harm, or other biases (Gehman et al., 2020; Zhao et al., 2017).
In such cases, a viable strategy for model developers is to weigh the benefit of updating their model with new internet content against the cost of amplifying biases via such model-induced feedback. However, it is not yet understood when, and to what degree, data feedback is an issue in practice. In this work, we define the data feedback setting and carefully study how model biases change under feedback. In particular, we ask: are there conditions that stabilize bias amplification? We answer this in the affirmative, finding that one crucial path to achieving stability guarantees is having a consistently calibrated training procedure: one that produces models whose bias is similar to that of their training distribution. Furthermore, this form of calibration can be realistically achieved in natural experimental settings. Specifically, models that behave like samplers (i.e., replicate their training distribution well) are more likely to be calibrated and thus more stable. In addition, many prediction algorithms that do not explicitly perform sampling, such as image classifiers, exhibit this behavior through a conjectured phenomenon called Distributional Generalization (Nakkiran & Bansal, 2020).
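To make the feedback dynamic concrete, the following toy simulation (our illustrative sketch, not an experiment from the paper; the setup and all names are hypothetical) contrasts two training procedures over several scrape-train-deploy rounds on a binary labeling task. A "sampler" model emits labels drawn from its estimated training distribution, while an "argmax" model always emits the majority label; when each round's model outputs replace the training data, the argmax model amplifies its bias to the extreme, whereas the calibrated sampler remains near the original label distribution.

```python
import random

def train(data):
    # "Training" here just estimates the fraction of positive labels,
    # standing in for a model fit to the scraped data.
    return sum(data) / len(data)

def generate(p, n, mode):
    # 'sampler': outputs behave like draws from the training distribution
    #            (consistently calibrated, in the paper's terminology).
    # 'argmax':  always emits the majority label, overshooting the bias.
    if mode == "sampler":
        return [1 if random.random() < p else 0 for _ in range(n)]
    return [1 if p >= 0.5 else 0] * n

random.seed(0)
results = {}
for mode in ("sampler", "argmax"):
    data = [1] * 60 + [0] * 40  # initial human-annotated data, bias = 0.60
    for _ in range(5):          # five generations of scrape-train-deploy
        p = train(data)
        data = generate(p, len(data), mode)  # outputs become next round's data
    results[mode] = train(data)

# The argmax model collapses to bias 1.0 after one round; the sampler's
# bias performs a random walk that stays close to the original 0.60.
print(results)
```

This mirrors the stability question studied in the paper: the miscalibrated predictor amplifies a 60/40 imbalance into total collapse, while the sampling-like model keeps the bias statistic roughly stationary across generations.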

