SELECTION INDUCED COLLIDER BIAS IN LLMS: A GENDER PRONOUN UNCERTAINTY CASE STUDY

Abstract

In this paper, we cast the problem of task underspecification in causal terms, and develop a method for empirical measurement of spurious associations between gender and gender-neutral entities for unmodified LLMs, detecting previously unreported spurious correlations. We then describe a lightweight method to exploit the resulting spurious associations for prediction task uncertainty classification, achieving over 90% accuracy on a Winogender Schemas challenge set. Finally, we generalize our approach to address a wider range of prediction tasks and provide open-source demos for each method described here.

1. INTRODUCTION AND RELATED WORK

This paper investigates models trained to estimate the conditional distribution P(Y | X, S), where S is the cause of sample selection bias in the training dataset. Selection bias is a common problem: most datasets are subsampled representations of a larger population, yet few are sampled with randomization (Heckman, 1979).

1.1. CAUSAL DAGS AND BIASES

Sample selection bias occurs when some mechanism, observed or not, causes preferential inclusion of samples into the dataset (Bareinboim and Pearl, 2012). In the language of causal inference, selection bias is distinct from both confounder and collider bias. Confounder bias can occur when two variables have a common cause, whereas collider bias can occur when two variables have a common effect. Correcting for confounder bias requires that one condition upon the common cause; conversely, correcting for collider bias requires that one not condition upon the common effect (Pearl, 2009). The type of selection bias that interests us here involves more than one variable (observed or not) whose common effect results in selection bias.

Such assumed relationships can be compactly and transparently represented as a causal data-generating process (DGP) in the form of a directed acyclic graph (DAG), as illustrated in Figure 1. The absence of an arrow between nodes in a causal DAG encodes an assumption, for example that W and G in Figure 1(a) are stochastically independent of one another. The direction of each arrowhead encodes the assumed direction of causation. For example, the two arrows from W and G into S encode the assumption that S is a common effect of W and G. In Figure 1, the twice-encircled node, S, symbolizes some mechanism that can cause samples to be selected into the dataset. To capture the statistical process of sampling for dataset formation, one must condition on S, which induces a collider bias relationship between W and G in the DGP. We use the term selection induced collider bias for circumstances such as this one, in which the selection mechanism induces a collider bias relationship in the dataset that would not otherwise have been there.

Selection induced collider bias has been covered in the medical and epidemiological literature (Griffith et al., 2020; Munafò et al., 2018; Cole et al., 2009) and has received extensive theoretical treatment from Pearl and Bareinboim (Bareinboim and Pearl, 2012; Bareinboim et al., 2014; Bareinboim and Tian, 2015; Bareinboim and Pearl, 2016), yet it has received very little attention in the deep learning literature.
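
To make the mechanism concrete, the following is a minimal simulation sketch and not part of the method described in this paper. It assumes a Gaussian W, a binary G, and a hypothetical logistic selection rule in which S depends on both; the function name, the `strength` parameter, and all distributions are illustrative assumptions. It compares the correlation between W and G in the full population with the correlation among selected samples only.

```python
import numpy as np


def collider_correlations(n=200_000, strength=2.0, seed=0):
    """Return corr(W, G) in the full population and in the selected subsample."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n)          # W: continuous cause, drawn independently of G
    g = rng.integers(0, 2, size=n)  # G: binary cause, drawn independently of W

    # Hypothetical selection mechanism: S is a common effect of W and G.
    # A sample is kept more often when the sign of W "matches" the value of G.
    p_select = 1.0 / (1.0 + np.exp(-strength * w * (2 * g - 1)))
    s = rng.random(n) < p_select

    corr_population = np.corrcoef(w, g)[0, 1]    # no conditioning on S
    corr_selected = np.corrcoef(w[s], g[s])[0, 1]  # conditioning on S = 1
    return corr_population, corr_selected


if __name__ == "__main__":
    pop, sel = collider_correlations()
    print(f"corr(W, G), full population: {pop:+.3f}")
    print(f"corr(W, G), given S = 1:     {sel:+.3f}")
```

Under these assumptions, W and G are uncorrelated by construction in the full population, while restricting to the selected samples yields a clearly positive correlation. That difference is the induced collider bias.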

1.2. UNDERSPECIFICATION AND SPURIOUS ASSOCIATIONS

We define a learning task as underspecified when none of the features available to the model (at training or inference time) are causes of the label. Figure 1(b) encodes this relationship with the absence of an arrow between features, X, and labels, Y. With no causal features available, models must resort to learning any spurious associations that will reduce predictive risk, regardless of how tenuous the association may be. We refer to these as otherwise non-interacting spurious associations.

We would like to draw a distinction between the type of spurious association induced by underspecification and the spurious associations most often addressed in today's literature. For example, the task of predicting cow vs. camel (perhaps based on spurious grassy vs. sandy background pixel features) would not be considered an underspecified task, because the causal cow vs. camel pixel features are available in the foreground. From a causal perspective, the symbolic background entity is a common cause of both the pixel features and the labels, inducing confounder bias and thus the learning of spurious associations along a secondary path (Arjovsky et al., 2019), in addition to the primary direct causal path from feature to label.

A natural question to ask is how spurious association flows from X to Y, if neither through some confounding variable like background nor through a direct causal path. As demonstrated in (D'Amour et al., 2020), weakly-interacting prediction tasks display significant variance, even in response to changes in the random seed used for initialization. In this work, by focusing on variables engaged in a relationship of selection induced collider bias, we are able to open up a tertiary path between X and Y: the path X ← W → S ← G → Y in Figure 1(b).

In distinction to (D'Amour et al., 2020), we argue that this causal perspective facilitates the identification of otherwise non-interacting (and previously unreported) spurious associations and, importantly, enables the injection of these 'benign' spurious tokens into text at inference time to achieve an uncertainty measurement.
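
The tertiary path X ← W → S ← G → Y can be illustrated with a toy simulation. This is a hedged sketch under assumed distributions, not the generic DGP or the LLM experiments reported in this paper: X is a noisy copy of W, Y is a noisy copy of G, there is no arrow from X to Y, and the logistic selection rule, the noise levels, and the function name are all hypothetical. A fixed rule that predicts Y from the sign of X is then scored on the full population and on the selected samples.

```python
import numpy as np


def spurious_accuracy(n=200_000, seed=0):
    """Accuracy of predicting Y from X alone, without and with selection on S."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n)          # W: latent cause of the features
    g = rng.integers(0, 2, size=n)  # G: latent cause of the label

    x = w + 0.5 * rng.normal(size=n)  # X <- W (noisy copy of W)
    flip = rng.random(n) < 0.1
    y = np.where(flip, 1 - g, g)      # G -> Y (label flipped 10% of the time)

    # W -> S <- G: hypothetical selection rule with S as a common effect.
    p_select = 1.0 / (1.0 + np.exp(-2.0 * w * (2 * g - 1)))
    s = rng.random(n) < p_select

    # Underspecified predictor: X is not a cause of Y, so it can only
    # exploit whatever association selection has induced.
    y_hat = (x > 0).astype(int)

    acc_population = float(np.mean(y_hat == y))
    acc_selected = float(np.mean(y_hat[s] == y[s]))
    return acc_population, acc_selected


if __name__ == "__main__":
    acc_pop, acc_sel = spurious_accuracy()
    print(f"accuracy on full population:  {acc_pop:.3f}")
    print(f"accuracy on selected samples: {acc_sel:.3f}")
```

In this sketch the predictor performs at chance on the full population, where X and Y are independent, but above chance on the selected samples, where conditioning on S opens the path through W and G.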

2. CONTRIBUTIONS

In this paper we make the following contributions:

• We cast the problem of task underspecification in causal terms and apply causal inference methods to hypothesize the effects of selection induced collider bias on underspecified tasks.
• We test these hypotheses on unmodified and widely used pre-trained LLMs via a case study of gender pronoun resolution, resulting in two new findings:
  - A method for empirical measurement of spurious correlations between gender and gender-neutral entities for unmodified LLMs, which permits measurement of previously unreported spurious correlations between gender and location, and between gender and time.
  - A method for quantifying inference-time task uncertainty, with an accuracy of over 90% when testing RoBERTa-large on the Winogender Schemas challenge set.
• To demonstrate that both of the above methods are reproducible, lightweight (dozens of lines of code), time-efficient (running in seconds), and plug-and-play compatible with almost any BERT-like LLM, we provide open-source, running demos:
  - Spurious Correlations: https://huggingface.co/spaces/paper5186/spurious
  - Uncertainty: https://huggingface.co/spaces/paper5186/uncertainty
• We generalize our approach to address a wider range of prediction tasks and provide results on a generic DGP that are consistent with our empirically measured results on LLMs.



Footnote (Section 1.1): Although the two are often conflated, collider bias can occur independently of selection bias, and vice versa (Hernán, 2017).



Figure 1: Data-generating process for high-dimensional data, such as in NLP. X and Y are the high-dimensional text features and labels of the dataset, while W, G, and S are low-dimensional symbolic entities that may cause the text.

