UNFAIR GEOMETRIES: EXACTLY SOLVABLE DATA MODEL WITH FAIRNESS IMPLICATIONS

Abstract

Machine learning (ML) may be oblivious to human bias but it is not immune to its perpetuation. Marginalisation and iniquitous group representation are often traceable in the very data used for training, and may be reflected or even enhanced by the learning models. In the present work, we aim to clarify the role played by data geometry in the emergence of ML bias. We introduce an exactly solvable high-dimensional model of data imbalance, where parametric control over the many bias-inducing factors allows for an extensive exploration of the bias inheritance mechanism. Through the tools of statistical physics, we analytically characterise the typical properties of learning models trained in this synthetic framework and obtain exact predictions for the observables that are commonly employed for fairness assessment. Despite the simplicity of the data model, we retrace and unpack typical unfairness behaviour observed on real-world datasets. We also obtain a detailed analytical characterisation of a class of bias mitigation strategies. We first consider a basic loss-reweighing scheme, which allows for an implicit minimisation of different unfairness metrics, and quantify the incompatibilities between some existing fairness criteria. Then, we consider a novel mitigation strategy based on a matched inference approach, in which coupled learning models are introduced. Our theoretical analysis of this approach shows that the coupled strategy can strike superior fairness-accuracy trade-offs.

1. INTRODUCTION

Machine Learning (ML) systems are actively being integrated into multiple aspects of our lives, from face recognition systems on our phones, to applications in the fashion industry, to high-stakes scenarios like healthcare. Together with the advantages of automatising these processes, however, we must also face the consequences of their (often hidden) failures. Recent studies Buolamwini & Gebru (2018); Weidinger et al. (2021) have shown that these systems may exhibit significant disparities in failure rates across the multiple sub-populations targeted in an application. ML systems appear to perpetuate discriminatory behaviours that align with those present in our society Benjamin (2019); Noble (2018); Eubanks (2018); Broussard (2018). Discrimination against marginalised groups can originate at many levels of the ML pipeline, from the very problem definition, to data collection, to the training and deployment of the ML algorithm Suresh & Guttag (2021). Data represents a critical source of bias Perez (2019). In some cases, the dataset can contain a record of a history of discriminatory behaviour, inducing complex dependencies that are hard to eradicate even when the explicit discriminatory attribute is removed. In other cases (or even concurrently), the root of the discrimination can be found in the data collection process, and is related to the structural properties of the dataset. Heterogeneous representation of different sub-populations typically induces major bias in the ML predictions. Drug testing provides a historically significant example: substantial evidence Hughes (2007); Perez (2019) shows that the scarcity of data points corresponding to women in drug-efficacy studies resulted in a larger number of side effects within this group. In spite of a vast empirical literature, a large gap remains in the theoretical understanding of the bias-induction mechanism. A better theoretical grasp of this issue could help raise awareness and guide the design of more theoretically grounded and effective solutions. In this work, we aim to address this gap by introducing a novel synthetic data model, offering a controlled setting where data imbalances and the emergence of bias become more transparent and can be better understood. To the best of our knowledge, the present study constitutes the first attempt to explore and exactly characterise, by analytical means, the complex phenomenology of ML fairness.

Summary of main results. We devise a novel synthetic model of data, the Teacher-Mixture (T-M), to obtain a theoretical analysis of the bias-induction mechanism. The geometrical properties of the model are motivated by common observations on the structure of realistic datasets, concerning the coexistence of non-trivial correlations at the level of the inputs and between inputs and labels (some empirical observations can be found in appendix B). In particular, we focus on the role played by the presence of different sub-populations in the data, both from the point of view of the input distribution and from that of the labelling rule. Surprisingly, this simple structural feature is sufficient to produce a rich and realistic ML fairness phenomenology. The parameters of the T-M can be tuned to emulate disparate learning regimes, allowing for an exploration of the impact of each bias-inducing factor and for an assessment of the effectiveness of a tractable class of mitigation strategies.
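
For concreteness, the following is a minimal sketch of a T-M-like data generator, written under the assumptions suggested by the description above: inputs are drawn from two Gaussian sub-populations separated along a shift direction, and each group is labelled by its own teacher vector. The parameters rho (group proportion), shift, and teacher_overlap are illustrative choices for this sketch and need not coincide with the exact parametrisation used in the analysis.

import numpy as np

def sample_tm(n, d, rho=0.5, shift=1.0, teacher_overlap=0.8, seed=0):
    # Illustrative Teacher-Mixture-like generator (assumed form, not the exact model).
    rng = np.random.default_rng(seed)
    w1 = rng.standard_normal(d)                                   # teacher vector of group 1
    w2 = teacher_overlap * w1 \
        + np.sqrt(1.0 - teacher_overlap**2) * rng.standard_normal(d)  # correlated teacher of group 0
    v = rng.standard_normal(d) / np.sqrt(d)                       # direction separating the two clusters
    g = (rng.random(n) < rho).astype(int)                         # group membership of each sample
    x = rng.standard_normal((n, d)) / np.sqrt(d)                  # isotropic fluctuations around the centres
    x += np.where(g[:, None] == 1, shift, -shift) * v             # group-dependent cluster centre
    teachers = np.where(g[:, None] == 1, w1, w2)                  # each group has its own labelling rule
    y = np.sign(np.einsum('ij,ij->i', x, teachers))               # teacher-generated binary labels
    return x, y, g

Sweeping rho, shift and teacher_overlap emulates the different imbalance regimes discussed below: relative group sizes, input-space separation of the sub-populations, and similarity of their labelling rules.
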
In summary, in the present work we:
• Derive, through a statistical physics approach, an analytical characterisation of the typical performance of solutions of the T-M problem in the high-dimensional limit. The obtained learning curves are found to be in perfect agreement with numerical simulations in the same synthetic settings (as shown in the central panel of Fig. 1), and produce unfairness behaviours that are closely reminiscent of the results seen on real data.
• Isolate the different sources of bias (shown in the left panel of Fig. 1) and evaluate their interplay in the bias-induction mechanism. This analysis also allows us to highlight how unfairness can emerge in settings where the data distribution is apparently balanced.
• Trace a positive transfer effect between the different sub-populations, which implies that, despite their distinctions, an overall similarity can be exploited to achieve better performance on each group.
• Analyse the trade-offs between the different definitions of fairness, by studying the effects of a sample-reweighing mitigation strategy, which can be encompassed in the theoretical framework proposed in this work and thus characterised analytically (a schematic illustration of this reweighing scheme and of the associated fairness gaps is given after this list).
• Propose a model-matched mitigation strategy, where two coupled networks are simultaneously trained and can specialise to different sub-populations while mutually transferring useful information. We analytically characterise its effectiveness, finding that with this method, in the T-M, the competition between accuracy and different fairness metrics becomes negligible. Preliminary positive results are also reported on real data.
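
As a schematic illustration of the reweighing strategy mentioned above, the sketch below scales each sample's loss by a group-dependent weight and then measures two standard group-fairness gaps. It is an assumed minimal setup (a scikit-learn logistic regression as a stand-in linear learner, an illustrative weight w_plus, and binary labels and predictions), not the exact protocol analysed in the theory.

import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_reweighed(x, y, g, w_plus=1.0):
    # Group-dependent sample weights: w_plus > 1 up-weights samples with g == 1.
    sample_weight = np.where(g == 1, w_plus, 1.0)
    clf = LogisticRegression()
    clf.fit(x, (y > 0).astype(int), sample_weight=sample_weight)
    return clf

def group_fairness_gaps(y_true, y_pred, g):
    # y_true, y_pred in {0, 1}; g in {0, 1} is the group label.
    # Assumes both groups, and positive labels in both groups, are present.
    y_true, y_pred, g = map(np.asarray, (y_true, y_pred, g))
    pos_rate = lambda m: y_pred[m].mean()                          # P(y_pred = 1 | condition m)
    # Statistical parity gap: difference of positive-prediction rates between groups.
    sp_gap = pos_rate(g == 1) - pos_rate(g == 0)
    # Equal-opportunity gap: difference of true-positive rates between groups.
    eo_gap = pos_rate((g == 1) & (y_true == 1)) - pos_rate((g == 0) & (y_true == 1))
    return {'statistical_parity': sp_gap, 'equal_opportunity': eo_gap}

Sweeping the weight w_plus traces out the kind of fairness-accuracy trade-off that the theoretical analysis characterises exactly.
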
Further related works. In the past decade, algorithmic fairness has been receiving growing attention, spurred by the increasing number of ML applications in highly consequential social and economic areas Datta et al. (2015); Metz & Satariano (2020); Angwin et al. (2016). A central question in the field concerns the proper mathematical definition of bias: the plethora of alternative fairness criteria includes measures of group fairness, e.g. statistical parity Corbett-Davies et al. (2017); Dwork et al. (2012); Kleinberg et al. (2016), disparate impact Calders & Verwer (2010); Feldman et al. (2015); Zafar et al. (2017b); Chouldechova (2017), equality of opportunity Hardt et al. (2016), calibration within groups Kleinberg et al. (2016), disparate mistreatment Zafar et al. (2017a), as well as measures of individual fairness Speicher et al. (2018); Castelnovo et al. (2022). In the following, we focus on group fairness, since it remains well-defined in the high-dimensional limit considered in our theoretical framework. Recent works have highlighted incompatibilities between some of these fairness measures Kleinberg et al. (2016); Corbett-Davies & Goel (2018); Barocas et al. (2019), e.g. between calibration and error disparity Pleiss et al. (2017), and their instability with respect to fluctuations in the training dataset Friedler et al. (2019); Castelnovo et al. (2022). Our work is the first to allow an exact quantification of the intrinsic trade-offs between these notions of group fairness.

A second major topic in the field of algorithmic fairness is bias mitigation. In this work, we focus on in-processing strategies Arrieta et al. (2020), where the training process is altered in order to include fairness as a secondary optimisation objective for the learning model. These methods range from adding ad hoc regularisation terms to the loss function Kamishima et al. (2012); Huang & Vishnoi (2019), to formulating fair classification as a constrained optimisation problem and deriving reduction-based algorithms Agarwal et al. (2018; 2019); Celis et al. (2019). Other possible strategies include adversarial training Zhang et al. (2018), where a fairness-arbiter model can drive learning towards a sought fairness criterion, and distributionally robust optimisation Słowik & Bottou (2021),

