THERE IS NO TRADE-OFF: ENFORCING FAIRNESS CAN IMPROVE ACCURACY

Abstract

One of the main barriers to the broader adoption of algorithmic fairness in machine learning is the trade-off between fairness and performance of ML models: many practitioners are unwilling to sacrifice the performance of their ML model for fairness. In this paper, we show that this trade-off may not be necessary. If the algorithmic biases in an ML model are due to sampling biases in the training data, then enforcing algorithmic fairness may improve the performance of the ML model on unbiased test data. We study conditions under which enforcing algorithmic fairness helps practitioners learn the Bayes decision rule for (unbiased) test data from biased training data. We also demonstrate the practical implications of our theoretical results in real-world ML tasks.

1. INTRODUCTION

Machine learning (ML) models are routinely used to make or support consequential decisions in hiring, lending, sales, etc. (Citron and Pasquale, 2014). This proliferation of ML models in decision making and decision support roles has led to concerns that ML models may inherit (or even exacerbate) social biases in the training data. For example, ProPublica's investigation of Northpointe (now Equivant)'s COMPAS recidivism prediction tool revealed racial biases against African-Americans (Angwin et al., 2016). In response, the ML community has developed many rigorous definitions of algorithmic fairness, including calibration (Corbett-Davies and Goel, 2018), (statistical) parity (Feldman et al., 2014), equalized odds (Hardt et al., 2016), and individual fairness (Dwork et al., 2011). Researchers have also designed many algorithms for enforcing these definitions during training (Agarwal et al., 2018; Cotter et al., 2019; Yurochkin et al., 2020). Despite this flurry of work, algorithmic fairness practices remain uncommon in production.

We conjecture that the lack of broader adoption of algorithmic fairness practices stems from the apparent trade-off between accuracy and fairness. Many algorithms that enforce fairness solve optimization problems that maximize how well the model fits the training data subject to fairness constraints. The trade-off arises because imposing fairness constraints usually leads to a model that fits the training data less well (compared to a model obtained by maximizing goodness-of-fit without any extra constraints). In practice, this trade-off may not be relevant because the training data may be biased. For example, a resume screening model may reject most female applicants for technical roles because women are historically underrepresented in STEM fields, so women are underrepresented in the training data.
This is a form of sampling bias, and it causes the model to perform poorly at test time because women are better represented in STEM fields today. In this example, the trade-off is irrelevant because we are mostly concerned with the out-of-distribution (OOD) performance of the model. There are many other examples of algorithmic bias arising from biases in the training data. As another example, systemic racism in the US criminal justice system disproportionately affects African-Americans, leading to higher rates of arrest, conviction, and incarceration. It is no surprise that recidivism prediction instruments trained on such biased data are biased against African-Americans (Angwin et al., 2016). In 2014, then U.S. Attorney General Eric Holder warned that recidivism prediction instruments "may exacerbate unwarranted and unjust disparities that are already far too common in our criminal justice system and in our society".

In this paper, we study whether the common algorithmic fairness practice of enforcing equal accuracy on certain segments of the population improves the OOD performance of the model. Such algorithmic fairness practices are common enough that there are methods (Agarwal et al., 2018; 2019) and software (e.g. TensorFlow Constrained Optimization) devoted to operationalizing them. This provides an alternative argument for broader adoption of algorithmic fairness practices: instead of viewing fairness as an intrinsically desirable property of ML models, we show that enforcing fairness helps ML models overcome biases in the training data. Our main contributions are:

1. We decompose the bias in the training data into two parts: a recoverable part orthogonal to the fairness constraint and a non-recoverable part. We also derive necessary and sufficient conditions under which enforcing fairness on the training data leads to the Bayes optimal model at test time (see Theorem 3.4).
2. We show that it is possible to completely overcome the recoverable part of the bias (hence its name) by enforcing an appropriate risk-based notion of algorithmic fairness, regardless of the magnitude of this part of the bias (see Corollary 3.5).
3. We specialize our results to the recidivism prediction task and demonstrate the benefits of enforcing fairness empirically (see Section 4).

2. PROBLEM SETUP

To keep things simple, we consider a standard classification setup. Our results generalize readily to other supervised learning problems (see Appendix C for details). Let X ⊂ R^d be the feature space, Y the set of possible labels, and A the set of possible values of the sensitive attribute. In this setup, training and test examples are tuples of the form (X, A, Y) ∈ X × A × Y. If the ML task is predicting whether a borrower will default on a loan, then each training/test example corresponds to a loan: the features in X may include the borrower's credit history, income level, and outstanding debts; the label Y ∈ {0, 1} encodes whether the borrower defaulted on the loan; and the sensitive attribute may be the borrower's gender or race.

Let P* and P be probability distributions on X × A × Y. We consider P* as the unbiased distribution from which samples come at test time, and P as the biased distribution from which the training data comes. Let H = {h : X → Y} be a model class (e.g. neural nets with a particular architecture) and ℓ be a loss function. Our goal is to learn the unbiased Bayes decision rule

h* ∈ arg min_{h ∈ H} L*(h), where L*(h) := E*[ℓ(h(X), Y)]

and E* denotes expectation with respect to P*, using only the biased training data from P. Without further assumptions on P*, this goal is impossible. To facilitate our goal, we assume the unbiased Bayes decision rule is algorithmically fair in some sense and hope that enforcing the correct notion of fairness allows us to recover h* from P. We shall elaborate on the allowable differences between P* and P in subsection 2.2.
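The distinction between the unbiased distribution P* and the biased training distribution P can be made concrete with a toy simulation of sampling bias. The following sketch is purely illustrative: the two-group setup, the feature model, and the 20% retention rate are assumptions for the example, not part of the paper's formal setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unbiased distribution P*: two groups (A = 0, 1), equally represented.
n = 10_000
a_star = rng.integers(0, 2, size=n)            # sensitive attribute A
x_star = rng.normal(loc=a_star, scale=1.0)     # feature X depends on A
y_star = (x_star + rng.normal(scale=0.5, size=n) > 0.5).astype(int)

# Biased training distribution P: sampling bias under-represents group A = 1
# (e.g. women under-represented in historical STEM hiring data), so we keep
# all of group 0 but only ~20% of group 1.
keep = (a_star == 0) | (rng.random(n) < 0.2)
x, a, y = x_star[keep], a_star[keep], y_star[keep]

print("group 1 share under P*:", (a_star == 1).mean())  # close to 0.5
print("group 1 share under P :", (a == 1).mean())       # far below 0.5
```

A model fit to (x, a, y) sees group 1 far less often than it will at test time under P*, which is exactly the mismatch the paper's setup formalizes.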

2.1. RISK-BASED NOTIONS OF ALGORITHMIC FAIRNESS

In this paper, we study the efficacy of enforcing risk-based notions of algorithmic fairness in overcoming bias in the training data. To fix ideas, we provide two examples of risk-based notions of algorithmic fairness. The first notion we consider is risk parity (RP). This definition is motivated by the notion of demographic parity (DP) in classification: recall that DP requires the output of the ML model h(X) to be independent of the sensitive attribute A, i.e. h(X) ⊥ A. RP imposes a similar condition on the risk of the ML model.

Definition 2.1 (risk parity). An ML model h satisfies risk parity with respect to data distribution P if

E_P[ℓ(h(X), Y) | A = a] = E_P[ℓ(h(X), Y) | A = a′] for all a, a′ ∈ A.

RP is widely used in practice to measure algorithmic bias in ML models. For example, the US National Institute of Standards and Technology (NIST) tested facial recognition systems and found
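Empirically, risk parity amounts to comparing the average loss within each group. The sketch below computes the group-conditional risks of Definition 2.1 and their gap; the 0-1 loss, the threshold classifier, and the synthetic data are illustrative assumptions, not part of the definition.

```python
import numpy as np

def group_risks(loss, h, x, a, y):
    """Empirical group-conditional risks E_P[loss(h(X), Y) | A = a]."""
    per_example = loss(h(x), y)
    return {g: per_example[a == g].mean() for g in np.unique(a)}

def zero_one(pred, y):
    """0-1 loss: 1 where the prediction is wrong, 0 where it is right."""
    return (pred != y).astype(float)

def h(x):
    """A hypothetical threshold classifier on a scalar feature."""
    return (x > 0.5).astype(int)

rng = np.random.default_rng(1)
a = rng.integers(0, 2, size=1000)             # sensitive attribute
x = rng.normal(loc=a, scale=1.0, size=1000)   # scalar feature
y = (x > 0.5).astype(int)                     # labels match h exactly here

risks = group_risks(zero_one, h, x, a, y)
gap = max(risks.values()) - min(risks.values())
print(risks, gap)  # a gap of 0 means h satisfies risk parity empirically
```

In this contrived case the labels follow the classifier exactly, so both group risks (and hence the gap) are zero; in practice the gap measures how far a model is from risk parity under P.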

