ENFORCING DELAYED-IMPACT FAIRNESS GUARANTEES

Abstract

Recent research has shown that seemingly fair machine learning models, when used to inform decisions that have an impact on people's lives or well-being (e.g., applications involving education, employment, and lending), can inadvertently increase social inequality in the long term. Existing fairness-aware algorithms consider static fairness constraints, such as equal opportunity or demographic parity, but enforcing constraints of this type may result in models that have a negative long-term impact on disadvantaged individuals and communities. We introduce ELF (Enforcing Long-term Fairness), the first classification algorithm that provides high-confidence fairness guarantees in terms of long-term, or delayed, impact. Importantly, ELF solves the open problem of providing such guarantees based only on historical data that includes observations of delayed impact. Prior methods, by contrast, require prior knowledge (or an estimate) of analytical models describing the relationship between a classifier's predictions and their corresponding delayed impact. We prove that ELF satisfies delayed-impact fairness constraints with high confidence and that it is guaranteed to identify a fair solution, if one exists, given sufficient data. We show empirically, using real-life data, that ELF can successfully mitigate long-term unfairness with high confidence.

1. INTRODUCTION

Using machine learning (ML) for high-stakes applications, such as lending, hiring, and criminal sentencing, may harm historically disadvantaged communities (Flage, 2018; Blass, 2019; Bartlett et al., 2021). For example, software meant to guide lending decisions has been shown to exhibit racial bias (Bartlett et al., 2021). Extensive research has been devoted to algorithmic approaches that promote fairness and ameliorate concerns of bias and discrimination in socially impactful applications. Most of this research has focused on the classification setting, in which an ML model must make predictions given information about a person or community. Most fairness definitions studied in this setting are static: they do not consider how a classifier's predictions affect the long-term well-being of a community (Liu et al., 2018).

In their seminal paper, Liu et al. show that predictions that appear fair with respect to static fairness criteria can nevertheless negatively impact the long-term well-being of the communities they aim to protect. Importantly, Liu et al., and others investigating long-term fairness (see Section 6), assume that the precise analytical relationship between a classifier's predictions and their long-term impact, or delayed impact (DI), is known. Designing classification algorithms that mitigate negative delayed impact when this relationship is not known, or cannot be computed analytically, has remained an open problem.

We introduce ELF (Enforcing Long-term Fairness), the first classification algorithm that solves this open problem. Unlike existing methods, ELF does not require access to an analytic model of the delayed impact of a classifier's predictions. Instead, it works under the weaker assumption that it has access only to historical data containing observations of the delayed impact that resulted from the predictions of a previously deployed classifier. Below, we illustrate this setting with an example.
Loan repayment example. As a running example, consider a bank that wishes to increase its profit by maximizing successful loan repayments. The bank's decisions are informed by a classifier that predicts repayment success. These decisions may have a delayed impact on the financial well-being of loan applicants, such as their savings rate or debt-to-income ratio, two years after a lending decision is made.
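To make the estimation problem in this setting concrete, the sketch below shows one simple way to use historical data of this kind: importance sampling re-weights the logged delayed-impact observations (e.g., applicants' savings rates) to estimate the delayed impact a candidate classifier would have had, and a Hoeffding-style bound turns that estimate into a high-confidence guarantee. This is an illustrative sketch under stated assumptions, not the ELF algorithm itself; the function name, the weight-clipping constant, and the choice of concentration bound are all our own simplifications, and we assume the deployed classifier's prediction probabilities were logged and that impacts are nonnegative.

```python
import numpy as np

def di_lower_bound(impacts, new_probs, behavior_probs,
                   delta=0.05, max_weight=10.0):
    """High-confidence lower bound on a candidate classifier's expected
    delayed impact, estimated from historical data via importance sampling.
    Illustrative sketch only -- not the actual ELF algorithm.

    impacts        : observed delayed impacts (assumed nonnegative)
    new_probs      : candidate classifier's probability of each logged prediction
    behavior_probs : deployed classifier's probability of the same prediction
    delta          : allowed failure probability of the guarantee
    max_weight     : cap on importance weights (a simplifying assumption)
    """
    # Importance weights: how much more (or less) likely the candidate
    # classifier is to make each logged prediction than the deployed one was.
    weights = np.clip(new_probs / behavior_probs, 0.0, max_weight)
    estimates = weights * impacts      # per-sample weighted impact
    n = len(estimates)
    # Hoeffding bound: holds with probability at least 1 - delta when each
    # weighted estimate lies in [0, b].
    b = max_weight * impacts.max()
    return estimates.mean() - b * np.sqrt(np.log(1.0 / delta) / (2.0 * n))

# A candidate classifier would be certified only if the bound clears a
# chosen delayed-impact threshold tau:
#     if di_lower_bound(impacts, p_new, p_b) >= tau: deploy it
#     else: report "no solution found"
```

Note that the bound is computed entirely from logged data: no analytical model linking predictions to delayed impact is needed, which is the key property of the setting described above.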

