SENSEI: SENSITIVE SET INVARIANCE FOR ENFORCING INDIVIDUAL FAIRNESS

Abstract

In this paper, we cast fair machine learning as invariant machine learning. We first formulate a version of individual fairness that enforces invariance on certain sensitive sets. We then design a transport-based regularizer that enforces this version of individual fairness and develop an algorithm to minimize the regularizer efficiently. Our theoretical results guarantee that the proposed approach trains certifiably fair ML models. Finally, in experimental studies we demonstrate improved fairness metrics in comparison to several recent fair training procedures on three ML tasks that are susceptible to algorithmic bias.

1. INTRODUCTION

As machine learning (ML) models replace humans in high-stakes decision-making and decision-support roles, concern regarding the consequences of algorithmic bias is growing. For example, ML models are routinely used in criminal justice and welfare to supplement humans, but they may have racial, class, or geographic biases (Metz & Satariano, 2020). In response, researchers have proposed many formal definitions of algorithmic fairness as a first step towards combating algorithmic bias. Broadly speaking, there are two kinds of definitions of algorithmic fairness: group fairness and individual fairness. In this paper, we focus on enforcing individual fairness. At a high level, individual fairness is the requirement that a fair algorithm should treat similar individuals similarly. Individual fairness was initially dismissed as impractical because, for many ML tasks, there is no consensus on which users are similar. Fortunately, there is a flurry of recent work that addresses this issue (Ilvento, 2019; Wang et al., 2019; Yurochkin et al., 2020; Mukherjee et al., 2020). In this paper, we assume there is a similarity metric for the ML task at hand and consider the task of enforcing individual fairness. Our main contributions are:

1. we define distributional individual fairness, a variant of Dwork et al.'s original definition of individual fairness that is (i) more amenable to statistical analysis and (ii) easier to enforce by regularization;
2. we develop a stochastic approximation algorithm to enforce distributional individual fairness when training smooth ML models;
3. we show that the stochastic approximation algorithm converges and that the trained ML model generalizes under standard conditions;
4. we demonstrate the efficacy of the approach on three ML tasks that are susceptible to algorithmic bias: income-level classification, occupation prediction, and toxic comment detection.
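To make the transport-based regularizer idea concrete before the formal development in Section 2, the sketch below is our own minimal illustration, not the paper's SenSeI algorithm. It assumes a hypothetical fair metric (`fair_distance`) that treats movement along one designated sensitive coordinate as free, and estimates a worst-case output gap over nearby points by random search; all function names and the search strategy are invented for illustration.

```python
import numpy as np

# Hypothetical "fair" metric: Euclidean distance that ignores variation
# along a designated sensitive coordinate (a stand-in for the learned
# similarity metric d_X discussed in the paper).
def fair_distance(x1, x2, sensitive_dim=0):
    diff = x2 - x1
    diff[sensitive_dim] = 0.0  # movement along the sensitive direction is free
    return float(np.linalg.norm(diff))

def dif_penalty(model, X, eps=0.1, n_samples=20, rng=None):
    """Crude Monte-Carlo stand-in for a DIF-style regularizer: for each
    input x, sample comparison points x' with fair_distance(x, x') <= eps
    and record the largest output gap |model(x') - model(x)|. The actual
    regularizer involves a supremum, so random search only gives a lower
    bound on it."""
    rng = np.random.default_rng(rng)
    gaps = []
    for x in X:
        worst = 0.0
        for _ in range(n_samples):
            delta = rng.normal(size=x.shape)
            delta[0] += rng.normal(scale=5.0)  # explore the sensitive direction
            d = fair_distance(x, x + delta)
            if d > 0:  # rescale so the *fair* distance to x is at most eps
                delta = delta * min(1.0, eps / d)
            worst = max(worst, abs(model(x + delta) - model(x)))
        gaps.append(worst)
    return float(np.mean(gaps))
```

A model that leans on the sensitive coordinate incurs a large penalty, while a model that ignores it can change by at most eps; in the paper's setting the analogous inner maximization is handled by optimization rather than random search.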

2. ENFORCING INDIVIDUAL FAIRNESS WITH SENSITIVE SET INVARIANCE (SENSEI)

2.1 A TRANSPORT-BASED DEFINITION OF INDIVIDUAL FAIRNESS

Let X and Y be the spaces of inputs and outputs respectively for the supervised learning task at hand. For example, in classification tasks, Y may be the probability simplex. An ML model is a function h : X → Y in a space of functions H (e.g. the set of all neural nets with a certain architecture). Dwork et al. (2011) define individual fairness as L-Lipschitz continuity of an ML model h with respect to appropriate metrics on X and Y:

d_Y(h(x), h(x')) ≤ L · d_X(x, x') for all x, x' ∈ X.    (2.1)

The choice of d_Y is often determined by the form of the output. For example, if the ML model outputs a vector of logits, then we may pick the Euclidean norm as d_Y (Kannan et al., 2018; Garg et al., 2018). The metric d_X is the crux of (2.1) because it encodes our intuition of which inputs are similar for the ML task at hand. For example, in natural language processing tasks, d_X may be a metric on word/sentence embeddings that ignores variation in certain sensitive directions. In light of the importance of the similarity metric in (2.1) to enforcing individual fairness, there is also a line of work on learning the similarity metric from data (Ilvento, 2019; Wang et al., 2019; Mukherjee et al., 2020). In our experiments, we adapt the methods from Yurochkin et al. (2020) to learn similarity metrics.

Although intuitive, individual fairness is statistically and computationally intractable. Statistically, it is generally impossible to detect violations of individual fairness on zero-measure subsets of the sample space. Computationally, individual fairness is a Lipschitz restriction, and such restrictions are hard to enforce. In this paper, we address both issues by lifting (2.1) to the space of probability distributions on X to obtain an "average case" version of individual fairness. This version (i) is more amenable to statistical analysis, (ii) is easy to enforce by minimizing a data-dependent regularizer, and (iii) preserves the intuition behind Dwork et al. (2011)'s original definition of individual fairness.

Definition 2.1 (distributional individual fairness (DIF)). Let ε, δ > 0 be tolerance parameters and ∆(X × X) be the set of probability measures on X × X. Define

R(h) := sup_{Π ∈ ∆(X × X)} E_Π[d_Y(h(X), h(X'))] subject to E_Π[d_X(X, X')] ≤ ε, Π(·, X) = P_X,    (2.2)

where P_X is the (marginal) distribution of the inputs in the ML task at hand. An ML model h is (ε, δ)-distributionally individually fair (DIF) iff R(h) ≤ δ.

We remark that DIF only depends on h and P_X. It does not depend on the (conditional) distribution of the labels P_{Y|X}, so it does not depend on the performance of the ML model. In other words, it is possible for a model to perform poorly and be perfectly DIF (e.g. the constant model h(x) = 0).

The optimization problem in (2.2) formalizes correspondence studies in the empirical literature (Bertrand & Duflo, 2016). Here is a prominent example.

Example 2.2. Bertrand & Mullainathan studied racial bias in the US labor market. The investigators responded to help-wanted ads in Boston and Chicago newspapers with fictitious resumes. To manipulate the perception of race, they randomly assigned African-American- or white-sounding names to the resumes. The investigators concluded there is discrimination against African-Americans because the resumes assigned white-sounding names received 50% more callbacks for interviews than the resumes assigned African-American-sounding names. We view Bertrand & Mullainathan's investigation as evaluating the objective in (2.3) at a special T. Let X be the space of resumes, and let h : X → {0, 1} be the decision rule that decides whether a resume receives a callback. Bertrand & Mullainathan implicitly pick the T that reassigns the name on a resume from an African-American-sounding name to a white-sounding one (or vice versa) and measure discrimination with the difference between callback rates before and after reassignment:

E_P[1{h(X) ≠ h(T(X))}] = P{h(X) ≠ h(T(X))}.

We consider distributional individual fairness a variant of Dwork et al.'s original definition. It is not a fundamentally new definition of algorithmic fairness because it encodes the same intuition as Dwork et al.'s original definition. Most importantly, it remains an individual notion of algorithmic fairness because it compares individuals to similar (close in d_X) individuals.

2.2 DIF VERSUS INDIVIDUAL FAIRNESS

It is hard to compare Dwork et al.'s original definition of individual fairness and DIF directly. First, they are parameterized differently: the original definition (2.1) is parameterized by a Lipschitz

