SENSEI: SENSITIVE SET INVARIANCE FOR ENFORCING INDIVIDUAL FAIRNESS

Abstract

In this paper, we cast fair machine learning as invariant machine learning. We first formulate a version of individual fairness that enforces invariance on certain sensitive sets. We then design a transport-based regularizer that enforces this version of individual fairness and develop an algorithm to minimize the regularizer efficiently. Our theoretical results guarantee that the proposed approach trains certifiably fair ML models. Finally, in experimental studies we demonstrate improved fairness metrics, in comparison to several recent fair training procedures, on three ML tasks that are susceptible to algorithmic bias.

1. INTRODUCTION

As machine learning (ML) models replace humans in high-stakes decision-making and decision-support roles, concern regarding the consequences of algorithmic bias is growing. For example, ML models are routinely used in criminal justice and welfare to supplement humans, but they may have racial, class, or geographic biases (Metz & Satariano, 2020). In response, researchers have proposed many formal definitions of algorithmic fairness as a first step towards combating algorithmic bias. Broadly speaking, there are two kinds of definitions of algorithmic fairness: group fairness and individual fairness. In this paper, we focus on enforcing individual fairness. At a high level, individual fairness is the requirement that a fair algorithm should treat similar individuals similarly. Individual fairness was initially dismissed as impractical because, for many ML tasks, there is no consensus on which individuals are similar. Fortunately, a flurry of recent work addresses this issue (Ilvento, 2019; Wang et al., 2019; Yurochkin et al., 2020; Mukherjee et al., 2020). In this paper, we assume there is a similarity metric for the ML task at hand and consider the task of enforcing individual fairness. Our main contributions are:

1. we define distributional individual fairness, a variant of Dwork et al.'s original definition of individual fairness that is (i) more amenable to statistical analysis and (ii) easier to enforce by regularization;
2. we develop a stochastic approximation algorithm to enforce distributional individual fairness when training smooth ML models;
3. we show that the stochastic approximation algorithm converges and the trained ML model generalizes under standard conditions;
4. we demonstrate the efficacy of the approach on three ML tasks that are susceptible to algorithmic bias: income-level classification, occupation prediction, and toxic comment detection.

2. ENFORCING INDIVIDUAL FAIRNESS WITH SENSITIVE SET INVARIANCE (SENSEI)

2.1 A TRANSPORT-BASED DEFINITION OF INDIVIDUAL FAIRNESS

Let X and Y be the spaces of inputs and outputs, respectively, for the supervised learning task at hand. For example, in classification tasks, Y may be the probability simplex. An ML model is a function h : X → Y in a space of functions H (e.g., the set of all neural nets with a certain architecture).
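As a concrete illustration of this setup (not part of the paper itself), a classifier h can be realized as any map from raw features into the probability simplex, e.g. a softmax over linear scores. The sketch below uses made-up weights purely to make the notation h : X → Y tangible; the paper's method places no restriction on the form of h beyond smoothness.

```python
import math

def softmax(scores):
    # Map real-valued scores onto the probability simplex:
    # outputs are nonnegative and sum to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def h(x, W, b):
    # A linear classifier h : X -> Y, where Y is the probability
    # simplex over the classes (one row of W, one entry of b per class).
    scores = [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b_k
              for row, b_k in zip(W, b)]
    return softmax(scores)

# Hypothetical 2-class model on 3-dimensional inputs.
W = [[0.5, -1.0, 2.0], [1.0, 0.3, -0.7]]
b = [0.1, -0.2]
p = h([1.0, 0.0, 0.5], W, b)  # a point on the 2-class simplex
```

Here X = R^3 and Y is the set of probability vectors over two classes; any neural network ending in a softmax layer fits the same template.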

