LEARNING ANTIDOTE DATA TO INDIVIDUAL UNFAIRNESS

Abstract

Fairness is an essential factor for machine learning systems deployed in high-stakes applications. Among all fairness notions, individual fairness, which follows the consensus that 'similar individuals should be treated similarly,' is vital for guaranteeing fair treatment of individual cases. Previous methods typically characterize individual fairness as a prediction-invariance problem under perturbations of sensitive attributes, and solve it by adopting the Distributionally Robust Optimization (DRO) paradigm. However, adversarial perturbations along a direction covering sensitive information do not respect inherent feature correlations or innate data constraints, and thus mislead the model to optimize at off-manifold and unrealistic samples. In light of this, we propose a method to learn and generate antidote data that approximately follow the data distribution to remedy individual unfairness. These on-manifold antidote data can be combined with the original training data through a generic optimization procedure, yielding a pure pre-processing approach to individual unfairness, or can also fit well within the in-processing DRO paradigm. Through extensive experiments, we demonstrate that our antidote data mitigate individual unfairness at minimal or zero cost to the model's predictive utility.

1. INTRODUCTION

Unregulated decisions can reflect racism, ageism, and sexism in high-stakes applications such as grant assignment (Mervis, 2022), recruitment (Dastin, 2018), policing strategies (Gelman et al., 2007), and lending services (Bartlett et al., 2022). To address these societal concerns, fairness, as one of the fundamental ethical guidelines for AI, has been proposed to encourage practitioners to adopt AI responsibly and fairly. The unifying idea of fairness is that ML systems should not discriminate against individuals or any groups segmented by legally protected and sensitive attributes, thereby preventing disparate impact in automated decision-making (Barocas & Selbst, 2016). Many notions have been proposed to specify AI fairness (Dwork et al., 2012; Kusner et al., 2017; Hashimoto et al., 2018). Group fairness is currently the most influential notion in the fairness community, driving different groups to receive equitable outcomes regardless of their sensitive attributes, in terms of statistics such as true positive rates or positive rates (Hardt et al., 2016). However, these statistics describe group averages and hence provide no guarantees on the treatment of individual cases. Alternatively, individual fairness, established upon the consensus that 'similar individuals should be treated similarly,' shifts the focus to reducing the predictive gap between conceptually similar instances. Here, 'similar' means that two instances have close profiles despite differing sensitive attributes; the similarity measure usually has a customized definition grounded in domain knowledge. We invite readers to see Section 2 for a more concrete formulation of individual fairness. Previous methods solve the individual fairness problem mainly via Distributionally Robust Optimization (DRO) (Yurochkin et al., 2020; Yurochkin & Sun, 2021; Ruoss et al., 2020; Yeom & Fredrikson, 2021).
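The 'similar treatment' consensus is commonly formalized as a Lipschitz-style condition: predictions on two individuals should differ by no more than a constant times their distance under a fair metric that discounts sensitive information. The following is a minimal sketch of that condition; the toy scorer, its weights, and the choice of treating the last feature as the sensitive attribute are all illustrative assumptions, not the paper's actual model or metric.

```python
import math

# Hypothetical setup: a toy logistic scorer f and a "fair" metric that
# ignores the sensitive attribute (here, the last feature). The weights
# are illustrative assumptions only.
WEIGHTS = [0.8, 0.5, 0.3]

def f(x):
    """Toy scorer: logistic output over three features."""
    z = sum(w * xi for w, xi in zip(WEIGHTS, x))
    return 1.0 / (1.0 + math.exp(-z))

def fair_distance(x1, x2):
    """Distance over non-sensitive features only (sensitive = last entry)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1[:-1], x2[:-1])))

def is_individually_fair(x1, x2, L=1.0):
    """Lipschitz-style check: |f(x1) - f(x2)| <= L * d(x1, x2)."""
    return abs(f(x1) - f(x2)) <= L * fair_distance(x1, x2)

# Two individuals identical except for the sensitive attribute:
a = [1.0, 2.0, 0.0]  # e.g., sensitive attribute = 0
b = [1.0, 2.0, 1.0]  # e.g., sensitive attribute = 1
print(is_individually_fair(a, b))  # False: f(a) != f(b) while d(a, b) = 0
```

Because the fair metric assigns distance zero to this pair, any difference in the scorer's outputs violates the condition, which is exactly the kind of predictive gap that individual-fairness methods aim to close.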
They convert the problem into optimizing models for invariant predictions on original data and their perturbations, where the perturbations are adversarially constructed to maximally change the sensitive information in a sample. However, when DRO is used for model robustness, a sample is typically perturbed adversarially only by a small degree: such perturbations can be regarded as local, and the adversarial sample remains on the data manifold. In contrast, perturbing a sample for individual fairness purposes, e.g., directly flipping its sensitive attribute such as gender from male to female, cannot be regarded as a local perturbation. These perturbations may violate inherent feature cor-

