LEARNING ANTIDOTE DATA TO INDIVIDUAL UNFAIRNESS

Abstract

Fairness is an essential factor for machine learning systems deployed in high-stakes applications. Among all fairness notions, individual fairness, which follows the consensus that 'similar individuals should be treated similarly,' is vital for guaranteeing fair treatment of individual cases. Previous methods typically characterize individual fairness as a prediction-invariance problem under perturbations of sensitive attributes, and solve it by adopting the Distributionally Robust Optimization (DRO) paradigm. However, adversarial perturbations along a direction covering sensitive information do not respect inherent feature correlations or innate data constraints, and thus mislead the model into optimizing over off-manifold, unrealistic samples. In light of this, we propose a method to learn and generate antidote data that approximately follows the data distribution to remedy individual unfairness. These on-manifold antidote data can be combined with the original training data through a generic optimization procedure, yielding a pure pre-processing approach to individual unfairness, or can fit well within the in-processing DRO paradigm. Through extensive experiments, we demonstrate that our antidote data mitigates individual unfairness at minimal or zero cost to the model's predictive utility.

1. INTRODUCTION

Unregulated decisions can reflect racism, ageism, and sexism in high-stakes applications, such as grant assignments (Mervis, 2022), recruitment (Dastin, 2018), policing strategies (Gelman et al., 2007), and lending services (Bartlett et al., 2022). To address these societal concerns, fairness, as one of the fundamental ethical guidelines for AI, has been proposed to encourage practitioners to adopt AI responsibly and fairly. The unifying idea of fairness articulates that ML systems should not discriminate against individuals or any groups segmented by legally protected and sensitive attributes, thereby preventing disparate impact in automated decision-making (Barocas & Selbst, 2016). Many notions have been proposed to specify AI fairness (Dwork et al., 2012; Kusner et al., 2017; Hashimoto et al., 2018). Group fairness is currently the most influential notion in the fairness community, requiring different groups to receive equitable outcomes regardless of their sensitive attributes, in terms of statistics like true positive rates or positive rates (Hardt et al., 2016). However, these statistics describe the average of a group, and hence lack guarantees on the treatment of individual cases. Alternatively, individual fairness, established upon the consensus that 'similar individuals should be treated similarly,' shifts the focus to reducing the predictive gap between conceptually similar instances. Here, 'similar' means two instances have close profiles despite their different sensitive attributes; the notion of similarity usually has a customized definition based on domain knowledge. We refer readers to Section 2 for a more concrete formulation of individual fairness. Previous methods solve the individual fairness problem mainly by Distributionally Robust Optimization (DRO) (Yurochkin et al., 2020; Yurochkin & Sun, 2021; Ruoss et al., 2020; Yeom & Fredrikson, 2021).
They convert the problem to optimizing models for invariant predictions on original data and their perturbations, where the perturbations are adversarially constructed to maximally change the sensitive information in a sample. However, in the model-robustness use case of DRO, a sample is adversarially perturbed only by a small degree; such perturbations can be regarded as local, and the adversarial sample remains on the data manifold. In contrast, perturbing a sample for individual fairness purposes, e.g., directly flipping its sensitive attribute such as gender from male to female, cannot be regarded as a local perturbation. These perturbations may violate inherent feature correlations, e.g., some features depend on gender without notice, thus driving the adversarial samples off the data manifold. Additionally, perturbations in a continuous space can break the innate constraints of tabular data, e.g., discrete features should be in a one-hot format. Consequently, these adversarial samples for fairness are unrealistic and do not match the data distribution. Training on such data can result in sub-optimal trade-offs between utility and individual fairness.

In this work, we address the above limitations and propose an approach to rectify models for individual fairness from a purely data-centric perspective. Following the high-level idea of the DRO paradigm, and by giving a concrete setup for similar samples, we learn the data manifold through generative models and then construct on-manifold samples with different sensitive attributes as antidote data to mitigate individual unfairness. We provide two ways to use the generated antidote data: simply inserting antidote data into the original training set and training models through regular optimization, or plugging antidote data into the DRO pipeline as an in-processing approach. Our approach works for multiple sensitive attributes, and each sensitive attribute can have multiple values.
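To make the limitation concrete, the naive perturbation used in prior DRO-style pipelines can be sketched as a direct flip of the sensitive attribute. The helper below is a hypothetical illustration of that baseline (the function name and indices are our own), not the antidote-data method itself: it leaves all correlated features untouched, which is exactly why the resulting sample can fall off the data manifold.

```python
import numpy as np

def flip_sensitive(x, s_idx, new_value):
    """Naive fairness perturbation: overwrite the sensitive attribute
    at index ``s_idx`` while leaving every other feature unchanged.

    Features correlated with the sensitive attribute are NOT updated,
    so the perturbed sample may violate inherent feature correlations,
    i.e., leave the data manifold."""
    x_pert = np.array(x, dtype=float, copy=True)
    x_pert[s_idx] = new_value
    return x_pert

# Example: flip 'gender' (index 0) from 0 to 1; features that
# correlate with gender (indices 1 and 2) stay frozen.
x = np.array([0.0, 0.35, 0.80])
x_flip = flip_sensitive(x, s_idx=0, new_value=1.0)
```

In contrast, the antidote-data approach described above replaces this direct flip with samples drawn from a learned generative model, so that a change in the sensitive attribute is accompanied by consistent changes in the correlated features.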
We conduct experiments on census, criminological, and educational datasets against standard classifiers and several baseline methods. Our method greatly mitigates individual unfairness while having minimal or zero side effects on model utility.

2. INDIVIDUAL FAIRNESS AND COMPARABLE SAMPLES

Notations. Let f_θ denote a parameterized probabilistic classifier, and let X and Y denote the input and output spaces with instance x and label y, respectively. For tabular datasets, we assume every input instance x contains three parts of features: sensitive features s = [s_1, s_2, ..., s_{N_s}], continuous features c = [c_1, c_2, ..., c_{N_c}], and discrete features d = [d_1, d_2, ..., d_{N_d}], with N_s, N_c, and N_d denoting the number of features in each part. We assume these three parts are exclusive, i.e., s, c, and d do not share any feature or column. We use d_x to denote the discrete features of instance x, and similarly for the other feature parts. For simplicity, we assume discrete features d contain categorical features before one-hot encoding, continuous features c contain features in a unified range like [0, 1] after some scaling operations, and all data have the same feature dimension. We consider sensitive attributes in a categorical format; any continuous sensitive attribute can be binned into discrete intervals to fit our scope. We use ⊕ to denote vector-vector or vector-scalar concatenation.

Individual Fairness: Concept and Practical Usage. The concept of individual fairness was first introduced in Dwork et al. (2012). Following the consensus that 'similar individuals should be treated similarly,' the problem is formulated as a Lipschitz mapping problem. Formally, for arbitrary instances x, x′ ∈ X, individual fairness is defined as a (D_X, D_Y)-Lipschitz property of a classifier f_θ: D_Y(f_θ(x), f_θ(x′)) ≤ D_X(x, x′), where D_X(·, ·) and D_Y(·, ·) are distance functions defined in the input space X and output space Y, respectively, and shall be customized upon domain knowledge. However, for a general problem, it can be demanding to carry out concrete and interpretable D_X(·, ·) and D_Y(·, ·), which makes individual fairness impractical in many applications.
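As a concrete illustration of the notation above, the sketch below assembles an instance x from its three feature parts; the function names and cardinality arguments are our own hypothetical choices. Sensitive and discrete features are categorical and get one-hot encoded, continuous features are assumed already scaled to [0, 1], and ⊕ is realized as vector concatenation.

```python
import numpy as np

def one_hot(value, cardinality):
    """Encode a categorical value as a one-hot vector."""
    v = np.zeros(cardinality)
    v[value] = 1.0
    return v

def encode_instance(s, c, d, s_cards, d_cards):
    """Assemble x = s ⊕ c ⊕ d: one-hot encode the categorical
    sensitive features s and discrete features d, then concatenate
    them with the already-scaled continuous features c."""
    parts = [one_hot(v, k) for v, k in zip(s, s_cards)]
    parts.append(np.asarray(c, dtype=float))
    parts += [one_hot(v, k) for v, k in zip(d, d_cards)]
    return np.concatenate(parts)

# One sensitive attribute with 2 values, two continuous features,
# and one discrete feature with 3 categories.
x = encode_instance(s=[1], c=[0.2, 0.9], d=[2], s_cards=[2], d_cards=[3])
```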
To simplify this problem from a continuous Lipschitz constraint, some works evaluate individual fairness of models with a binary distance function: D_X(x, x′) = 0 for two different samples x and x′ if they are exactly the same except for sensitive attributes, i.e., c = c′, d = d′, and s ≠ s′ (Yurochkin et al., 2020; Yurochkin & Sun, 2021). Despite its interpretability, this constraint can be too harsh to find sufficient comparable samples, since other attributes may correlate with sensitive attributes. For empirical studies, these works can only simulate experiments with semi-synthetic data, where they flip one's sensitive attribute to construct a sample and evaluate the predictive gap. Note that for tabular data, simply discarding the sensitive attributes would be a perfectly individually fair solution to this simulation.

In this work, we consider a relaxed version of the above individual fairness definition for an imperfect classifier. We present Definition 2.1 to characterize under what conditions we shall consider two samples comparable. When two samples x and x′ are comparable, their predictive gap |f_θ(x) − f_θ(x′)| should be minimized for the individual fairness purpose.

Definition 2.1 (comparable samples). Given T_d, T_c ∈ R_{≥0}, x and x′ are comparable iff all constraints are satisfied: 1. ∑_{i=1}^{N_d} 1{d_i ≠ d′_i} ≤ T_d; 2. max{|c_i − c′_i|} ≤ T_c, ∀ 1 ≤ i ≤ N_c; and 3. y = y′.

Remark 2.1. For some thresholds T_d and T_c, two samples are considered comparable iff 1. at most T_d discrete features differ; 2. the largest disparity among all continuous features is smaller than or equal to T_c; and 3. the two samples have the same ground-truth label.
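Definition 2.1 translates directly into a predicate over feature vectors. The sketch below assumes a simple dictionary representation of an instance ({'c': ..., 'd': ..., 'y': ...}) of our own choosing:

```python
def are_comparable(x, x_prime, T_d, T_c):
    """Check Definition 2.1: x and x' are comparable iff
    (1) at most T_d discrete features differ,
    (2) every continuous feature differs by at most T_c, and
    (3) the ground-truth labels agree."""
    n_d_diff = sum(d != d_p for d, d_p in zip(x["d"], x_prime["d"]))
    max_c_gap = max(abs(c - c_p) for c, c_p in zip(x["c"], x_prime["c"]))
    return n_d_diff <= T_d and max_c_gap <= T_c and x["y"] == x_prime["y"]

a = {"c": [0.30, 0.70], "d": [0, 1, 1], "y": 1}
b = {"c": [0.32, 0.69], "d": [0, 0, 1], "y": 1}
```

With T_d = 1 and T_c = 0.05, the pair (a, b) above is comparable: one discrete feature differs, the largest continuous gap is 0.02, and the labels agree. Tightening T_d to 0, or flipping either label, breaks comparability; the predictive gap of comparable pairs is what the method seeks to minimize.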

