LEARNING ANTIDOTE DATA TO INDIVIDUAL UNFAIRNESS

Abstract

Fairness is an essential factor for machine learning systems deployed in high-stakes applications. Among all fairness notions, individual fairness, following the consensus that 'similar individuals should be treated similarly,' is a vital notion for guaranteeing fair treatment of individual cases. Previous methods typically characterize individual fairness as a prediction-invariance problem under perturbations of sensitive attributes, and solve it by adopting the Distributionally Robust Optimization (DRO) paradigm. However, adversarial perturbations along a direction covering sensitive information do not consider the inherent feature correlations or innate data constraints, and thus mislead the model to optimize at off-manifold and unrealistic samples. In light of this, we propose a method to learn and generate antidote data that approximately follows the data distribution to remedy individual unfairness. These on-manifold antidote data can be used through a generic optimization procedure together with the original training data, yielding a pure pre-processing approach to individual unfairness, or can fit into the in-processing DRO paradigm. Through extensive experiments, we demonstrate that our antidote data mitigate individual unfairness at a minimal or zero cost to the model's predictive utility.

1. INTRODUCTION

Unregulated decisions can reflect racism, ageism, and sexism in high-stakes applications, such as grant assignments (Mervis, 2022), recruitment (Dastin, 2018), policing strategies (Gelman et al., 2007), and lending services (Bartlett et al., 2022). To avoid societal concerns, fairness, as one of the fundamental ethical guidelines for AI, has been proposed to encourage practitioners to adopt AI responsibly and fairly. The unifying idea of fairness articulates that ML systems should not discriminate against individuals or any groups segmented by legally-protected and sensitive attributes, thereby preventing disparate impact in automated decision-making (Barocas & Selbst, 2016). Many notions have been proposed to specify AI fairness (Dwork et al., 2012; Kusner et al., 2017; Hashimoto et al., 2018). Group fairness is currently the most influential notion in the fairness community, driving different groups to receive equitable outcomes regardless of their sensitive attributes, in terms of statistics like true positive rates or positive rates (Hardt et al., 2016). However, these statistics describe the average of a group, hence lacking guarantees on the treatment of individual cases. Alternatively, individual fairness, established upon the consensus that 'similar individuals should be treated similarly,' shifts the focus to reducing the predictive gap between conceptually similar instances. Here, 'similar' means two instances have close profiles regardless of their different sensitive attributes, and usually has a customized definition based on domain knowledge. We invite readers to see Section 2 for a more concrete formulation of individual fairness. Previous methods solve the individual fairness problem mainly by Distributionally Robust Optimization (DRO) (Yurochkin et al., 2020; Yurochkin & Sun, 2021; Ruoss et al., 2020; Yeom & Fredrikson, 2021).
They convert the problem to optimizing models for invariant predictions on original data and their perturbations, where the perturbations are adversarially constructed to mostly change the sensitive information in a sample. However, when DRO is used for model robustness, a sample is adversarially perturbed only by a small degree. Such perturbations can be regarded as local perturbations, and the adversarial sample remains on the data manifold. In contrast, perturbing a sample for individual fairness purposes, e.g., directly flipping its sensitive attributes like gender from male to female, cannot be regarded as a local perturbation. These perturbations may violate inherent feature correlations, e.g., some features are subject to gender but without notice, thus driving the adversarial samples off the data manifold. Additionally, perturbations in a continuous space can break the innate constraints of tabular data, e.g., discrete features should be in a one-hot format. Consequently, these adversarial samples for fairness are unrealistic and do not match the data distribution. Training on these data can result in sub-optimal tradeoffs between utility and individual fairness. In this work, we address the above limitations and propose an approach to rectify models for individual fairness from a purely data-centric perspective. Following the high-level idea of the DRO paradigm, and by giving a concrete setup for similar samples, we learn the data manifold through generative models, and then construct on-manifold samples with different sensitive attributes as antidote data to mitigate individual unfairness. We present two ways to use the generated antidote data: simply inserting antidote data into the original training set and training models through regular optimization, or equipping the DRO pipeline with antidote data as an in-processing approach. Our approach works for multiple sensitive attributes, and each sensitive attribute can have multiple values.
We conduct experiments on census, criminological, and educational datasets, comparing against standard classifiers and several baseline methods. Compared to the baselines, our method greatly mitigates individual unfairness with minimal or zero side effects on model utility.

2. INDIVIDUAL FAIRNESS AND COMPARABLE SAMPLES

Notations Let $f_\theta$ denote a parameterized probabilistic classifier, and let $\mathcal{X}$ and $\mathcal{Y}$ denote the input and output spaces with instance $x$ and label $y$, respectively. For tabular datasets, we assume every input instance $x$ contains three parts of features: sensitive features $s = [s_1, s_2, \cdots, s_{N_s}]$, continuous features $c = [c_1, c_2, \cdots, c_{N_c}]$, and discrete features $d = [d_1, d_2, \cdots, d_{N_d}]$, with $N_s$, $N_c$, and $N_d$ denoting the number of features in each part. We assume these three parts of features are exclusive, i.e., $s$, $c$, and $d$ do not share any feature or column. We use $d_x$ to denote the discrete features of instance $x$, and likewise for the other feature parts. For simplicity, we assume discrete features $d$ contain categorical features before one-hot encoding, continuous features $c$ contain features in a unified range like $[0, 1]$ after some scaling operations, and all data have the same feature dimension. We consider sensitive attributes in a categorical format; any continuous sensitive attribute can be binned into discrete intervals to fit our scope. We use $\oplus$ to denote vector-vector or vector-scalar concatenation.

Individual Fairness: Concept and Practical Usage The concept of individual fairness was first raised in Dwork et al. (2012). Following the consensus that 'similar individuals should be treated similarly,' the problem is formulated as a Lipschitz mapping problem. Formally, for arbitrary instances $x, x' \in \mathcal{X}$, individual fairness is defined as a $(D_\mathcal{X}, D_\mathcal{Y})$-Lipschitz property of a classifier $f_\theta$: $D_\mathcal{Y}(f_\theta(x), f_\theta(x')) \leq D_\mathcal{X}(x, x')$, where $D_\mathcal{X}(\cdot, \cdot)$ and $D_\mathcal{Y}(\cdot, \cdot)$ are distance functions defined in the input space $\mathcal{X}$ and output space $\mathcal{Y}$, respectively, and shall be customized upon domain knowledge. However, for a general problem, it can be demanding to carry out concrete and interpretable $D_\mathcal{X}(\cdot, \cdot)$ and $D_\mathcal{Y}(\cdot, \cdot)$, which makes individual fairness impractical in many applications.
To simplify this problem away from a continuous Lipschitz constraint, some works evaluate individual fairness of models with a binary distance function: $D_\mathcal{X}(x, x') = 0$ for two different samples $x$ and $x'$ if they are exactly the same except for sensitive attributes, i.e., $c = c'$, $d = d'$, and $s \neq s'$ (Yurochkin et al., 2020; Yurochkin & Sun, 2021). Despite its interpretability, this constraint can be too harsh to find sufficient comparable samples, since other attributes may correlate with sensitive attributes. For empirical studies, these works can only simulate experiments with semi-synthetic data, where they flip one's sensitive attribute to construct a sample and evaluate the predictive gap. Note that for tabular data, simply discarding the sensitive attributes would be a perfectly individually fair solution to this simulation. In this work, we consider a relaxed version of the above individual fairness definition for an imperfect classifier. We present Definition 2.1 to characterize under what conditions we consider two samples comparable. When two samples $x$ and $x'$ are comparable, their predictive gap $|f_\theta(x) - f_\theta(x')|$ should be minimized for the individual fairness purpose.

Definition 2.1 (comparable samples). Given $T_d, T_c \in \mathbb{R}_{\geq 0}$, $x$ and $x'$ are comparable iff all of the following constraints are satisfied: 1. $\sum_{i=1}^{N_d} \mathbb{1}\{d_i \neq d'_i\} \leq T_d$; 2. $\max_{1 \leq i \leq N_c} |c_i - c'_i| \leq T_c$; and 3. $y = y'$.

Remark 2.1. For some thresholds $T_d$ and $T_c$, two samples are considered comparable iff 1. at most $T_d$ discrete features differ; 2. the largest disparity among all continuous features is smaller than or equal to $T_c$; and 3. the two samples have the same ground-truth label. Definition 2.1 allows two samples to be slightly different in discrete and continuous features, and arbitrarily different in sensitive attributes.
The definition is also flexible: users may additionally enforce some crucial features to be identical for comparable samples, and this does not affect our model design. Our comparable samples are highly interpretable and semantically rich. For example, in lending data, to certify individual fairness between two samples, we can set the discrete features to the history of past payment status (where value 1 indicates a complete payment and value 0 indicates a missing payment), and the continuous features to the monthly amount of the bill statement. Two samples are considered comparable if they have a bounded difference in payment status and amount of bills. In what follows, we build models and evaluate individual fairness by Definition 2.1, and mostly consider comparable samples with different sensitive attributes.
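As a concrete illustration, Definition 2.1 can be checked with a few lines of code. The feature layout, index sets, and thresholds below are hypothetical, chosen to mirror the lending example above; this is a sketch, not the authors' implementation.

```python
import numpy as np

def is_comparable(x, x_prime, y, y_prime, d_idx, c_idx, T_d=1, T_c=0.025):
    """Definition 2.1: x and x' are comparable iff (1) at most T_d discrete
    features differ, (2) every continuous feature differs by at most T_c,
    and (3) the ground-truth labels agree. Sensitive attributes may differ
    arbitrarily and are simply not included in d_idx or c_idx."""
    x, x_prime = np.asarray(x, dtype=float), np.asarray(x_prime, dtype=float)
    n_discrete_diff = int(np.sum(x[d_idx] != x_prime[d_idx]))
    max_cont_gap = float(np.max(np.abs(x[c_idx] - x_prime[c_idx])))
    return n_discrete_diff <= T_d and max_cont_gap <= T_c and y == y_prime

# Toy lending rows: [sensitive, pay_month_1, pay_month_2, bill_amount].
d_idx, c_idx = [1, 2], [3]
x  = [1, 1, 0, 0.50]   # e.g., male; paid month 1, missed month 2
x2 = [0, 1, 0, 0.51]   # female; same payment history, similar bill
x3 = [0, 0, 1, 0.50]   # female; both payment statuses flipped
```

With the same label, `x` and `x2` are comparable (only the sensitive attribute and a 0.01 bill gap differ), while `x` and `x3` are not (two discrete features differ, exceeding `T_d = 1`).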

3. LEARNING ANTIDOTE DATA TO INDIVIDUAL UNFAIRNESS

Motivation Several methods solve the individual fairness problem through Distributionally Robust Optimization (DRO) (Yurochkin et al., 2020; Yurochkin & Sun, 2021; Ruoss et al., 2020; Yeom & Fredrikson, 2021). The high-level idea is to optimize a model at samples with perturbations that dramatically change their sensitive information. The solution can be summarized as:

$\min_{f_\theta} \mathbb{E}_{(x,y)}\, \ell(f_\theta(x), y) \quad (1)$ and $\quad \min_{f_\theta} \mathbb{E}_{(x,y)} \max_{x+\epsilon \sim \mathcal{D}_{\text{Sen}}} \ell(f_\theta(x + \epsilon), y), \quad (2)$

where the first term is standard empirical risk minimization, and the second term minimizes the loss over adversarial samples. $\mathcal{D}_{\text{Sen}}$ is a customized distribution offering perturbations that specifically change one's sensitive information. For example, Yurochkin et al. (2020) characterize $\mathcal{D}_{\text{Sen}}$ via a subspace called the sensitive subspace, learned from logistic regression, which contains the most predictability of sensitive attributes. Ruoss et al. (2020) find this distribution via logical constraints. Though feasible, we would like to respectfully point out that: (1) Perturbations that violate feature correlations can push adversarial samples off the data manifold. An intuitive example is treating age as a sensitive attribute. Perturbations can change a person's age arbitrarily to find an optimal age that encourages the model to predict most differently. Such perturbations ignore the correlations between the sensitive feature and other features like education or annual income, resulting in an adversarial sample with age 5 or 10 but holding a doctoral degree or earning an $80K annual income. (2) Samples with arbitrary continuous perturbations can easily break the nature of tabular data. Categorical variables only take one-hot values after one-hot encoding, and continuous variables may have a fixed range. For example, an adversarial sample may hold half a bachelor's degree and half a doctoral degree.
These two observations make the adversarial samples from $\mathcal{D}_{\text{Sen}}$ unrealistic and off the data manifold, thus distorting the subsequent DRO paradigm and resulting in sub-optimal tradeoffs between fairness and utility. In this work, we address the above issues related to $\mathcal{D}_{\text{Sen}}$ and propose to generate on-manifold data for individual fairness purposes. The high-level philosophy is: given an original training sample, generate its comparable samples with different and reasonable sensitive attributes, such that the generated data fit the existing data manifold and obey the inherent feature correlations and innate data constraints. We name the generated data antidote data. The antidote data can either be mixed with the original training data as a pre-processing technique, or serve as $\mathcal{D}_{\text{Sen}}$ in Equation (2) as an in-processing approach. By taking antidote data, a classifier would give individually fair predictions.
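To make observation (2) concrete, the toy sketch below perturbs a sample along a crude 'sensitive direction' (a group mean-difference proxy standing in for the learned sensitive subspace of SenSR-style methods) and shows that a one-hot education block stops being one-hot. All feature names and numbers are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
s = rng.integers(0, 2, n)                          # binary sensitive attribute
edu = np.where(s == 1, 2, rng.integers(0, 2, n))   # education correlates with s
income = 0.4 * s + rng.uniform(0.0, 0.5, n)        # income correlates with s
X = np.c_[np.eye(3)[edu], income]                  # one-hot education + income

# Crude sensitive direction: difference of group means (a hypothetical
# stand-in for the logistic-regression sensitive subspace).
v = X[s == 1].mean(axis=0) - X[s == 0].mean(axis=0)
v /= np.linalg.norm(v)

x_adv = X[0] + 0.5 * v   # continuous perturbation along the direction

def one_hot_ok(row):
    """Check that the first three entries form a valid one-hot block."""
    return set(np.round(row[:3], 8)) <= {0.0, 1.0} and row[:3].sum() == 1
```

The original row passes `one_hot_ok`, while `x_adv` holds fractional degree membership, exactly the 'half bachelor's, half doctoral degree' failure described above.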

3.1. ANTIDOTE DATA GENERATOR

We start by elaborating on the generator of antidote data. The purpose of the antidote data generator $g_\theta$ is, given a training sample $x$, to generate its comparable samples with different sensitive attribute(s). To ensure the generations have different sensitive features, we build $g_\theta$ as a conditional generative model that generates a sample with pre-defined sensitive features. Given sensitive attributes $s \neq s_x$ (recall $s_x$ is the sensitive attributes of instance $x$), the objective is:

$g_\theta: (x, s, z) \to \tilde{x}$, with $s_{\tilde{x}} = s$, and $x$ and $\tilde{x}$ satisfying Definition 2.1, (3)

where $z \sim \mathcal{N}(0, 1)$ is a noise vector drawn from a standard multivariate normal distribution. The generation $\tilde{x}$ should follow the data distribution and satisfy the innate constraints on discrete and continuous features, i.e., the one-hot format for discrete features and a reasonable range for continuous features. In the following, we elaborate the design and training strategy for $g_\theta$.

Encoding Continuous Values For continuous features, we adopt mode-specific normalization (Xu et al., 2019) to encode every column of continuous values independently. We use variational Bayesian inference to estimate a Gaussian mixture for the distribution of each continuous feature. This approach decomposes the distribution into several modes, where each mode is a Gaussian distribution with unique parameters. Formally, given a value $c_{i,j}$ in the $i$-th continuous feature column and $j$-th row of the table, the learned Gaussian mixture is $\mathbb{P}(c_{i,j}) = \sum_{k=1}^{K_i} w_{i,k}\, \mathcal{N}(c_{i,j};\, \mu_{i,k}, \sigma^2_{i,k})$, where $w_{i,k}$ is the weight of the $k$-th mode in the $i$-th continuous feature, and $\mu_{i,k}$ and $\sigma_{i,k}$ are the mean and standard deviation of the normal distribution of the $k$-th mode. We use the learned Gaussian mixture to encode every continuous value. For each value $c_{i,j}$, we estimate the probability of each mode via $p_{i,k}(c_{i,j}) = w_{i,k}\, \mathcal{N}(c_{i,j};\, \mu_{i,k}, \sigma^2_{i,k})$, and sample one mode from the discrete probability distribution $p_i$ with $K_i$ values.
Having sampled a mode $k$, we represent the mode of $c_{i,j}$ using a one-hot mode indicator vector $e_{i,x}$, an all-zero vector except that the $k$-th entry equals 1. We use a scalar to represent the relative value within the $k$-th mode: $v_{i,x} = (c_{i,j} - \mu_{i,k}) / 4\sigma_{i,k}$. By encoding all continuous values, we obtain a re-representation $\bar{x}$ to substitute $x$ as the input for the antidote data generator $g_\theta$:

$\bar{x} = (v_{1,x} \oplus e_{1,x} \oplus \cdots \oplus v_{N_c,x} \oplus e_{N_c,x}) \oplus d_x \oplus s_x. \quad (4)$

Recall $\oplus$ denotes vector-vector or vector-scalar concatenation. To construct a comparable sample $\tilde{x}$, the task for continuous features is to classify the mode from latent representations, i.e., estimate $e_{i,\tilde{x}}$, and predict the relative value $v_{i,\tilde{x}}$. We can decode $v_{i,\tilde{x}}$ and $e_{i,\tilde{x}}$ back to a continuous value using the learned Gaussian mixture.

Structural Design The whole model is designed in the style of Generative Adversarial Networks (Goodfellow et al., 2014), consisting of a generator $g_\theta$ and a discriminator $d_\theta$. The generator $g_\theta$ takes the re-representation $\bar{x}$, a pre-defined sensitive feature $s$, and a noise vector $z$ as input. The output from $g_\theta$ is a vector with the same size as $\bar{x}$, including $v_{\tilde{x}}$, $e_{\tilde{x}}$, $d_{\tilde{x}}$, and $s_{\tilde{x}}$. To ensure all discrete features are one-hot so that the generations follow a tabular distribution, we apply Gumbel softmax (Jang et al., 2017) as the final activation for each discrete feature to obtain $d_{\tilde{x}}$. Gumbel softmax is a differentiable operation that encodes a continuous distribution over a simplex and approximates a categorical distribution; it controls the sharpness of the output via a hyperparameter called temperature. Gumbel softmax is also applied to the sensitive features $s_{\tilde{x}}$ and mode indicator vectors $e_{\tilde{x}}$ to ensure the one-hot format. The purpose of the discriminator $d_\theta$ is to distinguish fake generations from real samples; we also build the discriminator to distinguish generated samples from real comparable samples in terms of their comparability.
Through the discriminator, the constraints from comparable samples are implicitly encoded into the adversarial training. We formulate the fake sample for the discriminator as $\bar{x} \oplus \tilde{x} \oplus (\tilde{x} - \bar{x})$, and the real sample as $\bar{x}' \oplus \bar{x} \oplus (\bar{x}' - \bar{x})$, where $\bar{x}'$ is the re-representation of a comparable sample $x'$ of $x$ drawn from the training data. The third term is encoded to emphasize the difference between two comparable samples. Implicitly regularizing comparability leaves full flexibility to the generator to fit various definitions of comparable samples, and avoids adding complicated penalty terms, as long as real comparable samples are prepared for training.

Training Antidote Data Generator We train the generator and discriminator iteratively through the following objectives, with gradient penalty (Gulrajani et al., 2017) to ensure stability:

$\min_{g_\theta} \mathbb{E}_{x, x' \sim \mathcal{D}_{\text{comp}}}\, \ell_{CE}(s_{\tilde{x}}, s_{x'}) - d_\theta(g_\theta(\bar{x} \oplus s_{x'} \oplus z))$,
$\min_{d_\theta} \mathbb{E}_{x, x' \sim \mathcal{D}_{\text{comp}}}\, d_\theta(g_\theta(\bar{x} \oplus s_{x'} \oplus z)) - d_\theta(\bar{x}')$,

where $\mathcal{D}_{\text{comp}}$ is the distribution of real comparable pairs in the data, and $\ell_{CE}$ is a cross-entropy loss penalizing the prediction of every sensitive attribute in $s_{\tilde{x}}$ with $s_{x'}$ as the ground truth.
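The mode-specific normalization described above can be sketched with scikit-learn's variational Gaussian mixture. For determinism this sketch picks the argmax mode instead of sampling one, and the column and hyperparameters are illustrative; it is not the paper's implementation.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def fit_modes(col, max_modes=5, seed=0):
    """Variational-Bayes fit of a Gaussian mixture to one continuous column."""
    gm = BayesianGaussianMixture(n_components=max_modes, random_state=seed)
    gm.fit(col.reshape(-1, 1))
    return gm.weights_, gm.means_.ravel(), np.sqrt(gm.covariances_.ravel())

def encode(c, w, mu, sigma):
    """Return (one-hot mode indicator e, relative value v) for a scalar c."""
    p = w * np.exp(-0.5 * ((c - mu) / sigma) ** 2) / sigma  # unnormalized mode probs
    k = int(np.argmax(p))  # the paper samples k ~ p; argmax keeps this deterministic
    v = (c - mu[k]) / (4.0 * sigma[k])
    return np.eye(len(w))[k], v

def decode(e, v, mu, sigma):
    """Invert the encoding back to a continuous value."""
    k = int(np.argmax(e))
    return v * 4.0 * sigma[k] + mu[k]

# Demo on a hypothetical bimodal column.
rng = np.random.default_rng(0)
col = np.r_[rng.normal(0.0, 1.0, 300), rng.normal(10.0, 1.0, 300)]
w, mu, sigma = fit_modes(col)
e, v = encode(0.3, w, mu, sigma)
```

The encode/decode pair is an exact inverse by construction, which is what lets the generator predict `(e, v)` pairs and map them back to realistic continuous values.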

3.2. LEARNING WITH ANTIDOTE DATA

We elaborate two ways to apply the generated antidote data for the individual fairness purpose. In practice, it is not strictly guaranteed that $g_\theta$ produces comparable samples satisfying Definition 2.1; some generations may be incompatible with some pre-defined sensitive features due to imperfections of neural networks. Thus, we apply a post-processing step Post to filter out comparable samples from all generations. Given a dataset $X$, for one iteration of sampling, we input every $x$ with all possible sensitive features (except $s_x$) to the generator, collect the raw generations $\tilde{X}$, and apply Post($\tilde{X}$) to obtain the antidote data. The label $y$ for antidote data is copied from the original data. We may run multiple iterations of sampling to enlarge the pool of antidote data.

The first way to use antidote data is to simply insert all antidote data into the original training set: $\min_{f_\theta} \ell(f_\theta(x), y),\ x \in X + \text{Post}(\tilde{X})$. Since we only add additional training data, this approach is model-agnostic, flexible to any model optimization procedure, and fits well with well-developed data analytical toolkits such as sklearn (Pedregosa et al., 2011). We consider this convenience a favorable property for practitioners.

The second way is to apply antidote data with Distributionally Robust Optimization. We present the training procedure in Algorithm 1:

Algorithm 1 (DRO with antidote data):
repeat
  $\theta \leftarrow \theta - \eta\, \mathbb{E}_{(x,y)}[\nabla_\theta (\max_{\tilde{x} \in \{\tilde{x}_i\}_M \leftarrow x} \ell(\tilde{x}, y) + \ell(x, y))]$  ▷ $\{\tilde{x}_i\}_M \leftarrow x$ is the set of $M$ comparable samples of $x$, with $\{\tilde{x}_i\}_M \in \text{Post}(\tilde{X})$
until convergence

In every training iteration, in addition to the optimization at real data with $\ell(x, y)$, we add a step that selects $x$'s comparable samples in the antidote data with the highest loss under the current model parameters, and captures gradients from $\max_{\tilde{x} \in \{\tilde{x}_i\}_M \leftarrow x} \ell(\tilde{x}, y)$ to update the model. The algorithm is similar to DRO with perturbations along sensitive directions, but we replace the perturbations with on-manifold generated data.
The additional loss term in Algorithm 1 can be upper bounded by a gradient smoothing regularization term. Taking a Taylor expansion, we have:

$\max_{\tilde{x} \in \{\tilde{x}_i\}_M \leftarrow x} \ell(\tilde{x}, y) = \ell(x, y) + \max_{\tilde{x} \in \{\tilde{x}_i\}_M \leftarrow x} [\ell(\tilde{x}, y) - \ell(x, y)]$
$= \ell(x, y) + \max_{\tilde{x} \in \{\tilde{x}_i\}_M \leftarrow x} [\langle \nabla_x \ell(x, y),\, \tilde{x} - x \rangle] + O(\delta^2)$
$\leq \ell(x, y) + T_d \max_i \|\nabla_{d_i} \ell(x, y)\| + T_c \max_i \|\nabla_{c_i} \ell(x, y)\| + N_s \max_i \|\nabla_{s_i} \ell(x, y)\| + O(\delta^2)$.

Recall $T_d$ and $T_c$ are the thresholds for discrete and continuous features in Definition 2.1, and $O(\delta^2)$ is the higher-order term from the Taylor expansion. The last inequality follows from Definition 2.1. The three gradient terms on discrete, continuous, and sensitive features serve as gradient regularization and encourage the model to have invariant loss with respect to comparable samples. However, the upper bound is only a sufficient but not necessary condition, and our solution encodes the real data distribution into the gradient regularization to address individual unfairness with more favorable trade-offs.
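A minimal numeric sketch of an Algorithm-1-style update follows, using a hand-rolled logistic model in place of $f_\theta$ (our illustration; the paper's base models are LR and NN):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(theta, x, y):
    """Binary cross-entropy of a logistic model at a single sample."""
    p = sigmoid(x @ theta)
    return -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

def bce_grad(theta, x, y):
    """Gradient of the binary cross-entropy w.r.t. theta."""
    return (sigmoid(x @ theta) - y) * x

def dro_anti_step(theta, x, y, antidotes, lr=0.1):
    """One update of Algorithm 1: evaluate the loss at every comparable
    antidote sample of x, keep the worst one, and descend on the sum of
    the clean loss and that worst-case antidote loss."""
    worst = antidotes[int(np.argmax([bce(theta, xt, y) for xt in antidotes]))]
    return theta - lr * (bce_grad(theta, x, y) + bce_grad(theta, worst, y))
```

Note there is no inner optimization loop: unlike perturbation-based DRO, the inner max reduces to evaluating a finite set of generated comparable samples.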

4.1. EXPERIMENTAL SETUP

Datasets We involve the census datasets Adult (Kohavi & Becker, 1996) and Dutch (Van der Laan, 2000), the educational datasets Law School (Wightman, 1998) and Oulad (Kuzilek et al., 2017), and the criminological dataset Compas (Angwin et al., 2016) in our experiments. For each dataset, we select one or two ethics-related attributes as sensitive attributes that expose significant individual unfairness in a base model like neural networks. We report their details in Appendix A.

Protocol For all datasets, we transform discrete features into one-hot encoding, and standardize features by removing the mean and scaling to unit variance. We transform continuous features into the range between 0 and 1. We construct pairs of comparable samples for both training and testing sets. In our experiments, different from (Yurochkin & Sun, 2021; Yurochkin et al., 2020), the evaluations on tabular datasets are sampled from real testing data rather than simulated. We evaluate both model utility and individual fairness. For utility, we consider the area under the Receiver Operating Characteristic curve (ROC) and Average Precision (AP) to characterize the precision of probabilistic outputs in binary classification. For individual fairness, we consider the gap in probabilistic scores between comparable samples when both samples have the same positive or negative label (abbreviated as Pos. Comp. and Neg. Comp.). We evaluate unfairness for positive and negative comparable samples in terms of the arithmetic mean (Mean) and the upper quartile (Q3). The upper quartile shows the performance of the worse-performing pairs. For base models with randomness like NN, we run the experiments five times and report the average results.

Baselines We consider two base models: logistic regression (LR) and three-layer neural networks (NN).
We use logistic regression from Scikit-learn (Pedregosa et al., 2011); our antidote data are compatible with this mature implementation since they do not require any change to the model. Approaches involving DRO do not support this LR pipeline, and are instead validated through neural networks implemented with PyTorch. We have the following five baselines in experiments: 1. Discard sensitive features (Dis), which simply deletes the appointed sensitive features in the input data; 2. Project (Proj) (Yurochkin et al., 2020), which finds a linear projection via logistic regression that minimizes the predictability of sensitive attributes in data, requiring an extra preprocessing step to project the input data; 3. SenSR (Yurochkin et al., 2020), which is based on DRO: it finds a sensitive subspace through logistic regression that encodes the most sensitive information, and generates perturbations on this subspace during optimization; 4. SenSeI (Yurochkin & Sun, 2021), which also uses the DRO paradigm, but involves distance penalties on both inputs and model predictions to construct perturbations; 5. LCIFR (Ruoss et al., 2020), which computes adversarial perturbations with logical constraints, and optimizes representations under attacks from these perturbations. We largely follow the default hyperparameter settings from the original implementations, but fine-tune some parameters to avoid degeneration in some cases. For our approaches, we use Anti to denote simply merging original and antidote data, Anti plus Dis to denote additionally discarding sensitive features from the original and antidote data, and DRO-Anti to denote antidote data with DRO. We standardize baselines with the same base model in experiments.
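The individual-fairness metrics described in the protocol (the Mean and Q3 of probabilistic gaps over comparable pairs) can be computed as follows; the variable names are ours, and this is a sketch rather than the paper's evaluation code.

```python
import numpy as np

def unfairness_stats(scores, pairs):
    """Mean and upper-quartile (Q3) absolute gap in predicted probability
    over a list of comparable pairs (i, j) indexing into `scores`."""
    gaps = np.abs([scores[i] - scores[j] for i, j in pairs])
    return float(gaps.mean()), float(np.percentile(gaps, 75))
```

In the evaluation above, pairs would first be split by label into Pos. Comp. and Neg. Comp. before computing these statistics.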

4.2. HOW ANTIDOTE DATA MITIGATE UNFAIRNESS

We present our numerical results in Table 1, Table 2, and Figure 1, and defer more to Appendix C. From these results we have the following major observations.

Antidote Data Show Good Performance Across all datasets, with antidote data, our models mostly perform the best in terms of individual fairness, with only a minimal drop or sometimes even a slight improvement in predictive utility. For example, on the Law School dataset, our NN+Anti mitigates individual unfairness by 70.38% and 63.36% in terms of the Mean in Pos. Comp. and Neg. Comp., respectively, while improving ROC by 0.47% and AP by 0.07%. On this dataset, other methods typically bring a 0.1%-2.5% drop in utility and deliver less mitigation of individual unfairness. In some cases, a baseline method does give better individual fairness, e.g., LCIFR for Neg. Comp., but its fairness is not consistent for positive comparable samples, and usually comes at a significant cost to utility (up to a 13.03% drop in ROC).

Improvements from DRO-Anti Our DRO-Anti outperforms base models that learn with antidote data through regular optimization, achieving fairer results and slightly better predictive utility. This is because DRO-Anti introduces antidote data into every optimization iteration and selects the worst-performing data instead of treating them all equally. Typical DRO training runs an iterative inner optimization in every epoch to search for good perturbations. In contrast, DRO-Anti omits the inner optimization and only evaluates every antidote sample in each round.

Binding Well with Dis Removing sensitive features from input data generally improves individual fairness. On the Law School dataset, discarding sensitive features brings up to 44.32%-63.36% mitigation of individual unfairness. However, when sensitive features are highly correlated with other features, the mitigation is not guaranteed.
On the Adult dataset, removing sensitive features only yields 0.93%-2.94% improvements across the two models. Regardless of the varying performance of Dis, our antidote data bind well with sensitive-feature discarding. On the Adult dataset, our LR+Anti plus Dis boosts individual fairness in Pos. Comp. by 5.36%, where solely discarding sensitive features only yields a 0.94% improvement. This number is consistent for NN, i.e., 4.96% compared to 0.93%.

Figure 2: A & B: The tradeoffs between utility and fairness on the Adult dataset. For SenSeI, we iterate the controlling hyperparameter over (1e+3, 5e+3, 1e+4, 5e+4, 1e+5, 2e+5, 5e+5). For LCIFR, we iterate the weight for fairness over (0.1, 1.0, 10.0, 50.0, 100.0). For Anti, we set the proportion of antidote data to 0%, 45%, 90%, 134%, 180%, 225%, 270%, 316%, 361%, and 406%. For DRO-Anti, we set the proportion of antidote data to 45%, 90%, 136%, 180%, and 225%. Every point is plotted with variance; the variance for our models is too small to observe in this figure. C: The convergence of the comparability ratio during the training of the generator.

Algorithmic Tradeoffs

In Figure 2 A & B, we show the tradeoffs between utility and fairness. We have two major observations: (1) Models with antidote data achieve better tradeoffs, i.e., with more antidote data, we obtain lower individual unfairness and a smaller drop in model utility. DRO-Anti has the best tradeoffs and achieves individual fairness with an inconspicuous sacrifice of utility even as the amount of antidote data grows. (2) Our models enjoy lower variance across random seeds. For baseline methods, when we turn up the hyperparameters controlling the tradeoffs, the final results become unstable with significant variance. In contrast, since our model is optimized on approximately real data, with no change to the model for Anti and minimal change to the optimization for DRO-Anti, there is no observable variance in the final results.

Convergence In Figure 2 C, we show the change of the comparability ratio, i.e., the rate of comparable samples among all generated samples, during training for different types of features. The comparability ratio of sensitive features quickly converges to 1 since we have direct supervision. The ratios of discrete and continuous features converge around the 500-th iteration due to the implicit supervision from the discriminator. The ratio of continuous features is lower than that of discrete features due to more complex patterns. Due to the imperfect comparability ratio, we add an additional step Post() to filter out incomparable samples.

4.3. MODELING THE DATA MANIFOLD

Results in Table 4 show that our antidote data suffer from a performance drop compared to the original data because the generator cannot perfectly fit the data manifold. Even so, antidote data surpass random data and perturbations from SenSeI, indicating that antidote data are closer to the original training data.

5. RELATED WORK

Machine Learning Fairness AI fairness proposes ethical regulations so that algorithms do not discriminate against any party or individual. To quantify this goal, the concept of group fairness asks for equalized outcomes across sensitive groups in terms of statistics like true positive rate or positive rate (Hardt et al., 2016). Similarly, minimax fairness (Hashimoto et al., 2018) characterizes the algorithmic performance of the worst-performing group among all. Though appealing, both of these notions provide weak guarantees for individuals. To compensate for this deficiency, counterfactual fairness (Kusner et al., 2017) describes the consistency of algorithms on one instance and its counterfactuals when sensitive attributes are changed. However, this notion and its corresponding evaluations strongly rely on the causal structure (Glymour et al., 2016) originating from the data-generating process; in practice, an explicit causal model is usually unavailable. Individual fairness (Dwork et al., 2012) describes the pair-wise predictive gaps between similar instances, and is feasible when the constraints in the input and output spaces are properly defined.

Crafting Adversarial Samples Beyond regular adversarial training (Madry et al., 2018), using generative models to craft on-manifold adversarial samples is an attractive technique for model robustness (Xiao et al., 2018; Zhao et al., 2018; Kos et al., 2018; Song et al., 2018). Compared to general adversarial samples crafted without data-dependent considerations, generative samples are good approximations to the data distribution and can offer attacks with rich semantics. Experimentally, crafting on-manifold adversarial samples has been shown to boost model generalization capacity (Stutz et al., 2019; Raghunathan et al., 2019).

A DATASET

Adult dataset The Adult dataset contains census personal records with attributes like age, education, race, etc. The task is to determine whether a person makes over $50K a year. We use 45.25% antidote data for Anti, and 225.97% antidote data for DRO-Anti. We set $T_d = 1$ and $T_c = 0.025$ for the constraints of comparable samples.

Compas dataset The Compas dataset is a criminological dataset recording prisoners' information like criminal history, jail and prison time, demographics, sex, etc. The task is to predict a recidivism risk score for defendants. We use 148.55% antidote data for Anti, and 184.89% antidote data for DRO-Anti. We set $T_d = 1$ and $T_c = 0.025$. Note that, as discussed in (Bao et al., 2021), the Compas dataset may not be ideal for demonstrating algorithmic fairness.

Law School dataset The Law School dataset contains law school admission records. The goal is to predict whether a candidate would pass the bar exam, with available features like sex, race, the student's decile, etc. We use 56.18% antidote data for Anti, and 338.50% antidote data for DRO-Anti. We set $T_d = 1$ and $T_c = 0.1$.

Oulad The Open University Learning Analytics (Oulad) dataset contains information about students and their activities in the virtual learning environment for seven courses. It offers students' gender, region, age, and academic information to predict students' final results in a module presentation. We use 523.23% antidote data for Anti, and 747.85% antidote data for DRO-Anti. We set $T_d = 1$ and $T_c = 0.025$.

Dutch dataset

The Dutch dataset shows people's profiles in the Netherlands in 2001. It provides information such as sex, age, household, citizenship, etc., and the task is to predict a person's occupation. We remove 8,549 duplicates from the test set, reducing its size to 6,556. We use 205.44% antidote data for Anti and 770.65% antidote data for DRO-Anti. We set T_d = 1 and T_c = 0.025.
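Across all datasets, comparable samples are filtered by the two thresholds T_d and T_c. As one plausible reading (the paper does not spell out the exact semantics here), T_d bounds how many discrete features may change between a pair, and T_c bounds the per-feature perturbation of continuous features; the helper below is our hypothetical sketch of such a check, not the paper's implementation.

```python
def is_comparable(x, x_hat, discrete_idx, continuous_idx, T_d=1, T_c=0.025):
    """Hypothetical comparable-sample check.

    Assumption (ours): T_d caps the number of changed discrete features,
    T_c caps the absolute change of each continuous feature.
    """
    n_changed = sum(1 for i in discrete_idx if x[i] != x_hat[i])
    if n_changed > T_d:
        return False
    return all(abs(x[i] - x_hat[i]) <= T_c for i in continuous_idx)
```

For example, with the Adult settings (T_d = 1, T_c = 0.025), a pair that flips one discrete feature and moves a normalized continuous feature by 0.01 would pass, while flipping two discrete features would not.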

B IMPLEMENTATION DETAILS

We elaborate the architecture of our model in detail, using h to denote hidden representations. We use the Adam optimizer. We set the learning rate for the generator g_θ to 2e-4 and for the discriminator d_θ to 2e-4, and the weight decay for g_θ to 1e-6 and for d_θ to 0. We set the batch size to 4096 and the number of training epochs to 500.

g_θ:
    h_1 = ReLU(BatchNorm1d(Linear_→256(x ⊕ s ⊕ z))) ⊕ x ⊕ s ⊕ z
    h_2 = ReLU(BatchNorm1d(Linear_→256(h_1))) ⊕ h_1
    h_3 = ReLU(BatchNorm1d(Linear_→Dim(x)(h_2)))



CONCLUSION

In this paper we studied individual fairness on tabular datasets and focused on an individual fairness definition with rich semantics. We proposed an antidote data generator to learn on-manifold comparable samples, and used the generator to produce antidote data for the purpose of individual fairness. We provided two approaches for equipping either a regular classification pipeline or a distributionally robust optimization paradigm with antidote data. By incorporating generated antidote data, we achieved good individual fairness as well as good tradeoffs between predictive utility and individual fairness.



Figure 1: Box plots for experimental results on the Compas dataset. Experiments in the left three figures use Logistic Regression as the base model, and the right three figures use Neural Networks. The top two rows plot individual fairness results, while the bottom two rows plot the model's utility. Since we set two sensitive attributes for the Compas dataset, we plot three situations for comparable samples according to the sensitive attributes of the two samples, and use logical expressions to denote them. We use 'and' to indicate that neither sensitive attribute is the same between a pair of comparable samples, 'or' to denote that at least one sensitive attribute differs, and 'not' to indicate that both sensitive attributes are the same. The dashed lines in the box plots indicate the arithmetic mean.
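The caption's three logical labels can be evaluated mechanically for any pair; the helper below is a hypothetical illustration of that convention (the function name and attribute encoding are ours), returning all three predicates since 'and' pairs also satisfy 'or'.

```python
def sensitive_relations(s_a, s_b):
    """Evaluate the caption's 'and'/'or'/'not' predicates for a pair of
    sensitive-attribute tuples (hypothetical helper, not the paper's code)."""
    diffs = [a != b for a, b in zip(s_a, s_b)]
    return {
        "and": all(diffs),       # neither attribute is the same
        "or": any(diffs),        # at least one attribute differs
        "not": not any(diffs),   # both attributes are the same
    }
```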

Several methods have been proposed for individual fairness. Sharifi-Malvajerdi et al. (2019) study Average Individual Fairness: they regulate the average error rate for individuals over a series of classification tasks with different targets, and bound the rate for the worst-performing individual. Yurochkin et al. (2020); Yurochkin & Sun (2021); Ruoss et al. (2020); Yeom & Fredrikson (2021) develop models via DRO that iteratively optimize at the samples that most violate fairness. To overcome the difficulty of choosing distance functions, Mukherjee et al. (2020) inherit knowledge of similar/dissimilar pairs of inputs and propose to learn good similarity metrics from data. Ilvento (2020) learns metrics for individual fairness from human judgments, constructing an approximation from a limited number of queries to the arbiter. Petersen et al. (2021) propose a graph smoothing approach that mitigates individual bias based on a similarity graph. Lahoti et al. (2019) propose a probabilistic mapping from inputs to low-rank representations that preserves individual fairness well. To bring individual fairness to more applications, Vargo et al. (2021) study individual fairness in gradient boosting, which is able to work with non-smooth models such as decision trees. Dwork et al. (2020) study individual fairness in a multi-stage pipeline. Maity et al. (2021); John et al. (2020) study model auditing with individual fairness.

    v_i = tanh(Linear_→1(h_3[index for v_i]))            ∀ 0 ≤ i ≤ N_c
    ê_i = gumbel_0.2(Linear_→|d_i|(h_3[index for ê_i]))  ∀ 0 ≤ i ≤ N_c
    d_i = gumbel_0.2(Linear_→|d_i|(h_3[index for d_i]))  ∀ 0 ≤ i ≤ N_d

d_θ:
    h_1 = Dropout_0.5(LeakyReLU_0.2(Linear_→256(x ⊕ x̂ ⊕ (x − x̂))))
    h_2 = Dropout_0.5(LeakyReLU_0.2(Linear_→256(h_1)))
    score = Linear_→1(h_2)
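The gumbel_0.2 operator in the generator's discrete output heads can be read as a Gumbel-Softmax relaxation with temperature τ = 0.2 (our reading; the paper does not define it inline). A minimal pure-Python sketch under that reading:

```python
import math
import random

def gumbel_softmax(logits, tau=0.2, rng=random):
    """Gumbel-Softmax relaxation: perturb logits with Gumbel(0, 1) noise,
    then apply a temperature-scaled softmax. Low tau (e.g. 0.2) pushes the
    output toward a near-one-hot vector over the categories."""
    # Gumbel(0, 1) noise: g = -log(-log(u)), u ~ Uniform(0, 1).
    gumbels = [-math.log(-math.log(rng.random() + 1e-20) + 1e-20) for _ in logits]
    scores = [(l + g) / tau for l, g in zip(logits, gumbels)]
    # Numerically stable softmax over the perturbed, scaled scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]
```

This keeps sampling of the categorical outputs differentiable during generator training, which is the usual motivation for the relaxation.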

Figure 3: The tradeoffs between utility and fairness on the Compas dataset. For SenSeI we iterate the controlling hyperparameter over (1e+3, 5e+3, 1e+4, 5e+4, 1e+5, 2e+5, 5e+5). For LCIFR, we iterate the weight for fairness over (0.1, 1.0, 10.0, 50.0, 100.0). For Anti, we set the antidote data proportion to 110%, 130%, 150%, 167%, 185%, or 206%. For DRO-Anti, we set the antidote data proportion to 129%, 146%, 167%, 184%, 201%, or 222%. Every point is plotted with variances.

Algorithm 1 DRO-Anti: DRO with Antidote Data for Individual Fairness
1: Input: Training data T = {(x_i, y_i)}_N, learning rate η, loss function ℓ
2: Train comparable sample generator g_θ with {x_i}_N and comparable constraints
3: Sample antidote data X using g_θ
4: repeat
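The listing above is truncated after "repeat". As a hedged illustration of what such a loop body typically looks like in DRO-style training (all names below are ours, and the toy logistic model stands in for the paper's classifiers), each step optimizes at the worst-case, i.e. highest-loss, sample among a training point and its antidote comparables:

```python
import math

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_loss(w, x, y):
    p = sigmoid(dot(w, x))
    eps = 1e-12
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

def dro_anti_epoch(w, data, antidote, eta=0.1):
    """One epoch of a sketched DRO-with-antidote-data loop (our assumption
    of the truncated body, not the paper's exact algorithm)."""
    for (x, y), candidates in zip(data, antidote):
        # Pick the worst-case sample among x and its antidote comparables.
        worst = max([x] + candidates, key=lambda xc: logistic_loss(w, xc, y))
        # Gradient step of the logistic loss at the worst-case sample.
        p = sigmoid(dot(w, worst))
        for i in range(len(w)):
            w[i] -= eta * (p - y) * worst[i]
    return w
```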

Experimental results on Adult dataset

Experimental results on Law School dataset

Comparing to randomly generated comparable samples on the Adult dataset. We add randomly generated comparable samples to the original training set. From the results in Table 3, we observe that with only 44.5% antidote data, the model outperforms the one trained with 500% randomly generated comparable samples in terms of individual fairness. With more than 10× data efficiency, these results demonstrate that modeling on-manifold comparable samples is greatly helpful for mitigating individual unfairness.

Learning Efficacy of Antidote Data. In Table 4 we study the model's binary classification performance when training only on generated data. We use Accuracy (Acc.), Balanced Accuracy (Bal. Acc.), and F1 Score (F1) for evaluation. We construct a synthetic training set with the same amount of data as the original training set. We use two baselines. Random Data: randomly generated data that fit the basic constraints of the tabular data. Pert. in SenSeI: we collect perturbations of the original data in every training iteration of SenSeI and uniformly sample from these perturbations.
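Balanced accuracy averages the per-class recalls, which matters on imbalanced tabular benchmarks like Adult. For concreteness, a small helper (ours, not the paper's evaluation code) computing the three reported metrics from confusion-matrix counts:

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, balanced accuracy, and F1 from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    tpr = tp / (tp + fn)          # recall / true positive rate
    tnr = tn / (tn + fp)          # true negative rate
    bal_acc = (tpr + tnr) / 2     # average of per-class recalls
    prec = tp / (tp + fp)
    f1 = 2 * prec * tpr / (prec + tpr)
    return acc, bal_acc, f1
```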

Learning efficacy on Adult dataset (Acc. ↑ / Bal. Acc. ↑ / F1 ↑)

Dataset Statistics. We report dataset statistics, including sample size as well as the numbers of positive and negative comparable samples in the training and testing sets, respectively.

