FARE: PROVABLY FAIR REPRESENTATION LEARNING

Abstract

Fair representation learning (FRL) is a popular class of methods aiming to produce fair classifiers via data preprocessing. However, recent work has shown that prior methods achieve worse accuracy-fairness tradeoffs than originally suggested by their results. This dictates the need for FRL methods that provide provable upper bounds on the unfairness of any downstream classifier, a challenge that remains unsolved. In this work we address this challenge and propose Fairness with Restricted Encoders (FARE), the first FRL method with provable fairness guarantees. Our key insight is that restricting the representation space of the encoder enables us to derive suitable fairness guarantees, while allowing empirical accuracy-fairness tradeoffs comparable to prior work. FARE instantiates this idea with a tree-based encoder, a choice motivated by the inherent advantages of decision trees in our setting. Crucially, we develop and apply a practical statistical procedure that computes a high-confidence upper bound on the unfairness of any downstream classifier. In our experimental evaluation on several datasets and settings, we demonstrate that FARE produces tight upper bounds, often comparable with empirical results of prior methods, which establishes the practical value of our approach.

1. INTRODUCTION

It has been repeatedly shown that machine learning systems deployed in real-world applications propagate training data biases, producing discriminatory predictions (Buolamwini & Gebru, 2018; Corbett-Davies et al., 2017; Kleinberg et al., 2017; Tatman & Kasten, 2017). This is especially concerning in decision-making applications on data that represents humans (e.g., financial or medical), where it can lead to unfavorable treatment that negatively affects certain subgroups of the population (Brennan et al., 2009; Khandani et al., 2010; Barocas & Selbst, 2016). For instance, a loan prediction system deployed by a financial institution might recommend loan rejection based on a sensitive attribute of a client, such as race or gender. These observations have forced regulators into action, leading to directives (FTC, 2021; EU, 2021) which demand that parties aiming to deploy such systems ensure fairness (Dwork et al., 2012) of their predictions. Mitigation of unfairness has become a key concern for organizations, showing the highest increase in perceived relevance over the previous year out of all potential risks of artificial intelligence (Chui et al., 2021; Benaich & Hogarth, 2021).

Fair representation learning. A promising approach that attempts to address this issue is fair representation learning (FRL) (Zemel et al., 2013; Moyer et al., 2018; Madras et al., 2018; Gupta et al., 2021; Kim et al., 2022; Shui et al., 2022; Balunović et al., 2022), a long line of work that preprocesses the data using an encoder f, transforming each datapoint x ∈ X into a debiased representation z. The key promise of FRL is that these debiased representations can be given to other parties who want to solve a prediction task without being aware of fairness (or who are potentially even fine with discriminating), while ensuring that any downstream classifier they train on these representations has favorable fairness.
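The FRL pipeline described above can be sketched in a few lines. This is a minimal illustration with hypothetical interfaces (`publish_representations` and `downstream_party` are not from the paper): a data owner applies the encoder f once, and downstream parties then train arbitrary classifiers g on the representations, never seeing the raw data.

```python
# Minimal FRL pipeline sketch (hypothetical interfaces, for illustration only).

def publish_representations(f, X):
    """Data owner: transform each raw datapoint x into a debiased z = f(x)."""
    return [f(x) for x in X]

def downstream_party(train_classifier, Z, y):
    """Downstream party: trains any model g on (Z, y), unaware of fairness."""
    return train_classifier(Z, y)

# Example with a toy quantizing encoder standing in for a learned f:
Z = publish_representations(lambda x: x // 10, [5, 23, 47])  # -> [0, 2, 4]
```

The point of the separation is that fairness must be a property of Z itself: the data owner has no control over which classifier g the downstream party chooses.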
However, recent work (Xu et al., 2020; Song & Shmatikov, 2020; Gupta et al., 2021) has demonstrated that for some FRL methods it is possible to train significantly more unfair classifiers than originally claimed. This illuminates a major drawback of all existing work: the claim about fairness of the downstream classifiers holds only for the models considered during the evaluation, and does not guarantee favorable fairness of other downstream classifiers trained on z. This is insufficient for critical applications where fairness must be guaranteed or is enforced by regulations, leading to our key question: Can we create an FRL method that provably bounds the unfairness of any downstream classifier?

Figure 1: Overview of our provably fair representation learning method, FARE. The input dataset is transformed into fair representations using a restricted encoder. Our method can compute a provable upper bound T on the unfairness of any classifier g ∈ G trained on these representations.

The most prominent prior attempt to tackle this question, and the work most closely related to ours, is FNF (Balunović et al., 2022); we discuss other related work in Section 2. Assuming two groups s = 0 and s = 1 based on the sensitive attribute s, FNF shows that knowing the input distribution of each group can lead to an upper bound on the unfairness of any downstream classifier. While this work is an important step towards provable fairness, the required assumption is unrealistic for most machine learning settings and represents an obstacle to applying the approach in practice. Thus, the original problem of creating FRL methods that provide fairness guarantees remains largely unsolved.

This work: provably fair representation learning

We propose FARE (Fairness with Restricted Encoders, Fig. 1), the first FRL method that offers provable upper bounds on the unfairness of any downstream classifier g trained on its representations, without unrealistic prior assumptions. Our key insight is that using an encoder with restricted representations, i.e., limiting the possible representations to a finite set {z_1, ..., z_k}, allows us to derive a practical statistical procedure that computes a high-confidence upper bound on the unfairness of any g, detailed in Section 4. FARE instantiates this idea with a suitable encoder based on fair decision trees (see Section 5), leading to a practical end-to-end FRL method which produces debiased representations augmented with strong fairness guarantees.

More concretely, FARE takes as input a set of samples {x^(1), ..., x^(n)} from the input distribution X (left of Fig. 1) and partitions the input space into k cells (middle plane, k = 3 in this example) using the decision tree encoder. Finally, all samples from the same cell i are transformed into the same representation z_i (right). As usual in FRL, training a downstream classifier on the representations leads to lower empirical unfairness, while slightly sacrificing accuracy on the prediction task. However, the main advantage of FARE is that the restricted set of representations allows us to use the given samples to estimate the distribution of the two sensitive groups in each cell, i.e., to compute empirical estimates of the conditional probabilities P(s=0 | z_i) and P(s=1 | z_i) (solid orange bars) for all z_i. Further, we can use confidence intervals to obtain upper bounds on these values that hold with high probability (transparent bars). As noted above, this in turn leads to the key feature of our method: a tight upper bound T on the unfairness of any g ∈ G, where G is the set of all downstream classifiers that can be trained on the resulting representations.
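The estimation step above can be illustrated with a short sketch. We assume cell assignments produced by some already-trained tree encoder, and we use a simple Hoeffding-style interval with a union bound over cells purely for illustration; the paper's actual procedure (Section 4) constructs its own confidence intervals, and the function name `cell_group_upper_bounds` is ours, not the paper's.

```python
import math
from collections import defaultdict

def cell_group_upper_bounds(cells, sensitive, delta=0.05):
    """For each cell i of the restricted encoder, estimate P(s=1 | z_i)
    empirically and return a high-confidence upper bound on it.

    Illustrative sketch only: uses a Hoeffding-style interval with a union
    bound over the k cells. `cells[j]` is the cell index of sample j,
    `sensitive[j]` in {0, 1} is its sensitive group."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [n_total, n_group1]
    for c, s in zip(cells, sensitive):
        counts[c][0] += 1
        counts[c][1] += s
    k = len(counts)  # number of occupied cells
    bounds = {}
    for c, (n, n1) in counts.items():
        p_hat = n1 / n                                      # empirical P(s=1 | z_c)
        eps = math.sqrt(math.log(2 * k / delta) / (2 * n))  # interval half-width
        bounds[c] = min(1.0, p_hat + eps)                   # holds w.p. >= 1 - delta
    return bounds
```

Because each cell pools many samples, these per-cell estimates are the quantities the transparent bars in Fig. 1 depict, and they are the ingredients from which the overall unfairness bound T is assembled.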
As we elaborate later, increasing the number of samples n makes the bounds tighter. Given the current trend of rapidly growing datasets, this further illustrates the practical value of FARE. In our experimental evaluation in Section 6 we empirically demonstrate that FARE produces tight upper bounds on real datasets, i.e., the unfairness of any downstream classifier trained on FARE representations is tightly upper-bounded, which was not possible for any previously proposed FRL method. Moreover, these downstream classifiers achieve empirical accuracy-fairness tradeoffs comparable to methods from prior work. We believe this work represents a major step towards solving the important problem of preventing discriminatory machine learning models.
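The effect of the sample size on the guarantee can be made concrete through the width of the confidence intervals: for a standard Hoeffding-style interval (used here only as an illustration, not the paper's exact construction) the half-width shrinks as O(1/√n), so quadrupling the data roughly halves the slack added to each per-cell estimate.

```python
import math

def interval_half_width(n, k=3, delta=0.05):
    """Half-width of a Hoeffding-style confidence interval shared across
    k cells at confidence 1 - delta (illustrative). Shrinks as O(1/sqrt(n)),
    which is why larger datasets yield tighter unfairness bounds."""
    return math.sqrt(math.log(2 * k / delta) / (2 * n))

# Quadrupling n halves the half-width:
#   interval_half_width(40_000) == interval_half_width(10_000) / 2
```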

