FARE: PROVABLY FAIR REPRESENTATION LEARNING

Abstract

Fair representation learning (FRL) is a popular class of methods aiming to produce fair classifiers via data preprocessing. However, recent work has shown that prior methods achieve worse accuracy-fairness tradeoffs than originally suggested by their results. This dictates the need for FRL methods that provide provable upper bounds on the unfairness of any downstream classifier, a challenge yet unsolved. In this work we address this challenge and propose Fairness with Restricted Encoders (FARE), the first FRL method with provable fairness guarantees. Our key insight is that restricting the representation space of the encoder enables us to derive suitable fairness guarantees, while allowing empirical accuracy-fairness tradeoffs comparable to prior work. FARE instantiates this idea with a tree-based encoder, a choice motivated by the inherent advantages of decision trees in our setting. Crucially, we develop and apply a practical statistical procedure that computes a high-confidence upper bound on the unfairness of any downstream classifier. In our experimental evaluation on several datasets and settings, we demonstrate that FARE produces tight upper bounds, often comparable to the empirical results of prior methods, which establishes the practical value of our approach.

1. INTRODUCTION

It has been repeatedly shown that machine learning systems deployed in real-world applications propagate training data biases, producing discriminatory predictions (Buolamwini & Gebru, 2018; Corbett-Davies et al., 2017; Kleinberg et al., 2017; Tatman & Kasten, 2017). This is especially concerning in decision-making applications on data that represents humans (e.g., financial or medical), and can lead to unfavorable treatment that negatively affects certain subgroups of the population (Brennan et al., 2009; Khandani et al., 2010; Barocas & Selbst, 2016). For instance, a loan prediction system deployed by a financial institution might recommend loan rejection based on a sensitive attribute of a client, such as race or gender. These observations have forced regulators into action, leading to directives (FTC, 2021; EU, 2021) which demand that parties aiming to deploy such systems ensure fairness (Dwork et al., 2012) of their predictions. Mitigation of unfairness has become a key concern for organizations, with the highest increase in perceived relevance over the previous year out of all potential risks of artificial intelligence (Chui et al., 2021; Benaich & Hogarth, 2021).

Fair representation learning. A promising approach that attempts to address this issue is fair representation learning (FRL) (Zemel et al., 2013; Moyer et al., 2018; Madras et al., 2018; Gupta et al., 2021; Kim et al., 2022; Shui et al., 2022; Balunović et al., 2022), a long line of work that preprocesses the data using an encoder f, transforming each datapoint x ∈ X into a debiased representation z. The key promise of FRL is that these debiased representations can be given to other parties, who want to solve a prediction task without being aware of fairness (or potentially even being fine with discriminating), while ensuring that any downstream classifier they train on these representations has favorable fairness.
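To make the FRL pipeline concrete, the following is a minimal, purely illustrative sketch (not the FARE method itself): a toy encoder f produces representations z by discarding the feature correlated with the sensitive attribute s, a downstream party then trains an arbitrary classifier on z, and we measure the resulting demographic parity gap. All names and the data-generating setup here are our own assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 3 features; the sensitive attribute s shifts only the first feature.
n = 1000
s = rng.integers(0, 2, size=n)                       # sensitive attribute (group membership)
x = rng.normal(size=(n, 3)) + s[:, None] * np.array([1.0, 0.0, 0.0])

def encoder(x):
    """Illustrative 'debiasing' encoder f: X -> Z.

    Here it simply drops the feature correlated with s; real FRL
    methods instead learn this mapping (e.g., FARE uses a tree-based
    encoder with a restricted representation space).
    """
    return x[:, 1:]

z = encoder(x)

# A downstream party trains any classifier on z; here, a trivial threshold rule
# standing in for an arbitrary (possibly adversarial) model.
y_hat = (z[:, 0] > 0).astype(int)

# Demographic parity gap of the downstream classifier:
# |P(y_hat = 1 | s = 0) - P(y_hat = 1 | s = 1)|
dp_gap = abs(y_hat[s == 0].mean() - y_hat[s == 1].mean())
print(f"demographic parity gap: {dp_gap:.3f}")
```

Because z is independent of s in this toy setup, the gap is small; the point of a provable FRL method is to certify such a bound for every downstream classifier, not just the one evaluated.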
However, recent work (Xu et al., 2020; Song & Shmatikov, 2020; Gupta et al., 2021) has demonstrated that for some FRL methods it is possible to train significantly more unfair classifiers than originally claimed. This illuminates a major drawback of all existing work: their claims about the fairness of downstream classifiers hold only for the models considered during evaluation, and do not guarantee favorable fairness of other downstream classifiers trained on z. This is insufficient for critical applications where fairness must be guaranteed or is enforced by regulations, leading to our key question: Can we create an FRL method that provably bounds the unfairness of any downstream classifier?

