EVALUATING FAIRNESS WITHOUT SENSITIVE ATTRIBUTES: A FRAMEWORK USING ONLY AUXILIARY MODELS

Anonymous authors
Paper under double-blind review

Abstract

Although the volume of literature and public attention on machine learning fairness has grown significantly in recent years, in practice even tasks as basic as measuring fairness, the first step in studying and promoting fairness, can be challenging. This is because sensitive attributes are often unavailable in a machine learning system due to privacy regulations. The straightforward solution is to use auxiliary models to predict the missing sensitive attributes. However, our theoretical analyses show that the estimation error of the directly measured fairness metrics is proportional to the error rates of the auxiliary models' predictions. Existing works that attempt to reduce this estimation error often require strong assumptions, e.g., access to the ground-truth sensitive attributes for a subset of samples, that the auxiliary models' training data and the target data are i.i.d., or some form of conditional independence. In this paper, we drop those assumptions and propose a framework that uses only off-the-shelf auxiliary models. The main challenge is how to reduce the negative impact of imperfectly predicted sensitive attributes on the fairness metrics without knowing the ground-truth sensitive attribute values. Inspired by the noisy label learning literature, we first derive a closed-form relationship between the directly measured fairness metrics and their corresponding ground-truth metrics. We then estimate some key statistics (most importantly, the transition matrix from the noisy label literature), which we use, together with the derived relationship, to calibrate the fairness metrics. Our framework can be applied to all popular group fairness definitions as well as multi-class classifiers and multi-category sensitive attributes.
In addition, we theoretically prove an upper bound on the estimation error of our calibrated metrics and show that our method can substantially decrease the estimation error, especially when the auxiliary models are inaccurate or the target model is highly biased. Experiments on COMPAS and CelebA validate our theoretical analyses and show that our method measures fairness significantly more accurately than baselines under favorable circumstances.
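To make the calibration idea concrete, the following is a minimal sketch for demographic parity with a binary sensitive attribute. It assumes a known transition matrix T (in the paper this matrix must itself be estimated) and, for simplicity only, that the auxiliary model's prediction errors are independent of the target model's predictions; the counts and matrix entries below are illustrative, not from the paper.

```python
import numpy as np

# Transition matrix: T[i, j] = P(predicted group j | true group i).
# Assumed known here; estimating it is part of the paper's framework.
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Observed counts on the target data, grouped by the *predicted* attribute:
# total samples and positively-classified samples per predicted group.
n_obs   = np.array([600.0, 400.0])
pos_obs = np.array([300.0, 100.0])

# The observed counts mix the true per-group counts through T:
#     n_obs = T.T @ n_true,   pos_obs = T.T @ pos_true
# (the second identity uses the simplifying independence assumption above).
# Inverting this linear relationship recovers the true-group counts.
n_true   = np.linalg.solve(T.T, n_obs)
pos_true = np.linalg.solve(T.T, pos_obs)

rate_obs  = pos_obs / n_obs      # naive, uncalibrated positive rates
rate_true = pos_true / n_true    # calibrated positive rates

print("naive demographic-parity gap:     ", abs(rate_obs[0] - rate_obs[1]))
print("calibrated demographic-parity gap:", abs(rate_true[0] - rate_true[1]))
```

In this toy example the naive gap (0.25) understates the calibrated gap (0.35): misclassified group memberships blend the two groups' statistics together, shrinking the apparent disparity, which the linear-system inversion undoes.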

1. INTRODUCTION

Despite the extensive literature on machine learning fairness (Corbett-Davies & Goel, 2018), in practice even measuring fairness, the first step in studying and mitigating unfairness, can be challenging, as it requires access to samples' sensitive attributes, which are often unavailable due to privacy regulations (Andrus et al., 2021; Holstein et al., 2019; Veale & Binns, 2017). This is a problem the industry faces, and it significantly slows the progress of studying and promoting fairness. Existing methods to estimate fairness without access to ground-truth sensitive attributes mostly fall into two categories. First, some methods assume access to the ground-truth sensitive attributes for a subset of samples, or that such labels can be collected when unavailable, e.g., YouTube asks its creators to voluntarily provide their demographic information (Wojcicki, 2021). However, this either requires labeling resources or depends on users' willingness to volunteer, and the resulting fairness measurement can be inaccurate due to sampling bias. Second, many works assume there exists an auxiliary dataset that can be used to train models to predict the missing sensitive attributes on the target dataset (i.e., the dataset on which we want to measure fairness), e.g., Meta (Alao et al., 2021) and others (Elliott et al., 2009; Awasthi et al., 2021; Diana et al., 2022). However, they often need to assume the aux-

