EVALUATING FAIRNESS WITHOUT SENSITIVE ATTRIBUTES: A FRAMEWORK USING ONLY AUXILIARY MODELS

Anonymous authors
Paper under double-blind review

Abstract

Although the volume of literature and public attention on machine learning fairness has grown significantly in recent years, in practice even tasks as basic as measuring fairness, the first step in studying and promoting fairness, can be challenging. This is because the sensitive attributes are often unavailable in a machine learning system due to privacy regulations. The straightforward solution is to use auxiliary models to predict the missing sensitive attributes. However, our theoretical analyses show that the estimation error of the directly measured fairness metrics is proportional to the error rates of the auxiliary models' predictions. Existing works that attempt to reduce the estimation error often require strong assumptions, e.g., access to the ground-truth sensitive attributes in a subset of samples, i.i.d. auxiliary training data and target data, or some form of conditional independence. In this paper, we drop those assumptions and propose a framework that uses only off-the-shelf auxiliary models. The main challenge is how to reduce the negative impact of imperfectly predicted sensitive attributes on the fairness metrics without knowing the ground-truth sensitive attribute values. Inspired by the noisy label learning literature, we first derive a closed-form relationship between the directly measured fairness metrics and their corresponding ground-truth metrics. We then estimate key statistics (most importantly, the transition matrix from the noisy label literature), which we use, together with the derived relationship, to calibrate the fairness metrics. Our framework can be applied to all popular group fairness definitions as well as multi-class classifiers and multi-category sensitive attributes.
In addition, we prove an upper bound on the estimation error of our calibrated metrics and show that our method can substantially decrease the estimation error, especially when the auxiliary models are inaccurate or the target model is highly biased. Experiments on COMPAS and CelebA validate our theoretical analyses and show that our method measures fairness significantly more accurately than baselines under favorable circumstances.

1. INTRODUCTION

Despite the large body of literature on machine learning fairness (Corbett-Davies & Goel, 2018), in practice even measuring fairness, the first step in studying and mitigating unfairness, can be challenging, as it requires access to the sensitive attributes of samples, which are often unavailable due to privacy regulations (Andrus et al., 2021; Holstein et al., 2019; Veale & Binns, 2017). This is a problem the industry faces, and it significantly slows down the progress of studying and promoting fairness. Existing methods to estimate fairness without access to ground-truth sensitive attributes mostly fall into two categories. First, some methods assume access to the ground-truth sensitive attributes on a subset of samples, or the ability to label them if unavailable; e.g., YouTube asks its creators to voluntarily provide their demographic information (Wojcicki, 2021). But this either requires labeling resources or depends on volunteering willingness, and the resulting measured fairness can be inaccurate due to sampling bias. Second, many works assume there exists an auxiliary dataset that can be used to train models to predict the missing sensitive attributes on the target dataset (i.e., the dataset on which we want to measure fairness), e.g., Meta (Alao et al., 2021) and others (Elliott et al., 2009; Awasthi et al., 2021; Diana et al., 2022). However, they often need to assume that the auxiliary dataset and the target dataset are i.i.d., along with some form of conditional independence, which is not realistic. In addition, since the auxiliary dataset also contains sensitive information (i.e., the sensitive labels), it may become increasingly difficult to obtain such training data from open-source projects given today's stringent privacy regulations. Note that, similar to our work, some researchers also draw insight from the noisy label literature (Lamy et al., 2019; Celis et al., 2021; Awasthi et al., 2020).
But they assume the noise on sensitive attributes follows assumptions such as conditional independence or known transition probabilities. Furthermore, their goal is to mitigate bias rather than to estimate the fairness disparity. We emphasize the value of estimating fairness because the metric is vital in reporting and studying fairness in real-world systems. In this work, we drop many commonly made assumptions, i.e., 1) access to labeling resources, 2) access to the auxiliary model's training data, 3) i.i.d. data, and 4) conditional independence. Instead, we rely only on off-the-shelf auxiliary models, which can be easily obtained from various open-source projects (without their training data). The requirements on the auxiliary model are also flexible: we do not need its input to share exactly the same feature set as the target data, only that its input features have some overlap with the target dataset's features 1. Our contributions are summarized as follows.

• We theoretically show that directly using auxiliary models to estimate fairness (by predicting the missing sensitive attributes) leads to a fairness metric whose estimation error is proportional to the prediction error of the auxiliary models and the true fairness disparity (Theorem 1, Corollary 1).

• Motivated by this finding, we propose a general framework (Figure 1, Algorithm 1) to calibrate the noisy fairness metrics using auxiliary models only. The framework is based on a derived closed-form relationship between the directly estimated noisy fairness metrics and their corresponding ground-truth metrics (Theorem 2) in terms of two key statistics that are well studied in the noisy label literature: the transition matrix and the clean prior probability. To estimate them, our framework can leverage any existing estimator; we show an example by adapting HOC (Zhu et al., 2021b) (Algorithm 2). The estimator only assumes that auxiliary models are informative and that different auxiliary models make i.i.d. predictions.

• We prove an upper bound on the error of our estimation (Theorem 3), and show that, in a simplified case, our estimated fairness metrics are guaranteed to be closer to the true metrics than the uncalibrated noisy metrics when the auxiliary models are inaccurate or the target model is biased (Corollary 2).

• Experiments on COMPAS and CelebA consolidate our theoretical findings and show that our calibrated fairness metrics are significantly more accurate than baselines under favorable circumstances.
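The intuition behind the first two bullets can be illustrated numerically in a simplified binary setting. The sketch below is not the paper's actual method: all numbers are hypothetical, and it uses a conditional-independence simplification (the noisy attribute depends on the data only through the true attribute) purely for illustration, whereas the paper's derivation (Theorem 2) avoids such assumptions. It shows how a noisy attribute attenuates the measured disparity and how inverting the transition matrix recovers it.

```python
import numpy as np

# Hypothetical setup: M = 2 groups, binary target model.
# T[a, a'] = P(A_noisy = a' | A = a): the transition matrix.
T = np.array([[0.8, 0.2],
              [0.3, 0.7]])
p_clean = np.array([0.6, 0.4])   # clean prior P(A = a)
r_clean = np.array([0.7, 0.3])   # true (unobserved) P(f(X)=1 | A=a)

# Forward model: P(f=1, A_noisy=a') = sum_a T[a, a'] * P(f=1, A=a),
# assuming (for illustration only) A_noisy is independent of f(X) given A.
j_clean = r_clean * p_clean      # joint P(f=1, A=a)
j_noisy = T.T @ j_clean          # joint P(f=1, A_noisy=a')
p_noisy = T.T @ p_clean          # prior P(A_noisy=a')
r_noisy = j_noisy / p_noisy      # directly measured P(f=1 | A_noisy=a')

# Directly measured (uncalibrated) DP gap is attenuated relative to the
# true gap of |0.7 - 0.3| = 0.4, consistent with Theorem 1's message:
dp_noisy = abs(r_noisy[0] - r_noisy[1])

# Calibration: invert the closed-form relationship to recover the gap.
j_recovered = np.linalg.solve(T.T, j_noisy)
r_recovered = j_recovered / p_clean
dp_calibrated = abs(r_recovered[0] - r_recovered[1])
```

Here the uncalibrated gap `dp_noisy` is 0.2, half the true disparity, while `dp_calibrated` recovers 0.4 exactly; in the paper's setting, of course, `T` and `p_clean` are themselves estimated (e.g., via the adapted HOC estimator) rather than given.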

2. PRELIMINARIES

Consider a K-class classification problem with target dataset D◦ := {(x_n, y_n) | n ∈ [N]}, where N is the number of instances, x_n is the feature, and y_n is the label. Denote by X the feature space, Y = [K] := {1, 2, ..., K} the label space, and (X, Y) the random variables of (x_n, y_n), ∀n. The target model f : X → [K] maps X to a predicted class f(X) ∈ [K]. We aim at measuring group fairness conditioned on a sensitive attribute A ∈ [M] := {1, 2, ..., M}, which is unavailable in D◦. Denote the dataset with ground-truth sensitive attributes by D := {(x_n, y_n, a_n) | n ∈ [N]} and the joint distribution of (X, Y, A) by D. The task is to estimate the fairness metrics of f on D◦ without sensitive attributes such that the resulting metrics are as close as possible to the fairness metrics evaluated on D (with ground-truth A). See Appendix A.1 for a summary of notations.

Fairness Definitions. We consider three group fairness definitions (Wang et al., 2020; Cotter et al., 2019) and their corresponding measurable metrics: demographic parity (DP) (Calders et al., 2009; Chouldechova, 2017), equalized odds (EOd) (Woodworth et al., 2017), and equalized opportunity (EOp) (Hardt et al., 2016). To save space, all discussions in the main paper are specific to DP; we include the complete derivations for EOd and EOp in the Appendix. The DP metric is defined as follows.

Definition 1 (Demographic Parity). The demographic parity metric of f on D conditioned on A is:

Δ_DP(D, f) := (1 / (M(M−1)K)) Σ_{a,a′∈[M], k∈[K]} |P(f(X) = k | A = a) − P(f(X) = k | A = a′)|.

1 For example, if the target dataset contains features about user information (name, location, interests, etc.), our method is applicable as long as the auxiliary model can take any one of those features as input and predict the sensitive attribute, e.g., predicting race from name.
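For concreteness, the DP metric of Definition 1 can be estimated from empirical conditional frequencies. The following is a minimal sketch; the function name and the small example arrays are ours, not the paper's.

```python
import numpy as np

def demographic_parity_gap(preds, attrs, K, M):
    """Empirical DP metric of Definition 1.

    preds: predicted classes f(x_n) in {0, ..., K-1}
    attrs: sensitive attributes a_n in {0, ..., M-1}
    Returns (1 / (M(M-1)K)) * sum over ordered pairs (a, a') and
    classes k of |P(f(X)=k | A=a) - P(f(X)=k | A=a')|.
    """
    preds, attrs = np.asarray(preds), np.asarray(attrs)
    # Empirical P(f(X)=k | A=a) as an M x K matrix.
    cond = np.zeros((M, K))
    for a in range(M):
        mask = attrs == a
        cond[a] = np.bincount(preds[mask], minlength=K) / mask.sum()
    gap = 0.0
    for a in range(M):
        for a2 in range(M):
            if a != a2:
                gap += np.abs(cond[a] - cond[a2]).sum()
    return gap / (M * (M - 1) * K)

# Toy example: K = 2 classes, M = 2 groups of 4 samples each.
preds = np.array([1, 1, 0, 1, 0, 0, 0, 0])
attrs = np.array([0, 0, 0, 0, 1, 1, 1, 1])
dp = demographic_parity_gap(preds, attrs, K=2, M=2)  # -> 0.75
```

In the toy example, group 0 is predicted positive with rate 3/4 and group 1 with rate 0, and the normalization averages over both ordered pairs and both classes. Note that the sum over a, a′ counts each unordered pair twice, which the 1/(M(M−1)) factor accounts for.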

