HYPER-PARAMETER TUNING FOR FAIR CLASSIFICATION WITHOUT SENSITIVE ATTRIBUTE ACCESS

Anonymous

Abstract

Fair machine learning methods seek to train models that balance performance across demographic subgroups defined over sensitive attributes like race and gender. Although sensitive attributes are typically assumed to be known during training, they may not be available in practice due to privacy and other logistical concerns. Recent work has sought to train fair models without sensitive attributes on training data. However, these methods need extensive hyper-parameter tuning to achieve good results, and hence assume that sensitive attributes are known on validation data. This assumption, too, might not be practical. Here, we propose Antigone, a framework to train fair classifiers without access to sensitive attributes on either training or validation data. Instead, we generate pseudo sensitive attributes on the validation data by training a biased classifier and using the classifier's incorrectly (correctly) labeled examples as proxies for minority (majority) groups. Since fairness metrics like demographic parity, equal opportunity and subgroup accuracy can be estimated to within a proportionality constant even with noisy sensitive attribute information, we show theoretically and empirically that these proxy labels can be used to maximize fairness under average accuracy constraints. Key to our results is a principled approach to selecting the hyper-parameters of the biased classifier in a completely unsupervised fashion (that is, without access to ground-truth sensitive attributes) that minimizes the gap between fairness estimated using noisy versus ground-truth sensitive labels.

1. INTRODUCTION

Deep neural networks have achieved state-of-the-art accuracy on many tasks, including face recognition (Buolamwini & Gebru, 2018; Grother et al., 2010; Ngan & Grother, 2015), autonomous driving (Zhang et al., 2021; Chitta et al., 2021), and medical image diagnosis (Litjens et al., 2017; Cheplygina et al., 2019). However, prior work (Hovy & Søgaard, 2015; Oren et al., 2019; Hashimoto et al., 2018a) has found that state-of-the-art networks exhibit unintended biases towards specific population groups, especially harming minority groups. Seminal work by Buolamwini & Gebru (2018) demonstrated, for instance, that commercial face recognition systems had lower accuracy on darker-skinned women than on other groups. A body of work has sought to design fair machine learning algorithms that account for a model's performance on a per-group basis (Prost et al., 2019; Sagawa et al., 2020; Liu et al., 2021; Sohoni et al., 2020).

Much of the prior work assumes that the demographic attributes, like gender and race, with respect to which we seek to train a fair model, and which we refer to as sensitive attributes, are available on training and validation data (Sagawa et al., 2020; Prost et al., 2019). However, a growing body of literature (Veale & Binns, 2017; Holstein et al., 2019) highlights many real-world settings in which sensitive attributes may not be available, for multiple reasons. For example, data subjects may abstain from providing sensitive information to avoid potential discrimination in the future (Markos et al., 2017). In other settings, the attributes on which the model discriminates might not even be known (Citron & Pasquale, 2014; Pasquale, 2015). For instance, in algorithmic hiring decisions, Köchling & Wehner (2020) highlight that bias and discrimination are recognized only after real-world decisions have been made on applicants, because the attributes on which the model discriminates are not known at training time.
Consequently, a large American e-commerce company had to cease using algorithmic tools for hiring purposes because they were unintentionally discriminating against female applicants (Dastin, 2018).

Recent work seeks to train fair classifiers without access to sensitive attributes on the training set (Liu et al., 2021; Creager et al., 2021; Nam et al., 2020; Hashimoto et al., 2018a). The common theme across these methods is to up-weight misclassified examples, either by splitting training into two separate stages (Liu et al., 2021), identifying misclassified examples in stage 1 and up-weighting them in stage 2, or by alternating between these stages across training epochs (Nam et al., 2020), identifying misclassified examples in one epoch and up-weighting them in the next. However, Liu et al. (2021) have shown that these methods are highly sensitive to the choice of hyper-parameters; the up-weighting factor, for example, can have a large impact on the resulting model's fairness. Some methods, therefore, tune hyper-parameters assuming access to sensitive information on the validation dataset. In fact, without this information, Liu et al. (2021) observed that these methods sometimes do worse than standard empirical risk minimization (ERM). But sensitive information on the validation dataset may not be available, for the same reasons it is hard to acquire on training data.

In this paper, we propose Antigone, a simple, principled approach that enables hyper-parameter tuning for fairness without access to sensitive attributes on validation data. Antigone can be used to tune hyper-parameters for any prior method that trains fair models without sensitive attributes on training data, for instance JTT (Liu et al., 2021), LfF (Nam et al., 2020), and CVaR DRO (Hashimoto et al., 2018a), and for several fairness metrics, including demographic parity, equal opportunity and worst subgroup accuracy. We note that these prior methods also address the problem of spurious correlations (Sagawa et al., 2020; Wang & Culotta, 2021) and their impact on accuracy. As such, Antigone can also be used to address spurious correlations, but we focus on fairness in this paper.

Antigone builds on the same intuition as prior work: the misclassified examples of a classifier trained with standard ERM serve as an effective proxy for minority groups.
Accordingly, Antigone trains a biased classifier to act as a noisy sensitive-attribute labeler on the validation data, labeling correctly and incorrectly classified examples as majority and minority groups, respectively. But this raises a key question: how do we select the hyper-parameters of the noisy labeler? Intuitively, to maximize the utility of the noisily labeled validation set, we seek to maximize the fraction of minority (majority) samples in the incorrect (correct) set. Since this cannot be measured directly, Antigone instead maximizes the distance between the data distributions of the two sets, which we measure using the Euclidean distance between the means (EDM) of the two distributions. We provide theoretical justification for this choice under the mutually contaminated (MC) noise model (Scott et al., 2013), which assumes that a fraction of majority (minority) group labels are contaminated with labels from the minority (majority) group. Lamy et al. (2019) show that common fairness metrics can be estimated up to a proportionality constant under the MC model. We show that Antigone's EDM criterion maximizes this proportionality constant, thus providing the most reliable estimates of fairness.

We evaluate Antigone in conjunction with JTT (Liu et al., 2021) on the CelebA, Waterbirds and Adult datasets, which are commonly used in the fairness literature. We compare Antigone with baselines that assume ground-truth knowledge of sensitive attributes and with standard ERM on demographic parity, equal opportunity, and worst subgroup accuracy. Antigone significantly closes the fairness gap between standard ERM training and fair training with ground-truth sensitive attributes. Compared with GEORGE, which estimates majority/minority group labels by clustering the activations of an ERM model, Antigone produces more accurate labels and results in improved fairness. Ablation studies demonstrate the effectiveness of Antigone's EDM-based hyper-parameter tuning.
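The pseudo-labeling and EDM-based selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`pseudo_sensitive_labels`, `edm`, `select_by_edm`) are hypothetical, the distance is computed on whatever feature vectors are passed in, and the means are taken over the whole validation set. The text does not specify the representation or whether the distance is computed per target class, so both are assumptions here.

```python
import math


def pseudo_sensitive_labels(y_true, y_pred):
    """Proxy group per example: 1 (minority) if misclassified, else 0 (majority)."""
    return [int(t != p) for t, p in zip(y_true, y_pred)]


def mean_vector(rows):
    """Coordinate-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def edm(features, pseudo_labels):
    """Euclidean distance between the means of the incorrect and correct sets."""
    wrong = [x for x, g in zip(features, pseudo_labels) if g == 1]
    right = [x for x, g in zip(features, pseudo_labels) if g == 0]
    if not wrong or not right:
        # Assumption: a labeler that produces only one set carries no signal.
        return 0.0
    return math.dist(mean_vector(wrong), mean_vector(right))


def select_by_edm(candidates, features, y_true):
    """Pick the candidate labeler whose validation predictions maximize EDM.

    `candidates` maps a hyper-parameter configuration name to that
    configuration's predictions on the validation set (hypothetical interface).
    """
    scores = {
        name: edm(features, pseudo_sensitive_labels(y_true, y_pred))
        for name, y_pred in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores
```

No ground-truth sensitive attribute enters the selection: only validation inputs, target labels, and each candidate's predictions are used, which is what makes the tuning unsupervised with respect to the sensitive attribute.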

2. PROPOSED METHODOLOGY

We now describe Antigone, starting with the problem formulation (Section 2.1) followed by a description of the Antigone algorithm (Section 2.2).

2.1. PROBLEM SETUP

Consider a data distribution over the set $D = X \times A \times Y$ of triplets of input data ($X$), sensitive attributes ($A$) and target labels ($Y$). We are given a training set $D^{tr} = \{x_i^{tr}, a_i^{tr}, y_i^{tr}\}_{i=1}^{N^{tr}}$ with $N^{tr}$ training samples, and a validation set $D^{val} = \{x_i^{val}, a_i^{val}, y_i^{val}\}_{i=1}^{N^{val}}$ with $N^{val}$ validation samples. We will assume binary sensitive attributes ($A \in \{0, 1\}$) and target labels ($Y \in \{0, 1\}$). We note that for now Antigone is limited to binary sensitive attributes, but can be extended to multiple target labels.
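For concreteness, the three fairness metrics named in the introduction can be computed from binary predictions, target labels and sensitive attributes as in the sketch below. The definitions used are the standard textbook ones (differences in positive prediction rates and the minimum per-subgroup accuracy); that they match the exact forms used later in this paper is an assumption, and all function names are hypothetical. Pseudo sensitive attributes can be passed in place of `a` when ground-truth attributes are unavailable.

```python
def rate(flags):
    """Fraction of True values; NaN for an empty group."""
    return sum(flags) / len(flags) if flags else float("nan")


def demographic_parity_gap(y_pred, a):
    """|P(pred = 1 | a = 0) - P(pred = 1 | a = 1)|."""
    r0 = rate([p == 1 for p, g in zip(y_pred, a) if g == 0])
    r1 = rate([p == 1 for p, g in zip(y_pred, a) if g == 1])
    return abs(r0 - r1)


def equal_opportunity_gap(y_pred, y_true, a):
    """|P(pred = 1 | y = 1, a = 0) - P(pred = 1 | y = 1, a = 1)|."""
    r0 = rate([p == 1 for p, t, g in zip(y_pred, y_true, a) if t == 1 and g == 0])
    r1 = rate([p == 1 for p, t, g in zip(y_pred, y_true, a) if t == 1 and g == 1])
    return abs(r0 - r1)


def worst_subgroup_accuracy(y_pred, y_true, a):
    """Minimum accuracy over the non-empty (sensitive attribute, target label) subgroups."""
    accuracies = []
    for group in (0, 1):
        for label in (0, 1):
            hits = [
                p == t
                for p, t, g in zip(y_pred, y_true, a)
                if g == group and t == label
            ]
            if hits:
                accuracies.append(rate(hits))
    return min(accuracies)
```

Lower gaps indicate a fairer classifier under the first two metrics, while a higher value is better for worst subgroup accuracy.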




