ZERO-SHOT FAIRNESS WITH INVISIBLE DEMOGRAPHICS

Abstract

In a statistical notion of algorithmic fairness, we partition individuals into groups based on key demographic factors such as race and gender, and require that some statistic of a classifier be approximately equalized across those groups. Current approaches either require complete annotations for the demographic factors or focus on an abstract worst-off group rather than on demographic groups. In this paper, we consider the setting where the demographic factors are only partially available. For example, we may have training examples for white-skinned and dark-skinned males and for white-skinned females, but zero examples for dark-skinned females; we may even have zero examples for females regardless of their skin color. Without additional knowledge, it is impossible to directly control the discrepancy of the classifier's statistics for those invisible groups. We develop a disentanglement algorithm that, using a context dataset, splits a representation of the data into one component that captures the demographic factors and another component that is invariant to them. The context dataset is much like the deployment dataset: it is unlabeled, but it contains individuals from all demographics, including the invisible ones. We cluster the context set, equalize the cluster sizes to form a "perfect batch", and use it as a supervision signal for the disentanglement. We propose a new discriminator loss, based on a learnable attention mechanism, that distinguishes a perfect batch from a non-perfect one. We evaluate our approach on standard classification benchmarks and show that it is indeed possible to protect the invisible demographics.

1. INTRODUCTION

Machine learning is already involved in decision-making processes that affect people's lives, such as screening job candidates (Raghavan et al., 2020) and pricing credit (Hurley & Adebayo, 2017). Improved efficiency, reduced costs, and greater personalization of services and products are some of the drivers for the widespread development and deployment of machine learning algorithms. Algorithms such as classifiers, however, are trained from large amounts of labeled data, and can therefore encode and even reinforce past discriminatory practices that are present in the data. A classifier might treat some groups of individuals unfavorably, for example denying credit on the grounds of language, gender, age, or their combined effect. Algorithmic fairness aims at building machine learning algorithms that take biased datasets and output fair, unbiased decisions for people with differing protected attributes, such as race, gender, and age. A typical setting of algorithmic fairness is as follows. We are given a training set of observations x ∈ X, their corresponding protected attributes s ∈ S, and the target labels y ∈ Y for learning a classifier. In a statistical notion of algorithmic fairness, e.g. (Kamiran & Calders, 2012a; Hardt et al., 2016; Zafar et al., 2017), we control the discrepancy of a classifier's loss across a small number of demographic groups defined on the protected attributes. Recently, several works have considered the setting where protected attributes are unknown (Kearns et al., 2018; Hashimoto et al., 2018; Khani et al., 2019). They aim to control the losses of groups whose size is greater than some predefined value. These works focus on an abstract worst-off group rather than demographic groups; it has been noted that the implied worst-off groups may differ from well-specified demographic groups who are known to suffer from past discriminatory practices (Hashimoto et al., 2018).
We are interested in the setting that lies between having complete annotations for demographic groups and having none. In this paper, we introduce algorithmic fairness with invisible demographics. Who are the invisible demographics? In the context of machine learning systems, these are individuals with thin or non-existent labeled training data. The invisible population is primarily composed of individuals with certain protected attributes (Hendricks, 2005; Abualghaib et al., 2019; Perez, 2019). We now elaborate on several algorithmic decision scenarios involving invisible demographics. One scenario is when we observe partial outcomes for some of the demographic groups, e.g. we have labeled training data for males (with positive and negative outcomes), but for females we only observe one-sided labels (negative outcomes). Another scenario is when we do not observe any outcome for some of the demographic (sub)groups, e.g. we have training samples for white-skinned and dark-skinned males, and for white-skinned females, but zero labeled data for dark-skinned females. An extreme version of the latter scenario is when we do not observe any outcome for females regardless of their skin color, i.e. we only have training samples for males. To summarize, in the invisible-demographics problem, we define the demographic groups that are expected to be seen, so they are not abstract; however, not all of the demographics are observed (labeled) during training, forming missing or invisible demographics. This paper presents learning disentangled representations in the presence of invisible demographics. Our source of supervision is motivated by the observation that we want to deploy our classifier to the eventual real-world population. This deployment dataset will contain individuals from all demographics. We thus consider the setting where unlabeled data is available for learning disentangled representations.
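The two scenarios above can be made concrete by simulating them on a labeled dataset: a chosen (protected attribute, label) subgroup is removed from the labeled training set while an unlabeled context set retains all demographics. This is an illustrative sketch, not the paper's code; the function name, the binary (s, y) encoding, and the default choice of invisible subgroup are our own assumptions.

```python
import numpy as np

def split_with_invisible(x, s, y, invisible=(1, 1), seed=0):
    """Simulate the invisible-demographics setting.

    Examples whose (protected attribute, label) pair equals `invisible`
    are dropped from the labeled training set; the unlabeled context set
    keeps every individual, including the invisible subgroup.
    (Hypothetical helper for illustration only.)
    """
    mask = (s == invisible[0]) & (y == invisible[1])
    train = (x[~mask], s[~mask], y[~mask])  # labeled, subgroup missing
    context = x.copy()                      # unlabeled, all demographics
    return train, context

# Toy data: s and y are binary proxies; the (s=1, y=1) subgroup is
# invisible in the labeled data but present in the context set.
x = np.arange(8, dtype=float).reshape(4, 2)
s = np.array([0, 0, 1, 1])
y = np.array([0, 1, 0, 1])
(train_x, train_s, train_y), context_x = split_with_invisible(x, s, y)
```

The labeled training set then contains no examples of the invisible subgroup, while the context set mirrors the deployment population.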
We call this data a context set. The context set is much like the deployment dataset: it is unlabeled, but it contains all demographics, including the invisible ones. We aim to convert our unlabeled context set into a perfect dataset (Kleinberg et al., 2016; Chouldechova, 2017), a dataset in which the target label and protected attribute are independent (i.e. y ⊥ s). We will then use this perfect dataset as the inductive bias for learning disentangled representations. How do we construct this perfect dataset without labels? We assume that the number of demographic groups (hence clusters) is known a priori, corresponding to the diverse demographic groups in the real-world population in which our machine learning system will be deployed. We use either unsupervised k-means clustering or a supervised clustering based on rank statistics; the latter allows forming clusters that are also consistent with the annotations in the training data. Once the clusters have been found, we equalize the cluster sizes to form a perfect dataset and use it as an input for learning a disentangled fair representation. See fig. 1 for an overview of our learning-with-invisible-demographics framework. Specifically, our paper provides the following main contributions:
1. A problem of algorithmic fairness with invisible demographics, where we have zero data for some of the demographics and still have to make predictions for those groups.
2. Applying clustering methods to the task of transforming an unlabeled context dataset into a perfect dataset.
3. Theoretical and experimental justification that the disentangled model with the perfect dataset as an inductive bias provides a well-disentangled fair representation, in which one component captures the demographic factors and another component is invariant to them.
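The cluster-then-equalize step can be sketched as follows. This is a minimal illustration, assuming standard k-means as the clustering method (the paper also mentions rank-statistics clustering); the function name and toy data are hypothetical, not from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def make_perfect_batch(context_features, n_groups, seed=0):
    """Cluster an unlabeled context set and subsample each cluster to
    equal size, approximating a 'perfect' batch in which the proxy
    demographic groups are equally represented."""
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=n_groups, n_init=10,
                    random_state=seed).fit_predict(context_features)
    # Equalize cluster sizes: keep as many samples from each cluster
    # as the smallest cluster provides.
    per_cluster = min(int(np.sum(labels == k)) for k in range(n_groups))
    idx = np.concatenate([
        rng.choice(np.flatnonzero(labels == k), size=per_cluster,
                   replace=False)
        for k in range(n_groups)
    ])
    return context_features[idx], labels[idx]

# Toy context set: two well-separated blobs of unequal size (4 vs 16).
X = np.vstack([np.zeros((4, 2)), np.full((16, 2), 5.0)])
batch, proxy = make_perfect_batch(X, n_groups=2)
```

The resulting balanced batch can then serve as the supervision signal for the disentanglement discriminator, since within it the proxy group assignment is uniform by construction.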

Related work

We describe related work in three areas: zero-shot learning, semi-supervised learning, and disentangled representation learning. On zero-shot learning. The setting with incomplete training data, where we aim to account for both seen and unseen outcomes, is also known as generalized zero-shot learning. Traditionally, zero-shot learning transfers knowledge from classes for which we

Figure 1: Overview of learning with invisible demographics: the train dataset, the context dataset, and the proposed approach using a perfect dataset.

