ZERO-SHOT FAIRNESS WITH INVISIBLE DEMOGRAPHICS

Abstract

In a statistical notion of algorithmic fairness, we partition individuals into groups based on key demographic factors such as race and gender, and require that some statistics of a classifier be approximately equalized across those groups. Current approaches either require complete annotations for the demographic factors or focus on an abstract worst-off group rather than on demographic groups. In this paper, we consider the setting where the demographic factors are only partially available. For example, we have training examples for white-skinned and dark-skinned males and for white-skinned females, but zero examples for dark-skinned females. We could also have zero examples for females regardless of skin color. Without additional knowledge, it is impossible to directly control the discrepancy of the classifier's statistics for those invisible groups. We develop a disentanglement algorithm that, guided by a context dataset, splits a representation of the data into one component that captures the demographic factors and another that is invariant to them. The context dataset is much like the deployment dataset: it is unlabeled, but it contains individuals from all demographics, including the invisible ones. We cluster the context set, equalize the cluster sizes to form a "perfect batch", and use it as a supervision signal for the disentanglement. We propose a new discriminator loss based on a learnable attention mechanism that distinguishes a perfect batch from a non-perfect one. We evaluate our approach on standard classification benchmarks and show that it is indeed possible to protect invisible demographics.

1. INTRODUCTION

Machine learning is already involved in decision-making processes that affect people's lives, such as screening job candidates (Raghavan et al., 2020) and pricing credit (Hurley & Adebayo, 2017). Efficiency can be improved, costs can be reduced, and the personalization of services and products can be greatly enhanced: these are some of the drivers behind the widespread development and deployment of machine learning algorithms. Classifiers, however, are trained from large amounts of labeled data and can therefore encode, and even reinforce, past discriminatory practices present in the data. A classifier might treat some groups of individuals unfavorably, for example denying credit on the grounds of language, gender, or age, or their combined effect. Algorithmic fairness aims at building machine learning algorithms that take biased datasets and output fair, unbiased decisions for people with differing protected attributes, such as race, gender, and age.

A typical setting of algorithmic fairness is as follows. We are given a training set of observations x ∈ X, their corresponding protected attributes s ∈ S, and the target labels y ∈ Y for learning a classifier. In a statistical notion of algorithmic fairness, e.g. (Kamiran & Calders, 2012a; Hardt et al., 2016; Zafar et al., 2017), we control the discrepancy of a classifier's loss across a small number of demographic groups defined by the protected attributes.

Recently, several works have considered the setting where protected attributes are unknown (Kearns et al., 2018; Hashimoto et al., 2018; Khani et al., 2019). They aim to control the losses of all groups whose size exceeds some predefined value. These works focus on an abstract worst-off group rather than on demographic groups, and it has been noted that the implied worst-off groups may differ from well-specified demographic groups that are known to suffer from past discriminatory practices (Hashimoto et al., 2018).
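As a concrete illustration of such a statistic (an example on our part, not a definition from this paper), one may equalize the positive-prediction rate across groups, which gives the familiar demographic-parity constraint:

```latex
\left| \Pr\!\left(\hat{y}(x) = 1 \mid s = a\right) \;-\; \Pr\!\left(\hat{y}(x) = 1 \mid s = b\right) \right| \;\le\; \epsilon
\qquad \text{for all } a, b \in S .
```

Estimating, let alone controlling, either probability requires examples annotated with s = a and s = b; this is exactly what is unavailable for an invisible group.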
We are interested in a setting that lies in between having complete annotations for the demographic groups and having none. In this paper, we introduce algorithmic fairness with invisible demographics.
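The balanced "perfect batch" described in the abstract can be sketched as follows. This is a minimal illustration under our own assumptions: we stand in a plain k-means clustering of the context representations, and the names `perfect_batch` and `context` are ours, not the paper's; the attention-based discriminator that consumes such batches is not shown.

```python
import numpy as np

def perfect_batch(context, n_clusters, batch_size, n_iter=50, seed=0):
    """Cluster an unlabeled context set, then draw a batch containing an
    equal number of points from every cluster (a hypothetical sketch)."""
    rng = np.random.default_rng(seed)
    # Initialize centers with randomly chosen context points.
    centers = context[rng.choice(len(context), n_clusters, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center (standard k-means step).
        dists = np.linalg.norm(context[:, None] - centers[None], axis=-1)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster is empty.
        for k in range(n_clusters):
            if (labels == k).any():
                centers[k] = context[labels == k].mean(axis=0)
    # Equalize cluster contributions: the same number of samples per cluster,
    # drawn with replacement so small clusters are oversampled.
    per_cluster = batch_size // n_clusters
    idx = np.concatenate([
        rng.choice(np.flatnonzero(labels == k), per_cluster, replace=True)
        for k in range(n_clusters)
    ])
    return context[idx], labels[idx]
```

Because every cluster contributes the same number of points, the batch is balanced across the discovered groups even when some group is rare in (or, in representation space, adjacent to) the visible data; the disentanglement discriminator can then use such batches as its supervision signal.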

