GENERATIVE FAIRNESS TEACHING

Abstract

Increasing evidence has shown that data biases toward sensitive features such as gender or race are often inherited or even amplified by machine learning models. Recent advances in fairness mitigate such biases by adjusting the predictions across sensitive groups during training. Such a correction, however, can only take advantage of samples in a fixed dataset, which usually contains a limited number of samples for minority groups. We propose a generative fairness teaching framework that provides a model with not only real samples but also synthesized samples to compensate for the data biases during training. We implement this teaching strategy with a Generative Fairness Teacher (GFT) that dynamically adjusts the proportion of training data for a biased student model. Experimental results indicate that our teacher model is capable of guiding a wide range of biased models, significantly improving their fairness and performance trade-offs.

1. INTRODUCTION

Automated learning systems are ubiquitous across a wide variety of sectors. Such systems can be used in many sensitive environments to make important and even life-changing decisions. Traditionally, decisions were made primarily by humans, and the basis for such decisions is often highly regulated. For example, under the Equal Credit Opportunity Act (ECOA), incorporating attributes such as race, color, or sex into credit lending decisions is illegal in the United States (Mehrabi et al., 2019). As more and more of this process is now implemented by automated learning systems instead, algorithmic fairness becomes a topic of paramount importance. Lending (Hardt et al., 2016), hiring (Alder & Gilbert, 2006), and educational rights (Kusner et al., 2017) are examples where gender- or race-biased decisions from automatic systems can have serious consequences. Even for more mechanical tasks such as image classification (Buolamwini & Gebru, 2018), image captioning (Hendricks et al., 2018), word embedding learning (Garg et al., 2018; Bolukbasi et al., 2016), and named co-reference resolution (Zhao et al., 2018), algorithmic discrimination can be a major concern. Although much of the focus in developing automated learning systems has been on performance, it is important to take fairness into consideration when designing and deploying such systems. Unfortunately, state-of-the-art automated systems are usually data driven, which makes them more likely to inherit or even amplify the biases rooted in a dataset. This is an especially serious issue for deep learning and gradient-based models, which can easily fit themselves to the biased patterns of the dataset.
For example, in a dataset with very few female candidates labeled as hired in a job candidate prediction task, models might give unfavorable predictions to qualified female candidates due to their under-representation in the training data. If deployed, such a biased predictor will deprive minority groups of the same opportunities as the others. Much of the work in the domain of machine learning fairness has focused exclusively on leveraging knowledge from samples in a dataset. One straightforward way is to adjust the distribution of the training data through pre-processing. In the job candidate prediction example above, this means that we can either down-sample the majority class or up-sample the minority ones (Kamiran & Calders, 2012). Another family of fairness methods aims at matching the model performance on the majority class to that of the minority ones during training, using one of the fairness criteria (Gajane & Pechenizkiy, 2017). Examples of such methods include adding regularization (Kamishima et al., 2012) or applying adversarial learning (Madras et al., 2018a). One issue with these approaches is that in many cases minority groups may be heavily under-represented in the dataset. Model training with fairness constraints will then typically give up much of the performance advantage (e.g., prediction accuracy) in favor of the fairness metrics, so methods that concentrate solely on a dataset often struggle to maintain a good performance-fairness trade-off. One way to make models learn beyond the dataset is to take advantage of causal reasoning (Pearl et al., 2009), which borrows knowledge from external structures often formulated as a causal graph. Counterfactual Fairness (Kusner et al., 2017) and Causal Fairness (Kilbertus et al., 2017) are examples of such approaches. One unique characteristic of causal fairness criteria is that they need to be built on a causal graph.
Because those metrics are optimized and evaluated against their own objectives, which involve a causal graph, it is not clear how that added knowledge can be used to benefit other, more commonly used fairness criteria such as Demographic Parity (DP) and Equalized Odds (EO). Although it is possible to create causal structures that subsume conditional independencies in order to benefit DP or EO, the structural information must be known in advance, and one such structure must be derived for each metric of interest. This, we believe, is a significant limitation of current causal methods, which we aim to improve upon. In this paper, we propose a generative approach to fairness training that is capable of leveraging both real data and "counterfactual data" generated from a causal graph. The counterfactual data is generated in a way that alters the sensitive attribute while keeping the other latent factors unchanged. We formulate this generative model using a novel combination of adversarial training and mutual information regularization. The two types of data are then organized by an architecture called the teacher, which dynamically determines the proportion of real and counterfactual samples used to train a particular model. Our model, the Generative Fairness Teacher (GFT), can be used to improve an arbitrary fairness criterion as needed. Our experimental results indicate that by taking advantage of the counterfactual generative model we achieve significantly better model fairness on a wide range of datasets and models, and that we are able to improve upon models with different levels of bias.
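As a minimal sketch of the teaching idea (the function name, the gap-proportional mixing policy, and all parameters here are hypothetical illustrations, not the paper's actual algorithm), a teacher could increase the fraction of counterfactual samples fed to the student as the student's measured bias grows:

```python
import numpy as np

rng = np.random.default_rng(0)

def teacher_mix(real_batch, counterfactual_batch, bias_gap, max_ratio=0.5):
    """Mix real and counterfactual samples into one training batch.

    Hypothetical policy: the larger the student's current fairness gap
    (e.g., its demographic parity gap), the more counterfactual samples
    the teacher injects, capped at max_ratio.
    """
    ratio = min(max_ratio, abs(bias_gap))         # fraction of counterfactual data
    n_cf = int(len(real_batch) * ratio)
    n_real = len(real_batch) - n_cf
    idx_real = rng.choice(len(real_batch), n_real, replace=False)
    idx_cf = rng.choice(len(counterfactual_batch), n_cf, replace=False)
    return np.concatenate([real_batch[idx_real], counterfactual_batch[idx_cf]])

# Toy usage: a student with a fairness gap of 0.3 receives a batch
# that is 30% counterfactual data (here, negative values stand in
# for counterfactual samples).
real = np.arange(100)
cf = -np.arange(1, 101)
batch = teacher_mix(real, cf, bias_gap=0.3)
```

The key design point is that the mixing proportion is recomputed per batch from the student's current behavior, so a heavily biased student is corrected more aggressively than a nearly fair one.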

2. BACKGROUND

We provide a basic overview of the foundations of our method. Here we assume X to be the input features and A to be the set of sensitive features. We define Y to be the favorable outcome and Ŷ to be the model's prediction of the favorable outcome given the features. The core idea of fairness in machine learning is to distribute the favorable outcomes evenly across the sensitive groups A.

2.1. FORMAL FAIRNESS CRITERIA

There has been much existing work on fairness studying criteria for achieving algorithmic fairness. A straightforward way to define fairness is Demographic Parity (Madras et al., 2018a), under which the chance of allocating the favorable outcome Ŷ is the same across sensitive groups A. Under that definition, the predictive variable Ŷ is independent of A, making predictions free from discrimination against sensitive groups. Note that even though A takes the form of a binary variable here, we can easily extend the definition to the case of multiple values.

Definition 1 (Demographic Parity) P(Ŷ | X = x, A = a) = P(Ŷ | X = x, A = a′) (1)
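As a concrete illustration (not part of the original formulation), the demographic parity gap of a binary classifier with a binary sensitive attribute can be estimated empirically as the difference in positive-prediction rates between the two groups:

```python
import numpy as np

def demographic_parity_gap(y_pred, a):
    """Absolute difference in positive-prediction rates P(Ŷ = 1 | A = a)
    between the two sensitive groups, a ∈ {0, 1}."""
    rate0 = y_pred[a == 0].mean()
    rate1 = y_pred[a == 1].mean()
    return abs(rate0 - rate1)

# Toy example: group 0 receives the favorable outcome 3/4 of the time,
# group 1 only 1/4 of the time, so the gap is 0.5.
y_pred = np.array([1, 1, 0, 1, 0, 0, 0, 1])
a      = np.array([0, 0, 0, 0, 1, 1, 1, 1])
gap = demographic_parity_gap(y_pred, a)
```

A gap of 0 corresponds to exact Demographic Parity; fairness-aware training typically trades some accuracy to shrink this gap.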



Other fairness criteria built on input features include Fairness Through Unawareness (Gajane & Pechenizkiy, 2017) and Individual Fairness (Kusner et al., 2017). More recently, Hardt et al. argued that criteria taking only sample features into account make it difficult for algorithms to allocate favorable outcomes to the actually qualified samples in both the minority and the majority groups. This observation led to a new fairness criterion called Equalized Odds (and its special case, Equal Opportunity) (Hardt et al., 2016), where the fairness statement includes a condition on the target variable Y.

Definition 2 (Equalized Odds) P(Ŷ = 1 | X = x, A = a, Y = y) = P(Ŷ = 1 | X = x, A = a′, Y = y) (2)
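Analogously to the demographic parity case, an empirical Equalized Odds gap (again an illustration, not the paper's own code) conditions the rate comparison on the true label, taking the worst violation over y ∈ {0, 1}:

```python
import numpy as np

def equalized_odds_gap(y_pred, y_true, a):
    """Max over y ∈ {0, 1} of the gap in P(Ŷ = 1 | A = a, Y = y)
    between the two sensitive groups, a ∈ {0, 1}."""
    gaps = []
    for y in (0, 1):
        r0 = y_pred[(a == 0) & (y_true == y)].mean()
        r1 = y_pred[(a == 1) & (y_true == y)].mean()
        gaps.append(abs(r0 - r1))
    return max(gaps)

# Toy example: among truly qualified samples (Y = 1), group 0 is
# predicted positive at rate 1.0 vs. 0.5 for group 1, giving a gap of 0.5.
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
a      = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([1, 1, 0, 0, 1, 0, 1, 0])
gap = equalized_odds_gap(y_pred, y_true, a)
```

Restricting the comparison to y = 1 only recovers the Equal Opportunity special case mentioned above.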

