ADVERSARIAL LEARNED FAIR REPRESENTATIONS USING DAMPENING AND STACKING

Abstract

As more decisions in our daily life become automated, the need for machine learning algorithms that make fair decisions increases. In fair representation learning we are tasked with finding a suitable representation of the data in which a sensitive variable is censored. Recent work aims to learn fair representations through adversarial learning. This paper builds upon this work by introducing a novel algorithm which uses dampening and stacking to learn adversarial fair representations. Results show that our algorithm improves upon earlier work in both censoring and reconstruction.

1. INTRODUCTION

The need for machine learning algorithms that make fair decisions is becoming increasingly important in modern society. A decision is fair if it does not depend on a sensitive variable such as gender, race, or age. Models trained on biased data can lead to unfair decisions Mehrabi et al. (2021). In fair representation learning we are tasked with finding a suitable representation of the data in which the sensitive variable is censored. This ensures that these representations can be used for any downstream task, such as classification or segmentation, that should not rely on the value of the sensitive variable. Throughout this paper, we often refer to this sensitive variable as the protected variable.

Fairness can be applied to machine learning algorithms at roughly three stages of the process: during preprocessing, inprocessing, or postprocessing. With preprocessing we aim to learn a new, fairer representation of the input data. A well-known example is Zemel et al. (2013), which obfuscates inputs when they can lead to unfairness. With inprocessing techniques the task is to make a machine learning algorithm fairer during training, typically by modifying the learning algorithm or by adding extra constraints to the learning objective. With postprocessing we correct the predictions of a machine learning algorithm after training in order to achieve fairness.

In recent years, inprocessing techniques such as Zhang et al. (2018) have become very popular, since they typically strike an optimal balance between accuracy and fairness. However, the major advantage of preprocessing over the other techniques is that the transformed data can be used for any downstream task, both supervised and unsupervised. This makes preprocessing invaluable in many practical applications where we know the protected variable beforehand but have no specific machine learning task in mind yet. Hence the focus on preprocessing in this paper.
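The adversarial preprocessing idea discussed above can be sketched in a few lines: an encoder maps the data to a representation while an adversary tries to recover the protected variable from it, and the encoder is updated to defeat the adversary. The following is an illustrative linear toy implementation in NumPy, not the algorithm proposed in this paper; the function name `adversarial_censor` is our own, and the reconstruction term is omitted for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adversarial_censor(X, s, dim=2, steps=200, lr=0.1, seed=0):
    """Toy adversarial censoring: a logistic adversary descends the
    binary cross-entropy of predicting s from z = X @ W, while the
    linear encoder W ascends the same loss to hide s."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=0.1, size=(d, dim))   # linear encoder
    a = np.zeros(dim)                          # adversary weights
    b = 0.0                                    # adversary bias
    for _ in range(steps):
        # adversary step: gradient descent on BCE
        z = X @ W
        p = sigmoid(z @ a + b)
        g = (p - s) / n                        # dBCE/dlogit
        a -= lr * (z.T @ g)
        b -= lr * g.sum()
        # encoder step: gradient ascent on BCE (censor s)
        p = sigmoid((X @ W) @ a + b)
        g = (p - s) / n
        W += lr * (X.T @ np.outer(g, a))
    return X @ W
```

A full method would add a decoder and a reconstruction loss so that the representation stays informative about everything except the protected variable.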
Moreover, as shown in McNamara et al. (2017), preprocessing techniques can still provide theoretical fairness guarantees if required. It is important to note that the notion of fairness is not trivial, and a multitude of fairness constraints have been proposed, pertaining to both group fairness and individual fairness Mitchell et al. (2021). In this paper we adopt the demographic parity constraint due to its widespread use in benchmarking and evaluating the fairness of machine learning algorithms. Demographic parity requires that the classifier's acceptance rate is the same for each group defined by the protected variable; a major downside of this criterion is that enforcing equal acceptance rates tends to cripple accuracy. In reality, for every specific dataset and problem we need to assess which fairness criterion is most applicable, and cannot simply select one as universally preferred Verma & Rubin (2018). The upside, however, is that in this paper we encode our fairness constraint in the form of a loss function, and as shown in Madras et al. (2018) we are able to associate different loss functions to different
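Demographic parity is commonly measured as the absolute difference in acceptance rates between the two protected groups; a minimal sketch (the function name `demographic_parity_gap` is our own, assuming binary predictions and a binary protected variable):

```python
import numpy as np

def demographic_parity_gap(y_pred, s):
    """|P(y_hat = 1 | s = 1) - P(y_hat = 1 | s = 0)| for binary
    predictions y_pred and a binary protected variable s."""
    y_pred = np.asarray(y_pred, dtype=float)
    s = np.asarray(s)
    return abs(y_pred[s == 1].mean() - y_pred[s == 0].mean())
```

A gap of 0 means both groups are accepted at the same rate (demographic parity holds); a gap of 1 means acceptance is fully determined by group membership.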

