ADVERSARIAL LEARNED FAIR REPRESENTATIONS USING DAMPENING AND STACKING

Abstract

As more decisions in our daily life become automated, the need for machine learning algorithms that make fair decisions increases. In fair representation learning we are tasked with finding a suitable representation of the data in which a sensitive variable is censored. Recent work aims to learn fair representations through adversarial learning. This paper builds upon this work by introducing a novel algorithm which uses dampening and stacking to learn adversarial fair representations. Results show that our algorithm improves upon earlier work in both censoring and reconstruction.

1. INTRODUCTION

The need for machine learning algorithms that make fair decisions is increasingly important in modern society. A decision is fair if it does not depend on a sensitive variable such as gender, race, or age. Models trained on biased data can lead to unfair decisions (Mehrabi et al., 2021). In fair representation learning we are tasked with finding a suitable representation of the data in which the sensitive variable is censored. This ensures that these representations can be used for any downstream task, such as classification or segmentation, that should not rely on the value of the sensitive variable. Throughout this paper, we often refer to this sensitive variable as the protected variable.

Fairness can be applied to machine learning algorithms at roughly three stages of the process: during preprocessing, inprocessing, or postprocessing. With preprocessing we aim to learn a new, fairer representation of the input data. A well-known example is Zemel et al. (2013), which obfuscates inputs when they can lead to unfairness. With inprocessing techniques the task is to make a machine learning algorithm fairer during training, typically by modifying the learning algorithm or by adding extra constraints to the learning objective. With postprocessing we correct the predictions of a machine learning algorithm after training in order to achieve fairness. In recent years, inprocessing techniques such as Zhang et al. (2018) have become very popular since they typically strike an optimal balance between accuracy and fairness. However, the major advantage of preprocessing over any other technique is that the transformed data can be used for any downstream task, both supervised and unsupervised. This makes preprocessing invaluable in many practical applications where we know the protected variable beforehand but have no specific machine learning task in mind yet. Hence the focus on preprocessing in this paper.
Moreover, as shown in McNamara et al. (2017), preprocessing techniques can still provide us with theoretical fairness guarantees if required. It is important to note that the notion of fairness is not trivial, and a multitude of fairness constraints have been proposed, pertaining to both group fairness and individual fairness (Mitchell et al., 2021). In this paper we adopt the demographic parity constraint due to its widespread use in benchmarking and evaluating the fairness of machine learning algorithms. Demographic parity enforces that a classifier treats the data containing the protected variable statistically similarly to the general population; a major downside of this criterion is that it only requires equal acceptance rates, which tends to cripple accuracy. In reality, for every specific dataset and problem we need to assess which fairness criterion is most applicable; we cannot simply select one as universally preferred (Verma & Rubin, 2018). The upside, however, is that we encode our fairness constraint in the form of a loss function, and, as shown in Madras et al. (2018), different loss functions can be associated with different group fairness constraints. This makes our approach applicable to far more fairness metrics than the one adopted in this paper.

When learning a fair representation, the naive approach of dropping certain features of the data is often insufficient. The origin of the bias might latently depend on some nonlinear combination of other variables, and can thus leak back into a decision-making model. This inspired the work by Edwards & Storkey (2016), which aims to learn a fair representation through adversarial learning: an auto-encoder acts as a generator, learning a new latent representation that attempts to censor the protected variable from the adversary. This work was later extended in Madras et al. (2018), where learning objectives for other fairness metrics such as equalized odds and equal opportunity are proposed. In Kenfack et al. (2021) this work was further extended by introducing stacked auto-encoders to enforce fairness and improve censoring at different latent spaces.

This work builds on the previous adversarial approach. In particular, it focuses on the case where the downstream task we may encounter is unknown, i.e. it can be either some supervised classification objective or some unsupervised clustering or segmentation objective. The challenge with learning fair representations is that on the one hand we want to censor the data, and on the other we want to retain as much information as possible. Since these objectives are often opposed, the approaches in Edwards & Storkey (2016), Madras et al. (2018), Kenfack et al. (2021), and various others define the global objective of the model as a weighted sum of reconstruction error and predictive loss. This requires the trainer of a model to select a suitable hyperparameter which defines how much we value reconstruction error over predictive loss. This hyperparameter often has a large impact on the learned representations, and we can identify at least three issues with it. Firstly, we have no a priori knowledge of how the reconstruction error and the predictive loss relate. The relation could be nonlinear, which makes it almost impossible to make an informed decision beforehand. Secondly, the value of this hyperparameter gives us no formal guarantee of the censoring capabilities of the model; some values can cause a collapse of the model. Thirdly, the hyperparameter choice is not explainable to the relevant stakeholders of the model. This makes it impractical for most industry use cases where hyperparameter choices need to be justified.
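The sensitivity to this hyperparameter can be made concrete. In the following sketch the loss values and the trade-off parameter `gamma` are purely illustrative (not taken from any of the cited works); it shows how the same pair of candidate representations is ranked differently depending on the chosen weight:

```python
def combined_loss(recon_error, adversary_loss, gamma):
    """Weighted objective: minimize reconstruction error while
    maximizing the adversary's loss (i.e. hiding the protected
    variable). gamma trades one goal against the other."""
    return recon_error - gamma * adversary_loss

# Two hypothetical learned representations:
#   A reconstructs faithfully but leaks the protected variable
#     (the adversary's loss is low),
#   B is heavily censored but reconstructs poorly.
A = {"recon": 0.10, "adv": 0.20}
B = {"recon": 0.40, "adv": 0.65}

for gamma in (0.5, 2.0):
    score_a = combined_loss(A["recon"], A["adv"], gamma)
    score_b = combined_loss(B["recon"], B["adv"], gamma)
    print(gamma, "prefers", "A" if score_a < score_b else "B")
# gamma = 0.5 prefers A, gamma = 2.0 prefers B
```

The flip between the two preferences under a modest change of `gamma` is exactly the kind of behavior that makes an a priori, justifiable choice of this hyperparameter difficult.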
As such, many authors using this methodology, such as Edwards & Storkey (2016), Beutel et al. (2017), Madras et al. (2018), Feng et al. (2019), and Kenfack et al. (2021), use a trial-and-error approach, or an arbitrarily chosen constant, with regard to the choice of this hyperparameter. More often than not, the censoring capabilities of the learned representation are a hard constraint of the model. Thus, in many industry use cases, we are only interested in finding solutions in some restricted hypothesis space abiding by a censoring constraint. A second, perhaps even greater, issue with the previous work is its instability. In particular, due to the unstable dynamic between actor and adversary we often learn suboptimal solutions. This has been observed in many cases, such as Edwards & Storkey (2016) and Kenfack et al. (2021), but never fully addressed.

This paper attempts to mitigate these issues by introducing a novel algorithm for learning fair representations. In particular, it uses dampening to stabilize the interaction between actor and adversary, and stacking to learn strongly censored representations within a restricted hypothesis space. The remainder of the paper is structured as follows: in Section 2 we briefly reiterate related work, in Section 3 we formally define the problem, in Section 4 we introduce the algorithm, in Section 5 we discuss the experiments and results, and in Section 6 we conclude this work.
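Since demographic parity is the fairness criterion adopted throughout, it helps to state the empirical check explicitly: a binary classifier satisfies demographic parity when its acceptance rate is equal across the groups defined by the protected variable. A minimal sketch, with illustrative data (the function name and arrays below are our own, not from the cited works):

```python
import numpy as np

def demographic_parity_gap(y_pred, s):
    """Absolute difference in acceptance rates between the two
    groups defined by the binary protected variable s."""
    y_pred = np.asarray(y_pred)
    s = np.asarray(s)
    rate_0 = y_pred[s == 0].mean()  # P(Y_hat = 1 | S = 0)
    rate_1 = y_pred[s == 1].mean()  # P(Y_hat = 1 | S = 1)
    return abs(rate_0 - rate_1)

# A classifier that accepts group S = 1 far more often than
# group S = 0 violates demographic parity:
y_pred = np.array([1, 1, 1, 0, 0, 0, 0, 0])
s      = np.array([1, 1, 1, 1, 0, 0, 0, 0])
print(demographic_parity_gap(y_pred, s))  # 0.75
```

A gap of 0 means equal acceptance rates; in practice one tolerates a small nonzero gap, which is why the criterion is usually reported as a continuous metric rather than a hard pass/fail.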

2. RELATED WORK

In Zemel et al. (2013) the first fair representation learning approach was presented. Their methodology maps input data to a new representation via a probabilistic mapping to a set of prototypes. Several other noteworthy algorithms for finding fair representations are explored in Feldman et al. (2015) and Calmon et al. (2017). In Louizos et al. (2016) an architecture based on the Variational Auto-Encoder (VAE), called the Variational Fair Auto-Encoder, was proposed in order to learn fair representations. A similar idea is explored in Locatello et al. (2019). Although the idea of disentanglement between the protected variable and the other features seems promising, it has not yet found widespread use due to the difficulty of achieving independence between the sensitive and latent factors of variation.
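The prototype-based mapping can be illustrated in simplified form: each input receives a probability of belonging to each prototype based on distance in input space. The softmax-over-negative-distances below is a hedged sketch of the general idea only, not the exact formulation of Zemel et al. (2013):

```python
import numpy as np

def prototype_probabilities(x, prototypes):
    """Probabilistic assignment of x to prototypes:
    softmax over negative squared Euclidean distances."""
    d2 = ((prototypes - x) ** 2).sum(axis=1)  # distance to each prototype
    w = np.exp(-d2)
    return w / w.sum()

x = np.array([0.9, 0.1])
prototypes = np.array([[1.0, 0.0],   # prototype 0, closest to x
                       [0.0, 1.0],   # prototype 1
                       [0.5, 0.5]])  # prototype 2
p = prototype_probabilities(x, prototypes)
# p sums to 1; the nearest prototype receives the highest probability
```

The fairness constraint in the original work then acts on these assignment probabilities, requiring that group membership carries no information about which prototype an input is mapped to.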




