COUNTERFACTUAL FAIRNESS THROUGH DATA PREPROCESSING

Abstract

Machine learning has become increasingly important in real-life decision-making, but people are concerned about the ethical problems it may bring when used improperly. Recent work brings the discussion of machine learning fairness into the causal framework and elaborates on the concept of Counterfactual Fairness. In this paper, we develop the Fair Learning through dAta Preprocessing (FLAP) algorithm to learn counterfactually fair decisions from biased training data and formalize the conditions under which different data preprocessing procedures should be used to guarantee counterfactual fairness. We also show that Counterfactual Fairness is equivalent to the conditional independence of the decisions and the sensitive attributes given the processed non-sensitive attributes, which enables us to detect discrimination in the original decisions using the processed data. The performance of our algorithm is illustrated using simulated data and real-world applications.

1. INTRODUCTION

The rapid popularization of machine learning methods and the growing availability of personal data have enabled decision-makers in various fields such as graduate admission (Waters & Miikkulainen, 2014), hiring (Ajunwa et al., 2016), credit scoring (Thomas, 2009), and criminal justice (Brennan et al., 2009) to make data-driven decisions efficiently. However, the community and the authorities have also raised the concern that these automatically learned decisions may inherit historical bias and discrimination from the training data, which could cause serious ethical problems in practice (Nature Editorial, 2016; Angwin & Larson, 2016; Dwoskin, 2015; Executive Office of the President et al., 2016). Consider a training dataset D consisting of sensitive attributes S such as gender and race, non-sensitive attributes A, and decisions Y. If the historical decisions Y are not fair across the sensitive groups, a powerful machine learning algorithm will capture this pattern of bias and yield learned decisions Ŷ that mimic the preference of the historical decision-maker; indeed, it is often the case that the more discriminative an algorithm is, the more discriminatory it might be.

While researchers agree that methods should be developed to learn fair decisions, opinions vary on the quantitative definition of fairness. In general, researchers use either observational or counterfactual approaches to formalize the concept of fairness. The observational approaches typically describe fairness with metrics of the observable data and predicted decisions (Hardt et al., 2016; Chouldechova, 2017; Yeom & Tschantz, 2018). For example, Demographic Parity (DP), or Group Fairness (Zemel et al., 2013; Khademi et al., 2019), considers the learned decision Ŷ to be fair if it has the same distribution for different sensitive groups, i.e., P(Ŷ | S = s) = P(Ŷ | S = s′) for any pair of groups s ≠ s′.
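As a concrete illustration, the DP condition can be checked empirically by comparing positive-decision rates across sensitive groups. The following sketch uses hypothetical binary decisions and group labels (the function name, data, and threshold of "exact parity" at gap zero are illustrative, not part of the paper's FLAP algorithm):

```python
import numpy as np

def demographic_parity_gap(y_hat, s):
    """Largest difference in positive-decision rates across sensitive groups.

    A gap of 0 corresponds to exact Demographic Parity, i.e.
    P(Y_hat = 1 | S = s) is identical for every group s.
    """
    y_hat = np.asarray(y_hat)
    s = np.asarray(s)
    # Empirical positive-decision rate within each sensitive group.
    rates = [y_hat[s == g].mean() for g in np.unique(s)]
    return max(rates) - min(rates)

# Toy decisions for two sensitive groups (0 and 1).
y_hat = np.array([1, 1, 0, 1, 0, 0, 1, 0])
s     = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(demographic_parity_gap(y_hat, s))  # group rates 0.75 vs 0.25 -> gap 0.5
```

In practice one would test whether such a gap is statistically distinguishable from zero rather than requiring it to vanish exactly.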
The Individual Fairness (IF) definition (Dwork et al., 2012) views fairness as treating similar individuals similarly, which means the distance between Ŷ(s_i, a_i) and Ŷ(s_j, a_j) should be small if individuals i and j are similar. The other branch of fairness and/or discrimination definitions is built upon the causal framework of Pearl (2009a), including direct/indirect discrimination (Zhang et al., 2017; Nabi & Shpitser, 2018), path-specific effects (Wu et al., 2019b), counterfactual error rate (Zhang & Bareinboim, 2018a), and counterfactual fairness (Kusner et al., 2017; Wang et al., 2019; Wu et al., 2019a). These definitions often involve the notion of counterfactuals, which means what the attributes or decision would be

