COUNTERFACTUAL FAIRNESS THROUGH DATA PREPROCESSING

Abstract

Machine learning plays an increasingly important role in real-life decision-making, but there is growing concern about the ethical problems it may cause when used improperly. Recent work brings the discussion of machine learning fairness into the causal framework and elaborates on the concept of Counterfactual Fairness. In this paper, we develop the Fair Learning through dAta Preprocessing (FLAP) algorithm to learn counterfactually fair decisions from biased training data and formalize the conditions under which different data preprocessing procedures should be used to guarantee counterfactual fairness. We also show that Counterfactual Fairness is equivalent to the conditional independence of the decisions and the sensitive attributes given the processed non-sensitive attributes, which enables us to detect discrimination in the original decisions using the processed data. The performance of our algorithm is illustrated using simulated data and real-world applications.

1. INTRODUCTION

The rapid popularization of machine learning methods and the growing availability of personal data have enabled decision-makers in various fields such as graduate admission (Waters & Miikkulainen, 2014), hiring (Ajunwa et al., 2016), credit scoring (Thomas, 2009), and criminal justice (Brennan et al., 2009) to make data-driven decisions efficiently. However, the community and the authorities have also raised concerns that these automatically learned decisions may inherit historical bias and discrimination from the training data and could cause serious ethical problems when used in practice (Nature Editorial, 2016; Angwin & Larson, 2016; Dwoskin, 2015; Executive Office of the President et al., 2016). Consider a training dataset D consisting of sensitive attributes S such as gender and race, non-sensitive attributes A, and decisions Y. If the historical decisions Y are not fair across the sensitive groups, a powerful machine learning algorithm will capture this pattern of bias and yield learned decisions Ŷ that mimic the preferences of the historical decision-maker; indeed, it is often the case that the more discriminative an algorithm is, the more discriminatory it might be. While researchers agree that methods should be developed to learn fair decisions, opinions vary on the quantitative definition of fairness. In general, researchers use either observational or counterfactual approaches to formalize the concept of fairness. The observational approaches describe fairness with metrics of the observable data and predicted decisions (Hardt et al., 2016; Chouldechova, 2017; Yeom & Tschantz, 2018). For example, Demographic Parity (DP), or Group Fairness (Zemel et al., 2013; Khademi et al., 2019), considers the learned decision Ŷ to be fair if it has the same distribution for different sensitive groups, i.e., P(Ŷ | S = s) = P(Ŷ | S = s') for any s, s' ∈ S.
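As a concrete illustration of the DP criterion, the sketch below (the helper name is ours, not from the paper) estimates the gap in positive-decision rates across sensitive groups from observed data; a gap of zero corresponds to exact demographic parity.

```python
import numpy as np

def demographic_parity_gap(y_hat, s):
    """Largest difference in positive-decision rates
    P(Y_hat = 1 | S = s) across sensitive groups."""
    rates = [y_hat[s == g].mean() for g in np.unique(s)]
    return max(rates) - min(rates)

# Toy decisions for two sensitive groups.
y_hat = np.array([1, 1, 0, 1, 0, 0, 0, 1])
s = np.array([0, 0, 0, 0, 1, 1, 1, 1])
gap = demographic_parity_gap(y_hat, s)  # 0.75 - 0.25 = 0.5
```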
The Individual Fairness (IF) definition (Dwork et al., 2012) views fairness as treating similar individuals similarly: the distance between Ŷ(s_i, a_i) and Ŷ(s_j, a_j) should be small if individuals i and j are similar. The other branch of fairness and discrimination definitions is built upon the causal framework of Pearl (2009a), such as direct/indirect discrimination (Zhang et al., 2017; Nabi & Shpitser, 2018), path-specific effects (Wu et al., 2019b), counterfactual error rate (Zhang & Bareinboim, 2018a), and counterfactual fairness (Kusner et al., 2017; Wang et al., 2019; Wu et al., 2019a). These definitions involve the notion of counterfactuals, i.e., what the attributes or decision would have been had an individual belonged to a different sensitive group. With the help of the potential outcome concept, the measurement of fairness is no longer restricted to observable quantities (Kilbertus et al., 2017; Zhang & Bareinboim, 2018b). For example, the Equal Opportunity (EO) definition (Wang et al., 2019) shares the same idea as IF but directly compares the actual and counterfactual decisions of the same individual instead of the actual decisions of two similar individuals. The Counterfactual Fairness (CF) definition (Kusner et al., 2017), or equivalently the Affirmative Action (AA) definition (Wang et al., 2019), goes one step further than EO and derives the counterfactual decisions from the counterfactual non-sensitive attributes. We adopt CF as our definition of fairness; it is formally described in Section 2. We believe causal reasoning is the key to fair decisions: as DeDeo (2014) pointed out, even the most successful algorithms would fail to make fair judgments due to the lack of causal reasoning ability.
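The counterfactual comparison underlying EO-style definitions can be sketched as follows. The predictor and the counterfactual attributes here are purely illustrative assumptions; in practice the counterfactual attributes must be derived from a causal model rather than supplied by hand.

```python
def counterfactual_gap(predict, s, a, s_cf, a_cf):
    """Difference between an individual's actual decision and the
    decision they would receive under counterfactual sensitive
    attribute s_cf and counterfactual attributes a_cf."""
    return abs(predict(s, a) - predict(s_cf, a_cf))

# A toy rule that (unfairly) uses the sensitive attribute directly.
predict = lambda s, a: int(a + 0.2 * s > 1.0)

# Individual with s = 1, a = 0.9; counterfactually s = 0, same a.
gap = counterfactual_gap(predict, 1, 0.9, 0, 0.9)  # decision flips: gap = 1
```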
For the observational definitions, fair decisions can be learned by solving optimization problems, either adding the fairness condition as a constraint (Dwork et al., 2012) or directly optimizing a fairness metric as an objective (Zemel et al., 2013). When using the counterfactual definitions, however, an approximation of the causal model or the counterfactuals is often needed since the counterfactuals are unobservable. In the FairLearning algorithm proposed by Kusner et al. (2017), the unobserved parts of the graphical causal model are sampled using the Markov chain Monte Carlo method, and only the non-descendants of S are then used to learn the decision, which ensures CF but yields low prediction accuracy. In Wang et al. (2019), the counterfactual of A had S been s' is imputed as the sum of the counterfactual group mean E(A | S = s') and the residual from the original group, A − E(A | S = s). As we discuss later, this approach only works when a strong assumption on the relationship between A and S is satisfied.
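The mean-shift imputation of Wang et al. (2019) can be sketched as follows; the helper name is ours, and the snippet assumes a scalar non-sensitive attribute A and a categorical sensitive attribute S.

```python
import numpy as np

def impute_counterfactual(a, s, target):
    """Impute the counterfactual attribute had S been `target`:
    counterfactual group mean plus the individual's residual from
    their own group mean, i.e. E(A|S=target) + (a - E(A|S=s))."""
    a, s = np.asarray(a, float), np.asarray(s)
    means = {g: a[s == g].mean() for g in np.unique(s)}
    return np.array([means[target] + (ai - means[si]) for ai, si in zip(a, s)])

a = np.array([1.0, 2.0, 3.0, 5.0, 6.0, 7.0])
s = np.array([0, 0, 0, 1, 1, 1])
# Group means: E(A|S=0) = 2, E(A|S=1) = 6. For the first individual
# (a = 1, s = 0), the counterfactual had S been 1 is 6 + (1 - 2) = 5.
cf = impute_counterfactual(a, s, target=1)
```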

1.1. CONTRIBUTIONS

We develop the Fair Learning through dAta Preprocessing (FLAP) algorithm to learn counterfactually fair decisions from biased training data. While the current literature is often vague about the assumptions needed for its algorithms to achieve fairness, we formalize the weak and strong conditions under which different data preprocessing procedures should be used to guarantee CF, and we prove the results under the causal framework of Pearl (2009a). We show that our algorithm predicts fairer decisions with similar accuracy compared with other counterfactually fair learning algorithms on three simulated datasets and three real-world applications: loan approval data from a fintech company, the adult income data, and the COMPAS recidivism data. On the other hand, the processed data also enable us to detect discrimination in the original decisions. We prove that CF is equivalent to the conditional independence of the decisions and the sensitive attributes given the processed non-sensitive attributes under certain conditions. Therefore, any well-established conditional independence test can be used to test CF with the processed data. To our knowledge, this is the first time a formal statistical test for CF has been proposed. We illustrate the idea using the Conditional Distance Correlation test (Wang et al., 2015) in our simulation and test the fairness of the decisions in the loan approval data using a parametric test.
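To make the testing idea concrete, the sketch below implements a simple within-bin permutation test of the conditional independence of Y and S given a processed attribute. It is a crude stand-in for the Conditional Distance Correlation test of Wang et al. (2015), not the test used in this paper, and all names and functional choices are ours.

```python
import numpy as np

def ci_permutation_test(y, s, a_tilde, n_bins=4, n_perm=500, seed=0):
    """Crude permutation test of (Y independent of S) given the processed
    attribute: bin the processed attribute, permute S within each bin
    (approximately preserving P(S | processed A)), and compare the
    observed |cov(Y, S)| with its permutation null distribution."""
    rng = np.random.default_rng(seed)
    edges = np.quantile(a_tilde, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(a_tilde, edges)
    stat = lambda s_: abs(np.cov(y, s_)[0, 1])
    obs, null = stat(s), []
    for _ in range(n_perm):
        s_perm = s.copy()
        for b in np.unique(bins):
            idx = np.where(bins == b)[0]
            s_perm[idx] = rng.permutation(s_perm[idx])
        null.append(stat(s_perm))
    return (1 + sum(v >= obs for v in null)) / (1 + n_perm)  # permutation p-value

# Decisions that still depend on S after conditioning on the processed
# attribute should yield a small p-value.
rng = np.random.default_rng(1)
a_tilde = rng.uniform(size=400)
s = rng.integers(0, 2, size=400)
y = (a_tilde + s > 1.2).astype(int)
p_unfair = ci_permutation_test(y, s, a_tilde)
```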

2. CAUSAL MODEL AND COUNTERFACTUAL FAIRNESS

For the discussion below, we consider the sensitive attributes S ∈ S to be categorical, which is a reasonable restriction for commonly discussed sensitive information such as race and gender. The non-sensitive attributes A ∈ A ⊆ R^d, and the decision Y is binary, e.g., admit or not in graduate admission, hire or not in hiring, and approve or not in loan assessment.



S = f_S(U_S),   A = f_A(S, U_A),   Y = f_Y(S, A, U_Y),   Ŷ = f_Ŷ(S, A, U_Ŷ) = 1{U_Ŷ < p(S, A)}.

Figure 1: Structural causal model.
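A toy instance of this structural causal model can be simulated as follows; the particular functional forms, coefficients, and the logistic choice for p(S, A) are illustrative assumptions of ours, not the paper's.

```python
import numpy as np

def sample_scm(n, seed=0):
    """Draw n observations from a toy instance of the structural
    causal model in Figure 1: S is binary, A is scalar, and the
    historical and learned decisions Y, Y_hat are binary."""
    rng = np.random.default_rng(seed)
    u_s, u_a = rng.uniform(size=n), rng.normal(size=n)
    u_y, u_yhat = rng.uniform(size=n), rng.uniform(size=n)
    s = (u_s < 0.5).astype(int)                  # S = f_S(U_S)
    a = 1.0 + 0.5 * s + u_a                      # A = f_A(S, U_A): depends on S
    p = 1 / (1 + np.exp(-(a + 0.3 * s - 1)))     # decision probability p(S, A)
    y = (u_y < p).astype(int)                    # Y = f_Y(S, A, U_Y)
    y_hat = (u_yhat < p).astype(int)             # Y_hat = 1{U_Yhat < p(S, A)}
    return s, a, y, y_hat

s, a, y, y_hat = sample_scm(1000)
```

Because A depends on S, a decision rule trained on (S, A) inherits the bias even if S itself is dropped, which is why the preprocessing step operates on A.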

