CONFIDENTIAL-PROFITT: CONFIDENTIAL PROOF OF FAIR TRAINING OF TREES

Abstract

Post hoc auditing of model fairness suffers from two potential drawbacks: (1) auditing may be highly sensitive to the test samples chosen; (2) the model and/or its training data may need to be shared with an auditor, thereby breaking confidentiality. We address these issues by instead providing a certificate that demonstrates that the learning algorithm itself is fair, and hence, as a consequence, so too is the trained model. We introduce a method to provide a confidential proof of fairness for training, in the context of widely used decision trees, which we term Confidential-PROFITT. We propose novel fair decision tree learning algorithms along with customized zero-knowledge proof protocols to obtain a proof of fairness that can be audited by a third party. Using zero-knowledge proofs enables us to guarantee confidentiality of both the model and its training data. We show empirically that bounding the information gain of each node with respect to the sensitive attributes reduces the unfairness of the final tree. In extensive experiments on the COMPAS, Communities and Crime, Default Credit, and Adult datasets, we demonstrate that a company can use Confidential-PROFITT to certify the fairness of their decision tree to an auditor in less than 2 minutes, thus indicating the applicability of our approach. This is true for both the demographic parity and equalized odds definitions of fairness. Finally, we extend Confidential-PROFITT to apply to ensembles of trees.

1. INTRODUCTION

The deployment of machine learning models in high-stakes decision systems (Waddell, 2016; Benjamens et al., 2020; Kleinberg et al., 2018) carries the risk of unfair decisions towards particular subgroups defined by sensitive attributes (Dwork et al., 2012). A canonical approach for auditing such deployments is to measure the fairness of a trained model on a reference dataset (Pentyala et al., 2022). In practice, this would be done by an external party (i.e., an auditor). Such audits, however, can be difficult to organize and are sensitive to the choice of reference dataset (Fukuchi et al., 2020). This may lead to an unhelpful interaction between the company and the auditor, in which the company could deny that a model is unfair by claiming that the reference dataset does not belong to the training distribution used, or the auditor could forge a reference dataset to blame the company for unfair predictions. One avenue to address this problem would be for the company to release its training data and the model to the auditor, who can then verify that a fair training algorithm was used by, e.g., locally rerunning the training process. However, this approach does not protect the confidentiality of the company's training data. In this paper, we remedy these issues by introducing confidential proofs of fair training. We highlight that our method does not guarantee fairness. Rather, our approach employs a tunable parameter controlling the resulting degree of fairness. The certificate we provide proves that our approach was employed, and also includes the specific parameter value used and the resulting fairness metrics on the training data. We call this approach "fairness-aware training", or "fair training" for short.
Concretely, we design a framework (i.e., Confidential-PROFITT) that allows a company to prove directly to the auditor, through the execution of a cryptographic protocol, that the learning algorithm used to train the model was fair by design. To achieve this without revealing the company's dataset or model to the auditor, we rely on zero-knowledge (ZK) proofs (Goldwasser et al., 1985; Goldreich et al., 1991), which allow a party to prove statements about their private data without revealing it. Our framework is generic in the sense that it allows the company to prove the fairness of the model learning process to any interested party (e.g., users in addition to the auditor), thus increasing public trust in the model. We instantiate the confidential proof of fair training in the context of decision trees, which are widely used by companies in sensitive domains such as healthcare (Podgorelec et al., 2002) and finance (Ghatasheh, 2014; Güntay et al., 2022), in part due to their performance, their ability to be leveraged in ensemble methods (e.g., random forests), and also, occasionally, their assumed interpretability (Molnar, 2020). We propose a cryptographic protocol that can prove the fair training of decision trees under various common fairness definitions, including demographic parity (Calders et al., 2009) and equalized odds (Hardt et al., 2016). Fair decision tree learning builds a decision tree iteratively by splitting its dataset according to a criterion capturing the information gain with respect to both the class and sensitive attributes. In particular, our criterion addresses issues of prior fair training algorithms by connecting information gain with respect to the sensitive attribute to demographic parity. To render the verification more efficient once it is implemented cryptographically, we propose a co-design involving concepts originating from both machine learning and cryptography.
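To make the two fairness definitions concrete, the following sketch computes the demographic parity and equalized odds gaps for binary predictions and a binary sensitive attribute. The function names are our own illustration; this is not part of the Confidential-PROFITT protocol itself.

```python
import numpy as np

def demographic_parity_gap(y_pred, s):
    """|P(Yhat=1 | S=1) - P(Yhat=1 | S=0)|: difference in positive
    prediction rates between the two sensitive groups."""
    y_pred, s = np.asarray(y_pred), np.asarray(s)
    return abs(y_pred[s == 1].mean() - y_pred[s == 0].mean())

def equalized_odds_gap(y_pred, y_true, s):
    """Max over y in {0, 1} of |P(Yhat=1 | S=1, Y=y) - P(Yhat=1 | S=0, Y=y)|:
    the larger of the true-positive-rate and false-positive-rate gaps."""
    y_pred, y_true, s = map(np.asarray, (y_pred, y_true, s))
    gaps = []
    for y in (0, 1):
        m = y_true == y
        gaps.append(abs(y_pred[m & (s == 1)].mean() - y_pred[m & (s == 0)].mean()))
    return max(gaps)
```

A gap of 0 corresponds to perfect fairness under the respective definition; a post hoc audit would estimate these quantities on a reference dataset, which is exactly the dependence our approach avoids.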
First, we design a ZK-friendly fair decision tree learning algorithm based on the insight that verifying fairness is sufficient because the accuracy is of no interest to the auditor. Prior works (Kamiran et al., 2010; Raff et al., 2018) on fair decision tree learning consider only demographic parity and view the problem as a joint optimization between accuracy and demographic parity. The optimal solution is obtained by searching exhaustively through an enumeration of all possible attributes and split points. However, in ZK, this whole exhaustive search would need to be proven using heavy cryptographic machinery, which is computationally expensive in most settings of interest. To address this, we propose a generalized framework that formulates training as a constrained optimization problem, in which fairness is represented as a constraint and accuracy is the objective of optimization. As a result, we only need to verify the satisfaction of the constraint in ZK, without verifying the accuracy. Second, we propose an efficient ZK protocol that performs the aforementioned fairness constraint verification using state-of-the-art ZK protocols in the RAM model (Franzese et al., 2021). Instead of using a naïve verification approach that requires proving a computation as long as the training process, we design a fairness verification protocol with complexity sublinear in the training phase. In summary, we propose a novel way of auditing model fairness by proving directly that the training algorithm itself is fair rather than inspecting the model and its predictions. As a consequence, we do not need a reference dataset, and both the training data and the model remain confidential, i.e., they are not disclosed to the auditor. We highlight the following contributions:

1. We propose a new ZK-friendly fair decision tree learning algorithm such that its fairness can be verified efficiently without repeating the entire training process. We summarize existing fairness metrics and show that our approach is generally applicable to them.

2. We design and implement a specialized ZK proof protocol to efficiently verify the fairness of the above training algorithm. While prior works exist on secure inference using ZK proofs, to the best of our knowledge we are the first to propose secure training using ZK proofs.

3. We implement and evaluate our framework in terms of accuracy, fairness, running time, and scalability using decision trees and random forests trained on a variety of real-world datasets. For example, on the Communities and Crime dataset, Confidential-PROFITT can provide a certificate in less than 9 seconds that the company employed our fair training algorithm and obtained a decision tree. In this example, our fair training algorithm improves equalized odds fairness, which is not supported by prior works (Kamiran et al., 2010; Raff et al., 2018), by 50% with negligible effect on accuracy. In our implementation, Confidential-PROFITT provides the same certificate for random forests in 70 seconds.
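The constrained-optimization view of node splitting can be illustrated with the following plain (non-cryptographic) sketch, written under our own simplifications: `tau` stands in for the tunable fairness parameter, and each split maximizes information gain on the class label subject to the information gain on the sensitive attribute being at most `tau`. Under this formulation, a verifier need only check the constraint at each chosen split rather than replay the exhaustive search.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a binary label vector."""
    if len(labels) == 0:
        return 0.0
    p = labels.mean()
    if p in (0.0, 1.0):
        return 0.0
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def info_gain(mask, labels):
    """Information gain of partitioning `labels` by the boolean `mask`."""
    n = len(labels)
    left, right = labels[mask], labels[~mask]
    return (entropy(labels)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))

def fair_split(X, y, s, tau):
    """Pick the split (feature, threshold) maximizing info gain on the class
    labels y, subject to the info gain on the sensitive attribute s being at
    most tau. Returns None if no split satisfies the fairness constraint."""
    best, best_gain = None, -1.0
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            mask = X[:, j] <= t
            if mask.all() or not mask.any():
                continue  # degenerate split
            if info_gain(mask, s) > tau:
                continue  # fairness constraint: split leaks too much about s
            g = info_gain(mask, y)
            if g > best_gain:
                best, best_gain = (j, t), g
    return best
```

On a toy dataset where feature 0 predicts the class and feature 1 predicts the sensitive attribute, a small `tau` forces the learner to split on feature 0, since splitting on feature 1 has high information gain with respect to `s`.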

2. PROBLEM STATEMENT, BACKGROUND, AND RELATED WORK

Problem statement. Ignoring fairness in the training process may result in models that negatively affect users belonging to specific subgroups with respect to a sensitive attribute such as gender, race, or disability (Kamiran & Calders, 2009; Raff et al., 2018). For example, Buolamwini & Gebru (2018) demonstrated that darker-skinned women are the most misclassified demographic group in

