FAIRGBM: GRADIENT BOOSTING WITH FAIRNESS CONSTRAINTS

Abstract

Tabular data is prevalent in many high-stakes domains, such as financial services or public policy. Gradient Boosted Decision Trees (GBDT) are popular in these settings due to their scalability, performance, and low training cost. While fairness in these domains is a foremost concern, existing in-processing Fair ML methods are either incompatible with GBDT, or incur significant performance losses while taking considerably longer to train. We present FairGBM, a dual ascent learning framework for training GBDT under fairness constraints, with little to no impact on predictive performance when compared to unconstrained GBDT. Since observational fairness metrics are non-differentiable, we propose smooth convex error rate proxies for common fairness criteria, enabling gradient-based optimization using a "proxy-Lagrangian" formulation. Our open-source implementation shows an order of magnitude speedup in training time relative to related work, a pivotal aspect to foster the widespread adoption of FairGBM by real-world practitioners.

1. INTRODUCTION

The use of Machine Learning (ML) algorithms to inform consequential decision-making has become ubiquitous in a multitude of high-stakes mission critical applications, from financial services to criminal justice or healthcare (Bartlett et al., 2019; Brennan et al., 2009; Tomar & Agarwal, 2013). At the same time, this widespread adoption of ML was followed by reports surfacing the risk of bias and discriminatory decision-making affecting people based on ethnicity, gender, age, and other sensitive attributes (Angwin et al., 2016; Bolukbasi et al., 2016; Buolamwini & Gebru, 2018). This awareness led to the rise of Fair ML, a research area focused on discussing, measuring and mitigating the risk of bias and unfairness in ML systems. Despite the rapid pace of research in Fair ML (Hardt et al., 2016; Zafar et al., 2017; Agarwal et al., 2018; Narasimhan et al., 2019; Celis et al., 2021) and the release of several open-source software packages (Saleiro et al., 2018; Bellamy et al., 2018; Agarwal et al., 2018; Cotter et al., 2019b), there is still no clear winning method that "just works" regardless of data format and bias conditions.

Fair ML methods are usually divided into three families: pre-processing, in-processing and post-processing. Pre-processing methods aim to learn an unbiased representation of the training data but may not guarantee fairness in the end classifier (Zemel et al., 2013; Edwards & Storkey, 2016); post-processing methods inevitably require test-time access to sensitive attributes and can be suboptimal depending on the structure of the data (Hardt et al., 2016; Woodworth et al., 2017). Most in-processing Fair ML methods rely on fairness constraints to prevent the model from disproportionately hurting protected groups (Zafar et al., 2017; Agarwal et al., 2018; Cotter et al., 2019b). Using constrained optimization, we can directly optimize the predictive performance of fair models.
In principle, in-processing methods have the potential to introduce fairness with no training-time overhead and minimal predictive performance cost: an ideal outcome for most mission critical applications, such as financial fraud detection or medical diagnosis. Sacrificing a few percentage points of predictive performance in such settings may result in catastrophic outcomes, from safety hazards to substantial monetary losses. Therefore, the use of Fair ML in mission critical systems is particularly challenging, as fairness must be achieved with minimal performance drops.

Tabular data is a common data format in a variety of mission critical ML applications (e.g., financial services). While deep learning is the dominant paradigm for unstructured data, gradient boosted decision trees (GBDT) algorithms are pervasive in tabular data due to their state-of-the-art performance and the availability of fast, scalable, ready-to-use implementations, e.g., LightGBM (Ke et al., 2017) or XGBoost (Chen & Guestrin, 2016). Unfortunately, Fair ML research still lacks suitable fairness-constrained frameworks for GBDT, making it challenging to satisfy stringent fairness requirements. As a case in point, Google's TensorFlow Constrained Optimization (TFCO) (Cotter et al., 2019b), a well-known in-processing bias mitigation technique, is only compatible with neural network models. Conversely, Microsoft's ready-to-use fairlearn EG framework (Agarwal et al., 2018) supports GBDT models, but carries a substantial training overhead and can only output binary scores instead of a continuous scoring function, making it inapplicable to a variety of use cases. In particular, the production of binary scores is incompatible with deployment settings with a fixed budget for positive predictions (e.g., resource-constrained problems (Ackermann et al., 2018)) or settings targeting a specific point in the ROC curve (e.g., a fixed false positive rate), such as in fraud detection.
To address this gap in Fair ML, we present FairGBM, a framework for fairness-constrained optimization tailored for GBDT. Our method incorporates the classical method of Lagrange multipliers within gradient boosting, requiring only the gradient of the constraint w.r.t. (with respect to) the model's output Ŷ. Lagrange duality enables us to perform this optimization process efficiently as a two-player game: one player minimizes the loss w.r.t. Ŷ, while the other player maximizes the loss w.r.t. the Lagrange multipliers. As fairness metrics are non-differentiable, we employ differentiable proxy constraints. Our method is inspired by the theoretical groundwork of Cotter et al. (2019b), which introduces a new "proxy-Lagrangian" formulation and proves that a stochastic equilibrium solution exists even when employing proxy constraints. Contrary to related work, our approach does not require training extra models, nor keeping the training iterates in memory.

We apply our method to a real-world account opening fraud case study, as well as to five public benchmark datasets from the fairness literature (Ding et al., 2021). Moreover, we enable fairness constraint fulfillment at a specific ROC point, finding fair models that fulfill business restrictions on the number of allowed false positives or false negatives. This feature is a must for problems with high class imbalance, as the prevailing approach of using a decision threshold of 0.5 is only optimal when maximizing accuracy. When compared with state-of-the-art in-processing fairness interventions, our method consistently achieves improved predictive performance for the same value of fairness. In summary, this work's main contributions are:
• A novel constrained optimization framework for gradient boosting, dubbed FairGBM.
• Differentiable proxy functions for popular fairness metrics based on the cross-entropy loss.
• A high-performance open-source implementation of our algorithm.
• Validation on a real-world case study and five public benchmark datasets (folktables).
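To make the proxy idea concrete, the sketch below (a minimal numpy illustration; the function names are ours, not the paper's API) replaces the step-wise false-positive indicator with its cross-entropy upper bound, yielding a differentiable group-wise FPR that can enter parity constraints such as equalized false positive rates:

```python
import numpy as np

def proxy_fpr(scores, y, mask=None):
    """Smooth proxy for the false-positive rate.

    The hard FPR counts label-negative instances with score > 0.  Here each
    such instance instead contributes log2(1 + exp(score)), which is convex,
    differentiable, and upper-bounds the 0/1 step function, so the proxy
    upper-bounds the true FPR."""
    neg = (y == 0) if mask is None else (y == 0) & mask
    return np.logaddexp(0.0, scores[neg]).mean() / np.log(2.0)

def constraint_violations(scores, y, groups):
    """One proxy constraint per group: proxy-FPR(group) - proxy-FPR(overall) <= 0."""
    overall = proxy_fpr(scores, y)
    return np.array([proxy_fpr(scores, y, groups == g) - overall
                     for g in np.unique(groups)])
```

Because the proxy is differentiable in the scores, its gradient can be plugged into any gradient-based learner; the same construction applies to other error rates (e.g., false negative rates for equal opportunity).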

2. FAIRGBM FRAMEWORK

We propose a fairness-aware variant of the gradient-boosting training framework, dubbed FairGBM. Our method minimizes predictive loss while enforcing group-wise parity on one or more error rates. We focus on the GBDT algorithm, which uses regression trees as the base weak learners (Breiman, 1984). Moreover, the current widespread use of GBDT is arguably due to two highly scalable variants of this algorithm: XGBoost (Chen & Guestrin, 2016) and LightGBM (Ke et al., 2017). In this work we provide an open-source fairness-aware implementation based on LightGBM. Our work is, however, generalizable to any gradient-boosting algorithm, and to any set of differentiable constraints (not limited to fairness constraints). We refer the reader to Appendix G for notation disambiguation.
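The alternation between the two players can be sketched in a few dozen lines. The following simplified illustration (hypothetical helper names; not the paper's exact algorithm nor the LightGBM-based implementation) fits each tree to the negative gradient of the Lagrangian, then takes a projected ascent step on the multipliers, using a cross-entropy proxy for group-wise FPR parity:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def _proxy_fpr(F, y, mask):
    """Cross-entropy proxy for the FPR of the instances selected by `mask`."""
    neg = mask & (y == 0)
    return np.logaddexp(0.0, F[neg]).mean() if neg.any() else 0.0

def train_constrained_gbm(X, y, groups, n_rounds=50, lr=0.1, dual_lr=0.5):
    n = len(y)
    F = np.zeros(n)                     # additive ensemble scores
    uniq = np.unique(groups)
    lam = np.zeros(len(uniq))           # one multiplier per group constraint
    trees = []
    for _ in range(n_rounds):
        # Player 1: descend on the Lagrangian.  The pseudo-residuals combine
        # the BCE gradient with the gradients of the proxy constraints
        # c_k = proxyFPR(group k) - proxyFPR(overall), weighted by lam.
        grad = _sigmoid(F) - y
        neg_all = y == 0
        base = np.zeros(n)
        base[neg_all] = _sigmoid(F[neg_all]) / max(neg_all.sum(), 1)
        for k, g in enumerate(uniq):
            m = (groups == g) & neg_all
            g_grad = np.zeros(n)
            g_grad[m] = _sigmoid(F[m]) / max(m.sum(), 1)
            grad += lam[k] * (g_grad - base)
        tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, -grad)
        F += lr * tree.predict(X)
        trees.append(tree)
        # Player 2: projected gradient ascent on the multipliers (lam >= 0).
        overall = _proxy_fpr(F, y, np.ones(n, dtype=bool))
        viol = np.array([_proxy_fpr(F, y, groups == g) - overall for g in uniq])
        lam = np.maximum(0.0, lam + dual_lr * viol)
    return trees, lam
```

Note that no extra models are trained and no past iterates are stored: only the current scores F and the multiplier vector lam are kept between rounds.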



Open-source implementation available at: https://github.com/feedzai/fairgbm



2.1 OPTIMIZATION UNDER FAIRNESS CONSTRAINTS

Constrained optimization (CO) approaches aim to find the set of parameters θ ∈ Θ that minimize the standard predictive loss L of a model f_θ, given a set of m fairness constraints c_i, i ∈ {1, ..., m}:

θ* = arg min_{θ∈Θ} L(θ)   s.t.   c_i(θ) ≤ 0, ∀i ∈ {1, ..., m}.    (1)
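Written out, the Lagrangian relaxation of the constrained optimization problem above is the standard min-max game (classical duality, not specific to FairGBM):

```latex
\mathcal{L}(\theta, \lambda) = L(\theta) + \sum_{i=1}^{m} \lambda_i \, c_i(\theta),
\qquad
\theta^{\star} \in \arg\min_{\theta \in \Theta} \; \max_{\lambda \geq 0} \; \mathcal{L}(\theta, \lambda)
```

The descent player updates the model (in gradient boosting, by fitting a new tree to the negative gradient of the Lagrangian w.r.t. the model's output Ŷ), while the ascent player performs projected gradient ascent on the multipliers, λ_i ← max(0, λ_i + η c_i(θ)). In the proxy-Lagrangian formulation of Cotter et al. (2019b) adopted here, the descent player substitutes differentiable proxies c̃_i for the non-differentiable fairness constraints, while the ascent player can keep the original c_i.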

