STATISTICAL INFERENCE FOR INDIVIDUAL FAIRNESS

Abstract

As we rely on machine learning (ML) models to make more consequential decisions, the issue of ML models perpetuating or even exacerbating undesirable historical biases (e.g., gender and racial biases) has come to the forefront of public attention. In this paper, we focus on the problem of detecting violations of individual fairness in ML models. We formalize the problem as measuring the susceptibility of ML models to a form of adversarial attack and develop a suite of inference tools for the adversarial cost function. These tools allow auditors to assess the individual fairness of ML models in a statistically principled way: they can form confidence intervals for the worst-case performance differential between similar individuals and test hypotheses of model fairness with (asymptotic) non-coverage/Type I error rate control. We demonstrate the utility of our tools in a real-world case study.

1. INTRODUCTION

The problem of bias in machine learning systems is at the forefront of contemporary ML research. Numerous media outlets have scrutinized machine learning systems deployed in practice for violations of basic societal equality principles (Angwin et al., 2016; Dastin, 2018; Vigdor, 2019). In response, researchers have developed many formal definitions of algorithmic fairness along with algorithms for enforcing these definitions in ML models (Dwork et al., 2011; Hardt et al., 2016; Berk et al., 2017; Kusner et al., 2018; Ritov et al., 2017; Yurochkin et al., 2020). Despite this flurry of ML fairness research, the basic question of assessing the fairness of a given ML model in a statistically principled way remains largely unexplored. In this paper we propose a statistically principled approach to assessing the individual fairness (Dwork et al., 2011) of ML models. One of the main benefits of our approach is that it allows the investigator to calibrate the method; i.e., it allows the investigator to prescribe a Type I error rate. Passing a test with a guaranteed small Type I error rate is the usual standard of proof in scientific investigations because it guarantees (to a certain degree) that the results are reproducible. Such a guarantee is also highly desirable when detecting bias in ML models because it allows us to certify whether an ML model will behave fairly at test time. Our method for auditing ML models abides by this standard.

There are two main challenges in developing a hypothesis test for individual fairness. First, how do we formalize the notion of individual fairness as an interpretable null hypothesis? Second, how do we devise a test statistic and calibrate it so that auditors can control the Type I error rate? In this paper we propose a test motivated by the relation between individual fairness and adversarial robustness (Yurochkin et al., 2020).
At a high level, our approach consists of two parts:

1. finding unfair examples: examples that are similar to training examples with respect to the fair metric but on which the model performs markedly worse (Section 2 develops a gradient flow-based approach for finding them);
2. summarizing the behavior of the ML model on unfair examples: we propose a loss-ratio based approach that is not only scale-free, but also interpretable. For classification problems, we propose a variation of our test based on the ratio of error rates.

1.1 RELATED WORK

At a high level, our approach uses the difference between the empirical risk and the distributionally robust risk as a test statistic. The distributionally robust risk is the maximum risk of the ML model on similar training examples, where similarity is measured by a fair metric that encodes our intuition of which inputs should be treated similarly by the ML model. We note that DRO has been studied extensively in the recent literature (Duchi et al., 2016; Blanchet & Murthy, 2016; Hashimoto et al., 2018), but outside of the fairness context, with the exceptions of Yurochkin et al. (2020) and Xue et al. (2020). Yurochkin et al. (2020) focus on training fair ML models rather than auditing them. Xue et al. (2020) also use the difference between the empirical and distributionally robust risks as a test statistic, but their test is only applicable to ML problems with finite feature spaces, a limitation that severely restricts its applicability. Our test, in contrast, is suitable for ML problems with continuous feature spaces. The technical exposition in Xue et al. (2020) depends on the finite-feature-space assumption; in this work we develop a novel perspective on the problem that allows us to handle continuous feature spaces.
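Once unfair examples are in hand, the test statistics described above are straightforward to compute. The sketch below is a minimal illustration, not the paper's exact procedure: the function name `audit_statistics` and the synthetic losses are our own stand-ins.

```python
import numpy as np

def audit_statistics(loss_orig, loss_adv):
    """Summarize model behavior on (original, unfair-example) pairs.

    loss_orig: losses on the original training examples.
    loss_adv:  losses on the corresponding unfair examples found by
               maximizing the loss near each example in the fair metric.
    """
    # Difference between the (estimated) distributionally robust risk and
    # the empirical risk -- the test statistic used by Xue et al. (2020).
    risk_gap = loss_adv.mean() - loss_orig.mean()
    # Scale-free alternative: the loss ratio proposed in this paper.
    loss_ratio = loss_adv.mean() / loss_orig.mean()
    return risk_gap, loss_ratio

# Toy example with synthetic losses; adversarial losses dominate by construction.
rng = np.random.default_rng(0)
loss_orig = rng.uniform(0.1, 0.5, size=100)
loss_adv = loss_orig + rng.uniform(0.0, 0.3, size=100)
gap, ratio = audit_statistics(loss_orig, loss_adv)
print(gap >= 0, ratio >= 1)  # True True
```

Because the robust risk maximizes over distributions near the data, the gap is nonnegative and the ratio is at least one; how far they exceed 0 and 1 is what the paper's inference tools quantify.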

2. GRADIENT FLOW FOR FINDING UNFAIR EXAMPLES

In this section, we describe a gradient flow-based approach to finding unfair examples; these examples form the basis of our suite of inferential tools. Imagine an auditor assessing whether an ML model is fair or not. The auditor aims to detect violations of individual fairness in the ML model. Recall Dwork et al. (2011)'s definition of individual fairness. Let X ⊂ R^d and Y ⊂ R^d be the input and output spaces respectively, and f : X → Y be the ML model to audit. The ML model f is individually fair if

d_y(f(x_1), f(x_2)) ≤ L_fair · d_x(x_1, x_2)  for all x_1, x_2 ∈ X,   (2.1)

for some Lipschitz constant L_fair > 0. Here d_x and d_y are metrics on X and Y respectively. Intuitively, an individually fair ML model treats similar samples similarly, and the fair metric d_x encodes our intuition of which samples should be treated similarly. We point out that d_x(x_1, x_2) being small does not imply x_1 and x_2 are similar in all aspects: even if d_x(x_1, x_2) is small, x_1 and x_2 may differ substantially in certain attributes, e.g., protected/sensitive attributes.
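Condition (2.1) can be probed directly on pairs of inputs. The following toy sketch assumes a linear model for f, a Euclidean d_y, and a fair metric d_x that discounts a hypothetical sensitive coordinate; all of these choices are illustrative assumptions, not the paper's construction.

```python
import numpy as np

# Toy linear model f : R^3 -> R^1 (illustrative assumption).
W = np.array([[1.0, 2.0, 0.5]])
f = lambda x: W @ x

def d_y(y1, y2):
    return np.linalg.norm(y1 - y2)

# Fair metric: Euclidean distance that ignores coordinate 2, a stand-in
# for a sensitive attribute along which inputs should count as similar.
mask = np.array([1.0, 1.0, 0.0])
def d_x(x1, x2):
    return np.linalg.norm(mask * (x1 - x2))

x1 = np.array([0.0, 0.0, 0.0])
x2 = np.array([0.0, 0.0, 1.0])  # differs only in the sensitive coordinate
# Here d_x(x1, x2) = 0 while d_y(f(x1), f(x2)) > 0: two "comparable"
# individuals receive different outputs, so no finite L_fair satisfies (2.1).
print(d_x(x1, x2), d_y(f(x1), f(x2)))
```

A pair like (x1, x2) above is exactly what we mean by an unfair example: close in the fair metric, yet treated differently by the model.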

Before moving on, we comment on the choice of the fair metric d_x. This metric is picked by the auditor and reflects the auditor's intuition about what is fair and what is unfair for the ML task at hand. It can be provided by a subject-matter expert (this is Dwork et al. (2011)'s original recommendation) or learned from data (a recent approach advocated by Ilvento (2019); Wang et al. (2019); Mukherjee et al. (2020)). Section 4 provides details of picking a fair metric in our empirical studies.

To motivate our approach, we recall the distributionally robust optimization (DRO) approach to training individually fair ML models (Yurochkin et al., 2020). Let f : X → Y be an ML model and ℓ(f(x), y) : Z → R_+ be any smooth loss (e.g., the cross-entropy loss). To search for differential treatment in the ML model, Yurochkin et al. (2020) solve the optimization problem

max_{P : W(P, P_n) ≤ ε} ∫_Z ℓ(f(x), y) dP(z),   (2.2)

where W is the Wasserstein distance on probability distributions on the feature space induced by the fair metric, P_n is the empirical distribution of the training data, and ε is a moving budget that ensures the adversarial examples are close to the (original) training examples in the fair metric. Formally, this search for differential treatment checks for violations of distributionally robust fairness.

Definition 2.1 (distributionally robust fairness (DRF) (Yurochkin et al., 2020)). An ML model h : X → Y is (ε, δ)-distributionally robustly fair (DRF) with respect to the fair metric d_x iff

sup_{P : W(P, P_n) ≤ ε} ∫_Z ℓ(z, h) dP(z) − ∫_Z ℓ(z, h) dP_n(z) ≤ δ.   (2.3)
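In practice, the inner maximization in (2.2) is often approached through a penalized surrogate: ascend the loss while penalizing movement away from the original example in the fair metric. The sketch below is our illustration of that idea on a toy logistic model, not the paper's exact gradient flow; the step size, penalty weight lam, and the Euclidean choice of fair metric are all assumptions.

```python
import numpy as np

# Toy logistic model with fixed weights (illustrative assumption).
w = np.array([2.0, -1.0, 0.1])
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

def loss(x, y):
    p = sigmoid(w @ x)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def grad_loss_x(x, y):
    # Gradient of the logistic loss with respect to x: (p - y) * w.
    return (sigmoid(w @ x) - y) * w

def find_unfair_example(x0, y, lam=10.0, lr=0.1, steps=200):
    """Gradient ascent on  loss(x, y) - lam * ||x - x0||^2 / 2,
    a penalized surrogate for the fair-metric-constrained problem (2.2).
    The fair metric is assumed Euclidean here for simplicity."""
    x = x0.copy()
    for _ in range(steps):
        g = grad_loss_x(x, y) - lam * (x - x0)
        x = x + lr * g
    return x

x0 = np.array([1.0, 0.0, 0.0])
y = 1
x_adv = find_unfair_example(x0, y)
# Ascent cannot decrease the loss relative to the starting example.
print(loss(x_adv, y) >= loss(x0, y))  # True
```

The gap loss(x_adv, y) − loss(x0, y), aggregated over the data, estimates the left-hand side of the DRF condition in Definition 2.1.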


