TIER BALANCING: TOWARDS DYNAMIC FAIRNESS OVER UNDERLYING CAUSAL FACTORS

Abstract

The pursuit of long-term fairness involves the interplay between decision-making and the underlying data generating process. In this paper, through causal modeling with a directed acyclic graph (DAG) on the decision-distribution interplay, we investigate the possibility of achieving long-term fairness from a dynamic perspective. We propose Tier Balancing, a technically more challenging but more natural notion to achieve in the context of long-term, dynamic fairness analysis. Different from previous fairness notions that are defined purely on observed variables, our notion goes one step further, capturing behind-the-scenes situation changes on the unobserved latent causal factors that directly carry out the influence from the current decision to the future data distribution. Under the specified dynamics, we prove that in general one cannot achieve the long-term fairness goal only through one-step interventions. Furthermore, in the effort of approaching long-term fairness, we consider the mission of "getting closer to" the long-term fairness goal and present possibility and impossibility results accordingly.

1. INTRODUCTION

The long-term fairness endeavor inevitably involves the interplay between decision policies and the underlying data generating process: when deriving a decision-making system, one usually makes use of the data at hand; once such a system is deployed, its decisions affect how the data will look in the future (Perdomo et al., 2020; Liu et al., 2021). To understand why and how a data distribution responds to decision-making strategies, the investigation has to resort to causal modeling. The pursuit of long-term fairness, in turn, should also consider the changes in the underlying causal factors.
Various fairness notions with different flavors have been proposed in the literature: associative fairness notions that capture the correlation or dependence between variables, e.g., Demographic Parity (Calders et al., 2009) and Equalized Odds (Hardt et al., 2016); and causal fairness notions that involve modeling causal relations between variables, e.g., Counterfactual Fairness (Kusner et al., 2017; Russell et al., 2017), Path-Specific Counterfactual Fairness (Chiappa, 2019; Wu et al., 2019), and Causal Multi-Level Fairness (Mhasawade & Chunara, 2021). These previously proposed fairness notions are defined with respect to a static snapshot of reality, and do not have a built-in capacity to model the distribution-decision interplay in the long-term fairness pursuit.
In the effort of enforcing fairness in the dynamic setting, researchers have approached the problem from different angles: they provide causal modeling for fairness notions (Creager et al., 2020), analyze the delayed impact or downstream effect on utilities (Liu et al., 2018; Heidari et al., 2019; Kannan et al., 2019; Nilforoshan et al., 2022), enforce fairness in sequential or online decision-making (Joseph et al., 2016; Liu et al., 2017; Hashimoto et al., 2018; Heidari & Krause, 2018; Bechavod et al., 2019), investigate the relation between the long-term population qualification and fair decisions (Zhang et al., 2020), take into consideration the user behavior/action when deriving a decision policy (Zhang et al., 2019; Ustun et al., 2019; Miller et al., 2020; von Kügelgen et al., 2022), provide fairness transferability guarantees across domains (Schumann et al., 2019; Singh et al., 2021), or derive robust fair predictors (Coston et al., 2019; Rezaei et al., 2021). The proposed dynamic fairness-enforcing procedures usually limit their scope of consideration to observed variables only, and the fairness audit is performed directly on the decision or on statistics defined on observed data.
In order to have a built-in capacity to capture the influence from the current decision to future data distributions, and more importantly, to induce a fair future in the long run, in this paper we propose Tier Balancing, a long-term fairness notion that characterizes the interplay between decision-making and data dynamics through a detailed causal modeling with a directed acyclic graph (DAG). For example, the latent socio-economic status (whose estimation can be the output of a FICO credit score model), although not directly measurable, plays an important role in credit applications. We are motivated by the goal of inducing a fair future by actually balancing the inherent socio-economic status, i.e., the "tier", of agents from different groups.
We summarize our contributions as follows:
• We formulate Tier Balancing, a fairness notion from the dynamic and long-term perspective that characterizes the decision-distribution interplay with a detailed causal modeling over both observed variables and latent causal factors.
• Under the specified data dynamics, we prove that in general, one cannot directly achieve the long-term fairness goal only through a one-step intervention, i.e., static decision-making.
• We consider the possibility of getting closer to the long-term fairness goal through a sequence of algorithmic interventions, and present possibility and impossibility results derived from the one-step analysis of the decision-distribution interplay.

2. PROBLEM SETUP

In this section, we formulate the problem of interest. We first present in Section 2.1 a detailed causal model of the interplay between decision-making and data dynamics. Then in Section 2.2, we formulate Tier Balancing, a long-term fairness notion that captures the decision-distribution interplay with the presented causal modeling.

2.1. CAUSAL MODELING OF DECISION-DISTRIBUTION INTERPLAY ON DAG

Let us denote the time step as $T$ with domain of value $\mathbb{N}^+$. At time step $T$, let us denote the protected feature as $A_T$ with domain of value $\mathcal{A} = \{0, 1\}$, additional feature(s) as $X_{T,i}$ with domain of value $\mathcal{X}_i$, the (unmeasured) underlying causal factor $H_T$ (we call it "tier") with domain of value $\mathcal{H} = (0, 1]$, the (unobserved) ground truth label $Y^{(ori)}_T$ and the observed label $Y^{(obs)}_T$, with domain of value $\mathcal{Y} = \{0, 1\}$, and the decision $D_T$ with domain of value $\mathcal{D} = \{0, 1\}$. Figure 1 shows the causal modeling of the interplay between decision-making and underlying data generating processes, which involves multiple dynamics (from $T = t$ to $T = t + 1$; see the footnote below).
Underlying data dynamics (stationary components). Considering the fact that the underlying data dynamics are relatively stable with respect to the timescale of decision-making (e.g., the societal changes happen at a much larger time scale compared to a particular credit application decision), we assume that processes governing how $(Y^{(ori)}_t, X_{t,i})$ are generated from $(H_t, A_t)$ for each individual in the population are stationary and do not change over different $T = t$. We also assume that the underlying data generating process that governs how $H_{t+1}$ is updated from $(H_t, Y^{(ori)}_t, D_t)$ across time steps is stationary, and so are the process governing the observation of $Y^{(obs)}_{t+1}$ given $(D_t, Y^{(ori)}_{t+1})$ and the process governing the update of $A_{t+1}$ from $A_t$. The tier $H_t$ fully captures the individual's key property that is directly relevant to the scenario of interest, and therefore is the cause of $Y^{(ori)}_t$ and the $X_{t,i}$'s instead of the other way around. For example, the improvement in the socio-economic status can be reflected through an increase in income, while manipulating one's income only by changing the recorded number does not affect the actual ability to repay the loan. The determination of causal direction aligns with causal modelings in previous literature (see, e.g., Zhang et al., 2020).
Decision-making dynamics (non-stationary components). The institution (decision maker) assigns decision $D_t$ to each individual according to the observed features $(A_t, X_{t,i})$ and the outcome



Due to the space limit, we provide additional discussions on decision-distribution interplay in Appendix B.1.
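The one-step dynamics described in Section 2.1 can be illustrated with a small simulation. The sketch below is only a toy instance under assumed functional forms: the helper names (`generate`, `observe`, `decide`, `update_tier`, `simulate`), the threshold policy, the step size, and the initial tier distributions are all hypothetical and are not taken from the paper. In particular, letting the decision depend on the observed label is an assumption made for illustration. Only the dependency structure follows the text: $(Y^{(ori)}_t, X_t)$ are generated from $(H_t, A_t)$, $Y^{(obs)}_t$ from the previous decision and $Y^{(ori)}_t$, and $H_{t+1}$ from $(H_t, Y^{(ori)}_t, D_t)$, with $A_t$ held fixed.

```python
import random

def generate(h, a, rng):
    """Stationary process: draw (Y_ori, X) from (H, A). Forms are assumed."""
    y_ori = 1 if rng.random() < h else 0       # higher tier, more likely positive
    x = h + 0.1 * a + rng.gauss(0.0, 0.05)     # one noisy feature caused by H and A
    return y_ori, x

def observe(d_prev, y_ori):
    """Y_obs from (previous decision, Y_ori): assumed visible only after d_prev = 1."""
    return y_ori if d_prev == 1 else 0

def decide(a, x, y_obs, threshold=0.5):
    """Hypothetical policy on observed quantities; this sketch does not use a."""
    return 1 if x + 0.1 * y_obs >= threshold else 0

def update_tier(h, y_ori, d, step=0.05):
    """H_{t+1} from (H_t, Y_ori_t, D_t), clipped so the tier stays in (0, 1]."""
    if d == 1:
        h = h + step if y_ori == 1 else h - step
    return min(1.0, max(1e-6, h))

def group_mean_gap(h, a):
    """Difference in mean tier between groups A = 1 and A = 0 (0.0 if a group is empty)."""
    g1 = [hi for hi, ai in zip(h, a) if ai == 1]
    g0 = [hi for hi, ai in zip(h, a) if ai == 0]
    if not g1 or not g0:
        return 0.0
    return sum(g1) / len(g1) - sum(g0) / len(g0)

def simulate(n=1000, steps=5, seed=0):
    """Run the assumed dynamics and record the between-group tier gap per step."""
    rng = random.Random(seed)
    a = [rng.randint(0, 1) for _ in range(n)]  # protected feature, kept fixed over time
    # Assumed initial disparity: group 1 starts with higher tiers on average.
    h = [rng.uniform(0.4, 0.8) if ai == 1 else rng.uniform(0.2, 0.6) for ai in a]
    d_prev = [1] * n                           # assume everyone was accepted before t = 1
    gaps = []
    for _ in range(steps):
        for i in range(n):
            y_ori, x = generate(h[i], a[i], rng)
            y_obs = observe(d_prev[i], y_ori)
            d = decide(a[i], x, y_obs)
            h[i] = update_tier(h[i], y_ori, d)
            d_prev[i] = d
        gaps.append(group_mean_gap(h, a))
    return h, a, gaps
```

Calling `simulate()` returns the final tiers, the group memberships, and the sequence of between-group tier gaps, which gives a rough way to inspect how a fixed policy moves the latent tiers of the two groups over time under these assumed dynamics.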

