FAIRER: FAIRNESS AS DECISION RATIONALE ALIGNMENT

Abstract

Deep neural networks (DNNs) have achieved remarkable accuracy, but they often suffer from fairness issues: deep models typically show distinct accuracy differences across specific subgroups (e.g., males and females). Existing research addresses this critical issue by employing fairness-aware loss functions that constrain the last-layer outputs and directly regularize DNNs. Although this improves the fairness of DNNs, it remains unclear how the trained network makes a fair prediction, which limits further fairness improvements. In this paper, we investigate fairness from the perspective of decision rationale and define neuron parity scores to characterize the fair decision process of networks by analyzing neuron behaviors in various subgroups. Extensive empirical studies show that unfairness can arise from unaligned decision rationales across subgroups. Existing fairness regularization terms fail to achieve decision rationale alignment because they only constrain last-layer outputs while ignoring intermediate neuron alignment. To address this issue, we formulate fairness as a new task, i.e., decision rationale alignment, which requires DNNs' neurons to respond consistently across subgroups in both the intermediate processes and the final prediction. To make this idea practical during optimization, we relax the naive objective function and propose gradient-guided parity alignment, which encourages gradient-weighted consistency of neurons across subgroups. Extensive experiments on a variety of datasets show that our method can improve fairness while maintaining high accuracy and outperforms other baselines by a large margin. We have released our code at https://anonymous.4open.

1. INTRODUCTION

Modern society places a high value on fairness among individuals. However, although deep learning is increasingly adopted in many applications that bring convenience to our daily lives (He et al., 2016; Devlin et al., 2019; Deng et al., 2013), DNNs still suffer from fairness problems and often exhibit undesirable discriminatory behaviors (News, 2021; 2020). For example, on a task such as salary prediction, a trained DNN can easily exhibit different accuracy on different subgroups (e.g., male and female). Such discriminatory behaviors contradict people's growing demand for fairness and could cause severe social consequences. To alleviate such fairness problems, a line of mitigation strategies has been proposed (Zemel et al., 2013; Sarhan et al., 2020; Wang et al., 2019). A direct regularization method for improving fairness is to relax fairness metrics into constraints on the training process (Madras et al., 2018). This regularization is designed to reduce the disparities between different subgroups in the training and testing data (see Fig. 1 (a) vs. (b)). Although this method readily improves the fairness of DNN models, it is still unclear how the trained network makes a fair decision. For example, we do not know how the fairness regularization terms actually affect the final network parameters and lead them to make a fair prediction. Without such an understanding, we do not know which direction is effective for further fairness enhancement. Existing work does not address this question, and most of it concentrates on the last-layer outputs (i.e., predictions) while ignoring the internal process. In this work, we propose to study fairness from the perspective of decision rationale and analyze existing fairness-regularized methods through a decision-rationale-aware analysis method.
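As a rough illustration of what "relaxing a fairness metric into a training constraint" can look like, the sketch below adds a penalty on the gap between the mean predicted scores of two subgroups to a task loss. The choice of metric (a demographic-parity-style gap), the function names, the weight `lam`, and the toy numbers are all assumptions made for illustration; they are not taken from the methods cited above.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of floats."""
    return sum(values) / len(values)


def parity_gap(preds_a, preds_b):
    """Absolute difference between the mean predicted scores of two subgroups.

    This is an assumed, differentiable-style surrogate for a group fairness
    metric; it only looks at last-layer outputs.
    """
    return abs(mean(preds_a) - mean(preds_b))


def regularized_loss(task_loss, preds_a, preds_b, lam=1.0):
    """Task loss plus a weighted fairness penalty (lam is a hypothetical weight)."""
    return task_loss + lam * parity_gap(preds_a, preds_b)


# Toy last-layer outputs (probabilities) for two subgroups.
preds_group_a = [0.9, 0.7, 0.8]
preds_group_b = [0.4, 0.6, 0.5]
total = regularized_loss(0.35, preds_group_a, preds_group_b, lam=2.0)
print(round(total, 6))
```

Note that the penalty depends only on the final predictions, which is exactly the limitation the paragraph above points out: nothing in it constrains how intermediate neurons behave on each subgroup.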
The term 'decision rationale' refers to the reason for making a decision and can be represented by the behaviors of neurons in a DNN (Khakzar et al., 2021). Specifically, for each intermediate neuron (i.e., a parameter of the DNN), we can calculate the change in loss on a subgroup before and after removing the neuron. We can then characterize the decision rationale of a network on the subgroup by collecting the loss changes of all neurons. For example, the solid green and yellow lines in Fig. 1 represent the neurons that lead to high loss changes at each layer and characterize the decision rationales of the two subgroups. We then define the neuron parity score as the shift in decision rationale across subgroups, which reveals the influence of intermediate neurons (i.e., parameters) on the decision rationale changes. With this new analysis tool, we find that network fairness is directly related to the consistency of the decision rationales on different subgroups, and that existing fairness regularization terms can only partially achieve this goal (compare the solid lines in Fig. 1 (b)) since they only add constraints to the final outputs. Intuitively, we could define new regularization terms that minimize the parity scores of all neurons and encourage them to behave similarly across subgroups. We name this new task decision rationale alignment; it requires DNNs to have consistent decision rationales as well as consistent final predictions on different subgroups. Although straightforward, the task is challenging for two reasons. First, the decision rationale and parity score are defined over a dataset, and it is impractical to calculate them at every iteration of the training process. Second, different neurons have different effects on fairness, and such differences should be carefully considered.
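The ablation-based analysis described above can be sketched on a toy network. The example below is a minimal sketch under several assumptions: a one-hidden-layer ReLU network with hand-picked weights, "removing" a neuron modeled as zeroing a hidden activation (the text equates neurons with parameters), binary cross-entropy as the loss, and the parity score taken as the absolute difference between the two subgroups' loss changes. All names and numbers are hypothetical.

```python
import math


def forward(x, w_hidden, w_out, removed=None):
    """One-hidden-layer ReLU network with a sigmoid output.

    `removed` is the index of a hidden neuron whose activation is zeroed.
    """
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    if removed is not None:
        hidden[removed] = 0.0
    logit = sum(w * h for w, h in zip(w_out, hidden))
    return 1.0 / (1.0 + math.exp(-logit))


def subgroup_loss(samples, w_hidden, w_out, removed=None):
    """Mean binary cross-entropy over the (x, y) samples of one subgroup."""
    total = 0.0
    for x, y in samples:
        p = forward(x, w_hidden, w_out, removed)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(samples)


def decision_rationale(samples, w_hidden, w_out):
    """Loss change on a subgroup caused by removing each hidden neuron in turn."""
    base = subgroup_loss(samples, w_hidden, w_out)
    return [subgroup_loss(samples, w_hidden, w_out, removed=k) - base
            for k in range(len(w_hidden))]


def parity_scores(group_a, group_b, w_hidden, w_out):
    """Per-neuron gap between the two subgroups' rationales (assumed: absolute difference)."""
    ra = decision_rationale(group_a, w_hidden, w_out)
    rb = decision_rationale(group_b, w_hidden, w_out)
    return [abs(a - b) for a, b in zip(ra, rb)]


# Toy weights; the third hidden neuron has zero output weight, so it cannot matter.
W_HIDDEN = [[1.0, -1.0], [0.5, 0.5], [0.0, 1.0]]
W_OUT = [1.5, -2.0, 0.0]
group_a = [([1.0, 0.0], 1), ([0.8, 0.2], 1), ([0.1, 0.9], 0)]
group_b = [([0.2, 0.9], 1), ([0.0, 1.0], 0), ([0.6, 0.5], 0)]

scores = parity_scores(group_a, group_b, W_HIDDEN, W_OUT)
print(scores)
```

Each call to `decision_rationale` needs one full pass over the subgroup per neuron, which makes concrete why computing these scores at every training iteration is impractical.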
To address these two challenges, we propose the gradient-guided parity alignment method, which relaxes the calculation of the decision rationale from a dataset-based strategy to a sample-based one. As a result, the corresponding regularization term is compatible with the epoch-based training process. Moreover, we use the first-order Taylor expansion to approximate the parity score between decision rationales, so that the effects of different neurons on fairness are weighted automatically by their gradient magnitudes. Overall, the proposed method achieves much higher fairness than state-of-the-art methods. In summary, this work makes the following contributions:
1. To understand how a network makes a fair decision, we define the neuron parity score to characterize the decision rationales of the network on different subgroups. We reveal that the fairness of a network is directly related to the consistency of its decision rationales on different subgroups and that existing regularization terms cannot achieve this goal.
2. To train a fairer network, we formulate the decision rationale alignment task and propose the gradient-guided parity alignment method to solve it by addressing the complex optimization challenges.
3. Extensive experiments on three public datasets, i.e., Adult, CelebA, and Credit, demonstrate that our method enhances the fairness of DNNs effectively and outperforms other methods by a large margin.
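The first-order Taylor relaxation mentioned above can be illustrated on a toy network: zeroing a hidden activation h_k changes the loss by roughly -h_k * dL/dh_k, which needs only one forward and one backward pass per sample. The sketch below assumes a one-hidden-layer ReLU network with a sigmoid output and binary cross-entropy (so the gradient has a closed form), and it uses a squared difference between mini-batch rationales as the alignment term. These choices, and all names, are illustrative assumptions rather than the exact objective of the proposed method.

```python
import math


def hidden_activations(x, w_hidden):
    """ReLU activations of the single hidden layer."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]


def bce_from_hidden(hidden, w_out, y):
    """Binary cross-entropy and predicted probability given hidden activations."""
    logit = sum(w * h for w, h in zip(w_out, hidden))
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p)), p


def taylor_loss_change(x, y, w_hidden, w_out):
    """First-order estimate of L(h_k = 0) - L(h) for every hidden neuron k."""
    hidden = hidden_activations(x, w_hidden)
    _, p = bce_from_hidden(hidden, w_out, y)
    # For sigmoid + cross-entropy, dL/dlogit = p - y, so dL/dh_k = (p - y) * w_out[k].
    grads = [(p - y) * w for w in w_out]
    # Zeroing h_k shifts it by -h_k, so the estimated change is -h_k * dL/dh_k.
    return [-h * g for h, g in zip(hidden, grads)]


def exact_loss_change(x, y, w_hidden, w_out):
    """Reference value: actually zero each activation and recompute the loss."""
    hidden = hidden_activations(x, w_hidden)
    base, _ = bce_from_hidden(hidden, w_out, y)
    changes = []
    for k in range(len(hidden)):
        ablated = list(hidden)
        ablated[k] = 0.0
        loss, _ = bce_from_hidden(ablated, w_out, y)
        changes.append(loss - base)
    return changes


def batch_rationale(batch, w_hidden, w_out):
    """Sample-based rationale: per-neuron mean of the Taylor estimates over a mini-batch."""
    per_sample = [taylor_loss_change(x, y, w_hidden, w_out) for x, y in batch]
    return [sum(col) / len(batch) for col in zip(*per_sample)]


def alignment_gap(batch_a, batch_b, w_hidden, w_out):
    """Assumed alignment term: squared distance between the two subgroup rationales."""
    ra = batch_rationale(batch_a, w_hidden, w_out)
    rb = batch_rationale(batch_b, w_hidden, w_out)
    return sum((a - b) ** 2 for a, b in zip(ra, rb))


W_HIDDEN = [[1.0, -1.0], [0.5, 0.5], [0.0, 1.0]]
W_OUT = [1.5, -2.0, 0.0]
batch_a = [([1.0, 0.0], 1), ([0.8, 0.2], 1)]
batch_b = [([0.2, 0.9], 1), ([0.0, 1.0], 0)]

print(taylor_loss_change([0.02, 0.01], 1, W_HIDDEN, W_OUT))
print(exact_loss_change([0.02, 0.01], 1, W_HIDDEN, W_OUT))
print(alignment_gap(batch_a, batch_b, W_HIDDEN, W_OUT))
```

Because each estimate is a product of an activation and its gradient, neurons with larger gradient magnitudes contribute more to the alignment term without any hand-set per-neuron weights. The approximation is only accurate when the removed activation is small relative to the curvature of the loss.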

2. RELATED WORK

Fairness in deep learning. There are different notions for evaluating fairness in deep learning, of which individual fairness (Zhang et al., 2020; 2021; George John et al., 2020), group fairness (Louppe et al., 2016; Moyer et al., 2018; Gupta et al., 2021; Garg et al., 2020), and counterfactual fairness (Kusner et al., 2017) are the most common. We focus on group fairness, which is measured by calculating and comparing the predictions for each group. There is a line of work dedicated to alleviating unjustified



Figure 1: Schematic diagrams of two existing solutions and the proposed one. (a) and (b) show the results of the standard trained network and the fairness-regularized network, respectively. (c) shows the results of the decision-rationale-aligned network.


science/r/fairer_submission-F176

