FAIRER: FAIRNESS AS DECISION RATIONALE ALIGNMENT

Abstract

Deep neural networks (DNNs) have achieved remarkable accuracy, but they often suffer from fairness issues: deep models typically exhibit distinct accuracy differences among specific subgroups (e.g., males and females). Existing research addresses this critical issue by employing fairness-aware loss functions that constrain the last-layer outputs and directly regularize DNNs. Although such methods improve the fairness of DNNs, it remains unclear how the trained network makes a fair prediction, which limits further fairness improvements. In this paper, we investigate fairness from the perspective of decision rationale and define neuron parity scores to characterize the fair decision process of a network by analyzing neuron behaviors across subgroups. Extensive empirical studies show that unfairness can arise from misaligned decision rationales across subgroups. Existing fairness regularization terms fail to achieve decision rationale alignment because they constrain only the last-layer outputs and ignore intermediate neuron alignment. To address this issue, we formulate fairness as a new task, i.e., decision rationale alignment, which requires a DNN's neurons to respond consistently across subgroups both during intermediate processing and at the final prediction. To make this idea practical during optimization, we relax the naive objective function and propose gradient-guided parity alignment, which encourages gradient-weighted consistency of neurons across subgroups. Extensive experiments on a variety of datasets show that our method improves fairness while maintaining high accuracy, outperforming other baselines by a large margin. We have released our code at https://anonymous.4open.science/r/fairer_submission-F176.
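To make the notion of a per-neuron parity score concrete, the sketch below is a minimal NumPy illustration of one natural instantiation: the absolute gap between a neuron's mean activation on two subgroups. The function name and this exact definition are our own assumptions; the paper's formal definition may differ.

```python
import numpy as np

def neuron_parity_scores(acts_g0, acts_g1):
    """Per-neuron parity score (illustrative): absolute gap between a
    neuron's mean activation on subgroup 0 vs. subgroup 1.

    acts_g0: activations for subgroup 0, shape (N0, num_neurons)
    acts_g1: activations for subgroup 1, shape (N1, num_neurons)
    Returns an array of shape (num_neurons,); smaller = more aligned.
    """
    return np.abs(acts_g0.mean(axis=0) - acts_g1.mean(axis=0))
```

Under this toy definition, a layer whose neurons fire identically on average for both subgroups would score zero everywhere, i.e., its decision rationale is aligned at that layer.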

1. INTRODUCTION

Modern society places a strong emphasis on fairness among individuals. However, as deep learning is increasingly adopted in applications that bring convenience to our daily lives (He et al., 2016; Devlin et al., 2019; Deng et al., 2013), DNNs still suffer from fairness problems and often exhibit undesirable discriminatory behaviors (News, 2021; 2020). For example, on a prediction task (e.g., salary prediction), a trained DNN easily attains distinct accuracy values on different subgroups (e.g., male and female). Such discriminatory behavior contradicts society's growing demand for fairness and can have severe social consequences. To alleviate these fairness problems, a line of mitigation strategies has been proposed (Zemel et al., 2013; Sarhan et al., 2020; Wang et al., 2019). A direct regularization approach to improving fairness is to relax fairness metrics into constraints in the training process (Madras et al., 2018). This regularization is designed to reduce the disparities between different subgroups in the training and testing data (see Fig. 1 (a) vs. (b)). Although this approach readily improves the fairness of DNN models, it is still unclear how the trained network makes a fair decision. For example, we do not know how the fairness regularization terms actually affect the final network parameters so that the network makes a fair prediction. Without such an understanding, we would not know effective directions for further fairness enhancement. Existing work does not address this question, and most of it concentrates on the last-layer outputs (i.e., predictions) while ignoring the internal process. In this work, we propose to study fairness from the perspective of decision rationale and analyze existing fairness-regularized methods through a decision-rationale-aware analysis method.
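As a concrete illustration of relaxing a fairness metric into a training constraint, the NumPy sketch below adds a relaxed demographic-parity gap to a binary cross-entropy loss. This is our own minimal example, not the regularizer of Madras et al. (2018) or of this paper; the function name, the choice of demographic parity, and the weight `lam` are assumptions for illustration.

```python
import numpy as np

def fairness_regularized_loss(probs, labels, groups, lam=1.0):
    """Binary cross-entropy plus a relaxed demographic-parity penalty.

    probs:  predicted positive-class probabilities, shape (N,)
    labels: binary ground-truth labels, shape (N,)
    groups: binary subgroup indicator (e.g., 0 = male, 1 = female), shape (N,)
    lam:    weight of the fairness regularizer
    """
    eps = 1e-8
    bce = -np.mean(labels * np.log(probs + eps)
                   + (1 - labels) * np.log(1 - probs + eps))
    # Relaxed demographic-parity gap: difference in mean predicted
    # positive rate between the two subgroups (differentiable surrogate
    # for the hard parity constraint).
    gap = abs(probs[groups == 0].mean() - probs[groups == 1].mean())
    return bce + lam * gap
```

Note that this penalty touches only the model's final outputs: two networks with identical predictions but very different internal neuron behaviors incur the same loss, which is exactly the limitation the decision-rationale perspective targets.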
The term 'decision rationale' refers to the reason for making a decision and can be represented by the behaviors of neurons in a DNN (Khakzar et al.,



