ON THE IMPORTANCE OF ARCHITECTURES AND HYPER-PARAMETERS FOR FAIRNESS IN FACE RECOGNITION

Anonymous

Abstract

Face recognition systems are deployed across the world by government agencies and contractors for sensitive and impactful tasks, such as surveillance and database matching. Despite their widespread use, these systems are known to exhibit bias across a range of sociodemographic dimensions, such as gender and race. Nonetheless, an array of works proposing pre-processing, training, and post-processing methods has failed to close these gaps. Here, we take a very different approach to this problem, identifying that both the architectures and the hyperparameters of neural networks are instrumental in reducing bias. We first run a large-scale analysis of the impact of architectures and training hyperparameters on several common fairness metrics and show that the implicit convention of choosing high-accuracy architectures may be suboptimal for fairness. Motivated by our findings, we run the first neural architecture search for fairness, jointly with a search for hyperparameters. We obtain a suite of models that Pareto-dominate all other competitive architectures in terms of accuracy and fairness. Furthermore, we show that these models transfer well to other face recognition datasets with similar and distinct protected attributes. We release our code and raw result files so that researchers and practitioners can replace our fairness metrics with a bias measure of their choice.

1. INTRODUCTION

Face recognition is regularly deployed across the world by government agencies for tasks including surveillance, employment, and housing decisions. However, recent studies have shown that face recognition systems exhibit disparities in accuracy based on race and gender (Grother et al., 2019; Raji et al., 2020; Raji & Fried, 2021; Learned-Miller et al., 2020). For example, some face recognition models were 10 to 100 times more likely to give false positives for Black or Asian people than for white people (Allyn, 2020). This bias has already led to multiple false arrests and jail time for innocent Black men in the USA (Hill, 2020a). Motivated by the discovery of bias in face recognition and other models deployed in real-world applications, dozens of definitions of fairness have been proposed (Verma & Rubin, 2018), and many pre-processing, training, and post-processing techniques have been developed to mitigate model bias. However, these techniques have fallen short of de-biasing face recognition systems, and training fair models in this setting demands addressing several technical challenges (Cherepanova et al., 2021b). While existing methods for de-biasing face recognition systems use a fixed neural network architecture and a fixed training hyperparameter setting, we instead ask a fundamental question that has received little attention: does model bias stem from the architecture and hyperparameters? We further ask whether we can exploit the extensive research in the fields of neural architecture search (NAS) (Elsken et al., 2019) and hyperparameter optimization (HPO) (Feurer & Hutter, 2019) to search for models that achieve a desired trade-off between model bias and accuracy. In this work, we take the first step towards answering these questions. To this end, we conduct the first large-scale analysis of the relationship between hyperparameters, architectures, and bias.
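The accuracy-fairness trade-off underlying this analysis can be made concrete with a Pareto-dominance check. The sketch below uses hypothetical (architecture, accuracy, bias-gap) triples, not the paper's measured results; a configuration dominates another if it is at least as good on both objectives and strictly better on one.

```python
def pareto_front(models):
    """Return the configurations not dominated on (accuracy, bias gap).

    Higher accuracy is better; a lower bias gap (e.g., an error-rate
    difference between demographic groups) is better.
    """
    front = []
    for name, acc, bias in models:
        dominated = any(
            (a >= acc and b <= bias) and (a > acc or b < bias)
            for _, a, b in models
        )
        if not dominated:
            front.append((name, acc, bias))
    return front

# Hypothetical (architecture, accuracy, bias-gap) triples for illustration:
candidates = [
    ("ResNet-50", 0.95, 0.08),
    ("ViT-B", 0.96, 0.10),
    ("MobileNetV3", 0.93, 0.05),
    ("Inception-V3", 0.94, 0.09),
]
print(pareto_front(candidates))
# → [('ResNet-50', 0.95, 0.08), ('ViT-B', 0.96, 0.10), ('MobileNetV3', 0.93, 0.05)]
```

Here Inception-V3 is dominated by ResNet-50 (lower accuracy and a larger bias gap), so it is excluded; the other three each offer a trade-off no other candidate strictly improves upon.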
We train a diverse set of 29 architectures, ranging from ResNets (He et al., 2016b) to vision transformers (Dosovitskiy et al., 2020; Liu et al., 2021) to Gluon Inception V3 (Szegedy et al., 2016) to MobileNetV3 (Howard et al., 2019), on CelebA (Liu et al., 2015), for a total of 88,493 GPU hours. We train each of these architectures across different head, optimizer, and learning-rate combinations. Our results show that different architectures learn different inductive biases from the same dataset. We conclude
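A sweep of this kind can be sketched as a Cartesian product over the configuration axes. The architecture, head, optimizer, and learning-rate values below are illustrative placeholders, not the paper's exact search space (which covers 29 architectures).

```python
from itertools import product

# Illustrative search space: every architecture is trained under every
# head / optimizer / learning-rate combination.
architectures = ["resnet50", "vit_base", "mobilenetv3", "inception_v3"]
heads = ["ArcFace", "CosFace", "MagFace"]   # common face-recognition heads
optimizers = ["SGD", "AdamW"]
learning_rates = [0.1, 0.01, 0.001]

configs = [
    {"arch": a, "head": h, "opt": o, "lr": lr}
    for a, h, o, lr in product(architectures, heads, optimizers, learning_rates)
]
print(len(configs))  # 4 * 3 * 2 * 3 = 72 training runs
```

Each resulting dictionary defines one training run; at the paper's scale (29 architectures, many thousands of GPU hours) such a grid motivates the joint NAS+HPO search rather than exhaustive enumeration.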

