LEVERAGING DOUBLE DESCENT FOR SCIENTIFIC DATA ANALYSIS: FACE-BASED SOCIAL BEHAVIOR AS A CASE STUDY

Abstract

Scientific data analysis often involves making use of a large number of correlated predictor variables to predict multiple response variables. Understanding how the predictor and response variables relate to one another, especially in the presence of relatively scarce data, is a common and challenging problem. Here, we leverage the recently popular concept of "double descent" to develop a particular treatment of the problem, including a set of key theoretical results. We also apply the proposed method to a novel experimental dataset consisting of human ratings of social traits and social decision making tendencies based on the facial features of strangers, and resolve a scientific debate regarding the existence of a "beauty premium" or "attractiveness halo," which refers to a (presumed) advantage attractive people enjoy in social situations. We demonstrate that more attractive faces indeed enjoy a social advantage, but this is indirectly due to the facial features that contribute to both perceived attractiveness and trustworthiness, and that the component of attractiveness perception due to facial features unrelated to trustworthiness actually elicits a "beauty penalty." Conversely, the facial features that contribute to trustworthiness and not to attractiveness still contribute positively to pro-social trait perception and decision making. Thus, what was previously thought to be an attractiveness halo/beauty premium is actually a trustworthiness halo/premium plus a "beauty penalty." Moreover, we see that the facial features that contribute to the trustworthiness halo primarily have to do with how smiley a face is, while the facial features that contribute to attractiveness but actually act as a beauty penalty are anti-correlated with age.
In other words, youthfulness and smiley-ness both contribute to attractiveness, but only smiley-ness positively contributes to pro-social perception and decision making, while youthfulness actually contributes negatively to them. A further interesting wrinkle is that youthfulness as a whole does not negatively contribute to social traits/decision-making; only the component of youthfulness contributing to attractiveness does.

1. INTRODUCTION

Scientific data analysis often involves building a linear regression model between a large number of predictor variables and multiple response variables. Understanding how the predictor and response variables relate to one another, especially in the presence of relatively scarce data, is an important but challenging problem. For example, a geneticist might have a genomic dataset with many genetic features as predictor variables and disease prevalence data as response variables, and may want to know how the different types of disease are related to each other through their genetic underpinnings. As another example, a social psychologist might have a set of face images (with many facial features) that have been rated by a relatively small set of subjects for perceived social traits and social decision making tendencies, and may want to discover how the different social traits and decision making tendencies relate to each other through the underlying facial features. A common difficulty in these settings is that the large number of features relative to the number of data points typically entails some kind of dimensionality reduction and feature selection, and this process needs to be parameterized differently to optimize for each response variable, making direct comparison of the features underlying different response variables challenging. In the worst case, there may not be any subset of features that can predict all response variables better than chance level. Here, we leverage the "double descent" phenomenon to develop and present a novel analysis framework that obviates such issues by relying on a universal, overly parameterized feature representation. As a case study, we apply the framework to better understand the underlying facial features that contribute separately and conjointly to human trait perception and social decision making.
Humans readily infer social traits, such as attractiveness and trustworthiness, from as little as a 100 ms exposure to a stranger's face (Willis & Todorov, 2006). Though the veracity of such judgments is still an area of active research (Valla et al., 2011; Todorov et al., 2015), such trait evaluations have been found to predict important social outcomes, ranging from electoral success (Todorov et al., 2005; Ballew & Todorov, 2007; Little et al., 2007) to prison sentencing decisions (Blair et al., 2004; Eberhardt et al., 2006). In particular, psychologists have observed an "attractiveness halo", whereby humans tend to ascribe more positive attributes to more attractive individuals (Eagly et al., 1991; Langlois et al., 2000), and economists have observed a related phenomenon, the "beauty premium", whereby more attractive individuals out-earn less attractive individuals in economics games (Mobius & Rosenblat, 2006). However, these claims are not without controversy (Andreoni & Petrie, 2008; Willis & Todorov, 2006), as more attractive people can also incur a "beauty penalty" in certain situations. Moreover, a robust correlation between attractiveness and trustworthiness (Willis & Todorov, 2006; Oosterhof & Todorov, 2008; Xu et al., 2012; Ryali et al., 2020) has also been reported, making it unclear how much of the attractiveness halo effect might be indirectly due to perceived trustworthiness. To tease apart the contributions of trustworthiness and attractiveness to social perception and decision-making, we perform linear regression of different response variables, consisting of subjects' ratings of social perception and social decision-making tendencies, against features of the Active Appearance Model (AAM), a well-established computer vision model (Cootes et al., 2001) whose features have been found to be linearly encoded by macaque face-processing neurons (Chang & Tsao, 2017).
A similar regression framework has been adopted by previous work modeling human face trait perception (Oosterhof & Todorov, 2008; Said & Todorov, 2011; Song et al., 2017; Guan et al., 2018; Ryali et al., 2020), using features either from AAM or deep neural networks. Because the number of features is typically quite large, usually larger than the number of rated faces, previous approaches have all used some combination of dimensionality reduction and feature selection. This approach gives rise to a dilemma when one wants to compare the facial features contributing to different types of social perceptions (response variables), since the number of features that optimizes prediction accuracy for each task can be quite different (see Figure 1). Either one optimizes this quantity separately for each task, and thus has no common set of features to compare across tasks; or one fixes a particular set of features for all tasks, and then suffers suboptimal prediction accuracy (in the worst case, perhaps worse than chance level performance). To overcome this challenge, we appeal to the "double descent" (Belkin et al., 2019; 2020) trick: the use of a highly overparameterized representation (more features than data points) to achieve good performance. In particular, if we use the original AAM feature representation, while foregoing any kind of dimensionality reduction or feature selection, then we have a universal representation that may also have great performance on all tasks, even novel tasks not seen before, or responses corresponding to predictor variable settings totally different from those previously seen. While overparameterized linear regression has chiefly been used as an analytically tractable case study (Belkin et al., 2019; Xu & Hsu, 2019; Belkin et al., 2020) to gain insight into the theoretical basis and properties of "double descent", we use it as a practical setting for scientific data analysis.
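To make the idea of a shared, overparameterized representation concrete, the following is a minimal sketch rather than the implementation used in this work. It assumes NumPy, synthetic Gaussian predictors standing in for AAM features, and the minimum-norm least-squares solution (computed with the pseudoinverse) as the fitting rule; the function name `fit_min_norm` and all sizes are hypothetical.

```python
import numpy as np

def fit_min_norm(X, Y):
    """Return minimum-norm least-squares weights W so that X @ W approximates Y.

    X: (n_samples, n_features) predictors; Y: (n_samples, n_responses) responses.
    If n_features > n_samples and X has full row rank, X @ W reproduces Y exactly.
    """
    return np.linalg.pinv(X) @ Y

rng = np.random.default_rng(0)
# Hypothetical sizes: more features than rated data points, several responses.
n_train, n_features, n_responses = 50, 200, 3

# Synthetic stand-ins (assumption): Gaussian predictors and noisy linear responses.
X_train = rng.standard_normal((n_train, n_features))
W_true = rng.standard_normal((n_features, n_responses))
Y_train = X_train @ W_true + 0.1 * rng.standard_normal((n_train, n_responses))

# One weight matrix over the same full feature set for every response variable,
# with no dimensionality reduction or feature selection.
W_hat = fit_min_norm(X_train, Y_train)

# Largest absolute training error; the overparameterized fit interpolates the data.
train_residual = np.abs(X_train @ W_hat - Y_train).max()
```

Because every column of `W_hat` lives in the same feature space, the weight vectors for different response variables can be compared directly, which is the property the dilemma above calls for.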
Notably, while previous papers on overparameterized regression defined statistical assumptions and constraints in the generative sense (e.g. whether two features are "truly" decorrelated (Xu & Hsu, 2019)), we work, for pragmatic reasons, directly with sample statistics (e.g. whether two feature vectors across a set of data points have a correlation coefficient of 0). For this reason, our theoretical results are distinct from and novel with respect to those prior results. Finally, it is noteworthy that the human visual pathway also exhibits feature expansion rather than feature reduction, from the sensory periphery to higher cortical areas (Wandell, 1995). This raises the intriguing possibility that the brain has also discovered an overparameterized representation as a universal representation for learning to perform well on many tasks, including novel ones not previously encountered.
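The distinction between generative and sample-level decorrelation can be illustrated with a short sketch. It assumes NumPy and synthetic data; the residualization step is an illustrative choice for producing a sample correlation of zero, not a procedure taken from this work, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 20  # hypothetical number of data points

# Two features drawn independently: decorrelated in the generative sense.
a = rng.standard_normal(n_samples)
b = rng.standard_normal(n_samples)

# Their sample correlation on a finite dataset is generally not zero.
r_raw = np.corrcoef(a, b)[0, 1]

# Illustrative assumption: remove from b its centered projection onto a,
# which forces the sample correlation between a and the result to zero.
a_c = a - a.mean()
b_c = b - b.mean()
b_decorr = b_c - (a_c @ b_c) / (a_c @ a_c) * a_c
r_decorr = np.corrcoef(a, b_decorr)[0, 1]
```

Here `r_decorr` is zero up to floating-point error, whereas `r_raw` is not, even though `a` and `b` were generated independently.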

