LEVERAGING DOUBLE DESCENT FOR SCIENTIFIC DATA ANALYSIS: FACE-BASED SOCIAL BEHAVIOR AS A CASE STUDY

Abstract

Scientific data analysis often involves using a large number of correlated predictor variables to predict multiple response variables. Understanding how the predictor and response variables relate to one another, especially when data are relatively scarce, is a common and challenging problem. Here, we leverage the recently popular concept of "double descent" to develop a particular treatment of the problem, including a set of key theoretical results. We also apply the proposed method to a novel experimental dataset consisting of human ratings of social traits and social decision making tendencies based on the facial features of strangers, and resolve a scientific debate regarding the existence of a "beauty premium" or "attractiveness halo," which refers to a (presumed) advantage that attractive people enjoy in social situations. We demonstrate that more attractive faces indeed enjoy a social advantage, but that this advantage is indirectly due to the facial features that contribute to both perceived attractiveness and trustworthiness, and that the component of attractiveness perception due to facial features unrelated to trustworthiness actually elicits a "beauty penalty." Conversely, the facial features that contribute to trustworthiness and not to attractiveness still contribute positively to pro-social trait perception and decision making. Thus, what was previously thought to be an attractiveness halo/beauty premium is actually a trustworthiness halo/premium plus a "beauty penalty." Moreover, we see that the facial features that contribute to the trustworthiness halo primarily have to do with how smiley a face is, while the facial features that contribute to attractiveness but actually act as a beauty penalty are anti-correlated with age. In other words, youthfulness and smiley-ness both contribute to attractiveness, but only smiley-ness positively contributes to pro-social perception and decision making, while youthfulness actually negatively contributes to them. A further interesting wrinkle is that youthfulness as a whole does not negatively contribute to social traits/decision making; only the component of youthfulness contributing to attractiveness does.

1. INTRODUCTION

Scientific data analysis often involves building a linear regression model between a large number of predictor variables and multiple response variables. Understanding how the predictor and response variables relate to one another, especially when data are relatively scarce, is an important but challenging problem. For example, a geneticist might have a genomic dataset with many genetic features as predictor variables and disease prevalence data as response variables, and may want to know how the different types of disease are related to each other through their genetic underpinnings. As another example, a social psychologist might have a set of face images (with many facial features) that have been rated by a relatively small set of subjects for perceived social traits and social decision making tendencies, and may want to discover how the different social traits and decision making tendencies relate to each other through the underlying facial features. A common difficulty in these settings is that the large number of features relative to the number of data points typically entails some kind of dimensionality reduction and feature selection, and this process needs to be differently parameterized in order to optimize for each response variable, making direct comparison of the features underlying different response variables

