HUMAN ALIGNMENT OF NEURAL NETWORK REPRESENTATIONS

Abstract

Today's computer vision models achieve human or near-human level performance across a wide variety of vision tasks. However, their architectures, data, and learning algorithms differ in numerous ways from those that give rise to human vision. In this paper, we investigate the factors that affect the alignment between the representations learned by neural networks and human mental representations inferred from behavioral responses. We find that model scale and architecture have essentially no effect on the alignment with human behavioral responses, whereas the training dataset and objective function both have a much larger impact. These findings are consistent across three datasets of human similarity judgments collected using two different tasks. Linear transformations of neural network representations learned from behavioral responses from one dataset substantially improve alignment with human similarity judgments on the other two datasets. In addition, we find that some human concepts such as food and animals are well-represented by neural networks whereas others such as royal or sports-related objects are not. Overall, although models trained on larger, more diverse datasets achieve better alignment with humans than models trained on ImageNet alone, our results indicate that scaling alone is unlikely to be sufficient to train neural networks with conceptual representations that match those used by humans.

1. INTRODUCTION

Representation learning is a fundamental part of modern computer vision systems, but the paradigm has its roots in cognitive science. When Rumelhart et al. (1986) developed backpropagation, their goal was to find a method that could learn representations of concepts that are distributed across neurons, similarly to the human brain. The discovery that representations learned by backpropagation could replicate nontrivial aspects of human concept learning was a key factor in its rise to popularity in the late 1980s (Sutherland, 1986; Ng & Hinton, 2017). A string of empirical successes has since shifted the primary focus of representation learning research away from its similarities to human cognition and toward practical applications. This shift has been fruitful. By some metrics, the best computer vision models now outperform the best individual humans on benchmarks such as ImageNet (Shankar et al., 2020; Beyer et al., 2020; Vasudevan et al., 2022). As computer vision systems become increasingly widely used outside of research, we would like to know if they see the world in the same way that humans do. However, the extent to which the conceptual representations learned by these systems align with those used by humans remains unclear. Do models that are better at classifying images naturally learn more human-like conceptual representations? Prior work has investigated this question indirectly, by measuring models' error consistency with humans (Geirhos et al., 2018; Rajalingham et al., 2018; Geirhos et al., 2021) and the ability of their representations to predict neural activity in primate brains (Yamins et al., 2014; Güçlü & van Gerven, 2015; Schrimpf et al., 2020), with mixed results. Networks trained on more data make somewhat more human-like errors (Geirhos et al., 2021), but do not necessarily obtain a better fit to brain data (Schrimpf et al., 2020).
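As a minimal, hypothetical sketch of one common way to quantify such alignment (not the paper's specific procedure), one can compare a model's pairwise representational similarities against human similarity judgments using a rank correlation, in the spirit of representational similarity analysis. All names and the random data below are illustrative assumptions.

```python
# Illustrative sketch: rank-correlating model representational similarity
# with human similarity judgments. Data here is random placeholder data,
# standing in for real model embeddings and behavioral ratings.
import numpy as np

def spearman(a, b):
    """Spearman rank correlation (assumes no ties, as with continuous data)."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]

rng = np.random.default_rng(0)
n_images, n_dims = 8, 16
features = rng.normal(size=(n_images, n_dims))      # stand-in model embeddings
human_sim = rng.uniform(size=(n_images, n_images))  # stand-in human ratings
human_sim = (human_sim + human_sim.T) / 2           # symmetrize the rating matrix

# Cosine similarity between every pair of model embeddings.
normed = features / np.linalg.norm(features, axis=1, keepdims=True)
model_sim = normed @ normed.T

# Correlate the upper triangles (unique image pairs) of the two matrices.
iu = np.triu_indices(n_images, k=1)
rho = spearman(model_sim[iu], human_sim[iu])
print(f"alignment (Spearman rho): {rho:.3f}")
```

A higher correlation indicates that image pairs the model represents as similar are also the pairs humans judge as similar; with random data as above, the correlation is expected to be near zero.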
Here, we approach the question of alignment between human and machine representation spaces more directly. We focus primarily on human similarity judgments

