VISUAL EXPERTISE AND THE LOG-POLAR TRANSFORM EXPLAIN IMAGE INVERSION EFFECTS

Abstract

Visual expertise can be defined as the ability to discriminate among subordinate-level objects in homogeneous classes, such as identities of faces within the class "face". Despite being able to discriminate many faces, subjects perform poorly at recognizing even familiar faces once inverted. This face-inversion effect is in contrast to subjects' performance identifying inverted objects for which their experience is at a basic level, which results in less impairment. Experimental results have suggested that when identifying mono-oriented objects, such as cars, car novices' performance is between that of faces and other objects. We build an anatomically-inspired neurocomputational model to explore this effect. Our model includes a foveated retina and the log-polar mapping from the visual field to V1. This transformation causes changes in scale to appear as horizontal translations, leading to scale equivariance. Rotation is similarly equivariant, leading to vertical translations. When fed into a standard convolutional network, this provides rotation and scale invariance. It may be surprising that a rotation-invariant network shows any inversion effect at all. The effect arises because there is a crucial topological difference between scale and rotation: Rotational invariance is discontinuous, with V1 ranging from 90° (vertically up) to 270° (vertically down). Hence when a face is inverted, the configural information in the face is disrupted while feature information is relatively unaffected. We show that the inversion effect arises as a result of visual expertise, where configural information becomes relevant as more identities are learned at the subordinate level. Our model matches the classic result: faces suffer more from inversion than mono-oriented objects, which are more disrupted than non-mono-oriented objects when objects are only familiar at a basic level.
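The equivariance properties described above can be illustrated with a minimal sketch. It is not the model's implementation: it assumes a simplified log-polar map of individual points, with an angular axis spanning 0 to 2π rather than the V1 layout described in the abstract, and the function names (to_log_polar, rotate) are hypothetical. It shows that scaling shifts every point equally along the log-radius (horizontal) axis, that a small rotation shifts every point equally along the angle (vertical) axis, and that a 180° inversion pushes some points across the boundary of the angular axis, so the points no longer move together.

```python
import numpy as np

def to_log_polar(points):
    """Map Cartesian (x, y) points to (log r, theta), with theta in [0, 2*pi).

    Assumption: a full-circle angular axis, used here only for illustration.
    """
    x, y = points[:, 0], points[:, 1]
    log_r = np.log(np.hypot(x, y))
    theta = np.mod(np.arctan2(y, x), 2 * np.pi)
    return np.stack([log_r, theta], axis=1)

def rotate(points, angle):
    """Rotate points counter-clockwise about the origin (the fixation point)."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s], [s, c]])
    return points @ rot.T

# Four arbitrary "feature" locations around fixation (illustrative values).
points = np.array([[1.0, 0.5], [0.3, 2.0], [-1.5, 0.4], [0.8, -1.2]])
base = to_log_polar(points)

# Scaling by 2: every point shifts by log(2) along the log-radius axis only.
scale_shift = to_log_polar(2.0 * points) - base

# Small rotation (30 degrees): every point shifts by the same amount
# along the angle axis only, because no point crosses the boundary.
rot_shift = to_log_polar(rotate(points, np.pi / 6)) - base

# Inversion (180 degrees): one point wraps around the angular boundary,
# so its shift has the opposite sign and the configuration is broken up.
inv_shift = to_log_polar(rotate(points, np.pi)) - base

print("scale shift:\n", scale_shift)
print("rotation shift:\n", rot_shift)
print("inversion shift:\n", inv_shift)
```

In this sketch the individual points are still present after inversion, but their relative arrangement on the bounded angular axis changes, which is one way to picture how configural information can be disrupted while feature information is largely preserved.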

1. INTRODUCTION

Since 1969, researchers have been studying the effects of inverting images (Yin, 1969). Some researchers have focused on defining the bounds of inversion effects: what the measurable effect is for what types of images (Farah et al., 1995; Yin, 1969; Jacques et al., 2007; Rezlescu et al., 2017). Others have looked to explain how inversion effects arise: what part of the brain is active during inversion tasks, or what level of experience a participant has with the stimuli in the experiment (Gauthier et al., 2000; Gauthier & Bukach, 2007; Gauthier et al., 2014; Kanwisher et al., 1997; 1998; Richler et al., 2011; Wang et al., 2014). In Yin (1969), participants studied a set of images during the training phase; in testing, they were shown pairs of images and asked to select the image that was in the study set. Trials with upright images and trials with inverted images were compared to determine the inversion effect. Using images of faces resulted in a strong and significant inversion effect: performance was much worse for inverted faces. Images of houses, a mono-oriented category, had a lesser but still significant effect. Images of airplanes had an insignificant inversion effect. We draw two conclusions from this work. First, the effects of inversion on performance are greatest when images of faces are used as the stimuli, and insignificant when images of certain objects (e.g., planes, which are less mono-oriented than houses) are used. Second, not all objects produce the same inversion effects: mono-oriented objects, which are typically seen in only one orientation, such as the houses in Yin's 1969 work, do show an inversion effect, though it is smaller than that of faces (Yin, 1969).

