IMPORTANCE OF CLASS SELECTIVITY IN EARLY EPOCHS OF TRAINING

Anonymous

Abstract

Deep networks trained for classification exhibit class-selective neurons in intermediate layers. Intriguingly, recent studies have shown that class-selective neurons are not strictly necessary for network function. But if class-selective neurons are not necessary, why do they exist? We attempt to answer this question in a series of experiments on ResNet-50 trained on ImageNet. We begin by showing that class-selective neurons emerge in the first few epochs of training before receding rapidly. Single-neuron ablation experiments show that class-selective neurons are important for network function during this early phase of training. The network is close to a linear regime during this early training phase, which may explain the emergence of these class-selective neurons in intermediate layers. Finally, by regularizing against class selectivity at different points in training, we show that the emergence of these class-selective neurons during the first few epochs of training is essential to the successful training of the network. Altogether, our results indicate that class-selective neurons in intermediate layers are vestigial remains of early epochs of training, during which they appear as quasi-linear shortcut solutions to the classification task which are essential to the successful training of the network.

1. INTRODUCTION

A significant body of research has attempted to understand the role of single-neuron class selectivity in the function of artificial (Zhou et al., 2015; Radford et al., 2017; Bau et al., 2017; Morcos et al., 2018; Olah et al., 2018; Rafegas et al., 2019; Dalvi et al., 2019; Meyes et al., 2019; Dhamdhere et al., 2019; Leavitt & Morcos, 2020a; Kanda et al., 2020; Leavitt & Morcos, 2020b) and biological (Sherrington, 1906; Adrian, 1926; Granit, 1955; Hubel & Wiesel, 1959; Barlow, 1972) neural networks. Neurons that respond selectively to specific classes are typically found throughout networks trained for image classification, even in early and intermediate layers. Interestingly, these class-selective neurons can be ablated (i.e., their activation set to 0; Morcos et al. 2018), or their class selectivity substantially reduced via regularization (Leavitt & Morcos, 2020a), with little consequence to overall network accuracy, sometimes even improving it. These findings demonstrate that class selectivity is not necessary for network function, but it remains unknown why class selectivity is learned if it is largely unnecessary. One notable limitation of many previous studies examining selectivity is that they have largely overlooked the temporal dimension of neural network training: single-unit ablations are performed only at the end of training (Morcos et al., 2018; Amjad et al., 2018; Zhou et al., 2018; Meyes et al., 2019; Kanda et al., 2020), and selectivity regularization is mostly held constant throughout training (Leavitt & Morcos, 2020a;b). However, numerous studies have demonstrated substantial differences in training dynamics between the early and later phases of neural network training (Sagun et al., 2018; Gur-Ari et al., 2018; Golatkar et al., 2019; Frankle et al., 2020b; Jastrzebski et al., 2020).
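To make the two operations above concrete, here is a minimal numpy sketch of (i) a class selectivity index of the form used by Leavitt & Morcos (2020a), which compares a neuron's largest class-conditional mean activation against the mean of its responses to all other classes, and (ii) single-neuron ablation by zeroing a unit's activations, as in Morcos et al. (2018). The function names, the toy data, and the exact epsilon are illustrative assumptions, not the paper's implementation; the index assumes non-negative (e.g., post-ReLU) activations.

```python
import numpy as np

def class_selectivity(activations, labels, eps=1e-7):
    """Class selectivity index of a single neuron:
    (u_max - u_minus_max) / (u_max + u_minus_max + eps), where u_max is the
    largest class-conditional mean activation and u_minus_max is the mean of
    the remaining classes' mean activations. For non-negative activations the
    index lies in [0, 1]: 0 means no class preference, 1 means the neuron is
    active for exactly one class."""
    classes = np.unique(labels)
    means = np.array([activations[labels == c].mean() for c in classes])
    u_max = means.max()
    u_minus_max = np.delete(means, means.argmax()).mean()
    return (u_max - u_minus_max) / (u_max + u_minus_max + eps)

def ablate(activations, neuron_idx):
    """Single-neuron ablation: return a copy of a (samples x neurons)
    activation matrix with one unit's activations set to 0."""
    out = activations.copy()
    out[:, neuron_idx] = 0.0
    return out

# Toy example: neuron 0 responds strongly to class 2 and weakly otherwise,
# so its selectivity index should be close to 1.
rng = np.random.default_rng(0)
labels = rng.integers(0, 5, size=1000)
acts = rng.random((1000, 4)) * 0.1          # weak background activity
acts[labels == 2, 0] += 1.0                 # strong response to class 2
si = class_selectivity(acts[:, 0], labels)
ablated = ablate(acts, 0)
```

A selectivity regularizer of the kind referenced above can then be built by averaging such an index over neurons and adding it (scaled by a sign and coefficient) to the task loss, which pushes selectivity down or up during training.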
Motivated by these studies, we asked a series of questions about the dynamics of class selectivity during training in an attempt to elucidate why neural networks learn class selectivity: When in training do class-selective neurons emerge? Where in networks do class-selective neurons first emerge? Is class selectivity uniformly (ir)relevant for the entirety of training, or are there "critical periods" during which class selectivity impacts later network function? We addressed these questions in experiments on ResNet-50 trained on ImageNet, which led to the following results:

