LLBOOST: LAST LAYER PERTURBATION TO BOOST PRE-TRAINED NEURAL NETWORKS

Abstract

While deep networks have produced state-of-the-art results in several domains from image classification to machine translation, hyper-parameter selection remains a significant computational bottleneck. In order to produce the best possible model, practitioners often search across random seeds or use ensemble methods. As models get larger, any method to improve neural network performance that involves re-training becomes intractable. For example, computing the training accuracy of FixResNext-101 (829 million parameters) on ImageNet takes roughly 1 day when using 1 GPU. In this work, we present LLBoost, a theoretically-grounded, computationally-efficient method to boost the validation accuracy of pre-trained over-parameterized models without impacting the original training accuracy. LLBoost adjusts the last layer of a neural network by adding a term that is orthogonal to the training feature matrix, which is constructed by applying all layers but the last to the training data. We provide an efficient implementation of LLBoost on the GPU and demonstrate that LLBoost, run using only 1 GPU, improves the test/validation accuracy of pre-trained models on CIFAR10, ImageNet32, and ImageNet. In the over-parameterized linear regression setting, we prove that LLBoost reduces the generalization error of any interpolating solution with high probability without affecting training error.

1. INTRODUCTION

Over the past decade, deep networks have produced a number of state-of-the-art results, including surpassing human performance on the ImageNet classification task (26; 14). However, tuning hyper-parameters to produce the best possible neural network in a given application remains a computationally expensive task. State-of-the-art results often involve selecting the best model across multiple random seeds (14; 27; 12; 28) or ensembling (15), and training even a single model can take several days even when using multiple GPUs. Hence, it is critical to identify computationally efficient approaches that improve the performance of pre-trained models without re-training them.

In this work, we present LLBoost, a theoretically-grounded, computationally-efficient method to boost the validation accuracy of pre-trained, over-parameterized models without impacting the training accuracy. Figure 1 provides an overview of our method as well as the main results. As shown in Figure 1A, our method adjusts the last fully-connected layer of a neural network by selecting the best-performing perturbation out of several that are orthogonal to the training feature matrix, which is constructed by applying all but the last layer to the training data. In Figure 1B, we provide an example showing how our method, applied to a model trained under a poorly chosen random seed, can boost validation accuracy to a level comparable to that of a model trained under a better seed. Lastly, Figure 1C shows that our method can significantly improve the validation accuracy of pre-trained neural networks on large datasets using a fraction of the training time.

The intuition for our method comes from characterizing the benefit of random initialization in over-parameterized linear regression. In particular, consider a dataset (X, y) ∈ R^{d×n} × R^{1×n} with n < d for which there exists w* ∈ R^{1×d} such that y = w*X.
In order to estimate w* from the data, we use gradient descent with learning rate η and initialization w^{(0)} to minimize the squared loss, i.e., to solve:

    arg min_{w ∈ R^{1×d}} (1/2) ‖y − wX‖².
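To make the core operation concrete, the following is a minimal NumPy sketch of the last-layer adjustment described above, in the linear setting: a random direction is projected onto the orthogonal complement of the column space of the training feature matrix, so that adding it to the weights provably leaves every training prediction unchanged. The dimensions, scale, and function name `perturb` are illustrative choices, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 50, 20                      # over-parameterized: more features than samples
X = rng.standard_normal((d, n))    # training feature matrix (features x samples)
w = rng.standard_normal((1, d))    # pre-trained last-layer weights

# Projector onto the orthogonal complement of the column space of X
# (pinv(X) = (X^T X)^{-1} X^T when X has full column rank).
P_perp = np.eye(d) - X @ np.linalg.pinv(X)

def perturb(w, scale=0.1):
    """Return w plus a random perturbation orthogonal to the training features."""
    z = rng.standard_normal((1, d))
    u = z @ P_perp                 # u satisfies uX = 0 by construction
    return w + scale * u / np.linalg.norm(u)

w_new = perturb(w)
# Training predictions are unchanged, since (w + u)X = wX + 0.
print(np.allclose(w_new @ X, w @ X))
```

In LLBoost this step is repeated for several sampled perturbations, and the one with the best validation accuracy is kept; the orthogonality guarantee is what allows the search to run without ever touching the training loss.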
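The role of the initialization can also be checked numerically. A sketch (my own illustration, not code from the paper): since every gradient step (y − wX)Xᵀ lies in the column space of X, the component of w^{(0)} orthogonal to the features is never updated, and gradient descent converges to the minimum-norm interpolator plus that frozen component.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 50, 20
X = rng.standard_normal((d, n))
w_star = rng.standard_normal((1, d))
y = w_star @ X                       # noiseless labels, y = w* X

w0 = rng.standard_normal((1, d))     # random initialization w^{(0)}
w = w0.copy()
eta = 1.0 / np.linalg.norm(X, 2)**2  # step size below 2 / lambda_max(X^T X)
for _ in range(5000):
    w = w + eta * (y - w @ X) @ X.T  # gradient step on (1/2)||y - wX||^2

# The component of w^{(0)} orthogonal to the features survives training:
# w(infinity) = w^{(0)} P_perp + y pinv(X).
P_perp = np.eye(d) - X @ np.linalg.pinv(X)
w_pred = w0 @ P_perp + y @ np.linalg.pinv(X)
print(np.allclose(w @ X, y))         # interpolates the training data
print(np.allclose(w, w_pred))        # matches the closed form
```

This is exactly the degree of freedom LLBoost exploits: different initializations (or added orthogonal perturbations) yield different interpolating solutions with identical training error.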

