ZERO-COST PROXIES FOR LIGHTWEIGHT NAS

Abstract

Neural Architecture Search (NAS) is quickly becoming the standard methodology to design neural network models. However, NAS is typically compute-intensive because multiple models need to be evaluated before choosing the best one. To reduce the computational power and time needed, a proxy task is often used for evaluating each model instead of full training. In this paper, we evaluate conventional reduced-training proxies and quantify how well they preserve the ranking between neural network models during search, compared with the ranking produced by final trained accuracy. We propose a series of zero-cost proxies, based on recent pruning literature, that use just a single minibatch of training data to compute a model's score. Our zero-cost proxies use three orders of magnitude less computation but can match and even outperform conventional proxies. For example, Spearman's rank correlation coefficient between final validation accuracy and our best zero-cost proxy on NAS-Bench-201 is 0.82, compared to 0.61 for EcoNAS (a recently proposed reduced-training proxy). Finally, we use these zero-cost proxies to enhance existing NAS search algorithms such as random search, reinforcement learning, evolutionary search and predictor-based search. For all search methodologies and across three different NAS datasets, we are able to significantly improve sample efficiency, and thereby decrease computation, by using our zero-cost proxies. For example, on NAS-Bench-101 we achieved the same accuracy 4× faster than the best previous result.

1. INTRODUCTION

Instead of manually designing neural networks, neural architecture search (NAS) algorithms are used to automatically discover the best ones (Tan & Le, 2019a; Liu et al., 2019; Bender et al., 2018). Early work by Zoph & Le (2017) proposed using a reinforcement learning (RL) controller that constructs candidate architectures; these are evaluated, and feedback is then provided to the controller based on the performance of each candidate. One major problem with this basic NAS methodology is that each evaluation is very costly: fully training a single neural network typically takes on the order of hours or days. We focus on this evaluation phase and propose using proxies that require only a single minibatch of data and a single forward/backward propagation pass to score a neural network. This is inspired by recent pruning-at-initialization work by Lee et al. (2019), Wang et al. (2020) and Tanaka et al. (2020), wherein a per-parameter saliency metric is computed before training to inform parameter pruning. Can we use such saliency metrics to score an entire neural network? Furthermore, can we use these "single minibatch" metrics to rank and compare multiple neural networks for use within NAS? If so, how do we best integrate these metrics within existing NAS algorithms such as RL or evolutionary search? These are the questions that we hope to (empirically) tackle in this work, with the goal of making NAS less compute-hungry.

Our contributions are:

• Zero-cost proxies. We adapt pruning-at-initialization metrics for use with NAS. This requires these metrics to operate at the granularity of an entire network rather than individual parameters, so we devise and validate approaches that aggregate parameter-level metrics in a manner suitable for ranking candidates during NAS search.

• Comparison to conventional proxies. We perform a detailed comparison between zero-cost and conventional NAS proxies that use a form of reduced-computation training. First, we quantify the rank consistency of conventional proxies on large-scale datasets: 15k models, compared with the 50 models used in Zhou et al. (2020). Second, we show that zero-cost proxies can match or exceed the rank consistency of conventional proxies.
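To make the two ideas above concrete (collapsing a per-parameter saliency into one network-level score, and measuring rank consistency against final accuracy), the following is a minimal sketch under stated assumptions, not our implementation. It assumes a saliency of the form |θ · ∂L/∂θ|, aggregated by summation, and it replaces a real network with a toy linear model whose gradient can be written by hand. All function names, the minibatch, the candidate weight vectors, and the accuracy values are hypothetical placeholders chosen only for illustration.

```python
from typing import List, Sequence


def mse_grads(w: Sequence[float],
              xs: Sequence[Sequence[float]],
              ys: Sequence[float]) -> List[float]:
    """Hand-derived gradient of the mean squared error of y_hat = w . x
    on a single minibatch (stand-in for one forward/backward pass)."""
    n = len(xs)
    grads = [0.0] * len(w)
    for x, y in zip(xs, ys):
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xi in enumerate(x):
            grads[j] += 2.0 * err * xi / n
    return grads


def network_score(w: Sequence[float], grads: Sequence[float]) -> float:
    """Sum the per-parameter saliency |w_j * dL/dw_j| into one score.
    Summation is an assumed aggregation, used here for illustration."""
    return sum(abs(wi * gi) for wi, gi in zip(w, grads))


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


if __name__ == "__main__":
    # Hypothetical minibatch and "candidates" (toy weight vectors).
    xs = [[1.0, 0.0], [0.0, 1.0]]
    ys = [1.0, 1.0]
    candidates = [[0.5, 0.5], [2.0, 0.0], [1.0, 3.0]]
    scores = [network_score(w, mse_grads(w, xs, ys)) for w in candidates]

    # Made-up placeholder accuracies, NOT experimental results.
    toy_accuracy = [0.1, 0.3, 0.2]
    print("proxy scores:", scores)
    print("spearman rho:", spearman(scores, toy_accuracy))
```

In a real search, the hand-derived gradient would be replaced by automatic differentiation over one minibatch for each candidate architecture, and the rank correlation would be computed against final trained accuracies from a NAS benchmark.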

availability

https://github.com/mohsaied

