HOW DO PREDICTORS AFFECT SEARCH STRATEGIES IN NEURAL ARCHITECTURE SEARCH?

Abstract

Predictor-based Neural Architecture Search (NAS) is an important topic because it can efficiently reduce the computational cost of evaluating candidate architectures. Most existing predictor-based NAS algorithms aim to design different predictors to improve prediction performance. Unfortunately, even a promising performance predictor may suffer from accuracy decline due to long-term and continuous usage, leading to degraded performance of the search strategy. This naturally raises two questions: how do predictors affect search strategies, and how should the predictor be used appropriately? In this paper, we adopt a reinforcement learning (RL) based search strategy to study, both theoretically and empirically, the impact of predictors on search strategies. We first formulate a predictor-RL-based NAS algorithm as model-based RL and analyze it with a guarantee of monotonic improvement at each trial. Based on this analysis, we then propose a simple procedure for predictor usage, named mixed batch, in which each batch contains both ground-truth data and prediction data. The proposed procedure efficiently reduces the impact of predictor errors on search strategies while maintaining performance growth. Our algorithm, Predictor-based Neural Architecture Search with Mixed batch (PNASM), outperforms traditional NAS algorithms and prior state-of-the-art predictor-based NAS algorithms on three NAS-Bench-201 tasks and one NAS-Bench-ASR task.
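As a rough illustration of the mixed-batch idea (a minimal sketch, not the paper's actual implementation; all function and variable names here are our own), each RL update batch can be assembled from a mixture of architectures with ground-truth evaluations and architectures scored only by the predictor:

```python
import random

def mixed_batch(ground_truth, predicted, gt_ratio=0.5, batch_size=8):
    """Assemble one update batch mixing ground-truth and predicted rewards.

    `ground_truth` and `predicted` are lists of (architecture, reward)
    pairs; `gt_ratio` controls the fraction of ground-truth samples.
    Illustrative only -- the paper's procedure may differ in detail.
    """
    n_gt = min(len(ground_truth), int(batch_size * gt_ratio))
    n_pred = batch_size - n_gt
    batch = random.sample(ground_truth, n_gt) + random.sample(predicted, n_pred)
    random.shuffle(batch)  # avoid ordering bias between the two sources
    return batch
```

Keeping a fixed fraction of ground-truth rewards in every batch bounds how far a drifting predictor can pull the policy update, which is the intuition behind the monotonic-improvement analysis.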

1. INTRODUCTION

Neural Architecture Search (NAS) aims to automatically find effective architectures in a pre-defined search space for a given dataset (Baker et al., 2016; Zoph & Le, 2016), and has been shown to generate architectures that achieve promising results in many domains (Zoph et al., 2018; Tan & Le, 2019; Howard et al., 2019; Chen et al., 2020). However, due to the high computational cost of evaluating the performance of generated architectures, traditional NAS methods are prohibitively costly for real-world deployment. Recently, many approaches have been proposed to reduce the evaluation cost; they can be categorized into training-free predictors (Pham et al., 2018; Mellor et al., 2021) and training-based predictors (Wei et al., 2022; Springenberg et al., 2016; Shi et al., 2020; White et al., 2021a; Lu et al., 2021; Wen et al., 2020; Tang et al., 2020; Luo et al., 2018). Training-based methods, which train a performance predictor to predict the final validation accuracy from architecture features, have received much more attention due to their better generalization ability. Recent efforts on training-based methods focus on improving prediction performance by designing models that precisely capture features of network architectures, e.g., GCNs and Transformers. Several works demonstrate their robust predictions and combine them with traditional search strategies such as Bayesian Optimization (BO) (Springenberg et al., 2016; Shi et al., 2020; White et al., 2021a) and Evolutionary Algorithms (EA) (Wei et al.,
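To make the training-based setting concrete, a performance predictor is simply a regression model from an architecture encoding to validation accuracy. The sketch below fits a linear least-squares predictor as a stand-in for the GCN/Transformer predictors discussed above (all names are ours, chosen for illustration):

```python
import numpy as np

def fit_predictor(encodings, accuracies):
    """Fit a linear performance predictor by least squares.

    encodings: (n, d) array of architecture encodings (e.g. one-hot ops);
    accuracies: (n,) array of measured validation accuracies.
    A deliberately simple stand-in for learned GCN/Transformer predictors.
    """
    X = np.hstack([encodings, np.ones((len(encodings), 1))])  # append bias column
    w, *_ = np.linalg.lstsq(X, accuracies, rcond=None)
    return w

def predict(w, encoding):
    """Predicted validation accuracy for a single architecture encoding."""
    return float(np.append(encoding, 1.0) @ w)
```

Once fitted on a small set of fully trained architectures, such a predictor can rank unseen candidates cheaply; the question the paper studies is how its errors propagate into the search strategy that consumes those predictions.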



Figure 1: Cumulative error between the true and predicted validation accuracy over sampled architectures. REINFORCE+Predictor denotes long-term and continuous usage of the predictor without updating it.

