EFFICIENT ONE-SHOT NEURAL ARCHITECTURE SEARCH WITH PROGRESSIVE CHOICE FREEZING EVOLUTIONARY SEARCH

Abstract

Neural Architecture Search (NAS) is a fast-developing research field that promotes automated machine learning. Among recently proposed NAS methods, one-shot NAS has attracted significant attention since it greatly reduces the training cost compared with earlier NAS methods. In one-shot NAS, the best candidate network architecture is searched for within a supernet, which is trained only once. In practice, however, the searching process involves numerous inference passes for each use case, which incurs high overhead in terms of latency and energy consumption. To tackle this problem, we first observe that the choices of the first few blocks of different candidate networks become similar at an early search stage. Furthermore, these choices are already close to the optimal choices obtained at the end of the search. Leveraging this observation, we propose a progressive choice freezing evolutionary search (PCF-ES) method that gradually freezes block choices for all candidate networks during the search. Freezing gives us the opportunity to reuse the intermediate data produced by the frozen blocks instead of re-computing it. Experimental results show that the proposed PCF-ES provides up to 55% speedup and reduces energy consumption by 51% during the searching stage.

1. INTRODUCTION

Neural Architecture Search (NAS) has been proposed and extensively studied as an efficient tool for designing state-of-the-art neural networks (Elsken et al., 2019; Wistuba et al., 2019; Ren et al., 2020). NAS approaches automate the architecture design process and can achieve higher accuracy than human-designed architectures (Liu et al., 2019; Xie et al., 2019; Cai et al., 2019). However, early NAS methods, such as reinforcement-learning-based NAS (Zoph & Le, 2016), incurred prohibitive computation costs since every searched architecture had to be trained from scratch, making the total search time unacceptable. To reduce this cost, the weight-sharing technique has been proposed (Yu et al., 2020; Chen et al., 2020), among which the one-shot NAS method has attracted a lot of attention recently (Bender et al., 2018; Li et al., 2020). The one-shot NAS method is cost-efficient because it requires training a supernet only once. A supernet is a stack of basic blocks, each of which contains multiple choices. A candidate network architecture (defined as a subnet) can be formed by selecting one choice for each block in the supernet, and its weights can be inherited from the supernet. During the architecture searching stage, candidate architectures are evaluated on the validation dataset, and the best architecture, i.e., the one with the highest validation accuracy, is updated in every searching epoch of an Evolutionary Algorithm (EA) (Real et al., 2019). Surprisingly, although training is commonly deemed a lengthy and energy-consuming task, the architecture searching stage in one-shot NAS is much more costly than training the supernet (Cai et al., 2020). The reason is that a new search must be performed whenever a different searching scenario is given, e.g., different hardware constraints, learning tasks, or workloads, whereas the trained supernet can be reused.
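To make the supernet/subnet relationship concrete, the following is a minimal toy sketch (not the paper's implementation): each block offers several candidate operations, a subnet is simply one choice index per block, and running the subnet reuses the supernet's (shared) parameters. The scaling ops and all names here are illustrative assumptions.

```python
import random

# A toy supernet: each block offers several "choices" (simple callables
# standing in for candidate operations whose weights live in the supernet).
NUM_BLOCKS = 4
CHOICES_PER_BLOCK = 3

# Illustrative stand-in ops: the choice c of block b scales by (b+1)*(c+1).
supernet = [
    [lambda x, s=(b + 1) * (c + 1): x * s for c in range(CHOICES_PER_BLOCK)]
    for b in range(NUM_BLOCKS)
]

def sample_subnet():
    """A subnet is one choice index per block of the supernet."""
    return tuple(random.randrange(CHOICES_PER_BLOCK) for _ in range(NUM_BLOCKS))

def run_subnet(subnet, x):
    """Evaluate a subnet by running the chosen op of every block in order;
    the 'weights' (here, scale factors) are inherited from the supernet."""
    for block, choice in zip(supernet, subnet):
        x = block[choice](x)
    return x
```

In a real one-shot NAS system the blocks would be neural modules (e.g., convolutions with different kernel sizes) and evaluation would be inference on a validation set, but the search operates on exactly this kind of per-block choice vector.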
Hence, the numerous inference passes on subnets can take much longer than training the supernet once. According to You et al. (2020), searching can take 10 GPU-days longer than supernet training when 10 different constraints/platforms are required.
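The idea described in the abstract, gradually freezing the first blocks' choices during evolutionary search and caching the intermediate data those frozen blocks produce, can be sketched as follows. This is a schematic toy, not the paper's algorithm: the fitness proxy, the freezing schedule (`FREEZE_EVERY`), and the population sizes are all assumptions for illustration.

```python
import random

NUM_BLOCKS, CHOICES = 4, 3
POP_SIZE, EPOCHS = 8, 6
FREEZE_EVERY = 2  # assumed schedule: freeze one more leading block every 2 epochs

def fitness(subnet):
    # Toy proxy for validation accuracy; a real search would run inference.
    return -sum((c - 1) ** 2 for c in subnet)

cache = {}  # frozen-prefix -> its "intermediate activation", computed once

def evaluate(subnet, frozen):
    prefix = subnet[:frozen]
    if prefix not in cache:
        cache[prefix] = prefix  # stands in for the frozen blocks' output
    # Only the unfrozen suffix would be re-computed in real inference.
    return fitness(subnet)

population = [tuple(random.randrange(CHOICES) for _ in range(NUM_BLOCKS))
              for _ in range(POP_SIZE)]

for epoch in range(EPOCHS):
    frozen = min(epoch // FREEZE_EVERY, NUM_BLOCKS - 1)
    population.sort(key=lambda s: evaluate(s, frozen), reverse=True)
    parents = population[: POP_SIZE // 2]
    best_prefix = parents[0][:frozen]  # freeze the leading choices
    children = [
        best_prefix + tuple(random.randrange(CHOICES)
                            for _ in range(NUM_BLOCKS - frozen))
        for _ in parents
    ]
    population = parents + children

best = max(population, key=fitness)
```

Because every candidate eventually shares the same frozen prefix, its cached intermediate output can be looked up instead of re-computed, which is the source of the latency and energy savings PCF-ES targets.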

