HW-NAS-BENCH: HARDWARE-AWARE NEURAL ARCHITECTURE SEARCH BENCHMARK

Abstract

HardWare-aware Neural Architecture Search (HW-NAS) has recently gained tremendous attention by automating the design of deep neural networks deployed in more resource-constrained daily-life devices. Despite its promising performance, developing optimal HW-NAS solutions can be prohibitively challenging as it requires cross-disciplinary knowledge in the algorithm, micro-architecture, and device-specific compilation. First, to determine the hardware-cost to be incorporated into the NAS process, existing works mostly adopt either pre-collected hardware-cost look-up tables or device-specific hardware-cost models. Constructing the former can be time-consuming due to the required knowledge of the device's compilation method and of how to set up the measurement pipeline, while building the latter is often a barrier for non-hardware experts like NAS researchers. Both approaches limit the development of HW-NAS innovations and impose a barrier-to-entry on non-hardware experts. Second, similar to generic NAS, it can be notoriously difficult to benchmark HW-NAS algorithms due to their significant required computational resources and the differences in adopted search spaces, hyperparameters, and hardware devices. To this end, we develop HW-NAS-Bench, the first public dataset for HW-NAS research, which aims to democratize HW-NAS research to non-hardware experts and make HW-NAS research more reproducible and accessible. To design HW-NAS-Bench, we carefully collected the measured/estimated hardware performance (e.g., energy cost and latency) of all the networks in the search spaces of both NAS-Bench-201 and FBNet, on six hardware devices that fall into three categories (i.e., commercial edge devices, FPGA, and ASIC). Furthermore, we provide a comprehensive analysis of the collected measurements in HW-NAS-Bench to provide insights for HW-NAS research.
Finally, we demonstrate exemplary use cases to (1) show that HW-NAS-Bench allows non-hardware experts to perform HW-NAS by simply querying our pre-measured dataset and (2) verify that dedicated device-specific HW-NAS can indeed lead to optimal accuracy-cost trade-offs.

1. INTRODUCTION

The recent performance breakthroughs of deep neural networks (DNNs) have attracted an explosion of research in designing efficient DNNs, aiming to bring powerful yet power-hungry DNNs into more resource-constrained daily-life devices for enabling various DNN-powered intelligent functions (Ross, 2020; Liu et al., 2018b; Shen et al., 2020; You et al., 2020a). Among them, HardWare-aware Neural Architecture Search (HW-NAS) has emerged as one of the most promising techniques as it can automate the process of designing optimal DNN structures for the target applications, each of which often adopts a different hardware device and requires a different hardware-cost metric (e.g., prioritizes latency or energy). For example, HW-NAS in (Wu et al., 2019) develops a differentiable neural architecture search (DNAS) framework and discovers state-of-the-art (SOTA) DNNs balancing both accuracy and hardware efficiency, by incorporating a loss consisting of both the cross-entropy loss that leads to better accuracy and the latency loss that penalizes the network's latency on a target device. Despite the promising performance achieved by SOTA HW-NAS, there exist paramount challenges that limit the development of HW-NAS innovations. First, HW-NAS requires the collection of hardware efficiency data corresponding to (all) the networks in the search space. To do so, current practice either pre-collects these data to construct a hardware-cost look-up table or adopts device-specific hardware-cost estimators/models, both of which can be time-consuming to obtain and impose a barrier-to-entry on non-hardware experts. This is because it requires knowledge about device-specific compilation and properly setting up the hardware measurement pipeline to collect hardware-cost data.
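To make the DNAS-style objective concrete, the sketch below shows one common additive formulation: the task loss plus a weighted expected latency, where the latency of each candidate operation comes from a pre-collected look-up table. This is a minimal illustration only; the names (`op_probs`, `latency_lut`, `lam`) are our own, and the actual loss in (Wu et al., 2019) uses a different (multiplicative) latency term.

```python
def hw_aware_loss(ce_loss, op_probs, latency_lut, lam=0.1):
    """Hardware-aware NAS loss sketch (additive form, illustrative only).

    ce_loss:     scalar cross-entropy loss of the supernet forward pass
    op_probs:    {op_name: probability} from a softmax over candidate ops
    latency_lut: {op_name: latency} pre-measured hardware-cost look-up table
    lam:         trade-off weight between accuracy and hardware cost
    """
    # Expected latency is differentiable w.r.t. the op probabilities,
    # which is what makes look-up-table costs usable inside DNAS.
    expected_latency = sum(p * latency_lut[op] for op, p in op_probs.items())
    return ce_loss + lam * expected_latency
```

Because the penalty is a weighted sum over the architecture distribution, gradients flow through `op_probs`, steering the search toward operations that are cheap on the target device.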
Second, similar to generic NAS, it can be notoriously difficult to benchmark HW-NAS algorithms due to the significant computational resources required and the differences in their (1) hardware devices, which are specific to HW-NAS, (2) adopted search spaces, and (3) hyperparameters. Such a difficulty is even higher for HW-NAS considering the numerous choices of hardware devices, each of which can favor very different network structures even under the same target hardware efficiency, as discussed in (Chu et al., 2020). While the number of floating-point operations (FLOPs) has been commonly used to estimate the hardware-cost, many works have pointed out that DNNs with fewer FLOPs are not necessarily faster or more efficient (Wu et al., 2019; 2018; Wang et al., 2019b). For example, NasNet-A (Zoph et al., 2018) has a complexity comparable to MobileNetV1 (Howard et al., 2017) in terms of FLOPs, yet can have a larger latency than the latter due to the hardware-unfriendly structure NasNet-A adopts. It is thus imperative to address the aforementioned challenges in order to make HW-NAS more accessible and reproducible and to unfold HW-NAS's full potential. Note that although pioneering NAS benchmark datasets (Ying et al., 2019; Dong & Yang, 2020; Klyuchnikov et al., 2020; Siems et al., 2020; Dong et al., 2020) have made a significant step towards providing a unified benchmark dataset for generic NAS works, all of them either merely provide the latency on server-level GPUs (e.g., GTX 1080Ti) or do not provide any hardware-cost data on real hardware, limiting their applicability to HW-NAS (Wu et al., 2019; Wan et al., 2020; Cai et al., 2018), which primarily targets commercial edge devices, FPGA, and ASIC.
To this end, as shown in Figure 1, we develop HW-NAS-Bench and make the following contributions in this paper:
• We have developed HW-NAS-Bench, the first public dataset for HW-NAS research aiming to (1) democratize HW-NAS research to non-hardware experts and (2) facilitate a unified benchmark for HW-NAS to make HW-NAS research more reproducible and accessible, covering two SOTA NAS search spaces including NAS-Bench-201 and FBNet, with the former being one of the most popular NAS search spaces and the latter having been shown to be one of the most hardware-friendly NAS search spaces.
• We provide hardware-cost data collection pipelines for six commonly used hardware devices that fall into three categories (i.e., commercial edge devices, FPGA, and ASIC), in addition to the measured/estimated hardware-cost (e.g., energy cost and latency) on these devices for all the networks in the search spaces of both NAS-Bench-201 and FBNet.
• We conduct a comprehensive analysis of the collected data in HW-NAS-Bench, such as studying the correlation between the collected hardware-cost and accuracy data of all the networks in both considered search spaces.
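The sketch below illustrates the workflow that such a pre-measured dataset enables: a searcher filters candidate architectures purely by table look-up, with no on-device measurement. The table contents, metric names, and indices here are illustrative placeholders, not actual HW-NAS-Bench entries or its real query API.

```python
# Hypothetical hardware-cost table keyed by architecture index, mimicking
# the role of a pre-measured dataset like HW-NAS-Bench (values are made up).
HW_COST = {
    0: {"edgegpu_latency_ms": 5.8, "edgegpu_energy_mJ": 24.2},
    1: {"edgegpu_latency_ms": 3.1, "edgegpu_energy_mJ": 12.7},
    2: {"edgegpu_latency_ms": 7.4, "edgegpu_energy_mJ": 30.9},
}

def feasible_archs(max_latency_ms, table=HW_COST):
    """Return architecture indices satisfying a latency constraint via look-up alone."""
    return [idx for idx, cost in table.items()
            if cost["edgegpu_latency_ms"] <= max_latency_ms]
```

In this setting a non-hardware expert never touches a device or compiler toolchain: the hardware-cost side of the search reduces to dictionary queries, which is precisely what makes such a benchmark accessible.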



Figure 1: An illustration of our proposed HW-NAS-Bench

Availability

The code and all collected data are available at https://github.com/RICE-EIC/

