FIVE-MINUTE NEURAL ARCHITECTURE SEARCH FOR IMAGE CLASSIFICATION, OBJECT DETECTION, AND SUPER-RESOLUTION

Abstract

Neural network models have become more sophisticated with the explosive development of AI and its applications. Automating the model search process is essential to explore the full range of neural architectures for satisfactory performance. However, most current NAS algorithms consume significant time and computing resources, and many cater only to image classification applications. This paper proposes the total path count (TPC) score, an efficient accuracy predictor that requires only a simple calculation based on architecture information. The TPC score is not only simple to compute but also highly effective: the Kendall rank correlation coefficient between the TPC scores and the accuracies of 20 architectures on CIFAR100 is as high as 0.87. This paper also proposes TPC-NAS, a zero-shot NAS method leveraging the novel TPC score. TPC-NAS requires neither training nor inference and can complete a NAS task for ImageNet and other vision applications in less than five CPU minutes. We then apply TPC-NAS to image classification, object detection, and super-resolution applications for further validation. In image classification, TPC-NAS finds an architecture that achieves 76.4% top-1 accuracy on ImageNet with 355M FLOPs, outperforming other NAS solutions. In object detection, starting with yolov4-p5, TPC-NAS arrives at a high-performance architecture with at least 2% higher mAP than the results of other NAS algorithms. Finally, in super-resolution, TPC-NAS discovers an architecture with fewer than 300K parameters that generates images with 32.09 dB PSNR on the Urban100 dataset. These three experiments convince us that TPC-NAS can swiftly deliver high-quality CNN architectures for diverse applications.

1. INTRODUCTION

The complexity of high-performance machine learning models has skyrocketed, and manual tuning of hyperparameters and neural network (NN) architectures has become laborious and time-consuming. More efficient methodologies for the design, training, and deployment of NN models are required. Toward this end, we have recently witnessed rapid growth in research on neural (network) architecture search (NAS), which automates the model search process. Early NAS algorithms use evolutionary search (Real et al., 2017; 2019) or reinforcement learning (Zoph & Le, 2017; Tan et al., 2019). However, such methods typically require training many different architectures, which consumes significant computational resources and time. To reduce search time, differentiable NAS employs gradient descent (Mei et al., 2020; Chen et al., 2021b; Xu et al., 2020), deciding which architectures to keep by updating the weights between different operations. DARTS (Liu et al., 2019), for example, makes the search space continuous by applying a softmax function over all possible operations. After training, only the operations with the highest softmax outputs are retained in the final searched model. Later, Wang et al. (2021c) discovered that selecting the final model based on each operation's contribution to supernet performance outperforms selecting it solely on the softmax outputs. Although gradient-based algorithms speed up the search process, they require the construction of a supernet that covers the entire search space, which typically demands a large amount of memory, making this approach unsuitable for large and complex problems. Around the same time as the gradient-descent-based methods were developed, one-shot NAS methods (Guo et al., 2020; Cai et al., 2020; Zela et al., 2020; Stamoulis et al., 2019) were proposed.
In contrast to the gradient-descent-based methods, which train the overall supernet once, the one-shot methods typically have two steps: training and searching. The one-shot methods apply the weight-sharing technique in training, significantly reducing the number of times the model needs to be trained. Furthermore, the one-shot methods sample and train only one subnet from the supernet at a time. For example, Wang et al. (2021b) sample the model with the best or the worst performance to improve the supernet's overall performance. To ensure that individual models are trained fairly, Chu et al. (2021b) proposed that all architectures should be sampled equally often. Since only one subnet's data are stored at a time, the one-shot methods have better memory efficiency. During the search process, the one-shot methods set hardware constraints and select the subnet that achieves the highest performance while meeting those constraints. However, the subnets are interconnected, and it is difficult to ensure that a single subnet is trained appropriately. Although few-shot methods (Hu et al., 2022; Zhao et al., 2021) effectively mitigate this problem by dividing a large supernet into several smaller sub-supernets, it remains difficult to ensure that the sampled subnet with the highest accuracy will still perform as expected when trained separately. Beyond the shortcomings above, most NAS algorithms share a common flaw: they take too much time and memory to complete the architecture search. This daunting requirement on computing resources often poses a high barrier to entry for average NN users. Hence, this paper proposes a novel zero-shot NAS algorithm with an accuracy predictor based on a neural network's total path count (TPC) between the first layer's input nodes and the final layer's output nodes. The more paths a NN model has, the greater its expressive power to perform different tasks and achieve higher accuracy.
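The path-counting idea can be illustrated with a toy example. The sketch below (our own illustration; the paper's actual TPC formula for convolutional architectures is defined in Section 3) counts input-to-output paths in a plain fully connected chain: every unit in one layer connects to every unit in the next, so the path count is simply the product of the layer widths, and the logarithm keeps the score numerically manageable.

```python
import math

def tpc_score(widths):
    """Toy total-path-count score for a sequential, fully connected
    network whose layer widths are listed in order (input first).

    Every unit in one layer connects to every unit in the next, so the
    number of input-to-output paths is the product of all layer widths.
    The log of that product is returned to avoid huge numbers.

    NOTE: illustrative sketch only; the paper's actual TPC formula
    for convolutional architectures appears in Section 3.
    """
    assert len(widths) >= 2, "need at least an input and an output layer"
    return sum(math.log(w) for w in widths)

# Wider and deeper chains yield higher scores.
print(tpc_score([3, 64, 64, 10]))
print(tpc_score([3, 128, 128, 128, 10]))
```

As the example suggests, the score depends only on the architecture's shape, never on trained weights, which is what makes it essentially free to evaluate.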
Most importantly, the TPC score is determined solely by the NN structure; no weight training is required, which significantly reduces the search complexity. TPC scores can be computed in as little as 10 microseconds of CPU time and correlate well with NN architecture performance. As a result, the proposed TPC-NAS method, which uses the TPC score in a standard zero-shot NAS search, can complete a NAS task in a matter of minutes. Many previous NAS studies validated their approaches only on a few image classification tasks; whether their proposed solutions generalize well to other applications remains to be assessed. Furthermore, building and training a supernet from scratch takes enormous effort, limiting feasibility for a broader range of applications. Toward this end, TPC-NAS has been applied to image classification, object detection, and super-resolution applications with overwhelming success. In all three applications, the architectures found by TPC-NAS outperform all the manually designed and most NAS-based architectures of comparable complexity. Our contributions are summarized as follows:

1. We propose the TPC score, a simple yet effective accuracy predictor. This score requires only knowledge of a model's structural parameters to predict its expressivity and performance; thus, the score computation time is a few microseconds.

2. The TPC score correlates very well with the NN model's accuracy. The TPC-based zero-shot NAS algorithm we propose can be implemented on CPUs or edge devices, with typical search times within five minutes on a CPU.

3. TPC-NAS is the first zero-shot NAS algorithm applied to image classification, object detection, and super-resolution. TPC-NAS can swiftly find architectures outperforming hand-crafted and NAS-discovered architectures in all three applications.

This paper is organized as follows. Section 2 discusses related works in the field of NAS. Section 3 describes the principle of our TPC score and the TPC-NAS algorithm in detail. Section 4 explains how we apply TPC-NAS to image classification, object detection, and super-resolution and presents the experimental results. Section 5 discusses several remaining issues, and Section 6 concludes the paper.
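A training-free search of the kind described above can be sketched in a few lines. The toy code below (all names, the cost proxy, and the mutation scheme are our own illustration, not the paper's actual search procedure from Section 3) mutates candidate layer widths and keeps the highest-scoring architecture under a cost budget, using a log-path-count as the stand-in predictor.

```python
import math
import random

def toy_tpc(widths):
    # Log of the number of input-to-output paths in a sequential,
    # fully connected chain (illustrative stand-in for the TPC score).
    return sum(math.log(w) for w in widths)

def toy_flops(widths):
    # Rough cost proxy: sum of products of adjacent layer widths.
    return sum(a * b for a, b in zip(widths, widths[1:]))

def zero_shot_search(budget, n_iters=1000, seed=0):
    """Minimal sketch of a zero-shot search: mutate hidden-layer
    widths and keep the best-scoring candidate within the budget.
    No candidate is ever trained, so each iteration costs only a
    few arithmetic operations."""
    rng = random.Random(seed)
    best = [3, 32, 32, 10]                 # input, two hidden, output
    best_score = toy_tpc(best)
    for _ in range(n_iters):
        cand = best[:]
        i = rng.randrange(1, len(cand) - 1)       # pick a hidden layer
        cand[i] = max(8, cand[i] + rng.choice([-8, 8]))
        if toy_flops(cand) <= budget and toy_tpc(cand) > best_score:
            best, best_score = cand, toy_tpc(cand)
    return best, best_score

arch, score = zero_shot_search(budget=20000)
print(arch, round(score, 2))
```

Because the predictor needs only the architecture's shape, the entire loop runs in well under a second on a CPU, which is the property that lets the full TPC-NAS pipeline finish in minutes.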

Code availability

The code is available at https://github.com/TPC-NAS/TPC.

