RESPERFNET: DEEP RESIDUAL LEARNING FOR REGRESSIONAL PERFORMANCE MODELING OF DEEP NEURAL NETWORKS

Abstract

The rapid advancement of computing technology facilitates the development of diverse deep learning applications. Unfortunately, the efficiency of parallel computing infrastructures varies widely with neural network models, which hinders the exploration of the design space to find high-performance neural network architectures on specific computing platforms for a given application. To address this challenge, we propose a deep learning-based method, ResPerfNet, which trains a residual neural network with representative datasets obtained on the target platform to predict the performance of a given deep neural network. Our experimental results show that ResPerfNet can accurately predict the execution time of individual neural network layers and full network models on a variety of platforms. In particular, ResPerfNet achieves a mean absolute percentage error of 8.4% for LeNet, AlexNet and VGG16 on the NVIDIA GTX 1080Ti, which is substantially lower than that of previously published works.

1. INTRODUCTION

Deep learning (DL) has seen explosive success and is applied to many application domains, such as image recognition and object detection. As a result, human experts have designed many high-accuracy neural network architectures for different applications. However, for Internet of Things (IoT) applications, large neural network models cannot fit into resource-constrained devices. On the other hand, a system designer often tries to find a proper computing platform or a deep learning accelerator (DLA) to execute a DL application with acceptable responsiveness. An exhaustive way to optimize the system design is to evaluate the cost and performance of the desired DL models on all available hardware/software options, but this is not only tedious but also costly and lengthy in practice. Since DL frameworks and accelerators are evolving rapidly, and even slight changes can significantly impact the performance of DL applications, the performance models may need to be updated frequently. Therefore, we need a systematic and efficient approach to produce accurate performance models when changes occur. While several works (Qi et al.; Justus et al. (2018); Wang et al.) have been proposed to estimate the delivered performance of a given DL model on a specific computing platform, so as to rapidly evaluate design alternatives, the estimates from these efforts are not very accurate. For example, the mean absolute percentage error (MAPE) for estimating full neural network models such as LeNet (LeCun et al. (1998)), AlexNet (Krizhevsky et al. (2012)) and VGG16 (Simonyan & Zisserman) on the NVIDIA GTX 1080Ti is as high as 24% in Wang et al., whose accuracy is the best among the previous works but still has room for improvement. In this paper, we propose a deep residual network architecture, called ResPerfNet, to efficiently and accurately model the performance of DL models running on a wide range of DL frameworks and DLAs. It is based on the residual function approach proposed by He et al. (2016) and inspired by the prior works (Liu & Yang (2018); Jha et al. (2019); Wan et al. (2019)), which use residual neural networks to solve regression problems. The proposed model can be trained with performance data collected from many system configurations to establish a unified performance predictor which assists the users in selecting the DL model, the DL framework, and the DLA for their applications.
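MAPE, the error metric used throughout this comparison, is the mean of the absolute prediction errors expressed as percentages of the measured values. A minimal sketch, with purely illustrative timing numbers (not measurements from the paper):

```python
import numpy as np

def mape(measured, predicted):
    """Mean absolute percentage error between measured and predicted times."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(np.abs((measured - predicted) / measured))

# Hypothetical per-model inference times in milliseconds (illustrative only).
measured_ms = [4.0, 21.0, 105.0]
predicted_ms = [4.3, 19.5, 112.0]
print(round(mape(measured_ms, predicted_ms), 2))  # error as a percentage
```

A lower MAPE means the predictor tracks the measured execution times more closely across models of very different scales, which is why it is the headline metric here.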
Extensive experiments show that our unified approach not only provides more accurate performance estimates than the previous works, but also enables the users to quickly predict the performance of their DL applications executed with various model-framework-accelerator configurations. The contributions of this paper are summarized as follows.

• A unified DL-based approach for estimating the computing performance of DL applications on a variety of model-framework-accelerator configurations, which enables the users to explore the hardware/software design space quickly.

• A novel deep residual neural architecture that delivers the most accurate performance predictions we are aware of. Experimental results confirm that our approach yields lower prediction errors across various platforms.

The remainder of this paper is organized as follows. Section 2 presents the related work. Section 3 describes the architecture of ResPerfNet. Section 4 shows the proposed systematic modeling method. Section 5 elaborates the dataset and the training mechanism used to train the ResPerfNet models within a reasonable time span. Section 6 evaluates the efficiency of our approach. Section 7 concludes the paper.

2. BACKGROUND AND RELATED WORK

With the rapid evolution of both hardware accelerators and DL models, measuring or estimating the performance of DL models on DLA platforms is an important task for evaluating the effectiveness of software/hardware solutions to given problems. Different approaches have been proposed to serve these purposes. Benchmarking approaches, such as DAWNBench (Coleman et al. (2017)) and MLPerf (Reddi et al. (2020)), aim at measuring the training and inference performance of machine-learning (ML) models on certain software/hardware combinations. By offering a set of standardized machine learning workloads and instructions for performance benchmarking, these benchmarks are able to measure how fast a system can perform the training and inference for ML models.

Analytical approaches, as reported in PALEO (Qi et al.), construct analytical performance models for DL systems. The execution time is decomposed into the total time for the computation and communication parts, which are derived from the utilization of the computing and communication resources on the target hardware, respectively. For instance, the computation time is estimated by dividing the total floating-point operations required by the DL model by the actual processing speed (i.e., the floating-point operations processed per second for the DL model) delivered by the computing hardware. The communication time is calculated in a similar way. This approach relies heavily on the accuracy of the benchmarking results (i.e., on providing the actual processing speed of the target model on the hardware), which requires its users to choose benchmarks that closely match the program characteristics of their target deep learning models, so as to give a proper estimate of the actual processing speed. However, this manual benchmark-selection process limits its widespread adoption.
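The analytical decomposition above can be sketched for a single layer: compute time is FLOPs divided by an assumed sustained throughput, and communication time is the bytes moved divided by an assumed bandwidth. All hardware numbers below are illustrative assumptions, not PALEO's actual calibration:

```python
def conv2d_flops(h, w, c_in, c_out, k, batch=1):
    """FLOPs (2 per multiply-accumulate) of a stride-1 'same' convolution."""
    return 2 * batch * h * w * c_in * c_out * k * k

def analytical_time_s(flops, out_bytes, eff_flops_per_s, bandwidth_bytes_per_s):
    """PALEO-style estimate: compute term plus communication term."""
    compute_s = flops / eff_flops_per_s
    comm_s = out_bytes / bandwidth_bytes_per_s
    return compute_s + comm_s

flops = conv2d_flops(h=56, w=56, c_in=64, c_out=64, k=3, batch=1)
out_bytes = 56 * 56 * 64 * 4                  # float32 output tensor
t = analytical_time_s(flops, out_bytes,
                      eff_flops_per_s=5e12,            # assumed throughput
                      bandwidth_bytes_per_s=200e9)     # assumed bandwidth
print(f"{t * 1e3:.3f} ms")
```

The sketch makes the fragility concrete: the estimate is only as good as the assumed effective throughput, which in practice must come from benchmarks that match the target model's characteristics.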
Learning-based approaches train DNNs for estimating the DL models' performance by learning the relationships between the characteristics of the DL models and the specifications of the accelerating hardware. The following works focus on TensorFlow-based DL models. Justus et al. (2018) use a fully-connected multi-layer perceptron (MLP) network for performance prediction, taking the configurations of the DL model, the specification of the hardware accelerator, and the training data of the DL model as the input features to the MLP network. However, due to the simplified communication time estimation model, where the communications from GPU to CPU for each of the DL layers are counted repeatedly when estimating the communication time, their model tends to produce over-estimated results. Wang et al. use PerfNet (an MLP network) to learn the relationships between the configurations and the execution time of the target DL model. They further decompose the execution of a DL model into three phases, preprocessing, execution, and postprocessing, and train multiple PerfNet network instances, each of which learns the relationship between the model configurations and the model execution time for a specific phase.

By aggregating the prediction results for the three phases, their proposed work is able to predict the total execution time of a given DL model. Nevertheless, the MLP network has its own limitation: it is hard to further enhance its prediction accuracy, since a deeper MLP network leads to lower prediction accuracy.
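The residual formulation that ResPerfNet builds on addresses exactly this depth limitation: each block computes relu(x + F(x)), where the identity shortcut lets gradients bypass the block, so stacking more blocks need not degrade accuracy as it does for a plain MLP. A NumPy forward-pass sketch of one dense residual unit, with purely illustrative shapes:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, b1, w2, b2):
    """One dense residual unit: relu(x + F(x)), where F is two dense layers.
    The identity shortcut (the '+ x') is what distinguishes this from a
    plain MLP layer pair."""
    f = relu(x @ w1 + b1) @ w2 + b2      # residual branch F(x)
    return relu(x + f)                   # identity shortcut

rng = np.random.default_rng(0)
d = 8                                    # illustrative feature width
x = rng.standard_normal((1, d))
params = [rng.standard_normal((d, d)) * 0.1, np.zeros(d),
          rng.standard_normal((d, d)) * 0.1, np.zeros(d)]

# With an all-zero residual branch the block reduces to relu(x), i.e.
# an "extra" block can always fall back to (near-)identity behaviour:
zero = [np.zeros((d, d)), np.zeros(d), np.zeros((d, d)), np.zeros(d)]
assert np.allclose(residual_block(x, *zero), relu(x))
print(residual_block(x, *params).shape)
```

The fallback-to-identity property shown by the assertion is the intuition for why residual stacks can be made deeper without the accuracy collapse observed for deep plain MLPs.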

